cbsalling
Frequent Visitor

Timeouts with Power BI Service DirectQuery to HDInsight Spark Cluster

I’m trying to set up a connection from Azure Data Lake Store (ADLS) to Power BI through an HDInsight Spark cluster. My exact data pipeline is as follows:

  • CSV file with time series data stored in ADLS
  • From Spark cluster, run Jupyter notebook that pulls in CSV data and outputs it to a table
  • From the Power BI cloud service, connect to the Spark cluster and query the tables created in the previous step
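
For reference, the notebook step in the list above can be sketched roughly as follows. This is a minimal sketch, not the exact notebook code: it assumes a PySpark kernel where a `spark` session already exists, a CSV with a header row, and a helper name (`csv_to_table`), path, and table name that are placeholders.

```python
def csv_to_table(spark, csv_path, table_name):
    """Read a headered CSV and persist it as a Spark table (hypothetical helper)."""
    # header=True takes column names from the first row;
    # inferSchema=True makes Spark scan the file to guess column types.
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    # saveAsTable registers the data in the session's catalog;
    # mode("overwrite") replaces a table of the same name if one exists.
    df.write.mode("overwrite").saveAsTable(table_name)
    return df

# Hypothetical usage inside the notebook (placeholder account, path, and table name):
# csv_to_table(spark, "adl://<account>.azuredatalakestore.net/<path>/timeseries.csv", "timeseries")
```

Passing the session in as an argument keeps the helper independent of how the notebook kernel creates `spark`.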

 

I got this pipeline to work in a test environment with a different set of data, but now that I’m working with more realistic data, it’s not quite working. Everything is the same up until I go to create the visualizations in Power BI. The metadata for the table is imported correctly, but when I drag a column from the column list onto the reporting canvas, no data is loaded and I just get the circling dots for a few minutes and then nothing. I’m not seeing any errors.

 

I’ve successfully connected to the Spark cluster I’m currently working with from the Power BI Desktop client and loaded the data there, so I’m sure I’ve set up everything correctly.

 

As a reference:

My test dataset was 26,729 rows x 10 columns for a total of 2.8 MB. Queries were a little slow, but not too bad.

My current dataset (the one timing out) is 104,274 rows x 2 columns for a total of 6.5 MB.


So the current dataset has about four times as many rows and is a little over twice the size of the test data I was working with. Is this big enough to hit such drastic performance limits?
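
As a quick check of that comparison, here is the arithmetic using only the figures quoted above (the variable names are just for illustration):

```python
# Figures quoted in the post: rows and size in MB for each dataset.
test_rows, test_mb = 26_729, 2.8
current_rows, current_mb = 104_274, 6.5

row_ratio = current_rows / test_rows   # how many times more rows
size_ratio = current_mb / test_mb      # how many times larger on disk

print(f"rows: {row_ratio:.1f}x, size: {size_ratio:.1f}x")  # rows: 3.9x, size: 2.3x
```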

 

3 REPLIES
v-haibl-msft
Microsoft Employee

@cbsalling

 

Do you mean that you can load the same column data in Desktop but not in the Service? What happens if you apply a filter in the Service to reduce the number of rows returned?

 

Best Regards,

Herbert

Yes, essentially, although I believe the data connection in the Desktop client isn't live: it imports the data, whereas the cloud service sets up a direct connection.

 

I haven't gotten around to trying filters or otherwise reducing the number of rows returned. The data isn't in a great format: the dates are in a non-parseable format and come with paired alphanumeric values. I can parse it via the query editor in the Desktop client, but I have to use a language I'm not super familiar with if I want the data in a better format before I connect to it via the service. I was just curious whether anyone out there had actually worked with the HDInsight Spark data connector and what their experience was.

 

I'll reshape the data before connecting to it and we'll see how it goes. Thanks, anyway!

 

Thanks,
Claire

@cbsalling

 

Every action, such as selecting a column or adding a filter, sends a query back to the database. Some tips for optimizing your cluster for Power BI can be found here. Also, before selecting very large fields, consider choosing an appropriate visual type.

 

Best Regards,

Herbert
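
Since every visual interaction sends a query to the cluster, one way to shrink what each query has to return is to pre-aggregate the time series before saving the table. The sketch below only illustrates the idea in plain Python: the hourly bucket, the `hourly_means` name, and the sample rows are all assumptions, not details from this thread. In the notebook the same logic would more likely be written as a Spark group-by aggregation.

```python
from collections import defaultdict
from datetime import datetime

def hourly_means(rows):
    """Collapse (timestamp, value) pairs into one averaged row per hour."""
    buckets = defaultdict(list)
    for ts, value in rows:
        # Truncate each timestamp to the start of its hour to form the bucket key.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # Emit buckets in chronological order with the mean of their values.
    return [(hour, sum(vals) / len(vals)) for hour, vals in sorted(buckets.items())]

# Made-up sample rows, for illustration only.
sample = [
    (datetime(2017, 1, 1, 0, 5), 1.0),
    (datetime(2017, 1, 1, 0, 35), 3.0),
    (datetime(2017, 1, 1, 1, 10), 5.0),
]
summary = hourly_means(sample)  # two rows: one for 00:00, one for 01:00
```

Whether hourly granularity is acceptable depends on what the report needs to show; the point is only that fewer, coarser rows mean less work per DirectQuery request.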
