Timeouts with Power BI Service DirectQuery to HDInsight Spark Cluster

cbsalling · ‎10-11-2016

I’m trying to set up a connection from Azure Data Lake Store (ADLS) to Power BI through an HDInsight Spark cluster. My exact data pipeline is as follows:

1. A CSV file with time series data is stored in ADLS.
2. From the Spark cluster, I run a Jupyter notebook that pulls in the CSV data and writes it out to a table.
3. From the Power BI cloud service, I connect to the Spark cluster and query the tables created in the previous step.
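Step 2 of the pipeline above could look roughly like the following in a PySpark notebook. This is only a sketch, not the notebook actually used in this thread: it assumes Spark 2.x (where `spark.read.csv` is available) and a CSV with a header row, and the function name `csv_to_table` is made up for illustration.

```python
def csv_to_table(spark, csv_path, table_name):
    """Read a headered CSV and save it as a persistent Spark table.

    `spark` is the SparkSession that the notebook kernel provides.
    """
    # header=True takes column names from the first row; inferSchema=True
    # makes Spark scan the file to guess column types (otherwise all strings).
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    # saveAsTable registers the data in the metastore, so external clients
    # connecting to the cluster can see it as a table.
    df.write.mode("overwrite").saveAsTable(table_name)
    return df


# Hypothetical usage inside the notebook (path and table name are placeholders):
# csv_to_table(spark, "adl://<account>.azuredatalakestore.net/<path>.csv", "timeseries")
```

Saving with `saveAsTable` rather than registering a temporary view matters here, since a temporary view only lives inside the notebook session and would not be visible to an outside connection.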

I got this pipeline to work in a test environment with a different set of data, but now that I’m working with more realistic data, it’s not quite working. Everything is the same up until I go to create the visualizations in Power BI. The metadata for the table is imported correctly, but when I drag a column from the column list onto the reporting canvas, no data is loaded and I just get the circling dots for a few minutes and then nothing. I’m not seeing any errors.

I’ve successfully connected to the Spark cluster I’m currently working with from the Power BI Desktop client and loaded the data there, so I’m sure I’ve set up everything correctly.

As a reference:

My test dataset was 26,729 rows x 10 columns for a total of 2.8 MB. Queries were a little slow, but not too bad.

My current dataset (the one timing out) is 104,274 rows x 2 columns for a total of 6.5 MB.

So the current dataset has about four times as many rows and is a little over twice the size of the test data I was working with. Is this big enough to hit such drastic performance limits?
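As a quick check of that comparison, the ratios can be worked out directly from the row counts and sizes quoted above (the variable names are just for illustration):

```python
# Figures quoted in the post above.
test_rows, test_mb = 26_729, 2.8
current_rows, current_mb = 104_274, 6.5

row_ratio = current_rows / test_rows   # how many times more rows
size_ratio = current_mb / test_mb      # how many times larger on disk

print(f"rows: {row_ratio:.1f}x, size: {size_ratio:.1f}x")
# prints "rows: 3.9x, size: 2.3x"
```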

v-haibl-msft · ‎10-13-2016

@cbsalling

Do you mean that you can load the same column data in Desktop but not in the Service? What happens if you apply a filter in the Service to reduce the number of rows returned?

Best Regards,

Herbert

cbsalling · ‎10-20-2016

Yes, essentially, although I believe that the data connection in the desktop client isn't live: it imports the data, whereas the cloud service sets up a direct connection.

I haven't gotten around to trying filters or reducing the number of rows returned somehow. The data isn't in a great format - it's in a non-parseable date format with paired alphanumerical values. I can parse it via the query editor in the desktop. I have to use a language I'm not super familiar with if I want the data in a better format before I connect to it via the service. I was just curious if there was anyone out there who had actually worked with the HDInsight Spark data connector and what their experience was.

I'll reshape the data before connecting to it and we'll see how it goes. Thanks, anyway!
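For the date part of that reshaping, a minimal Python sketch might look like this. The actual raw format is not given in this thread, so `RAW_FORMAT` below is purely a hypothetical placeholder, and `normalize_timestamp` is an invented helper name; the real pattern would need to be substituted.

```python
from datetime import datetime

# HYPOTHETICAL raw layout, e.g. "20161011-093000". The real format in the
# CSV is not stated in the thread, so adjust this pattern to match it.
RAW_FORMAT = "%Y%m%d-%H%M%S"


def normalize_timestamp(raw):
    """Convert a raw timestamp string into 'YYYY-MM-DD HH:MM:SS'."""
    parsed = datetime.strptime(raw, RAW_FORMAT)
    # A space-separated ISO-style string is generally easy for downstream
    # tools to recognize as a date/time value.
    return parsed.isoformat(sep=" ")


print(normalize_timestamp("20161011-093000"))
# prints "2016-10-11 09:30:00"
```

Doing this conversion in the notebook before the table is written means the column arrives in Power BI already in a usable form, instead of needing a transformation at query time.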

Thanks,
Claire

v-haibl-msft · ‎10-20-2016

@cbsalling

Every action, such as selecting a column or adding a filter, sends a query back to the database. Some tips for optimizing your cluster for Power BI can be found here. Also, before selecting very large fields, consider choosing an appropriate visual type.
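Since each interaction triggers a fresh query, one way to keep those queries small is to summarize the time series before it is exposed as a table. The sketch below shows the idea in plain Python, assuming rows of (timestamp, value) pairs and hourly averages as the summary; both the bucket size and the function name `hourly_means` are illustrative choices, not something recommended in this thread.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(rows):
    """Average (timestamp, value) pairs into one value per hour."""
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [sum, count]
    for ts, value in rows:
        # Truncate the timestamp to the start of its hour.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        totals[bucket][0] += value
        totals[bucket][1] += 1
    # Return buckets in chronological order.
    return {b: s / n for b, (s, n) in sorted(totals.items())}


sample = [
    (datetime(2016, 10, 11, 9, 5), 1.0),
    (datetime(2016, 10, 11, 9, 45), 3.0),
    (datetime(2016, 10, 11, 10, 15), 5.0),
]
print(hourly_means(sample))
```

With the sample above, the two 09:00 readings collapse into a single row with mean 2.0, and the 10:00 reading stays as 5.0, so the visual has fewer rows to fetch per query.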

Best Regards,

Herbert
