Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Compete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.

Reply
PrabhakaranM
Frequent Visitor

Longer Refresh Time After Switching from SQL Server to Azure Databricks

I cannot post any screenshots or visuals due to client policies restricting data disclosure.

I am currently working on a project where the original architecture involved SQL Server as the data source. Power BI reports were developed on top of SQL Server tables and views, with data refreshed using a Gateway. The scheduled refresh never exceeded 3 minutes. There were a few transformations applied in Power Query. These reports were published in a PP3 workspace.

Now, the architecture has shifted to Azure Databricks, and I updated the data source for the existing reports using the Azure Databricks connector in Power BI. The connection is configured with ODBC, running native queries, and using Azure Active Directory for credentials. The transformations in Power Query from the original report were preserved. The reports are also published in a PP3 workspace, but the refresh time has increased to 9–12 minutes.

Some additional details:

  • Data connectivity mode = Import Mode
  • The cluster on Databricks is configured with 56GB RAM and 16 cores.
  • The current dataset in Databricks is for UAT and is smaller than the production dataset (which is 5–6x larger).
  • We have replicated the exact tables and views from SQL Server in Databricks, so the structure and data are consistent between the two environments.
  • I tested an alternative approach where I moved all the Power Query transformations into SQL (so that the transformations are pushed to Databricks). This reduced the refresh time to 5 minutes, which was a great improvement.
  • However, the client insists on keeping the transformations in Power Query.

My questions are focused on understanding what might be causing this issue, rather than solely fixing it:

  1. Does Power BI inherently perform better with SQL Server due to better query folding support compared to Databricks?
  2. Is the Azure Databricks connector not as optimized as the SQL Server connector?
  3. Is there an inherent waiting time or latency when the Databricks connector fetches data/rows?

Any insights or experiences with similar setups would be greatly appreciated! Any official doumentation would also be of a great help! Thanks in Advance 🙂

1 ACCEPTED SOLUTION
Poojara_D12
Super User
Super User

Hi @PrabhakaranM 

  1. SQL Server vs. Databricks Performance:
    • Power BI performs better with SQL Server due to full query folding support, while Databricks has limited query folding.
  2. Databricks Connector Limitations:
    • The Databricks connector (via ODBC) adds latency and is less optimized compared to the SQL Server connector.
  3. Cluster Latency:
    • Databricks' distributed cluster adds overhead for job initialization and data fetching, especially for smaller datasets.

 

  1. Push Transformations to Databricks: This reduces refresh times significantly.
  2. Optimize Databricks: Use indexes, cache tables, and optimize queries in Databricks SQL.
  3. Cluster Settings: Ensure proper cluster configuration and enable SQL caching.
  4. Parallel Execution: Minimize dependencies between tables in Power BI to enable parallel queries.

 

Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Poojara
Data Analyst | MSBI Developer | Power BI Consultant
Consider Subscribing my YouTube for Beginners/Advance Concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS 

Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a "Kudos"

Kind Regards,
Poojara - Proud to be a Super User
Data Analyst | MSBI Developer | Power BI Consultant
Consider Subscribing my YouTube for Beginners/Advance Concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS

View solution in original post

6 REPLIES 6
v-prasare
Community Support
Community Support

 @PrabhakaranM, As we haven’t heard back from you, we wanted to kindly follow up to check if the solution provided by our super users for your issue worked? or let us know if you need any further assistance here?

 

 

 

Thanks,

Prashanth Are

MS Fabric community support

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly and give Kudos if helped you resolve your query

v-prasare
Community Support
Community Support

@PrabhakaranM As we haven’t heard back from you, we wanted to kindly follow up to check if the solution provided by our super users for your issue worked? or let us know if you need any further assistance here?

 

 

 

Thanks,

Prashanth Are

MS Fabric community support

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly and give Kudos if helped you resolve your query

v-prasare
Community Support
Community Support

@PrabhakaranM, Hope your doing well.

 

@Poojara_D12@TomMartens thanks for your promt repsonse.

 

@PrabhakaranM, As we haven’t heard back from you, we wanted to kindly follow up to check if the solution provided by our super users for your issue worked? or let us know if you need any further assistance here?

 

 

 

Thanks,

Prashanth Are

MS Fabric community support

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly and give Kudos if helped you resolve your query

Poojara_D12
Super User
Super User

Hi @PrabhakaranM 

  1. SQL Server vs. Databricks Performance:
    • Power BI performs better with SQL Server due to full query folding support, while Databricks has limited query folding.
  2. Databricks Connector Limitations:
    • The Databricks connector (via ODBC) adds latency and is less optimized compared to the SQL Server connector.
  3. Cluster Latency:
    • Databricks' distributed cluster adds overhead for job initialization and data fetching, especially for smaller datasets.

 

  1. Push Transformations to Databricks: This reduces refresh times significantly.
  2. Optimize Databricks: Use indexes, cache tables, and optimize queries in Databricks SQL.
  3. Cluster Settings: Ensure proper cluster configuration and enable SQL caching.
  4. Parallel Execution: Minimize dependencies between tables in Power BI to enable parallel queries.

 

Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Poojara
Data Analyst | MSBI Developer | Power BI Consultant
Consider Subscribing my YouTube for Beginners/Advance Concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS 

Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a "Kudos"

Kind Regards,
Poojara - Proud to be a Super User
Data Analyst | MSBI Developer | Power BI Consultant
Consider Subscribing my YouTube for Beginners/Advance Concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS
TomMartens
Super User
Super User

Hey @PrabhakaranM ,

 

from my understanding you do not use Direct Query?
Nevertheless, do not use ODBC, use the Azure Databricks connectivity that comes with Power BI Desktop / Power BI Service.

 

Basically, SQL Server is better if there are multiple connections at the same time, but this is only important if you are using Direct Query connection modes.

 

If you are using Azure Databricks, then I recommend that all data transformations are done in Power Query should be moved to Databricks, this is what Databricks is good at. Of course this requires "communication" between Power BI users and the Databricks team.

 

Regarding load times, not sure if you are using Power BI Premium or not and if you are using a on-premises data gateway.

Depending on the location of the SQL Server, you also have to remember that the data travels from SQL Server to Power BI Service vs. from Databricks to Power BI Service. If an on-premises data gateway is involved for the latter, then data travel will add a lot to the data load / data refresh duration.

Please check if Azure Databricks and the Power BI Service are in the same region. If they are not, you may have to reach out to your Azure Infrastructure architects and ask how these different data centers are connected.

 

Hopefully, this helps to tackle your challenge.

Regards,

Tom



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

Hey Tom... Thank you for your time and response.
I want to give more clarifications as you asked few questions.
1. I am using Azure Databricks connector found in Power Query.
2. All the workspaces and servers are in same region.
3. The workspaces is Premium Per User.

4. The connectivity mode is Import and client doesn't wants to change it.

4. The tranformation is power bi are complex and stops at the first applied step. Hence without the query folding the refresh times are quicker when the datasource is SQL Server. Whereas Azure Databricks is take 4x time to complete a scheduled refresh. Hope this gives more insight on the situation.

As per my knowledge there is no bottleneck from databrick side. What could be the most reasonable issue?

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.