Supplies are limited. Contact info@espc.tech right away to save your spot before the conference sells out.
Get your discountScore big with last-minute savings on the final tickets to FabCon Vienna. Secure your discount
We have integrated PowerBI with Databricks to generate reports. However, PowerBI generates over 8,000 lines of code, including numerous OR clauses, which cannot be modified at this time. This results in queries that take more than 4 minutes to execute and are automatically cancelled before a plan is generated. The time required for query optimization and file pruning further delays the process, preventing the plan from being generated. As a result, we are unable to use the report with Databricks, as queries containing numerous OR clauses are either taking an excessive amount of time to execute or failing altogether.
Please note that we have already implemented optimization techniques within Databricks, and our data consists of small files, such as 1 file in the DIM table and 22 files in the FACT tables. Adjusting the size of the serverless SQL warehouse has not resolved the issue.
As per our analysis, “joins” are not a problem but the huge “where” clause with lot of “OR” conditions.
If anyone has successfully addressed this issue, please share your solution.
Solved! Go to Solution.
Hi @vdhaks111 ,
If you have Premium capacity, consider Config Automatic Aggregations to improve performance.
Once the model is trained, Power BI will have aggregated values in in-memory cache. The next time you interact with the report using similar patterns (dimensions, measures, filters) Power BI will leverage cached aggregations to serve the queries and will not send queries to Databricks SQL Warehouse.
Hope it helps!
Best regards,
Community Support Team_ Scott Chang
If this post helps then please consider Accept it as the solution to help the other members find it more quickly.
Hi @vdhaks111 ,
If you have Premium capacity, consider Config Automatic Aggregations to improve performance.
Once the model is trained, Power BI will have aggregated values in in-memory cache. The next time you interact with the report using similar patterns (dimensions, measures, filters) Power BI will leverage cached aggregations to serve the queries and will not send queries to Databricks SQL Warehouse.
Hope it helps!
Best regards,
Community Support Team_ Scott Chang
If this post helps then please consider Accept it as the solution to help the other members find it more quickly.
@Poojara_D12 Thanks for looking into this
@Poojara_D12 , Everyone
The "OR" conditions are getting automatically generated by Power BI - Table and Matrix visuals, so we do not have control over the query.
I am not sure if Temporary views or "Join instead of OR" could help us in this context. We are using Databricks serverless SQL warehouse which has dynamicPartitionPruning enabled.
Hi @vdhaks111
Use Temporary Views: Create a view in Databricks with the OR conditions applied, and connect Power BI to this view.
Replace OR with JOIN: Use a temporary table for filter values and join it with the dataset to replace the OR clause.
Enable Dynamic Query Pruning: Ensure spark.sql.optimizer.dynamicPartitionPruning is enabled in Databricks for better performance on large filters.
Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂
Kind Regards,
Poojara
Data Analyst | MSBI Developer | Power BI Consultant
Please Subscribe my YouTube for Beginners/Advance Concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS