
Don-Bot
Helper V

Power BI Embedded using Databricks SQL Warehouse or Data Lake?

We are currently looking into moving from a SQL Server data warehouse to something new. We have a large amount of data, currently millions of rows, used by hundreds of clients. The current solution is an import into a Power BI dataset, which takes several hours to update even with incremental loads.

 

Can someone point me to some information on what would be a good next solution? We are currently considering Databricks SQL Warehouse (DirectQuery) or an Azure Data Lake direct connection. Does anyone have experience doing something similar?

 

We are in the planning stages.


Thanks

4 REPLIES
Anonymous
Not applicable

Hi, @Don-Bot 

The following are the characteristics and applicable scenarios of these two solutions:

Azure Databricks SQL Warehouse supports a serverless architecture with the following features:

1. Fast startup time (usually 2 to 6 seconds).

2. Rapid scale-out to obtain more compute resources and keep latency low.

3. Query capacity close to hardware limits, not virtual-machine limits.

4. Rapid scale-down to minimize cost while still providing optimized performance and resources when demand is low.

5. Photon engine: the native vectorized query engine on Databricks, which accelerates existing SQL and DataFrame API calls and reduces overall workload cost.

6. Predictive I/O: a set of features that accelerate selective scan operations in SQL queries.

7. Intelligent workload management (IWM): AI-driven prediction and dynamic management that ensure workloads quickly get the right amount of resources.

Applicable scenarios:

Ideal for large data workloads of 1 TB or more.

For query needs that require flexibility, fast startup, intelligent resource management, and low cost, choose serverless SQL Warehouse.
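If the DirectQuery route is chosen, Power BI's Databricks connector asks for two values from the warehouse's Connection details tab: the server hostname and the HTTP path. A small sketch of how those two fields fit together (the workspace host and warehouse ID below are placeholders, not real):

```python
def databricks_connection_fields(workspace_host: str, warehouse_id: str) -> dict:
    """Assemble the two fields Power BI's Databricks connector prompts for.

    workspace_host: the workspace hostname, e.g. "adb-....azuredatabricks.net"
    warehouse_id:   the SQL warehouse ID from its Connection details tab
    """
    return {
        "server_hostname": workspace_host,
        # SQL warehouses expose an HTTP path of the form /sql/1.0/warehouses/<id>
        "http_path": f"/sql/1.0/warehouses/{warehouse_id}",
    }

# Placeholder values for illustration only
fields = databricks_connection_fields(
    "adb-1234567890123456.7.azuredatabricks.net", "abcdef1234567890"
)
```

Both values are also what you would pass to the `databricks-sql-connector` Python library if you query the warehouse outside Power BI.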

The features of Azure Data Lake are as follows:

1. Azure Data Lake Storage Gen2 is a large-scale data lake storage solution suitable for storing and analyzing large amounts of data.

2. Direct connection: you can connect Power BI directly to Azure Data Lake to query and analyze data.

3. Unlimited scalability: Azure Data Lake Storage Gen2 easily scales to terabytes and petabytes of data, suitable for large-scale storage needs.

4. Cost-effectiveness: it provides cost-effective storage, and you pay only for the storage and compute resources actually used.

5. Raw data storage: Data Lake stores data in its original, untransformed state without losing any information. This is useful for data exploration in big data environments, since you may not know in advance what insights the data holds.

6. Flexibility: Data Lake can store unstructured and semi-structured data and is more flexible than a traditional data warehouse.

7. Self-service queries: users can explore data and create their own queries without being restricted by predefined schemas.

Applicable scenarios:

Suitable for scenarios where large amounts of unstructured data need to be stored and analyzed. If you already have Azure Data Lake storage and want to avoid data imports and query the data directly, Azure Data Lake is a good choice.
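For the direct-connection route, paths in ADLS Gen2 are addressed with the `abfss://` URI scheme. A small sketch of the URI shape (the account, container, and path below are placeholders):

```python
def adls_gen2_url(account: str, container: str, path: str = "") -> str:
    """Build the ABFSS URI for a path in an ADLS Gen2 account
    (hierarchical namespace enabled): abfss://<container>@<account>.dfs.core.windows.net/<path>
    """
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    return f"{base}/{path.lstrip('/')}" if path else base

# Placeholder account/container/path for illustration only
url = adls_gen2_url("contosolake", "sales", "curated/facts/2024")
```

The same URI is what Spark jobs in Databricks would use to read the lake, so one path convention can serve both the warehouse and direct-connection options.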

If you want to learn more about these two solutions, you can click on the relevant links below for more detailed information:

SQL WareHouses: https://learn.microsoft.com/en-us/azure/databricks/compute/sql-warehouse

Microsoft Power BI and Lakehouse: https://techcommunity.microsoft.com/t5/analytics-on-azure-blog/power-up-your-bi-with-microsoft-power...

Data Lake Storage: https://www.trustradius.com/products/azure-data-lake-storage/reviews?qs=pros-and-cons#comparisons

Data Lake: https://learn.microsoft.com/en-us/azure/architecture/data-guide/scenarios/data-lake

Data Lake benefits: https://blog.nextpathway.com/azure-data-lake-benefits

Remember that each solution has its trade-offs, and the best choice depends on your specific requirements, existing skills, and budget. Consider factors like data volume, performance, ease of use, and integration capabilities. 

 

 


Best Regards

Jianpeng Li

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

 

lbendlin
Super User

Millions of rows should not take several hours. Are you doing a lot of transforms in Power Query? Are you attempting merges?

 

Consider using dataflows, or flat out Parquet files.  Let the data model do the work for you.
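On the "let the data model do the work" point: the reason a well-tuned incremental refresh is fast is that only the newest partitions are reprocessed. A toy stdlib sketch of that idea (the day counts are placeholders; a real policy is configured in Power BI, not written in code):

```python
from datetime import date, timedelta

def plan_refresh(today: date, store_days: int, refresh_days: int) -> dict:
    """Toy model of an incremental refresh policy: partitions older than
    `store_days` are dropped, only the newest `refresh_days` partitions are
    reloaded, and everything in between stays untouched -- the reason a
    healthy policy reprocesses only a tiny slice of the table."""
    return {
        "drop_before": today - timedelta(days=store_days),
        "reload_from": today - timedelta(days=refresh_days),
    }

# Example: keep ~2 years of history but reload only the last 7 days
policy = plan_refresh(date(2025, 8, 1), store_days=730, refresh_days=7)
```

If a refresh still takes hours with such a policy, the cost is usually in the source query or Power Query transforms, not in the partition mechanics.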

Currently the data model connects to a view in SQL Server to pull the information, hence the long load times and heavy memory use.

I am looking into enhancing it to use Databricks SQL Warehouse or Direct Lake Parquet files. Either one is probably a huge improvement over what we have today.

> Currently the data model is connecting to a view in SQL Server to pull the information. Hence the long load times and huge memory hogging.

Optimize the query behind the view.  Add indexes as needed and make sure the statistics are up to date.
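The "add indexes as needed" advice can be seen in miniature with `EXPLAIN QUERY PLAN`. The sketch below uses SQLite from the Python standard library as a stand-in for SQL Server, with made-up table and column names; the workflow (inspect the plan, add the index, inspect again) is the same idea:

```python
import sqlite3

# A tiny fact table standing in for the view's underlying data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (client_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def plan(query: str) -> str:
    """Return the plan detail text SQLite reports for `query`."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

query = "SELECT SUM(amount) FROM fact_sales WHERE client_id = 7"
before = plan(query)  # full table scan: every row is read
conn.execute("CREATE INDEX ix_sales_client ON fact_sales (client_id)")
after = plan(query)   # index search: only matching rows are read
```

In SQL Server the equivalent is reading the actual execution plan, and keeping statistics current so the optimizer actually chooses the index.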
