EduardD
Helper IV

Power BI storage mode that loads only required partitions into memory

Hi there, 
I have some huge fact tables in SQL Server with billions of records, partitioned by RegionID.
In the Power BI report, users can select a maximum of 2 out of hundreds of regions.
I'm looking at options to replicate the data into OneLake and use the Direct Lake connection/storage mode. I know that when Direct Lake is used, Power BI loads only the requested columns into memory.
What about partitions? Will Direct Lake on SQL endpoint or OneLake load into memory only the partitions requested by the user (e.g. the user selected 2 regions in the slicer)?

Thank you. 

1 ACCEPTED SOLUTION
tayloramy
Community Champion

Hi @EduardD

Direct Lake only prunes by columns, not by row partitions. When a visual runs, Power BI loads only the columns referenced in the DAX query into memory. It does not dynamically load only the selected RegionID partitions from the slicer (Direct Lake overview).

 

Partitioning your Delta tables can still help with maintenance or reduce the number of files scanned, but over-partitioning (hundreds of regions) often hurts Direct Lake performance. The best practice is to partition on low-cardinality keys (Partitioning guidance).
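
For what it's worth, here is a minimal PySpark sketch of what that write pattern looks like in a Fabric notebook (the table and column names are invented for illustration):

    from pyspark.sql import SparkSession

    # Fabric notebooks already provide a SparkSession named `spark`;
    # this line just makes the sketch self-contained.
    spark = SparkSession.builder.getOrCreate()

    fact = spark.read.table("StagingSales")  # hypothetical source table

    # Partition on a low-cardinality key such as Year, not on hundreds of
    # RegionIDs: each distinct value becomes its own folder of Parquet files,
    # so a high-cardinality key yields many small files and slow reads.
    (fact.write
         .format("delta")
         .partitionBy("Year")
         .mode("overwrite")
         .saveAsTable("FactSales"))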

 

Even if users only pick a couple of regions, the engine still loads the needed column segments and filters rows at query time - it isn't partition-based. Performance depends much more on good Delta file layout: use V-Order, larger row groups, limit small files, and compact properly (Understand Direct Lake storage).
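
To illustrate that layout maintenance, a hedged sketch of the compaction commands you would run from a Fabric notebook (FactSales is a placeholder, and the V-Order property name can vary by Spark runtime):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Ask the writer to produce V-Ordered Parquet in this session
    # (Fabric-specific setting; verify the property name on your runtime).
    spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

    # Compact many small files into fewer, larger ones and V-Order the
    # rewritten files.
    spark.sql("OPTIMIZE FactSales VORDER")

    # Drop files no longer referenced by the table (default retention applies).
    spark.sql("VACUUM FactSales")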

 

If region-level isolation is important, you could create materialized lake views or prefiltered Delta tables for each region. Materialized lake views let you declaratively define transformations, store results physically, and refresh them on a schedule (Materialized lake views). Note that this feature is currently in preview; do not use it for production-grade work until it is GA.
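
If you want to try the materialized lake view route, it is defined in Spark SQL from a notebook; a rough sketch under the current preview syntax (the view name, source table, and region filter are all invented, and preview syntax may change):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Preview feature: a declaratively defined, physically stored view that
    # can be refreshed on a schedule. Names and filter are illustrative only.
    spark.sql("""
        CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS Sales_TopRegions
        AS
        SELECT *
        FROM FactSales
        WHERE RegionID IN (10, 20)
    """)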

If you found this helpful, consider giving some Kudos.
If I answered your question or solved your problem, mark this post as the solution.

tayloramy
Community Champion

Hi @EduardD

I wanted to check in and see if you still need help with anything.
If not, could you please mark a post as the solution to close off this thread and help any other community members find the solution in the future? 

 

If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, please mark this as the solution.

 

V-yubandi-msft
Community Support

Hi @EduardD ,

I wanted to check in to see if your issue has been resolved or if you are still experiencing any difficulties. If you need further information or assistance, please let us know. We're here to help.

 

Thank you.

Poojara_D12
Super User

Hi @EduardD 

When using Power BI’s DirectLake connection—whether pointing to a SQL endpoint, OneLake, or a replicated data lake—the engine leverages columnar storage and lazy loading to optimize memory usage. This means only the columns needed for the visuals are loaded into memory, which significantly reduces the memory footprint compared to loading entire tables.

However, partitioning behavior depends on how the data is exposed in the lake or semantic model. If your fact tables are partitioned by RegionID, Power BI’s engine will ideally query only the relevant partitions for the user’s selection, but this requires that the partitioning metadata is exposed and the query engine can push filters down to the storage layer. In practice, with DirectLake, if a user selects only two regions, the engine will request data only for those partitions, not the entire table, as long as the query can be folded and the storage system supports predicate pushdown.

This means memory usage will be limited to the selected regions and requested columns, but it’s important to ensure that your lake or SQL endpoint exposes partitioning in a way that Power BI can exploit; otherwise, more data may be loaded than strictly necessary. DirectLake provides a near “on-demand” experience, but effective partition pruning still depends on both the storage design and the query folding capabilities of the connector.

 

Did I answer your question? Mark my post as a solution; this will help others!
If my response(s) assisted you in any way, don't forget to drop me a "Kudos"

Kind Regards,
Poojara - Proud to be a Super User
Data Analyst | MSBI Developer | Power BI Consultant
Consider subscribing to my YouTube channel for beginner/advanced concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS
EduardD
Helper IV

@tayloramy thank you so much for the comprehensive answer. If I have 10 billion records in a fact table, and Direct Lake brings all those records (only the specific columns I need to populate the report page) into memory, I will hit the capacity limit, right? In this case DirectQuery would be the only option for me, right? Maybe DirectQuery + some user-defined aggregations for the most popular report views...

Hi @EduardD
That depends on the overall size of your model and what capacity you are on. 

 

I've not run into any issues using Direct Lake on very large tables, but I have not quite hit the 10 billion mark, and I am running on F64 capacities.

 

I'd suggest you try DirectQuery first and see if the performance is good enough for you. If not, come back to the forums and we can explore DirectLake and optimization. 
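
If you do end up exploring the user-defined aggregation idea you mentioned, the aggregation table itself can be built upstream in a Fabric notebook; a minimal PySpark sketch, with invented table, column, and grain names:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Pre-aggregate the large fact table to one row per region/date so most
    # visuals can be answered from a small table, with DirectQuery falling
    # back to the detail table only for drill-downs. Names are placeholders.
    fact = spark.read.table("FactSales")

    agg = (fact.groupBy("RegionID", "DateKey")
               .agg(F.sum("SalesAmount").alias("SalesAmount"),
                    F.count("*").alias("RowCount")))

    agg.write.format("delta").mode("overwrite").saveAsTable("FactSales_Agg")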

 

If you found this helpful, consider giving some Kudos.
If I answered your question or solved your problem, mark this post as the solution.

V-yubandi-msft
Community Support

Hi @EduardD ,

Thank you for engaging with the Microsoft Fabric Community. In addition to @tayloramy's response, I wanted to share a few optional strategies that might be helpful depending on your situation.

 

1. SQL Endpoint + DirectQuery - This method lets you push filters like RegionID down to the source using Power Query and Dynamic M Parameters. It’s useful for row-level filtering before data reaches the model, though it doesn’t take advantage of Direct Lake’s in-memory performance.

2. Prefiltered Delta tables - For regions that are often selected, you could maintain separate Delta tables or views scoped to those regions. This might help reduce scan time and make your model simpler, especially if isolating data by region is a frequent requirement (a rough sketch follows below).

 

Each option has its pros and cons, but they’re worth considering if Direct Lake’s column-only pruning doesn’t fully address your performance or filtering needs.
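
Building on option 2, a rough PySpark sketch of materializing one prefiltered Delta table per frequently selected region (table names and region IDs are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    fact = spark.read.table("FactSales")  # placeholder table name

    # Write a scoped copy for each frequently selected region so a report
    # can target a small table instead of scanning the full fact table.
    for region_id in [10, 20]:  # illustrative region IDs
        (fact.filter(F.col("RegionID") == region_id)
             .write
             .format("delta")
             .mode("overwrite")
             .saveAsTable(f"FactSales_Region_{region_id}"))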


Thank you for the explanation  @tayloramy  

 

regards,
Yugandhar.


@tayloramy thank you for the quick and detailed answer.
I can bring each region into a separate Delta Lake table.
Question: when a user selects a region in the slicer, can I point Power BI at just the table for that region, i.e. something that would work like a Dynamic M Parameter but for a Direct Lake connection?
ref: Dynamic M query parameters in Power BI Desktop - Power BI | Microsoft Learn

Hi @EduardD

Dynamic M Parameters are a feature of Power Query. Direct Lake doesn’t go through Power Query to load data, so this won’t work for Direct Lake tables.

In theory you could connect through the SQL Endpoint, which always uses a DirectQuery connection (not Direct Lake). That would let you use Power Query and Dynamic M Parameters, but you'd lose the performance benefits of Direct Lake.


If you found this helpful, consider giving some Kudos.
If I answered your question or solved your problem, mark this post as the solution.

With DirectQuery I can connect directly to my Azure SQL database, so there is not much value in having a replica in Fabric, right?
BTW, I've always had a question about the benefits of Dynamic M Parameters: if Power BI folds the query anyway, my region filter will end up in a similar WHERE clause of the SQL query even without a Dynamic M Parameter, correct? So using a Dynamic M Parameter should not have an impact on performance.

 

Hi @EduardD

You're right that if your source is Azure SQL Database, connecting directly in DirectQuery will look very similar to going through the Fabric SQL Endpoint. In that case, there’s not a lot of benefit in replicating unless you want the governance, security, and integration features that Fabric provides (for example, centralizing access control and combining data from multiple lakehouse or warehouse sources).

On Dynamic M Parameters: the benefit isn't really about query folding performance. As you said, a slicer on Region will fold into a SQL WHERE clause anyway. The advantage of Dynamic M Parameters is that they let you parameterize the source query itself - things like swapping schemas or tables, or calling a stored procedure with arguments. If you're only filtering rows in a single fact table, there's usually no performance gain over a standard slicer.

Where Direct Lake is different is performance. Direct Lake loads the required columns from Delta/Parquet directly into the VertiPaq engine in Power BI. That means you get the query performance of Import mode (sub-second visuals, no round trips to the database) while still refreshing near real-time from OneLake. DirectQuery - whether to Azure SQL or the Fabric SQL Endpoint - will always send queries back to the source, so performance depends on network latency and database execution times (Direct Lake overview).

So in short: if you're fine with DirectQuery to Azure SQL, Fabric doesn't add raw speed, but it adds integration and governance. If you want the performance boost, Direct Lake is the real differentiator.

 

If you found this helpful, consider giving some Kudos.
If I answered your question or solved your problem, mark this post as the solution.


