The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends September 15. Request your voucher.
Dear Experts
I have a question around partitions, and best practices.
Scenario
We have a Lakehouse in use - which will underpin a number of Direct lake semantic models, as well as used for querying directly through the SQL end point. It's been designed to follow a fairly simple star schema model, very flat fact / dimensional modelling.
The basic partition strategy was to partition the larger fact tables based on Year-Month, pushing a specific column into the Gold layer table specifically for partitioning, creating approx 60 partitions (12 Months * 5 Years of data) ~40million rows in total for one of the fact tables.
Then in reading up some more, the theory of partition pruning was mentioned - which caught my eye, and has caused some conflicting info regarding the validity of the partitions.
Question
Given that most of my user queries will be driven from the semantic model, and through power BI itself, any filtering of the data is generally done from the dimension tables themselve (as multiple fact tables will be linked to that dim table). So for example - if I want to select a fiscal year, we would filter it from the dimension table, which in turn would link to the fact table based on the date and return all the necessary dates in that fiscal year.
As my partitions are based on Year-Period (doesn't include date) - does this mean that Partition Pruning doesn't occur given that my filtering of that table will be coming from the join with my dim_date table - which would pass down the dates related to the Year-Period rather than the Year-Period itself. Would partition pruning only be activated if and when I query directly on the Year-Period column that i've introduced for the partitions?
Any advice on this would be greatly appreciated.
Thanks
Neil
Solved! Go to Solution.
Hi @NEJO ,
Thanks for reaching out to the Microsoft fabric community forum.
In Power BI and Fabric, partition pruning is most effective when filters are applied directly to the column used for partitioning in the fact table. When filters are applied indirectly such as through a related dimension table partition pruning typically does not occur. This is because the query engine cannot reliably determine which partitions to scan until after joins are evaluated, which happens too late in the query execution process for pruning to take effect. To improve pruning in such scenarios, it’s recommended to include commonly filtered attributes (like FiscalYear) directly in the fact table, allowing filters to act on those columns directly. This denormalization ensures that partition pruning can occur and helps optimize query performance.
If you find this post helpful, please mark it as an "Accept as Solution" and consider giving a KUDOS. Feel free to reach out if you need further assistance.
Thank you
Hi @NEJO
May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will be helpful for other community members who have similar problems to solve it faster.
Thank you.
Hi @NEJO ,
Thanks for reaching out to the Microsoft fabric community forum.
In Power BI and Fabric, partition pruning is most effective when filters are applied directly to the column used for partitioning in the fact table. When filters are applied indirectly such as through a related dimension table partition pruning typically does not occur. This is because the query engine cannot reliably determine which partitions to scan until after joins are evaluated, which happens too late in the query execution process for pruning to take effect. To improve pruning in such scenarios, it’s recommended to include commonly filtered attributes (like FiscalYear) directly in the fact table, allowing filters to act on those columns directly. This denormalization ensures that partition pruning can occur and helps optimize query performance.
If you find this post helpful, please mark it as an "Accept as Solution" and consider giving a KUDOS. Feel free to reach out if you need further assistance.
Thank you
Thanks for the response.
How would I achieve that though, as my model allows for selection of anything from the caelendar (day, month, year, calendar year, fiscal year etc) from my dimension, so the join and filtering will never be direct on period from within the fact table directly, as the join will always be on date. I have multiple fact tables that are required to filter based on that selection from the dim table, so it doesn't seem right to apply the filter to the fact table(s) directly.
Maybe partitions are not required in my instance, given the size of my data tables - but would be good to understand how to get around this type of scenario for future developments.
Thanks
Neil
Hi @NEJO
Unfortunately, we have to work around this limitation, as partition pruning is only triggered when the filter condition is applied directly to the partitioned column in the fact table, rather than through a relationship or join via a dimension table.
Given your current data volume (40 million rows per fact table) and a well-structured star schema, it's very possible that partitioning may not yield a noticeable performance benefit especially in Direct Lake mode, where data access is optimized for read operations on delta tables. In fact, excessive partitioning can increase management overhead without a proportional benefit at this scale.
If you find this post helpful, please mark it as an "Accept as Solution" and consider giving a KUDOS. Feel free to reach out if you need further assistance.
Thank you
40 million rows is on the small side. That can easily fit into a single partition. You will want to aim for partition sizes around 200 million rows per partition.