The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hi fellow fabricators,
I was asked to provide options for the following scenario. Our customer has some very large fact tables in their gold lakehouse. To maintain performance in their PBI reports, they want to filter the fact table based on date (for instance include only the data of the last fiscal year) prior to using the data in the report. They are using direct lake semantic models.
I checked as best as I can and came up with the following:
So far, none of the above methods seem to suffice the requirements I got. Do you have other ideas on how to achieve this or do you see errors in my thought process? I am tahnkful for any sort of input.
Kind regards,
Niels
Solved! Go to Solution.
Hi there,
quick update on this.
We eventually went with the option of creating duplicate tables. We followed the following steps to ensure data integrity:
For the warehouse tables we appended a CTAS statement to the stored procedure which handles the transformations to the gold layer. The CTAS select data from the complete gold table and applies the desired filtering. Appending this to the stored procedure ensure that we are selecting the most recent version of the data.
We are hoping that the additional storage costs will be justified by faster queries on the filtered fact tables.
Hi @ObungiNiels
Thank you for reaching out microsoft fabric community forum.
May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will be helpful for other community members who have similar problems to solve it faster.
Thank you.
Hi there,
quick update on this.
We eventually went with the option of creating duplicate tables. We followed the following steps to ensure data integrity:
For the warehouse tables we appended a CTAS statement to the stored procedure which handles the transformations to the gold layer. The CTAS select data from the complete gold table and applies the desired filtering. Appending this to the stored procedure ensure that we are selecting the most recent version of the data.
We are hoping that the additional storage costs will be justified by faster queries on the filtered fact tables.
Hi @ObungiNiels
Good to hear that you found a workaround. Please accept it as a solution so that it might be helpful for other community members with similar problems to solve it faster.
Thank you.
Hello @ObungiNiels
your research is valid for the options.
I would suggest Optimize Delta Table Layout.
Example: Sorting by `date` reduced a 50M-row table’s memory footprint to 444 MB in testing See the article below
Power BI loads only the required row groups into memory during queries, mimicking partition pruning even if not natively supported
https://datamonkeysite.com/2024/12/21/optimizing-parquet-layout-for-power-bi-direct-lake-mode
do explore materialised view with kql
User | Count |
---|---|
4 | |
2 | |
2 | |
2 | |
2 |
User | Count |
---|---|
13 | |
11 | |
10 | |
6 | |
5 |