I have one large data set (300 MB, about 700k rows).
I would like to build a series of dimension tables to reduce the size. My process has been to duplicate the original query, remove the columns I don't need, remove duplicates, and add an index column. Then I merge it back into the original query and keep just the index column.
However, when I refresh my project it grows to 600 MB, and when I add another query and follow the same process it goes to 900 MB. How can I avoid this? I thought creating dim tables would make the project run smoother with less data.
Hi,
Thanks for the solutions @Anonymous and @Joe_Barry provided. I want to offer some more information for users to refer to.
Hello @Anonymous, based on your description, you can consider using a calculated table to create the dimension table in Power BI Desktop, such as:
Table = SUMMARIZE ( 'Table', 'Table'[Index] ... )
You can refer to the following link about it.
Using calculated tables in Power BI Desktop - Power BI | Microsoft Learn
Best Regards!
Yolo Zhu
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
First, are you confusing the query size with the data model size? You might see the table queried 3 times, but your resulting data model will be much smaller if you've replaced a non-integer dimension with integers.
Second, instead of duplicating the table, duplicate just the data source, remove all of the other columns (after duplicating the filter logic if necessary), then remove duplicates and add the index. Then perform your join. If your indexes are both sorted the same way, you can use Table.Join instead of Table.NestedJoin, passing JoinAlgorithm.SortMerge as the joinAlgorithm parameter. This way the data is streamed rather than the whole table being loaded into memory.
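A minimal sketch of that approach in Power Query M (the source, table, and column names here are placeholders for your own — adjust to match your data):

```
// Assumed names: "Category" is the text column being normalized out
// into a dimension table with a surrogate "CategoryKey".
let
    Source = Csv.Document(File.Contents("C:\Data\Sales.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source),

    // Build the dimension: keep one column, deduplicate, add an index key
    DimCategory = Table.AddIndexColumn(
        Table.Distinct(Table.SelectColumns(Promoted, {"Category"})),
        "CategoryKey", 1, 1, Int64.Type
    ),

    // Sort both sides on the join column so SortMerge can stream
    SortedFact = Table.Sort(Promoted, {{"Category", Order.Ascending}}),
    SortedDim  = Table.Sort(
        Table.PrefixColumns(DimCategory, "Dim"),
        {{"Dim.Category", Order.Ascending}}
    ),

    // Table.Join with JoinAlgorithm.SortMerge streams instead of
    // buffering the whole right-hand table in memory
    Joined = Table.Join(
        SortedFact, "Category",
        SortedDim, "Dim.Category",
        JoinKind.Inner, JoinAlgorithm.SortMerge
    ),

    // Keep the surrogate key, drop the original text columns
    Result = Table.RemoveColumns(Joined, {"Category", "Dim.Category"})
in
    Result
```

Note that SortMerge only produces correct results when both inputs really are sorted on the join keys, so the two Table.Sort steps above are required, not optional.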
--Nate
Hi @Anonymous
From which source are you getting your data from? Can you write an SQL query just to bring in the columns you need?
I would also do the following
Hope this helps
Joe
I don't have access to SQL on my work computer, but that could be an option. Thanks for your response. I didn't realize that the duplicated query would still result in a large file after I removed the columns I didn't need.