The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
Hello,
For some I've been trying and struggling with the following dilemma.
The below is an oversimplified version of my real PowerBI report.
The real source is > 100GB of csv files. The current resulting PowerBI files is over 1 GB in size.
I would like to reduce the footprint (and possibly make the report a bit better performing).
Once the data is ingested it is in a single table.
As you can see in the below example there is a column called "ServiceSKU" that is a unique identifier for the values in colums "Service Action", "Service Product", "Service Component" and "Service Description".
My idea would be to split those 5 columns off of the table into a new table. Than run "Unique" on them to leave a nice lookup table. This should help reducing size. A classic 1-to-many lookup table as I would do in database design.
Though... that only helps if I can delete the 4x columns ("Service Action", "Service Product", "Service Component" and "Service Description") from the original table.
I've tried "reference" table, though when you then delete the column from the original table the referenced table doesn't work anymore.
Doing a "duplicate" table becomes a very expensive operation, while it would have to read all of the data twice.
Is there any way to achieve this (other than using duplicate)?
2x circles with the 1: split table into two tables. Where orange table will have unique values and green table only has the lookup key (ServiceSKU) to find the service information in the orange table via a 1-to-many (circle with 2) relationship.
@fmasselink , I think DAX summarize can be good option in this case, In case you have not tried. Please try that
@amitchandak thank you for the swift reply. Would that also decrease the overall size of the resulting report?
Wouldn't DAX Summarize merely create an extra table/view? It would not delete the information from the original table?
Reducing the size of my report is the primary goal.
I used Grouping in PowerQuery, though instead of ~8 hours to read the data it took > 48 hours. So l'm looking at better ways.
User | Count |
---|---|
87 | |
84 | |
36 | |
35 | |
30 |
User | Count |
---|---|
95 | |
74 | |
67 | |
52 | |
51 |