Check your eligibility for this 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700.
Get StartedDon't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.
Hello,
I'm facing an issue where the size of my data is potentially too large for the type of visual I want to create. I have a dataset that is around 45M records, and I need to create a waterfall chart that would, to my mind, require some unpivoting of the data, as some of the fields we are bringing into the dataset are aggregates of other fields. For example:
Date | Market | 30 days past due | 60 days past due |
1/1/2021 | US | 7,000,000 | 8,000,401 |
1/1/2021 | CA | 5,300,000 | 500,130 |
Would then become the following in order to be able to do a waterfall breakdown by determining days past due based on distinct aging segments:
Date | Market | Metric | Value |
1/1/2021 | US | 30 days past due | 7,000,000 |
1/1/2021 | US | 60 days past due | 8,000,401 |
1/1/2021 | CA | 30 days past due | 5,300,000 |
1/1/2021 | CA | 60 days past due | 500,130 |
The problem is the size of my dataset. It's already fairly large (45M) and I'm potentially going to have to blow it up much further to create the waterfall views I need to create unless I a) limit certain segmentations or b) collapse certain segments together before I bring the dataset into PowerBI. Neither is ideal. Is there a way to do a waterfall chart in the way that I'm thinking of without completely blowing up my dataset or report?
Solved! Go to Solution.
Hi @v-kelly-msft,
No worries at all. After doing some research I was able to come up with something that I think will work. Essentially, what I've found is that if I can omit any and all records with a zero value, the dataset becomes much more manageable. The way to do it would be to create a calculated table as such:
UnpivotedTable = UNION(CALCULATETABLE(
SUMMARIZE(OriginalTable,
OriginalTable[date],
OriginalTable[segment1],
OriginalTable[segment2],
OriginalTable[segment3],
OriginalTable[segment4],
OriginalTable[segment5],
OriginalTable[segment6],
OriginalTable[segment7],
OriginalTable[segment8],
OriginalTable[segment9],
OriginalTable[segment10],
OriginalTable[segment11],
OriginalTable[segment12],
OriginalTable[segment13],
OriginalTable[segment14],
"Metric",
"30 days past due",
"Value",
sum(OriginalTable[30 days past due])),
FILTER(OriginalTable,OriginalTable[30 days past due] <> 0)),
CALCULATETABLE(
SUMMARIZE(OriginalTable,
OriginalTable[date],
OriginalTable[segment1],
OriginalTable[segment2],
OriginalTable[segment3],
OriginalTable[segment4],
OriginalTable[segment5],
OriginalTable[segment6],
OriginalTable[segment7],
OriginalTable[segment8],
OriginalTable[segment9],
OriginalTable[segment10],
OriginalTable[segment11],
OriginalTable[segment12],
OriginalTable[segment13],
OriginalTable[segment14],
"Metric",
"60 days past due",
"Value",
sum(OriginalTable[60 days past due])),
FILTER(OriginalTable,OriginalTable[60 days past due] <> 0)),
Whereas before unpivoting left me with a dataset 2-5 times the size of my original dataset, this one leaves me with a much smaller dataset that still contains all of the relevant data required for a working waterfall chart.
Hi @Anonymous ,
How about create a duplicated table then using unpivoting the columns ?
Best Regards,
Kelly
Did I answer your question? Mark my post as a solution!
Hi @v-kelly-msft,
Wouldn't this still be an enormous table that I would have to store somewhere? That is really the issue...it would be technically possible to duplicate and unpivot a table in M or DAX but would I even be able to store that in my file given that the main dataset itself is already 45M records? The only thing I could possibly think of would be to bring in the data through Direct Query (hence not storing it in PBI) which is simply not feasible because of the limitations of that method.
Hi @Anonymous ,
If so,first create a dim table as below:
Then create a calculated table as below:
Table 2 = CROSSJOIN(SELECTCOLUMNS('Table',"Date",[Date],"Market",[Market]),VALUES('dim table'[Column1]))
And a measure as below:
Measure =
SWITCH(
SELECTEDVALUE('Table 2'[Column1]),
"30 days past due",CALCULATE(SUM('Table'[30 days past due]),FILTER(ALL('Table'),'Table'[Date]=MAX('Table 2'[Date])&&'Table'[Market]=MAX('Table 2'[Market]))),
"60 days past due",CALCULATE(SUM('Table'[60 days past due]),FILTER(ALL('Table'),'Table'[Date]=MAX('Table 2'[Date])&&'Table'[Market]=MAX('Table 2'[Market]))),BLANK())
Finally you will see:
For the related .pbix file,pls see attached.
Best Regards,
Kelly
Did I answer your question? Mark my post as a solution!
Hi @v-kelly-msft,
Unfortunately this option still blows my dataset up to an unsustainable size. I think I will need to handle this on the DB side. Thank you for your help.
Hi @Anonymous ,
Sorry for not being able to help in the end.
Best Regards,
Kelly
Did I answer your question? Mark my post as a solution!
Hi @v-kelly-msft,
No worries at all. After doing some research I was able to come up with something that I think will work. Essentially, what I've found is that if I can omit any and all records with a zero value, the dataset becomes much more manageable. The way to do it would be to create a calculated table as such:
UnpivotedTable = UNION(CALCULATETABLE(
SUMMARIZE(OriginalTable,
OriginalTable[date],
OriginalTable[segment1],
OriginalTable[segment2],
OriginalTable[segment3],
OriginalTable[segment4],
OriginalTable[segment5],
OriginalTable[segment6],
OriginalTable[segment7],
OriginalTable[segment8],
OriginalTable[segment9],
OriginalTable[segment10],
OriginalTable[segment11],
OriginalTable[segment12],
OriginalTable[segment13],
OriginalTable[segment14],
"Metric",
"30 days past due",
"Value",
sum(OriginalTable[30 days past due])),
FILTER(OriginalTable,OriginalTable[30 days past due] <> 0)),
CALCULATETABLE(
SUMMARIZE(OriginalTable,
OriginalTable[date],
OriginalTable[segment1],
OriginalTable[segment2],
OriginalTable[segment3],
OriginalTable[segment4],
OriginalTable[segment5],
OriginalTable[segment6],
OriginalTable[segment7],
OriginalTable[segment8],
OriginalTable[segment9],
OriginalTable[segment10],
OriginalTable[segment11],
OriginalTable[segment12],
OriginalTable[segment13],
OriginalTable[segment14],
"Metric",
"60 days past due",
"Value",
sum(OriginalTable[60 days past due])),
FILTER(OriginalTable,OriginalTable[60 days past due] <> 0)),
Whereas before unpivoting left me with a dataset 2-5 times the size of my original dataset, this one leaves me with a much smaller dataset that still contains all of the relevant data required for a working waterfall chart.
Hi @Anonymous ,
Thanks for sharing.
Best Regards,
Kelly
Did I answer your question? Mark my post as a solution!
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.
User | Count |
---|---|
97 | |
65 | |
45 | |
39 | |
31 |
User | Count |
---|---|
164 | |
111 | |
61 | |
53 | |
38 |