Join us for an expert-led overview of the tools and concepts you'll need to pass exam PL-300. The first session starts on June 11th. See you there!
Get registeredPower BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.
Hi Guys,
I created a calculated column calculation to find the running total.
Here is the data set. I have Dates, Individual IDs, Logic1 (It is a calculated column which find the breakpoints for consecutive days for Individual Ids as per dates. It sets to 1,0,0 for every consecutive cycle)
I need to create a running total of column "Logic1". Currently it is shown as "Running Total"
I created running total calculation as:
Running Total = SUMX ( FILTER ( Table2, Table2[Index] <= EARLIER ( Table2[Index] ) ), Table2[Logic1] )
Also, I tried:
Running Total 2 = CALCULATE ( SUM ( Table2[Logic1] ), FILTER ( ALL ( Table2 ), Table2[Index] <= EARLIER ( Table2[Index] ) ) )
Both of these calculation gives right result as expected with small dataset.
The issue I am facing is with performance. It takes 1 hour and sometime memory issue is faced. I have a dataset with 2 miliion record set as of now which will grow further.
How can I re-write this calculation so that it doesn't consume all of my memory? I researched that using RANKX function is a faster approach but struggling to implement it.
Please advice!
Solved! Go to Solution.
@Anonymous
Thanks to your suggestion! I have changed the calculation to:
Running Total = CALCULATE ( SUM ( 'Table2'[Logic 1] ), FILTER ( 'Table2', AND ( 'Table2'[Individual ID] = EARLIER ( 'Table2'[Individual ID] ), 'Table2'[Index] <= EARLIER ( 'Table2'[Index] ) ) ) )
And the calculation of Logic 2 to:
Logic 2 = RANKX ( FILTER ( 'Table2', AND('Table2'[Running Total] = EARLIER ( 'Table2'[Running Total]) , 'Table2'[Individual ID] = EARLIER('Table2'[Individual ID])) ), 'Table2'[Index], , ASC )
Now the columns is getting calculation in less than 1 min than 1hour before.
Thanks!!
@Mann -
The problem is with iterations. Picture row #2M. First, it checks whether its index is smaller. Then it takes those 2M rows and adds them together.
Is there any way to break this down by some attribute? That would reduce the number of rows it checks dramatically, first because it would only check rows within that category, and secondly it would have less rows to add up.
Hope this helps,
Nathan
Hi Nathan,
Thanks for the reply.
I think we can break it by Individual IDs as a category. I mean if it is scanning the complete table then passing filter conditon on Individual Ids would also give my final result i.e. Run the cumulative total per individual ID.
Also, I am calculating running total because I want to RANK every set of consecutive days per Ind Id later. Currently the calculation for that is (third calculated column):
Logic 2 = RANKX ( FILTER ( Table2, Table2[Running Total] = EARLIER ( Table2[Running Total] ) ), Table2[Index], , ASC )
This si giving expected result as per earlier calculation. If I change the calculation for "Running Total" as per your advice then I might need to change above calculation too. I am fine with it.
Just struggling to get it done.
Here is the final outcome coming as per current calculations:
@Anonymous
Thanks to your suggestion! I have changed the calculation to:
Running Total = CALCULATE ( SUM ( 'Table2'[Logic 1] ), FILTER ( 'Table2', AND ( 'Table2'[Individual ID] = EARLIER ( 'Table2'[Individual ID] ), 'Table2'[Index] <= EARLIER ( 'Table2'[Index] ) ) ) )
And the calculation of Logic 2 to:
Logic 2 = RANKX ( FILTER ( 'Table2', AND('Table2'[Running Total] = EARLIER ( 'Table2'[Running Total]) , 'Table2'[Individual ID] = EARLIER('Table2'[Individual ID])) ), 'Table2'[Index], , ASC )
Now the columns is getting calculation in less than 1 min than 1hour before.
Thanks!!
User | Count |
---|---|
84 | |
76 | |
75 | |
43 | |
36 |
User | Count |
---|---|
109 | |
56 | |
52 | |
48 | |
43 |