I need to create a measure for dynamic ABC analysis and am running into heavy performance issues when calculating the running total.
My measure ranks all customers by sales (keeping the filter context of all other dimensions alive) and calculates the running total. It works as intended and I have already spent a lot of time on performance optimization. However, it is still way too slow: it barely works for 10,000 customers, but it should work for more than 100,000 customers.
running_total =
VAR _my_sales = [Customer Sales]
VAR _allSales =
    FILTER (
        ADDCOLUMNS (
            CALCULATETABLE (
                'Dim Customer',
                CROSSFILTER ( 'Fact Sales'[Customer_SKey], 'Dim Customer'[Customer_SKey], BOTH ),
                REMOVEFILTERS ( 'Dim Customer' )
            ),
            "@Sales", CALCULATE ( [Customer Sales] )
        ),
        [@Sales] > 0
    )
VAR _ranked_sales =
    ADDCOLUMNS (
        _allSales,
        "@runningtotal",
            SUMX (
                WINDOW ( 1, ABS, 0, REL, _allSales, ORDERBY ( [@Sales], DESC ) ),
                [@Sales]
            )
    )
RETURN
    MAXX ( FILTER ( _ranked_sales, [@Sales] = _my_sales ), [@runningtotal] )
Can anybody explain in detail why performance keeps dropping (and maybe even offer a solution)?
Things which do not satisfy the requirements:
Please do not post an AI response. I already spent hours discussing the problem with AI and really hope for human expertise.
Hi, thanks for your replies 😊
@v-agajavelly your approach runs into the same issues, since [running total sales] uses iterators as well and eats up way too much memory.
I have come up with a different solution where I can avoid running totals altogether.
The approach is based on z-scores (standard deviations from the arithmetic mean). That is something the storage engine can calculate incredibly fast for huge datasets.
The second step is to find the correct z-score threshold for each cumulative sales percentage (e.g. all customers with a z-score > 2.5 are responsible for 50% of all sales). This parameter calibration step is also very cheap in terms of computation.
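To make the calibration step concrete, here is a worked example using the approximation g(y) = k / y - c with the constants from the measure below (k = 0.6, c = 0.9); the 62% figure is purely illustrative. For the A-class target y = 0.50, the initial cut-off is z0 = 0.6 / 0.5 - 0.9 = 0.3. Suppose the customers with a z-score above 0.3 actually account for 62% of total sales instead of 50%. The one-shot adjustment h(z0, a, y) = (a / y) * (z0 + c) - c then raises the cut-off to (0.62 / 0.50) * (0.3 + 0.9) - 0.9 ≈ 0.59, which pushes the cumulative share back towards the 50% target.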
I've come up with a consistent ABC formula which works fast for millions of customers.
ABC-Class =
-- Performs a dynamic ABC classification
-- Replaces traditional running-total iterators with z-scores,
-- so it scales to very large datasets without the usual iterator slowdown
-- Target cumulative percentages are approximated within a few points
-- "Magic" constants for the functional approximation g(y) = k / y - c
-- k = slope, c = horizontal shift (tune if your sales distribution changes)
VAR _magic_k = 0.6
VAR _magic_c = 0.9
-- Target cumulative-sales cut-offs for the classes (can be parameterised)
VAR _a_threshold = 0.50 -- top-A customers should cover 50 % of sales
VAR _b_threshold = 0.75 -- A + B together 75 %
VAR _c_threshold = 0.90 -- A + B + C together 90 %
-- Initial z-score limits derived from the k/c approximation
VAR _z_a_threshold_initial = DIVIDE ( _magic_k, _a_threshold ) - _magic_c
VAR _z_b_threshold_initial = DIVIDE ( _magic_k, _b_threshold ) - _magic_c
VAR _z_c_threshold_initial = DIVIDE ( _magic_k, _c_threshold ) - _magic_c
/* ------------------------------------------------------------------------- */
/* Build a customer-level sales table */
/* Adapt only the two variables below if your model uses other tables/fields */
/* ------------------------------------------------------------------------- */
VAR _customer_sales =
    FILTER (
        ADDCOLUMNS (
            CALCULATETABLE (
                'Dim Customer',
                REMOVEFILTERS ( 'Dim Customer' ),
                CROSSFILTER ( 'Fact Sales'[Customer_SKey], 'Dim Customer'[Customer_SKey], BOTH )
            ),
            "@Sales", [Customer Sales]
        ),
        [@Sales] > 0
    )
/* Total sales - needed once to adjust the z-score limits */
VAR _all_sales =
    CALCULATE ( [Customer Sales], REMOVEFILTERS ( 'Dim Customer' ) )
/* Mean, standard deviation and z-score per customer */
VAR _stdev_sales = STDEVX.P ( _customer_sales, [@Sales] )
VAR _avg_sales = AVERAGEX ( _customer_sales, [@Sales] )
VAR _z_score =
    ADDCOLUMNS (
        _customer_sales,
        "@z_score", DIVIDE ( [@Sales] - _avg_sales, _stdev_sales )
    )
/* ------------------------------------------------------------------------- */
/* One-shot adjustment of the z-score limits */
/* Using h(x0, a, y) = (a / y) * (x0 + c) - c from our functional approximation */
/* ------------------------------------------------------------------------- */
VAR _a_threshold_actual =
    DIVIDE (
        SUMX ( FILTER ( _z_score, [@z_score] > _z_a_threshold_initial ), [@Sales] ),
        _all_sales
    )
VAR _z_a_threshold_adjusted =
    ( DIVIDE ( _a_threshold_actual, _a_threshold ) * ( _z_a_threshold_initial + _magic_c ) )
        - _magic_c
VAR _b_threshold_actual =
    DIVIDE (
        SUMX ( FILTER ( _z_score, [@z_score] > _z_b_threshold_initial ), [@Sales] ),
        _all_sales
    )
VAR _z_b_threshold_adjusted =
    ( DIVIDE ( _b_threshold_actual, _b_threshold ) * ( _z_b_threshold_initial + _magic_c ) )
        - _magic_c
VAR _c_threshold_actual =
    DIVIDE (
        SUMX ( FILTER ( _z_score, [@z_score] > _z_c_threshold_initial ), [@Sales] ),
        _all_sales
    )
VAR _z_c_threshold_adjusted =
    ( DIVIDE ( _c_threshold_actual, _c_threshold ) * ( _z_c_threshold_initial + _magic_c ) )
        - _magic_c
/* Z-score of the current customer row */
VAR _my_z_score =
    DIVIDE ( [Customer Sales] - _avg_sales, _stdev_sales )
/* ------------------------------------------------------------------------- */
/* Final classification */
/* ------------------------------------------------------------------------- */
RETURN
    SWITCH (
        TRUE (),
        _my_z_score > _z_a_threshold_adjusted, "A",
        _my_z_score > _z_b_threshold_adjusted, "B",
        _my_z_score > _z_c_threshold_adjusted, "C",
        "D"
    )
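In case it helps anyone adapting this, here is a minimal usage sketch (the measure name "A Customers" is hypothetical) that counts how many customers in the current filter context end up in class A, relying on context transition so [ABC-Class] is evaluated per customer key. Note that this re-evaluates the classification once per customer, so test it on your own data volumes first.
-- Hypothetical helper measure, assuming the [ABC-Class] measure above exists
A Customers =
COUNTROWS (
    FILTER (
        VALUES ( 'Dim Customer'[Customer_SKey] ),
        [ABC-Class] = "A"
    )
)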
Hi @AMeyersen ,
If your report is freezing when you try to calculate running totals over ranked customers, it’s not your fault; it’s just Power BI being asked to do way too much.
When you use things like WINDOW, SUMX and nested filters, Power BI has to stop and think hard for every single customer. Multiply that by 100,000 customers, and yeah… it’s going to struggle.
First, rank your customers. Start by giving each customer a rank based on their sales, using the DAX below.
Customer Rank = RANKX(ALLSELECTED('Dim Customer'), [Customer Sales],,DESC,DENSE)
This tells Power BI who’s #1, #2, #3, etc. It’s quick, respects filters, and sets the stage for everything else.
Then build the running total. Use that rank to calculate cumulative sales with the DAX below.
Running Total Sales =
VAR CurrentRank = [Customer Rank]
RETURN
    CALCULATE (
        [Customer Sales],
        FILTER ( ALLSELECTED ( 'Dim Customer' ), [Customer Rank] <= CurrentRank )
    )
So if I’m ranked #5, this adds up sales from ranks 1 to 5. Power BI loves this because it doesn’t have to re-do the math for every row.
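If you prefer to drive the ABC split by cumulative share of sales rather than by rank position, a small follow-up measure could convert the running total into a percentage. This is only a sketch; the measure name "Cumulative Sales %" is hypothetical and it assumes the [Running Total Sales] and [Customer Sales] measures above.
-- Hypothetical sketch: cumulative share of sales per ranked customer
Cumulative Sales % =
DIVIDE (
    [Running Total Sales],
    CALCULATE ( [Customer Sales], ALLSELECTED ( 'Dim Customer' ) )
)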
Optional: bucket customers into A, B, C. Want to group customers by value? Try the DAX below.
ABC Class =
SWITCH (
    TRUE (),
    [Customer Rank] <= [Total Customers] * 0.2, "A",
    [Customer Rank] <= [Total Customers] * 0.5, "B",
    "C"
)
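This assumes a [Total Customers] measure exists in the model; if it does not, a minimal sketch (name and table assumed to match the examples above) could be:
-- Hypothetical helper: number of customers in the current selection
Total Customers =
COUNTROWS ( ALLSELECTED ( 'Dim Customer' ) )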
This is a problem in Tabular, as this calculation is very intensive for the Tabular architecture.
10,000 customers is already a lot.
There is no good solution so far; in the dynamic case, performance is awful.
Francesco Bergamaschi
MBA, M.Eng, M.Econ, Professor of BI