Solved: DAX Average of Distinct Count Over two tables

Anonymous · ‎12-17-2020

I have two tables (Sales,Stock) both have a DateKey and ProductCode. The requirement is to count the Distinct ProductCodes across both tables by Financial week driven by a custom Date Dimension. There is a further requirement to take the Average of the Weekly Distinct counts if multiple weeks are selected not the distinct count for all weeks.

The below DAX works in my unit tests, however on real data volumes (300+ million rows imported) it grinds to a complete standstill. If I select a week it takes about 5 sec, for 12 months its 60 seconds. They need to do YoY and WoW calcs so this perfomance will not be good enough. Not sure how else to achieve this.

Here is the DAX I've written:

Option Count = 
    // Get options at a week level for Sales
    VAR SalesOptions =
        SELECTCOLUMNS (
            Sales,
            "Week",
                CALCULATE (
                    MAX ( Dates[Financial Year Week Number] ),
                    FILTER ( ALL ( Dates ), Sales[DimBilledDateSKey] = Dates[DimDateSKey] )
                ),
            "ProductOptionCode", 'Sales'[ProductOptionCode]
        )
    // Get options at a week level for Stock
    VAR StockOptions =
        SELECTCOLUMNS (
            Stock,
            "Week",
                CALCULATE (
                    MAX ( Dates[Financial Year Week Number] ),
                    FILTER ( ALL ( Dates ), Stock[DimDateSKey] = Dates[DimDateSKey] )
                ),
            "ProductOptionCode", 'Stock'[ProductOptionCode]
        )
    // Union the options from both and then apply a distinct so that a count of the product code would be the same result as a distinct count
    VAR CombinedOptions = DISTINCT ( UNION ( SalesOptions, StockOptions ) )
            
    RETURN
        // Average the weekly disitnct option count
        AVERAGEX( 
            // Group the above union by week and do a count of the options for the week. They should now be distinct due to the distinct applied to the union
            GROUPBY(CombinedOptions, [Week], "Week Option Count", COUNTX(CURRENTGROUP(),  [ProductOptionCode] )),
            [Week Option Count]
        )

daxer-almighty · ‎12-17-2020

[Option Count] =

// [FinancialWeekID] should be an int that uniquely
// determines the financial week across the whole
// calendar.
var __visibleWeeks = DISTINCT( Dates[FinancialWeekID] )

// First, you have to make sure that full weeks are
// selected/visible in the current context.
var __fullWeeksVisible =    
    var __numberOfWeeks = COUNTROWS( __visibleWeeks )
    var __numberOfDays = COUNTROWS( Dates )
    var __numberOfAllDays =
        CALCULATE(
            // Of course, the granularity of the calendar
            // is DAY (=DimDateSKey).
            COUNTROWS( Dates ),
            REMOVEFILTERS( Dates ),
            __visibleWeeks
        )
    RETURN
        ( __numberOfAllDays = __numberOfDays )

var __result =
    if( __fullWeeksVisible,
        AVERAGEX(
            // For very visible week, calculate
            // the number of distinct products
            // that can be found for each week
            // in either Sales or Stock.
            __visibleWeeks,
            calculate(
                var __salesProds =
                    DISTINCT( Sales[ProductOptionCode] )
                var __stockProds =
                    DISTINCT( Stock[ProductOptionCode] )
                var __prods =
                    distinct(
                        union(
                            __salesProds,
                            __stockProds
                        )
                    )
                RETURN 
                    // If it so happens that COUNTROWS returns
                    // BLANK for any week and you want to treat
                    // it as 0 and include in the average calculation,
                    // then add 0 to COUNTROWS, like:
                    // COUNTROWS( __prods ) + 0
                    COUNTROWS( __prods )
            )
        )
    )
return
    __result

Your code is very slow because you're calculating your measure row by row going through the fact tables. One should never do this. Row by row calculations will always lead to horrible performance. Always. The code above should be blazingly fast.

View solution in original post

daxer-almighty · ‎12-17-2020

If the code I gave you is not fast, then it means you've got a problem with the model or your computer is weak. I'd also suggest you also make two two tables into one, if at all possible. IF you do this, you'll get rid of the step where you have to union 2 sets of codes and then take DISTINCT. This, I believe, is the slowest operation here.

By the way, is your model in the Direct Query or the Import mode? If it's DQ, then... you have to make sure you optimize at the source. DAX won't give you much speed-up in this case. Since in such a model DAX is translated into the language of the source, you don't have many options. Just optimize at the source by intercepting the query that goes into it (using DAX Studio).

daxer-almighty · ‎12-17-2020

[Option Count] =

// [FinancialWeekID] should be an int that uniquely
// determines the financial week across the whole
// calendar.
var __visibleWeeks = DISTINCT( Dates[FinancialWeekID] )

// First, you have to make sure that full weeks are
// selected/visible in the current context.
var __fullWeeksVisible =    
    var __numberOfWeeks = COUNTROWS( __visibleWeeks )
    var __numberOfDays = COUNTROWS( Dates )
    var __numberOfAllDays =
        CALCULATE(
            // Of course, the granularity of the calendar
            // is DAY (=DimDateSKey).
            COUNTROWS( Dates ),
            REMOVEFILTERS( Dates ),
            __visibleWeeks
        )
    RETURN
        ( __numberOfAllDays = __numberOfDays )

var __result =
    if( __fullWeeksVisible,
        AVERAGEX(
            // For very visible week, calculate
            // the number of distinct products
            // that can be found for each week
            // in either Sales or Stock.
            __visibleWeeks,
            calculate(
                var __salesProds =
                    DISTINCT( Sales[ProductOptionCode] )
                var __stockProds =
                    DISTINCT( Stock[ProductOptionCode] )
                var __prods =
                    distinct(
                        union(
                            __salesProds,
                            __stockProds
                        )
                    )
                RETURN 
                    // If it so happens that COUNTROWS returns
                    // BLANK for any week and you want to treat
                    // it as 0 and include in the average calculation,
                    // then add 0 to COUNTROWS, like:
                    // COUNTROWS( __prods ) + 0
                    COUNTROWS( __prods )
            )
        )
    )
return
    __result

Your code is very slow because you're calculating your measure row by row going through the fact tables. One should never do this. Row by row calculations will always lead to horrible performance. Always. The code above should be blazingly fast.

Anonymous · ‎12-17-2020

Thanks this makes a lot more sense. I knew there must be another way of achieving this. In most cases its still faster than my solution, however when I selecet a year its only marginally faster. I will see if there is a way I can aggregate the data in the model to import a distinct union list, possibly already counted and make use of aggregate tables

AntrikshSharma · ‎12-17-2020

@Anonymous Try to write something like this :

Option Count =
VAR SalesOptions =
    ADDCOLUMNS (
        VALUES ( Sales[ProductOptionCode] ),
        "Week",
            CALCULATE (
                MAX ( Dates[Financial Year Week Number] ),
                ALL ( Dates ),
                CROSSFILTER ( Sales[DimBilledDateSKey], Dates[DimDateSKey], BOTH )
            )
    )
VAR StockOptions =
    ADDCOLUMNS (
        VALUES ( 'Stock'[ProductOptionCode] ),
        "Week",
            CALCULATE (
                MAX ( Dates[Financial Year Week Number] ),
                ALL ( Dates ),
                CROSSFILTER ( Stock[DimDateSKey], Dates[DimDateSKey], BOTH )
            )
    )
VAR CombinedOptions =
    DISTINCT ( UNION ( SalesOptions, StockOptions ) )
VAR Result =
    AVERAGEX (
        GROUPBY (
            CombinedOptions,
            [Week],
            "Week Option Count", COUNTX ( CURRENTGROUP (), [ProductOptionCode] )
        ),
        [Week Option Count]
    )
RETURN
    Result

Anonymous · ‎12-17-2020

Thanks I have tried this and its unfirtunately slower than my solution even only with one week seleced and eventually comes up with this error. I'm not opposed to change the DAX completely, I'm not even sure if my approach is the correct one. Is there any other way to solve this?

AntrikshSharma · ‎12-17-2020

@Anonymous Yup, your approach is definitely not the correct one because if there are 100 Million rows in filter context you will do context transition for those 100 million rows. If you are unable to share the file, then i would suggest that you use a calculated table for the code upto CombinedOptions variable. If you can share the file that would be good. Then we can check the internal queries generated by PBI.

AlB · ‎12-17-2020

Hi @Anonymous

Can you provide a pbix with some sample data we can run a few tests on? Perhaps the one you are using for your unit tests?

Please mark the question solved when done and consider giving a thumbs up if posts are helpful.

Contact me privately for support with any larger-scale BI needs, tutoring, etc.

Cheers

DAX Average of Distinct Count Over two tables

Helpful resources

Power BI Monthly Update - September 2025

FabCon Atlanta 2026

FabCon is coming to Atlanta

DAX Average of Distinct Count Over two tables

Helpful resources

Power BI Monthly Update - September 2025

FabCon Atlanta 2026