DISTINCTCOUNT Optimization

robarivas · ‎08-14-2022

I believe the DistinctOrders DAX measure below gives accurate results and is faster than other versions I've tried but I need to know if I can make it faster. With the model I'm working with (see below) this takes around 12 seconds in DAX Studio server timings when slicing by products in ProdDim (which is unfortunately snowflaked to the fact table through a very large dim table). I'm hoping for something less than 3 seconds. Maybe 12 seconds is the best possible result given my data size and model structure but just want to make sure. Please assume I'm stuck with the model size and structure.

Model

ProdDim (225,000 rows)--->TransactionDim (5 million rows 😧)--->SalesFact (110 million rows)

Calendar[Date]--->SalesFact[DeliveryDate] (active relationship) or SalesFact[OrderDate] (inactive relationship)

Measures

OrderTransactionQuantity by Order Date =
CALCULATE (
SUM ( 'SalesFact'[OrderTransactionQuantity] ),
USERELATIONSHIP ( 'SalesFact'[Order Date], 'Calendar'[Date] )
)

DistinctOrders =
CALCULATE (
DISTINCTCOUNT ( 'SalesFact'[ClientID] ),
KEEPFILTERS ( FILTER ( ALL ( 'SalesFact'[ClientID] ), [OrderTransactionQuantity by Order Date] <> 0 ) ),
USERELATIONSHIP ( 'SalesFact'[Order Date], 'Calendar'[Date])
)

Example SalesFact Data:

ClientID	TranDimKey	Order Date	Delivery Date	OrderTransactionQuantity
1	HTX180	2022-01-05	2022-01-07	1
1	HTX180	2022-01-05	2022-01-08	-1
1	BTX260	2022-01-09	2022-01-11	1
2	BTX270	2022-01-12	2022-01-14	1
3	BTX390	2022-01-15	2022-01-16	1
3	BTX380	2022-01-16	2022-01-17	1
4	HTX170	2022-01-16	2022-01-17	1
4	HTX170	2022-01-16	2022-01-19	-1

daXtreme · ‎08-14-2022

Hi @robarivas

You are calling a measure in an iterator and on top of that you're iterating the whole column regardless of what's already been selected/crossfiltered in it.

KEEPFILTERS ( FILTER ( ALL ( 'SalesFact'[ClientID] ), [OrderTransactionQuantity by Order Date] <> 0 ) )

This line makes your measure slow.

By the way, why is there KEEPFILTERS in there? What's the reason? Are you by any chance slicing in the UI by the field SalesFact[ClientID]? That would be pretty bad as well but KEEPFILTERS in this place does nothing to make your measure faster.

robarivas · ‎08-15-2022

Thank you @daXtreme Is there an alternative formulation you think might fix this? The pattern I used is one I got from page 703 of the Definitive Guide to DAX (2nd edition). Instances where a ClientID's quantity nets to zero on an Order Date need to not be considered in the DIstinct Count. If the data was more aggregated (so that I wouldn't have to sum the quantity) then that might help avoid referencing a measure maybe.

daXtreme · ‎08-15-2022

Try this:

OrderTransactionQuantity by Order Date =
CALCULATE(
    SUM( 'SalesFact'[OrderTransactionQuantity] ),
    USERELATIONSHIP ( 'SalesFact'[Order Date], 'Calendar'[Date] )
)

DistinctOrders =
COUNTROWS(
    FILTER(
        DISTINCT( 'SalesFact'[ClientID] ),
        // Edit: Had to change > to <> as I can see from your data
        // you can have negative quantities as well.
        [OrderTransactionQuantity by Order Date] <> 0
    )
)

robarivas · ‎08-16-2022

Thanks again @daXtreme I did a quick check and it looks like your formula also may be producing accurate results. However, initial testing suggests it may not be any faster. So I will do some more robust testing but I'm starting to think any/all remaining speed issues may be solely attributable to model structure/size/ design rather than unoptimized DAX.

daXtreme · ‎08-16-2022

Hiya

The problem you've got is that you filter rows based on a value of a measure. So, FILTER has to call it for each and every ClientId in the current context. If there are only a few clients, that's OK. But if there are thousands of them... This is where things start to get a bit hairy.

You could still try to code it in a different way. For instance:

[Distinct Orders] =
COUNTROWS(
    FILTER(
        ADDCOLUMNS(
            DISTINCT( SalesFact[ClientID] ),
            // I can see that quantities can be negative, hence
            // the check must be <> instead of >. The same applies
            // to my previous measure.
            "@QtyIsNonZero", [OrderTransactionQuantity by Order Date] <> 0
        ),
        [@QtyIsNonZero] // this is of type bool, so no comparison is needed
    )
)

DISTINCTCOUNT Optimization

Helpful resources

Join us at the Microsoft Fabric Community Conference

Power BI Monthly Update - February 2025

Fabric Community Update - February 2025

Join us at the 2025 Microsoft Fabric Community Conference

DISTINCTCOUNT Optimization

Helpful resources

Join us at the Microsoft Fabric Community Conference

Power BI Monthly Update - February 2025

Fabric Community Update - February 2025