Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

October 28 & 29: Experts share their secrets on how to pass the Fabric Analytics Engineer certification exam—live. Learn more

Reply
robarivas
Post Patron
Post Patron

DISTINCTCOUNT Optimization

I believe the DistinctOrders DAX measure below gives accurate results and is faster than other versions I've tried but I need to know if I can make it faster. With the model I'm working with (see below) this takes around 12 seconds in DAX Studio server timings when slicing by products in ProdDim (which is unfortunately snowflaked to the fact table through a very large dim table). I'm hoping for something less than 3 seconds. Maybe 12 seconds is the best possible result given my data size and model structure but just want to make sure. Please assume I'm stuck with the model size and structure.

 

 

Model

 

ProdDim (225,000 rows)--->TransactionDim (5 million rows 😧)--->SalesFact (110 million rows)

 

Calendar[Date]--->SalesFact[DeliveryDate] (active relationship) or SalesFact[OrderDate] (inactive relationship)

 

 

Measures

 

OrderTransactionQuantity by Order Date =
CALCULATE (
SUM ( 'SalesFact'[OrderTransactionQuantity] ),
USERELATIONSHIP ( 'SalesFact'[Order Date], 'Calendar'[Date] )
)


DistinctOrders =
CALCULATE (
DISTINCTCOUNT ( 'SalesFact'[ClientID] ),
KEEPFILTERS ( FILTER ( ALL ( 'SalesFact'[ClientID] ), [OrderTransactionQuantity by Order Date] <> 0 ) ),
USERELATIONSHIP ( 'SalesFact'[Order Date], 'Calendar'[Date])
)

 

 

Example SalesFact Data:

 

ClientID TranDimKey Order Date Delivery Date OrderTransactionQuantity 
HTX180 2022-01-05   2022-01-07 1
HTX180 2022-01-05   2022-01-08  -1
BTX260 2022-01-09   2022-01-11 1
BTX270 2022-01-12   2022-01-14 1
BTX390 2022-01-15   2022-01-16 1
BTX380 2022-01-16   2022-01-17 1
HTX170 2022-01-16   2022-01-17 1
HTX170 2022-01-16   2022-01-19 -1

 

5 REPLIES 5
daXtreme
Solution Sage
Solution Sage

Hi @robarivas 

 

You are calling a measure in an iterator and on top of that you're iterating the whole column regardless of what's already been selected/crossfiltered in it.

 

KEEPFILTERS ( FILTER ( ALL ( 'SalesFact'[ClientID] ), [OrderTransactionQuantity by Order Date] <> 0 ) )

 

This line makes your measure slow.

 

By the way, why is there KEEPFILTERS in there? What's the reason? Are you by any chance slicing in the UI by the field SalesFact[ClientID]? That would be pretty bad as well but KEEPFILTERS in this place does nothing to make your measure faster.

Thank you @daXtreme Is there an alternative formulation you think might fix this? The pattern I used is one I got from page 703 of the Definitive Guide to DAX (2nd edition). Instances where a ClientID's quantity nets to zero on an Order Date need to not be considered in the DIstinct Count. If the data was more aggregated (so that I wouldn't have to sum the quantity) then that might help avoid referencing a measure maybe. 

Try this:

 

OrderTransactionQuantity by Order Date =
CALCULATE(
    SUM( 'SalesFact'[OrderTransactionQuantity] ),
    USERELATIONSHIP ( 'SalesFact'[Order Date], 'Calendar'[Date] )
)

DistinctOrders =
COUNTROWS(
    FILTER(
        DISTINCT( 'SalesFact'[ClientID] ),
        // Edit: Had to change > to <> as I can see from your data
        // you can have negative quantities as well.
        [OrderTransactionQuantity by Order Date] <> 0
    )
)

 

Thanks again @daXtreme I did a quick check and it looks like your formula also may be producing accurate results. However, initial testing suggests it may not be any faster. So I will do some more robust testing but I'm starting to think any/all remaining speed issues may be solely attributable to model structure/size/ design rather than unoptimized DAX.

Hiya

 

The problem you've got is that you filter rows based on a value of a measure. So, FILTER has to call it for each and every ClientId in the current context. If there are only a few clients, that's OK. But if there are thousands of them... This is where things start to get a bit hairy.

 

You could still try to code it in a different way. For instance:

[Distinct Orders] =
COUNTROWS(
    FILTER(
        ADDCOLUMNS(
            DISTINCT( SalesFact[ClientID] ),
            // I can see that quantities can be negative, hence
            // the check must be <> instead of >. The same applies
            // to my previous measure.
            "@QtyIsNonZero", [OrderTransactionQuantity by Order Date] <> 0
        ),
        [@QtyIsNonZero] // this is of type bool, so no comparison is needed
    )
)

 

 

Helpful resources

Announcements
OCT PBI Update Carousel

Power BI Monthly Update - October 2024

Check out the October 2024 Power BI update to learn about new features.

October NL Carousel

Fabric Community Update - October 2024

Find out what's new and trending in the Fabric Community.