Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Hazenm
Advocate II
Advocate II

Non Distinct Count Calculation Efficiency

Have a challenge for folk out there. Maybe there's a much simpler way to do this and I'm not realizing it, but here is the scenario. 

I have a table of sales that is right around 1 MM rows. It is a sales order table with 13 columns. 
The two important columns for this calc are: Customer & OrderNbr

I am calculating 12 Month Repurchase Rates. 
The formula for this is simply = Total Repeat customers / Total Customers = Repurchase Rate. If I have 100 customers and 10 buy more than once, I have a 10% customer repurchase rate. 

Now, I have a chart, and on the X axis is fiscal period. In each fiscal period, I am calculating the 12 month repurchase rate. 
I am doing so by using virtual tables. 
My calculations are: 

VAR repeatTable = 

SUMMARIZE(
   CALCULATETABLE(
         'All Order Table',
         DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod)
      ),
   'All Order Table'[Customer],
   "OrderCount",
   COUNT('All Order Table'[OrderNbr])
)
The table is structured so that each ROW is an order. That's why I can just COUNT(ordernbr) here and it shows the correct number of orders.
So I first group the table by customer, filtering it to the 12 month date range, and counting the order numbers. 
Then count the repeat cust: 

VAR repeatCust = 
COUNTROWS(
   FILTER(
      custTable,
      [OrderCount] > 1
   )
)

Then I calc all customers and divide. 
   
VAR allCust =
CALCULATE(
   DISTINCTCOUNT('All Order Table'[Customer]),
   DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod)
)
VAR MMT = repeatCust / allCust

It definitely does the job. BUT with 1MM + rows, it is slow. This calculation takes 16 - 20 seconds to run, because remember, on my chart, I have Period on the X axis, so when I load the visual, I'm calculating this for many (20+) periods at a time. 

The "allcust" part of the formula is super fast. Like ~1 second. 
The repeat table takes about 6-8 seconds
When I "FILTER" the repeat table and COUNTROWS there, it goes from the 6-8 seconds to the 14-16 second range. 

So the challenge is, how would you more efficiently calculate this variable? 
I really just need a "COUNTNONDISTINCT" function!

If a customer only purchased once or twice, you could just do COUNT - DISTINCTCOUNT and get your answer. And I feel like maybe there is something there...



1 ACCEPTED SOLUTION

@Hazenm 

Thanks for testing those out!

 

Good to get an idea of relative performance, and glad some of them are performing better, but it feels as though performance should be much better!

 

Period Table filtering

Looking again at the code you posted earlier to determine the date range to filter, have I understood correctly that 'Period Table' your sole date table?

I think you could improve performance by rewriting the logic to filter 'Period Table' like this:

 

VAR curPeriodIndex =
    -- I would generally prefer MAX rather than MIN
    -- Doesn't affect performance but makes more sense if filtering
    -- on multiple Periods.
    MAX ( 'Period Table'[Period Index] )
VAR startPeriod =
    curPeriodIndex - 11
RETURN
    CALCULATE (
        < Some Measure >,
        ALL ( 'Period Table' ),
        'Period Table'[Period Index] >= startPeriod,
        'Period Table'[Period Index] <=curPeriod
    )

 

 

This version removes filters on 'Period Table' then applies filters to the Period Index column (rather than Date column). The original version using FILTER ( ALL ( 'Period Table' ), ... ) is an iteration over the entire 'Period Table' which can be expensive.

Would you be able to post a model diagram, or a PBIX with an empty 'All Order Table', and I can generate a fact table at my end?

 

Repeat Customers calculation itself

Going back to the different DAX options for Repeat Customers, some ideas occurred to me, that I probably should have thought of earlier!

 

Version 5

Uses GENERATE to remove Customers whose first & last order are the same.

If FirstOrder = LastOrder, then EXCEPT ( FirstOrder, LastOrder ) is empty, and that Customer's row won't appear in result.

 

Repeat Customers Version 5 =
VAR RepeatCustomers =
    GENERATE (
        VALUES ( 'All Order Table'[Customer] ),
        VAR FirstOrder =
            FIRSTNONBLANK ( 'All Order Table'[OrderNbr], 0 )
        VAR LastOrder =
            LASTNONBLANK ( 'All Order Table'[OrderNbr], 0 )
        RETURN
            EXCEPT ( FirstOrder, LastOrder )
    )
VAR NumRepeatCustomers =
    COUNTROWS ( RepeatCustomers )
RETURN
    NumRepeatCustomers

 

 

Version 6

Use HASONEVALUE to see if there is not exactly one OrderNbr for a given Customer. This might be optimised to stop counting when it knows there are 2+ values.

 

Repeat Customers Version 6 =
VAR RepeatCustomers =
    FILTER (
        VALUES ( 'All Order Table'[Customer] ),
        NOT CALCULATE ( HASONEVALUE ( 'All Order Table'[OrderNbr] ) )
    )
VAR NumRepeatCustomers =
    COUNTROWS ( RepeatCustomers )
RETURN
    NumRepeatCustomers

 

 

Version 7

Same as Version 6 but use COUNTROWS (this is really the same logic as Version 3):

 

Repeat Customers Version 7 =
VAR RepeatCustomers =
    FILTER (
        VALUES ( 'All Order Table'[Customer] ),
        NOT CALCULATE ( COUNTROWS ( 'All Order Table' ) ) = 1
    )
VAR NumRepeatCustomers =
    COUNTROWS ( RepeatCustomers )
RETURN
    NumRepeatCustomers

 

 

I'm hoping some of this gets us closer to acceptable performance!

 

Regards,

Owen


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

View solution in original post

7 REPLIES 7
Hazenm
Advocate II
Advocate II

We took this offline, and @OwenAuger was able to get my query down to 6 seconds! 

He created a new measure similar to some of the ones above, but first created a new Customer Table, so that the main calculate argument worked off of that table instead of the order table. This, I think, had the largest impact on performance. 
His key notes on changes; 

    • Some key points that appear to help performance:
  1.       Iterate over the Customer dimension to count customers with one order.
  2.       It’s more efficient to count customers with one order than it is to count customers with 2+ orders.
  •       Filtering on Period Index gave better performance than filtering on Date.
    Also creating the list of periods with GENERATESERIES performed better than using <= and >= conditions.
  1.       Applying the Period Table filter once to the overall calculation, not repeatedly in different parts of the measure.

 

Here was the 'winning' formula: 

12 Month Repurchase Rate Optimized =
VAR EndPeriodIndex =
    MAX ( 'Period Table'[Period Index] )
VAR StartPeriodIndex = EndPeriodIndex – 11
-- Create explicit list of 12 Period Index values
VAR PeriodFilter =
    TREATAS (
        GENERATESERIES ( StartPeriodIndex, EndPeriodIndex ),
        'Period Table'[Period Index]
    )
RETURN
    CALCULATE (
        -- Count Customers with exactly one order
        VAR NumSingleOrderCustomers =
            COUNTROWS (
                FILTER (
                    VALUES ( 'Customer'[Customer] ),
                    // Another option that may perform better if customers are "sparse"
                    -- SUMMARIZE ( 'All Order Table', Customer[Customer] ),
                    CALCULATE (
                        COUNTROWS ( 'All Order Table' )
                    ) = 1
                    // Another option that didn't perform as well
                    -- CALCULATE ( HASONEVALUE ( 'All Order Table'[OrderNbr] ) )
                )
            )
        VAR AllCustomers =
            DISTINCTCOUNT ( 'All Order Table'[Customer] )
        -- Subtract to get NumRepeatCustomers
        VAR NumRepeatCustomers = AllCustomers - NumSingleOrderCustomers
        VAR MMT =
            DIVIDE ( NumRepeatCustomers, AllCustomers )
        RETURN
            MMT,
        -- Apply PeriodFilter to above calculation
        REMOVEFILTERS ( 'Period Table' ),
        PeriodFilter
    )


HUGE kudos to @OwenAuger ! Thanks a lot!

Hazenm
Advocate II
Advocate II

Nice, Owen!! Great variations here!

My notes: 

First of all, notes about the period table: 

    CALCULATE (
        < Some Measure >,
        ALL ( 'Period Table' ),
        'Period Table'[Period Index] >= startPeriod,
        'Period Table'[Period Index] <=curPeriod
    )

Typically, instead of the above, which I have used in the past, I have been using

    CALCULATE (
        < Some Measure >,
        DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod)
    )

 This seems to be slightly faster. I also have been using it when I might have some other filter active that I don't want to strip away, and only want to strip away the date (so avoids using ALL). I tested the two variations above on your version #5, and performance was slightly improved (6.3 seconds vs 7.2 seconds) - that was giving the wrong data, but the efficiency test was still accurate. I'll explain more below

I also want to be clear that I'm also using this for the "all customer" calculation here: 

VAR allCust = 
CALCULATE(
    DISTINCTCOUNT('All Order Table'[Customer]),
    DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod)
)

When running this on its own, it takes only about 1.5 seconds. So I know there isn't a problem with this part of the code. 


So speaking of the versions, Version 5 was definitely the best yet at 6.3 seconds!
Just to be clear, here is what the code looks like with the modification

VAR RepeatCustomers =
    GENERATE (
        CALCULATETABLE(
            VALUES ( 'All Order Table'[Customer] ),
            DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod)
        ),
        VAR FirstOrder =
            FIRSTNONBLANK ( 'All Order Table'[OrderNbr], 0 )
        VAR LastOrder =
            LASTNONBLANK ( 'All Order Table'[OrderNbr], 0 )
        RETURN
            EXCEPT ( FirstOrder, LastOrder )
    )
VAR NumRepeat =
    COUNTROWS ( RepeatCustomers )

This took only 6.2 seconds. But then I realized the data was incorrect. Again, because the FIRSTNONBLANK was removing the filter context of DATESBETWEEN. So I had to add another calculate there. This threw us back down to 26 seconds calc time, and it looked like this: 

VAR RepeatCustomers =
    GENERATE (
        CALCULATETABLE(
            VALUES ( 'All Order Table'[Customer] ),
            DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod)
        ),
        VAR FirstOrder =
        CALCULATETABLE(
            FIRSTNONBLANK ( 'All Order Table'[OrderNbr], 0 ),
            DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod)
        )
        VAR LastOrder =
        CALCULATETABLE(
            LASTNONBLANK ( 'All Order Table'[OrderNbr], 0 ),
            DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod)
        )
        RETURN
            EXCEPT ( FirstOrder, LastOrder )
    )
VAR NumRepeat =
    COUNTROWS ( RepeatCustomers )



This version blew my mind a little bit, Owen! I didn't realize you could write VAR INSIDE a calculated virtual table?!?! It really threw me off. Can you help me understand a little bit how this works? 
I think the "GENERATE" calc throws me off a little bit, because I haven't used it much before. Only generate series. 
I see that you first create a one column unique table of customers.  Then my hunch here is that the FirstOrder "VAR" table gets applied at the row level to the unique customers? Or is it creating another table next to the first and applying a filter from the first of the customer? This is throwing me a bit. And then the EXCEPT is working on FirstOrder and LastOrder table, but then it applies the filter back to the customer table? 

Version 6 and Version 7 - these I understand how you're calculating, but again, really smart usage here. Version 6 took about 19 seconds, but 6 was only at 14 seconds. I would not have thought of using calculate (didn't think it was possible) as a filter argument! Genius! Unfortunately, again, I had to add the DATESBETWEEN context to get this to work. And I had to add it to the VALUES table and to the CALCULATE formula. So to be clear, here is the final version:

VAR RepeatCustomers =
    FILTER (
        CALCULATETABLE(
            VALUES ( 'All Order Table'[Customer] ),
            DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod)
        ),
        NOT CALCULATE ( COUNTROWS ( 'All Order Table' ),DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod) ) = 1
    )
VAR NumRepeat =
    COUNTROWS ( RepeatCustomers )



As for building out a PBIX, happy to do so, but I can't post the files on the forum - it won't let me.. Thought you could. Anyway, I can send you a PM and could email it to you, or if there is another way to upload here, happy to do that as well. 




Hazenm
Advocate II
Advocate II

Hey Owen! Thanks for replying so quickly!
Here are my comments: 
The use of period table = we work on a 4-4-5 Fiscal Period Cal that starts in July. The period table is a reference/calendar table that stores all of the company fiscal weeks, periods, and years.  Yes, one row per day. 

"Ideally, you should have a Date table containing one row per date, with a relationship with 'All Orders Table'. If set up correctly, this will remove the need to construct a date filter using DATESBWEEN, and hopefully improve performance. (There could be something I'm missing here though.)" 
-Yes, this is how this is built, and the relationship is setup. However, because the chart is broken out into periods, each period, because of the relationship, is filtered to that period. 
For example, the X-Axis point at Period 9 of FY22, "202209" is filtering period table (and thus order date table) to only dates in 202209. This means I have to use something like DATESBETWEEN to remove that filter contex, and instead apply the new filter criteria which is defined by the variables you mentioned: 

VAR curPeriodIndex = MIN('Period Table'[Period Index])
(Period index is calculated column on the period table labeling each period incrementally from the first. This saves the virtual tables from having to create some kind of index of periods to subtract the desired number from on the fly. Some time these moving totals are not a full year, but a number of periods. So good for me to have this flexibility pre-built)
VAR startPeriod = curPeriodIndex - 11
VAR endOfPeriod = MAX('Period Table'[Day])
VAR startOfPeriod =
MINX(
   FILTER(
      ALL('Period Table'),
         'Period Table'[Period Index] = startPeriod
      ),
      'Period Table'[Day]
   )

)

So it's finding first day in the period 11 months prior, and last day of current period to get the 12 month range. 
Again, the "DATESBETWEEN" is removing our 'row context' of 202209 (dates only occurrring in FY 2022 Period 9), and applying the new context of all dates in last 12 months. 

 

Okay, so those notes aside, I attempted the variation you supplied, and I'm still getting a 15 second load time. I do appreciate what you've done here though! Counting rows vs counting records, always better. I didn't understand that applying "CALCULATE" to the start of the new summarized column would apply the row context back to the COUNTROWS function. Brilliant! Also appreciate the minor tweak of using DIVIDE instead of "/" - I always forget to do that.

This did give me an idea to attempt removing filter context from PERIOD TABLE completely, then applying filter directly to the Order Table. But that didn't really improve efficiency at all. 

 

 

Hi again @Hazenm 

Thanks for testing it out, and providing further explanation of how your data model is set up.

 

It looks like my suggestions haven't addressed the core performance issue.

 

I think if we can focus on the calculation of "number of repeat customers" in isolation, that might help get to the bottom of the issue. I think the Period/Date filtering is probably not hugely impacting performance.

 

I've come up with some "Repeat Customers" measures to test below, as this seems to be the expensive part of the calculation.

 

Could you create the below measures and see how they perform, either in a test visual grouped by Year/Month or some other dimension(s)?

I was using DAX Studio myself to test these in a sample model, so you could also do something similar.

 

Interested to hear how performance is with these, and whether any of them is an improvement!

 

Regards,

Owen

 

Repeat Customers Version 1

Repeat Customers Version 1 =
// SUMMARIZE by Customer/OrderNbr
VAR CustomerOrder =
    SUMMARIZE (
        'All Order Table',
        'All Order Table'[Customer],
        'All Order Table'[OrderNbr]
    )
// GROUPBY to count orders per customer
VAR CustomerOrderCount =
    GROUPBY (
        CustomerOrder,
        'All Order Table'[Customer],
        "@Orders", SUMX ( CURRENTGROUP (), 1 )
    )
VAR RepeatCustomers =
    FILTER ( CustomerOrderCount, [@Orders] > 1 )
VAR NumRepeatCustomers =
    COUNTROWS ( RepeatCustomers )
RETURN
    NumRepeatCustomers

 

Repeat Customers Version 2

Repeat Customers Version 2 =
// ADDCOLUMNS / SUMMARIZE / DISTINCTCOUNT to get orders per customer
VAR CustomerOrderCount =
    ADDCOLUMNS (
        SUMMARIZE ( 'All Order Table', 'All Order Table'[Customer] ),
        "@Orders", CALCULATE ( DISTINCTCOUNT ( 'All Order Table'[OrderNbr] ) )
    )
VAR RepeatCustomers =
    FILTER ( CustomerOrderCount, [@Orders] > 1 )
VAR NumRepeatCustomers =
    COUNTROWS ( RepeatCustomers )
RETURN
    NumRepeatCustomers

 

Repeat Customers Version 3

 

Repeat Customers Version 3 =
// ADDCOLUMNS / SUMMARIZE / COUNTROWS to get orders per customer
VAR CustomerOrderCount =
    ADDCOLUMNS (
        SUMMARIZE ( 'All Order Table', 'All Order Table'[Customer] ),
        "@Orders", CALCULATE ( COUNTROWS ( 'All Order Table' ) )
    )
VAR RepeatCustomers =
    FILTER ( CustomerOrderCount, [@Orders] > 1 )
VAR NumRepeatCustomers =
    COUNTROWS ( RepeatCustomers )
RETURN
    NumRepeatCustomers

 

 

Repeat Customers Version 4

(a bit of an experiment - don't expect to perform too well)

Repeat Customers Version 4 =
// SUMMARIZE by Customer/OrderNbr
VAR CustomerOrder =
    SUMMARIZE (
        'All Order Table',
        'All Order Table'[Customer],
        'All Order Table'[OrderNbr]
    )
// Find First Order for each Customer
VAR CustomerFirstOrder =
    GENERATE (
        VALUES ( 'All Order Table'[Customer] ),
        FIRSTNONBLANK ( 'All Order Table'[OrderNbr], 0 )
    )
// Take set difference between CustomerOrder & CustomerFirstOrder
VAR CustomerExceptFirstOrder =
    EXCEPT ( CustomerOrder, CustomerFirstOrder )
VAR RepeatCustomers =
    SUMMARIZE ( CustomerExceptFirstOrder, 'All Order Table'[Customer] )
VAR NumRepeatCustomers =
    COUNTROWS ( RepeatCustomers )
RETURN
    NumRepeatCustomers

 


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

Owen!
Thanks so much for putting these together! This was really nice!
Here are the results: 
First off, I had to add the DATESBETWEEN function to each of these variations to allow them to calc the correct number. 

Version One: 
This was the best one. After applying the datesbetween function, it took 9 seconds originally. I made some minor tweaks to try to reduce the load, but I couldn't get it much below 9. For example, the original summarize is not required, because the order table is already one row per order. So I changed the summarize to just: 

CALCULATETABLE(
   'All Order Table',
   DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod)
)

I thought this would reduce performance further, but no change. I think there might be a way here to get it down, but already, this is much improved from last results. 

VERSION TWO: 
After applying datesbetween, this one took just under 19 seconds

 

VERSION THREE: 

Slightly better performance on this variation, at 13 seconds.


VERSION FOUR: 
I love the creativity in this process. This is what I was trying to come up with when I was originally trying to develop some way to create a NON-DISTINCT formula. 
Unfortunately, as you predicted, this took longer, at 23 seconds. 
I even attempted to play with it a little bit to make it faster. The GENERATE table part of the formula was taking quite abit, so I changed it to this: 

CALCULATETABLE(
   ADDCOLUMNS(
      VALUES('All Order Table'[Customer]),
      "OrderNbr",
      CALCULATE(FIRSTNONBLANK('All Order Table'[OrderNbr],0))
   ),
   DATESBETWEEN('Period Table'[Day],startOfPeriod,endOfPeriod)
)
Interestingly, it was slightly faster on its own, but then applied back to the EXCEPT and SUMMARIZE and COUNTROWS, and it took much longer than the initial calculation. 

My gut tells me this direction is good, and there is some way to massively simplify this idea, but I can't think of what it is. Distinct values of first ordernbr and and EXCEPT and another distinct count of customer.  But maybe this direction is just always going to take more steps. Is there some other function that gets at this via a shorter path that we're not considering? 

So it's either some major tweakage with the last idea, or some further improvements, potentially on the first variation. 
At this point, with the 9 second calculation, I could probably release this and it'll be fine, but I wanted to add a few more variables on the chart that would have added more load time. 
But I also want to see if this performance issue can be cracked!

@Hazenm 

Thanks for testing those out!

 

Good to get an idea of relative performance, and glad some of them are performing better, but it feels as though performance should be much better!

 

Period Table filtering

Looking again at the code you posted earlier to determine the date range to filter, have I understood correctly that 'Period Table' your sole date table?

I think you could improve performance by rewriting the logic to filter 'Period Table' like this:

 

VAR curPeriodIndex =
    -- I would generally prefer MAX rather than MIN
    -- Doesn't affect performance but makes more sense if filtering
    -- on multiple Periods.
    MAX ( 'Period Table'[Period Index] )
VAR startPeriod =
    curPeriodIndex - 11
RETURN
    CALCULATE (
        < Some Measure >,
        ALL ( 'Period Table' ),
        'Period Table'[Period Index] >= startPeriod,
        'Period Table'[Period Index] <=curPeriod
    )

 

 

This version removes filters on 'Period Table' then applies filters to the Period Index column (rather than Date column). The original version using FILTER ( ALL ( 'Period Table' ), ... ) is an iteration over the entire 'Period Table' which can be expensive.

Would you be able to post a model diagram, or a PBIX with an empty 'All Order Table', and I can generate a fact table at my end?

 

Repeat Customers calculation itself

Going back to the different DAX options for Repeat Customers, some ideas occurred to me, that I probably should have thought of earlier!

 

Version 5

Uses GENERATE to remove Customers whose first & last order are the same.

If FirstOrder = LastOrder, then EXCEPT ( FirstOrder, LastOrder ) is empty, and that Customer's row won't appear in result.

 

Repeat Customers Version 5 =
VAR RepeatCustomers =
    GENERATE (
        VALUES ( 'All Order Table'[Customer] ),
        VAR FirstOrder =
            FIRSTNONBLANK ( 'All Order Table'[OrderNbr], 0 )
        VAR LastOrder =
            LASTNONBLANK ( 'All Order Table'[OrderNbr], 0 )
        RETURN
            EXCEPT ( FirstOrder, LastOrder )
    )
VAR NumRepeatCustomers =
    COUNTROWS ( RepeatCustomers )
RETURN
    NumRepeatCustomers

 

 

Version 6

Use HASONEVALUE to see if there is not exactly one OrderNbr for a given Customer. This might be optimised to stop counting when it knows there are 2+ values.

 

Repeat Customers Version 6 =
VAR RepeatCustomers =
    FILTER (
        VALUES ( 'All Order Table'[Customer] ),
        NOT CALCULATE ( HASONEVALUE ( 'All Order Table'[OrderNbr] ) )
    )
VAR NumRepeatCustomers =
    COUNTROWS ( RepeatCustomers )
RETURN
    NumRepeatCustomers

 

 

Version 7

Same as Version 6 but use COUNTROWS (this is really the same logic as Version 3):

 

Repeat Customers Version 7 =
VAR RepeatCustomers =
    FILTER (
        VALUES ( 'All Order Table'[Customer] ),
        NOT CALCULATE ( COUNTROWS ( 'All Order Table' ) ) = 1
    )
VAR NumRepeatCustomers =
    COUNTROWS ( RepeatCustomers )
RETURN
    NumRepeatCustomers

 

 

I'm hoping some of this gets us closer to acceptable performance!

 

Regards,

Owen


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn
OwenAuger
Super User
Super User

Hi @Hazenm 

The logic of the calcuation looks good, but there should be room to improve performance here.

 

  • First of all: the use of 'Period Table' seems unusual - is it a Date table with one row per date, and does it have a relationship with 'All Orders Table'?
    In your code above, it is not clear where startOfPeriod and endOfPeriod are defined - where do they come from? I'm assuming they are the bounds of the 12 month period.
    Ideally, you should have a Date table containing one row per date, with a relationship with 'All Orders Table'. If set up correctly, this will remove the need to construct a date filter using DATESBWEEN, and hopefully improve performance. (There could be something I'm missing here though.)
  • Leaving the 'Period Table' as-is for now, and assuming startOfPeriod & endOfPeriod are defined somehow, we can improve performance by minimizing the number operations on 'All Order Table', and using since we want a row count, COUNTROWS is preferable to COUNT of a particular column. We still need to user FILTER to identify repeat customers - I can't really see any alternative to that.

 

 

Repurchase Rate =
VAR CustomerNumOrders =
    CALCULATETABLE (
        ADDCOLUMNS (
            SUMMARIZE ( 'All Order Table', 'All Order Table'[Customer] ),
            "@NumOrders", CALCULATE ( COUNTROWS ( 'All Order Table' ) )
        ),
        DATESBETWEEN ( 'Period Table'[Day], startOfPeriod, endOfPeriod )
    )
VAR Customers =
    COUNTROWS ( CustomerNumOrders )
VAR RepeatCustomers =
    COUNTROWS ( FILTER ( CustomerNumOrders, [@NumOrders] > 1 ) )
VAR MMT =
    DIVIDE ( RepeatCustomers, Customers )
RETURN
    MMT

 

 

 Does something like this work, and how is performance?

 

Regards,

Owen


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors