Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
SimonVallejoV
Frequent Visitor

DAX Measure Optimization and Correction - Avg. SKUs per Invoice - Filter/RelatedTable/DistinctCount

Hello everyone! First of all thank you very much for taking the time to read this question, is is about both a DAX measure optimization and correction.

 

 

1. I have a data model that consist of two tables, dimCustomer and factSales. In reality dimCustomer has around 800K rows and factSales around 60M rows. It looks something like this
sample_model_1.png

 if we carefully look at the dimCustomer table, we can see that for each customer_code, I might have several customer_ids, this is because a customer_id is based on the different combinations of customer_code and customer_name, but in my company we focus on customer_code.

sample_model_2.png

 For the factSales table, it is transactional information of invoices and returns. Some invoices might have multiple lines for the same SKU (Purple Cells), this is because it was a gift or a different price for a certain amount of items, it just happens. (I can't group my fact table to have only one row per sku inside each invoice)

The KPI I'm trying get is the Average SKUs per Invoice. We have two business rules in my company: invoice_type = "INVOICE" and overall customer_code sum of sales >0, this means that if one customer_code sum of sales is 0 or lower because of returns in a certain period of time, I dont want to take him into account, nor count the SKUs he returned.

The calculation would be SUM of DISTINCT SKUs inside each invoice, divided by distinct count of invoices, (not leaving aside the two business rules)

The distinct invoices I have it, my problem is with the 'Sum of Distinct SKUs inside each invoice' part.

(I'm attaching a sample workbook with the data in the pictures and the loaded measures.)

I already have a measure that works, but with the whole dataset it is not performing well, it even shows an error of not enought resources to compute measure kind of type.

This is the working measure (Slow with whole dataset):
my idea is,
1. vTable0 = filter the fact table with the 4 fields I need, only invoice_type = "INVOICE" (Part that I guess is the most resource consuming)

2.  vTable_1 = get a list of customer_code where sum of sales >0
3. vTable_2 = filter  vTable0 based on the customer_code present in vTable_1

4. count rows of resulting vTable_2

 

 

Measure_Working_But_Slow = 

VAR vTable_0 =
    FILTER(
        SUMMARIZE(
            factSales,
            dimCustomer[customer_code],
            factSales[invoice_number],
            factSales[invoice_type],
            factSales[SKU]),
        factSales[invoice_type]="INVOICE"
    )

VAR vTable_1 =
    FILTER(
        SUMMARIZE(
            dimCustomer,
            dimCustomer[customer_code]
        ),
    CALCULATE(SUM(factSales[row_value]))>0
    )

VAR vTable_2 =
    FILTER(
        vTable_0,
        dimCustomer[customer_code] IN vTable_1
    )

RETURN
    COUNTROWS(vTable_2)

 

 

 

This is the Not working measure (very close to the result and fast)

I got this idea from SQLBI relatedtable youtube video which performs like a charm. this is the idea

 

1. Filter customer_code based on sum of sales > 0

2. SUMX of countrows of related table where invoice_type = "INVOICE"

The error is that as in the same invoice we can have two rows with the same SKU, those two are counted, but I need to count only one row per unique SKU inside each invoice. So if the invoice has this phenomenon, the result wont be right.

 

Measure_Not_Working = 

VAR vTable_0 = 
    FILTER(
        VALUES(dimCustomer[customer_code]),
        CALCULATE(SUM(factSales[row_value])) > 0
    )

VAR vTable_1 = 
    SUMX(
        vTable_0,
        COUNTROWS(
            FILTER(
                RELATEDTABLE(factSales),
                factSales[invoice_type] = "INVOICE"
            )
        )
    )

RETURN vTable_1

 

 

This is kind of the logic that the measure has to follow to give the right result.  For only customer codes that have a total sum of sales greater than 0 and for invoices which type is "INVOICE", we count distinct skus inside the invoice.

 

Also, the measure has to work in a linechart too. not only in card. (did some experiments with summarizecolumns that did not work for line charts)

sample_model_3.png

 

I've wondered all around to see if there is a way to filter that related table before passing it to the count rows... That would be a good thing but I have not been able to figure out a way. 

Thanks again for taking the time.!!! 

 

PS: I just saw that i cant attach the pbix file directly as I am kind of new in the community.
WeTransfer link to download pbix file or xlsx file with tables

We Transfer Link (https://we.tl/t-lcrrUEigxJ) 

or tables here:

customer_idcustomer_codecustomer_name

1C_001Tenis Store - Miami
2C_001Tenis Store - Orlando
3C_002Shirt Store - Dallas
4C_002Shirt Store - New York
5C_002Shirt Store - Tampa
6C_003Surf Store

 

invoice_datecustomer_idinvoice_numberinvoice_typeSKUrow_value

1/01/20231INV_001INVOICE1001100
1/01/20231INV_001INVOICE1002200
1/01/20231INV_001INVOICE1003300
1/01/20231INV_001INVOICE1003100
2/01/20232INV_002INVOICE1001200
2/01/20232INV_002INVOICE1002300
2/01/20232INV_002INVOICE1003100
3/01/20233INV_003INVOICE1001200
3/01/20233INV_003INVOICE1002300
3/01/20233INV_003INVOICE1003100
3/01/20233INV_003INVOICE1004200
3/01/20233INV_003INVOICE1004300
4/01/20234RET_001RETURN1001-100
4/01/20234RET_001RETURN1002-200
5/01/20236INV_004INVOICE1001100
6/01/20236RET_002RETURN1001-100

 

4 REPLIES 4
gmsamborn
Super User
Super User

Hi @SimonVallejoV 

In your slow measure where your are defining the vTable_2 variable, would it make any difference to remove 'dimCustomer' from 'dimCustomer'[customer_code] ?

 

I'm not sure if would make a difference but maybe it's worth a try.

Thanks for the idea! Unfortunately, I tried both removing dimCustomer from dimCustomer[customer_code] in vTable_2 and factSales from factSales[invoice_type] in vTable_0 but performance is basically the same, for a filter applied on a lot of rows this is the result.
with table name = 37588 ms
vs

without table name 36548 ms.

@SimonVallejoV 

If I think of anything else, I'll let you know.  I'm going to watch the thread because I'd like to find out what other ideas might come up.

Good Luck

Thank you!! I will still work on it on my side and post if anything changes.

 

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.