Solved: Conditional deduplication based on multiple column...

Anonymous · ‎11-04-2021

Hi everyone,

I need to write a very specific deduplication query and I am wondering if you can help me out. We have an ecommerce site and I am getting a daily CSV file that has transactional information for each day, and I have built a few Power BI reports based on that dataset. Over time, in a certain situation, parts of a transaction can get double counted. So here's what I am trying to do.

Within a Master Order Number, I want to check for duplicate Item IDs. If an Item ID is duplicated within a Master Order Number, its sales should NOT be counted IF the Invoice Date is blank. So in the data below, the rows I highlighted should not be counted.

I am scratching my head on figuring out how to do this, so any insight or suggestions are appreciated! I have created a RANKX function that looks within a Master Order # and will rank the ITEM IDs. But, I am not sure what to do with that info because I'd still have to write a query that checks if the "Check for dupes" column is equal, and then find the Item ID that has a blank Invoice Number. So I don't think that has done me any good.

Thanks in advance for your wisdom!

Anonymous · ‎11-10-2021

Hi @Anonymous

Try this code.

Remove Duplicate =
SUMMARIZE (
    FILTER (
        'Table',
        IF (
            VAR _COUNTROW =
                CALCULATE (
                    COUNTROWS ( 'Table' ),
                    FILTER (
                        ALL ( 'Table' ),
                        'Table'[Master Order Number] = EARLIER ( 'Table'[Master Order Number] )
                            && 'Table'[Item ID] = EARLIER ( 'Table'[Item ID] )
                    )
                )
            RETURN
                _COUNTROW > 1,
            'Table'[RANK] <> 1
                || 'Table'[Invoice Date] <> BLANK (),
            'Table'[RANK] = 1
        )
    ),
    'Table'[Order Number],
    'Table'[Master Order Number],
    'Table'[Order Date],
    'Table'[Invoice Date],
    'Table'[Item ID]
)

Result is as below.

Best Regards,
Rico Zhou

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

lbendlin · ‎11-05-2021

Deduplication is much easier to do in Power Query (unless it depends on user filters).

Please provide sanitized sample data that fully covers your issue. Paste the data into a table in your post or use one of the file services. Please show the expected outcome.

Anonymous · ‎11-08-2021

@lbendlin thanks for the reply.

Table 1: Sample of data before filter showing all the order IDs within Master Order # 10034858.

Order Number	Master Order Number	Order Date	Invoice Date	Item ID
10034858	10034858	7/12/2021		CA-ST-EA
10034858	10034858	7/12/2021		CA-BB-CS
10034858	10034858	7/12/2021		CC-USR-04
10034858	10034858	7/12/2021		CC-UPTP-04
20405998	10034858	7/12/2021		FP-20EHH-CS
20405999	10034858	7/12/2021		3M-SGCUP-01
10034858	10034858	7/12/2021	7/13/2021	CC-USR-04
10034858	10034858	7/12/2021	7/13/2021	CC-UPTP-04
10034858	10034858	7/12/2021	7/13/2021	CA-ST-EA
10034858	10034858	7/12/2021	7/13/2021	CA-BB-CS
20405998	10034858	7/12/2021	7/13/2021	FP-20EHH-CS
20405999	10034858	7/12/2021	7/13/2021	3M-SGCUP-01

Table 2: Data after filter applied; duplicate order IDs within Master Order # 10034858 WHERE Invoice Date = NULL have been removed.

(Remember I want this logic to look within all Master Order #s. I have just filtered down to one Master Order # to simplify the example output.)

Order Number	Master Order Number	Order Date	Invoice Date	Item ID
10034858	10034858	7/12/2021	7/13/2021	CC-USR-04
10034858	10034858	7/12/2021	7/13/2021	CC-UPTP-04
10034858	10034858	7/12/2021	7/13/2021	CA-ST-EA
10034858	10034858	7/12/2021	7/13/2021	CA-BB-CS
20405998	10034858	7/12/2021	7/13/2021	FP-20EHH-CS
20405999	10034858	7/12/2021	7/13/2021	3M-SGCUP-01

Thanks again.

Anonymous · ‎11-09-2021

Hi @Anonymous

Try this rank code.

RANK = 
RANKX (
    FILTER (
        'Table',
        'Table'[Order Number] = EARLIER ( 'Table'[Order Number] )
            && 'Table'[Master Order Number] = EARLIER ( 'Table'[Master Order Number] )
            && 'Table'[Item ID] = EARLIER ( 'Table'[Item ID] )
            && 'Table'[Order Date] = EARLIER ( 'Table'[Order Date] )
    ),
    'Table'[Invoice Date],
    ,
    ASC
)

This code will create a rank based on each [Order Number],[Master Order Number],[Order Date] and [Item ID] group.

Then create a new dax table to remove rank =1.

Remove Duplicate = 
SUMMARIZE(FILTER('Table','Table'[RANK]<>1),'Table'[Order Number],'Table'[Master Order Number],'Table'[Order Date],'Table'[Invoice Date],'Table'[Item ID])

Result is as below.

Or you can calcualte sum directly by add rank<>1 in your table filter. You don't need to create a new table.

Such as:

Measure = CALCULATE(SUM([Sales],Filter('Table',[Rank]<>1))

Best Regards,
Rico Zhou

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous · ‎11-09-2021

Hi @Anonymous, thank you so much for the reply and for taking the time to help me out. I think this gets me closer but it's not quite what I need. I may not have explained it very clearly, but I only want to suppress rows when the following conditions are met:

There are duplicate Item IDs within a Master Order # AND those duplicate Item IDs contain a blank/NULL invoice date.

So consider the screenshot below. The ones highlighted in green should be removed, and the white ones left alone, which your script did perfectly.

But the yellow ones I highlighted, which are part of a different Master Order #, should not be removed because they do not have duplicate Item IDs nor do they have a blank Invoice Date. So I don't want to remove the yellow rows. Does this help clarify what I am asking for?

Anonymous · ‎11-09-2021

Hi @Anonymous

Add a filter in calculated table code.

Remove Duplicate =
SUMMARIZE (
    FILTER ( 'Table', OR ( 'Table'[RANK] <> 1, 'Table'[Invoice Date] <> BLANK () ) ),
    'Table'[Order Number],
    'Table'[Master Order Number],
    'Table'[Order Date],
    'Table'[Invoice Date],
    'Table'[Item ID]
)

Result is as below.

Best Regards,
Rico Zhou

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous · ‎11-10-2021

@Anonymous -- even closer but one thing is still missing. I still want to count orders that were placed but not yet invoiced. Only when there were duplicate Item IDs within a Master Order # that had blank Invoice Dates did I want to discard them from the total.

Consider the Master Order # below, #10007106. It does not have an invoice date (yet) but does not have any duplicate Item IDs. In other words, it has not yet been invoiced but has not been double counted. It should still be counted in total sales (I created a separate metric for invoiced sales).

So how could I modify either the deduplication table or the rank column to account for this? Again, thanks for all your help and for sticking with me on this one.

Anonymous · ‎11-10-2021

Hi @Anonymous

Try this code.

Remove Duplicate =
SUMMARIZE (
    FILTER (
        'Table',
        IF (
            VAR _COUNTROW =
                CALCULATE (
                    COUNTROWS ( 'Table' ),
                    FILTER (
                        ALL ( 'Table' ),
                        'Table'[Master Order Number] = EARLIER ( 'Table'[Master Order Number] )
                            && 'Table'[Item ID] = EARLIER ( 'Table'[Item ID] )
                    )
                )
            RETURN
                _COUNTROW > 1,
            'Table'[RANK] <> 1
                || 'Table'[Invoice Date] <> BLANK (),
            'Table'[RANK] = 1
        )
    ),
    'Table'[Order Number],
    'Table'[Master Order Number],
    'Table'[Order Date],
    'Table'[Invoice Date],
    'Table'[Item ID]
)

Result is as below.

Best Regards,
Rico Zhou

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous · ‎11-15-2021

Thank you very much @Anonymous this has done the trick. Appreciate your expertise and assistance.

Conditional deduplication based on multiple columns

Helpful resources

Power BI Dataviz World Championships

Join our Community Sticker Challenge 2026

Power BI Monthly Update - January 2026

FabCon Atlanta 2026

How to Get Your Question Answered Quickly

FabCon is coming to Atlanta

Conditional deduplication based on multiple columns

Helpful resources

Power BI Dataviz World Championships

Join our Community Sticker Challenge 2026

Power BI Monthly Update - January 2026

FabCon Atlanta 2026

How to Get Your Question Answered Quickly