Hi all,
I have a problem with one of my datasets that I am trying to solve.
My dataset contains the customer IDs of all the customers I have reached out to. Unfortunately, some customer IDs appear 3 times, but I only want to keep the first two occurrences of each customer ID (based on date). How do I achieve this?
For example, my dataset looks like this:
ID | Date of mail |
A | 10-03-2020 |
B | 10-03-2020 |
C | 10-03-2020 |
A | 11-03-2020 |
B | 11-03-2020 |
C | 11-03-2020 |
A | 12-03-2020 |
B | 12-03-2020 |
In this table, I would like to delete the last A and B, since those are the third occurrences of these customer IDs. Does anyone know how to make this happen?
Best,
Jeroen
Hi @Anonymous ,
We can use the Table.SelectRows function in Power Query Editor to meet your requirement:
Table.SelectRows(
    #"Changed Type",
    each
        let
            i = [ID],
            // buffer all rows for the current ID so they are only scanned once
            t = Table.Buffer(Table.SelectRows(#"Changed Type", each [ID] = i)),
            // the two rows with the earliest "Date of mail" for this ID
            minDateList = Table.MinN(t, "Date of mail", 2)
        in
            // keep the row only if the ID has a single distinct Condition
            // and the current row is among its first two dates
            Table.RowCount(Table.Distinct(Table.SelectColumns(t, {"Condition"}))) = 1
            and Table.Contains(minDateList, _)
)
Please change #"Changed Type" to the name of the last step in your own query.
The full query is here:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUTI00DUw1jUyMDIAcZRidaKVnLALO2MXBhtiiN0QTGFndGEjhCFG2A1BE44FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, #"Date of mail" = _t, Condition = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", type text}, {"Date of mail", type date}, {"Condition", Int64.Type}}),
FilterTable = Table.SelectRows(
    #"Changed Type",
    each
        let
            i = [ID],
            t = Table.Buffer(Table.SelectRows(#"Changed Type", each [ID] = i)),
            minDateList = Table.MinN(t, "Date of mail", 2)
        in
            Table.RowCount(Table.Distinct(Table.SelectColumns(t, {"Condition"}))) = 1
            and Table.Contains(minDateList, _)
)
in
FilterTable
Best regards,
@v-lid-msft thanks for the solution. Due to the size of the different tables, I would rather fix this in DAX than in Power Query. Do you know how to fix this with DAX as well?
Best,
Jeroen
Hi @Anonymous ,
We can also create a calculated table using the following DAX to meet your requirement. The filter expression (the second parameter of FILTER) can also be used to create a calculated column in the original table.
Filtered Table =
FILTER (
'Table',
CALCULATE (
DISTINCTCOUNT ( 'Table'[Condition] ),
FILTER ( 'Table', 'Table'[ID] = EARLIER ( 'Table'[ID] ) )
) = 1
&& CALCULATE (
COUNTROWS ( 'Table' ),
FILTER (
'Table',
'Table'[ID] = EARLIER ( 'Table'[ID] )
&& 'Table'[Date of mail] <= EARLIER ( 'Table'[Date of mail] )
)
) <= 2
)
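For reference, the calculated-column form of that filter expression might look like the sketch below; the column name "Keep Row" is only an illustration, and it returns TRUE for the rows to keep:
Keep Row =
-- sketch: calculated-column form of the filter above; "Keep Row" is an illustrative name
CALCULATE (
    DISTINCTCOUNT ( 'Table'[Condition] ),
    FILTER ( 'Table', 'Table'[ID] = EARLIER ( 'Table'[ID] ) )
) = 1
    && CALCULATE (
        COUNTROWS ( 'Table' ),
        FILTER (
            'Table',
            'Table'[ID] = EARLIER ( 'Table'[ID] )
                && 'Table'[Date of mail] <= EARLIER ( 'Table'[Date of mail] )
        )
    ) <= 2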
Best regards,
Hi @v-lid-msft ,
Thanks for your help. I'm getting close. However, I do not want to filter on "condition = 1"; I want to filter out customers whose conditions are not equal across the two occurrences.
Do you know how I can change this?
Best,
Jeroen
Hi @Anonymous ,
The "=1" in the formula does not mean condition equal to 1, it means there are only one unique value in the condition for each id. Does the following example meet your requirement?
Best regards,
Hi @v-lid-msft ,
I see. My bad. Thanks for your help once again.
I tried to use the DAX formula, but after multiple hours I still saw the "Working on it" screen. Therefore, I'm now trying to do this in Power Query, but my query change is already up to 6 GB.
Any tips on this part?
Best,
Jeroen
Hi @Anonymous ,
For the DAX solution, we have simplified the formula; could you please try the following one?
Filtered Table =
FILTER (
'Table',
VAR i = [ID]
VAR d = [Date of mail]
RETURN
CALCULATE ( DISTINCTCOUNT ( 'Table'[Condition] ), 'Table', 'Table'[ID] = i ) = 1
&& CALCULATE ( COUNTROWS ( 'Table' ), 'Table', 'Table'[ID] = i, 'Table'[Date of mail] <= d ) <= 2
)
If it still does not work, could you please verify how long it takes to create a calculated column in the original table instead?
Calculated Column =
VAR i = [ID]
VAR d = [Date of mail]
RETURN
CALCULATE ( DISTINCTCOUNT ( 'Table'[Condition] ), 'Table', 'Table'[ID] = i ) = 1
&& CALCULATE ( COUNTROWS ( 'Table' ), 'Table', 'Table'[ID] = i, 'Table'[Date of mail] <= d ) <= 2
For the Power Query solution, what does "query change is already up to 6 GB" mean? If you mean the size of the data is 6 GB: applying the filter in Power Query reduces the size of the original table. With a DAX solution, however, the original table stays at 6 GB and a new calculated table is generated on top of it. For example, if DAX filters out 4 GB and keeps 2 GB, the final dataset size will be 6 + 2 = 8 GB.
Best regards,
Hi @v-lid-msft ,
I managed to make it work. Thanks for your help.
Regarding the "query change is already up to 6 GB" comment: after editing the query with the Power Query statements, applying the change loaded an enormous amount of data. Normally, when I change this query, it loads up to about 1 GB and then the changes are done. But with the statements you provided, it had loaded at least 40 GB before I cancelled the changes.
But anyway, in the end I managed to make it work thanks to your help. Thanks once again.
Best,
Jeroen
Hi @Anonymous ,
Glad to hear that you have resolved your problem. If it does not contain any confidential information, could you kindly share the improved code you used to filter the table, so that it can benefit more users?
If you have any other questions about this scenario, please kindly ask here and we will try to resolve it.
Best regards,
Well, you could create a column like this:
Count = COUNTROWS(FILTER('Table', [ID] = EARLIER([ID]) && [Date of mail] <= EARLIER([Date of mail])))
Then you could create a new table and FILTER out anything with a Count over 2, as sketched below.
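A minimal sketch of that second step, assuming the column above is named Count:
Filtered Table =
-- sketch: assumes the Count column from the previous step
FILTER ( 'Table', 'Table'[Count] <= 2 )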
@Greg_Deckler It worked! Thank you very much!
I have a follow-up question I hope you can help me with as well. My dataset has another error. Every customer now occurs twice, but each should have the same condition for both occurrences (either 1 or 2). However, this is not the case. My data looks like this now:
ID | Date of mail | Condition |
A | 10-03-2020 | 1 |
B | 10-03-2020 | 1 |
C | 10-03-2020 | 1 |
A | 11-03-2020 | 1 |
B | 11-03-2020 | 1 |
C | 11-03-2020 | 2 |
In this case, I would like to delete C because the condition is not the same for both occurrences. I want to keep A and B.
Do you know how to make this work in the same table as the one I have created using your previous solution?
I hope you are able to help.
Best,
Jeroen
@Anonymous ,
Hmm... Let's back up, because I am not getting the same results as you. Have a look at the attached PBIX.
@Greg_Deckler
Thanks for the file. The error with the different conditions was already in the dataset, so it is not something new that occurred after your solution. I only found out about it after applying your solution, but it has been there since the beginning; it's simply a human error in the data.
So your solution worked, and I used it correctly, but I am just wondering how I can also remove the other error in the dataset (with the conditions), with an end result of one correct table.
Do you understand what I mean?
Best,
Jeroen