Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Microsoft is giving away 50,000 FREE Microsoft Certification exam vouchers. Get Fabric certified for FREE! Learn more

Reply
Anonymous
Not applicable

Deleting occurences based on the number of times it occured

Hi all,

 

I have a problem with one of my datasets I try to solve.

In my dataset I have the customerd IDs of all the customers I have reached out to. Unfortunately, in my dataset it looks like some customer IDs are shown 3 times. But I only want to show the first two occurences of a specific customer ID (based on date). How do I get this?

For example. My dataset looks like this. 

IDDate of mail
A

10-03-2020

B10-03-2020
C10-03-2020
A11-03-2020
B11-03-2020
C11-03-2020
A12-03-2020
B12-03-2020

 

In this table, I would like to delete the last A and B since it is the third time these customer IDs occur in my table. Does anyone know how to make this happen?

Best,
Jeroen

 

1 ACCEPTED SOLUTION

Hi @Anonymous ,

 

For the dax solution, we simplify the formula, could you please try to use the following one?

 

Filtered Table = 
FILTER (
    'Table',
    VAR i = [ID]
    VAR d = [Date of mail]
    RETURN
        CALCULATE ( DISTINCTCOUNT ( 'Table'[Condition] ), 'Table', 'Table'[ID] = i ) = 1
        && CALCULATE ( COUNTROWS ( 'Table' ), 'Table', 'Table'[ID] = i, 'Table'[Date of mail] <= d ) <= 2
)

 

If it still does not work, could you please try to verify how much time it will take if create a calculated column in the origin table?

Calculated Column = 
VAR i = [ID]
    VAR d = [Date of mail]
    RETURN
        CALCULATE ( DISTINCTCOUNT ( 'Table'[Condition] ), 'Table', 'Table'[ID] = i ) = 1
        && CALCULATE ( COUNTROWS ( 'Table' ), 'Table', 'Table'[ID] = i, 'Table'[Date of mail] <= d ) <= 2

 

For the Power Query Solution, what "query change is already up to 6 GB" mean ? If you mean the size of data is 6GB, when apply filter on the origin table, it will reduce the size of it. But when apply a dax solution, the origin table will keep 6 GB and generate a new calculated table with some data, for example 2GB if there are 4 GB filtered by dax, then the final size of dataset will be 6+2=8GB if use dax solution.

 

Best regards,

Community Support Team _ Dong Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

13 REPLIES 13
v-lid-msft
Community Support
Community Support

Hi @Anonymous ,

 

We can try to use Table.SelectRows function in Power Query Editor to meet your requirement:

 

Table.SelectRows(#"Changed Type",each let i = [ID],t = Table.Buffer(Table.SelectRows(#"Changed Type",each [ID] = i)),minDateList =  Table.MinN(t,"Date of mail",2)
in Table.RowCount(Table.Distinct(Table.SelectColumns(t,{"Condition"})))=1 and Table.Contains(minDateList,_))

 

please change #"Changed Type" to the last step name of your query

 

All the queries are here:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUTI00DUw1jUyMDIAcZRidaKVnLALO2MXBhtiiN0QTGFndGEjhCFG2A1BE44FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, #"Date of mail" = _t, Condition = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", type text}, {"Date of mail", type date}, {"Condition", Int64.Type}}),
    FilterTable = Table.SelectRows(#"Changed Type",each let i = [ID],t = Table.Buffer(Table.SelectRows(#"Changed Type",each [ID] = i)),minDateList =  Table.MinN(t,"Date of mail",2)
in Table.RowCount(Table.Distinct(Table.SelectColumns(t,{"Condition"})))=1 and Table.Contains(minDateList,_))
in
    FilterTable

 

4.jpg5.jpg


Best regards,

 

 

Community Support Team _ Dong Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

@v-lid-msft thanks for the solution. Due to the size of the different tables I rather fix this in DAX than in PowerQuery. Do you know how to fix this with DAX as well?

 

Best,

Jeroen

 

Hi @Anonymous ,

 

We can also create a calculated table by using following dax to meet your requirement, the second parameter can also be used to create a calculated column in the origin table.

 

Filtered Table =
FILTER (
    'Table',
    CALCULATE (
        DISTINCTCOUNT ( 'Table'[Condition] ),
        FILTER ( 'Table', 'Table'[ID] = EARLIER ( 'Table'[ID] ) )
    ) = 1
        && CALCULATE (
            COUNTROWS ( 'Table' ),
            FILTER (
                'Table',
                'Table'[ID] = EARLIER ( 'Table'[ID] )
                    && 'Table'[Date of mail] <= EARLIER ( 'Table'[Date of mail] )
            )
        ) <= 2
)

 

27.jpg


Best regards,

 

 

Community Support Team _ Dong Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

Hi @v-lid-msft ,

 

Thanks for your help. I'm getting close. However, I do not want to filter on "condition = 1", but I want to filter out customers where the conditions are not equal for the two occurences. 

Do you know how I can change this?

Best,
Jeroen

 

Hi @Anonymous ,

 

The "=1" in the formula does not mean condition equal to 1, it means there are only one unique value in the condition for each id. Does the following example meet your requirement?

 

1.jpg2.jpg

 


Best regards,

 

 

Community Support Team _ Dong Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

Hi @v-lid-msft ,

 

I see. My bad. Thanks for your help once again. 

 

I tried to use the DAX formula, but after multiple hours I still saw the screen which said "Working on it". Therefore, I'm now trying to do this in PowerQuery, but now my query change is already up to 6 GB.

 

Any tips on this part?

 

Best,

Jeroen

Hi @Anonymous ,

 

For the dax solution, we simplify the formula, could you please try to use the following one?

 

Filtered Table = 
FILTER (
    'Table',
    VAR i = [ID]
    VAR d = [Date of mail]
    RETURN
        CALCULATE ( DISTINCTCOUNT ( 'Table'[Condition] ), 'Table', 'Table'[ID] = i ) = 1
        && CALCULATE ( COUNTROWS ( 'Table' ), 'Table', 'Table'[ID] = i, 'Table'[Date of mail] <= d ) <= 2
)

 

If it still does not work, could you please try to verify how much time it will take if create a calculated column in the origin table?

Calculated Column = 
VAR i = [ID]
    VAR d = [Date of mail]
    RETURN
        CALCULATE ( DISTINCTCOUNT ( 'Table'[Condition] ), 'Table', 'Table'[ID] = i ) = 1
        && CALCULATE ( COUNTROWS ( 'Table' ), 'Table', 'Table'[ID] = i, 'Table'[Date of mail] <= d ) <= 2

 

For the Power Query Solution, what "query change is already up to 6 GB" mean ? If you mean the size of data is 6GB, when apply filter on the origin table, it will reduce the size of it. But when apply a dax solution, the origin table will keep 6 GB and generate a new calculated table with some data, for example 2GB if there are 4 GB filtered by dax, then the final size of dataset will be 6+2=8GB if use dax solution.

 

Best regards,

Community Support Team _ Dong Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

Hi @v-lid-msft ,

 

I managed to make it work. Thanks for your help. 

 

Regarding the "query change is already up to 6 GB" comment; after editing the query with the PowerQuery statements, my query turned out to be really big as of data size. Normally, if I change this query, it takes up to 1 GB and then the changes are done. But after the statements you provided, it took up to a minimum of 40 GB (and then I cancelled the changes). 

 

But anyway, in the end I managed to make it work thanks to your help. Thanks once again. 

 

Best,

Jeroen

Hi @Anonymous ,


Glad to hear that you have resolved your problem. If it does not contain any confidential information, could you kindly please share the improved code you used to filter the table so that it can benefit more users?

 

If you have any other questions about this scenario, please kindly ask here and we will try to resolve it.


Best regards,

 

Community Support Team _ Dong Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Greg_Deckler
Super User
Super User

Well, you could create a column like this:

 

Count = COUNTROWS(FILTER('Table',[ID] = EARLIER([ID]) && [Date of mail] <= EARLIER([Date of mail]))

 

Then you could create a new table and FILTER anything over 2 in Count.

 



Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler It worked! Thank you very much! 

 

I have a subsequent question I hope you can help me with as well. My dataset has another error. Everyone occurs twice now, but each should have the same condition for both occurences (either 1 or 2). However, this is not the case. My data looks like this now:

 

IDDate of mailCondition
A10-03-20201
B10-03-20201
C10-03-20201
A11-03-20201
B11-03-20201
C11-03-20202

 

In this case, I would like to delete C because the condition is not the same for both occurences. I want to keep A and B. 

 

Do you know how to make this work in the same table as the one I have created using your previous solution?

I hope you are able to help.

 

Best,

Jeroen

 

@Anonymous ,

 

Hmm... Let's backup because I am not getting the same results as you. Have a look at the attached PBIX. 

 



Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler 
Thanks for the file. The error of the different conditions was already in the dataset, so it is not something new that has occured after your solution. It is just that I have found out that the error was there after your solution, but it has been there since the beginning. It's just an error in the dataset due to human errors. 

 

So your solution worked, and I used it correctly, but I am just wondering how I can delete the other error in the dataset (with the conditions) as well with an end result of 1 correct table. 

 

Do you understand what I mean?

 

Best,

Jeroen

 

Helpful resources

Announcements
March PBI video - carousel

Power BI Monthly Update - March 2025

Check out the March 2025 Power BI update to learn about new features.

March2025 Carousel

Fabric Community Update - March 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors