Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
olumidef
Regular Visitor

REMOVING DUPLICATES AND MERGING QUERIES

Hi All, 

I am hoping i could get some help, 

I have two set of queries on Power Query and for query 1 ( I have duplicates data due to entries on different date), i have sorted the date  in descending order cause i would like to keep the latest date, i removed duplicates (using customer number ) after sorting and then proceed to merge query 1 with query 2 using a uniques value (customer number). To my amusement , the merge query is returning values in query 1 that was meant to have removed through duplicates down 

So say Customer A has Unique number 5 and has dates pn the 31/10/2024 and 21/10/2024 when i remove the duplicates , i could only see 31/10/2024 values (which is fine) but upon merging with query 2 , i am seeing values of 21/10/2024 been returned 

How can i resolve this please 

Thanks

1 ACCEPTED SOLUTION
Omid_Motamedise
Super User
Super User

on the steps, after sorting data, a new step will added into your query steps which its formula starts with Table.Sort(....), befor removing the duplicteas, revise this formula by addin Table.Buffer befor it and rewrite the formula as
Table.Buffer(Table.Sort(....))

then remove duplicat. your peoblem become solved


If my answer helped solve your issue, please consider marking it as the accepted solution.

View solution in original post

14 REPLIES 14
Anonymous
Not applicable

Hi @olumidef ,

 

I tried to reproduce the whole step and it works properly in my case.

vcgaomsft_0-1730685351143.png

Could you check “Query1” again to make sure that the last result it returns does not contain the date 21/10/2024?

Here's my test file.

 

Best Regards,
Gao

Community Support Team

 

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly.
If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

How to get your questions answered quickly --  How to provide sample data in the Power BI Forum

Omid_Motamedise
Super User
Super User

on the steps, after sorting data, a new step will added into your query steps which its formula starts with Table.Sort(....), befor removing the duplicteas, revise this formula by addin Table.Buffer befor it and rewrite the formula as
Table.Buffer(Table.Sort(....))

then remove duplicat. your peoblem become solved


If my answer helped solve your issue, please consider marking it as the accepted solution.

Thank you @Omid_Motamedise , the buffer function perfectly solves the problem 

You are welcome


If my answer helped solve your issue, please consider marking it as the accepted solution.

Hi @Omid_Motamedise , I see you introduce the Table.Buffer in the solution. Can you explain why that works? In my mind it is only a performance function and does nothing functionally.

Hi, it is true that Table.Buffer is mostely for performance but in this case, it can be used. Power query do not run the step, after each other in the order you create and it try to find the best order. so consider the below two steps


S0=Source,

S1=Table.Sort(S0,....),

S2=Table.Distinct(S1,....) 

in thiss case, it seems that S1 is neglected and the duplicate rows are removed befor sorting, adding Table.Buffer in this step, forced power query to sort the rows and remove duplicate. 

this problem can be solved in another way by combining the S1 and S2 and rewrite it as 

S0=Source,

Srev=Table.Distinct(Table.Sort(S0,....),....) 


If my answer helped solve your issue, please consider marking it as the accepted solution.

Ok. But wouldn't that be a bug in powerquery? Your 2 examples should be equivalent. In both cases S0 should be evaluated first because S1 depends on it...

It might be but not in all the case, consider this example
S0=Source,
S1= Table.Addcolumn(S0,"x",....),
S2=Table.Group(S1,"A", {"m",Table.RowCount})


it is true that s2 is calculated based on S1, but its result does not dependent to the result of S1, if you use S0 also in this function, the result would be the same (adding a new column, does not changed the result of counting the rows)


If my answer helped solve your issue, please consider marking it as the accepted solution.
PwerQueryKees
Super User
Super User

Sorry, can't help you then...

PwerQueryKees
Super User
Super User

PLease share your actual Copy/Pasted M code. The code you shared has syntax errors and non-existing colum names.... So not possible to debug. 

The process is an applied steps on power query and those are the code relating to the steps in questions 

Due to data privacy issue , unable to share the data/references 

Not certain what the syntax error might be also

hi @olumidef ,

 

I understand data privacy issues.

 

If you could provide a sample input and output, masking sensitive information in a usable format ( tables, excel, csv, gdrive/ onedrive etc. link to pbix ) , would be happy to support.

 

 

for reference:

https://community.fabric.microsoft.com/t5/DAX-Commands-and-Tips/How-to-Get-Your-Question-Answered-Qu... 

Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a Kudos 🙂
Kind Regards,
Avinash
PwerQueryKees
Super User
Super User

Please share your M code and examples of your data.

 

I am unable to provide the sample data due to privacy issue 
However, i have created a replica as shown below for QUERY 1

Ac NoCustomer NamePINSDEDCreated 
1A000131/10/202427/02/202731/10/2024
2B000207/11/202406/11/202731/10/2024
3C000330/10/202431/12/999831/10/2024
4D000401/01/202531/12/202531/10/2024
4D000401/09/202431/12/999821/10/2024
5E000502/07/202501/07/202831/10/2024
5E000508/08/202422/09/202422/10/2024
6F000601/10/202431/12/999831/10/2024
6F000614/08/202430/10/202426/10/2024


The data is first sorted by created to show newest to oldest 
and duplucate is removed  for which it returns the below 

Ac NoCustomer NamePINSDEDCreated 
1A000131/10/202427/02/202731/10/2024
2B000207/11/202406/11/202731/10/2024
3C000330/10/202431/12/999831/10/2024
4D000401/01/202531/12/202531/10/2024
5E000502/07/202501/07/202831/10/2024
6F000601/10/202431/12/999831/10/2024


This is then meant to be merge with another query 2 using the Pin, but ended up returning the deleted values after merging 

The Steps code : 

Sorting : = Table.Sort(#"Changed Type1",{{"Created", Order.Descending}})
Removing Duplicate: = Table.Distinct(#"Sorted Rows1", {"PIN"})

Merging on query 2: = Table.NestedJoin(#"Added Custom", {"PIN1 "}, #"Query1", {"PIN"}, "Query1", JoinKind.LeftOuter)


Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.