Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
olumidef
Regular Visitor

REMOVING DUPLICATES AND MERGING QUERIES

Hi All, 

I am hoping i could get some help, 

I have two set of queries on Power Query and for query 1 ( I have duplicates data due to entries on different date), i have sorted the date  in descending order cause i would like to keep the latest date, i removed duplicates (using customer number ) after sorting and then proceed to merge query 1 with query 2 using a uniques value (customer number). To my amusement , the merge query is returning values in query 1 that was meant to have removed through duplicates down 

So say Customer A has Unique number 5 and has dates pn the 31/10/2024 and 21/10/2024 when i remove the duplicates , i could only see 31/10/2024 values (which is fine) but upon merging with query 2 , i am seeing values of 21/10/2024 been returned 

How can i resolve this please 

Thanks

1 ACCEPTED SOLUTION
Omid_Motamedise
Memorable Member
Memorable Member

on the steps, after sorting data, a new step will added into your query steps which its formula starts with Table.Sort(....), befor removing the duplicteas, revise this formula by addin Table.Buffer befor it and rewrite the formula as
Table.Buffer(Table.Sort(....))

then remove duplicat. your peoblem become solved

View solution in original post

14 REPLIES 14
v-cgao-msft
Community Support
Community Support

Hi @olumidef ,

 

I tried to reproduce the whole step and it works properly in my case.

vcgaomsft_0-1730685351143.png

Could you check “Query1” again to make sure that the last result it returns does not contain the date 21/10/2024?

Here's my test file.

 

Best Regards,
Gao

Community Support Team

 

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly.
If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

How to get your questions answered quickly --  How to provide sample data in the Power BI Forum

Omid_Motamedise
Memorable Member
Memorable Member

on the steps, after sorting data, a new step will added into your query steps which its formula starts with Table.Sort(....), befor removing the duplicteas, revise this formula by addin Table.Buffer befor it and rewrite the formula as
Table.Buffer(Table.Sort(....))

then remove duplicat. your peoblem become solved

Thank you @Omid_Motamedise , the buffer function perfectly solves the problem 

You are welcome

Hi @Omid_Motamedise , I see you introduce the Table.Buffer in the solution. Can you explain why that works? In my mind it is only a performance function and does nothing functionally.

Hi, it is true that Table.Buffer is mostely for performance but in this case, it can be used. Power query do not run the step, after each other in the order you create and it try to find the best order. so consider the below two steps


S0=Source,

S1=Table.Sort(S0,....),

S2=Table.Distinct(S1,....) 

in thiss case, it seems that S1 is neglected and the duplicate rows are removed befor sorting, adding Table.Buffer in this step, forced power query to sort the rows and remove duplicate. 

this problem can be solved in another way by combining the S1 and S2 and rewrite it as 

S0=Source,

Srev=Table.Distinct(Table.Sort(S0,....),....) 

Ok. But wouldn't that be a bug in powerquery? Your 2 examples should be equivalent. In both cases S0 should be evaluated first because S1 depends on it...

It might be but not in all the case, consider this example
S0=Source,
S1= Table.Addcolumn(S0,"x",....),
S2=Table.Group(S1,"A", {"m",Table.RowCount})


it is true that s2 is calculated based on S1, but its result does not dependent to the result of S1, if you use S0 also in this function, the result would be the same (adding a new column, does not changed the result of counting the rows)

PwerQueryKees
Impactful Individual
Impactful Individual

Sorry, can't help you then...

PwerQueryKees
Impactful Individual
Impactful Individual

PLease share your actual Copy/Pasted M code. The code you shared has syntax errors and non-existing colum names.... So not possible to debug. 

The process is an applied steps on power query and those are the code relating to the steps in questions 

Due to data privacy issue , unable to share the data/references 

Not certain what the syntax error might be also

hi @olumidef ,

 

I understand data privacy issues.

 

If you could provide a sample input and output, masking sensitive information in a usable format ( tables, excel, csv, gdrive/ onedrive etc. link to pbix ) , would be happy to support.

 

 

for reference:

https://community.fabric.microsoft.com/t5/DAX-Commands-and-Tips/How-to-Get-Your-Question-Answered-Qu... 

Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a Kudos 🙂
Kind Regards,
Avinash
PwerQueryKees
Impactful Individual
Impactful Individual

Please share your M code and examples of your data.

 

I am unable to provide the sample data due to privacy issue 
However, i have created a replica as shown below for QUERY 1

Ac NoCustomer NamePINSDEDCreated 
1A000131/10/202427/02/202731/10/2024
2B000207/11/202406/11/202731/10/2024
3C000330/10/202431/12/999831/10/2024
4D000401/01/202531/12/202531/10/2024
4D000401/09/202431/12/999821/10/2024
5E000502/07/202501/07/202831/10/2024
5E000508/08/202422/09/202422/10/2024
6F000601/10/202431/12/999831/10/2024
6F000614/08/202430/10/202426/10/2024


The data is first sorted by created to show newest to oldest 
and duplucate is removed  for which it returns the below 

Ac NoCustomer NamePINSDEDCreated 
1A000131/10/202427/02/202731/10/2024
2B000207/11/202406/11/202731/10/2024
3C000330/10/202431/12/999831/10/2024
4D000401/01/202531/12/202531/10/2024
5E000502/07/202501/07/202831/10/2024
6F000601/10/202431/12/999831/10/2024


This is then meant to be merge with another query 2 using the Pin, but ended up returning the deleted values after merging 

The Steps code : 

Sorting : = Table.Sort(#"Changed Type1",{{"Created", Order.Descending}})
Removing Duplicate: = Table.Distinct(#"Sorted Rows1", {"PIN"})

Merging on query 2: = Table.NestedJoin(#"Added Custom", {"PIN1 "}, #"Query1", {"PIN"}, "Query1", JoinKind.LeftOuter)


Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.