Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us at FabCon Vienna from September 15-18, 2025, for the ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM. Get registered

Reply
tember
New Member

Practice Assessment Wrong? Power Query Sort and Remove Duplicates

I am preparing for the Dp-600 Exam and one of the Practice Assessment questions is either wrong or very confusing.

 

Question: 

You have a Fabric lakehouse named Lakehouse1 that contains a Dataflow Gen2 query. You have an Azure SQL database that contains a type 2 slowly changing dimension database table named CustomerMaster. CustomerMaster contains the following columns:

  • Customer ID – Number
  • EffectiveDate – Date
  • Address – Text
  • Status - Text

You plan to ingest CustomerMaster into Lakehouse1. The solution must only keep the latest record (unique) per Customer ID.

Which two applied steps should you use? Each correct answer presents part of the solution.

 

Answers:

Keep top rows

Max Customer ID

Remove duplicates on the Customer ID column

Remove duplicates on the CustomerMaster table

Sort on Customer ID, EffectiveDate

 

The bolded selections are the correct answer with this explanation: "Sorting CustomerID and EffectiveDate, and then removing duplicates on the Customer ID column is the only way to keep the correct latest row per customer ID. All other options will not correctly keep the latest customer row per effective date."

 

 This answer confused me, so I clicked through to the explanation.

Common Authoring Issues in Power Query - Power Query | Microsoft Learn

 

It says:

You might assume that if you sort your data, any downstream operations preserve the sort order.

For example, if you sort a sales table so that each store's largest sale is shown first, you might expect that doing a "Remove duplicates" operation returns only the top sale for each store. And this operation might, in fact, appear to work. However, this behavior isn't guaranteed.

 

So it literally contradicts what the answer said...  Am I misunderstanding something??

2 ACCEPTED SOLUTIONS
OktayPamuk80
Helper V
Helper V

Hi Tember,

Sorting and removing the duplicates is in general the correct answer. However, yes, it does not work all the time, and in my opinion, it is ambiguous, too as you stated.

In my projects, I never use this option, I rather go with inner joining with a duplicated version of the table having the max effective date grouped by customer id.

Regards,

Oktay

View solution in original post

frithjof_v
Super User
Super User

Yeah, the explanation sounds confusing. It could be a bug in the question/answer.

 

I would check out this blog: https://gorilla.bi/power-query/removing-duplicates/#advanced-techniques

 

Or write a native SQL query with ROW_NUMBER() PARTITION BY and the M function Value.NativeQuery() to maintain query folding if needed https://blog.crossjoin.co.uk/2021/02/21/query-folding-on-sql-queries-in-power-query-using-value-nati...

View solution in original post

6 REPLIES 6
Mandeepkaur
New Member

Remove duplicates on the Customer ID column

Sort on Customer ID, EffectiveDate

This answer is correct.
frithjof_v
Super User
Super User

Yeah, the explanation sounds confusing. It could be a bug in the question/answer.

 

I would check out this blog: https://gorilla.bi/power-query/removing-duplicates/#advanced-techniques

 

Or write a native SQL query with ROW_NUMBER() PARTITION BY and the M function Value.NativeQuery() to maintain query folding if needed https://blog.crossjoin.co.uk/2021/02/21/query-folding-on-sql-queries-in-power-query-using-value-nati...

miguel
Community Admin
Community Admin

"Keep top rows" and "remove duplicates" are semantically different

 

I haven't seen the question in the assessment, so I'll pass this feedback over to the team who owns it.

 

Nevertheless, doing a "remove duplicates" doesn't keep or take into consideration the ordinal position of the rows. This means that if you are doing a "remove duplicates" on a subset of the columns from your table and not the whole table, then there's no guarantee that the result of the operation will be based on the ordinal position of the rows like you saw them in a previous step. That's where the more explicit definition of a explicit logic that you define where your logic can set a flag (or a value) for what rows to keep would be a far better approach. Typically you could reach this by using the rank, group by or merge methods in combination with a filter rows. If you have a specific example, we could go over it and how you could be more specific with your logic to keep exactly the rows based on your explicit logic 

I don't have a specific example - I am just working through the DP-600 material and this answer didn't sit well with me - especially when compared with the MS documentation.  I generally use the ROW_NUMBER() PARTITION BY technique (from miguel) or the MAX date (from OktayPamuk80) .  Looking for an explanation why that is the "right answer"

The Rank column (preview) function in Power Query Online is exciting!

 

Is it planned to support rank within partitions in the future?

 

And query folding? The indicator says "This step will be evaluated outside the data source".

 

Thanks!

 

frithjof_v_0-1728824564113.png

 

OktayPamuk80
Helper V
Helper V

Hi Tember,

Sorting and removing the duplicates is in general the correct answer. However, yes, it does not work all the time, and in my opinion, it is ambiguous, too as you stated.

In my projects, I never use this option, I rather go with inner joining with a duplicated version of the table having the max effective date grouped by customer id.

Regards,

Oktay

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

May FBC25 Carousel

Fabric Monthly Update - May 2025

Check out the May 2025 Fabric update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.