Solved: Practice Assessment Wrong? Power Query Sort and Re...

tember · ‎10-04-2024

I am preparing for the Dp-600 Exam and one of the Practice Assessment questions is either wrong or very confusing.

Question:

You have a Fabric lakehouse named Lakehouse1 that contains a Dataflow Gen2 query. You have an Azure SQL database that contains a type 2 slowly changing dimension database table named CustomerMaster. CustomerMaster contains the following columns:

Customer ID – Number
EffectiveDate – Date
Address – Text
Status - Text

You plan to ingest CustomerMaster into Lakehouse1. The solution must only keep the latest record (unique) per Customer ID.

Which two applied steps should you use? Each correct answer presents part of the solution.

Answers:

Keep top rows

Max Customer ID

Remove duplicates on the Customer ID column

Remove duplicates on the CustomerMaster table

Sort on Customer ID, EffectiveDate

The bolded selections are the correct answer with this explanation: "Sorting CustomerID and EffectiveDate, and then removing duplicates on the Customer ID column is the only way to keep the correct latest row per customer ID. All other options will not correctly keep the latest customer row per effective date."

This answer confused me, so I clicked through to the explanation.

Common Authoring Issues in Power Query - Power Query | Microsoft Learn

It says:

You might assume that if you sort your data, any downstream operations preserve the sort order.

For example, if you sort a sales table so that each store's largest sale is shown first, you might expect that doing a "Remove duplicates" operation returns only the top sale for each store. And this operation might, in fact, appear to work. However, this behavior isn't guaranteed.

So it literally contradicts what the answer said... Am I misunderstanding something??

OktayPamuk80 · ‎10-07-2024

Hi Tember,

Sorting and removing the duplicates is in general the correct answer. However, yes, it does not work all the time, and in my opinion, it is ambiguous, too as you stated.

In my projects, I never use this option, I rather go with inner joining with a duplicated version of the table having the max effective date grouped by customer id.

Regards,

Oktay

View solution in original post

frithjof_v · ‎10-12-2024

Yeah, the explanation sounds confusing. It could be a bug in the question/answer.

I would check out this blog: https://gorilla.bi/power-query/removing-duplicates/#advanced-techniques

Or write a native SQL query with ROW_NUMBER() PARTITION BY and the M function Value.NativeQuery() to maintain query folding if needed https://blog.crossjoin.co.uk/2021/02/21/query-folding-on-sql-queries-in-power-query-using-value-nati...

View solution in original post

Mandeepkaur · ‎12-25-2024

Remove duplicates on the Customer ID column

Sort on Customer ID, EffectiveDate

This answer is correct.

frithjof_v · ‎10-12-2024

Yeah, the explanation sounds confusing. It could be a bug in the question/answer.

I would check out this blog: https://gorilla.bi/power-query/removing-duplicates/#advanced-techniques

Or write a native SQL query with ROW_NUMBER() PARTITION BY and the M function Value.NativeQuery() to maintain query folding if needed https://blog.crossjoin.co.uk/2021/02/21/query-folding-on-sql-queries-in-power-query-using-value-nati...

miguel · ‎10-12-2024

"Keep top rows" and "remove duplicates" are semantically different

I haven't seen the question in the assessment, so I'll pass this feedback over to the team who owns it.

Nevertheless, doing a "remove duplicates" doesn't keep or take into consideration the ordinal position of the rows. This means that if you are doing a "remove duplicates" on a subset of the columns from your table and not the whole table, then there's no guarantee that the result of the operation will be based on the ordinal position of the rows like you saw them in a previous step. That's where the more explicit definition of a explicit logic that you define where your logic can set a flag (or a value) for what rows to keep would be a far better approach. Typically you could reach this by using the rank, group by or merge methods in combination with a filter rows. If you have a specific example, we could go over it and how you could be more specific with your logic to keep exactly the rows based on your explicit logic

tember · ‎10-13-2024

I don't have a specific example - I am just working through the DP-600 material and this answer didn't sit well with me - especially when compared with the MS documentation. I generally use the ROW_NUMBER() PARTITION BY technique (from miguel) or the MAX date (from OktayPamuk80) . Looking for an explanation why that is the "right answer"

frithjof_v · ‎10-13-2024

The Rank column (preview) function in Power Query Online is exciting!

Is it planned to support rank within partitions in the future?

And query folding? The indicator says "This step will be evaluated outside the data source".

Thanks!

OktayPamuk80 · ‎10-07-2024

Hi Tember,

Sorting and removing the duplicates is in general the correct answer. However, yes, it does not work all the time, and in my opinion, it is ambiguous, too as you stated.

In my projects, I never use this option, I rather go with inner joining with a duplicated version of the table having the max effective date grouped by customer id.

Regards,

Oktay

Practice Assessment Wrong? Power Query Sort and Remove Duplicates

Helpful resources

Join our Fabric User Panel

Fabric Monthly Update - May 2025

Fabric Community Update - June 2025

Become a Certified Power BI Data Analyst!

Practice Assessment Wrong? Power Query Sort and Remove Duplicates

Helpful resources

Join our Fabric User Panel

Fabric Monthly Update - May 2025

Fabric Community Update - June 2025