I am preparing for the DP-600 exam and one of the Practice Assessment questions is either wrong or very confusing.
Question:
You have a Fabric lakehouse named Lakehouse1 that contains a Dataflow Gen2 query. You have an Azure SQL database that contains a type 2 slowly changing dimension table named CustomerMaster. CustomerMaster contains a set of columns that includes Customer ID and EffectiveDate.
You plan to ingest CustomerMaster into Lakehouse1. The solution must only keep the latest record (unique) per Customer ID.
Which two applied steps should you use? Each correct answer presents part of the solution.
Answers:
Keep top rows
Max Customer ID
Remove duplicates on the Customer ID column
Remove duplicates on the CustomerMaster table
Sort on Customer ID, EffectiveDate
The selections marked as correct are "Sort on Customer ID, EffectiveDate" and "Remove duplicates on the Customer ID column", with this explanation: "Sorting CustomerID and EffectiveDate, and then removing duplicates on the Customer ID column is the only way to keep the correct latest row per customer ID. All other options will not correctly keep the latest customer row per effective date."
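For my own understanding, I translated those two applied steps into M. A rough sketch only; the connection details and step names are mine, not from the question:

```
let
    // Connect to the Azure SQL database (server/database names are placeholders)
    Source = Sql.Database("myserver.database.windows.net", "mydb"),
    CustomerMaster = Source{[Schema = "dbo", Item = "CustomerMaster"]}[Data],

    // Applied step 1: Sort on Customer ID, EffectiveDate (descending, so the latest row comes first per customer)
    Sorted = Table.Sort(CustomerMaster, {{"Customer ID", Order.Ascending}, {"EffectiveDate", Order.Descending}}),

    // Applied step 2: Remove duplicates on the Customer ID column
    Latest = Table.Distinct(Sorted, {"Customer ID"})
in
    Latest
```

As far as I can tell, that is exactly the pattern the documentation below says is not guaranteed to keep the first row per key.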
This answer confused me, so I clicked through to the explanation.
Common Authoring Issues in Power Query - Power Query | Microsoft Learn
It says:
You might assume that if you sort your data, any downstream operations preserve the sort order.
For example, if you sort a sales table so that each store's largest sale is shown first, you might expect that doing a "Remove duplicates" operation returns only the top sale for each store. And this operation might, in fact, appear to work. However, this behavior isn't guaranteed.
So it literally contradicts what the answer said... Am I misunderstanding something??
Hi Tember,
Sorting and then removing duplicates is, in general, the intended answer. However, yes, it does not work all the time, and in my opinion it is ambiguous too, as you stated.
In my projects I never use that option; I prefer to inner join the table with a duplicated version of it that has been grouped by Customer ID with the max EffectiveDate.
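A rough sketch of that pattern in M (the source reference and step names are just placeholders):

```
let
    // Reference to the CustomerMaster query ingested from Azure SQL (placeholder name)
    Source = CustomerMaster,

    // Duplicated version of the table, grouped by Customer ID with the max EffectiveDate
    LatestPerCustomer = Table.Group(
        Source,
        {"Customer ID"},
        {{"EffectiveDate", each List.Max([EffectiveDate])}}
    ),

    // Inner join back to the full table: only the rows carrying the max EffectiveDate per customer survive
    Joined = Table.NestedJoin(
        Source, {"Customer ID", "EffectiveDate"},
        LatestPerCustomer, {"Customer ID", "EffectiveDate"},
        "Latest", JoinKind.Inner
    ),

    // The nested join column was only needed for filtering
    Result = Table.RemoveColumns(Joined, {"Latest"})
in
    Result
```

Joining on both Customer ID and EffectiveDate is what keeps only the latest row per customer, without depending on row order at all.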
Regards,
Oktay
Yeah, the explanation sounds confusing. It could be a bug in the question/answer.
I would check out this blog: https://gorilla.bi/power-query/removing-duplicates/#advanced-techniques
Or write a native SQL query using ROW_NUMBER() OVER (PARTITION BY ...) together with the M function Value.NativeQuery() to maintain query folding if needed: https://blog.crossjoin.co.uk/2021/02/21/query-folding-on-sql-queries-in-power-query-using-value-nati...
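Something along these lines, sketched from memory (server, database, and schema names are placeholders; see the linked post for the folding caveats):

```
let
    Source = Sql.Database("myserver.database.windows.net", "mydb"),

    // Keep only the latest row per Customer ID on the SQL side
    Latest = Value.NativeQuery(
        Source,
        "SELECT *
         FROM (
             SELECT *,
                    ROW_NUMBER() OVER (PARTITION BY [Customer ID] ORDER BY EffectiveDate DESC) AS rn
             FROM dbo.CustomerMaster
         ) AS t
         WHERE t.rn = 1",
        null,
        [EnableFolding = true]  // the option discussed in the Crossjoin post
    )
in
    Latest
```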
I have the same question on my practice exam, but mine asks for three applied steps; however, there are only two correct answers according to the assessment.
Is the answer incorrect, and/or is there another correct choice among these options?
Hi,
The only additional step I can think of is to choose "Remove duplicates on the CustomerMaster table". Adding it before buffering the table ensures that the group by is applied to a smaller table.
But yes, two options should be sufficient.
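For what it's worth, when people do want the exam's sort-then-remove-duplicates pattern to behave deterministically, the usual trick is to buffer the sorted table first. A sketch that combines that with the whole-table dedupe above (query and step names are placeholders, not from the assessment):

```
let
    Source = CustomerMaster,  // placeholder reference to the source query

    // Optional: drop exact duplicate rows across the whole table first, so later steps see fewer rows
    NoExactDupes = Table.Distinct(Source),

    // Sort so the latest EffectiveDate comes first within each Customer ID
    Sorted = Table.Sort(NoExactDupes, {{"Customer ID", Order.Ascending}, {"EffectiveDate", Order.Descending}}),

    // Buffering pins the sorted order in memory, so the next Table.Distinct keeps the first (latest) row per key
    Buffered = Table.Buffer(Sorted),

    // Remove duplicates on the Customer ID column
    Latest = Table.Distinct(Buffered, {"Customer ID"})
in
    Latest
```

Note that Table.Buffer stops query folding, which is one reason the grouped join or the native ROW_NUMBER() query can be the better choice for large tables.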
Regards,
Oktay
"Keep top rows" and "remove duplicates" are semantically different
I haven't seen the question in the assessment, so I'll pass this feedback over to the team who owns it.
Nevertheless, doing a "remove duplicates" doesn't keep or take into consideration the ordinal position of the rows. This means that if you do a "remove duplicates" on a subset of the columns from your table, rather than the whole table, there's no guarantee that the result will respect the row order you saw in a previous step. That's why defining explicit logic, where you set a flag (or a value) on the rows you want to keep, is a far better approach. Typically you can get there by using the rank, group by, or merge methods in combination with a filter rows step. If you have a specific example, we can go over it and show how to be more explicit about exactly which rows to keep.
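As a generic example of what I mean (the query reference and step names here are placeholders, not from your dataflow), a merge plus an explicit flag plus a filter rows step could look like this:

```
let
    Source = CustomerMaster,  // placeholder reference to the source query

    // Max EffectiveDate per Customer ID, as a small lookup table
    MaxDates = Table.Group(Source, {"Customer ID"}, {{"MaxEffectiveDate", each List.Max([EffectiveDate])}}),

    // Merge the lookup back onto the full table
    Merged = Table.NestedJoin(Source, {"Customer ID"}, MaxDates, {"Customer ID"}, "Max", JoinKind.LeftOuter),
    WithMax = Table.ExpandTableColumn(Merged, "Max", {"MaxEffectiveDate"}),

    // Explicit flag for the rows you want to keep
    Flagged = Table.AddColumn(WithMax, "IsLatest", each [EffectiveDate] = [MaxEffectiveDate], type logical),

    // Filter rows on the flag, then drop the helper columns
    Filtered = Table.SelectRows(Flagged, each [IsLatest]),
    Result = Table.RemoveColumns(Filtered, {"MaxEffectiveDate", "IsLatest"})
in
    Result
```

Because the logic names the rows to keep explicitly, the result doesn't depend on whatever order the rows happened to be in after an earlier sort step.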
I don't have a specific example - I am just working through the DP-600 material and this answer didn't sit well with me, especially when compared with the MS documentation. I generally use the ROW_NUMBER() PARTITION BY technique (from miguel) or the MAX date approach (from OktayPamuk80). I'm looking for an explanation of why that is the "right answer".
The Rank column (preview) function in Power Query Online is exciting!
Is it planned to support rank within partitions in the future?
And query folding? The indicator says "This step will be evaluated outside the data source".
Thanks!