Join us for an expert-led overview of the tools and concepts you'll need to pass exam PL-300. The first session starts on June 11th. See you there!
Get registeredJoin us at FabCon Vienna from September 15-18, 2025, for the ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM. Get registered
I am preparing for the Dp-600 Exam and one of the Practice Assessment questions is either wrong or very confusing.
Question:
You have a Fabric lakehouse named Lakehouse1 that contains a Dataflow Gen2 query. You have an Azure SQL database that contains a type 2 slowly changing dimension database table named CustomerMaster. CustomerMaster contains the following columns:
You plan to ingest CustomerMaster into Lakehouse1. The solution must only keep the latest record (unique) per Customer ID.
Which two applied steps should you use? Each correct answer presents part of the solution.
Answers:
Keep top rows
Max Customer ID
Remove duplicates on the Customer ID column
Remove duplicates on the CustomerMaster table
Sort on Customer ID, EffectiveDate
The bolded selections are the correct answer with this explanation: "Sorting CustomerID and EffectiveDate, and then removing duplicates on the Customer ID column is the only way to keep the correct latest row per customer ID. All other options will not correctly keep the latest customer row per effective date."
This answer confused me, so I clicked through to the explanation.
Common Authoring Issues in Power Query - Power Query | Microsoft Learn
It says:
You might assume that if you sort your data, any downstream operations preserve the sort order.
For example, if you sort a sales table so that each store's largest sale is shown first, you might expect that doing a "Remove duplicates" operation returns only the top sale for each store. And this operation might, in fact, appear to work. However, this behavior isn't guaranteed.
So it literally contradicts what the answer said... Am I misunderstanding something??
Solved! Go to Solution.
Hi Tember,
Sorting and removing the duplicates is in general the correct answer. However, yes, it does not work all the time, and in my opinion, it is ambiguous, too as you stated.
In my projects, I never use this option, I rather go with inner joining with a duplicated version of the table having the max effective date grouped by customer id.
Regards,
Oktay
Yeah, the explanation sounds confusing. It could be a bug in the question/answer.
I would check out this blog: https://gorilla.bi/power-query/removing-duplicates/#advanced-techniques
Or write a native SQL query with ROW_NUMBER() PARTITION BY and the M function Value.NativeQuery() to maintain query folding if needed https://blog.crossjoin.co.uk/2021/02/21/query-folding-on-sql-queries-in-power-query-using-value-nati...
Remove duplicates on the Customer ID column
Sort on Customer ID, EffectiveDate
Yeah, the explanation sounds confusing. It could be a bug in the question/answer.
I would check out this blog: https://gorilla.bi/power-query/removing-duplicates/#advanced-techniques
Or write a native SQL query with ROW_NUMBER() PARTITION BY and the M function Value.NativeQuery() to maintain query folding if needed https://blog.crossjoin.co.uk/2021/02/21/query-folding-on-sql-queries-in-power-query-using-value-nati...
"Keep top rows" and "remove duplicates" are semantically different
I haven't seen the question in the assessment, so I'll pass this feedback over to the team who owns it.
Nevertheless, doing a "remove duplicates" doesn't keep or take into consideration the ordinal position of the rows. This means that if you are doing a "remove duplicates" on a subset of the columns from your table and not the whole table, then there's no guarantee that the result of the operation will be based on the ordinal position of the rows like you saw them in a previous step. That's where the more explicit definition of a explicit logic that you define where your logic can set a flag (or a value) for what rows to keep would be a far better approach. Typically you could reach this by using the rank, group by or merge methods in combination with a filter rows. If you have a specific example, we could go over it and how you could be more specific with your logic to keep exactly the rows based on your explicit logic
I don't have a specific example - I am just working through the DP-600 material and this answer didn't sit well with me - especially when compared with the MS documentation. I generally use the ROW_NUMBER() PARTITION BY technique (from miguel) or the MAX date (from OktayPamuk80) . Looking for an explanation why that is the "right answer"
The Rank column (preview) function in Power Query Online is exciting!
Is it planned to support rank within partitions in the future?
And query folding? The indicator says "This step will be evaluated outside the data source".
Thanks!
Hi Tember,
Sorting and removing the duplicates is in general the correct answer. However, yes, it does not work all the time, and in my opinion, it is ambiguous, too as you stated.
In my projects, I never use this option, I rather go with inner joining with a duplicated version of the table having the max effective date grouped by customer id.
Regards,
Oktay
This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.
User | Count |
---|---|
2 | |
1 | |
1 | |
1 | |
1 |
User | Count |
---|---|
4 | |
3 | |
1 | |
1 | |
1 |