The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi,
I need to remove rows in a table that have duplicate values and I need to keep last one. Power BI by default keeps first one regardless of sorting order. I tried to use buffered table in advanced editor, this didn't help. Any ideas how to achieve this?
Solved! Go to Solution.
In this scenario, if you want to keep the last records associated with each group column. You can build a calculated table aggregating values with max sort within each group.
I assume you a have a table like below:
Then you can create a calculated table like below:
Table = SUMMARIZE( Table3,Table3[Name], "Last Value", CALCULATE(SUM(Table3[Value]), FILTER(Table3,Table3[Sort]=MAX(Table3[Sort])) ) )
Regards,
The trick from RichieRich is a very elegant straight forward approach that provided a simple solution for my even more complex duplicates problem! Thanks a lot!
In this scenario, if you want to keep the last records associated with each group column. You can build a calculated table aggregating values with max sort within each group.
I assume you a have a table like below:
Then you can create a calculated table like below:
Table = SUMMARIZE( Table3,Table3[Name], "Last Value", CALCULATE(SUM(Table3[Value]), FILTER(Table3,Table3[Sort]=MAX(Table3[Sort])) ) )
Regards,
Hi! Can you help me add a custom column counting the number of entries? Outcome shall be the same as the SORT column used here. Thank you!
Hi,
Share some data, describe the question and show the expected result.
I need an additional custom column PQ Editor that COUNTS the number of OrderNumber entries..
Goal is to remove the duplicates and get the latest entry based on RunDate.
I have tried the Table.Buffer but it seems not to be applicable because it slows down the performance/process. I thought of this workaround instead...
Hi,
Does this M Query work faster? If not, then let me know. I will share a calculated column formula in DAX with you.
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"OrderNumber", Int64.Type}, {"RunDate", type date}}),
#"Sorted Rows" = Table.Sort(#"Changed Type",{{"OrderNumber", Order.Ascending}, {"RunDate", Order.Ascending}}),
Partition = Table.Group(#"Sorted Rows", {"OrderNumber"}, {{"Partition", each Table.AddIndexColumn(_, "Index",1,1), type table}}),
#"Expanded Partition" = Table.ExpandTableColumn(Partition, "Partition", {"RunDate", "Index"}, {"RunDate", "Index"})
in
#"Expanded Partition"
Trying to figure out things.. What if there are other columns to be displayed? The M query worked as well.. But maybe i'll use it on the smaller data.
How can we display other columns that were hidden in the Partition?
Hi,
I shared an alternative DAX calculated solution yesterday. Try that one.
I have tried both the M query and DAX calculated column.. Both solutions worked incredibly!
I will observe M query's performance on BIG data first.. If really not applicable, I'll use your DAX calculated column solution...
Thank you so much for taking time!!
Sure. Based on whichever works better on your live data, please mark that relevant response as Answer.
Hi @Ashish_Mathur, thank you but do you have another work around? The M query takes time to process big data.
I have a DAX calculated column alternative (not an M query alternative). Are you interested in that?
Okay with DAX, if it will perform faster than the M query and as long as it meets the desired outcome as well! Many thanks!
Hi,
Try this calculated column formula
=calculate(countrows(data),filter(data,data[OrderNumber]=earlier(data[OrderNumber])&&data[RunDate]<=earlier(data[RunDate])))
Hope this helps.
Thanks! It's not that elegant as I hoped it would be, but it works
There's a more elegant method here which I've just used with Table.Buffer. https://www.youtube.com/watch?v=rqDdnNxSgHQ
The crucial part in the video is after the 4 minute mark.....
The Table.Buffer command saves the sort prior to removing the duplicates ensuring you get the latest.
The Table.Buffer has to be added manually in the advanced editor
Here is an example I did where "Registration" is the group on which I'm removing duplicates and keeping the latest record (First of the date for each of the group, date descending)
let
Source = #"IVMS - As Posted View - Unstructured Master Query",
#"Sorted Rows" = Table.Sort(Source,{{"Calendar Year/Month.Calendar Year/Month Level 01", Order.Descending}}),
#"Buffer table"= Table.Buffer(#"Sorted Rows"),
#"Removed Duplicates" = Table.Distinct(#"Buffer table", {"Registration"})
in
#"Removed Duplicates"
This worked great! Thank you.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
114 | |
80 | |
78 | |
47 | |
39 |
User | Count |
---|---|
148 | |
115 | |
65 | |
64 | |
53 |