Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hi all!
What would be the smart way to remove duplicates here?
I need the first value from the column b and last value from column c. There can be even more than two rows for the same ID.
Thanks in advance for any ideas,
Jenni
Solved! Go to Solution.
For the sake of easy understanding of the code, I have written the power query step by step. But you can combine several steps and make the query much denser.
let Source = Table.FromRecords( { [id = 1, date = #date(2018,2,3), b = 513, c = 423], [id = 2, date = #date(2018,5,1), b = 534, c = 432], [id = 2, date = #date(2019,1,18), b = 345, c = 5434], [id = 3, date = #date(2019,2,2), b = 354, c = 4534], [id = 4, date = #date(2019,4,6), b = 543, c = 544], [id = 5, date = #date(2019,5,5), b = 654, c = 432] } ), ChangeColumnTypes = Table.TransformColumnTypes(Source, { {"date", type date}, {"id", Int64.Type}, {"b", Int64.Type}, {"c", Int64.Type} } ), TempTable = ChangeColumnTypes, ResultTable = ChangeColumnTypes, MinDate = (x as table, y as number ) as date => List.Min( Table.Column( Table.SelectRows(x,each Record.Field(_,"id")=y), "date") ), MaxDate = (x as table, y as number ) as date => List.Max( Table.Column( Table.SelectRows(x,each Record.Field(_,"id")=y), "date") ), AddIsFirstDateColumn = Table.AddColumn(ResultTable, "IsFirstDate", each Record.Field(_,"date") = MinDate(TempTable,Record.Field(_,"id")) ), AddIsLastDateColumn =Table.AddColumn(AddIsFirstDateColumn, "IsLastDate", each Record.Field(_,"date") = MaxDate(TempTable,Record.Field(_,"id")) ), getFirstValue = (x as table,y as number) as number => List.Min( Table.Column(Table.SelectRows( x, each Record.Field(_,"id")=y and Record.Field(_,"IsFirstDate") = true ),"b") ), FirstValue = Table.AddColumn( AddIsLastDateColumn,"FirstValue", each getFirstValue(AddIsLastDateColumn,Record.Field(_,"id")) ), getLastValue = (x as table,y as number) as number => List.Min( Table.Column(Table.SelectRows( x, each Record.Field(_,"id")=y and Record.Field(_,"IsLastDate") = true ),"c") ), LastValue = Table.AddColumn( FirstValue,"LastValue", each getLastValue(AddIsLastDateColumn,Record.Field(_,"id")) ), RemoveDuplicates = Table.SelectRows(LastValue,each [IsLastDate] = true), RemoveColumns = Table.RemoveColumns(RemoveDuplicates,{"IsLastDate","IsFirstDate","b","c"}) in RemoveColumns
Two inputs required.
1) If there are two entries with the same date and same id, what should be the result?
2) If there is more than one entry for an id with different dates, after removing the duplicate rows, what is the date you want to show in the date field of the result table? First date or last date?
1) If there are two entries with the same date and same id, what should be the result?
-> duplicate (the latter one on the list) can be removed. This is very rare possibility and thus it really doesn't make a huge difference whichever it is.
2) If there is more than one entry for an id with different dates, after removing the duplicate rows, what is the date you want to show in the date field of the result table? First date or last date?
-> Last date. Date has not got too much value after this step.
For the sake of easy understanding of the code, I have written the power query step by step. But you can combine several steps and make the query much denser.
let Source = Table.FromRecords( { [id = 1, date = #date(2018,2,3), b = 513, c = 423], [id = 2, date = #date(2018,5,1), b = 534, c = 432], [id = 2, date = #date(2019,1,18), b = 345, c = 5434], [id = 3, date = #date(2019,2,2), b = 354, c = 4534], [id = 4, date = #date(2019,4,6), b = 543, c = 544], [id = 5, date = #date(2019,5,5), b = 654, c = 432] } ), ChangeColumnTypes = Table.TransformColumnTypes(Source, { {"date", type date}, {"id", Int64.Type}, {"b", Int64.Type}, {"c", Int64.Type} } ), TempTable = ChangeColumnTypes, ResultTable = ChangeColumnTypes, MinDate = (x as table, y as number ) as date => List.Min( Table.Column( Table.SelectRows(x,each Record.Field(_,"id")=y), "date") ), MaxDate = (x as table, y as number ) as date => List.Max( Table.Column( Table.SelectRows(x,each Record.Field(_,"id")=y), "date") ), AddIsFirstDateColumn = Table.AddColumn(ResultTable, "IsFirstDate", each Record.Field(_,"date") = MinDate(TempTable,Record.Field(_,"id")) ), AddIsLastDateColumn =Table.AddColumn(AddIsFirstDateColumn, "IsLastDate", each Record.Field(_,"date") = MaxDate(TempTable,Record.Field(_,"id")) ), getFirstValue = (x as table,y as number) as number => List.Min( Table.Column(Table.SelectRows( x, each Record.Field(_,"id")=y and Record.Field(_,"IsFirstDate") = true ),"b") ), FirstValue = Table.AddColumn( AddIsLastDateColumn,"FirstValue", each getFirstValue(AddIsLastDateColumn,Record.Field(_,"id")) ), getLastValue = (x as table,y as number) as number => List.Min( Table.Column(Table.SelectRows( x, each Record.Field(_,"id")=y and Record.Field(_,"IsLastDate") = true ),"c") ), LastValue = Table.AddColumn( FirstValue,"LastValue", each getLastValue(AddIsLastDateColumn,Record.Field(_,"id")) ), RemoveDuplicates = Table.SelectRows(LastValue,each [IsLastDate] = true), RemoveColumns = Table.RemoveColumns(RemoveDuplicates,{"IsLastDate","IsFirstDate","b","c"}) in RemoveColumns
Thank you so much @Anonymous , this is working well!
Two questions
- If there are duplicate dates for certain ID, duplicate is not yet removed. Should I just remove duplicates in the end? As told before, there is no way to see what is the last change in these cases as there are no time stamps.
- After all my real ID:s contain a hyphen (-) and thus they are not number values. Do I just create an ID without the hyphen or is there a way to make this code work in a situation where ID if a text field?
Jenni
1) There is an ambiguity if id-date combinations repeat. There is no way a system can determine which one to keep and which one to remove. Maybe you can use some functions like List.First() etc.. to randomly pick the first record and keep it.
2) IF id is text, change the data type of the following line as text.
{"id", Int64.Type},
I think it should work as we are not using the id field for any arithmetic calculations. We are using it only for matching. So it really does not matter if id is number or text.
1) Yes this is the way to go, I agree.
2) I already did tchange the type you described this but keep receiving error in the "RemoveDuplicates" -step.
Expression.Error: We cannot convert the value "9-432235" to type Number. Details: Value= 9-432235 Type=Type
Jenni
Refer to the code below...
I have modified certain codes and identified as //MODIFIED CODE
let Source = Table.FromRecords( { [id = 1, date = #date(2018,2,3), b = 513, c = 423], [id = 2, date = #date(2018,5,1), b = 534, c = 432], [id = 2, date = #date(2019,1,18), b = 345, c = 5434], [id = 3, date = #date(2019,2,2), b = 354, c = 4534], [id = 4, date = #date(2019,4,6), b = 543, c = 544], [id = 5, date = #date(2019,5,5), b = 654, c = 432] } ), ChangeColumnTypes = Table.TransformColumnTypes(Source, { {"date", type date}, // {"id", Int64.Type}, {"id", type text}, //MODIFIED CODE {"b", Int64.Type}, {"c", Int64.Type} } ), TempTable = ChangeColumnTypes, ResultTable = ChangeColumnTypes, //MinDate = (x as table, y as number ) as date => List.Min( MinDate = (x as table, y as text ) as date => List.Min( // MODIFIED CODE Table.Column( Table.SelectRows(x,each Record.Field(_,"id")=y), "date") ), //MaxDate = (x as table, y as number ) as date => List.Max( MaxDate = (x as table, y as text ) as date => List.Max( //MODIFIED CODE Table.Column( Table.SelectRows(x,each Record.Field(_,"id")=y), "date") ), AddIsFirstDateColumn = Table.AddColumn(ResultTable, "IsFirstDate", each Record.Field(_,"date") = MinDate(TempTable,Record.Field(_,"id")) ), AddIsLastDateColumn =Table.AddColumn(AddIsFirstDateColumn, "IsLastDate", each Record.Field(_,"date") = MaxDate(TempTable,Record.Field(_,"id")) ), //getFirstValue = (x as table,y as number) as number => List.Min( getFirstValue = (x as table,y as text) as number => List.Min( //MODIFIED CODE Table.Column(Table.SelectRows( x, each Record.Field(_,"id")=y and Record.Field(_,"IsFirstDate") = true ),"b") ), FirstValue = Table.AddColumn( AddIsLastDateColumn,"FirstValue", each getFirstValue(AddIsLastDateColumn,Record.Field(_,"id")) ), //getLastValue = (x as table,y as number) as number => List.Min( getLastValue = (x as table,y as text) as number => List.Min( //MODIFIED CODE Table.Column(Table.SelectRows( x, each Record.Field(_,"id")=y and Record.Field(_,"IsLastDate") = true ),"c") ), LastValue = Table.AddColumn( FirstValue,"LastValue", each getLastValue(AddIsLastDateColumn,Record.Field(_,"id")) ), RemoveDuplicates = Table.SelectRows(LastValue,each [IsLastDate] = true), RemoveColumns = Table.RemoveColumns(RemoveDuplicates,{"IsLastDate","IsFirstDate","b","c"}) in RemoveColumns
Great, thanks @Anonymous !
Jenni
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
109 | |
102 | |
84 | |
79 | |
68 |
User | Count |
---|---|
120 | |
110 | |
94 | |
81 | |
77 |