The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hi Forum,
I can see some similar challenges from members, but not quite working for me. I have [an extract of] a list below, and where there are duplicates, there is an additional attribute which gives a version number. I need to be able to idenitfy the latest version of a duplicate (I need to keep all versions). What I would like is a column which identifies "Latest Version" which will essentially flag the duplicate "reference" which has the greatest number in "version_label".
Here's hoping.
Solved! Go to Solution.
Hi @StevieBeales ,
Sample data
IDReference
IDReference
15291150 | 2000-87144-MP-3323-0001 |
14968792 | 2000-87144-MS-6999-0001 |
14968793 | 2000-87144-MS-7303-0001 |
15038296 | 2000-87144-MS-7303-0001 |
15038798 | 2000-87144-MS-7303-0001 |
15088876 | 2000-87144-MS-7303-0001 |
You can try the following code
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jcxLCoAwDEXRvWTcQD5tk7cIQXBYuv9t2KEoqNPL4Y5B2gyqTaiQiQhnaK287exuzqsozbJYRc+A3djBHcCD+YOFy/XWxNPQ/7BAfrPMjNfbPAE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Reference = _t]),
GroupedRows = Table.Group(Source, {"Reference"}, {{"MaxID", each List.Max([ID]), type number}}),
MergedTables = Table.NestedJoin(Source, {"Reference"}, GroupedRows, {"Reference"}, "GroupedRows", JoinKind.LeftOuter),
ExpandedTable = Table.ExpandTableColumn(MergedTables, "GroupedRows", {"MaxID"}),
AddedCustom = Table.AddColumn(ExpandedTable, "Label", each if [ID] = [MaxID] then "Latest Version" else null),
RemovedColumns = Table.RemoveColumns(AddedCustom,{"MaxID"})
in
RemovedColumns
Final output
Best regards,
Albert He
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly
Hi @StevieBeales ,
Sample data
IDReference
IDReference
15291150 | 2000-87144-MP-3323-0001 |
14968792 | 2000-87144-MS-6999-0001 |
14968793 | 2000-87144-MS-7303-0001 |
15038296 | 2000-87144-MS-7303-0001 |
15038798 | 2000-87144-MS-7303-0001 |
15088876 | 2000-87144-MS-7303-0001 |
You can try the following code
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jcxLCoAwDEXRvWTcQD5tk7cIQXBYuv9t2KEoqNPL4Y5B2gyqTaiQiQhnaK287exuzqsozbJYRc+A3djBHcCD+YOFy/XWxNPQ/7BAfrPMjNfbPAE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Reference = _t]),
GroupedRows = Table.Group(Source, {"Reference"}, {{"MaxID", each List.Max([ID]), type number}}),
MergedTables = Table.NestedJoin(Source, {"Reference"}, GroupedRows, {"Reference"}, "GroupedRows", JoinKind.LeftOuter),
ExpandedTable = Table.ExpandTableColumn(MergedTables, "GroupedRows", {"MaxID"}),
AddedCustom = Table.AddColumn(ExpandedTable, "Label", each if [ID] = [MaxID] then "Latest Version" else null),
RemovedColumns = Table.RemoveColumns(AddedCustom,{"MaxID"})
in
RemovedColumns
Final output
Best regards,
Albert He
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly
Can you provide your data in the format of table here. and if I am cach the question right, the result is the same as number of repitation of each record?
Attribute:reference become Reference
Attribute:VERSION_LABEL become ID
If that does not work show me you code and the error message.
Thank you. The code is as follows:
And strangely, this is the error code:
Weird. Are you sure there is no space after ID?
Don't think so actually. The formula assumes the step before is Source. change Source to the name of the previous step in you query.
If that does not work, give me a copy of the content of advanced editor.
Thanks PwerQueryKees
Annoylingly, the RSS feed for taking the information has been changed, and they have changed the datafields. I thought it would have been simple to change the code in your response to suit the new fields, but when I do this, I get a column not found error.
New table below, and still Reference is the duplicate area, but now ID is the new field to look at as being the highest number representing the latest version.
How about this:
The expression in text for easy copy paste:
let
allversions = Table.SelectRows(Source, (T) => T[#"Attribute:REFERENCE"] = [#"Attribute:REFERENCE"]),
max_version = List.Max(allversions[#"Attribute:VERSION_LABEL"]),
is_latest = max_version = [#"Attribute:VERSION_LABEL"]
in
is_latest