Power BI is turning 10! Tune in for a special live episode on July 24 with behind-the-scenes stories, product evolution highlights, and a sneak peek at what’s in store for the future.
Save the dateEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hi,
I need help with something I'm trying to achieve, which I'm pretty sure isn't too complicated, but as I am a beginner with PowerQuery and M code, I am clearly out of my depth...
I have a table identifying people with, sometimes multiple rows for one same person. I would like to remove duplicates (keeping the first row in which the person appeared) but, before that, create a custom column that would state 'Yes' when there were several rows for that one person.
Which would mean comparing three columns (Last name, First name, Maiden name) to check whether they were the same, and if there are several occurrences, writing 'Yes' in the custom column, and only then removing duplicates, keeping the first row (which implies forcing a buffering of the whole table, or so I've read, else PowerQuery will just delete whichever row it finds while loading the data, apparently)...
It would look something like this:
Last name | First name | Maiden name | Multi |
LName1 | FName1 | MName1 | Yes |
LName 2 | FName2 | MName2 | No |
How could this be achieved in PowerQuery? I would then like to include that query in an Excel macro so that I can automate some routine stuff...
Thank you very much for any help you may provide...
Best,
Ari. ;o)
Thank you guys a million for the different ways to go about this problem!
I have indeed been able to get rid of all duplicates and keep one line for each person, with "Yes" when there were several lines, and "No" when there was only one. My problem is now the following: I "shortened" my table for the sake of example, but it comprises many other columns which are of importance, and which I need to keep.
It seems that, by grouping the table by Last name, First name, and Maiden name (to ensure I get one and the same person), I am effectively disregarding all the other columns in the table to implement my "Multi" column, but also actually getting rid of them (the other columns are, henceforth, nowhere to be found in the resulting table...)
Is there a way to do the same thing as what we just did, but to the whole table (which would mean several "Yes" for one same person if they were featured in several rows), and only then removing duplicates? Or transferring the column obtained through one of your methods to the original table so that I do not lose the information contained in the other columns?
To be clear, my table features (at least) the following columns:
Last name | First name | Maiden name | Date of birth | Email adress | Beginning of contract | Type of contract | School ID number | School name | School ZIP |
I need the 'Multi' column to be added to these, and then the duplicates to go away, so I can keep the info in the first (and most important) row.
Again, thank you guys so much for the answers you have already provided, and thanks in advance for those you will (hopefully) provide!
Best,
Ari ;o)
Hi @AriBouaniche ,
Thanks for Omid_Motamedise and AlienSx reply.
Here is another way
let
Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
GroupedRows = Table.Group(Source, {"Last name", "First name", "Maiden name"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
AddCustom = Table.AddColumn(GroupedRows, "Multi", each if [Count] > 1 then "YES" else "NO"),
RemoveCountColumn = Table.RemoveColumns(AddCustom, {"Count"})
in
RemoveCountColumn
Output
Press Alt+F11 in Excel to open the VBA editor.
In the VBA editor, go to insert > Module to create a new module.
Copy and paste the following VBA code into the module. Replace the pq.Nmae as your query name
Sub RunPowerQuery()
Dim wb As Workbook
Dim pq As WorkbookQuery
' Set the workbook
Set wb = ThisWorkbook
' Loop through all queries in the workbook
For Each pq In wb.Queries
' Check if the query name matches your query
If pq.Name = "Table4" Then
' Refresh the query
pq.Refresh
End If
Next pq
End Sub
Press Alt+F8 run the query
Final output
Best regards,
Albert He
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly
It is easy, just select the columns (First, Last, and Midd name) right click on them and pick group by command, like the below image, pick count roes as operation and press ok, in the resulted table, all the duplicated are remove and for the rows which have value 1 on the new column ,they were not duplicated but if value other than 1 recorded on the newe column, theat row were duplicated
see the bellow code
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8vFLzE01VNJRcoMxfGGMyNRipVgd4pUoGMHUGMHUgBh++UqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Last name" = _t, #"First name" = _t, #"Maiden name" = _t, Multi = _t]),
#"Grouped Rows" = Table.Group(Source, {"Last name", "First name", "Maiden name"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
#"Grouped Rows"
Table.Group(
Source,
Table.ColumnNames(Source),
{"Multi", (x) => if Table.RowCount(x) > 1 then "YES" else "NO"}
)