Power BI is turning 10, and we’re marking the occasion with a special community challenge. Use your creativity to tell a story, uncover trends, or highlight something unexpected.
Get startedJoin us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.
Hello,
I need help removing "duplicates" from a table. Let me ilustrate my problem with one example:
Name | Country | Company | Age | City |
John Doe | US | Microsoft | 30 | Palo Alto |
John Doe | US | Azure | 30 | Madrid |
John Doe | US | 30 | Ireland | |
Jane Doe | UK | 54 | London | |
Dan Kent | MX | Microsoft | 38 | Tokyo |
In this example we can see that the Name 'John Doe' appears multiple times with the same Country and Age. I would like to remove one duplicate but giving priority to Company 'Microsoft' for instance. I don't care about the City and rest of the fields if there're more. Thus, resulting in the following table:
Name | Country | Company | Age | City |
John Doe | US | Microsoft | 30 | Palo Alto |
Jane Doe | UK | 54 | London | |
Dan Kent | MX | Microsoft | 38 | Tokyo |
I have been trying to count how many times the combination of fields Name, Country and Age are duplicated in the same table so I can mark the rows whose number is higher than 1 and Company is different than 'Microsoft' and then filter them out. However, I don't manage to do it. I don't really mind doing it in Power Query or DAX as long as it works.
I really appreaciate the help. Thank you very much.
Solved! Go to Solution.
@oliverL , Power Query is competent in such tasks. You might want to try,
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Country = _t, Company = _t, Age = _t, City = _t]),
#"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]} otherwise _{0}}}),
#"Expanded ar" = let cols = List.RemoveItems(Table.ColumnNames(Source),{"Name"}) in Table.ExpandRecordColumn(#"Grouped Rows", "ar", cols, cols)
in
#"Expanded ar"
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
@oliverL , Power Query is competent in such tasks. You might want to try,
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Country = _t, Company = _t, Age = _t, City = _t]),
#"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]} otherwise _{0}}}),
#"Expanded ar" = let cols = List.RemoveItems(Table.ColumnNames(Source),{"Name"}) in Table.ExpandRecordColumn(#"Grouped Rows", "ar", cols, cols)
in
#"Expanded ar"
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
Thanks @CNENFRNL,
I have tried it and I think it works. I need to do some more testing since my real scanario is a bit more complex but I am sure I will be able to get to the solution with your help. I will mark it as solution. However, there are a couple of questions that I would like to ask you to see if you could be so glad to help me:
- In this section of the code:
#"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]}
Do you know why is not possible to replace the "=" in Company... with "<>"? In my real scenario I know what company I don't want. My bad for not making that clear in the original post.
-Also:
Json.Document(Binary.Decompress(Binary.FromText("i45W8srPyFNwyU9V0lEKDQYSvpnJRfnF+WklQLaxAZAISMzJV3DMKclXitXBVO5YVVqUClPqm5hSlJmCVZ17fn56DlyhZ1FqTmIeVGViXipMpTeySlMTIOGTn5eSnwdW6JKYp+CdmgdymG8EukstgERIfnZlPrpSHycg4V+UmAyxHaQuuERPwc9bKTYWAA==", BinaryEncoding.Base64)
How do you get the binary representation of a JSON file like that? When I load a JSON file I get the path to the container folder.
Thank you once again 😄
@oliverL , further with your questions,
1. #"Grouped Rows" = Table.Group(Source, {"Name"}, {{"ar", each try _{[Company="Microsoft"]}
table{[colName1=value1, colName2=value2, ...]} is a fixed pattern for filtering a table down to a unique matching result (one and only one result). It's can be considered as syntactic sugar for Table.SelectRows(), I think. Only equality ("=") applies here.
2. when a dataset is pasted or manually entered into Power Query, it's automatically converted by PQ into a Json representation.
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.
Check out the June 2025 Power BI update to learn about new features.
User | Count |
---|---|
10 | |
9 | |
8 | |
7 | |
6 |
User | Count |
---|---|
14 | |
13 | |
11 | |
9 | |
8 |