I am building a dataset/model from an Excel workbook prepared by others and split into multiple worksheets. The worksheets contain lists of providers and their counts of (1) established visits and (2) new visits, plus the same lists restricted to providers with fewer than 26 (3) established visits or (4) new visits. I am also getting the provider IDs from yet another Excel workbook.
Anywho, in combining the queries so that I would have all established visits in one query and all new visits in another, each with the provider ID (i.e. NPI), two providers sharing the same name caused a problem: I have duplicate values in the NPI column, but the corresponding rows are not duplicates. See image:
So, I tried setting up a custom column to create duplicates that I could then remove, like so:
each if ([NPI]="1760810293" and [taxonomy_key]="14" and [group_key]="2") then "1871035881"
else if ([NPI]="1871035881" and [taxonomy_key]="54" and [group_key]="95") then "1760810293"
else [NPI]
As you can see in the image above, that didn't work.
Essentially, I need to get rid of rows 1 and 3. Note that the above image is filtered to show the two NPIs of interest. There are a total of 697 rows; there should be 695. Help! What should I do?
@cathoms
I am not sure whether the order and pattern of the actual records match what you have shown in the sample. I merged all the columns except NPI and applied "Remove Duplicates", which keeps the first occurrence and removes the rest. To check, paste the following code into a blank query in the Advanced Editor.
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjCxNFbSUTI0ARJGQGxibAYkLZRidaKVTC0sDAnImYLkLE1BqgwtQErAklBDsUjGAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [A = _t, B = _t, C = _t, D = _t, E = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"A", Int64.Type}, {"B", Int64.Type}, {"C", Int64.Type}, {"D", Int64.Type}, {"E", Int64.Type}}),
    #"Inserted Merged Column" = Table.AddColumn(#"Changed Type", "Merged", each Text.Combine({Text.From([B], "en-US"), Text.From([C], "en-US"), Text.From([D], "en-US"), Text.From([E], "en-US")}, "|"), type text),
    #"Removed Duplicates" = Table.Distinct(#"Inserted Merged Column", {"Merged"})
in
    #"Removed Duplicates"
So basically you are creating a new column based on all the other columns, then removing duplicates based on that new column. What you are showing gets rid of the wrong rows, though. If "Remove Duplicates" keeps the first occurrence, would it work if I sort descending first?
Also, could I just remove duplicate rows based on all the other columns, without creating a new merged column?
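For what it's worth, here is a minimal M sketch of both ideas, using made-up sample rows and assumed column names (ProviderName, Visits are placeholders, not the real headers). Table.Distinct accepts a list of key columns directly, so no merged helper column is needed; and because Table.Distinct keeps the first row it encounters per key, a preceding sort only reliably controls which duplicate survives if the sorted table is buffered first with Table.Buffer:

let
    // Made-up rows standing in for the real data (column names are assumptions)
    Source = Table.FromRecords({
        [NPI = "1760810293", ProviderName = "Jane Smith", Visits = 5],
        [NPI = "1871035881", ProviderName = "Jane Smith", Visits = 5],
        [NPI = "1871035881", ProviderName = "Jane Smith", Visits = 8]
    }),
    // Sort descending so the row you want to keep comes first;
    // Table.Buffer pins the sort order so Table.Distinct honors it
    Sorted = Table.Buffer(Table.Sort(Source, {{"Visits", Order.Descending}})),
    // Dedup on the relevant columns directly -- no merged column required
    Deduped = Table.Distinct(Sorted, {"ProviderName", "Visits"})
in
    Deduped

Paste this into a blank query via the Advanced Editor to experiment; swap in your actual column names before applying it to the real table.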