Don't miss your chance to take the Fabric Data Engineer (DP-700) exam on us!
Learn moreNext up in the FabCon + SQLCon recap series: The roadmap for Microsoft SQL and Maximizing Developer experiences in Fabric. All sessions are available on-demand after the live show. Register now
Hello !
I face an issue with one table, I will try to reproduce this issue as a fictive exercise below.
Here is a table :
| ID | NAME | SOURCE |
| 1 | PLANT A | MYSQL |
| 2 | PLANT B1 | MYSQL |
| 2 | PLANT B2 | MYSQL |
| 3 | PLANT C | MYSQL |
| 4 | PLANT D | WEB API |
| 2 | PLANT B3 | WEB API |
| 5 | PLANT E | WEB API |
As you can see there are duplicate IDs for the plants with ID = 2. The exercise consists in removing the ID duplicates from the second source WEB API only.
In this case, the PLANT B3 would be removed but the PLANT B1 and PLANT B2 would remain in the table.
How would you solve this case ?
One rule : I don't want to filter on column NAME to remove directly the PLANT B3, I need it to be automatic.
Thanks in advance.
Best regards,
CADO
Solved! Go to Solution.
Split the table into WEB API and not WEB API rows. In the former, remove rows with IDs that exist in the latter and then append them together.
Try pasting this into the Advanced Editor of a new blank query to see the steps:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUQrwcfQLUXAEsnwjgwN9lGJ1opWM4OJOhrgkjFAkjOESzijiJnBxFyAr3NVJwTHAE90oYzQpU7iUK7JMLAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, NAME = _t, SOURCE = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"NAME", type text}, {"SOURCE", type text}}),
#"Filtered NonWEB" = Table.SelectRows(#"Changed Type", each ([SOURCE] <> "WEB API")),
#"Filtered WEBAPI" = Table.SelectRows(#"Changed Type", each ([SOURCE] = "WEB API")),
#"Removed Duplicates" = Table.SelectRows(#"Filtered WEBAPI", each not List.Contains(#"Filtered NonWEB"[ID], [ID])),
#"Appended Query" = Table.Combine({#"Filtered NonWEB", #"Removed Duplicates"})
in
#"Appended Query"
For large datasets, it's probably more efficient to use an anti-join rather than List.Contains but the idea is the same.
Hi @AlexisOlson
Thank you for the trick it's working !
A little bit slow with 200 records so I will try the anti-join option.
Have a good day !
Cado
Split the table into WEB API and not WEB API rows. In the former, remove rows with IDs that exist in the latter and then append them together.
Try pasting this into the Advanced Editor of a new blank query to see the steps:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUQrwcfQLUXAEsnwjgwN9lGJ1opWM4OJOhrgkjFAkjOESzijiJnBxFyAr3NVJwTHAE90oYzQpU7iUK7JMLAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, NAME = _t, SOURCE = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"NAME", type text}, {"SOURCE", type text}}),
#"Filtered NonWEB" = Table.SelectRows(#"Changed Type", each ([SOURCE] <> "WEB API")),
#"Filtered WEBAPI" = Table.SelectRows(#"Changed Type", each ([SOURCE] = "WEB API")),
#"Removed Duplicates" = Table.SelectRows(#"Filtered WEBAPI", each not List.Contains(#"Filtered NonWEB"[ID], [ID])),
#"Appended Query" = Table.Combine({#"Filtered NonWEB", #"Removed Duplicates"})
in
#"Appended Query"
For large datasets, it's probably more efficient to use an anti-join rather than List.Contains but the idea is the same.
If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.
A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.
Share feedback directly with Fabric product managers, participate in targeted research studies and influence the Fabric roadmap.
| User | Count |
|---|---|
| 5 | |
| 4 | |
| 3 | |
| 3 | |
| 2 |
| User | Count |
|---|---|
| 8 | |
| 6 | |
| 6 | |
| 5 | |
| 5 |