Microsoft is giving away 50,000 FREE Microsoft Certification exam vouchers. Get Fabric certified for FREE! Learn more
Product Name | Unique ID | Year | Issue |
Product A | 100001 | 2020 | Color wrong |
Product A | 111111 | 2021 | Color wrong |
Productt A | 111111 | 2021 | Fabric wrong |
Product A | 111111 | 2021 | Size wrong |
Product B | 222222 | 2021 | Tags wrong |
Each product in a given year has a unique ID (EG 100001 is shorthand for the model of Product A that was out in 2020). This means that every Unique ID only has one correctly-matching Product Name. However, sometimes people make typos.
I'm trying to find all the rows where the Unique IDs are the same but the Product Name is different, by adding a column that indicates this.
I've been able to do other kinds of data checks using grouping so I've tried that. But I'd have to find out how many distinct values there are in each of the generated tables and make a column that says if it's >1 then pull that out into the grouped data then put all the rows back and I can't figure out how to count things inside the tables.
I tried answers to other people's conundrums on the interwebs that seemed like they were the same problem but none of the solutions worked.
Product Name | Unique ID | Year | Issue | Product Names for Same Unique ID >1? |
Product A | 100001 | 2020 | Color wrong | Ok |
Product A | 111111 | 2021 | Color wrong | Name problem |
Productt A | 111111 | 2021 | Fabric wrong | Name problem |
Product A | 111111 | 2021 | Size wrong | Name problem |
Product B | 222222 | 2021 | Tags wrong | Ok |
Solved! Go to Solution.
Thanks for testing! Understood 🙂
For a large table, a self-join with Table.AggregateTableColumn may well not perform the best.
After some testing, a better method would be to add an intermediate step that groups by Unique ID and computes count of distinct Product Name converted to the text flag, then join this to the original table.
Below are a couple of variations of this method.
I ran some tests using a moderately large SQL Server source table, and performance seemed acceptable with query folding maintained.
let
Source = #table(
type table [Product Name = text, Unique ID = Int64.Type, Year = Int64.Type, Issue = text],
{
{"Product A", 100001, 2020, "Color wrong"},
{"Product A", 111111, 2021, "Color wrong"},
{"Productt A", 111111, 2021, "Fabric wrong"},
{"Product A", 111111, 2021, "Size wrong"},
{"Product B", 222222, 2021, "Tags wrong"}
}
),
NameCount = Table.Group(
Source,
{"Unique ID"},
{{"Product Names for Same Unique ID >1?", each List.Count(List.Distinct([Product Name])), Int64.Type}}
),
NameFlag = Table.TransformColumns(
NameCount,
{"Product Names for Same Unique ID >1?", each if _ = 1 then "Ok" else "Name problem", type text}
),
#"Join NameFlag" = Table.NestedJoin(Source, "Unique ID", NameFlag, "Unique ID", "NameFlag", JoinKind.LeftOuter),
#"Expanded NameFlag" = Table.ExpandTableColumn(
#"Join NameFlag",
"NameFlag",
{"Product Names for Same Unique ID >1?"},
{"Product Names for Same Unique ID >1?"}
)
in
#"Expanded NameFlag"
let
Source = #table(
type table [Product Name = text, Unique ID = Int64.Type, Year = Int64.Type, Issue = text],
{
{"Product A", 100001, 2020, "Color wrong"},
{"Product A", 111111, 2021, "Color wrong"},
{"Productt A", 111111, 2021, "Fabric wrong"},
{"Product A", 111111, 2021, "Size wrong"},
{"Product B", 222222, 2021, "Tags wrong"}
}
),
NameCount = Table.Group(
Source,
{"Unique ID"},
{{"Product Names for Same Unique ID >1?", each List.Count(List.Distinct([Product Name])), Int64.Type}}
),
NameFlag = Table.TransformColumns(
NameCount,
{"Product Names for Same Unique ID >1?", each if _ = 1 then "Ok" else "Name problem", type text}
),
#"Join NameFlag" = Table.Join(Source, "Unique ID", NameFlag, "Unique ID")
in
#"Join NameFlag"
Does this method work better with your dataset?
Hi @petrawiggin
You can use Table.NestedJoin to join the table to itself on the Unique ID column, then aggregate the nested table to count distinct Names and return the appropriate text value.
Here's a standalone query to illustrate:
let
Source = #table(
type table [Product Name = text, Unique ID = Int64.Type, Year = Int64.Type, Issue = text],
{
{"Product A", 100001, 2020, "Color wrong"},
{"Product A", 111111, 2021, "Color wrong"},
{"Productt A", 111111, 2021, "Fabric wrong"},
{"Product A", 111111, 2021, "Size wrong"},
{"Product B", 222222, 2021, "Tags wrong"}
}
),
#"Self Join on Unique ID" = Table.NestedJoin(
Source,
{"Unique ID"},
Source,
{"Unique ID"},
"SelfJoinOnUniqueID",
JoinKind.LeftOuter
),
#"Add Indicator" = Table.AggregateTableColumn(
#"Self Join on Unique ID",
"SelfJoinOnUniqueID",
{
{
"Product Name",
each if List.Count(List.Distinct(_)) > 1 then "Name problem" else "Ok",
"Product Names for Same Unique ID >1",
type text
}
}
)
in
#"Add Indicator"
Would this or something similar work for you?
This technically did work, thank you. However, my data set is very large and this merge took an unbelieveably, impractically long time to run. Is there anything that would make it more efficient?
Thanks for testing! Understood 🙂
For a large table, a self-join with Table.AggregateTableColumn may well not perform the best.
After some testing, a better method would be to add an intermediate step that groups by Unique ID and computes count of distinct Product Name converted to the text flag, then join this to the original table.
Below are a couple of variations of this method.
I ran some tests using a moderately large SQL Server source table, and performance seemed acceptable with query folding maintained.
let
Source = #table(
type table [Product Name = text, Unique ID = Int64.Type, Year = Int64.Type, Issue = text],
{
{"Product A", 100001, 2020, "Color wrong"},
{"Product A", 111111, 2021, "Color wrong"},
{"Productt A", 111111, 2021, "Fabric wrong"},
{"Product A", 111111, 2021, "Size wrong"},
{"Product B", 222222, 2021, "Tags wrong"}
}
),
NameCount = Table.Group(
Source,
{"Unique ID"},
{{"Product Names for Same Unique ID >1?", each List.Count(List.Distinct([Product Name])), Int64.Type}}
),
NameFlag = Table.TransformColumns(
NameCount,
{"Product Names for Same Unique ID >1?", each if _ = 1 then "Ok" else "Name problem", type text}
),
#"Join NameFlag" = Table.NestedJoin(Source, "Unique ID", NameFlag, "Unique ID", "NameFlag", JoinKind.LeftOuter),
#"Expanded NameFlag" = Table.ExpandTableColumn(
#"Join NameFlag",
"NameFlag",
{"Product Names for Same Unique ID >1?"},
{"Product Names for Same Unique ID >1?"}
)
in
#"Expanded NameFlag"
let
Source = #table(
type table [Product Name = text, Unique ID = Int64.Type, Year = Int64.Type, Issue = text],
{
{"Product A", 100001, 2020, "Color wrong"},
{"Product A", 111111, 2021, "Color wrong"},
{"Productt A", 111111, 2021, "Fabric wrong"},
{"Product A", 111111, 2021, "Size wrong"},
{"Product B", 222222, 2021, "Tags wrong"}
}
),
NameCount = Table.Group(
Source,
{"Unique ID"},
{{"Product Names for Same Unique ID >1?", each List.Count(List.Distinct([Product Name])), Int64.Type}}
),
NameFlag = Table.TransformColumns(
NameCount,
{"Product Names for Same Unique ID >1?", each if _ = 1 then "Ok" else "Name problem", type text}
),
#"Join NameFlag" = Table.Join(Source, "Unique ID", NameFlag, "Unique ID")
in
#"Join NameFlag"
Does this method work better with your dataset?
This did work much more quickly, thank you. It took a bit over 40 minutes to initially run (MUCH faster) but seems to be quicker to refresh.
Check out the April 2025 Power BI update to learn about new features.
Explore and share Fabric Notebooks to boost Power BI insights in the new community notebooks gallery.
User | Count |
---|---|
20 | |
11 | |
10 | |
8 | |
7 |