Data Sample

petrawiggin · ‎03-19-2025

Data Sample

Product Name	Unique ID	Year	Issue
Product A	100001	2020	Color wrong
Product A	111111	2021	Color wrong
Productt A	111111	2021	Fabric wrong
Product A	111111	2021	Size wrong
Product B	222222	2021	Tags wrong

Background

Each product in a given year has a unique ID (EG 100001 is shorthand for the model of Product A that was out in 2020). This means that every Unique ID only has one correctly-matching Product Name. However, sometimes people make typos.

Problem Statement

I'm trying to find all the rows where the Unique IDs are the same but the Product Name is different, by adding a column that indicates this.

I've been able to do other kinds of data checks using grouping so I've tried that. But I'd have to find out how many distinct values there are in each of the generated tables and make a column that says if it's >1 then pull that out into the grouped data then put all the rows back and I can't figure out how to count things inside the tables.
I tried answers to other people's conundrums on the interwebs that seemed like they were the same problem but none of the solutions worked.

Hoped-for result

Product Name	Unique ID	Year	Issue	Product Names for Same Unique ID >1?
Product A	100001	2020	Color wrong	Ok
Product A	111111	2021	Color wrong	Name problem
Productt A	111111	2021	Fabric wrong	Name problem
Product A	111111	2021	Size wrong	Name problem
Product B	222222	2021	Tags wrong	Ok

OwenAuger · ‎03-19-2025

Thanks for testing! Understood 🙂

For a large table, a self-join with Table.AggregateTableColumn may well not perform the best.

After some testing, a better method would be to add an intermediate step that groups by Unique ID and computes count of distinct Product Name converted to the text flag, then join this to the original table.

Below are a couple of variations of this method.

I ran some tests using a moderately large SQL Server source table, and performance seemed acceptable with query folding maintained.

let
  Source = #table(
    type table [Product Name = text, Unique ID = Int64.Type, Year = Int64.Type, Issue = text],
    {
      {"Product A", 100001, 2020, "Color wrong"},
      {"Product A", 111111, 2021, "Color wrong"},
      {"Productt A", 111111, 2021, "Fabric wrong"},
      {"Product A", 111111, 2021, "Size wrong"},
      {"Product B", 222222, 2021, "Tags wrong"}
    }
  ),
  NameCount = Table.Group(
    Source,
    {"Unique ID"},
    {{"Product Names for Same Unique ID >1?", each List.Count(List.Distinct([Product Name])), Int64.Type}}
  ),
  NameFlag = Table.TransformColumns(
    NameCount,
    {"Product Names for Same Unique ID >1?", each if _ = 1 then "Ok" else "Name problem", type text}
  ),
  #"Join NameFlag" = Table.NestedJoin(Source, "Unique ID", NameFlag, "Unique ID", "NameFlag", JoinKind.LeftOuter),
  #"Expanded NameFlag" = Table.ExpandTableColumn(
    #"Join NameFlag",
    "NameFlag",
    {"Product Names for Same Unique ID >1?"},
    {"Product Names for Same Unique ID >1?"}
  )
in
  #"Expanded NameFlag"

let
  Source = #table(
    type table [Product Name = text, Unique ID = Int64.Type, Year = Int64.Type, Issue = text],
    {
      {"Product A", 100001, 2020, "Color wrong"},
      {"Product A", 111111, 2021, "Color wrong"},
      {"Productt A", 111111, 2021, "Fabric wrong"},
      {"Product A", 111111, 2021, "Size wrong"},
      {"Product B", 222222, 2021, "Tags wrong"}
    }
  ),
  NameCount = Table.Group(
    Source,
    {"Unique ID"},
    {{"Product Names for Same Unique ID >1?", each List.Count(List.Distinct([Product Name])), Int64.Type}}
  ),
  NameFlag = Table.TransformColumns(
    NameCount,
    {"Product Names for Same Unique ID >1?", each if _ = 1 then "Ok" else "Name problem", type text}
  ),
  #"Join NameFlag" = Table.Join(Source, "Unique ID", NameFlag, "Unique ID")
in
  #"Join NameFlag"

Does this method work better with your dataset?

Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
LinkedIn

View solution in original post

OwenAuger · ‎03-19-2025

Hi @petrawiggin

You can use Table.NestedJoin to join the table to itself on the Unique ID column, then aggregate the nested table to count distinct Names and return the appropriate text value.

Here's a standalone query to illustrate:

let
  Source = #table(
    type table [Product Name = text, Unique ID = Int64.Type, Year = Int64.Type, Issue = text],
    {
      {"Product A", 100001, 2020, "Color wrong"},
      {"Product A", 111111, 2021, "Color wrong"},
      {"Productt A", 111111, 2021, "Fabric wrong"},
      {"Product A", 111111, 2021, "Size wrong"},
      {"Product B", 222222, 2021, "Tags wrong"}
    }
  ),
  #"Self Join on Unique ID" = Table.NestedJoin(
    Source,
    {"Unique ID"},
    Source,
    {"Unique ID"},
    "SelfJoinOnUniqueID",
    JoinKind.LeftOuter
  ),
  #"Add Indicator" = Table.AggregateTableColumn(
    #"Self Join on Unique ID",
    "SelfJoinOnUniqueID",
    {
      {
        "Product Name",
        each if List.Count(List.Distinct(_)) > 1 then "Name problem" else "Ok",
        "Product Names for Same Unique ID >1",
        type text
      }
    }
  )
in
  #"Add Indicator"

Would this or something similar work for you?

Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
LinkedIn

petrawiggin · ‎03-19-2025

This technically did work, thank you. However, my data set is very large and this merge took an unbelieveably, impractically long time to run. Is there anything that would make it more efficient?

OwenAuger · ‎03-19-2025

Thanks for testing! Understood 🙂

For a large table, a self-join with Table.AggregateTableColumn may well not perform the best.

After some testing, a better method would be to add an intermediate step that groups by Unique ID and computes count of distinct Product Name converted to the text flag, then join this to the original table.

Below are a couple of variations of this method.

I ran some tests using a moderately large SQL Server source table, and performance seemed acceptable with query folding maintained.

let
  Source = #table(
    type table [Product Name = text, Unique ID = Int64.Type, Year = Int64.Type, Issue = text],
    {
      {"Product A", 100001, 2020, "Color wrong"},
      {"Product A", 111111, 2021, "Color wrong"},
      {"Productt A", 111111, 2021, "Fabric wrong"},
      {"Product A", 111111, 2021, "Size wrong"},
      {"Product B", 222222, 2021, "Tags wrong"}
    }
  ),
  NameCount = Table.Group(
    Source,
    {"Unique ID"},
    {{"Product Names for Same Unique ID >1?", each List.Count(List.Distinct([Product Name])), Int64.Type}}
  ),
  NameFlag = Table.TransformColumns(
    NameCount,
    {"Product Names for Same Unique ID >1?", each if _ = 1 then "Ok" else "Name problem", type text}
  ),
  #"Join NameFlag" = Table.NestedJoin(Source, "Unique ID", NameFlag, "Unique ID", "NameFlag", JoinKind.LeftOuter),
  #"Expanded NameFlag" = Table.ExpandTableColumn(
    #"Join NameFlag",
    "NameFlag",
    {"Product Names for Same Unique ID >1?"},
    {"Product Names for Same Unique ID >1?"}
  )
in
  #"Expanded NameFlag"

let
  Source = #table(
    type table [Product Name = text, Unique ID = Int64.Type, Year = Int64.Type, Issue = text],
    {
      {"Product A", 100001, 2020, "Color wrong"},
      {"Product A", 111111, 2021, "Color wrong"},
      {"Productt A", 111111, 2021, "Fabric wrong"},
      {"Product A", 111111, 2021, "Size wrong"},
      {"Product B", 222222, 2021, "Tags wrong"}
    }
  ),
  NameCount = Table.Group(
    Source,
    {"Unique ID"},
    {{"Product Names for Same Unique ID >1?", each List.Count(List.Distinct([Product Name])), Int64.Type}}
  ),
  NameFlag = Table.TransformColumns(
    NameCount,
    {"Product Names for Same Unique ID >1?", each if _ = 1 then "Ok" else "Name problem", type text}
  ),
  #"Join NameFlag" = Table.Join(Source, "Unique ID", NameFlag, "Unique ID")
in
  #"Join NameFlag"

Does this method work better with your dataset?

Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
LinkedIn

petrawiggin · ‎03-21-2025

This did work much more quickly, thank you. It took a bit over 40 minutes to initially run (MUCH faster) but seems to be quicker to refresh.

Which values in 1 column are unique to 1 value in another column

Data Sample

Background

Problem Statement

Hoped-for result

Helpful resources

Fabric Community Update - July 2025

Power BI Monthly Update - July 2025

Join us at FabCon Vienna from September 15-18, 2025

Which values in 1 column are unique to 1 value in another column

Data Sample

Background

Problem Statement

Hoped-for result

Helpful resources

Fabric Community Update - July 2025

Power BI Monthly Update - July 2025