Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Compete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.

Reply
petrawiggin
Helper I
Helper I

Call out which cells are different between 2 tables (merge with multiple column keys)

Hello, experienced Power Query people!

I have a huge set of data in which there are 3 columns of data that should be standardized (and then a bunch of other columns). I also have a table that has the standard values for those 3 columns aligned correctly. I am using PQ to give me a set of data that shows where my huge set of data has either non-standard values for those columns or where the value in one of the columns isn't what it is supposed to be based on the other columns. 

  1. Is there a way to get PQ to tell me which values didn't match and thus caused the row to appear in the merge?
  2. Can someone help me figure out why my attempt at pulling in a column of just the non-standard values for one of my columns is adding a TON of duplicate rows when there are NO duplicate rows in either of the tables I'm merging? (My back up plan is to filter on that extra column of "if there's something in here it's a non-standard value.")

Acceptable values table example:

Thing    Type    Price

Kiwi       Fruit    $3

Cuke      Veg    $2

But sometimes in my huge data set someone made a mistake and either put the wrong type or price (or both!) with the wrong thing or put in a thing, type, or price (or any combination) that isn't in the list.

Data set table example:

Thing    Type    Price    Buyer

Kiwi       Fruit    $3        Xi

Cuke      Veg     $2       Juan

Kiwy      Fruit    $3       Mary

Cuke      Veg     $3       Shanti

Mismatched data table example:

Thing    Type    Price    Buyer

Kiwy      Fruit    $3       Mary

Cuke      Veg     $3       Shanti

I'm able to get the mismatched data table via PQ but I want to somehow highlight which cell is the one that is the mismatch for each row. So "Kiwy" because it is not in the list of accepted things and the second "$3" because it doesn't match up with what that value should be for a Cuke.

What I would really love is to have a way to get PQ to tell me which values didn't match and thus caused the row to appear in the merge. Is this possible?

If not...I tried making a query that just finds all the "Things" in the data that aren't in the list of accepted "Thing" values and then merging that with the query that pulled in ALL the mismatches so that I could get a column that just listed the "Thing" when it was wrong (so I could do conditional formatting in Excel based on values existing in that column).

Mismatched data table with extra column example:

Thing    Type    Price    Buyer     WrongThing

Kiwy      Fruit    $3       Mary       Kiwy

Cuke      Veg     $3       Shanti

I tried an outer join where I kept all values from the mismatch table and only the matching wrong_thing_names table then expanded only the thing column in the wrong_thing_names table. There are NO duplicate rows in either table. And yet, when I do the merge I get a huge number of duplicated rows. Then I tried all the other merge options and none of them kept the 3,111 rows I had in the mismatch table. I have read lots of articles and watched a few videos where people say "why do I have duplicate rows in my merged data?" and the answer is always "there are duplicate rows in one of your tables" but I have checked that and there really aren't. What else could be going wrong???

Thank you, kind people! 

1 ACCEPTED SOLUTION
Keezz
Helper I
Helper I

Sure. I added comments to the code. The syntax of the M language is very concise, making it cryptic at times, but I hope this helps. There a re many good resources on Youtube on PowerQuery and the M language if you want to get a better understanding of M. M is awesomely powerful and if you have complex data cleaning jobs regularly, it's worth it!

let
    Source = #"Data Values",    // Getting the source data from the workbook
    #"Added Is Type Correct" =  // adding a column with a true value if the Type is correct and otherwise false
        Table.AddColumn(Source, "Is Type Correct",  // The Table.AddColumn starts with an input table and the name of the new column. It will loop through all records and call the function you supply in the next parameter.
                each            /*  `each` is a shorthand for a function definition. it is equivalent to `(_) +>`,
                                    The function will be called by Table.AddColumn with 1 parameter: The current record. 
                                    So, within the function you can refer to _ if you need to refer to the current record.
                                */
                List.Contains(  // The List.Contains will do the actual checking
                    #"Acceptable Values"[Type]
                                /*  This refers to the column [Type] in the table #"Acceptable Values" from the spreadsheet. 
                                    This is a list of all the values in the colum */
                    , [Type]    /*  And this is short for _[Type] and refers to the field [Type] in the xurrent record. 
                                    List.Contains will see if this value is in the list.*/
                    ) // ending List.Contains
            ), // Table.AddColumn
    #"Added Is Thing Correct" = Table.AddColumn(#"Added Is Type Correct", "Is Thing Correct", // adding another column
            each List.Contains(  // again, a function `(_) =>`
                Table.SelectRows( /*    function to get rows from a table. This function is also used when you filter using the UI 
                                        we want a list of Things matching the Type in the row we are adding the column to */
                    #"Acceptable Values"    // The table to select from: Is coming from the excelsheet heet
                    , (acceptable) =>   /*  SelectRows also needs a function with 1 paremeter: the current row
                                            We can't use `each` here because it would conlict with the `each` in the AddColumn
                                            I named the parameter "acceptable" */
                            acceptable[Type] = [Type]   // select the rows from Accepatble values where the Type = the Type in the current row
                    )   // end select rows
                [Thing] // From the return selection of rows, get the column Thing as a list
                , [Thing] // See if [Thing] in the current row exsts in the accepatble [Thing]s
                ) // End list contains
            ),
    #"Added Is Price Correct" = // similar to the previous colum, only atching 2 columns
        Table.AddColumn(#"Added Is Thing Correct", "Is Price Correct", 
            each List.Contains(Table.SelectRows(#"Acceptable Values",
                (acceptable) => 
                    [Type] = acceptable[Type]
                    and [Thing] = acceptable[Thing]
                )[Price]
                , [Price]
                )
            ),
    #"Added Is Correct" = // New column returning true when all 3 new columns are true
        Table.AddColumn(#"Added Is Price Correct", "Is Correct", each [Is Type Correct] and [Is Thing Correct] and [Is Price Correct]),
    #"Filtered Only Errors" = // Only return the rows in the #"Data Values" table with errors
        Table.SelectRows(#"Added Is Correct", each ([Is Correct] = false))
in
    #"Filtered Only Errors"

View solution in original post

7 REPLIES 7
Keezz
Helper I
Helper I

Sure. I added comments to the code. The syntax of the M language is very concise, making it cryptic at times, but I hope this helps. There a re many good resources on Youtube on PowerQuery and the M language if you want to get a better understanding of M. M is awesomely powerful and if you have complex data cleaning jobs regularly, it's worth it!

let
    Source = #"Data Values",    // Getting the source data from the workbook
    #"Added Is Type Correct" =  // adding a column with a true value if the Type is correct and otherwise false
        Table.AddColumn(Source, "Is Type Correct",  // The Table.AddColumn starts with an input table and the name of the new column. It will loop through all records and call the function you supply in the next parameter.
                each            /*  `each` is a shorthand for a function definition. it is equivalent to `(_) +>`,
                                    The function will be called by Table.AddColumn with 1 parameter: The current record. 
                                    So, within the function you can refer to _ if you need to refer to the current record.
                                */
                List.Contains(  // The List.Contains will do the actual checking
                    #"Acceptable Values"[Type]
                                /*  This refers to the column [Type] in the table #"Acceptable Values" from the spreadsheet. 
                                    This is a list of all the values in the colum */
                    , [Type]    /*  And this is short for _[Type] and refers to the field [Type] in the xurrent record. 
                                    List.Contains will see if this value is in the list.*/
                    ) // ending List.Contains
            ), // Table.AddColumn
    #"Added Is Thing Correct" = Table.AddColumn(#"Added Is Type Correct", "Is Thing Correct", // adding another column
            each List.Contains(  // again, a function `(_) =>`
                Table.SelectRows( /*    function to get rows from a table. This function is also used when you filter using the UI 
                                        we want a list of Things matching the Type in the row we are adding the column to */
                    #"Acceptable Values"    // The table to select from: Is coming from the excelsheet heet
                    , (acceptable) =>   /*  SelectRows also needs a function with 1 paremeter: the current row
                                            We can't use `each` here because it would conlict with the `each` in the AddColumn
                                            I named the parameter "acceptable" */
                            acceptable[Type] = [Type]   // select the rows from Accepatble values where the Type = the Type in the current row
                    )   // end select rows
                [Thing] // From the return selection of rows, get the column Thing as a list
                , [Thing] // See if [Thing] in the current row exsts in the accepatble [Thing]s
                ) // End list contains
            ),
    #"Added Is Price Correct" = // similar to the previous colum, only atching 2 columns
        Table.AddColumn(#"Added Is Thing Correct", "Is Price Correct", 
            each List.Contains(Table.SelectRows(#"Acceptable Values",
                (acceptable) => 
                    [Type] = acceptable[Type]
                    and [Thing] = acceptable[Thing]
                )[Price]
                , [Price]
                )
            ),
    #"Added Is Correct" = // New column returning true when all 3 new columns are true
        Table.AddColumn(#"Added Is Price Correct", "Is Correct", each [Is Type Correct] and [Is Thing Correct] and [Is Price Correct]),
    #"Filtered Only Errors" = // Only return the rows in the #"Data Values" table with errors
        Table.SelectRows(#"Added Is Correct", each ([Is Correct] = false))
in
    #"Filtered Only Errors"

Thank you, this is fantastic! I was very close to following it, as it turns out, but there were some key things that I didn't quite get. I very much appreciate your time and effort to explain. It is indeed looking like I'll have to do some more formal learning of M query. Up til now I've been able to piece together my complicated processing by just looking at other code examples and following them from basic programming knowledge but as you say, M seems to be particularly cryptic. 🙂 

You're welcome. Have fun with M. It's a steep learning curve, but once you reach the top, the view is magnificant 😊

Keezz
Helper I
Helper I

Here my solutions

let
    Source = #"Data Values",
    #"Added Is Type Correct" = Table.AddColumn(Source, "Is Type Correct", each List.Contains( #"Acceptable Values"[Type], [Type])),
    #"Added Is Thing Correct" = Table.AddColumn(#"Added Is Type Correct", "Is Thing Correct", each List.Contains(Table.SelectRows(#"Acceptable Values", (acceptable) => [Type] = acceptable[Type])[Thing], [Thing])),
    #"Added Is Price Correct" = Table.AddColumn(#"Added Is Thing Correct", "Is Price Correct", each List.Contains(Table.SelectRows(#"Acceptable Values",
  (acceptable) => 
    [Type] = acceptable[Type]
    and [Thing] = acceptable[Thing]
    )[Price]
, [Price])),
    #"Added Is Correct" = Table.AddColumn(#"Added Is Price Correct", "Is Correct", each [Is Type Correct] and [Is Thing Correct] and [Is Price Correct]),
    #"Filtered Only Errors" = Table.SelectRows(#"Added Is Correct", each ([Is Correct] = false))
in
    #"Filtered Only Errors"

 

Keezz_0-1720517497444.png

 

 

Thank you for this! Unfortunately, I am not able to follow the syntax well enough to tweak it for what I need. Can you explain the syntax of the comparing part?

I'm thinking, based on what folks have pointed out, that I need 5 check columns:

  1. Three separate check columns to see if the values in each column of the data are acceptable values at all. This was easy to do.
  2. A check column to find out if the type (actually "category" in my data) is right for each thing (actually "issue" in my data). 
  3. Another check for whether the price ("WCAG" in my data) is right for each thing ("issue"). 
  4. Then filter the data to only return the records where one of those things is false. (So, either an actual value is not acceptable or a pairing is not.)

Is that right?

I think the step in your example for "Added Is Thing Correct" (and then the definition of the function (?) "acceptable") is the part that is checking the pairings but I can't follow the syntax. Would it be possible to explain what all the parts of that are doing?

I very much appreciate it!

I can't send my actual data set because it's huge and also has info I can't.

Keezz
Helper I
Helper I

If you have a mismatch there wil definitly be multiple rows in the acceptible values table that are NOT matched.
So how do you pick the right one?

If you assume (wild assumption here!) that only 1 field is wrong, you can find out which one is wrong by looking for the ones that are right. But beware: There may be multiple "acceptable values" for any combination of 2 fields filled correctly...

 

For example if I have additional acceptable value:

Thing    Type    Price

Apple     Fruit    $3

 

In that case 
Kiwy      Fruit    $3       Mary

 

Will not match 2 acceptible values rows.

 

And if you have  a data row as follows?? 

Kiwy      Fruit    $5       Mary

 

So a merge join will not work...

 

If you can elaborate on exactly what you are looking for in these situations, I may be of help...
What should be unique in the acceptable values?

 

I guess

  • There is an exhaustive list of Types
  • There should be one price for every combination of Thing+Type

In that case

  • Fisrt check for valid Type
  • Then check for valid Thing Type combination to see if the thing is correct
  • Then check the price against the found Thing Type combination.
BeaBF
Super User
Super User

@petrawiggin Hi! Can you share your pbix file?

 

BBF

Helpful resources

Announcements
August Power BI Update Carousel

Power BI Monthly Update - August 2025

Check out the August 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors