Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
Anonymous
Not applicable

Remove duplicates by prioritizing rows based on another column

Hi, everyone.

 

I am having the following problem and don't seem to find any solution to implement this in Power Query.

I have a dataset of loading lists of orders. It looks like this:

 

SergeyShelest_1-1630059451503.png

 

 For the given Loading list (Ladeliste) I want to remove duplicate NVE numbers (shipping unit number) but only those that have empty values in LHM number column (loading unit ID). If a corresponding cell in LHM-Nr. column is not empty I want to keep it. For example, when removing duplicates in rows 2 and 6, I want to remove row 2 since LHM-Nr. column cell is empty. Correspondingly, between rows 3 and 7 I want to remove row 3. 

 

This is how the resulting table should look:

SergeyShelest_2-1630059719434.png

Removing duplicates in Power Query using Remove rows removes first rows. Sorting by loading list and then deleting rows might work but I am not sure if it will also not be deleting rows that have values in LHM-Nr. column.

 

Does anyone have any idea how to implement this?

1 ACCEPTED SOLUTION

Okay, please try this.  I have included comments that explain each step.

 

BEFORE: 

The goal is to remove rows 1, 3, 6, 8 and to have all columns present in the result.

jennratten_0-1630267603190.png

 

RESULT:

jennratten_1-1630267728521.png

 

SCRIPT:

There are two different options in the grouped step along with scenarios for when one would apply as opposed to the other, based on the specifics of the source data.  Currently, option 2 is in use.  To switch to option 1, add two forward slashes in front of Table.LastN and remove the two slashes at the beginning of let varTable and Table.FirstN.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jcy5DcAgEETRXjZGaHbBVwi+irDovw0vJiFAmOQjxGOeh0IIZIgB6BHiroW3YCsQ1kt+pGT+ISNXKiu9UTcEGaL1n40x5n/F7sfZGJ0q6HtwHoIMp10qOxV7XndjdB2CDK/dKKUX", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Latest = _t, #"NVE-Nr." = _t, Item = _t, Date = _t, #"LHM-Nr." = _t, Index = _t]),
    grouped = Table.Group(
        Source, 
        // Column(s) containing the values from which you'd like duplicates removed.  
        {"NVE-Nr."},
        {
            { "Table", // Name of the new column.
                each 
                //---------------------------------------------------------
                // Option 1: Sort descending, then select the first result.
                //---------------------------------------------------------
                // This will work if there is a maximum of only two rows per NVE-Nr.
                //let varTable = Table.Sort ( _, {{"LHM-Nr.", Order.Descending}}) in
                //Table.FirstN ( varTable, 1 ),
                //---------------------------------------------------------
                // Option 2: Select the last LHM-Hr for each NVE-Nr.
                //---------------------------------------------------------
                // This will work if the row to keep always appears last in the group.
                Table.LastN ( _, 1 ),                
                type table
            }
        }
    ),
    expand = Table.ExpandTableColumn ( 
        grouped, 
        "Table",                                  // Expand the tables in this column
        List.Difference (                         // New column names
            Table.ColumnNames (                   // are the column names 
                Table.Combine ( grouped[Table] )  // in the nested tables 
            ), 
            Table.ColumnNames ( grouped )         // that do not appear in the grouped table.
        ) 
    )

in
    expand

 

 

 

View solution in original post

25 REPLIES 25
JoaoPereira
Frequent Visitor

I might have a simpler solution. From this data set, I can see that there are only duplicates in NVE when LHM-Nr. is empty. In theory, you could just remove all rows where LHM-Nr. is empty and it would consequently remove duplicates from NVE. For this, you can click on the LHM-Nr. column, and on the top of Power Query click on Remove Rows+, then Remove Blank Rows. This should work.

 

Otherwise, you could just do what you said and sort by loading list. Since the rows move as a whole, you wouldn't mix any infos and could then remove duplicates, since the duplicate row without empty values would be on top. Hope this helps.

Anonymous
Not applicable

Hi, @JoaoPereira .

 

Thanks for the tip. Unfortanutely this will lead to data loss since I have data of over 12 000 rows.. Since I imported data from pdf and did the cleaning, I have some loading lists where Power Query didn't catch LHM numbers but only NVE numbers. If I delete all rows with empty LHM numbers it will delete this information as well, which I would like to avoid.

 

That's why I need a delicate solution so that it wouldn't lead to any information loss.

 

Regards,

Alright, the solution I came up with is a bit more complicated than it probably needed to be but it's all I got. Here is what I recommend:


Duplicate the LHM column. In the Transform section, divide column by Number of Characters (position 26) and by specific character (.). This will leave you with two columns, one with a 4 digit number (3422 for example) and a second column with the word pdf for the separator. Delete the pdf column. 

 

Now, duplicate the NVE column. 

 

Now, join the duplicate LHM column with the 4 digits and the NVE duplicate column. Since every NVE is unique, the only duplicate values in this new column will be when the 4 digits from the LHM are identical AND when the NVE column was blank, therefore not adding anything to the values.

 

Finally, click on the joined column from both duplicates and erase duplicates. From here, you can delete the joined column since the functions work without it. There you have it, you'll have a data set with duplicates but only when NVE is not empty. 

edhans
Super User
Super User

How to get good help fast. Help us help you.

How To Ask A Technical Question If you Really Want An Answer

How to Get Your Question Answered Quickly - Give us a good and concise explanation
How to provide sample data in the Power BI Forum - Provide data in a table format per the link, or share an Excel/CSV file via OneDrive, Dropbox, etc.. Provide expected output using a screenshot of Excel or other image. Do not provide a screenshot of the source data. I cannot paste an image into Power BI tables.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
Anonymous
Not applicable

Hi, edhans. 

 

Here is a sample data:

 

https://docs.google.com/spreadsheets/d/12v9392oeTLGUqKC98SZLt-cZVHAh4dhA/edit?usp=sharing&ouid=10760...

 

Please let me know if you need any clarifications.

 

Regards,

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors