Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
Applicable88
Impactful Individual
Impactful Individual

How to get rid of rows with a duplicate, but leave the ones which are not empty

Hello,

I have machine data here which sometimes writes errors and forget to write the name of the creator of an order and only sometimes rewrite it correctly with the right content:

 

OrdeID Starttime CreatedBy
1 2021-01-01 Worker1
2 2021-01-02 Worker2
3 2021-01-03 Worker3
4 2021-01-04 Worker4
1 2021-01-01  
2 2021-01-02  
3 2021-01-03  
4 2021-01-04  
11 2021-01-05  
12 2021-01-06  
13 2021-01-07  
14 2021-01-08  
     

 

In that case it creates a duplicate, but I only want to keep the ones which has "created by" data. The problem is I also need to consider other cells which are empty in "created by" and has no correct duplicates with data. these I also need to keep. In my sample table I only want to get rid of the ones with red orderID. 

Hope someone has an idea how to get rid of the ones without deleting the other data.

Thank you very much in advance. 

Best. 

1 ACCEPTED SOLUTION
sevenhills
Super User
Super User

The way I understood the problem as you want to
keep all rows "OrdeID Starttime CreatedBy",

with group by "OrdeID Starttime" and keep the first created by without blanks as first preference. If this is true, you can do this.

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTIyMDLUNQAhICc8vyg7tchQKVYnWskIWdIILmkEljRGljSGSxqDJU2QJU3gkiZgSQw7FbDbpoDdHgXsNkCEDVEMN0WIo5huhhBHMd4cIY5ivgVEPBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [OrdeID = _t, Starttime = _t, CreatedBy = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"OrdeID", Int64.Type}, {"Starttime", type date}, {"CreatedBy", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"OrdeID", Order.Ascending}, {"Starttime", Order.Ascending}, {"CreatedBy", Order.Descending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"OrdeID", "Starttime"}, {{"Alldatarows", each _, type table [OrdeID=nullable number, Starttime=nullable date, CreatedBy=nullable text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([Alldatarows], "Sub Index", 1, 1)),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"OrdeID", "Starttime", "CreatedBy", "Sub Index"}, {"Custom.OrdeID", "Custom.Starttime", "Custom.CreatedBy", "Custom.Sub Index"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each [Custom.Sub Index] = 1)
in
    #"Filtered Rows"

 

 

Basically, I did sorting of data, tried creating subgroup index and removed subgroup index > 1

 

sevenhills_0-1632292833847.png

 

sevenhills_1-1632292857337.png

 

If this is what you need, rename the columns, and remove the sub index column.

 

Hope this helps! 

View solution in original post

5 REPLIES 5
sevenhills
Super User
Super User

The way I understood the problem as you want to
keep all rows "OrdeID Starttime CreatedBy",

with group by "OrdeID Starttime" and keep the first created by without blanks as first preference. If this is true, you can do this.

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTIyMDLUNQAhICc8vyg7tchQKVYnWskIWdIILmkEljRGljSGSxqDJU2QJU3gkiZgSQw7FbDbpoDdHgXsNkCEDVEMN0WIo5huhhBHMd4cIY5ivgVEPBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [OrdeID = _t, Starttime = _t, CreatedBy = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"OrdeID", Int64.Type}, {"Starttime", type date}, {"CreatedBy", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"OrdeID", Order.Ascending}, {"Starttime", Order.Ascending}, {"CreatedBy", Order.Descending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"OrdeID", "Starttime"}, {{"Alldatarows", each _, type table [OrdeID=nullable number, Starttime=nullable date, CreatedBy=nullable text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.AddIndexColumn([Alldatarows], "Sub Index", 1, 1)),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"OrdeID", "Starttime", "CreatedBy", "Sub Index"}, {"Custom.OrdeID", "Custom.Starttime", "Custom.CreatedBy", "Custom.Sub Index"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Custom", each [Custom.Sub Index] = 1)
in
    #"Filtered Rows"

 

 

Basically, I did sorting of data, tried creating subgroup index and removed subgroup index > 1

 

sevenhills_0-1632292833847.png

 

sevenhills_1-1632292857337.png

 

If this is what you need, rename the columns, and remove the sub index column.

 

Hope this helps! 

Watsky
Solution Sage
Solution Sage

Hey @Applicable88 ,

 

Per chance you want to keep all the records but be able to create a filter to achieve your result you can try this:

 

Column 2 = 
VAR a =
    CALCULATE (
        COUNT ( 'Table'[OrdeID] ),
        FILTER ( 'Table', 'Table'[OrdeID] = EARLIER ( 'Table'[OrdeID] ) )
    )
VAR b =
    CALCULATE (
        COUNT ( 'Table'[OrdeID] ),
        ALLEXCEPT ( 'Table', 'Table'[CreatedBy] )
    )
RETURN
    IF ( a = 1, "YES", IF ( b = 1, "YES", "NO" ) )

 You'll end up with a column where you'll want to filter on YES.

 

Watsky_0-1632248409547.png


Did my answer(s) help you? Give it a kudos by clicking the Thumbs Up! ?
Did my post answer your question(s)? Mark my post as a solution. This will help others find the solution.


Did my answer(s) help you? Give it a kudos by clicking the Thumbs Up!
Did my post answer your question(s)? Mark my post as a solution. This will help others find the solution.

Proud to be a Super User!

Anonymous
Not applicable

Open Power Query. Select only the OrdeID column and go to Remove Rows and select Remove Duplicates. That should do it. I replicated your case, see result below:

Raymundo2910_0-1632243195859.png

 

Hello @Anonymous,

 

thanks for your effort, but I saw you use a step called "remove bottom rows" which implies it only works if the one needed to be removed always comes second, which is unfortunately not in my case. It can be mixed. Sometimes comes first sometimes second etc. Any possiblites to overcome thes irregularities? 

Best. 

Anonymous
Not applicable

That step was just because I created the table manually and the last row was full of nulls. This would not happen on your data source 🙂 

Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.