Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Sofinobi
Helper IV
Helper IV

delete duplicate in a column without deleting the rows

Hi all,

please i need help,

i have a sales table, that has in "Stock" column duplicate values, i'm looking if there is a solution to delete duplicate values without deleting all the row.

info; duplicate it depends on the "Product ID" columns, not in "Stock" (because in Stock we can have duplicate values but different Product ID, in this case its Ok)
(i have the solution in excel formula, but i need it in power query)
thank you so much.

sample_file.xlsx 
  
duplicate.png

2 ACCEPTED SOLUTIONS

When tested using the Power BI tools, this code seems to execute much more rapidly than yours.

  • Group by Product ID
  • Add a Shifted stock column to the sub-table
  • If the contents of Stock and Shifted Stock are the same, then replace Stock with a null
  • Remove the Shifted column
  • Re-expand the table

 

let
    Source = Excel.CurrentWorkbook(){[Name="Table9"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Customer ID", type text}, {"Product ID", Int64.Type}, {"Sales", Int64.Type}, {"Stock", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Product ID"}, {
        {"all", (t)=> 
            let 
                #"Add Shifted" = Table.FromColumns( 
                        Table.ToColumns(t) & 
                        {{null} & List.RemoveLastN(t[Stock],1)},
                        Table.ColumnNames(t) & {"Shifted"}),
                #"Null Dup Stock" = Table.ReplaceValue(
                    #"Add Shifted",
                    each [Stock],
                    each [Shifted],
                    (x,y,z)=> if y = z then null else y,
                    {"Stock"}),
                #"Remove Shifted" = Table.RemoveColumns(#"Null Dup Stock","Shifted")
            in 
                #"Remove Shifted",
            type table[Customer ID=text, Product ID=Int64.Type, Sales=Int64.Type, Stock=Int64.Type]}
        
        
        }),
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Product ID"}),
    #"Expanded all" = Table.ExpandTableColumn(#"Removed Columns", "all", Table.ColumnNames(#"Changed Type"))
in
    #"Expanded all"

Results from your data

ronrsnfld_0-1713957847979.png

 

 

 

 

 

View solution in original post

dufoq3
Super User
Super User

Hi @Sofinobi, different approach here.

 

Result

dufoq3_0-1713973402527.png

 

let
    Source = Excel.Workbook(File.Contents("C:\Downloads\2024 Test duplicate.xlsx"), null, true),
    Sales_Table = Source{[Item="Sales",Kind="Table"]}[Data],
    GroupedRows = Table.Group(Sales_Table, {"Product ID"}, {{"All", each 
        [ a = Table.RemoveColumns(_, {"Stock"}),
          b = Table.FromColumns(Table.ToColumns(a) & {{[Stock]{0}?} & List.Repeat({null}, Table.RowCount(a)-1)}, Value.Type(_))
        ][b], type table}}),
    CombinedAll = Table.Combine(GroupedRows[All])
in
    CombinedAll

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

View solution in original post

11 REPLIES 11
dufoq3
Super User
Super User

Hi @Sofinobi, different approach here.

 

Result

dufoq3_0-1713973402527.png

 

let
    Source = Excel.Workbook(File.Contents("C:\Downloads\2024 Test duplicate.xlsx"), null, true),
    Sales_Table = Source{[Item="Sales",Kind="Table"]}[Data],
    GroupedRows = Table.Group(Sales_Table, {"Product ID"}, {{"All", each 
        [ a = Table.RemoveColumns(_, {"Stock"}),
          b = Table.FromColumns(Table.ToColumns(a) & {{[Stock]{0}?} & List.Repeat({null}, Table.RowCount(a)-1)}, Value.Type(_))
        ][b], type table}}),
    CombinedAll = Table.Combine(GroupedRows[All])
in
    CombinedAll

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

thank you @dufoq3  for your contribution, thats works perfectly, i'll test the speed with the fist code of @ronrsnfld 

thank you so much my friend

You're welcome.


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

thank you so much my friend, your code is the fastest one thanks again, (and sorry for being late)

Enjoy 😃


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

ToddChitt
Super User
Super User

Question: Why should the "Stock" value of 1,407 for Product ID 50159 be attributed to the Sales to Customer 3 instead of Customer 4 or 5? 

 

Honestly, you should strive to break your model out into three different tables: Customer (dimension), Product (dimension) and Sales (fact). 

That "Stock" column is an attribute of the Product dimesnion and has no business in the Sales fact table. 

 

 




Did I answer your question? If so, mark my post as a solution. Also consider helping someone else in the forums!

Proud to be a Super User!





i found a solution, but its really energivore. if there is a light version of this M code

 

#"Type modifié" = Table.TransformColumnTypes(Source,{ {"Product ID", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Type modifié", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each List.PositionOf(#"Added Index"[Product ID],[Product ID])),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each if [Index] = [Custom] then [Stock] else 0
)
in
#"Added Custom1"

When tested using the Power BI tools, this code seems to execute much more rapidly than yours.

  • Group by Product ID
  • Add a Shifted stock column to the sub-table
  • If the contents of Stock and Shifted Stock are the same, then replace Stock with a null
  • Remove the Shifted column
  • Re-expand the table

 

let
    Source = Excel.CurrentWorkbook(){[Name="Table9"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Customer ID", type text}, {"Product ID", Int64.Type}, {"Sales", Int64.Type}, {"Stock", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Product ID"}, {
        {"all", (t)=> 
            let 
                #"Add Shifted" = Table.FromColumns( 
                        Table.ToColumns(t) & 
                        {{null} & List.RemoveLastN(t[Stock],1)},
                        Table.ColumnNames(t) & {"Shifted"}),
                #"Null Dup Stock" = Table.ReplaceValue(
                    #"Add Shifted",
                    each [Stock],
                    each [Shifted],
                    (x,y,z)=> if y = z then null else y,
                    {"Stock"}),
                #"Remove Shifted" = Table.RemoveColumns(#"Null Dup Stock","Shifted")
            in 
                #"Remove Shifted",
            type table[Customer ID=text, Product ID=Int64.Type, Sales=Int64.Type, Stock=Int64.Type]}
        
        
        }),
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Product ID"}),
    #"Expanded all" = Table.ExpandTableColumn(#"Removed Columns", "all", Table.ColumnNames(#"Changed Type"))
in
    #"Expanded all"

Results from your data

ronrsnfld_0-1713957847979.png

 

 

 

 

 

thank you very much @ronrsnfld , your solution works perfectly, and so fast comparing to my first code.

thanks again.

hi @ToddChitt  thank you for your answer,

the "Stock" value depends on the "Product ID", it doesn't matter if it is attributed to Costomer 3 or 4 or 5.
i agree with you about the model, (it's already a model with fact and Dim Tables), but for our business need, i have to send this "Table Situation" monthely.

Ah, OK, that is a different question. It is not about removing values from the SOURCE DATA, it is about removing values from a visualization. (Hint: this post probably belongs in the Desktop forum.)

 

I suggest you look into two DAX features:

Window Functions WINDOW function (DAX) - DAX | Microsoft Learn

and Visual Calculations Using visual calculations in Power BI Desktop - Power BI | Microsoft Learn

 

I would set up the table as you have shown with the four columns. Next, create a Visual Calculation (column) that looks at the previous (Hint: LAG) Product ID. Finally, create a second Visual Calculation that has IF/THEN logic like this: IF ([Procuct ID] <> [Previous Product Id], [Stock], "")

You could probably wrap the whole thing in one big DAX calculation statement but I like to break out the components during development and testing.

Hope that helps




Did I answer your question? If so, mark my post as a solution. Also consider helping someone else in the forums!

Proud to be a Super User!





Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

Top Solution Authors
Top Kudoed Authors