Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
vjnvinod
Impactful Individual
Impactful Individual

Need to create a calculate coloumn to identify duplicates lines

Hi,

 

I have a table, with so many rows and coloumns,

i need to create a calculate coloumn, to segregate "Original and Duplicate entries"

Certain coloumns

 

GlobalService

DunsNumber

OpportunityID

Opportunity

PipelineOpen_constant

 

if the above coloumns , for the lines(Rows) are equal, then mark these as duplicate, except 1line(which is original)

that means if there are 3 lines same, mark 2 of them as duplicate and 1 as original.

 

let me know if this is possible to do and also i want a table to be created for those entries.

 

1 ACCEPTED SOLUTION
v-frfei-msft
Community Support
Community Support

Hi @vjnvinod ,

 

I have created a sample for your reference, please check the following steps as below.

 

1. Insert an index column in power query. M code for your reference.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlSK1YGRSUhkMg4yBUymgsk0pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [catgeory = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"catgeory", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1)
in
    #"Added Index"

 

2. Close and apply, then wen can get the excepted result by a calculated column.

Column =
VAR inde = 'Table'[Index]
VAR previous = inde - 1
RETURN
    IF (
        CALCULATE (
            MAX ( 'Table'[catgeory] ),
            FILTER ( 'Table', 'Table'[Index] = previous )
        ) = 'Table'[catgeory],
        "Duplicated",
        "Original"
    )

Capture.PNG

 

Pbix as attached.

 

 

Community Support Team _ Frank
If this post helps, then please consider Accept it as the solution to help the others find it more quickly.

View solution in original post

2 REPLIES 2
v-frfei-msft
Community Support
Community Support

Hi @vjnvinod ,

 

I have created a sample for your reference, please check the following steps as below.

 

1. Insert an index column in power query. M code for your reference.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlSK1YGRSUhkMg4yBUymgsk0pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [catgeory = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"catgeory", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1)
in
    #"Added Index"

 

2. Close and apply, then wen can get the excepted result by a calculated column.

Column =
VAR inde = 'Table'[Index]
VAR previous = inde - 1
RETURN
    IF (
        CALCULATE (
            MAX ( 'Table'[catgeory] ),
            FILTER ( 'Table', 'Table'[Index] = previous )
        ) = 'Table'[catgeory],
        "Duplicated",
        "Original"
    )

Capture.PNG

 

Pbix as attached.

 

 

Community Support Team _ Frank
If this post helps, then please consider Accept it as the solution to help the others find it more quickly.
danextian
Super User
Super User

Hi @vjnvinod ,

 

This is achievable.

 

  • First create a calculated column to concatenate all these columns.

 

Concatenate =
//you may also use "&" instead of COMBINEVALUES() to concatenate text strings
COMBINEVALUES (
    " ",
    'Table'[GlobalService],
    'Table'[DunsNumber],
    'Table'[OpportunityID],
    'Table'[Opportunity],
    'Table'[PipelineOpen_constant]
)
​

 

 

  • Now, there should some kind of a column to identify which among the duplicate concatenated values come first like datetimestamp, etc. If there isn't, to to the Query Editor (Designer Home Tab >> Transform Data) -->Add Column tab -->Index Column and then apply the changes. Create another calculated column:
    index.png
    KeepThisRow =
    CALCULATE (
        MIN ( 'Table'[Index/DateTimeStamp] ),
        ALLEXCEPT ( 'Table', 'Table'[Concatenate] )
    ) = 'Table'[Index/DateTimeStamp]
    ​

     

Another way to do this, you don't want  to keep all other rows except the first one based on several columns is do Remove Duplicates in Power Query which is very similar to Excel.  In Power Query, select all the columns in your list. Right click and then select Remove Duplicates.

remove duplicates.png

 

 

 










Did I answer your question? Mark my post as a solution!


Proud to be a Super User!









"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.