Distinct based on 2nd column

hansei · ‎03-18-2020

I'm having trouble grouping data to meet the following requirement.

Source	FK	Note	Created
a	1	abc	14-Mar
a	1	abc	14-Mar
b	1	abc	14-Mar
b	1	xyz	14-Mar
c	1	abc	14-Mar
c	1	klm	14-Mar

I have data coming from various sources (.csv files), and records are not necessarily unique. In the table above, the 1st and 2nd records are genuinely separate records even though they contain the same data, because both were retrieved from the same source, "a". The 3rd and 4th rows are from last week's source, "b", which may contain some of the same data as this week's source and may also contain data that has since been deleted.

Above, the 3rd row looks identical to the 1st or 2nd row, so it is a duplicate; the 4th row is missing from the new source, so it is a deletion. The 5th row is also a duplicate of the 1st or 2nd row, while the 6th is again a unique record.

I have no reason to keep duplicate data, but I do want to retain what appear to be duplicates from the same source, along with any new data. So, how would I keep the 1st, 2nd, 4th and 6th rows?

hansei · ‎03-18-2020

Well, with that dearth of replies, I've decided to do the following:

sort by source to keep most recent at top (and buffer)
remove duplicates
remove most recent source
combine with most recent source
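
The plan above can be sketched as a DAX calculated table. This is only a rough sketch under stated assumptions, not a tested solution: it assumes the data is loaded as 'Table' with a hypothetical numeric Index column in which the lowest value belongs to the most recent source, and it treats two rows as duplicates when their Note values match. The table and variable names are illustrative.

```dax
Kept Rows =
-- Assumption: the row with the lowest Index comes from the most recent source
VAR LatestSource =
    MINX ( TOPN ( 1, 'Table', 'Table'[Index], ASC ), 'Table'[Source] )
-- Every row of the most recent source, including repeats within that source
VAR LatestRows =
    FILTER ( 'Table', 'Table'[Source] = LatestSource )
-- Note values present in the most recent source
VAR LatestNotes =
    SELECTCOLUMNS ( LatestRows, "Note", 'Table'[Note] )
-- Rows from older sources whose Note does not appear in the most recent source
VAR OlderRows =
    FILTER (
        'Table',
        'Table'[Source] <> LatestSource
            && NOT ( 'Table'[Note] IN LatestNotes )
    )
RETURN
    -- Combine the untouched latest source with the surviving older rows
    UNION ( LatestRows, OlderRows )
```

On the six sample rows this keeps the 1st, 2nd, 4th and 6th rows. Unlike a true "remove duplicates" step, it does not collapse rows that repeat only among the older sources, and no source name is hard-coded.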

v-deddai1-msft · ‎03-18-2020

Hi @hansei,

First, create an index column, which is used later to identify the latest source.
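
In Power BI the index is usually added in Power Query (Add Column > Index Column, starting from 0) after sorting so that the latest source comes first. For experimenting without the .csv files, the sample data from the question can be recreated with a 0-based Index as a calculated table. This is only a sketch: the column types, and storing Created as text, are assumptions.

```dax
Table =
DATATABLE (
    "Index", INTEGER,   -- 0 = first row of the latest source (assumed sort order)
    "Source", STRING,
    "FK", INTEGER,
    "Note", STRING,
    "Created", STRING,  -- kept as text here; the real column may be a date
    {
        { 0, "a", 1, "abc", "14-Mar" },
        { 1, "a", 1, "abc", "14-Mar" },
        { 2, "b", 1, "abc", "14-Mar" },
        { 3, "b", 1, "xyz", "14-Mar" },
        { 4, "c", 1, "abc", "14-Mar" },
        { 5, "c", 1, "klm", "14-Mar" }
    }
)
```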

Next, create a column named filter. It returns 1 when the row's Note value already appears in an earlier row (one with a lower Index) and the row's Source differs from the latest source; otherwise it returns 0. Finally, create a calculated table that keeps only the rows where filter equals 0:

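Table 2 =
VAR f =
    ADDCOLUMNS (
        'Table',
        "filter",
        -- a: Index of the current row
        VAR a = 'Table'[Index]
        -- b: distinct Note values found in the rows above the current one
        VAR b =
            CALCULATETABLE (
                DISTINCT ( 'Table'[Note] ),
                FILTER ( 'Table', 'Table'[Index] < a )
            )
        -- c: Source of the row at Index 0, i.e. the latest source
        VAR c =
            CALCULATE ( MAX ( 'Table'[Source] ), FILTER ( 'Table', 'Table'[Index] = 0 ) )
        RETURN
            -- 1 = Note already seen and row is not from the latest source; 0 = keep
            IF ( 'Table'[Note] IN b && 'Table'[Source] <> c, 1, 0 )
    )
RETURN
    FILTER ( f, [filter] = 0 )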

Please refer to the pbix file: https://qiuyunus-my.sharepoint.com/:u:/g/personal/pbipro_qiuyunus_onmicrosoft_com/ESq2wtkC4XFMhZcUOn...

If this post helps, please consider accepting it as the solution so that other members can find it more quickly.

Best Regards,

Dedmon Dai

hansei · ‎04-10-2020

I cannot use a static solution based on a, b, c. There may be hundreds of sources.
