hansei
Helper V

Distinct based on 2nd column

I'm having trouble working out how to group my data to meet the following requirement.

 

Source  FK  Note  Created
a       1   abc   14-Mar
a       1   abc   14-Mar
b       1   abc   14-Mar
b       1   xyz   14-Mar
c       1   abc   14-Mar
c       1   klm   14-Mar

 

I have data coming from various sources (.csv files), and records are not necessarily unique. In the table above, the 1st and 2nd rows are genuinely separate records even though they contain the same data, because both were retrieved from the same source, "a". The 3rd and 4th rows are from last week's source, "b", which may contain data that is still present this week as well as data that has since been deleted.

 

Above, the 3rd row is identical to the 1st and 2nd rows, so it is a duplicate, and the 4th row is missing from the new source, so it is a deletion. The 5th row is also a duplicate of the 1st and 2nd rows, while the 6th is, again, a unique record.

 

I have no reason to keep duplicate data across sources, but I do want to retain what appear to be duplicates within the same source, along with any new data. So, how would I keep the 1st, 2nd, 4th and 6th rows?

3 REPLIES
hansei
Helper V

Well, with that dearth of replies, I've decided to do the following:

  • sort by source so the most recent is at the top (and buffer the table)
  • remove duplicates
  • remove the rows from the most recent source
  • combine what is left with the full most recent source
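
The four steps above read like Power Query operations. For comparison, here is a rough DAX sketch of the same logic written as a calculated table. It rests on assumptions that the post does not state: the table is named 'Table', it has an Index column in which the rows of the most recent source come first, a "duplicate" means a repeated Note value, and the table name Deduplicated is made up for illustration.

```dax
Deduplicated =
-- Assumption: 'Table'[Index] orders the rows so that the most recent source comes first.
VAR LatestSource =
    MINX ( TOPN ( 1, 'Table', 'Table'[Index], ASC ), 'Table'[Source] )
-- Every row of the most recent source, including duplicates within that source
VAR LatestRows =
    FILTER ( 'Table', 'Table'[Source] = LatestSource )
-- "Remove duplicates": keep only the first row (lowest Index) for each Note
VAR FirstPerNote =
    FILTER (
        'Table',
        VAR CurrentNote = 'Table'[Note]
        VAR CurrentIndex = 'Table'[Index]
        RETURN
            ISEMPTY (
                FILTER (
                    'Table',
                    'Table'[Note] = CurrentNote
                        && 'Table'[Index] < CurrentIndex
                )
            )
    )
-- "Remove most recent source": what remains are Notes found only in older sources
VAR OlderUnique =
    FILTER ( FirstPerNote, 'Table'[Source] <> LatestSource )
RETURN
    -- "Combine with most recent source"
    UNION ( LatestRows, OlderUnique )
```

Applied to the sample table in the question with Index values 0 to 5 in the order shown, this logic selects the 1st, 2nd, 4th and 6th rows. Nothing in it names a specific source, so it should not depend on how many sources there are, only on the ordering that Index encodes.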

Hi @hansei ,

 

First, create an index column. It is used later to identify the latest source.

Next, create a column named filter. It returns 1 when the value of Note has already appeared in an earlier row and the value of Source differs from the latest source; otherwise, it returns 0. Finally, create a calculated table that keeps only the rows where filter equals 0:

 

 

 

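Table 2 =
VAR f =
    ADDCOLUMNS (
        'Table',
        "filter",
        -- Index of the current row
        VAR a = 'Table'[Index]
        -- Distinct Note values that appear in earlier rows
        VAR b =
            CALCULATETABLE (
                DISTINCT ( 'Table'[Note] ),
                FILTER ( 'Table', 'Table'[Index] < a )
            )
        -- Latest source: the Source of the row with Index 0
        VAR c =
            CALCULATE ( MAX ( 'Table'[Source] ), FILTER ( 'Table', 'Table'[Index] = 0 ) )
        RETURN
            -- 1 = repeated Note from an older source (dropped); 0 = kept
            IF ( 'Table'[Note] IN b && 'Table'[Source] <> c, 1, 0 )
    )
RETURN
    FILTER ( f, [filter] = 0 )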

 

 


 

Please refer to the pbix file: https://qiuyunus-my.sharepoint.com/:u:/g/personal/pbipro_qiuyunus_onmicrosoft_com/ESq2wtkC4XFMhZcUOn...

 

If this post helps, please consider accepting it as the solution to help other members find it more quickly.

 

Best Regards,

Dedmon Dai

I cannot use a static solution based on sources a, b and c. There may be hundreds of sources.
