Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The ultimate Microsoft Fabric, Power BI, Azure AI & SQL learning event! Join us in Las Vegas from March 26-28, 2024. Use code MSCUST for a $100 discount. Register Now

Reply
littlemojopuppy
Community Champion
Community Champion

Grouping Eliminating Duplicates

Good morning!  I'd like to ask for some help with something...

I have a table of transactions.  I am summarizing them by IdenityID, Date, Time and Marketing Source.  Problem is that there are some duplicates for (Identity, Date, Time) with different Marketing Sources (one with a value, one blank).

Here's some code and a pic of the output of the table statement below for one person on one day...

Marketing Count Test = 
    VAR 
        UniqueLogins =
            GROUPBY(
                ADDCOLUMNS(
                    IdentityLog,
                    "ModifiedTime",
                    TIME(
                        HOUR(IdentityLog[CreatedTime]),
                        MINUTE(IdentityLog[CreatedTime]),
                        0
                    )
                ),
                IdentityLog[IdentityId],
                IdentityLog[MarketingSourceLogId],
                IdentityLog[CreatedDate],
                [ModifiedTime]
            )
    RETURN

    UniqueLogins

Capture.PNG

 

In the above data, there are duplicates at 11:14, 2:42 and 2:49.  How would I go about removing the only the duplicated records with a blank marketing source for those times?  The records at other times with a blank value should be retained.

 

Any help would be appreciated!

2 ACCEPTED SOLUTIONS

Ok...this works

 

 

Marketing Count Test = 
    VAR 
        LoginTransactions =
            GROUPBY(
                ADDCOLUMNS(
                    IdentityLog,
                    "ModifiedTime",
                    TIME(
                        HOUR(IdentityLog[CreatedTime]),
                        MINUTE(IdentityLog[CreatedTime]),
                        0
                    )
                ),
                IdentityLog[IdentityId],
                IdentityLog[MarketingSourceLogId],
                IdentityLog[CreatedDate],
                [ModifiedTime]
            )
    VAR
        UniqueLogins =
            SUMMARIZE(
                LoginTransactions,
                IdentityLog[IdentityId],
                IdentityLog[CreatedDate],
                [ModifiedTime],
                "MarketingSource",
                MAX(IdentityLog[MarketingSourceLogId])
            )
    RETURN

    UniqueLogins

 

 

Capture.PNG

 

But this seems overly complicated.  I could achieve the same result with SUMMARIZECOLUMNS if I could truncate seconds from the time.  Could anyone please offer some guidance on how to achieve that?

View solution in original post

This is what I was looking for!

Active Users by Marketing Source Earliest Attribution = 
    VAR 
        LoginTransactions =
            ADDCOLUMNS(
                SUMMARIZECOLUMNS(
                    IdentityLog[IdentityId],
                    IdentityLog[CreatedDate],
                    "Created Time",
                    MIN(IdentityLog[CreatedTime])
                ),
                "MarketingSource",
                LOOKUPVALUE(
                    IdentityLog[MarketingSourceLogId],
                    IdentityLog[IdentityId],
                    IdentityLog[IdentityId],
                    IdentityLog[CreatedDate],
                    IdentityLog[CreatedDate],
                    IdentityLog[CreatedTime],
                    [Created Time]
                )
            )
    RETURN

    COUNTROWS(LoginTransactions)

View solution in original post

6 REPLIES 6
littlemojopuppy
Community Champion
Community Champion

A second, simpler version of code...

 

Marketing Count Test = 
    VAR 
        LoginTransactions =
            SUMMARIZECOLUMNS(
                IdentityLog[IdentityId],
                IdentityLog[CreatedDate],
                IdentityLog[CreatedTime],
                "MarketingSource",
                FIRSTNONBLANK(
                    IdentityLog[MarketingSourceLogId],
                    VALUES(IdentityLog[MarketingSourceLogId])
                )
            )
    RETURN

    LoginTransactions

 

Unfortunately this produces only those records where there is a value in Marketing Source.  I did have the thought that maybe I could use this as a second table, produce a third filtering only blank from the previous code and then union them together but that would put me exactly where I started 🙄

Continuing to play with this.  This produces almost exactly what I want

Marketing Count Test Redux = 
    VAR
        LoginTransactions =
            SUMMARIZECOLUMNS(
                IdentityLog[IdentityId],
                IdentityLog[CreatedDate],
                IdentityLog[CreatedTime],
                "MarketingSource",
                IGNORE(
                    FIRSTNONBLANK(
                        IdentityLog[MarketingSourceLogId],
                        VALUES(IdentityLog[MarketingSourceLogId])
                    )
                )
            )
    RETURN
    
    LoginTransactions

 

The only problem with it is that it's grouping by HH:MM:SS as shown.

Capture.PNG

 

The original code using GROUPBY allowed me to add a column for truncating seconds from the time, but I couldn't remove the duplicates.  This works, except I can't truncate seconds from the time.

Any suggestions most appreciated!

Ok...this works

 

 

Marketing Count Test = 
    VAR 
        LoginTransactions =
            GROUPBY(
                ADDCOLUMNS(
                    IdentityLog,
                    "ModifiedTime",
                    TIME(
                        HOUR(IdentityLog[CreatedTime]),
                        MINUTE(IdentityLog[CreatedTime]),
                        0
                    )
                ),
                IdentityLog[IdentityId],
                IdentityLog[MarketingSourceLogId],
                IdentityLog[CreatedDate],
                [ModifiedTime]
            )
    VAR
        UniqueLogins =
            SUMMARIZE(
                LoginTransactions,
                IdentityLog[IdentityId],
                IdentityLog[CreatedDate],
                [ModifiedTime],
                "MarketingSource",
                MAX(IdentityLog[MarketingSourceLogId])
            )
    RETURN

    UniqueLogins

 

 

Capture.PNG

 

But this seems overly complicated.  I could achieve the same result with SUMMARIZECOLUMNS if I could truncate seconds from the time.  Could anyone please offer some guidance on how to achieve that?

Could you post some of that sample data as text?


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

Thanks for looking at it!

Dumping an Excel file here.

This is what I was looking for!

Active Users by Marketing Source Earliest Attribution = 
    VAR 
        LoginTransactions =
            ADDCOLUMNS(
                SUMMARIZECOLUMNS(
                    IdentityLog[IdentityId],
                    IdentityLog[CreatedDate],
                    "Created Time",
                    MIN(IdentityLog[CreatedTime])
                ),
                "MarketingSource",
                LOOKUPVALUE(
                    IdentityLog[MarketingSourceLogId],
                    IdentityLog[IdentityId],
                    IdentityLog[IdentityId],
                    IdentityLog[CreatedDate],
                    IdentityLog[CreatedDate],
                    IdentityLog[CreatedTime],
                    [Created Time]
                )
            )
    RETURN

    COUNTROWS(LoginTransactions)

Helpful resources

Announcements
Fabric Community Conference

Microsoft Fabric Community Conference

Join us at our first-ever Microsoft Fabric Community Conference, March 26-28, 2024 in Las Vegas with 100+ sessions by community experts and Microsoft engineering.

February 2024 Update Carousel

Power BI Monthly Update - February 2024

Check out the February 2024 Power BI update to learn about new features.

Fabric Career Hub

Microsoft Fabric Career Hub

Explore career paths and learn resources in Fabric.

Fabric Partner Community

Microsoft Fabric Partner Community

Engage with the Fabric engineering team, hear of product updates, business opportunities, and resources in the Fabric Partner Community.