rhaining
Employee

Want a derived table with only distinct values

My source table has dupes, by design.  It's denormalized.  Imagine something like this:

 

Item ID | Item Name | Item Color | Owning Org | Owning Person
=============================================================
1       | Car       | Green      | Finance    | Joe
1       | Car       | Green      | Finance    | Sally
1       | Car       | Green      | HR         | Bob

 

This data is exactly what I need for most of my views -- the same item must be reported under multiple owners -- but I want a 2nd view that has no duplicates.  All item attributes are identical across all rows (item #1 will always be a car and green), but some or all of the ownership attributes will vary across near duplicate rows.

 

In effect, I want to summarize by Item ID, but I never want to sum or count.  Instead, for item attributes, I want to take any or the first value -- they will be the same, so whatever is cheapest / fastest in terms of compute -- and for the ownership attributes, my ideal would be to take the "mode" or most common value, but I'd happily start with just any or the first value.  However, for all other ownership attributes, I want to take the matching values.  E.g. I can't have Owning Org be HR and Owning Person to be Joe.

 

Almost none of the attributes / fields are scalars, and even for those that are, I still don't want sum or count.  I could probably use average here but that seems needlessly complex given the values will all be the same.

 

1 ACCEPTED SOLUTION
Greg_Deckler
Super User

@rhaining Maybe:

 

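New Table = 
  // One row per item; MIN takes the first value in sort order for each ownership column
  SUMMARIZE(
    'Table',
    'Table'[Item ID],
    'Table'[Item Name],
    'Table'[Item Color],
    "Owning Org", MIN('Table'[Owning Org]),
    "Owning Person", MIN('Table'[Owning Person])
  )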


or

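New Table = 
  // Count the rows for each distinct combination of columns
  VAR __Table = 
    SUMMARIZE(
      'Table',
      'Table'[Item ID],
      'Table'[Item Name],
      'Table'[Item Color],
      'Table'[Owning Org],
      'Table'[Owning Person],
      "Count", COUNTROWS('Table')
    )
  // Highest count across all combinations
  VAR __Max = MAXX(__Table, [Count])
  // Take the top 1 from the combinations that reach that count
  VAR __Result = TOPN(1, FILTER(__Table, [Count] = __Max))
RETURN
  __Result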

 




I haven't yet tried either of your specific suggestions, but they were enough to unblock me.  Thank you!  I decided to start from here:

 

New Table = 
  SUMMARIZE(
    'Table',
    'Table'[Item ID],
    'Table'[Item Name],
    'Table'[Item Color]
  )

 

In effect throwing away all the ownership information.  At least I have a correctly deduped table.

 

I worry about your 1st suggestion -- how does MIN function w.r.t. strings?  And would it guarantee that the Owning Org + Owning Person were a valid pair?  Your 2nd suggestion looks like it would work perfectly and I may try that soon.
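
On the pairing question: MIN and MAX do accept text columns and compare them in sort order, but the 1st suggestion aggregates each column independently. With the sample rows above it would pair Finance (the lowest org) with Bob (the lowest person), even though Bob belongs to HR. The 2nd suggestion, as written, filters on the single highest count across the whole table rather than per item. Below is a minimal sketch of a per-item variant that keeps org and person together. It assumes the column names from the sample table; the table name Deduped Items and the helper column @Rows are made up for illustration, and it has not been tested against a real model.

```dax
Deduped Items =
GENERATE(
    // One row per item (item attributes are identical across its rows)
    SUMMARIZE(
        'Table',
        'Table'[Item ID],
        'Table'[Item Name],
        'Table'[Item Color]
    ),
    // For the current item, keep the single most common org/person pair
    TOPN(
        1,
        CALCULATETABLE(
            ADDCOLUMNS(
                SUMMARIZE('Table', 'Table'[Owning Org], 'Table'[Owning Person]),
                // Number of source rows for this pair within the current item
                "@Rows", CALCULATE(COUNTROWS('Table'))
            )
        ),
        [@Rows], DESC,
        // Tie-breakers so exactly one pair survives
        'Table'[Owning Org], ASC,
        'Table'[Owning Person], ASC
    )
)
```

Because the org and person come from the same row of the pair table, the result can never mix an org from one source row with a person from another. The @Rows column is carried into the output and can be dropped afterwards if it is not wanted.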

 

I have a follow-up question which I'm hoping you or someone else can answer.  I have my main page working fine, and users can filter in many different ways, and all changes they make reflect in all views -- great.  Now I wish my deduped item data set was also filtered to the selections made on report page 1, and would dynamically update as filter changes were applied on that page.
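
On the follow-up: a calculated table is computed when the model is refreshed, so it does not react to slicers or filters that users change in the report. One possible approach is to skip the calculated table for this view, put Item ID, Item Name and Item Color from the source table into a table visual, and return the ownership values through measures, which are re-evaluated under whatever filters currently apply to that visual. A rough sketch, assuming the column names from the sample table (the measure name and the @Rows helper column are hypothetical, and this is untested):

```dax
Most Common Owning Org =
// Most common org/person pair among the rows visible in the current filter context
VAR __Top =
    TOPN(
        1,
        ADDCOLUMNS(
            SUMMARIZE('Table', 'Table'[Owning Org], 'Table'[Owning Person]),
            "@Rows", CALCULATE(COUNTROWS('Table'))
        ),
        [@Rows], DESC,
        // Tie-breakers so a single pair is chosen
        'Table'[Owning Org], ASC,
        'Table'[Owning Person], ASC
    )
RETURN
    // __Top has one row, so MAXX simply reads its org value
    MAXX(__Top, 'Table'[Owning Org])
```

A matching person measure would be the same expression with 'Table'[Owning Person] in the final MAXX. Since both measures use the same ordering and tie-breakers, they should pick the same pair for a given item.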

 
