Explore and share Fabric Notebooks to boost Power BI insights in the new community notebooks gallery.
Check it out now!Microsoft is giving away 50,000 FREE Microsoft Certification exam vouchers. Get Fabric certified for FREE! Learn more
My source table has dupes, by design. It's denormalized. Imagine something like this:
Item ID | Item Name | Item Color | Owning Org | Owning Person
=========================================
1 | Car | Green | Finance | Joe
1 | Car | Green | Finance | Sally
1 | Car | Green | HR | Bob
This data is exactly what I need for most of my views -- the same item must be reported under multiple owners -- but I want a 2nd view that has no duplicates. All item attributes are identical across all rows (item #1 will always be a car and green), but some or all of the ownership attributes will vary across near duplicate rows.
In effect, I want to summarize by Item ID, but I never want to sum or count. Instead, for item attributes, I want to take any or the first value -- they will be the same, so whatever is cheapest / fastest in terms of compute -- and for the ownership attributes, my ideal would be to take the "mode" or most common value, but I'd happily start with just any or the first value. However, for all other ownership attributes, I want to take the matching values. E.g. I can't have Owning Org be HR and Owning Person to be Joe.
Almost none of the attributes / fields are scalars, and even for those that are, I still don't want sum or count. I could probably use average here but that seems needlessly complex given the values will all be the same.
Solved! Go to Solution.
@rhaining Maybe:
New Table =
SUMMARIZE(
'Table',
[ID],
[Item Name],
[Item Color],
"Owning Org", MIN('Table'[Owning Org]),
"Owning Person", MIN('Table'[Owning Person])
)
or
New Table =
VAR __Table =
SUMMARIZE(
'Table',
[ID],
[Item Name],
[Item Color],
[Owning Org],
[Owning Person],
"Count", COUNTROWS('Table')
)
VAR __Max = MAXX(__Table, [Count])
VAR __Result = TOPN( 1, FILTER(__Table, [Count] = __Max))
RETURN
__Result
@rhaining Maybe:
New Table =
SUMMARIZE(
'Table',
[ID],
[Item Name],
[Item Color],
"Owning Org", MIN('Table'[Owning Org]),
"Owning Person", MIN('Table'[Owning Person])
)
or
New Table =
VAR __Table =
SUMMARIZE(
'Table',
[ID],
[Item Name],
[Item Color],
[Owning Org],
[Owning Person],
"Count", COUNTROWS('Table')
)
VAR __Max = MAXX(__Table, [Count])
VAR __Result = TOPN( 1, FILTER(__Table, [Count] = __Max))
RETURN
__Result
I haven't yet tried either of your specific suggestions, but they were enough to unblock me. Thank you! I decided to start from here:
New Table =
SUMMARIZE(
'Table',
[ID],
[Item Name],
[Item Color]
)
In effect throwing away all the ownership information. At least I have a correctly deduped table.
I worry about your 1st suggestion -- how does MAX function w.r.t. strings? And would it guarantee that the Owning Org + Owning Person were a valid pair? Your 2nd suggestion looks like it would work perfectly and I may try that soon.
I have a follow-up question which I'm hoping you or someone else can answer. I have my main page working fine, and users can filter in many different ways, and all changes they make reflect in all views -- great. Now I wish my deduped item data set was also filtered to the selections made on report page 1, and would dynamically update as filter changes were applied on that page.