Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
gp10
Advocate III
Advocate III

Group By Distinct Count for Multiple Columns

Hi community,

I have an issue with aggregating a table and getting the correct distinct count on every level of aggregation. I need this aggregation on Power Query (or SQL) not on DAX.

The table has several columns and I need the distinct count of one of them and the sum of the amounts of others.


The issue is when we need the total on different levels. For example the daily distinct count is correct, but if we sum it up to get the monthly one, it will be wrong because a same value can appear in several days. And there are plenty of combinations between dates (year, quarter, month,day) and every column and every combination of them.

The table looks like this and the column where we need the distinct count is F_id.
Any thoughts more than welcome. thanks.

DateA_idB_idC_idD_idE_idF_idAmount1BooleanAmount2
2/13/20229a2b21d 2f113000
2/13/20222a2b12d4e4f111000
2/12/20225a2b12d13e4f113000
2/12/20227a2b12d 4f112000
2/11/202211a2b23d5e3f210
2/11/20223a2b23d 3f210
2/11/202211a2b23d3e3f210
2/11/20223a2b22d 4f112000
2/11/20223a2b23d 3f110
2/11/20221a2b23d17e3f410
2/11/202211a2b23d 3f210
2/11/202213a2b22d16e4f111000
2/11/20226a2b22d2e4f113000
2/11/202211a2b22d5e4f112000
2/11/202211a2b23d 3f310
2/11/20224a2b22d 4f111000
2/11/202211a2b23d1e3f110
2/11/202211a2b23d10e3f310
2/11/202211a2b23d9e3f210
2/11/202211a2b23d6e3f310
2/11/202211a2b23d8e3f110
2/11/20221a2b23d 3f210
2/11/202212a1a23d21e1f1118000
2/11/20228a2b22d25e4f111000
2/11/20221a2b23d 3f310
2/10/202210a2b21d19e5f114000
2/10/202210a2b21d15e5f115000
2/9/202215a2b12d7e4f112000
1 ACCEPTED SOLUTION
dufoq3
Super User
Super User

Hi @gp10, check this one:

 

dufoq3_0-1706871697154.png

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ndRdDoMgDADgqxiel0hb8Ocsxoct0/sfYeAGg7UI8wESEr+0pZVlUdgD9agR1U3Nd7fhw29uwdNtnT/s/vRZpLVW6y2HGKH/BD00m99SCYnEIC2XQIySREdOu1+IAVIPECBAVid5aX1M2kPpbnFHnHWyMpVoVIhmi9Eq5Y2NaUIWDtM0uYIxpmnKToCFW8mYUB4MpzMT7cApno5MKVmMnZdvtblOKtZpGrooFinFg9iPkz5KTm/1TCU4F+a05oaL8aaWAq/N2/FEvW1geNwnZK2YpF5MwsDxsZHb+M/U6Kg0f4nhaIZNI5ok4rm1zNqvnSMVXuOx8HusLw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Date = _t, A_id = _t, B_id = _t, C_id = _t, D_id = _t, E_id = _t, F_id = _t, Amount1 = _t, Boolean = _t, Amount2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}}, "en-US"),
    #"Inserted Year" = Table.AddColumn(#"Changed Type", "Year", each Date.Year([Date]), Int64.Type),
    #"Inserted Quarter" = Table.AddColumn(#"Inserted Year", "Quarter", each Date.QuarterOfYear([Date]), Int64.Type),
    #"Inserted Month" = Table.AddColumn(#"Inserted Quarter", "Month", each Date.Month([Date]), Int64.Type),
    GroupedRowsYear = Table.Group(#"Inserted Month", {"Year"}, {{"Detail Year", each _, type table}}),
    StepBack = #"Inserted Month",
    YearQuarterMonthDistinct = Table.Distinct(Table.SelectColumns(StepBack, {"Year", "Quarter", "Month"})),
    MergedQueryItself = Table.NestedJoin(YearQuarterMonthDistinct, {"Year"}, GroupedRowsYear, {"Year"}, "GroupedRowsYear", JoinKind.LeftOuter),
    #"Expanded Detail" = Table.ExpandTableColumn(MergedQueryItself, "GroupedRowsYear", {"Detail Year"}, {"Detail Year"}),
    id_ColumnNames = Table.FromList(List.Select(Table.ColumnNames(#"Changed Type"), each Text.EndsWith(_, "id")), Splitter.SplitByNothing()),
    AddedYearQuarterMonth = Table.AddColumn(id_ColumnNames, "Custom", each {"Year", "Quarter", "Month"}),
    Expanded_id_YearQuarterMonth = Table.ExpandListColumn(AddedYearQuarterMonth, "Custom"),
    StepBack2 = #"Expanded Detail",
    Ad_SummaryMonthColumns = List.Accumulate(
     List.Buffer(Table.ToRows(Expanded_id_YearQuarterMonth)),
     StepBack2,
     (s,c)=> Table.AddColumn(s, c{0} & "_Distinct_" & c{1}, each Table.RowCount(Table.Distinct(Table.SelectRows([Detail Year], (a)=> Record.Field(a, c{1}) = Record.Field(_, c{1})), {{c{0}, Comparer.Ordinal}})), Int64.Type))
in
    Ad_SummaryMonthColumns

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

View solution in original post

3 REPLIES 3
dufoq3
Super User
Super User

Hi @gp10, check this one:

 

dufoq3_0-1706871697154.png

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ndRdDoMgDADgqxiel0hb8Ocsxoct0/sfYeAGg7UI8wESEr+0pZVlUdgD9agR1U3Nd7fhw29uwdNtnT/s/vRZpLVW6y2HGKH/BD00m99SCYnEIC2XQIySREdOu1+IAVIPECBAVid5aX1M2kPpbnFHnHWyMpVoVIhmi9Eq5Y2NaUIWDtM0uYIxpmnKToCFW8mYUB4MpzMT7cApno5MKVmMnZdvtblOKtZpGrooFinFg9iPkz5KTm/1TCU4F+a05oaL8aaWAq/N2/FEvW1geNwnZK2YpF5MwsDxsZHb+M/U6Kg0f4nhaIZNI5ok4rm1zNqvnSMVXuOx8HusLw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Date = _t, A_id = _t, B_id = _t, C_id = _t, D_id = _t, E_id = _t, F_id = _t, Amount1 = _t, Boolean = _t, Amount2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}}, "en-US"),
    #"Inserted Year" = Table.AddColumn(#"Changed Type", "Year", each Date.Year([Date]), Int64.Type),
    #"Inserted Quarter" = Table.AddColumn(#"Inserted Year", "Quarter", each Date.QuarterOfYear([Date]), Int64.Type),
    #"Inserted Month" = Table.AddColumn(#"Inserted Quarter", "Month", each Date.Month([Date]), Int64.Type),
    GroupedRowsYear = Table.Group(#"Inserted Month", {"Year"}, {{"Detail Year", each _, type table}}),
    StepBack = #"Inserted Month",
    YearQuarterMonthDistinct = Table.Distinct(Table.SelectColumns(StepBack, {"Year", "Quarter", "Month"})),
    MergedQueryItself = Table.NestedJoin(YearQuarterMonthDistinct, {"Year"}, GroupedRowsYear, {"Year"}, "GroupedRowsYear", JoinKind.LeftOuter),
    #"Expanded Detail" = Table.ExpandTableColumn(MergedQueryItself, "GroupedRowsYear", {"Detail Year"}, {"Detail Year"}),
    id_ColumnNames = Table.FromList(List.Select(Table.ColumnNames(#"Changed Type"), each Text.EndsWith(_, "id")), Splitter.SplitByNothing()),
    AddedYearQuarterMonth = Table.AddColumn(id_ColumnNames, "Custom", each {"Year", "Quarter", "Month"}),
    Expanded_id_YearQuarterMonth = Table.ExpandListColumn(AddedYearQuarterMonth, "Custom"),
    StepBack2 = #"Expanded Detail",
    Ad_SummaryMonthColumns = List.Accumulate(
     List.Buffer(Table.ToRows(Expanded_id_YearQuarterMonth)),
     StepBack2,
     (s,c)=> Table.AddColumn(s, c{0} & "_Distinct_" & c{1}, each Table.RowCount(Table.Distinct(Table.SelectRows([Detail Year], (a)=> Record.Field(a, c{1}) = Record.Field(_, c{1})), {{c{0}, Comparer.Ordinal}})), Int64.Type))
in
    Ad_SummaryMonthColumns

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Anonymous
Not applicable

Hi @gp10 

You can put the following code to advanced editor in power query

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ndRdDoMgDADgqxiel0hb8Ocsxoct0/sfYeAGg7UI8wESEr+0pZVlUdgD9agR1U3Nd7fhw29uwdNtnT/s/vRZpLVW6y2HGKH/BD00m99SCYnEIC2XQIySREdOu1+IAVIPECBAVid5aX1M2kPpbnFHnHWyMpVoVIhmi9Eq5Y2NaUIWDtM0uYIxpmnKToCFW8mYUB4MpzMT7cApno5MKVmMnZdvtblOKtZpGrooFinFg9iPkz5KTm/1TCU4F+a05oaL8aaWAq/N2/FEvW1geNwnZK2YpF5MwsDxsZHb+M/U6Kg0f4nhaIZNI5ok4rm1zNqvnSMVXuOx8HusLw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Date = _t, A_id = _t, B_id = _t, C_id = _t, D_id = _t, E_id = _t, F_id = _t, Amount1 = _t, Boolean = _t, Amount2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"A_id", type text}, {"B_id", type text}, {"C_id", Int64.Type}, {"D_id", type text}, {"E_id", type text}, {"F_id", type text}, {"Amount1", Int64.Type}, {"Boolean", Int64.Type}, {"Amount2", Int64.Type}}),
    #"Inserted Year" = Table.AddColumn(#"Changed Type", "Year", each Date.Year([Date]), Int64.Type),
    #"Inserted Quarter" = Table.AddColumn(#"Inserted Year", "Quarter", each Date.QuarterOfYear([Date]), Int64.Type),
    #"Inserted Month" = Table.AddColumn(#"Inserted Quarter", "Month", each Date.Month([Date]), Int64.Type),
    #"Added Custom" = Table.AddColumn(#"Inserted Month", "DistinctCount_Month", each List.Count(List.Distinct(Table.SelectRows(#"Inserted Month",(x)=>x[Year]=[Year] and x[Month]=[Month])[F_id]))),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "DistinctCount_year", each List.Count(List.Distinct(Table.SelectRows(#"Inserted Month",(x)=>x[Year]=[Year])[F_id]))),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "DistinctCount_Quarter", each List.Count(List.Distinct(Table.SelectRows(#"Inserted Month",(x)=>x[Year]=[Year] and x[Quarter]=[Quarter])[F_id])))
in
    #"Added Custom2"

Output

vxinruzhumsft_0-1706837778526.png

Best Regards!

Yolo Zhu

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Hi @Anonymous , thank you so much for your time and effort.

While this covers a part of the issue, it does adress only the date part.
Not the combinations of all other columns.

My goal is to get rid of the F_id column.

In this solution if I want to see the distinct count by A_id for example, I can still see only the distinct count for the month regardless of A_id's values.

All the other combinations are missing.
Thanks again.

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.