The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
Hi,
I am looking for help with the group by function in Power Query.
I am trying to consolidate files from a folder where historical data is placed in separate CSV files with a common key which is the date/time stamp, however, some periods might be overlapping when a new file is added.
I cannot simply remove duplicate date/time values since some "measuement" are taken at the exact same time and is thus not a duplicate, but others are.
My idea is to group by the date column and then keep the maximum value of the other columns and this works fine.
But since new data is added regularly and sometimes with new columns i want to do this without hardcoding column names.
This is the formula generated from the built in function in PQ:
= Table.Group(Source, {"Date"}, {{"Column 1", each List.Max([Column 1]), type nullable number}, {"Column 2", each List.Max([Column 2]), type number}, {"Column n", each List.Max([Column n]), type nullable number}})
As you can see Column name 1 to n is hardcoded.
So in short i want to group all other columns by the "Date" column and keep the max values.
Any pointers, thanks in advance.
Solved! Go to Solution.
You can create a dynamic List of functions, based on the column names, for both setting the data types, and also performing the aggregations in the Grouping.
In the aggregation transformation, C represents the column name; the underscore ( _ ) represents the subtable returned in the Table.Group function
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("TY7RDQAhCEN34dtExTv1ZjHuv4ZQMMdHTekjxbWoZs5cKRH0yPOa2UmhDqxBuyvDSZcBIV9TAtEi1A7RFwDa1UzIgC51v9Tg/9wK/BcRTIcFrXsf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Date = _t, Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t]),
//List of column names EXCEPT for "Date"
maxCols= List.RemoveItems(Table.ColumnNames(Source),{"Date"}),
//set data types
#"Changed Type" = Table.TransformColumnTypes(Source,
{{"Date", type date}} & List.Transform(maxCols, each {_, Int64.Type})),
//Group and create list of aggregation functions
group = Table.Group(#"Changed Type",
"Date",
List.Transform(maxCols, (C) => {C, each List.Max(Table.Column(_, C)), type number})
)
in
group
Source Data
Typed
Results
You can create a dynamic List of functions, based on the column names, for both setting the data types, and also performing the aggregations in the Grouping.
In the aggregation transformation, C represents the column name; the underscore ( _ ) represents the subtable returned in the Table.Group function
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("TY7RDQAhCEN34dtExTv1ZjHuv4ZQMMdHTekjxbWoZs5cKRH0yPOa2UmhDqxBuyvDSZcBIV9TAtEi1A7RFwDa1UzIgC51v9Tg/9wK/BcRTIcFrXsf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Date = _t, Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t]),
//List of column names EXCEPT for "Date"
maxCols= List.RemoveItems(Table.ColumnNames(Source),{"Date"}),
//set data types
#"Changed Type" = Table.TransformColumnTypes(Source,
{{"Date", type date}} & List.Transform(maxCols, each {_, Int64.Type})),
//Group and create list of aggregation functions
group = Table.Group(#"Changed Type",
"Date",
List.Transform(maxCols, (C) => {C, each List.Max(Table.Column(_, C)), type number})
)
in
group
Source Data
Typed
Results
Hi ronrsnfld,
I tried this out today and it worked as intended, and i learned a lot from your comment.
Thank you!