Solved: Re: Power query: Group by Date Column and keep Max...

Nilselmano · ‎09-30-2021

Hi,
I am looking for help with the group by function in Power Query.
I am trying to consolidate files from a folder where historical data is placed in separate CSV files with a common key which is the date/time stamp, however, some periods might be overlapping when a new file is added.
I cannot simply remove duplicate date/time values since some "measuement" are taken at the exact same time and is thus not a duplicate, but others are.
My idea is to group by the date column and then keep the maximum value of the other columns and this works fine.

But since new data is added regularly and sometimes with new columns i want to do this without hardcoding column names.
This is the formula generated from the built in function in PQ:

= Table.Group(Source, {"Date"}, {{"Column 1", each List.Max([Column 1]), type nullable number}, {"Column 2", each List.Max([Column 2]), type number}, {"Column n", each List.Max([Column n]), type nullable number}})

As you can see Column name 1 to n is hardcoded.

So in short i want to group all other columns by the "Date" column and keep the max values.

Any pointers, thanks in advance.

ronrsnfld · ‎09-30-2021

You can create a dynamic List of functions, based on the column names, for both setting the data types, and also performing the aggregations in the Grouping.

In the aggregation transformation, C represents the column name; the underscore ( _ ) represents the subtable returned in the Table.Group function

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("TY7RDQAhCEN34dtExTv1ZjHuv4ZQMMdHTekjxbWoZs5cKRH0yPOa2UmhDqxBuyvDSZcBIV9TAtEi1A7RFwDa1UzIgC51v9Tg/9wK/BcRTIcFrXsf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Date = _t, Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t]),
    
//List of column names EXCEPT for "Date"
    maxCols= List.RemoveItems(Table.ColumnNames(Source),{"Date"}),

//set data types
    #"Changed Type" = Table.TransformColumnTypes(Source,
        {{"Date", type date}} & List.Transform(maxCols, each {_, Int64.Type})),

//Group and create list of aggregation functions
    group = Table.Group(#"Changed Type",
            "Date",
            List.Transform(maxCols, (C) => {C, each List.Max(Table.Column(_, C)), type number})
)

in
    group

Source Data

Typed

Results

View solution in original post

ronrsnfld · ‎09-30-2021

You can create a dynamic List of functions, based on the column names, for both setting the data types, and also performing the aggregations in the Grouping.

In the aggregation transformation, C represents the column name; the underscore ( _ ) represents the subtable returned in the Table.Group function

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("TY7RDQAhCEN34dtExTv1ZjHuv4ZQMMdHTekjxbWoZs5cKRH0yPOa2UmhDqxBuyvDSZcBIV9TAtEi1A7RFwDa1UzIgC51v9Tg/9wK/BcRTIcFrXsf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Date = _t, Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t]),
    
//List of column names EXCEPT for "Date"
    maxCols= List.RemoveItems(Table.ColumnNames(Source),{"Date"}),

//set data types
    #"Changed Type" = Table.TransformColumnTypes(Source,
        {{"Date", type date}} & List.Transform(maxCols, each {_, Int64.Type})),

//Group and create list of aggregation functions
    group = Table.Group(#"Changed Type",
            "Date",
            List.Transform(maxCols, (C) => {C, each List.Max(Table.Column(_, C)), type number})
)

in
    group

Source Data

Typed

Results