The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
I have a large set of data where for each Number/Max Date Reported I need to identify the most recent dateAdded. As you can see, the score and dateAdded for each contactID is repeated for each Number/Max Date Reported. This is a tricky problem and I have not been able to figure out how to accomplish this in Power Query.
Here is a subset of data for one contactID:
contactId | changeId | policyId | Number | Max Date Reported | score | dateAdded |
ABC123 | 111-222-333 | 2023ABC | ABC-2015-117 | 9/14/2015 | 687 | 12/29/2016 11:35 |
ABC123 | 111-222-333 | 2023ABC | ABC-2015-117 | 9/14/2015 | 630 | 12/20/2017 14:05 |
ABC123 | 111-222-333 | 2023ABC | ABC-2015-117 | 9/14/2015 | 909 | 3/13/2012 13:38 |
ABC123 | 111-222-333 | 2023ABC | ABC-2015-117 | 9/14/2015 | 678 | 1/4/2016 12:11 |
ABC123 | 111-222-333 | 2023ABC | ABC-2015-117 | 9/14/2015 | 687 | 1/3/2017 13:07 |
ABC123 | 222-233-666 | 2023ABC | ABC-2022-316 | 11/22/2022 | 687 | 12/29/2016 11:35 |
ABC123 | 222-233-666 | 2023ABC | ABC-2022-316 | 11/22/2022 | 630 | 12/20/2017 14:05 |
ABC123 | 222-233-666 | 2023ABC | ABC-2022-316 | 11/22/2022 | 909 | 3/13/2012 13:38 |
ABC123 | 222-233-666 | 2023ABC | ABC-2022-316 | 11/22/2022 | 678 | 1/4/2016 12:11 |
ABC123 | 222-233-666 | 2023ABC | ABC-2022-316 | 11/22/2022 | 687 | 1/3/2017 13:07 |
ABC123 | 885-555-555 | 2023ABC | ABC-2019-103 | 5/16/2019 | 687 | 12/29/2016 11:35 |
ABC123 | 885-555-555 | 2023ABC | ABC-2019-103 | 5/16/2019 | 630 | 12/20/2017 14:05 |
ABC123 | 885-555-555 | 2023ABC | ABC-2019-103 | 5/16/2019 | 909 | 3/13/2012 13:38 |
ABC123 | 885-555-555 | 2023ABC | ABC-2019-103 | 5/16/2019 | 678 | 1/4/2016 12:11 |
ABC123 | 885-555-555 | 2023ABC | ABC-2019-103 | 5/16/2019 | 687 | 1/3/2017 13:07 |
And what the end result should look like:
contactId | changeId | policyId | Number | Max Date Reported | score | dateAdded |
ABC123 | 111-222-333 | 2023ABC | ABC-2015-117 | 9/14/2015 | 678 | 1/4/2016 12:11 |
ABC123 | 222-233-666 | 2023ABC | ABC-2022-316 | 11/22/2022 | 630 | 12/20/2017 14:05 |
ABC123 | 885-555-555 | 2023ABC | ABC-2019-103 | 5/16/2019 | 630 | 12/20/2017 14:05 |
Please note that the dateAdded should always be the most recent date in the past relative to the Max Date Reported.
Thank you!
Solved! Go to Solution.
I'm assuming that the dateAdded: 1/4/2016 12:11 for Max Date Reported: 9/14/2015 is a mistake given the max dateAdded is 12/20/2017 14:05 like for the two other Max Date Reported groups.
let
Source = Original,
PreSort = Table.Sort(
Source,
{{"contactId", Order.Ascending}, {"Max Date Reported", Order.Ascending}, {"dateAdded", Order.Descending}}
),
LocalGroupFirstRow = Table.Group(
PreSort,
{"Number", "Max Date Reported"},
{{"FirstRow", each Table.FirstN(_, 1), Value.Type(Source)}},
GroupKind.Local
),
CombineGroups = Table.Combine(LocalGroupFirstRow[FirstRow])
in
CombineGroups
Thank you for pointing that out my mistake. The dateAdded for the Max Date Reported of 9/14/2015 should actually be 3/13/2012, so the result would look like this:
I do not want the score/DateAdded for the Max Date Reported of 9/14/2015 to be 12/20/2017. The score must be the most recent one based on the Max Date Reported.
Thank you!
I'm assuming that the dateAdded: 1/4/2016 12:11 for Max Date Reported: 9/14/2015 is a mistake given the max dateAdded is 12/20/2017 14:05 like for the two other Max Date Reported groups.
let
Source = Original,
PreSort = Table.Sort(
Source,
{{"contactId", Order.Ascending}, {"Max Date Reported", Order.Ascending}, {"dateAdded", Order.Descending}}
),
LocalGroupFirstRow = Table.Group(
PreSort,
{"Number", "Max Date Reported"},
{{"FirstRow", each Table.FirstN(_, 1), Value.Type(Source)}},
GroupKind.Local
),
CombineGroups = Table.Combine(LocalGroupFirstRow[FirstRow])
in
CombineGroups
Thank you! This looks like it is working in my report. Could you explain exactly what you are doing in the last 2 steps of your query? I can see it working but cannot understand exactly WHY it is working.
Sure. Including solution + explanation for comprehension's sake.
Get the row per [Number] and [Max Dat Reported] with closest earlier dateAdded relative to Max Date Reported. Logic for dateAdded vs Max Date Reported:
let
Source = Original,
RemoveAddedAfterMax = Table.SelectRows(
Source,
each [dateAdded] <= DateTime.From([Max Date Reported])
),
AddDiff = Table.AddColumn(
RemoveAddedAfterMax, "Diff",
each DateTime.From([Max Date Reported]) - [dateAdded],
type duration
),
SortDiff = Table.Sort(AddDiff, {{"Diff", Order.Ascending}}),
GetSmallestDiffRow = Table.Group(
SortDiff,
{"contactId", "Max Date Reported"},
{{"group", each Table.RemoveColumns(Table.FirstN(_, 1), "Diff"), Value.Type(Source)}}
),
CombineSmallestDiffRows = Table.Combine(GetSmallestDiffRow[group])
in
CombineSmallestDiffRows
The objective of the step is to find the row that meets our criteria by every [Number] and [Max Dat Reported] pair. In scenarios like this, Group By is the route to take.
This produces the following
output (with preview of All Rows output per group row)
and M code (check out here for good explanation on Table.Group).
In both UI code and solution code, you can see we are taking SortDiff table and grouping by columns {Number, Max Date Reported}. The difference is the aggregation code in the third argument. Compare:
UI All Rows | Solution | Comment |
"AllRows" | "group" | Name of agg column to add |
each _ | each Table.RemoveColumns( Table.FirstN( _, 1 ), "Diff" ) | Rather than simply return all the grouped rows (with otb function "each _", where _ = grouped table rows), we use a custom function to (a) grab the first row of _ with Table.FirstN which gets us desired row with smallest Diff given we sorted Diff ascending in previous step, and (b) remove the extra Diff column with Table.RemoveColumns |
type table [ ... ] | Value.Type( Source ) |
Similar to Table.AddColumn and Table.TransformColumns, this part tells Power Query the output column type. Rather than hard-code the table type like the UI, we use Value.Type to dynamically use the Source's table type, which should match exactly the grouped rows given we are removing the only difference: the Diff column we added.
And, actually, as I was reviewing as part of this response, I'm realizing we don't need to specify the output type here at all. The resulting output of the next step (CombineSmallestDiffRows) retains proper table (and column) types without needing to specify output type here. Ie, the solution agg column definition could just follow syntax "{agg column name, agg function}" and ignore output type altogether. |
In conclusion for this step, we are producing a grouped agg column "group" where each row is a single row table with the target "closest earlier dateAdded relative to Max Date Reported"
So, we have what we want from the previous step. The problem is that they are nested inside a table column. Rather than this column of separated target rows per [Number] and [Max Date Reported] pair, we want a single table that is the combination of all those single-row tables. Our objective for this step is to perform this transformation (column of tables -> single table combination of tables in column).
Take
Table1
Column1 | Column2 |
1 | 2 |
3 | 4 |
and
Table2
Column1 | Column2 |
5 | 6 |
7 | 8 |
Now, let's Append these through the UI.
Here is resulting output
and resulting code.
So, we append tables together with Table.Combine, which takes as its only argument a list of tables.
As discussed just above, at a high level, the previous step added a column of tables. And when you reference a table column like "Table[Column]", the output is a list of the values of the column. In the case of our "group" column that we added, the values are tables (single target row). In other words, "GetSmallestDiffRow[group]" gives us a list tables, which is exactly the input for Table.Combine that we want in order to get our final output.
Thank you for pointing that out my mistake. The dateAdded for the Max Date Reported of 9/14/2015 should actually be 3/13/2012, so the result would look like this:
I do not want the score/DateAdded for the Max Date Reported of 9/14/2015 to be 12/20/2017. The score must be the most recent one based on the Max Date Reported.
Thank you!
I'm having trouble understanding the logic - I assumed "most recent dateAdded" meant absolute latest, but that clearly is not correct based on your response.
By most recent dateAdded "based on the Max Date Reported", do you mean closest earlier dateAdded relative to Max Date Reported? Ie is logic:
Yes, that is the logic. I need the most recent credit score/Date Added for each Max Date Reported.
I think this meets your reqs:
let
Source = Original,
RemoveAddedAfterMax = Table.SelectRows(
Source,
each [dateAdded] <= DateTime.From([Max Date Reported])
),
AddDiff = Table.AddColumn(
RemoveAddedAfterMax, "Diff",
each DateTime.From([Max Date Reported]) - [dateAdded],
type duration
),
SortDiff = Table.Sort(AddDiff, {{"Diff", Order.Ascending}}),
GetSmallestDiffRow = Table.Group(
SortDiff,
{"contactId", "Max Date Reported"},
{{"group", each Table.RemoveColumns(Table.FirstN(_, 1), "Diff"), Value.Type(Source)}}
),
CombineSmallestDiffRows = Table.Combine(GetSmallestDiffRow[group])
in
CombineSmallestDiffRows
Wow, thank you so much for this detailed explanation. This has taken report building to a new level for me, and I appreciate it greatly!
Here are screenshots of the sample data for readability.
Current Data
End Goal