The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends September 15. Request your voucher.
Hi everyone!
I am trying to use the function group by to get below source dataset
Source data
Person | Start Date | End Date |
Person 1 | 1-7-2021 | 30-6-2022 |
Person 1 | 1-7-2022 | 31-12-2022 |
Person 1 | 1-2-2023 | 31-8-2024 |
Person 2 | 1-9-2022 | 29-2-2024 |
Person 2 | 1-3-2024 | 31-8-2025 |
Person 3 | 15-3-2023 | 14-3-2024 |
Person 3 | 1-5-2024 | 31-10-2024 |
Person 3 | 5-11-2024 | 31-12-2025 |
to the following output where an interval of maximum 30 days is assumed
Output data
Person | Start Date | End Date |
Person 1 | 1-7-2021 | 31-12-2022 |
Person 1 | 1-2-2023 | 31-8-2024 |
Person 2 | 1-9-2022 | 31-8-2025 |
Person 3 | 15-3-2023 | 14-3-2024 |
Person 3 | 1-5-2024 | 31-12-2025 |
If I assume an interval of 0 days, I can use the solution below after first sorting my Start Dates ascending.
Only I need an interval of 30 days. l have been looking for a solution for many hours now, but without the desired result.
Does any expert have a good solution? Many thanks in advance
Solved! Go to Solution.
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
rows = List.Buffer(Table.ToRecords(Source)),
gen = List.Generate(
() => [i = 0, combine = false, c = rows{0}, s = {}],
(x) => x[i] < List.Count(rows),
(x) => [
i = x[i] + 1,
combine = (rows{i}[Person] = x[c][Person]) and (rows{i}[Start Date] - x[c][End Date] <= #duration(30, 0, 0, 0)),
c = if combine then x[c] & [End Date = rows{i}[End Date]] else rows{i},
s = if combine then x[s] else x[s] & {x[c]}
],
(x) => if x[i] = List.Count(rows) - 1 then x[s] & {x[c]} else x[s]
),
res= Table.FromRecords(List.Last(gen))
in
res
Hi @data_SWVO, check this:
Edited
Output
let
fn_ShiftRows =
(tbl as table, col as text, shift as nullable number, optional newColName as text, optional _type as type) as table =>
//v 3. parametri zadaj zaporne cislo ak chces posunut riadky hore, kladne ak dole, 4. je nepovinny (novy nazov stlpca), 5. je nepovinny typ
let
a = Table.Column(tbl, col),
b = if shift = 0 or shift = null then a else if shift > 0
then List.Repeat({null}, shift) & List.RemoveLastN(a, shift)
else List.RemoveFirstN(a, shift * -1) & List.Repeat({null}, shift * -1),
c = Table.FromColumns(Table.ToColumns(tbl) & {b}, Table.ColumnNames(tbl) &
( if newColName <> null then {newColName} else
if shift = 0 then {col & "_Duplicate"} else
if shift > 0 then {col & "_PrevValue"}
else {col & "_NextValue"} )),
d = Table.TransformColumnTypes(c, {List.Last(Table.ColumnNames(c)), if _type <> null then _type else type any})
in
d,
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dc3BCoAgEATQXxHPLjirVv5F9/DYtaD+H0o3SchuA/tmdln0vB7nvikobTRoJLaMOzpLQ86sk+kizggE/lPl4ERNOfsWsaBYpzhKoYecHN6l0CJXUBCV/8HXxkdRaKZg+yoQ0DJ+XqYL", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Person = _t, #"Start Date" = _t, #"End Date" = _t]),
ChangedType = Table.TransformColumnTypes(Source,{{"Start Date", type date}, {"End Date", type date}}, "sk-SK"),
GroupedRows = Table.Group(ChangedType, {"Person"}, {{"All", each
[ //_tbl = GroupedRows{[Person="Person 1 "]}[All],
_tbl = _,
SortedRows = Table.Sort(_tbl,{{"Start Date", Order.Ascending}}),
Ad_EndDatePrevValue = fn_ShiftRows(SortedRows, "End Date", 1, null, type date),
Ad_Index = Table.AddIndexColumn(Ad_EndDatePrevValue, "Index", 0, 1, Int64.Type),
Ad_GroupHelper = Table.AddColumn(Ad_Index, "GroupHelper", each try if Duration.Days([Start Date] - [End Date_PrevValue]) <= 30 then [Index] else [Index] + 1000000 otherwise 0, Int64.Type),
GroupedRows1 = Table.Group(Ad_GroupHelper, {"Person", "GroupHelper"}, {{"All", each _, type table}, {"Start Date", each List.Min([Start Date]), type date}, {"End Date", each List.Max([End Date]), type date}}, GroupKind.Local,
(x,y)=> try Number.From( y[GroupHelper] - x[GroupHelper] > 1) otherwise false ),
RemovedColumns = Table.RemoveColumns(GroupedRows1,{"GroupHelper", "All"})
][RemovedColumns], type table}}),
Combined = Table.Combine(GroupedRows[All])
in
Combined
Hi Dufoq3,
Thank you very much for your help. In your output I only see 1 row where I expected 2.
Regards,
Jolanda
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
rows = List.Buffer(Table.ToRecords(Source)),
gen = List.Generate(
() => [i = 0, combine = false, c = rows{0}, s = {}],
(x) => x[i] < List.Count(rows),
(x) => [
i = x[i] + 1,
combine = (rows{i}[Person] = x[c][Person]) and (rows{i}[Start Date] - x[c][End Date] <= #duration(30, 0, 0, 0)),
c = if combine then x[c] & [End Date = rows{i}[End Date]] else rows{i},
s = if combine then x[s] else x[s] & {x[c]}
],
(x) => if x[i] = List.Count(rows) - 1 then x[s] & {x[c]} else x[s]
),
res= Table.FromRecords(List.Last(gen))
in
res
Thanks a lot, it works!
Hi @data_SWVO
I'm not following the logic used to get the result. I don't follow what you mean by intervals of 30 days and how that applies to the data. Please try explaining again and illustrate how you are turning the source data into the result.
Regards
Phil
Proud to be a Super User!
Thanks for your quick response Phil. It's nice that you asked for an explanation, because I actually have no experience on this forum. I will try to explain it further.
Let's take person 1 as an example: The difference between the Start Date of row 2 (1-7-2022) compared to the End Date of row 1 (30-6-2022) is less than 30 days. These rows must be merged into 1 row with the Start Date of row 1 (1-7-2021) and the End date of row 2 (31-12-2022).
The difference between the Start Date of row 3 (1-2-2023) and the end date of row 2 (31-12-2022) is more than 30 days. Therefore, the row 3 may not be merged with rows 1 and 2 and the Start Date and End Date remain the same.
Is it clearer to you this way?
Best regards, Jolanda