Join us for an expert-led overview of the tools and concepts you'll need to pass exam PL-300. The first session starts on June 11th. See you there!
Get registeredPower BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.
Hi, I have a dataset where each staff member can have 1 or more records and each record has a start date and an end date.
name | record | start date | end date |
James | 1 | 12/11/2015 | 11/11/2017 |
James | 2 | 30/05/2017 | 3/07/2020 |
James | 3 | 3/07/2020 | 21/03/2033 |
Sarah | 1 | 30/05/2019 | 3/07/2033 |
Sarah | 2 | 3/07/2020 | 21/03/2032 |
Clare | 1 | 3/07/2020 | 21/03/2033 |
Simon | 1 | 12/11/2015 | 11/11/2017 |
Simon | 2 | 3/08/2021 | 3/09/2023 |
I want to turn it into the following table with the following conditions:
1. If the start date of the new record is on or before the end date of the previous record, then the records are in a continuous series.
2. If records are in a continuous series, keep the start date of the oldest record and keep the end date of the latest record.
3. If records are not in a continuous series (see 'Simon's records), then treat them as separate records with their own start and end dates.
name | start date | end date |
James | 12/11/2015 | 21/03/2033 |
Sarah | 30/05/2019 | 21/03/2032 |
Clare | 3/07/2020 | 21/03/2033 |
Simon | 12/11/2015 | 11/11/2017 |
Simon | 3/08/2021 | 3/09/2023 |
I'm new to power query and not sure how to to compare dates in multiple rows.
I've tried the method from a solved post: https://statics.teams.cdn.office.net/evergreen-assets/safelinks/1/atp-safelinks.html
but I can't make it work for my scenario. For example, using that method I could not merge all 3 records from James into 1 record, and I got the following instead.
name | start date | end date |
James | 12/11/2015 | 3/07/2020 |
James | 3/07/2020 | 21/03/2033 |
Solved! Go to Solution.
2 rows for James...
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
types = Table.TransformColumnTypes(Source, {{"start date", type date}, {"end date", type date}}),
sort = Table.Sort(types, {"name", "start date"}),
rows = List.Buffer(Table.ToList(sort, each _)),
gen = List.Generate(
() => [i = 0, new = true, max = rows{0}{3}, group = 1],
(x) => x[i] < List.Count(rows),
(x) => [
i = x[i] + 1,
new = (rows{i}{0} <> rows{i - 1}{0}) or (rows{i}{2} > x[max]),
max = if new then rows{i}{3} else List.Max({x[max], rows{i}{3}}),
group = if new then x[group] + 1 else x[group]
],
(x) => rows{x[i]} & {x[group]}
),
group = Table.Group(
Table.FromList(gen, each _, Table.ColumnNames(Source) & {"gr"}),
{"gr", "name"},
{
{"start date", (x) => List.Min(x[start date])},
{"end date", (x) => List.Max(x[end date])}
}
)
in
group
2 rows for James...
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
types = Table.TransformColumnTypes(Source, {{"start date", type date}, {"end date", type date}}),
sort = Table.Sort(types, {"name", "start date"}),
rows = List.Buffer(Table.ToList(sort, each _)),
gen = List.Generate(
() => [i = 0, new = true, max = rows{0}{3}, group = 1],
(x) => x[i] < List.Count(rows),
(x) => [
i = x[i] + 1,
new = (rows{i}{0} <> rows{i - 1}{0}) or (rows{i}{2} > x[max]),
max = if new then rows{i}{3} else List.Max({x[max], rows{i}{3}}),
group = if new then x[group] + 1 else x[group]
],
(x) => rows{x[i]} & {x[group]}
),
group = Table.Group(
Table.FromList(gen, each _, Table.ColumnNames(Source) & {"gr"}),
{"gr", "name"},
{
{"start date", (x) => List.Min(x[start date])},
{"end date", (x) => List.Max(x[end date])}
}
)
in
group
Hi, after I corrected the start date of the second record for James, I have been able to use your solution to get the desired output. This method is also the fastest. Thank you very much!
Hi, I realised I provided a wrong date for James' record 2. The start date should have the year 2017. I've edited my post. So all three records in a continuous series.
Hello everyone,
Here is another solution.
However, following the defined rules, I find 2 rows for James.
let
Source = #table(
type table [name = text, #"start date" = date, #"end date" = date],
{
{"James", #date(2015, 11, 12), #date(2017, 11, 11)},
{"James", #date(2019, 5, 30), #date(2020, 7, 3)},
{"James", #date(2020, 7, 3), #date(2033, 3, 21)},
{"Sarah", #date(2019, 5, 30), #date(2033, 7, 3)},
{"Sarah", #date(2020, 7, 3), #date(2032, 3, 21)},
{"Clare", #date(2020, 7, 3), #date(2033, 3, 21)},
{"Simon", #date(2015, 11, 12), #date(2017, 11, 11)},
{"Simon", #date(2021, 8, 3), #date(2023, 9, 3)}
}),
TransformSource = #table(type table [ID=Int64.Type, data=record], List.Zip({{1 .. Table.RowCount(Source)}, Table.ToRecords(Source)})),
fnGetEndDate = (r as record) as date =>
let
nextRecord = try TransformSource{[ID = r[ID]+1]} otherwise null,
result = if nextRecord is null then r[data][#"end date"]
else if nextRecord[data][name] <> r[data][name] then r[data][#"end date"]
else if (nextRecord[data][name] = r[data][name]) and (nextRecord[data][#"start date"] > r[data][#"end date"]) then r[data][#"end date"]
else @fnGetEndDate(nextRecord)
in
result,
fnGetStartDate = (r as record) as date =>
let
prevRecord = try TransformSource{[ID = r[ID]-1]} otherwise null,
result = if prevRecord is null then r[data][#"start date"]
else if prevRecord[data][name] <> r[data][name] then r[data][#"start date"]
else if (prevRecord[data][name] = r[data][name]) and (prevRecord[data][#"end date"] < r[data][#"start date"]) then r[data][#"start date"]
else @fnGetStartDate(prevRecord)
in
result,
TransformData = Table.FromRecords(Table.TransformRows(TransformSource, each Record.AddField(Record.AddField(Record.AddField(_, "name", [data][name]), "start date", fnGetStartDate(_)), "end date", fnGetEndDate(_)))),
RemoveDuplicates = Table.Distinct(TransformData[[name], [#"start date"], [#"end date"]])
in
RemoveDuplicates
Hi, after I corrected the start date of the second record for James, I have been able to use your solution to get the desired output. This method seems to take a long time when the dataset is large (I have a bout 8000+ records in dataset). But still thank you very much!
Hi, i realised I provided a wrong start date for James's second record. I've edited my post. Now all 3 records are in a continuous series.
/*
...den 3. Punkt hab ich nicht verstanden. Woran erkennt man denn Simons Rekorde?
Für die "normale" Anzeige der gruppierten Start- und Enddaten kannst Du folgendermaßen vorgehen:
*/
let
Ende = Table.Group(#"Geänderter Typ", {"Name"}, {{"Enddatum", each List.Max([Enddatum]), type nullable date}}),
Start = Table.Group(#"Geänderter Typ", {"Name"}, {{"Startdatum", each List.Min([Startdatum]), type nullable date}}) ,
Quelle = Excel.CurrentWorkbook(){[Name="tbQuelle"]}[Content],
#"Entfernte Spalten" = Table.RemoveColumns(Quelle,{"Aufzeichnung"}),
#"Geänderter Typ" = Table.TransformColumnTypes(#"Entfernte Spalten",{{"Startdatum", type date}, {"Enddatum", type date}}),
Ergebnis = Ende,
#"Zusammengeführte Abfragen" = Table.NestedJoin(Start, {"Name"}, Ende, {"Name"}, "Ergebnis", JoinKind.LeftOuter),
#"Erweiterte Ergebnis" = Table.ExpandTableColumn(#"Zusammengeführte Abfragen", "Ergebnis", {"Enddatum"}, {"Enddatum"})
in
#"Erweiterte Ergebnis"
Hi, i realised I provided a wrong start date for James's record 2. I've edited my post. Now all three records are in a continuous series.
Hallo CharC,
/*
Punkt 3 hab ich nicht verstanden. Woran erkennt man den Simons Recorde?
Für die "normal gruppierte" Anzeige des Start- und Enddatums in Abhängigkeit des Namens, kannst Du so vorgehen:
*/
let
Ende = Table.Group(#"Geänderter Typ", {"Name"}, {{"Enddatum", each List.Max([Enddatum]), type nullable date}}),
Start = Table.Group(#"Geänderter Typ", {"Name"}, {{"Startdatum", each List.Min([Startdatum]), type nullable date}}) ,
Quelle = Excel.CurrentWorkbook(){[Name="tbQuelle"]}[Content],
#"Entfernte Spalten" = Table.RemoveColumns(Quelle,{"Aufzeichnung"}),
#"Geänderter Typ" = Table.TransformColumnTypes(#"Entfernte Spalten",{{"Startdatum", type date}, {"Enddatum", type date}}),
Ergebnis = Ende,
#"Zusammengeführte Abfragen" = Table.NestedJoin(Start, {"Name"}, Ende, {"Name"}, "Ergebnis", JoinKind.LeftOuter),
#"Erweiterte Ergebnis" = Table.ExpandTableColumn(#"Zusammengeführte Abfragen", "Ergebnis", {"Enddatum"}, {"Enddatum"})
in
#"Erweiterte Ergebnis"