Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
CharC
Regular Visitor

Grouping and merging records (across multiple rows) by staff and continuous dates

Hi, I have a dataset where each staff member can have 1 or more records and each record has a start date and an end date.

 

namerecordstart dateend date
James112/11/201511/11/2017
James230/05/20173/07/2020
James33/07/202021/03/2033
Sarah130/05/20193/07/2033
Sarah23/07/202021/03/2032
Clare13/07/202021/03/2033
Simon112/11/201511/11/2017
Simon23/08/20213/09/2023

 

I want to turn it into the following table with the following conditions:

1. If the start date of the new record is on or before the end date of the previous record, then the records are in a continuous series.

2. If records are in a continuous series, keep the start date of the oldest record and keep the end date of the latest record.

3. If records are not in a continuous series (see 'Simon's records), then treat them as separate records with their own start and end dates.

namestart dateend date
James12/11/201521/03/2033
Sarah30/05/201921/03/2032
Clare3/07/202021/03/2033
Simon12/11/201511/11/2017
Simon3/08/20213/09/2023

 

I'm new to power query and not sure how to to compare dates in multiple rows. 

 

I've tried the method from a solved post: https://statics.teams.cdn.office.net/evergreen-assets/safelinks/1/atp-safelinks.html

but I can't make it work for my scenario. For example, using that method I could not merge all 3 records from James into 1 record, and I got the following instead.

namestart dateend date
James12/11/20153/07/2020
James3/07/202021/03/2033
1 ACCEPTED SOLUTION
AlienSx
Super User
Super User

2 rows for James... 

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    types = Table.TransformColumnTypes(Source, {{"start date", type date}, {"end date", type date}}),
    sort = Table.Sort(types, {"name", "start date"}), 
    rows = List.Buffer(Table.ToList(sort, each _)),
    gen = List.Generate(
        () => [i = 0, new = true, max = rows{0}{3}, group = 1],
        (x) => x[i] < List.Count(rows), 
        (x) => [
            i = x[i] + 1, 
            new = (rows{i}{0} <> rows{i - 1}{0}) or (rows{i}{2} > x[max]), 
            max = if new then rows{i}{3} else List.Max({x[max], rows{i}{3}}),
            group = if new then x[group] + 1 else x[group]
        ], 
        (x) => rows{x[i]} & {x[group]}
    ), 
    group = Table.Group(
        Table.FromList(gen, each _, Table.ColumnNames(Source) & {"gr"}), 
        {"gr", "name"}, 
        {
            {"start date", (x) => List.Min(x[start date])}, 
            {"end date", (x) => List.Max(x[end date])}
        }
    )
in
    group

gr.png

 

View solution in original post

9 REPLIES 9
AlienSx
Super User
Super User

2 rows for James... 

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    types = Table.TransformColumnTypes(Source, {{"start date", type date}, {"end date", type date}}),
    sort = Table.Sort(types, {"name", "start date"}), 
    rows = List.Buffer(Table.ToList(sort, each _)),
    gen = List.Generate(
        () => [i = 0, new = true, max = rows{0}{3}, group = 1],
        (x) => x[i] < List.Count(rows), 
        (x) => [
            i = x[i] + 1, 
            new = (rows{i}{0} <> rows{i - 1}{0}) or (rows{i}{2} > x[max]), 
            max = if new then rows{i}{3} else List.Max({x[max], rows{i}{3}}),
            group = if new then x[group] + 1 else x[group]
        ], 
        (x) => rows{x[i]} & {x[group]}
    ), 
    group = Table.Group(
        Table.FromList(gen, each _, Table.ColumnNames(Source) & {"gr"}), 
        {"gr", "name"}, 
        {
            {"start date", (x) => List.Min(x[start date])}, 
            {"end date", (x) => List.Max(x[end date])}
        }
    )
in
    group

gr.png

 

Hi, after I corrected the start date of the second record for James, I have been able to use your solution to get the desired output. This method is also the fastest. Thank you very much!

Hi, I realised I provided a wrong date for James' record 2. The start date should have the year 2017. I've edited my post. So all three records in a continuous series.

mromain
Regular Visitor

Hello everyone,

 

Here is another solution.
However, following the defined rules, I find 2 rows for James.

 

let
    Source = #table(
        type table [name = text, #"start date" = date, #"end date" = date],
        {
            {"James", #date(2015, 11, 12), #date(2017, 11, 11)},
            {"James", #date(2019, 5, 30), #date(2020, 7, 3)},
            {"James", #date(2020, 7, 3), #date(2033, 3, 21)},
            {"Sarah", #date(2019, 5, 30), #date(2033, 7, 3)},
            {"Sarah", #date(2020, 7, 3), #date(2032, 3, 21)},
            {"Clare", #date(2020, 7, 3), #date(2033, 3, 21)},
            {"Simon", #date(2015, 11, 12), #date(2017, 11, 11)},
            {"Simon", #date(2021, 8, 3), #date(2023, 9, 3)}
        }),
    TransformSource = #table(type table [ID=Int64.Type, data=record], List.Zip({{1 .. Table.RowCount(Source)}, Table.ToRecords(Source)})),
    fnGetEndDate = (r as record) as date =>
        let 
            nextRecord = try TransformSource{[ID = r[ID]+1]} otherwise null,
            result = if nextRecord is null then r[data][#"end date"]
                     else if nextRecord[data][name] <> r[data][name] then r[data][#"end date"]
                     else if (nextRecord[data][name] = r[data][name]) and (nextRecord[data][#"start date"] > r[data][#"end date"]) then r[data][#"end date"]
                     else @fnGetEndDate(nextRecord)
        in
            result,
    fnGetStartDate = (r as record) as date =>
        let 
            prevRecord = try TransformSource{[ID = r[ID]-1]} otherwise null,
            result = if prevRecord is null then r[data][#"start date"]
                     else if prevRecord[data][name] <> r[data][name] then r[data][#"start date"]
                     else if (prevRecord[data][name] = r[data][name]) and (prevRecord[data][#"end date"] < r[data][#"start date"]) then r[data][#"start date"]
                     else @fnGetStartDate(prevRecord)
        in
            result,
    TransformData = Table.FromRecords(Table.TransformRows(TransformSource, each Record.AddField(Record.AddField(Record.AddField(_, "name", [data][name]), "start date", fnGetStartDate(_)), "end date", fnGetEndDate(_)))),
    RemoveDuplicates = Table.Distinct(TransformData[[name], [#"start date"], [#"end date"]])
in
    RemoveDuplicates

Hi, after I corrected the start date of the second record for James, I have been able to use your solution to get the desired output. This method seems to take a long time when the dataset is large (I have a bout 8000+ records in dataset). But still thank you very much!

Hi, i realised I provided a wrong start date for James's second record. I've edited my post. Now all 3 records are in a continuous series. 

ralf_anton
Frequent Visitor

/*
...den 3. Punkt hab ich nicht verstanden. Woran erkennt man denn Simons Rekorde?
Für die "normale" Anzeige der gruppierten Start- und Enddaten kannst Du folgendermaßen vorgehen:
*/

let

    Ende = Table.Group(#"Geänderter Typ", {"Name"}, {{"Enddatum", each List.Max([Enddatum]), type nullable date}}),
    Start = Table.Group(#"Geänderter Typ", {"Name"}, {{"Startdatum", each List.Min([Startdatum]), type nullable date}})    ,
    Quelle = Excel.CurrentWorkbook(){[Name="tbQuelle"]}[Content],
    #"Entfernte Spalten" = Table.RemoveColumns(Quelle,{"Aufzeichnung"}),
    #"Geänderter Typ" = Table.TransformColumnTypes(#"Entfernte Spalten",{{"Startdatum", type date}, {"Enddatum", type date}}),
    Ergebnis = Ende,
    #"Zusammengeführte Abfragen" = Table.NestedJoin(Start, {"Name"}, Ende, {"Name"}, "Ergebnis", JoinKind.LeftOuter),
    #"Erweiterte Ergebnis" = Table.ExpandTableColumn(#"Zusammengeführte Abfragen", "Ergebnis", {"Enddatum"}, {"Enddatum"})
in
    #"Erweiterte Ergebnis"

Hi, i realised I provided a wrong start date for James's record 2. I've edited my post. Now all three records are in a continuous series.

ralf_anton
Frequent Visitor

Hallo CharC,

 

/*
Punkt 3 hab ich nicht verstanden. Woran erkennt man den Simons Recorde?
Für die "normal gruppierte" Anzeige des Start- und Enddatums in Abhängigkeit des Namens, kannst Du so vorgehen:
*/

let

    Ende = Table.Group(#"Geänderter Typ", {"Name"}, {{"Enddatum", each List.Max([Enddatum]), type nullable date}}),
    Start = Table.Group(#"Geänderter Typ", {"Name"}, {{"Startdatum", each List.Min([Startdatum]), type nullable date}})    ,
    Quelle = Excel.CurrentWorkbook(){[Name="tbQuelle"]}[Content],
    #"Entfernte Spalten" = Table.RemoveColumns(Quelle,{"Aufzeichnung"}),
    #"Geänderter Typ" = Table.TransformColumnTypes(#"Entfernte Spalten",{{"Startdatum", type date}, {"Enddatum", type date}}),
    Ergebnis = Ende,
    #"Zusammengeführte Abfragen" = Table.NestedJoin(Start, {"Name"}, Ende, {"Name"}, "Ergebnis", JoinKind.LeftOuter),
    #"Erweiterte Ergebnis" = Table.ExpandTableColumn(#"Zusammengeführte Abfragen", "Ergebnis", {"Enddatum"}, {"Enddatum"})
in
    #"Erweiterte Ergebnis"

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.