Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Power BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.

Reply
CharC
Regular Visitor

Grouping and merging records (across multiple rows) by staff and continuous dates

Hi, I have a dataset where each staff member can have 1 or more records and each record has a start date and an end date.

 

namerecordstart dateend date
James112/11/201511/11/2017
James230/05/20173/07/2020
James33/07/202021/03/2033
Sarah130/05/20193/07/2033
Sarah23/07/202021/03/2032
Clare13/07/202021/03/2033
Simon112/11/201511/11/2017
Simon23/08/20213/09/2023

 

I want to turn it into the following table with the following conditions:

1. If the start date of the new record is on or before the end date of the previous record, then the records are in a continuous series.

2. If records are in a continuous series, keep the start date of the oldest record and keep the end date of the latest record.

3. If records are not in a continuous series (see 'Simon's records), then treat them as separate records with their own start and end dates.

namestart dateend date
James12/11/201521/03/2033
Sarah30/05/201921/03/2032
Clare3/07/202021/03/2033
Simon12/11/201511/11/2017
Simon3/08/20213/09/2023

 

I'm new to power query and not sure how to to compare dates in multiple rows. 

 

I've tried the method from a solved post: https://statics.teams.cdn.office.net/evergreen-assets/safelinks/1/atp-safelinks.html

but I can't make it work for my scenario. For example, using that method I could not merge all 3 records from James into 1 record, and I got the following instead.

namestart dateend date
James12/11/20153/07/2020
James3/07/202021/03/2033
1 ACCEPTED SOLUTION
AlienSx
Super User
Super User

2 rows for James... 

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    types = Table.TransformColumnTypes(Source, {{"start date", type date}, {"end date", type date}}),
    sort = Table.Sort(types, {"name", "start date"}), 
    rows = List.Buffer(Table.ToList(sort, each _)),
    gen = List.Generate(
        () => [i = 0, new = true, max = rows{0}{3}, group = 1],
        (x) => x[i] < List.Count(rows), 
        (x) => [
            i = x[i] + 1, 
            new = (rows{i}{0} <> rows{i - 1}{0}) or (rows{i}{2} > x[max]), 
            max = if new then rows{i}{3} else List.Max({x[max], rows{i}{3}}),
            group = if new then x[group] + 1 else x[group]
        ], 
        (x) => rows{x[i]} & {x[group]}
    ), 
    group = Table.Group(
        Table.FromList(gen, each _, Table.ColumnNames(Source) & {"gr"}), 
        {"gr", "name"}, 
        {
            {"start date", (x) => List.Min(x[start date])}, 
            {"end date", (x) => List.Max(x[end date])}
        }
    )
in
    group

gr.png

 

View solution in original post

9 REPLIES 9
AlienSx
Super User
Super User

2 rows for James... 

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    types = Table.TransformColumnTypes(Source, {{"start date", type date}, {"end date", type date}}),
    sort = Table.Sort(types, {"name", "start date"}), 
    rows = List.Buffer(Table.ToList(sort, each _)),
    gen = List.Generate(
        () => [i = 0, new = true, max = rows{0}{3}, group = 1],
        (x) => x[i] < List.Count(rows), 
        (x) => [
            i = x[i] + 1, 
            new = (rows{i}{0} <> rows{i - 1}{0}) or (rows{i}{2} > x[max]), 
            max = if new then rows{i}{3} else List.Max({x[max], rows{i}{3}}),
            group = if new then x[group] + 1 else x[group]
        ], 
        (x) => rows{x[i]} & {x[group]}
    ), 
    group = Table.Group(
        Table.FromList(gen, each _, Table.ColumnNames(Source) & {"gr"}), 
        {"gr", "name"}, 
        {
            {"start date", (x) => List.Min(x[start date])}, 
            {"end date", (x) => List.Max(x[end date])}
        }
    )
in
    group

gr.png

 

Hi, after I corrected the start date of the second record for James, I have been able to use your solution to get the desired output. This method is also the fastest. Thank you very much!

Hi, I realised I provided a wrong date for James' record 2. The start date should have the year 2017. I've edited my post. So all three records in a continuous series.

mromain
New Member

Hello everyone,

 

Here is another solution.
However, following the defined rules, I find 2 rows for James.

 

let
    Source = #table(
        type table [name = text, #"start date" = date, #"end date" = date],
        {
            {"James", #date(2015, 11, 12), #date(2017, 11, 11)},
            {"James", #date(2019, 5, 30), #date(2020, 7, 3)},
            {"James", #date(2020, 7, 3), #date(2033, 3, 21)},
            {"Sarah", #date(2019, 5, 30), #date(2033, 7, 3)},
            {"Sarah", #date(2020, 7, 3), #date(2032, 3, 21)},
            {"Clare", #date(2020, 7, 3), #date(2033, 3, 21)},
            {"Simon", #date(2015, 11, 12), #date(2017, 11, 11)},
            {"Simon", #date(2021, 8, 3), #date(2023, 9, 3)}
        }),
    TransformSource = #table(type table [ID=Int64.Type, data=record], List.Zip({{1 .. Table.RowCount(Source)}, Table.ToRecords(Source)})),
    fnGetEndDate = (r as record) as date =>
        let 
            nextRecord = try TransformSource{[ID = r[ID]+1]} otherwise null,
            result = if nextRecord is null then r[data][#"end date"]
                     else if nextRecord[data][name] <> r[data][name] then r[data][#"end date"]
                     else if (nextRecord[data][name] = r[data][name]) and (nextRecord[data][#"start date"] > r[data][#"end date"]) then r[data][#"end date"]
                     else @fnGetEndDate(nextRecord)
        in
            result,
    fnGetStartDate = (r as record) as date =>
        let 
            prevRecord = try TransformSource{[ID = r[ID]-1]} otherwise null,
            result = if prevRecord is null then r[data][#"start date"]
                     else if prevRecord[data][name] <> r[data][name] then r[data][#"start date"]
                     else if (prevRecord[data][name] = r[data][name]) and (prevRecord[data][#"end date"] < r[data][#"start date"]) then r[data][#"start date"]
                     else @fnGetStartDate(prevRecord)
        in
            result,
    TransformData = Table.FromRecords(Table.TransformRows(TransformSource, each Record.AddField(Record.AddField(Record.AddField(_, "name", [data][name]), "start date", fnGetStartDate(_)), "end date", fnGetEndDate(_)))),
    RemoveDuplicates = Table.Distinct(TransformData[[name], [#"start date"], [#"end date"]])
in
    RemoveDuplicates

Hi, after I corrected the start date of the second record for James, I have been able to use your solution to get the desired output. This method seems to take a long time when the dataset is large (I have a bout 8000+ records in dataset). But still thank you very much!

Hi, i realised I provided a wrong start date for James's second record. I've edited my post. Now all 3 records are in a continuous series. 

ralf_anton
Regular Visitor

/*
...den 3. Punkt hab ich nicht verstanden. Woran erkennt man denn Simons Rekorde?
Für die "normale" Anzeige der gruppierten Start- und Enddaten kannst Du folgendermaßen vorgehen:
*/

let

    Ende = Table.Group(#"Geänderter Typ", {"Name"}, {{"Enddatum", each List.Max([Enddatum]), type nullable date}}),
    Start = Table.Group(#"Geänderter Typ", {"Name"}, {{"Startdatum", each List.Min([Startdatum]), type nullable date}})    ,
    Quelle = Excel.CurrentWorkbook(){[Name="tbQuelle"]}[Content],
    #"Entfernte Spalten" = Table.RemoveColumns(Quelle,{"Aufzeichnung"}),
    #"Geänderter Typ" = Table.TransformColumnTypes(#"Entfernte Spalten",{{"Startdatum", type date}, {"Enddatum", type date}}),
    Ergebnis = Ende,
    #"Zusammengeführte Abfragen" = Table.NestedJoin(Start, {"Name"}, Ende, {"Name"}, "Ergebnis", JoinKind.LeftOuter),
    #"Erweiterte Ergebnis" = Table.ExpandTableColumn(#"Zusammengeführte Abfragen", "Ergebnis", {"Enddatum"}, {"Enddatum"})
in
    #"Erweiterte Ergebnis"

Hi, i realised I provided a wrong start date for James's record 2. I've edited my post. Now all three records are in a continuous series.

ralf_anton
Regular Visitor

Hallo CharC,

 

/*
Punkt 3 hab ich nicht verstanden. Woran erkennt man den Simons Recorde?
Für die "normal gruppierte" Anzeige des Start- und Enddatums in Abhängigkeit des Namens, kannst Du so vorgehen:
*/

let

    Ende = Table.Group(#"Geänderter Typ", {"Name"}, {{"Enddatum", each List.Max([Enddatum]), type nullable date}}),
    Start = Table.Group(#"Geänderter Typ", {"Name"}, {{"Startdatum", each List.Min([Startdatum]), type nullable date}})    ,
    Quelle = Excel.CurrentWorkbook(){[Name="tbQuelle"]}[Content],
    #"Entfernte Spalten" = Table.RemoveColumns(Quelle,{"Aufzeichnung"}),
    #"Geänderter Typ" = Table.TransformColumnTypes(#"Entfernte Spalten",{{"Startdatum", type date}, {"Enddatum", type date}}),
    Ergebnis = Ende,
    #"Zusammengeführte Abfragen" = Table.NestedJoin(Start, {"Name"}, Ende, {"Name"}, "Ergebnis", JoinKind.LeftOuter),
    #"Erweiterte Ergebnis" = Table.ExpandTableColumn(#"Zusammengeführte Abfragen", "Ergebnis", {"Enddatum"}, {"Enddatum"})
in
    #"Erweiterte Ergebnis"

Helpful resources

Announcements
June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors