March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
I have an export of contract data where overlapping contracts need to be grouped into continuous dates, with the first/last dates for each group. Individual IDs may have more than one group of contracts, as not all contracts will overlap.
e.g.
ID | Contract_Start | Contract_End |
100 | 01/01/2018 | 30/04/2018 |
100 | 01/04/2018 | 31/12/2018 |
100 | 01/06/2018 | 31/10/2018 |
100 | 01/04/2019 | 31/10/2019 |
100 | 01/06/2019 | 31/12/2019 |
101 | 01/01/2018 | 30/06/2018 |
101 | 01/05/2018 | 30/11/2018 |
101 | 01/01/2019 | 31/05/2019 |
101 | 01/02/2019 | 30/06/2019 |
102 | 01/01/2018 | 31/08/2018 |
102 | 01/04/2019 | 31/08/2019 |
102 | 01/05/2019 | 30/06/2019 |
would become:
ID | Continuous_Start | Continuous_End |
100 | 01/01/2018 | 31/12/2018 |
100 | 01/04/2019 | 31/12/2019 |
101 | 01/01/2018 | 30/11/2018 |
101 | 01/01/2019 | 30/06/2019 |
102 | 01/01/2018 | 31/08/2018 |
102 | 01/04/2019 | 31/08/2019 |
I have grouped all contracts with the start/end of all contracts for an ID using a merge into new table and then grouping by ID, but I haven't been able to find a way to split these into groups of overlapping/continuous dates, with a new grouping for an ID where there has been a gap between contracts.
Please can you help?
Solved! Go to Solution.
Here a complete solution (to test, of course, with more significant data).
let
#"Raggruppate righe" = Table.Group(yourTable, {"ID"}, {{"Conteggio", each Table.RowCount(_), type number}, {"startend", each CSEsorted( _[Contract_Start], _[Contract_End])}}),
#"Tabella startend espansa" = Table.ExpandListColumn(#"Raggruppate righe", "startend"),
#"Valori estratti" = Table.TransformColumns(#"Tabella startend espansa", {"startend", each Text.Combine(List.Transform(_, Text.From), ":"), type text})
in
#"Valori estratti"
this modified function does not rely on the order of identifiers.
let
CS_CE=(start as list, end as list)=>
let
startend=List.Sort(List.Zip({start,end}),(x,y)=>Value.Compare(x{0},y{0})),
unionInt=List.Accumulate(startend, [CS={List.Min(start)},CE={List.Min(start)}],
(s,c)=> [CS= if c{0}>List.Last(s[CE]) then s[CS]&{c{0}} else s[CS],
CE= if c{0}>List.Last(s[CE]) then s[CE]&{c{1}} else
if c{1}>List.Last(s[CE]) then List.RemoveLastN(s[CE],1)&{c{1}} else s[CE]])
in
List.Zip(Record.ToList(unionInt))
in CS_CE
in this case the result seems what you expect, but I suggest you test other situations as well
A solution that does not use the table.group:
let
cp = Table.AddColumn(yourtab, "grpIdx", each Table.PositionOf(tab1[[ID],[Contract_Start],[Contract_End]],[[ID],[Contract_Start],[Contract_End]],Occurrence.All,(x,y)=> x[ID]=y[ID] and
IntervalIntersect({x[Contract_Start],x[Contract_End]},{y[Contract_Start],y[Contract_End]}) )),
ap = Table.AddColumn(cp, "union", each let un=List.Accumulate(cp[grpIdx],{},(s,c)=> if not List.IsEmpty(List.Intersect({_[grpIdx],c})) then List.Union({s,c}) else s)
in Text.Combine(List.Transform({cp[Contract_Start]{List.Min(un)},cp[Contract_End]{List.Max(un)}}, Text.From), ":")),
#"Rimossi duplicati" = Table.Distinct(ap)
in
#"Rimossi duplicati"
where IntervalIntersect is the following function:
= (x ,y)=>
let ovl= x{1}>y{0} and x{0}<y{1} in ovl
in an attempt to solve this problem, I thought it might be useful (even just as an exercise) to have a function that, from a generic list of intervals, obtains the union-of-these-intervals-list.
Here the code:
let
union=(lstInt as list)=>
let
sortU=List.Sort(lstInt,(x,y)=>Value.Compare(x{0},y{0})),
UI=List.Accumulate(sortU, {{null,null}},
(s,c)=> if IntIntersection(c,List.Last(s))
then List.Union({List.RemoveLastN(s,1),{{List.Min({c{0},List.Last(s){0}}),List.Max({c{1},List.Last(s){1}})}}})
else List.Union({s , {c}})
)
in UI
in union
the function IntIntersection, to check if two intervals intersect
let
overlapInt=(x ,y)=>
let ovl= x{1}>=y{0} and x{0}<=y{1} ,
OVL=if ovl=null then true else ovl
in OVL
in overlapInt
here how you can use it in the problem:
...
grp = Table.Group(youTab, {"ID"}, {"StartEnd", each IntUnion(List.Zip( {_[Contract_Start], _[Contract_End]}))}),
te = Table.ExpandListColumn(grp, "StartEnd"),
ve = Table.TransformColumns(te, {"StartEnd", each Text.Combine(List.Transform(_, Text.From), ":"), type text})
in
ve
Try using this function for each group:
let
CS_CE=(start as list, end as list)=>
let
aaa=List.Accumulate(List.Zip({start, end}), [CS={List.Min(start)},CE={List.Min(start)}],
(s,c)=> [CS= if c{0}>List.Last(s[CE]) then s[CS]&{c{0}} else s[CS],
CE= if c{0}>List.Last(s[CE]) then s[CE]&{c{1}} else
if c{1}>List.Last(s[CE]) then List.RemoveLastN(s[CE],1)&{c{1}} else s[CE]])
in
aaa
in CS_CE
At the moment I don't have enough time to write a complete solution.
If it is needed and the function is considered valid (tests can be done by providing ad hoc lists), I will do it later.
Here a complete solution (to test, of course, with more significant data).
let
#"Raggruppate righe" = Table.Group(yourTable, {"ID"}, {{"Conteggio", each Table.RowCount(_), type number}, {"startend", each CSEsorted( _[Contract_Start], _[Contract_End])}}),
#"Tabella startend espansa" = Table.ExpandListColumn(#"Raggruppate righe", "startend"),
#"Valori estratti" = Table.TransformColumns(#"Tabella startend espansa", {"startend", each Text.Combine(List.Transform(_, Text.From), ":"), type text})
in
#"Valori estratti"
this modified function does not rely on the order of identifiers.
let
CS_CE=(start as list, end as list)=>
let
startend=List.Sort(List.Zip({start,end}),(x,y)=>Value.Compare(x{0},y{0})),
unionInt=List.Accumulate(startend, [CS={List.Min(start)},CE={List.Min(start)}],
(s,c)=> [CS= if c{0}>List.Last(s[CE]) then s[CS]&{c{0}} else s[CS],
CE= if c{0}>List.Last(s[CE]) then s[CE]&{c{1}} else
if c{1}>List.Last(s[CE]) then List.RemoveLastN(s[CE],1)&{c{1}} else s[CE]])
in
List.Zip(Record.ToList(unionInt))
in CS_CE
in this case the result seems what you expect, but I suggest you test other situations as well
Interesting problem. Does it have to be done in Power Query? If this data would reside in SQL server then it would be much easier to do the logic there.
In your sample data some durations are embedded in others (for example row 3 for ID 100 is fully covered by row 2). Is that intended?
Thanks for your response @lbendlin .
Currently the data is not able to be supplied via SQL server, but it is my intention to move to this in the longer term where, as you point out, the logic would be much easier.
Also, some contracts are embedded within earlier contracts, due to the way the contracts have been issued historically, so the sample data is intended to replicate this issue.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
21 | |
16 | |
13 | |
12 | |
9 |
User | Count |
---|---|
34 | |
31 | |
20 | |
19 | |
17 |