AshBee
Frequent Visitor

Identifying clashes and overlapping events

Hello, I'm trying to identify clashes and overlapping events in Power Query. I've got millions of rows, so I really need to do this in Power Query. Here is an example of my dataset:

 

I'm trying to calculate whether there is a clash and return a 0 or 1, and then, if the person attends at least one of the clashing activities, return an Adjusted Status so they don't get penalised for not attending the clashing activity. Any advice appreciated. I've tried grouping on ID, and on ID, EventStartTime and EventEndTime, then sorting and adding an index (a terrible idea with this many rows).

| ID       | EventStartTime      | EventEndTime        | Status       | Clash | Adjusted Status |
|----------|---------------------|---------------------|--------------|-------|-----------------|
| Person A | 27/09/2024 11:00:00 | 27/09/2024 12:00:00 | Attended     | 1     | Attended        |
| Person A | 27/09/2024 11:00:00 | 27/09/2024 13:00:00 | Not Attended | 1     | Attended        |
| Person A | 29/09/2024 11:00:00 | 29/09/2024 12:00:00 | Not Attended | 0     | Not Attended    |
| Person B | 27/09/2024 11:00:00 | 27/09/2024 12:00:00 | Not Attended | 0     | Not Attended    |
| Person C | 01/10/2024 09:00:00 | 01/10/2024 11:00:00 | Attended     | 1     | Attended        |
| Person C | 01/10/2024 10:00:00 | 01/10/2024 12:00:00 | Not Attended | 1     | Attended        |


1 ACCEPTED SOLUTION

Hi @AshBee ,

 

Please try this updated code with the [Clash] column added:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("lY/BCoAgDEB/RTwHbisIu1X36Jx4y2tC+f8kBDJCyWCnN/Z4M0au7rz8IUbZSOoVaEVAnUAcAOK8KCU6huCO3e3SNj8dbaKLD6Lo0VmPzrWUPNPPn0qeOe4AFcJzATpdMMrsFQ6ErOO7ZYs7pFwLo8xe4WDlnJZa7A0=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, EventStartTime = _t, EventEndTime = _t, Status = _t]),
    chgTypes = Table.TransformColumnTypes(Source,{{"EventStartTime", type datetime}, {"EventEndTime", type datetime}}),

// Relevant steps from here ===>
    sort_ID_Start_End = Table.Sort(chgTypes,{{"ID", Order.Ascending}, {"EventStartTime", Order.Ascending}, {"EventEndTime", Order.Ascending}}),
    addEventStartDate = Table.AddColumn(sort_ID_Start_End, "EventStartDate", each Date.From([EventStartTime]), type date),
    groupRows = Table.Group(addEventStartDate, {"ID", "EventStartDate"}, {{"FirstEventEnd", each _[EventEndTime]{0}}, {"data", each _, type table [ID=nullable text, EventStartTime=nullable datetime, EventEndTime=nullable datetime, Status=nullable text, EventStartDate=date]}}),
    addNestedIndex = Table.TransformColumns(groupRows, {"data", each Table.AddIndexColumn(_, "Index", 1, 1)}),
    // Clash = true when any event after the first one in the group starts
    // before the first event's end time, i.e. overlaps it
    addClash = Table.AddColumn(addNestedIndex, "Clash",
        each List.Contains(
            List.Skip([data][EventStartTime], 1),
            [FirstEventEnd],
            (x, y)=> x < y
        )
    ),
    expandData = Table.ExpandTableColumn(addClash, "data", {"EventStartTime", "EventEndTime", "Status", "Index"}, {"EventStartTime", "EventEndTime", "Status", "Index"}),
    addAdjustedStatus = Table.AddColumn(expandData, "AdjustedStatus",
        each if [Index] = 1 then [Status]
        else if [EventStartTime] < [FirstEventEnd] then "Attended"
        else [Status]
    )
in
    addAdjustedStatus

 

It now turns this (note the added Person Z to test a different scenario):

[Screenshot: input table, including Person Z]

 

...to this:

[Screenshot: output table with Clash and AdjustedStatus columns]

 

I think this should still be fairly performant over a large dataset, but you'll have to test tbh.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!





16 REPLIES
AlienSx
Super User

As far as I understand, overlapping events may form a chain of events with different start and end datetimes, so I would not rely on grouping by start date. I'd go this way:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    // one null per source column, used below to pad the chain-summary rows
    nulls = List.Buffer(List.Repeat({null}, Table.ColumnCount(Source))),
    // each row as a list of values {ID, start, end, status}, sorted by ID then start
    rows = List.Buffer(Table.ToList(Table.Sort(Source, {"ID", "EventStartTime"}), (x) => x)),
    // single pass over the sorted rows, carrying the chain state:
    //   max  = latest end time seen in the current overlap chain
    //   new  = true when this row starts a new chain (new ID, or start > max)
    //   over = clash flag for the chain, att = aggregated attendance status
    gen = List.Generate(
        () => [i = 0, r = rows{0}, new = false, max = r{2}, over = 0, att = r{3}, prev = {}],
        (x) => x[i] < List.Count(rows),
        (x) => 
            [
                i = x[i] + 1, 
                r = rows{i},
                new = r{1} > x[max] or r{0} <> x[r]{0},
                max = if new then r{2} else List.Max({x[max], r{2}}),
                over = if new then 0 else 1,
                att = if new then r{3} else if List.Contains({r{3}, x[att]}, "Attended") then "Attended" else "Not Attended",
                prev = {x[over], x[att]}
            ],
        // emit a null-padded summary row whenever a chain closes (and at the very end)
        (x) => (if x[new] then {nulls & x[prev]} else {}) & {x[r]} & (if x[i] = List.Count(rows) - 1 then {nulls & {x[over], x[att]}} else {})
    ), 
    to_table = Table.FromList(List.Combine(gen), (x) => x, Table.ColumnNames(Source) & {"Clash", "Adjusted Status"}),
    // fill each chain's summary upward onto its member rows, then drop the summary rows
    up = Table.FillUp(to_table,{"Clash", "Adjusted Status"}), 
    result = Table.SelectRows(up, (x) => x[ID] <> null)
in
    result
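As a cross-language illustration of the same single-pass idea - sort by ID and start, then carry a running maximum end time so that transitively overlapping events form one chain - here is a hedged Python sketch (not the M above; all names are mine, and touching events, where one starts exactly when another ends, are treated here as non-overlapping):

```python
def mark_clashes(rows):
    """rows: list of (id, start, end, status) tuples.
    Returns (id, start, end, status, clash, adjusted) tuples, where every
    member of a chain of transitively overlapping events shares one clash
    flag, and gets 'Attended' if anyone in the chain attended."""
    rows = sorted(rows, key=lambda r: (r[0], r[1]))
    out = []
    chain = []        # indices of the current overlap chain
    max_end = None    # latest end time seen in the current chain
    prev_id = None

    def flush():
        clash = 1 if len(chain) > 1 else 0
        attended = any(rows[i][3] == "Attended" for i in chain)
        for i in chain:
            adjusted = "Attended" if clash and attended else rows[i][3]
            out.append(rows[i] + (clash, adjusted))

    for i, (pid, start, end, status) in enumerate(rows):
        # a new chain starts on a new ID, or when this event starts
        # at or after everything seen so far has ended
        if pid != prev_id or start >= max_end:
            if chain:
                flush()
            chain = []
            max_end = end
        chain.append(i)
        max_end = max(max_end, end)
        prev_id = pid
    if chain:
        flush()
    return out
```

Running it over the sample table from the question reproduces the expected Clash and Adjusted Status columns.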

 

Omid_Motamedise
Memorable Member

Your question is not completely clear, but based on my understanding the formula below might help; otherwise, please provide your sample table and expected result table separately.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkgtKs7PU3BU0lEyMtc3sNQ3MjAyUTA0tDIwACI0USO4qGNJSWpeSmoKkGmoFKtDojnGcFG//BIFvGZZYjXLEpub0MwyQDbLiUT/4TPLGSRgqG9oANFlYAnXhSSKZAMu/6GZY2iA1RycbgKaFQsA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, EventStartTime = _t, EventEndTime = _t, Status = _t, Clash = _t]),
    #"Grouped Rows" = Table.Group(Source, {"ID", "EventStartTime"}, {{"all", each _},{"Adjusted Status", each if List.Contains(_[Status],"Attended") then "Attended" else "not Attended"}})
in
    #"Grouped Rows"

 

Thanks Omid, but my table contains:

  • ID
  • EventStartTime
  • EventEndTime
  • Status

I'm trying to add two additional columns:

  • Clash - returns 1 if the person has a clash (more than one event at the same time, or an overlapping event)
  • Adjusted Status - if the person has a clash and attended at least one of the clashing activities (as indicated in the Status column), returns 'Attended'

I think your code relies on me already having a Clash column, which I don't have yet.
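For reference, the clash condition being asked for is the standard interval-overlap test: two events overlap exactly when each one starts before the other ends. A quick Python illustration of just that rule (names and sample values here are illustrative only, not from the query):

```python
from datetime import datetime

def overlaps(s1, e1, s2, e2):
    # Half-open intervals [s1, e1) and [s2, e2) overlap exactly when
    # each one starts before the other one ends.
    return s1 < e2 and s2 < e1

a = (datetime(2024, 9, 27, 11), datetime(2024, 9, 27, 12))
b = (datetime(2024, 9, 27, 11), datetime(2024, 9, 27, 13))
c = (datetime(2024, 9, 29, 11), datetime(2024, 9, 29, 12))

print(overlaps(*a, *b))  # same start time -> True
print(overlaps(*a, *c))  # different days -> False
```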

OK, got it.
Just give me some time and I'll be back with a solution.

The code below covers the first part (Clash). Please check whether it runs in a reasonable time over your 1 million rows; if it does, I'll provide the next part of the solution.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkgtKs7PU3BU0lEyMtc3sNQ3MjAyUTA0tDIwACI0USO4qGNJSWpeSmqKUqwOiWYYw0X98ksUcJpjidUcS2xuwWWOE4l+wmWOM1DOwFDf0ACiw8ASrgNJFMl0IswwNMBqBi63xAIA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, EventStartTime = _t, EventEndTime = _t, Status = _t]),
    CH = Table.TransformColumnTypes(Source,{{"ID", type text}, {"EventStartTime", type datetime}, {"EventEndTime", type datetime}, {"Status", type text}}),
    T = Table.Buffer(CH),
    // For each row, count the rows for the same ID whose range contains this
    // row's start or end; a count above 1 (the row matches itself) means a clash
    #"Added Custom" = Table.AddColumn(T, "Clash", each
        if Table.RowCount(
            Table.SelectRows(CH, (x) =>
                x[ID] = _[ID] and (
                    (_[EventStartTime] >= x[EventStartTime] and _[EventStartTime] <= x[EventEndTime]) or
                    (_[EventEndTime] >= x[EventStartTime] and _[EventEndTime] <= x[EventEndTime])
                )
            )
        ) > 1 then 1 else 0)
in
    #"Added Custom"

Thank you for spending some time looking Omid. Unfortunately it's taking about a minute to process a single row. I'm going to let it run for half an hour to see how it goes - it's not looking good so far 😞

How many columns do you have in your original table?

14 columns, and nearly two million rows, which will grow each day.

Replace 

T=Table.Buffer(CH)

 

With

T=Table.Buffer(Table.SelectColumns(CH,{"EventStartTime","EventEndTime","ID"}))

 

AshBee
Frequent Visitor

I've also tried this: 

#"Added Clash Column" = Table.AddColumn(
    #"Grouped Rows",
    "Clash",
    each
        let
            IdentifyingClashes = [IdentifyingClashes],
            clashCount = Table.RowCount(IdentifyingClashes) // Count the number of rows in the group
        in
            if clashCount <= 1 then
                0 // No clash if there's one or no event
            else
                let
                    // Extract event start and end times
                    events = IdentifyingClashes,
                    // Compare every pair (i, j) of events in the group
                    clashesFound = List.AnyTrue(
                        List.Transform(
                            List.Generate(() => [i = 0], each [i] < clashCount - 1, each [i = [i] + 1], each [i]),
                            (i) =>
                                List.AnyTrue(
                                    List.Transform(
                                        List.Generate(() => [j = i + 1], each [j] < clashCount, each [j = [j] + 1], each [j]),
                                        (j) =>
                                            // Check whether the two events overlap on the same date
                                            let
                                                startTime1 = events{i}[EventStartTime],
                                                endTime1 = events{i}[EventEndTime],
                                                startTime2 = events{j}[EventStartTime],
                                                endTime2 = events{j}[EventEndTime],
                                                date1 = DateTime.Date(startTime1),
                                                date2 = DateTime.Date(startTime2)
                                            in
                                                // Events clash if on the same date and overlapping
                                                (date1 = date2) and (startTime1 < endTime2) and (endTime1 > startTime2)
                                    )
                                )
                        )
                    )
                in
                    if clashesFound then 1 else 0
)
in
    #"Added Clash Column"

Hi @AshBee ,

 

Try something like this. You can paste the following into a new blank query to see the steps I took:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkgtKs7PU3BU0lEyMtc3sNQ3MjAyUTA0tDIwACI0USO4qGNJSWpeSmqKUqwOiWYYw0X98ksUcJpjidUcS2xuwWWOE4l+wmWOM1DOwFDf0ACiw8ASrgNJFMl0IswwNMBqBi63xAIA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, EventStartTime = _t, EventEndTime = _t, Status = _t]),
    chgTypes = Table.TransformColumnTypes(Source,{{"EventStartTime", type datetime}, {"EventEndTime", type datetime}}),

// Relevant steps from here ===>
    sort_ID_Start_End = Table.Sort(chgTypes,{{"ID", Order.Ascending}, {"EventStartTime", Order.Ascending}, {"EventEndTime", Order.Ascending}}),
    addEventStartDate = Table.AddColumn(sort_ID_Start_End, "EventStartDate", each Date.From([EventStartTime]), type date),
    groupRows = Table.Group(addEventStartDate, {"ID", "EventStartDate"}, {{"FirstEventEnd", each _[EventEndTime]{0}}, {"data", each _, type table [ID=nullable text, EventStartTime=nullable datetime, EventEndTime=nullable datetime, Status=nullable text, EventStartDate=date]}}),
    addNestedIndex = Table.TransformColumns(groupRows, {"data", each Table.AddIndexColumn(_, "Index", 1, 1)}),
    expandData = Table.ExpandTableColumn(addNestedIndex, "data", {"EventStartTime", "EventEndTime", "Status", "Index"}, {"EventStartTime", "EventEndTime", "Status", "Index"}),
    addAdjustedStatus = Table.AddColumn(expandData, "AdjustedStatus",
        each if [Index] = 1 then [Status]
        else if [EventStartTime] < [FirstEventEnd] then "Attended"
        else [Status])
in
    addAdjustedStatus

 

Summary:

sort_ID_Start_End = Sort the table to ensure the first-starting event always goes to the top.

addEventStartDate = Get the date portion of start to allow person-date grouping (assuming no events span two or more days).

groupRows = Group by person-date, add a column holding the first-starting event's end time per group (FirstEventEnd), and nest the rest of the table.

addNestedIndex = Add an index column to the nested tables for later.

expandData = Get all our table columns back.

addAdjustedStatus = Use the new index and the first event's end time to apply the required logic.
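Assuming, as above, that no event spans two days, the grouped logic can also be sketched in Python for illustration (names are mine, not Power Query M; the Clash flag from the updated version of this query is included, and the sketch mirrors the M's rule of marking any later row that starts before the first event's end as Attended):

```python
from collections import defaultdict

def adjust_by_day(rows):
    """rows: list of (person, start, end, status) tuples with datetime
    start/end. Groups by (person, start date), sorts each group by start,
    and treats any event starting before the first event's end as a clash."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r[0], r[1].date())].append(r)   # person-date grouping
    out = []
    for grp in groups.values():
        grp.sort(key=lambda r: r[1])            # first-starting event on top
        first_end = grp[0][2]                   # FirstEventEnd
        # clash if any later event starts before the first event ends
        clash = any(r[1] < first_end for r in grp[1:])
        for idx, (person, start, end, status) in enumerate(grp):
            adjusted = status if idx == 0 or start >= first_end else "Attended"
            out.append((person, start, end, status, clash, adjusted))
    return out
```

Like the M version, this does one sort per small group rather than a row-by-row self-join, which is why it scales much better.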

 

The above example turns this:

[Screenshot: sample input table]

 

...to this, and should be pretty performant, even over large datasets:

[Screenshot: resulting table with AdjustedStatus]

 

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Thanks for your response Pete, I've been playing with this code to see if I can get it to return a clash value too. Will update on how it's gone. 



Hi Pete, thank you so much. This was the solution. However, I found the impact on refresh for my dataset was too significant. For now I've had to handle this as a calculated column but will continue to beaver away with your code above to see if I can improve the report performance. Appreciate your input.

AshBee
Frequent Visitor

Hi, here's what I have, I hope this is enough to go on:

ChangetoDateTime = Table.TransformColumnTypes(LocationActivityTypeFlag, {{"EventStartTime", type datetime}, {"EventEndTime", type datetime}}),

SortedRows1 =
    if List.Contains(Table.ColumnNames(ChangetoDateTime), "Person") then
        Table.Sort(ChangetoDateTime, {{"Person", Order.Ascending}, {"EventStartTime", Order.Ascending}, {"EventEndTime", Order.Ascending}})
    else
        ChangetoDateTime,

// Grouping rows to prepare for clash detection
#"Grouped Rows" = Table.Group(SortedRows1, {"Person"},
    {{"GroupedData", each _, type table [xx=nullable text, xx=nullable text, xx=nullable text, xx=nullable text, xx=nullable text, xx=nullable number, xx=nullable text, xx=nullable text, xx=nullable text, xx=nullable number, event_date=nullable date, startTime=nullable time, endTime=nullable time, emailAddress=nullable text, username=nullable text, xx=nullable text, xx=nullable text, #"Person"=text, CheckinStatus=text, EventStartTime=nullable datetime, EventEndTime=nullable datetime, xx=number]}}
),

// Adding clash detection logic after grouping
AddClashes = Table.AddColumn(#"Grouped Rows", "Clashes", each
    let
        CurrentGroup = [GroupedData],
        // Counts exact duplicate (EventStartTime, EventEndTime) pairs only -
        // overlapping but non-identical events are not detected by this check
        ClashCount = Table.RowCount(CurrentGroup) - Table.RowCount(Table.Distinct(CurrentGroup, {"EventStartTime", "EventEndTime"}))
    in
        if ClashCount > 0 then "Yes" else "No"
),
#"Expanded GroupedData" = Table.ExpandTableColumn(AddClashes, "GroupedData", {"CheckinStatus", "EventStartTime", "EventEndTime"}, {"xx", "xx", "xx"})
in
    #"Expanded GroupedData"

BA_Pete
Super User

Hi @AshBee ,

 

Are you able to share the M code for your attempt at using group/index, please? (Anonymise any connection strings etc.) What you suggest sounds like a perfectly reasonable way to approach this problem, so it may just be a case of optimising what you've done so it runs quickly over that many rows.

 

Pete






