Starting December 3, join live sessions with database experts and the Microsoft product team to learn just how easy it is to get started
Learn moreGet certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now
Hello, I'm trying to identify clashes and overlapping events in Power Query. As I've got millions of rows so I really need to do this in Power Query. Here is an example of my dataset:
I'm trying to calculate if there is a clash and return a 0 or 1, and then if the person attends at least one of the clashing activities return an Adjusted Status so they don't get penalised not for attending the clashing activity. Any advice appreciated. I've tried grouping on ID, and ID, EventStartTime and EventEndTime, sorting and adding an index (terrible idea too many rows).
ID | EventStartTime | EventEndTime | Status | Clash | Adjusted Status |
Person A | 27/09/2024 11:00:00 | 27/09/2024 12:00:00 | Attended | 1 | Attended |
Person A | 27/09/2024 11:00:00 | 27/09/2024 13:00:00 | Not Attended | 1 | Attended |
Person A | 29/09/2024 11:00:00 | 29/09/2024 12:00:00 | Not Attended | 0 | Not Attended |
Person B | 27/09/2024 11:00:00 | 27/09/2024 12:00:00 | Not Attended | 0 | Not Attended |
Person C | 01/10/2024 09:00:00 | 01/10/2024 11:00:00 | Attended | 1 | Attended |
Person C | 01/10/2024 10:00:00 | 01/10/2024 12:00:00 | Not Attended | 1 | Attended |
Solved! Go to Solution.
Hi @AshBee ,
Please try this updated code with the [Clash] column added:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("lY/BCoAgDEB/RTwHbisIu1X36Jx4y2tC+f8kBDJCyWCnN/Z4M0au7rz8IUbZSOoVaEVAnUAcAOK8KCU6huCO3e3SNj8dbaKLD6Lo0VmPzrWUPNPPn0qeOe4AFcJzATpdMMrsFQ6ErOO7ZYs7pFwLo8xe4WDlnJZa7A0=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, EventStartTime = _t, EventEndTime = _t, Status = _t]),
chgTypes = Table.TransformColumnTypes(Source,{{"EventStartTime", type datetime}, {"EventEndTime", type datetime}}),
// Relevant steps from here ===>
sort_ID_Start_End = Table.Sort(chgTypes,{{"ID", Order.Ascending}, {"EventStartTime", Order.Ascending}, {"EventEndTime", Order.Ascending}}),
addEventStartDate = Table.AddColumn(sort_ID_Start_End, "EventStartDate", each Date.From([EventStartTime]), type date),
groupRows = Table.Group(addEventStartDate, {"ID", "EventStartDate"}, {{"FirstEventEnd", each _[EventEndTime]{0}}, {"data", each _, type table [ID=nullable text, EventStartTime=nullable datetime, EventEndTime=nullable datetime, Status=nullable text, EventStartDate=date]}}),
addNestedIndex = Table.TransformColumns(groupRows, {"data", each Table.AddIndexColumn(_, "Index", 1, 1)}),
addClash = Table.AddColumn(addNestedIndex, "Clash",
each List.Contains(
List.Skip([data][EventStartTime], 1),
[FirstEventEnd],
(x, y)=> x < y
)
),
expandData = Table.ExpandTableColumn(addClash, "data", {"EventStartTime", "EventEndTime", "Status", "Index"}, {"EventStartTime", "EventEndTime", "Status", "Index"}),
addAdjustedStatus = Table.AddColumn(expandData, "AdjustedStatus",
each if [Index] = 1 then [Status]
else if [EventStartTime] < [FirstEventEnd] then "Attended"
else [Status]
)
in
addAdjustedStatus
It now turns this (note added Person Z for testing a different scenario):
...to this:
I think this should still be fairly performant over a large dataset, but you'll have to test tbh.
Pete
Proud to be a Datanaut!
As far as I understand, overlapping events may form a chain of events with different start and end datetimes. So I would not rely on grouping by start date. So I'd go this way:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
nulls = List.Buffer(List.Repeat({null}, Table.ColumnCount(Source))),
rows = List.Buffer(Table.ToList(Table.Sort(Source, {"ID", "EventStartTime"}), (x) => x)),
gen = List.Generate(
() => [i = 0, r = rows{0}, new = false, max = r{2}, over = 0, att = r{3}, prev = {}],
(x) => x[i] < List.Count(rows),
(x) =>
[
i = x[i] + 1,
r = rows{i},
new = r{1} > x[max] or r{0} <> x[r]{0},
max = if new then r{2} else List.Max({x[max], r{2}}),
over = if new then 0 else 1,
att = if new then r{3} else if List.Contains({r{3}, x[att]}, "Attended") then "Attended" else "Not Attended",
prev = {x[over], x[att]}
],
(x) => (if x[new] then {nulls & x[prev]} else {}) & {x[r]} & (if x[i] = List.Count(rows) - 1 then {nulls & {x[over], x[att]}} else {})
),
to_table = Table.FromList(List.Combine(gen), (x) => x, Table.ColumnNames(Source) & {"Clash", "Adjusted Status"}),
up = Table.FillUp(to_table,{"Clash", "Adjusted Status"}),
result = Table.SelectRows(up, (x) => x[ID] <> null)
in
result
Your question is not completely clear, but based on my underestanding, the below formula might help you, otherwise, please provide your sample table and result table seperately
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkgtKs7PU3BU0lEyMtc3sNQ3MjAyUTA0tDIwACI0USO4qGNJSWpeSmoKkGmoFKtDojnGcFG//BIFvGZZYjXLEpub0MwyQDbLiUT/4TPLGSRgqG9oANFlYAnXhSSKZAMu/6GZY2iA1RycbgKaFQsA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, EventStartTime = _t, EventEndTime = _t, Status = _t, Clash = _t]),
#"Grouped Rows" = Table.Group(Source, {"ID", "EventStartTime"}, {{"all", each _},{"Adjusted Status", each if List.Contains(_[Status],"Attended") then "Attended" else "not Attended"}})
in
#"Grouped Rows"
Thanks Omid, but my table contains:
I'm trying to add two additional columns:
I think your code relies on me already having a Clash column I don't have that yet.
Ok catch it,
just give me some times, I will back by the solution.
The below code is for the firs part (clash) please check it over your 1 million data and see it does run in rationable time or not, if it does, I will provide the next solution.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkgtKs7PU3BU0lEyMtc3sNQ3MjAyUTA0tDIwACI0USO4qGNJSWpeSmqKUqwOiWYYw0X98ksUcJpjidUcS2xuwWWOE4l+wmWOM1DOwFDf0ACiw8ASrgNJFMl0IswwNMBqBi63xAIA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, EventStartTime = _t, EventEndTime = _t, Status = _t]),
CH = Table.TransformColumnTypes(Source,{{"ID", type text}, {"EventStartTime", type datetime}, {"EventEndTime", type datetime}, {"Status", type text}}),
T=Table.Buffer(CH),
#"Added Custom" = Table.AddColumn(T, "Clash", each if Table.RowCount(Table.SelectRows(CH, (x)=>
x[ID]=_[ID] and (
(_[EventStartTime]>=x[EventStartTime] and
_[EventStartTime]<=x[EventEndTime]) or
(_[EventEndTime]>=x[EventStartTime] and
_[EventEndTime]<=x[EventEndTime]))
))>1 then 1 else 0)
in
#"Added Custom"
Thank you for spending some time looking Omid. Unfortunately it's taking about a minute to process a single row. I'm going to let it run for half an hour to see how it goes - it's not looking good so far 😞
How many columns do you have in youtube orgine table?
14 columns, and nearly two million rows, which will grow each day.
Replace
T=Table.Buffer(CH)
With
T=Table.Buffer(Table.SelectColumns(CH,{"EventStartTime","EventEndTime","ID"}))
I've also tried this:
#"Added Clash Column" = Table.AddColumn(
#"Grouped Rows",
"Clash",
each
let
IdentifyingClashes = [IdentifyingClashes],
clashCount = Table.RowCount(IdentifyingClashes) // Count the number of rows in the group
in
if clashCount <= 1 then
0 // No clash if there's one or no event
else
let
// Extract Event Start and End Times
events = IdentifyingClashes,
clashesFound = List.AnyTrue(
List.Transform(
List.Generate(
() => [i = 0],
each [i] < clashCount - 1,
each [i = [i] + 1],
each [i]
),
(i) =>
List.AnyTrue(
List.Transform(
List.Generate(
() => [j = i + 1],
each [j] < clashCount,
each [j = [j] + 1],
each [j]
),
(j) =>
// Check if the times overlap considering the date
let
startTime1 = events{i}[EventStartTime],
endTime1 = events{i}[EventEndTime],
startTime2 = events{j}[EventStartTime],
endTime2 = events{j}[EventEndTime],
date1 = DateTime.Date(startTime1),
date2 = DateTime.Date(startTime2)
in
// Check if events are on the same date and overlap
(date1 = date2) and (startTime1 < endTime2) and (endTime1 > startTime2)
)
)
)
)
in
if clashesFound then 1 else 0
)
in
#"Added Clash Column"
Hi @AshBee ,
Try something like this. You can paste the following into a new blank query to see the steps I took:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkgtKs7PU3BU0lEyMtc3sNQ3MjAyUTA0tDIwACI0USO4qGNJSWpeSmqKUqwOiWYYw0X98ksUcJpjidUcS2xuwWWOE4l+wmWOM1DOwFDf0ACiw8ASrgNJFMl0IswwNMBqBi63xAIA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, EventStartTime = _t, EventEndTime = _t, Status = _t]),
chgTypes = Table.TransformColumnTypes(Source,{{"EventStartTime", type datetime}, {"EventEndTime", type datetime}}),
// Relevant steps from here ===>
sort_ID_Start_End = Table.Sort(chgTypes,{{"ID", Order.Ascending}, {"EventStartTime", Order.Ascending}, {"EventEndTime", Order.Ascending}}),
addEventStartDate = Table.AddColumn(sort_ID_Start_End, "EventStartDate", each Date.From([EventStartTime]), type date),
groupRows = Table.Group(addEventStartDate, {"ID", "EventStartDate"}, {{"FirstEventEnd", each _[EventEndTime]{0}}, {"data", each _, type table [ID=nullable text, EventStartTime=nullable datetime, EventEndTime=nullable datetime, Status=nullable text, EventStartDate=date]}}),
addNestedIndex = Table.TransformColumns(groupRows, {"data", each Table.AddIndexColumn(_, "Index", 1, 1)}),
expandData = Table.ExpandTableColumn(addNestedIndex, "data", {"EventStartTime", "EventEndTime", "Status", "Index"}, {"EventStartTime", "EventEndTime", "Status", "Index"}),
addAdjustedStatus = Table.AddColumn(expandData, "AdjustedStatus", each if [Index] = 1 then [Status]
else if [EventStartTime] < [FirstEventEnd] then "Attended"
else [Status])
in
addAdjustedStatus
Summary:
sort_ID_Start_End = Sort the table to ensure the first-starting event always goes to the top.
addEventStartDate = Get the date portion of start to allow person-date grouping (assuming no events span two or more days).
groupRows = Group person-date, add a column that repeats the the earliest event start time per grouping, and nest the rest of the table.
addNestedIndex = Add an index column to the nested tables for later.
expandData = Get all our table columns back.
addAdjustedStatus = Use the new index and earliest event start to apply the required logic.
The above example turns this:
...to this, and should be pretty performant, even over large datasets:
Pete
Proud to be a Datanaut!
Thanks for your response Pete, I've been playing with this code to see if I can get it to return a clash value too. Will update on how it's gone.
Hi @AshBee ,
Please try this updated code with the [Clash] column added:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("lY/BCoAgDEB/RTwHbisIu1X36Jx4y2tC+f8kBDJCyWCnN/Z4M0au7rz8IUbZSOoVaEVAnUAcAOK8KCU6huCO3e3SNj8dbaKLD6Lo0VmPzrWUPNPPn0qeOe4AFcJzATpdMMrsFQ6ErOO7ZYs7pFwLo8xe4WDlnJZa7A0=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, EventStartTime = _t, EventEndTime = _t, Status = _t]),
chgTypes = Table.TransformColumnTypes(Source,{{"EventStartTime", type datetime}, {"EventEndTime", type datetime}}),
// Relevant steps from here ===>
sort_ID_Start_End = Table.Sort(chgTypes,{{"ID", Order.Ascending}, {"EventStartTime", Order.Ascending}, {"EventEndTime", Order.Ascending}}),
addEventStartDate = Table.AddColumn(sort_ID_Start_End, "EventStartDate", each Date.From([EventStartTime]), type date),
groupRows = Table.Group(addEventStartDate, {"ID", "EventStartDate"}, {{"FirstEventEnd", each _[EventEndTime]{0}}, {"data", each _, type table [ID=nullable text, EventStartTime=nullable datetime, EventEndTime=nullable datetime, Status=nullable text, EventStartDate=date]}}),
addNestedIndex = Table.TransformColumns(groupRows, {"data", each Table.AddIndexColumn(_, "Index", 1, 1)}),
addClash = Table.AddColumn(addNestedIndex, "Clash",
each List.Contains(
List.Skip([data][EventStartTime], 1),
[FirstEventEnd],
(x, y)=> x < y
)
),
expandData = Table.ExpandTableColumn(addClash, "data", {"EventStartTime", "EventEndTime", "Status", "Index"}, {"EventStartTime", "EventEndTime", "Status", "Index"}),
addAdjustedStatus = Table.AddColumn(expandData, "AdjustedStatus",
each if [Index] = 1 then [Status]
else if [EventStartTime] < [FirstEventEnd] then "Attended"
else [Status]
)
in
addAdjustedStatus
It now turns this (note added Person Z for testing a different scenario):
...to this:
I think this should still be fairly performant over a large dataset, but you'll have to test tbh.
Pete
Proud to be a Datanaut!
Hi Pete, thank you so much. This was the solution. However, I found the impact on refresh for my dataset was too significant. For now I've had to handle this as a calculated column but will continue to beaver away with your code above to see if I can improve the report performance. Appreciate your input.
Hi, here's what I have, I hope this is enough to go on:
ChangetoDateTime = Table.TransformColumnTypes(LocationActivityTypeFlag, {{"EventStartTime", type datetime}, {"EventEndTime", type datetime}}),
SortedRows1 = if List.Contains(Table.ColumnNames(ChangetoDateTime), "Person") then
Table.Sort(ChangetoDateTime, {{"Person", Order.Ascending}, {"EventStartTime", Order.Ascending}, {"EventEndTime", Order.Ascending}})
else
ChangetoDateTime,
// Grouping Rows to prepare for clash detection
#"Grouped Rows" = Table.Group(SortedRows1, {"Person"},
{{"GroupedData", each _, type table [xx=nullable text, xx=nullable text, xx=nullable text, xx=nullable text, xx=nullable text, xx=nullable number, xx=nullable text, xx=nullable text, xx=nullable text, xx=nullable number, event_date=nullable date, startTime=nullable time, endTime=nullable time, emailAddress=nullable text, username=nullable text, xx=nullable text, xx=nullable text, #"Person"=text, CheckinStatus=text, EventStartTime=nullable datetime, EventEndTime=nullable datetime, xx=number]}}
),
// Adding Clashes detection logic here after grouping
AddClashes = Table.AddColumn(#"Grouped Rows", "Clashes", each
let
CurrentGroup = [GroupedData],
ClashCount = Table.RowCount(CurrentGroup) - Table.RowCount(Table.Distinct(CurrentGroup, {"EventStartTime", "EventEndTime"}))
in
if ClashCount > 0 then "Yes" else "No"
),
#"Expanded GroupedData" = Table.ExpandTableColumn(AddClashes, "GroupedData", {"CheckinStatus", "EventStartTime", "EventEndTime"}, {"xx", "xx", "xx"})
in
#"Expanded GroupedData"
Hi @AshBee ,
Are you able to share the M code for your attempt at using group/index please (anonymise any connection strings etc.). What you suggest sounds like a perfectly reasonable way to go about this problem, so it may just be a case of optimising what you've done so it can run quickly over so many rows.
Pete
Proud to be a Datanaut!
Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.
User | Count |
---|---|
24 | |
12 | |
11 | |
11 | |
8 |
User | Count |
---|---|
43 | |
25 | |
15 | |
14 | |
12 |