The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
I'm trying to implement an algorithm to efficiently merge a set of ranges. They represent a set of meetings, each with a start and end time. I'd like to merge the ranges to get the total time spent in meetings. I can't just add them up as some overlap (e.g. a 1-2PM and a 1-1:30 pm meeting would sum to 1.5 hours but one couldn't actually attend both).
@Greg_Deckler shared a great option with Generate series but the cartesian product used too much memory as I have lots of rows (but I learned a lot so still helpful). I'm used to traditional for-each loop programming. The algorithm is at the site below, any guidance on how I could do that in Dax?
https://www.google.com/amp/s/www.geeksforgeeks.org/merging-intervals/amp/
@Julian1 - Happy to help, sorry it didn't work out! 😞
Maybe include a link to the original thread? I am attaching a PBIX to this message below sig that shows the DAX solution to replicate. He wants it down to the minute!! Maybe it would be more efficient at 15 minute intervals?
@edhans @ImkeF any fancy, dancy Power Query solutions? @HotChilli ?
I tried all kinds of ways to do this and finally chose the "nuke it from orbit" route!!
Sure, here's the other answer you mentioned: https://community.powerbi.com/t5/Desktop/Properly-summing-up-time-spent-in-overlapping-appointments/...
Thanks again!
Please check the file attached.
Different solution.
If that's not what you need, please provide detailled information about requirements with table before and after.
https://community.powerbi.com/t5/Community-Blog/How-to-Get-Your-Question-Answered-Quickly/ba-p/384901
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
@Greg_Deckler and @ImkeF ,
Per request, here's some more detail on the use case:
I'm building a report that shows how much time people spend in appointments on their calendar. I run into a challenge when I'm dealing with overlapping appointments. As an example, I have a list of appointments shown below using a table visualization that provides a sum of the hours:
The problem is, that sum is wrong. The first and second meeting overlap by 1/2 an hour, so the total time should be 2.5 hours and not 3 hours.
@Greg_Deckler's solution works perfectly but the Generate() function creates a Cartesian product with millions of rows that my machine can't handle. I should have called out that I have a large number of rows but didn't think about it at the time (I'm relatively new to this area, sorry for not mentioning it up front).
An alternative solution would be to use an algorithm like this one: https://www.google.com/amp/s/www.geeksforgeeks.org/merging-intervals/amp/. I'm a traditional programmer and wouldn't have too much trouble doing this with for-each loops, but I'm having trouble wrapping my brain around how to do it in DAX. I would appreciate some guidance if you have some..
Hi @Julian1 ,
You can try the following, add an index to your data in Power Query sorted by the meeting start, you can then create a calculated column that adjustes the start time if the less than or equal to the prior rows end time, formula would look like this
time in meeting =
var start_time = 'Meeting Times'[Start]
var idx = 'Meeting Times'[Index]
var prior_end_time = CALCULATE(MAX('Meeting Times'[End]), FILTER('Meeting Times', 'Meeting Times'[Index] = idx -1))
var calc_start_time = if('Meeting Times'[Start] <= prior_end_time, prior_end_time, 'Meeting Times'[Start])
return
DATEDIFF(calc_start_time, 'Meeting Times'[End], MINUTE)
Please find a sample .pbix here, sample.pbix
Hope this Helps,
Richard
Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!
Proud to be a Super User!
Thanks @richbenmintz ,
I tried that and it worked for many cases but produced some incorrect data for some cases, like this one:
I think the underlying issue is that one row may not overlap with the previous one but may overlap with others that come before it. I tried to implement this algorithm https://www.geeksforgeeks.org/merging-intervals/ but had an issue where I couldn't figure out how "push a range on to the stack" as they mention in the article. In DAX, I can't figure out how to calculate a value on one loop of an iteration and then reference it in a later loop. Is such a thing possible?
Hi @Julian1
Take II
Create two new calculated columns to store the min and max end times from the previous rows in the table
min start time =
var idx = 'Meeting Times (2)'[Index]
var min_prior_start = CALCULATE(MIN([Start]), FILTER('Meeting Times (2)', 'Meeting Times (2)'[Index] < idx))
return
min_prior_start
max end time =
var idx = 'Meeting Times (2)'[Index]
var max_prior_end = CALCULATE(max([End]), FILTER('Meeting Times (2)', 'Meeting Times (2)'[Index] < idx))
return
max_prior_end
Then create a calculated column that determines the type of calc to perform, this is just for testing not neccessary
Calc in meeting = switch(
true(),
'Meeting Times (2)'[min start time] = BLANK() && 'Meeting Times (2)'[max end time] = BLANK(), "Calc Duration",
[Start] >= [min start time] && 'Meeting Times (2)'[End] < [max end time], "No Time Calc",
[Start] >= [min start time] && [Start] < [max end time] &&'Meeting Times (2)'[End] > [max end time], "Calc from Max End Time",
[Start] > 'Meeting Times (2)'[max end time], "Calc Duration",
"Calc Duration")
Then Create a column that figures out the duration if required
Calc in meeting min = switch(
true(),
'Meeting Times (2)'[min start time] = BLANK() && 'Meeting Times (2)'[max end time] = BLANK(), DATEDIFF([Start], [End], MINUTE),
[Start] >= [min start time] && 'Meeting Times (2)'[End] < [max end time], 0,
[Start] >= [min start time] && [Start] < [max end time] &&'Meeting Times (2)'[End] > [max end time], DATEDIFF([max end time], [End], MINUTE),
[Start] > 'Meeting Times (2)'[max end time], DATEDIFF([Start], [End], MINUTE),
DATEDIFF([Start], [End], MINUTE))
Hope this Helps,
Richard
Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!
Proud to be a Super User!
Thanks @richbenmintz . So does that mean what I'm trying to do (preserve the "top of the stack" like the algorithm recommends across different iterations of a looping function) not possible in DAX? That's fine if so, just trying to learn if I can use the most efficient method or not as I'm new to DAX.
Hi @Julian1,
The way I tried to build the formula essentially loops through each row of the table and checks the values of the current row against the mins and maxs of the prior rows, while it does not recreate an array with the merge ranges you cou remove the no calc rows with a filtered calc table if that was required. I am not sure if my approach is the most efficient, but i think it works.
Hope this Helps,
Richard
Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!
Proud to be a Super User!
Thanks @richbenmintz . I think that's still an O(n squared) algorithm, right? Given the performance requirements (filters are changed often by a person in front of the screen) I was hoping to go with as efficient an approach as I could find. That said, if it's not possible in DAX to use the algorithm I had in mind (or something similarly efficient) that's good learning for me as a new user. Based on what I'm hearing from folks like @Greg_Deckler , sounds like that's the case.
Hi @Julian1 ,
In terms of effeciency, you are sorting the data using Power Query similar to the second alfgorithm you have referenced hopefully making each row operation as efficient as possible with the DAX calc column. Given that the approach uses a calculated column it will only be recalculated when the model is processed and presisted to memory, each filter or interaction with the report from the user on screen will not impact performance of the calc.
Hope this Helps,
Richard
Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!
Proud to be a Super User!
@Julian1 can you confirm that you've tested the Power Query solution provided by ImkeF? I thought that you'd missed it from your subsequent posts.
@Julian1 , I created this file sometimes back to mark overlapping dates , it has flag to check
https://www.dropbox.com/s/1mlq21o1xjuw8il/overlappingdates.pbix?dl=0.
see if this can help