Complex Overlapping Linear Data Matching

mroberts_troy · ‎07-28-2022

I have two sets of linear data that measures progress from start-to-finish for each day. One data set reports the actuals, one data set reports the "check". For each set, each day there can be multiple segments of starts and stops, and there can be gaps.

I need to do analysis which compares the two sets to look for matches between the data for each particular day. In this instance, I'm less interested in where there are gaps in the overall dataset - just assume that for now we know that where there are gaps that someone will need to come fill those in later; for now I'm just looking for matches and discrepancies. In any case, I have attempted to put together an Excel file which depicts every scenario that we would need to test against, for each category for each day. I haven't included "category" or "date" in my testing data, because we can assume that the data will be grouped prior to doing analysis, so to make the problem simpler, for now, those variables are not a concern.

Anyway, hopefully all this doesn't sound too confusing. I have done a lot of looking at various analysis tutorials which include gap & island analysis, overlap analysis, etc. but putting all my thoughts together into one query is giving me a migraine. So I thought someone might want to take a crack at this fun problem. If you have questions, please let me know. In the meantime I'm going to bang my head into my keyboard and hopefully come up with the solution on my own, but let me know if you figure it out before me! Thank you!

Scenarios & Scenario Test Data

AlexisOlson · ‎07-29-2022

My prior answer was on the right track but made some errors near the end.

Try this cleaned-up and commented version:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dVZBcsQwCPtLzj3EQOL42H6j0/9/o91N7IJAh52dEbYEjgX+/t7a9rF9/v1e//vrfz+2n48b/ypwIesF1luE5Yb+1yvw6B7gcn2VjwGPRfjFc+L6r6xrLk/164+a/2D8R30Oh+MP68/6HE7Gf9b5n4y/1/l3xt/xnCMsNxR4Xrgiz1XrXkz3qnUvonuh7tgDbjf0zz/w3kYY+See6hr19x0bOf+218ILR+UVwNJWINX2jhRJLTxV0ZLtAU9ZNZZVo1lhC9E94strBhuWRoeA3ZjTYO2o+X4UDN2w84SAz3eqY4tZG3wveR/WPBPSNJrvGqG7tYOmiw3CAM9c2CLOJyvSClqnTL4ZhALR3Qp4aqPN+ztG0JmTa1Au9OCzQ9BqiCcm8cZ5F+i3lPNA0DlzR6MixAeSfPB8KMFLrYBnCWF1CGo8X1C8C8IOP4G9CYVNYDFG5YdwuFiCU3gF0DmTKjmnuw0l04lZGQRWgfNI0Df9iLhs0H2kMw06XgXnawc8ixCvifdaOBIcpece8ZzUYHUMrGOeFRmDQuegEnOqN2coXMlU051qNKxjijQUGS5QaTSqIUxDaCHkbaxCRdjrWFkdaE47Ip4lmGuVulbRtVOEjDv1pvW3R71powRzrdIHsZIXsaYn8VQnc1CTaf2GSoF6Vi9WRnr/Tg00rQKeNZJpQ6T6TtSchuZ8sjJiQaMWtGTBDoFUiJH5aN6D4ZYYGZDmLaiuuxpxoFEHGpuPpqgxXKDSqC348ws=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Scenario = _t, Set = _t, #"Segment Count" = _t, Start = _t, End = _t]),
    #"Start Table" = Table.TransformColumnTypes(Source,{{"Scenario", Int64.Type}, {"Set", type text}, {"Segment Count", Int64.Type}, {"Start", Int64.Type}, {"End", Int64.Type}}),
    /*Create a list of boundaries for each scenario*/
    #"Grouped By Scenario" = Table.Group(#"Start Table", {"Scenario"}, {{"Boundaries", each List.Sort(List.Distinct([Start] & [End])), type list}}),
    /*Transform each boundary list into an list of ranges plus an index.*/
    /*Example: {0,60,120,145} --> {{0,60,1},{60,120,2},{120,145,3}} */
    #"Transformed Boundaries" = Table.TransformColumns(#"Grouped By Scenario", {{"Boundaries", each List.Zip({List.RemoveLastN(_,1),List.RemoveFirstN(_,1),{1..List.Count(_)}}), type list}}),
    /*Expand previous list into on row per range + index sublist.*/
    #"Expanded List to Rows" = Table.ExpandListColumn(#"Transformed Boundaries", "Boundaries"),
    /*Turn each range + index list into record with field names.*/
    #"Trasformed Lists to Records" = Table.TransformColumns(#"Expanded List to Rows", {"Boundaries", each Record.FromList(_, {"Left", "Right", "Segment#"}), type record}),
    /*Expand these records into new columns*/
    #"Expanded Records to Columns" = Table.ExpandRecordColumn(#"Trasformed Lists to Records", "Boundaries", {"Left", "Right", "Segment#"}, {"Left", "Right", "Segment#"}),
    /*Create list of Sets to cross join with.*/
    /*This could be made dynamic. E.g. List.Distinct(Source[Set]).*/
    #"Added Set List" = Table.AddColumn(#"Expanded Records to Columns", "Set", each {"A","B"}),
    /*Expand set list into a full crossjoin.*/
    #"Expanded Sets to Rows" = Table.ExpandListColumn(#"Added Set List", "Set"),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Sets to Rows",{{"Left", Int64.Type}, {"Right", Int64.Type}, {"Set", type text}, {"Segment#", Int64.Type}}),
    /*Merge the table of all scenarios, corresponding ranges, and sets with the orginal table.*/
    #"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Scenario", "Set"}, #"Start Table", {"Scenario", "Set"}, "Start Table", JoinKind.LeftOuter),
    /*Expand the Start and End columns from the original table.*/
    #"Expanded Columns" = Table.ExpandTableColumn(#"Merged Queries", "Start Table", {"Start", "End"}, {"Start", "End"}),
    /*Check if each range is covered by the segment in that row.*/
    #"Added Coverage Check" = Table.AddColumn(#"Expanded Columns", "IsFilled", each ([Start] <= [Left] and ([End] >= [Right])), type logical),
    /*A range is covered for a given set if a least one set segment covers it.*/
    #"Grouped Over Ranges" = Table.Group(#"Added Coverage Check", {"Scenario", "Left", "Right", "Segment#", "Set"}, {{"IsCovered", each List.Max([IsFilled]), type logical}}),
    /*Pivot the Set colmn to see A and B side by side.*/
    #"Pivoted Set Column" = Table.Pivot(#"Grouped Over Ranges", List.Distinct(#"Grouped Over Ranges"[Set]), "Set", "IsCovered"),
    /*There is a match if the list of ranges covered for is the same for A and B.*/
    /*Note: During grouping, [A] and [B] (below) are columns/lists not single values.*/
    #"Final Grouping" = Table.Group(#"Pivoted Set Column", {"Scenario"}, {{"List", each if [A] = [B] then "Match" else "Mismatch", type nullable number}})
in
    #"Final Grouping"

mroberts_troy · ‎07-28-2022

Okay, folks, I think I solved it. I just used this video about using List.Date to provide some guidance. Here's my code. One thing to note is that I did have to worry about overlaps for each set type first, clean those, then cycle through it all again.

let
    Source = Excel.Workbook(Web.Contents("https://troyconstruction0-my.sharepoint.com/personal/mroberts_troyconstruction_com/Documents/Test%20linear%20data/Validation%20Scenarios.xlsx"), null, true),
    Table1_Table = Source{[Item="Table1",Kind="Table"]}[Data],
    #"Filtered Rows" = Table.SelectRows(Table1_Table, each ([Scenario] <> null)),
    #"Changed Type" = Table.TransformColumnTypes(#"Filtered Rows",{{"Scenario", Int64.Type}, {"Set", type text}, {"Segment Count", Int64.Type}, {"Start", Currency.Type}, {"End", Currency.Type}, {"0", type text}, {"15", type text}, {"30", type text}, {"45", type text}, {"60", type text}, {"75", type text}, {"90", type text}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"Scenario", "Set", "Segment Count", "Start", "End"}),
    #"Inserted Subtraction" = Table.AddColumn(#"Removed Other Columns", "StationBetweenCount", each [End] - [Start], Int64.Type),
    #"Added Custom" = Table.AddColumn(#"Inserted Subtraction", "AllStations", each List.Numbers([Start],[StationBetweenCount]*100,.01)),
    #"Expanded AllStations" = Table.ExpandListColumn(#"Added Custom", "AllStations"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Expanded AllStations",{{"AllStations", Currency.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type1", {"Scenario", "AllStations", "Set"}, {{"CountPerSet", each Table.RowCount(_), Int64.Type}, {"StartNoOverlap", each List.Min([Start]), Currency.Type}, {"EndNoOverlap", each List.Max([End]), Currency.Type}}),
    #"Inserted Merged Column" = Table.AddColumn(#"Grouped Rows", "SetIDsWithNoOverlaps", each Text.Combine({Text.From([Scenario], "en-US"), [Set], Text.From([StartNoOverlap], "en-US")}, "."), type text),
    #"Removed Duplicates" = Table.Distinct(#"Inserted Merged Column", {"SetIDsWithNoOverlaps"}),
    #"Removed Columns" = Table.RemoveColumns(#"Removed Duplicates",{"AllStations", "CountPerSet", "SetIDsWithNoOverlaps"}),
    #"Inserted Subtraction1" = Table.AddColumn(#"Removed Columns", "StationBetweenCount2", each [EndNoOverlap] - [StartNoOverlap], type number),
    #"Added Custom1" = Table.AddColumn(#"Inserted Subtraction1", "AllStations2", each List.Numbers([StartNoOverlap],[StationBetweenCount2]*100,.01)),
    #"Added Index" = Table.AddIndexColumn(#"Added Custom1", "Index", 1, 1, Int64.Type),
    #"Expanded AllStations2" = Table.ExpandListColumn(#"Added Index", "AllStations2"),
    #"Changed Type2" = Table.TransformColumnTypes(#"Expanded AllStations2",{{"AllStations2", Currency.Type}}),
    #"Grouped Rows1" = Table.Group(#"Changed Type2", {"Scenario", "AllStations2"}, {{"CountForOverlaps", each Table.RowCount(_), Int64.Type}, {"Data", each _, type table [Scenario=nullable number, Set=nullable text, StartNoOverlap=number, EndNoOverlap=number, StationBetweenCount2=number, AllStations2=nullable number, Index=number]}}),
    #"Filtered Rows2" = Table.SelectRows(#"Grouped Rows1", each [CountForOverlaps] = 1),
    #"Expanded Data" = Table.ExpandTableColumn(#"Filtered Rows2", "Data", {"Set", "Index"}, {"Set", "Index"}),
    #"Grouped Rows2" = Table.Group(#"Expanded Data", {"Scenario", "Index"}, {{"NoMatchStart", each List.Min([AllStations2]), type nullable number}, {"NoMatchEnd", each List.Max([AllStations2]), type nullable number}, {"Data", each _, type table [Scenario=nullable number, AllStations2=nullable number, CountForOverlaps=number, Set=nullable text, Index=nullable number]}}),
    #"Expanded Data1" = Table.ExpandTableColumn(#"Grouped Rows2", "Data", {"Set"}, {"Set"}),
    #"Inserted Merged Column1" = Table.AddColumn(#"Expanded Data1", "NoOverlapID", each Text.Combine({Text.From([Scenario], "en-US"), Text.From([Index], "en-US"), [Set]}, "."), type text),
    #"Removed Duplicates1" = Table.Distinct(#"Inserted Merged Column1", {"NoOverlapID"}),
    #"Added Conditional Column" = Table.AddColumn(#"Removed Duplicates1", "Cause for Mismatch", each if [Set] = "A" then "A but no B" else "B but no A"),
    #"Changed Type3" = Table.TransformColumnTypes(#"Added Conditional Column",{{"Cause for Mismatch", type text}})
in
    #"Changed Type3"

Now, I am seriously concerned that this method of using List will be totally bloated and cause the query to load exceptionally slow, but at this point, I'm not sure what to do. Does anyone else have a different/better method? I look forward to hearing from you!

AlexisOlson · ‎07-28-2022

I think the key to bloat is only breaking each scenario down into the necessary segments.

No time to explain further right now but try walking through the steps here and see if this makes any sense to you:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jZhRUsMwDETvkm8+Ykmxk0+4BsP9rwFtYiOtvYlnYNqR6a7kWK8y399LWj6Wz7/f1+v6el235efjjH8N4n9v/39eASECAgIWw3KGboQVhHUN4U5AZzM2EL4yMyecvYCB8JVIDcsZujHcxoYbM9zGW7c5w3xrmMdbl5lhHleYpw3LuMLCDAs+qxiWMxR0XnF9TGQfJ7KzRPZxIjtJZMdEjjXE7QzdJHhAgpfh4RL0hjXeVX6Mz8ixzD6ytI4zaXFMpS1g8W2hq/69MsiyxZ8faOoAtcV4l2ZiaSaaJsJO1xhvLW7wgeZRYMHO2F1hDJzJk/MeLAkZ2TQQhoUpIAybgqfee3+vbewlCN6S51sgdfKAe6gQUVbFMxVHmGWWN4FWKlTaY+t+T5BD9UR5EIXKkyfRw54gQ6r4QcWRFkryFoTC9Yct3kmLb/H3nviPzH0/CvZ4lUjUlXSsdB3Lnr5g+1UFoZ6+L0OlgqbsWIjv1yDhpxoPFJmeasSYth9swnkWnGy4OPZ41e56vOaNMwyXzph3nRS7ceXaVsEOL1uMy/IEXynMlI4sgjNLgfiEK6GCeCqEXcTxJK8x/vw1Lx4WoVJPi7C9ZLSQ+dlCCUbUYyTslZJJQdd5U08R3wCa0PVwCyPTNG8qzFTQtJVKbkYq867sbqRoeriFkafOezK+KOWLIl+MaZMRQj1e/BlVj5eHayTji9LrkJL7kHYXIgZ6JbOFdnipBSFdFOLPdFFPl1Bod/uppogXhfiEaYeXtokEIzqPEUOMXHkbgYXNw8I6WFzItQ4W7d8AZOYwT4v7s2hk6DAPC3VfLkZYYfOsMDZzmKLp4RZGpmNY/PwC", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Scenario = _t, Set = _t, #"Segment Count" = _t, Start = _t, End = _t]),
    #"Changed Type0" = Table.TransformColumnTypes(Source,{{"Scenario", Int64.Type}, {"Set", type text}, {"Segment Count", Int64.Type}, {"Start", Int64.Type}, {"End", Int64.Type}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type0", each ([Scenario] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Scenario"}, {{"Boundaries", each List.Sort(List.Distinct([Start] & [End])), type list}}),
    #"Transformed Column" = Table.TransformColumns(#"Grouped Rows", {{"Boundaries", each List.Zip({_,List.Skip(_),{1..List.Count(_)}}), type list}}),
    #"Expanded Boundaries" = Table.ExpandListColumn(#"Transformed Column", "Boundaries"),
    #"Extracted Values" = Table.TransformColumns(#"Expanded Boundaries", {"Boundaries", each Record.FromList(_, {"Left", "Right", "Segment#"}), type record}),
    #"Expanded Boundaries1" = Table.ExpandRecordColumn(#"Extracted Values", "Boundaries", {"Left", "Right", "Segment#"}, {"Left", "Right", "Segment#"}),
    #"Filtered Rows1" = Table.SelectRows(#"Expanded Boundaries1", each ([Right] <> null)),
    #"Added Custom1" = Table.AddColumn(#"Filtered Rows1", "Set", each {"A","B"}),
    #"Expanded Set" = Table.ExpandListColumn(#"Added Custom1", "Set"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Expanded Set",{{"Left", Int64.Type}, {"Right", Int64.Type}, {"Set", type text}, {"Segment#", Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type1", {"Scenario", "Set"}, #"Filtered Rows", {"Scenario", "Set"}, "Changed Type", JoinKind.LeftOuter),
    #"Expanded Changed Type" = Table.ExpandTableColumn(#"Merged Queries", "Changed Type", {"Start", "End"}, {"Start", "End"}),
    #"Removed Duplicates" = Table.Distinct(#"Expanded Changed Type", {"Scenario", "Segment#", "Set"}),
    #"Added Custom" = Table.AddColumn(#"Removed Duplicates", "IsFilled", each ([Start] <= [Left] and ([End] >= [Right])), type logical),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Start", "End"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Set]), "Set", "IsFilled"),
    #"Grouped Rows1" = Table.Group(#"Pivoted Column", {"Scenario"}, {{"List", each if [A] = [B] then "Match" else "Mismatch", type nullable number}})
in
    #"Grouped Rows1"

mroberts_troy · ‎07-29-2022

I'm trying to walk through your solution, but before I do, I notice that you're coming back with false matches/mismatches. In my dataset, the first 11 scenarios are all "matches" and every other scenario should be a "mismatch". However, when yours renders, it calls Scenarios 2-4 & 7-11 a "mismatch", then it calls 19 & 21 a "match" but they're definitely not.

AlexisOlson · ‎07-29-2022

Thanks for pointing that out. I was in a bit of a hurry and didn't have a chance to verify fully. I'll take another look when I get a chance.

Complex Overlapping Linear Data Matching

Helpful resources

Power BI Dataviz World Championships

Power BI Monthly Update - December 2025

FabCon Atlanta 2026

FabCon is coming to Atlanta

Complex Overlapping Linear Data Matching

Helpful resources

Power BI Dataviz World Championships

Power BI Monthly Update - December 2025

FabCon Atlanta 2026