Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
APQueiroz
Frequent Visitor

Joing tables with conditions using Power Query is not working as it should.

 

 

Hello, experts! I hope you can help me with this issue. Please excuse any language mistakes.

 

I am working with two tables in Power Query. The first table contains assignment activity data for equipment, with around 80,000 rows. The second table holds telemetric data for the same equipment, with over 3 million rows.

 

Assignment data example

equipment_id(Int64)start_date(Datetime)end_date(DateTime)assignment_id(Int64)
1412312342024-10-01 00:002024-10-01 01:0034453
3574633452024-10-01 01:002024-10-01 01:4023423
3171345232024-10-01 01:402024-10-01 03:0062342

 

Telemetric data example

equipment_id(Int64)timestamp(Datetime)speed
3574633452024-10-01 00:0510
3574633452024-10-01 00:108.2
3171345232024-10-01 00:155.6

 

My goal is to match each telemetric record with its corresponding assignment_id based on the timestamp interval, allowing me to establish a 1:n relationship between the two tables and calculate the necessary measures.

 

Here’s my current approach. I created a calculated column as follows:

 

 

 

 

let
    AddReferenceColumn = 
      Table.AddColumn(
        telemetric_data,
        "assignment_id",
        (row) =>
          let 
            search = 
              Table.SelectRows(
                assignment_data, 
                each ([equipment_id] = row[equipment_id] 
                    and [start_date] <= row[timestamp] 
                    and [end_date] >= row[timestamp])
              )
          in 
            if Table.RowCount(search) > 0 then search{0}[assignment_id] else null
      )
in
    AddReferenceColumn 

 

 

 

This solution works as expected on a sample of 1,000 rows, taking around 4 minutes to load the preview. However, when I attempt to load the full dataset, it becomes extremely slow and never completes.

I then tried a different approach:

 

 

 

let
    Source = Table.Join(telemetric_data, "equipment_id", assignment_data, "equipment_id"),
    FilterRows = Table.SelectRows(Source, each [timestamp] >= [start_date] and [timestamp] < [end_date])
in
    FilterRows

 

 

 

This also worked but still failed to load the entire dataset.

 

What would be the best way to solve this problem?

 

1 ACCEPTED SOLUTION
AlienSx
Super User
Super User

give this a try 

let
    assignment = Table.FromList(
        List.Combine(
            Table.ToList(
                assignment_data,
                (x) => {{x{0}, x{1}, -1, x{3}}} & {{x{0}, x{2}, 100, "none"}} // speeds -1 and 100 are for sorting
            )
        ),
        (x) => x, 
        Table.ColumnNames(telemetric_data) & {"assignment_id"}
    ), 
    combine_sort = Table.Sort(telemetric_data & assignment, {"equipment_id(Int64)", "timestamp(Datetime)", "speed"}), 
    fd = Table.FillDown(combine_sort, {"assignment_id"}),
    filter = Table.SelectRows(fd, (x) => x[speed] <> -1 and x[speed] <> 100)
in
    filter

Idea is to combine tables, sort them and fill down assignment_id.

View solution in original post

4 REPLIES 4
AlienSx
Super User
Super User

give this a try 

let
    assignment = Table.FromList(
        List.Combine(
            Table.ToList(
                assignment_data,
                (x) => {{x{0}, x{1}, -1, x{3}}} & {{x{0}, x{2}, 100, "none"}} // speeds -1 and 100 are for sorting
            )
        ),
        (x) => x, 
        Table.ColumnNames(telemetric_data) & {"assignment_id"}
    ), 
    combine_sort = Table.Sort(telemetric_data & assignment, {"equipment_id(Int64)", "timestamp(Datetime)", "speed"}), 
    fd = Table.FillDown(combine_sort, {"assignment_id"}),
    filter = Table.SelectRows(fd, (x) => x[speed] <> -1 and x[speed] <> 100)
in
    filter

Idea is to combine tables, sort them and fill down assignment_id.

Thank you greatly, @AlienSx . That solved my problem—ingenious! It never crossed my mind to use lists like that in Power Query; I thought it would be slower than using tables. It’s perfect for custom unpivot and merging large timeline tables! The only issue is that the column used for sorting cannot contain null values. In my case, I replaced the nulls with a value between the minimum and maximum values.

Omid_Motamedise
Memorable Member
Memorable Member

Use Table.Buffer as bellow to improve the execution time of your and also make the filtering condition simple code.


let
X=Table.Buffer(assignment_data),
    AddReferenceColumn = 
      Table.AddColumn(
        telemetric_data,
        "assignment_id",
        (row) =>
           
            
              Table.First(Table.SelectRows(
                X, 
                each ([equipment_id] = row[equipment_id] 
                    and [start_date] <= row[timestamp] 
                    and [end_date] >= row[timestamp])
              ))

      )
in
    AddReferenceColumn 

 

if all the column of table assignment_id are not requiered, beffor using it into Table.Buffer, remove the extra column to save more place for the memory.

If this answer helped resolve your issue, please consider marking it as the accepted answer. And if you found my response helpful, I'd appreciate it if you could give me kudos. Thank you!

Unfortunately, it didn't work. The speed of loading is at most 9 rows/s, even when buffering the two tables. I'm loading the data from a Dataflow. 

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.