Hello, experts! I hope you can help me with this issue. Please excuse any language mistakes.
I am working with two tables in Power Query. The first table contains assignment activity data for equipment, with around 80,000 rows. The second table holds telemetric data for the same equipment, with over 3 million rows.
Assignment data example
| equipment_id (Int64) | start_date (Datetime) | end_date (Datetime) | assignment_id (Int64) |
|---|---|---|---|
| 141231234 | 2024-10-01 00:00 | 2024-10-01 01:00 | 34453 |
| 357463345 | 2024-10-01 01:00 | 2024-10-01 01:40 | 23423 |
| 317134523 | 2024-10-01 01:40 | 2024-10-01 03:00 | 62342 |
Telemetric data example
| equipment_id (Int64) | timestamp (Datetime) | speed |
|---|---|---|
| 357463345 | 2024-10-01 00:05 | 10 |
| 357463345 | 2024-10-01 00:10 | 8.2 |
| 317134523 | 2024-10-01 00:15 | 5.6 |
My goal is to match each telemetric record with its corresponding assignment_id based on the timestamp interval, allowing me to establish a 1:n relationship between the two tables and calculate the necessary measures.
Here’s my current approach. I created a calculated column as follows:
```
let
    AddReferenceColumn =
        Table.AddColumn(
            telemetric_data,
            "assignment_id",
            (row) =>
                let
                    search =
                        Table.SelectRows(
                            assignment_data,
                            each [equipment_id] = row[equipment_id]
                                and [start_date] <= row[timestamp]
                                and [end_date] >= row[timestamp]
                        )
                in
                    if Table.RowCount(search) > 0 then search{0}[assignment_id] else null
        )
in
    AddReferenceColumn
```
This solution works as expected on a sample of 1,000 rows, taking around 4 minutes to load the preview. However, when I attempt to load the full dataset, it becomes extremely slow and never completes.
I then tried a different approach:
```
let
    Source = Table.Join(telemetric_data, "equipment_id", assignment_data, "equipment_id"),
    FilterRows = Table.SelectRows(Source, each [timestamp] >= [start_date] and [timestamp] < [end_date])
in
    FilterRows
```
This also worked on the sample but likewise failed to load the entire dataset.
What would be the best way to solve this problem?
give this a try

```
let
    assignment = Table.FromList(
        List.Combine(
            Table.ToList(
                assignment_data,
                // emit two marker rows per assignment; speeds -1 and 100 are for sorting
                (x) => {{x{0}, x{1}, -1, x{3}}} & {{x{0}, x{2}, 100, "none"}}
            )
        ),
        (x) => x,
        Table.ColumnNames(telemetric_data) & {"assignment_id"}
    ),
    combine_sort = Table.Sort(telemetric_data & assignment, {"equipment_id", "timestamp", "speed"}),
    fd = Table.FillDown(combine_sort, {"assignment_id"}),
    filter = Table.SelectRows(fd, (x) => x[speed] <> -1 and x[speed] <> 100)
in
    filter
```

The idea is to combine the two tables, sort them, and fill assignment_id down.
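The combine/sort/fill-down trick can be sketched outside of M as well. Below is a small Python illustration of the same idea (not the M code itself; the sample values come from the question, and the `None`/`"none"` sentinels are assumptions mirroring the answer):

```python
assignments = [
    (357463345, "2024-10-01 01:00", "2024-10-01 01:40", 23423),
]
telemetry = [
    (357463345, "2024-10-01 00:05", 10.0),
    (357463345, "2024-10-01 01:10", 8.2),
]

# Turn each assignment into two marker rows: a "start" marker carrying the
# assignment_id (speed -1 sorts before any real reading at that instant)
# and an "end" marker carrying the sentinel "none" (speed 100 sorts after).
rows = []
for eq, start, end, aid in assignments:
    rows.append((eq, start, -1.0, aid))
    rows.append((eq, end, 100.0, "none"))
for eq, ts, speed in telemetry:
    rows.append((eq, ts, speed, None))

# Sort by equipment, then timestamp, then speed, then fill the last seen
# assignment_id down over the telemetry rows (the fill-down step).
rows.sort(key=lambda r: (r[0], r[1], r[2]))
filled, last = [], None
for eq, ts, speed, aid in rows:
    if aid is not None:
        last = aid
    filled.append((eq, ts, speed, last))

# Drop the marker rows; telemetry outside any interval keeps the sentinel.
result = [r for r in filled if r[2] not in (-1.0, 100.0)]
```

Each telemetry row lands between its interval's start and end markers after the sort, so the fill-down assigns it the right id, and readings after an interval's end pick up the sentinel instead.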
Thank you greatly, @AlienSx . That solved my problem—ingenious! It never crossed my mind to use lists like that in Power Query; I thought it would be slower than using tables. It’s perfect for custom unpivot and merging large timeline tables! The only issue is that the column used for sorting cannot contain null values. In my case, I replaced the nulls with a value between the minimum and maximum values.
Use Table.Buffer as below to improve the execution time of your code and to simplify the filtering condition.
```
let
    X = Table.Buffer(assignment_data),
    AddReferenceColumn =
        Table.AddColumn(
            telemetric_data,
            "assignment_id",
            (row) =>
                Table.First(
                    Table.SelectRows(
                        X,
                        each [equipment_id] = row[equipment_id]
                            and [start_date] <= row[timestamp]
                            and [end_date] >= row[timestamp]
                    ),
                    [assignment_id = null]
                )[assignment_id]
        )
in
    AddReferenceColumn
```
If not all columns of assignment_data are required, remove the extra columns before passing it to Table.Buffer to reduce memory usage.
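Even with the small table buffered, each of the ~3 million telemetry rows still scans all ~80,000 assignment rows. A further refinement of the same buffering idea is to index the assignments by equipment_id once, so each lookup only touches that equipment's intervals. A Python sketch of that indexing idea (the data and helper name are assumptions, not the answer's code):

```python
from collections import defaultdict

assignments = [
    (141231234, "2024-10-01 00:00", "2024-10-01 01:00", 34453),
    (357463345, "2024-10-01 01:00", "2024-10-01 01:40", 23423),
]
telemetry = [
    (357463345, "2024-10-01 01:10", 8.2),
    (317134523, "2024-10-01 00:15", 5.6),
]

# "Buffer" the small table once, keyed by equipment_id, so each telemetry
# row scans only that equipment's intervals instead of the whole table.
by_equipment = defaultdict(list)
for eq, start, end, aid in assignments:
    by_equipment[eq].append((start, end, aid))

def lookup(eq, ts):
    for start, end, aid in by_equipment.get(eq, ()):
        if start <= ts <= end:
            return aid
    return None  # no matching interval for this reading

matched = [(eq, ts, speed, lookup(eq, ts)) for eq, ts, speed in telemetry]
```

The per-row cost drops from the full assignment count to the handful of intervals per equipment, which is why grouped or keyed lookups tend to scale far better than row-by-row Table.SelectRows.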
If this answer helped resolve your issue, please consider marking it as the accepted answer. And if you found my response helpful, I'd appreciate it if you could give me kudos. Thank you!
Unfortunately, it didn't work. The speed of loading is at most 9 rows/s, even when buffering the two tables. I'm loading the data from a Dataflow.