Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
How can I filter rows based on comparisons to other rows?
My desired filter is "Remove any row that has another row within X rows whose timestamp is more than Y seconds away from the timestamp of the next row"
My data is a series of about 600 CSV files with around 7000 rows and <500kB each, totalling less than 400MB. If it makes a difference, each file is in order by the Time column. Sample data:
Time (s) | Data |
0 | A |
1.012 | B |
2.022 | C |
3.032 | D |
4.041 | E |
5.05 | F |
5.906 | G |
5.917 | H |
5.937 | I |
5.948 | J |
5.959 | K |
5.97 | L |
5.98 | M |
5.991 | N |
The desired result is to remove A, B, C, D, E, F, G, H, I, J for an X rows of 3 and Y seconds of 0.025
I tried adding an index column and creating a custom column using List.Range to calculate the time difference to the previous row
try [#"Time (s)"] - List.Range(#"Reordered Columns"[#"Time (s)"], [Index]-1, 1){0} otherwise #"Max Timestep"+1
Followed by creating a second column again using List.Range to check if any values nearby exceed Y
try List.Max( List.Range(#"Reordered Columns1"[Time to Previous], [Index]-#"Filter Range", (#"Filter Range"*2)+1) ) < #"Max Timestep" otherwise false
Then just filtered out any rows that had false in that column.
This worked when testing on a single file, but encountered issues when I tried it on the whole folder. Loading has still not completed after about 3 hours at the time of writing (Imports of similar datasets without the filtering have completed in minutes in the past), and I am seeing insane file sizes in the load dialog - 100+GB from individual files in some cases.
What is a better way to do this kind of filtering?
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
107 | |
106 | |
87 | |
73 | |
66 |
User | Count |
---|---|
124 | |
113 | |
98 | |
81 | |
72 |