Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
MRD_JR
Frequent Visitor

Filtering rows based on other rows

How can I filter rows based on comparisons to other rows?

 

My desired filter is "Remove any row that has another row within X rows whose timestamp is more than Y seconds away from the timestamp of the next row"

 

My data is a series of about 600 CSV files with around 7000 rows and <500kB each, totalling less than 400MB. If it makes a difference, each file is in order by the Time column. Sample data:

 

Time (s)Data
0A
1.012B
2.022C
3.032D
4.041E
5.05F
5.906G
5.917H
5.937I
5.948J
5.959K
5.97L
5.98M
5.991N

 

The desired result is to remove A, B, C, D, E, F, G, H, I, J for an X rows of 3 and Y seconds of 0.025

 

I tried adding an index column and creating a custom column using List.Range to calculate the time difference to the previous row

 

try [#"Time (s)"] - List.Range(#"Reordered Columns"[#"Time (s)"], [Index]-1, 1){0} otherwise #"Max Timestep"+1

 

Followed by creating a second column again using List.Range to check if any values nearby exceed Y

 

try List.Max( List.Range(#"Reordered Columns1"[Time to Previous], [Index]-#"Filter Range", (#"Filter Range"*2)+1) ) < #"Max Timestep" otherwise false

 

Then just filtered out any rows that had false in that column.

This worked when testing on a single file, but encountered issues when I tried it on the whole folder. Loading has still not completed after about 3 hours at the time of writing (Imports of similar datasets without the filtering have completed in minutes in the past), and I am seeing insane file sizes in the load dialog - 100+GB from individual files in some cases.

 

What is a better way to do this kind of filtering?

0 REPLIES 0

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.