Hello
I have a dataset that is too big to fit in Excel, and I would like to filter the data according to the following logic:
The dataset has two columns: a datetime column and a value column.
The logic for which rows are retained is as follows:
* The 1st record is always kept.
* For every record after the first: if |value(i) - value(previously kept)| >= 4, keep the record; otherwise discard (remove) it.
This way I retain only the records (datetime/value pairs) where the value has changed by 4 or more relative to the previously kept record. For example, with a threshold of 4 and values 10, 12, 15, 16, 20, the kept values would be 10, 15 and 20 (15 is compared against 10, because 12 was discarded).
I managed to implement this logic in Excel by adding a helper column that, for each row, carries either the new value or the previous one using a simple IF formula.
Then I filtered that helper column to get rid of the repeating values. I tried to explain the details in the picture below and in the attached Excel file.
I would like to perform the same logic in Power Query, so that I can drop the data I do not need and end up with fewer rows that will fit in Excel.
I would be very glad if someone could help me with this.
Best regards
See the working here: open a blank query, go to Home > Advanced Editor, remove everything there, and paste the code below to test. (Later, when you use the query on your own dataset, you will have to change the Source step appropriately.)
My first column has date-times in sequence, so it may not match yours, but the second column will.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fdJBDoQgEETRqxjWJtNVoIBXMd7/GlKztjph99Kh+eG+S+C3DoPcIq7/KXsByrN/ITKkMAzWhdNY0yANHsLD4ClsBnu2z1jYjc1l57chElOdYYz+PqgNzaJQHJrkUByaclAcVoM9mxwZzmQhRvIUIsPs91CFqkMVquadVKG2Jp8X", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [datetime = _t, value = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"datetime", type datetime}, {"value", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each List.Accumulate(List.FirstN(#"Changed Type"[value],[Index]),0,(s,c)=> if Number.Abs(c-s)<4 then s else c)),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each try if #"Added Custom"[Custom]{[Index]-2}=[Custom] then 2 else 1 otherwise 1),
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom.1] = 1)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index", "Custom.1", "Custom"})
in
#"Removed Columns"
Thank you very much for your help. It worked, but it is very slow for very big data, and I suspect there is not much that can be done to make it more efficient.
Best regards
I suspect the step in @Vijay_A_Verma's solution that causes the slowdown is #"Added Custom1", because it looks back into the whole Custom column for every row.
See if this is faster for bigger data:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fdJBDoQgEETRqxjWJtNVoIBXMd7/GlKztjph99Kh+eG+S+C3DoPcIq7/KXsByrN/ITKkMAzWhdNY0yANHsLD4ClsBnu2z1jYjc1l57chElOdYYz+PqgNzaJQHJrkUByaclAcVoM9mxwZzmQhRvIUIsPs91CFqkMVquadVKG2Jp8X", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [datetime = _t, value = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"datetime", type datetime}, {"value", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
// buffer the value column in memory so List.Generate can index it cheaply
Vals = List.Buffer(#"Added Index"[value]),
// walk the buffered values once: i = row index (matching the Index column), C = last kept value,
// D = 1 when the value changed by 4 or more (row kept), 2 otherwise (row discarded)
ListGen = List.Generate(
() => [i = 1, C = Vals{0}, D = 1],   // state for the 1st row, which is always kept
each [i] <= List.Count(Vals),
each [
i = [i] + 1,
// carry the last kept value forward, or adopt the new value when it moved by 4 or more
C = if Number.Abs(Vals{[i]} - [C]) < 4 then [C] else Vals{[i]},
// compare the new kept value with the previous one: 2 = unchanged (discard), 1 = changed (keep)
D = if C = [C] then 2 else 1
]
),
ToTable = Table.FromRecords(ListGen),
#"Merged Queries" = Table.NestedJoin(#"Added Index", {"Index"}, ToTable, {"i"}, "ToTable", JoinKind.LeftOuter),
#"Expanded ToTable" = Table.ExpandTableColumn(#"Merged Queries", "ToTable", {"C", "D"}, {"new value", "is there a change"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded ToTable", each ([is there a change] = 1)),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"datetime", "value"})
in
#"Removed Other Columns"
Thanks a lot for the help. Your method is faster. I was running into trouble with @Vijay_A_Verma's solution at 2 million rows, whereas with your solution I could process 2 million rows easily. For more than 4 million rows, however, the computer got stuck and would not respond. I then switched to a more powerful PC (an i9-11900K with 64 GB RAM; the previous one was an i7-7500U with 8 GB RAM) and the difference is tremendous: I can now process 8 million rows in around 4 minutes and produce a smaller file with 800k rows.