nebula
New Member

Keeping rows where the value has changed by at least a given amount and removing the others in Power Query

Hello
I have a dataset that is too large to fit in Excel, and I would like to filter it according to the following logic:
The dataset has two columns: a datetime column and a value column.
The rule for deciding which rows are retained is as follows:

* The 1st record is always kept
* For records other than the first one:
  if |value(i) - value(previous kept)| >= 4 then keep this record,
  else discard (remove) it

This way, my goal is to retain only the records (datetime and value pairs) where the value has changed by 4 or more with respect to the previously kept record.
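
For example, with the values 10, 12, 13, 15, 20, the retained values would be 10, 15 and 20: 12 and 13 differ from the last kept value (10) by less than 4, while 15 differs from 10 by 5 and 20 differs from 15 by 5.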

I managed to implement the above logic in Excel by constructing a helper column that, for each row, carries forward either the new value or the previous helper value using a simple IF formula.
Then I filtered that helper column to get rid of the repeating values. I tried to explain the details in the picture below and in the attached Excel file.

 

[Image: datachng.png]

I would like to perform this logic in Power Query so that I can get rid of the data I do not need and end up with fewer rows, which will then fit in Excel.

 

I would be very glad if someone can help me with this.

 

Best regards

 

4 REPLIES
Vijay_A_Verma
Super User

See the working here: open a blank query, go to Home > Advanced Editor, remove everything there, and paste the code below to test. (Later, when you use the query on your own dataset, you will have to change the Source step appropriately.)

My first column has date/time values in a generated sequence, so it may not match yours, but the second column will match your data.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fdJBDoQgEETRqxjWJtNVoIBXMd7/GlKztjph99Kh+eG+S+C3DoPcIq7/KXsByrN/ITKkMAzWhdNY0yANHsLD4ClsBnu2z1jYjc1l57chElOdYYz+PqgNzaJQHJrkUByaclAcVoM9mxwZzmQhRvIUIsPs91CFqkMVquadVKG2Jp8X", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [datetime = _t, value = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"datetime", type datetime}, {"value", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
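    // Custom = the running "last kept" value: re-scan the first [Index] values, carrying the previous value forward unless the change is 4 or more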
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each List.Accumulate(List.FirstN(#"Changed Type"[value],[Index]),0,(s,c)=> if Number.Abs(c-s)<4 then s else c)),
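    // Custom.1 = 2 when Custom is unchanged from the previous row (drop), 1 when it changed (keep); try/otherwise keeps the first row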
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each try if #"Added Custom"[Custom]{[Index]-2}=[Custom] then 2 else 1 otherwise 1),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom.1] = 1)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index", "Custom.1", "Custom"})
in
    #"Removed Columns"
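
To point the query at your real data instead of the embedded demo table, only the first steps need to change. A minimal sketch, assuming the data sits in a CSV file at C:\data\readings.csv with header columns named datetime and value (the path, delimiter and encoding are placeholders to adjust):

let
    // Hypothetical file source - adjust the path, delimiter and encoding to your data
    Source = Csv.Document(File.Contents("C:\data\readings.csv"), [Delimiter = ",", Encoding = 65001]),
    // Use the first CSV row as the column headers
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Same column types as in the query above
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"datetime", type datetime}, {"value", Int64.Type}})
in
    #"Changed Type"

With Source and #"Changed Type" defined like this, the #"Added Index" step and everything after it can stay exactly as written, since nothing from that point on refers to the original Source step.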

Thank you very much for your help. It worked, but it is very slow for very big data, and I am not sure there is much that can be done to make it more efficient.

Best regards

 

I suspect the step in @Vijay_A_Verma's solution that causes the slowdown is #"Added Custom1".

 

See if this is faster for bigger data:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fdJBDoQgEETRqxjWJtNVoIBXMd7/GlKztjph99Kh+eG+S+C3DoPcIq7/KXsByrN/ITKkMAzWhdNY0yANHsLD4ClsBnu2z1jYjc1l57chElOdYYz+PqgNzaJQHJrkUByaclAcVoM9mxwZzmQhRvIUIsPs91CFqkMVquadVKG2Jp8X", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [datetime = _t, value = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"datetime", type datetime}, {"value", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
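    // Buffer the value column in memory so the positional lookups Vals{i} inside List.Generate stay fast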
    Vals = List.Buffer(#"Added Index"[value]),
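    // Walk the buffered values once: C carries the last kept value, D flags the row as changed (1 = keep) or unchanged (2 = drop)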
    ListGen = List.Generate(
        () => [i = 0, C = Vals{0}, D = 1],
        each [i] <= List.Count(Vals),
        each [
            i = [i] + 1,
            C = if Number.Abs(Vals{[i]} - [C]) < 4 then [C] else Vals{[i]},
            D = if C = [C] then 2 else 1
        ]
    ),
    ToTable = Table.FromRecords(ListGen),
    #"Merged Queries" = Table.NestedJoin(#"Added Index", {"Index"}, ToTable, {"i"}, "ToTable", JoinKind.LeftOuter),
    #"Expanded ToTable" = Table.ExpandTableColumn(#"Merged Queries", "ToTable", {"C", "D"}, {"new value", "is there a change"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded ToTable", each ([is there a change] = 1)),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"datetime", "value"})
in
    #"Removed Other Columns"
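
A further variation on the same List.Generate idea, shown here only as a rough sketch, is to build one keep/drop flag per row and attach it with Table.FromColumns instead of merging on an index column, so the merge and expand steps are not needed. The #table below stands in for the real source, and the step and column names (Flags, WithFlag, keep) are illustrative:

let
    // Demo input - replace with your own source table
    Source = #table({"datetime", "value"}, {
        {#datetime(2024, 1, 1, 0, 0, 0), 10},
        {#datetime(2024, 1, 1, 0, 1, 0), 12},
        {#datetime(2024, 1, 1, 0, 2, 0), 13},
        {#datetime(2024, 1, 1, 0, 3, 0), 15},
        {#datetime(2024, 1, 1, 0, 4, 0), 20}
    }),
    Vals = List.Buffer(Source[value]),
    // One logical flag per row: true when the value differs from the last kept value by 4 or more
    Flags = List.Generate(
        () => [pos = 0, lastKept = Vals{0}, keep = true],
        each [pos] < List.Count(Vals),
        each [
            pos = [pos] + 1,
            keep = Number.Abs(Vals{pos} - [lastKept]) >= 4,
            lastKept = if keep then Vals{pos} else [lastKept]
        ],
        each [keep]
    ),
    // Attach the flags as a helper column, keep the flagged rows, then drop the helper
    WithFlag = Table.FromColumns(Table.ToColumns(Source) & {Flags}, Table.ColumnNames(Source) & {"keep"}),
    Kept = Table.SelectRows(WithFlag, each [keep]),
    Result = Table.RemoveColumns(Kept, {"keep"})
in
    Result

On the demo values 10, 12, 13, 15, 20 this returns the rows for 10, 15 and 20, matching the rule that the first record is always kept and later records are kept only when they differ from the last kept value by 4 or more.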

Thanks a lot for the help. Your method is faster. I was running into trouble with @Vijay_A_Verma's solution at 2 million rows; with your solution I was able to process 2 million rows easily. But for more than 4 million rows the computer got stuck and would not respond. I then moved to a more powerful PC (an i9-11900K with 64 GB RAM; the previous one was an i7-7500U with 8 GB RAM), and the difference is tremendous. Now I can process 8 million rows in around 4 minutes and create a smaller file with 800k rows.
