cmbyrd
Frequent Visitor

Rate limited record inclusion help?

Hi everyone. I have a dataset containing updates to records. When changes are made and the user hits 'save', an entry is logged for every field that changed in the record. Sometimes this results in dozens of entries being recorded for what was really a single update. I am attempting to do some rate limiting on the dataset to get a more accurate count of updates.

 

The basic logic is to include the first update in the dataset, and then include only updates that are at least 5 minutes after the previously included update, but it's been giving me fits for a while now. I've done all the preliminary filtering to reduce the dataset to only the potential candidates, grouped them by record identifier, and added an index column.

 

 

My current best effort:

Table.AddColumn(#"Custom 5", "finalFilter", each let
fauxTable = [Grouped rows], // Grouped Rows is the table created by the previous step.
fltrVar = 
    Table.AddColumn(
        fauxTable,
        "Filter",
        each  
            if      [Index] = 0 // This bit works
            then    "Include"   

            else if (x) => Duration.TotalSeconds([ENTRY_TIME] - x[ENTRY_TIME]) > (5 * 60) // This bit errors
                        // I've also tried [ENTRY_TIME] = (5 * (1/24/60)) >= x[ENTRY_TIME] with the same error
            and     x[Filter] = "Include" // even with this line commented out the section fails
            then    "Include"

            else    "Exclude") // Not sure on this one, but given the simplicity I assume it works.
in  fltrVar)

 

A dummy sample of a dataset and current output with what I believe to be all relevant columns.  Desired Output and Reason are my notes, which are not in the dataset.  

 

UPDATE_ID  ENTRY_DATE        Index  Filter   Desired Output  Reason
1          03Feb23 17:00:00  0      Include  Include         Index = 0
8          03Feb23 17:53:51  1      [Error]  Exclude         Less than 5 minutes since 03FEB23 17:00:00
20         03Feb23 17:55:03  2      [Error]  Include         More than 5 minutes since 03FEB23 17:00:00
29         03Feb23 17:59:00  3      [Error]  Exclude         Less than 5 minutes since 03FEB23 17:55:03
30         03Feb23 23:23:45  4      [Error]  Include         More than 5 minutes since 03FEB23 17:55:03
87         04Feb23 09:28:40  5      [Error]  Include         More than 5 minutes since 03FEB23 23:23:45

 

Any suggestions would be much appreciated, doubly so if they include a link to help me learn why... but at this point I'm more than willing to copy and paste a black box and get this off my plate.

 

Thanks!

1 ACCEPTED SOLUTION
AlienSx
Super User

Hello, @cmbyrd. Your (x) => function is just a declaration; once you define it, you have to call it with some x value to get true or false as a result. What you really want is the latest ENTRY_DATE whose Filter is "Include", but you are still in the process of creating that column, so this approach won't work. Try the code below with List.Generate. The idea behind it is to get the list of your table's records, iterate over them one by one (always carrying the time of the latest "approved" ENTRY_DATE), and generate your "Filter" column as you go.

By the way, your example is not correct: there are 53 minutes between the 1st and 2nd entries, so the 2nd must be "Include".

Another option would be to use List.Accumulate. One could even filter out (skip) the "excluded" rows, but you decided to keep them. There is also some room for algorithmic optimization (e.g. finding and selecting the next record with ENTRY_DATE + 5 minutes while travelling over the "ENTRY_DATE" column), but that's another story 😀
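For illustration, a minimal sketch of that List.Accumulate alternative (an untested sketch; the step and field names here are illustrative, and it assumes the same typed Source table that the List.Generate code builds):

```
// Hedged sketch, not the original reply's code. The accumulator state carries
// the timestamp of the last included row and tags each record along the way.
accumulated =
    List.Accumulate(
        Table.ToRecords(Source),
        [last_included = null, out = {}],
        (state, current) =>
            let
                include =
                    state[last_included] = null
                    or Duration.TotalMinutes(current[ENTRY_DATE] - state[last_included]) >= 5
            in
                [
                    last_included = if include then current[ENTRY_DATE] else state[last_included],
                    out = state[out] & {current & [Filter = if include then "Include" else "Exclude"]}
                ]
    ),
result = Table.FromRecords(accumulated[out])
```

Appending to the out list on every step can get slow on large tables, which is one reason the List.Generate version may be preferable.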

Code updated (bug fix).

let
    raw = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Xc3BDQAhCETRVjacTcRB40orhP7bWMHTev0vMGbUqBCqVDDkaVOZd2DyYvReNGSHlgS+bUQ4tm6LIGnyu4Mo4mc/czOtH1uK2B/k/gE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [UPDATE_ID = _t, ENTRY_DATE = _t, Index = _t]),
    Source = Table.TransformColumnTypes(raw,{{"ENTRY_DATE", type datetime}, {"Index", Int64.Type}}),
    rec = List.Buffer(Table.ToRecords(Source)),
    generate_filter = 
        List.Generate(
            () => rec{0} & [Filter = "Include", last_included = rec{0}[ENTRY_DATE], go = true],
            (x) => x[go],
            (x) => 
                try
                    let 
                        current_record = rec{x[Index] + 1},
                        good_to_go = Duration.TotalMinutes(current_record[ENTRY_DATE] - x[last_included]) >= 5, // TotalMinutes: Duration.Minutes only returns the minutes component (0-59), which fails for gaps over an hour
                        add_fields = 
                        if good_to_go 
                        then [Filter = "Include", last_included = current_record[ENTRY_DATE], go =  true]
                        else [Filter = "Exclude", last_included = x[last_included], go =  true]
                    in current_record & add_fields
                otherwise [go = false]
        ),
    z = Table.RemoveColumns(Table.FromRecords(generate_filter), {"last_included", "go"})
in
    z

 

 

Thank you so much! I fiddled a bit to figure out exactly what was going on, I think I learned at least a little, and I've got this working after pulling my hair out for many hours.

This one is even shorter. It selects the "Include" records; the nested tables in the "all" column contain each group of records that fall within the 5-minute interval.

 

let
    raw = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Xc3BDQAhCETRVjacTcRB40orhP7bWMHTev0vMGbUqBCqVDDkaVOZd2DyYvReNGSHlgS+bUQ4tm6LIGnyu4Mo4mc/czOtH1uK2B/k/gE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [UPDATE_ID = _t, ENTRY_DATE = _t, Index = _t]),
    Source = Table.TransformColumnTypes(raw,{{"ENTRY_DATE", type datetime}, {"Index", Int64.Type}}),
    groups = 
        Table.Group(
            Source, {"ENTRY_DATE"}, 
            {{"UPDATE_ID", each List.Max([UPDATE_ID])}, {"all", each _}},
            GroupKind.Local,
            (x, y) => Number.From(Duration.TotalMinutes(y[ENTRY_DATE] - x[ENTRY_DATE]) >= 5) // TotalMinutes: Duration.Minutes only returns the minutes component (0-59)
        )
in
    groups
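A small illustrative follow-up (the step name here is mine, not from the reply): since the grouped table ends up with one row per rate-limited update, the "more accurate number of updates" the original post asked for is simply its row count:

```
// Drop the nested group tables and count the surviving rows.
update_count = Table.RowCount(Table.RemoveColumns(groups, {"all"}))
```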

 
