Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
cmbyrd
Frequent Visitor

Rate limited record inclusion help?

Hi everyone.  I have a dataset containing updates to records.  When changes are made, and the user hits 'save', an entry is made for every field that is changed in the record.  Sometimes this results in dozens of entries being recorded for what was really one update.  I am attempting to do some rate limiting on the dataset to get a more accurate number of updates.

 

The basic logic is to include the first update in the dataset, and then include only updates that are at least 5 minutes after the previously included update; but it's been giving me fits for a while now.  I've got all the preliminary filtering done to reduce the dataset to only the potential candidates, have grouped them by record identifier, and added an index column.

 

 

My current best effort:

Table.AddColumn(#"Custom 5", "finalFilter", each let
fauxTable = [Grouped rows], // Grouped Rows is the table created by the previous step.
fltrVar = 
    Table.AddColumn(
        fauxTable,
        "Filter",
        each  
            if      [Index] = 0 // This bit works
            then    "Include"   

            else if (x) => Duration.TotalSeconds([ENTRY_TIME] - x[ENTRY_TIME]) > (5 * 60) // This bit errors
                        // I've also tried [ENTRY_TIME] = (5 * (1/24/60)) >= x[ENTRY_TIME] with the same error
            and     x[Filter] = "Include" // even with this line commented out the section fails
            then    "Include"

            else    "Exclude") // Not sure on this one, but given the simplicity I assume it works.
in  fltrVar)

 

A dummy sample of a dataset and current output with what I believe to be all relevant columns.  Desired Output and Reason are my notes, which are not in the dataset.  

 

UPDATE_IDENTRY_DATEIndexFilter Desired OutputReason
103Feb23 17:00:000Include IncludeIndex = 0
803Feb23 17:53:511[Error] ExcludeLess than 5 minutes since 03FEB23 17:00:00
2003Feb23 17:55:032[Error] IncludeMore than 5 minutes since 03FEB23 17:00:00
2903Feb23 17:59:003[Error] ExcludeLess than 5 minutes since 03FEB23 17:55:03
3003Feb23 23:23:454[Error] IncludeMore than 5 minutes since 03FEB23 17:55:03
8704Feb23 09:28:405[Error] IncludeMore than 5 minutes since 03FEB23 23:23:45

 

Any suggestions would be much apprecaited, doubly so if they include a link to help me learn why... but at this point I'm more than willing to copy and paste a black box and get this off my plate.

 

Thanks!

1 ACCEPTED SOLUTION
AlienSx
Super User
Super User

Hello, @cmbyrd your (x)=> function is just a declaration. Ones you define it - call it with some x value to get true or false as a result. You would want to get latest entry_date with filter = "included". But you are in the process of creation of this column. So this approach won't work. Try the code below with List.Generate. The idea behind it is to get list of records of your table, iterate it one by one (always having time of the latest "approved" entry_date) and generating your "filter" column.

By the way your example is not correct. There are 53 minutes between 1st and 2nd entry. 2nd must be "Included".

Another options would be to use List.Accumulate. One could even filter out (skip) those "excluded" but you decided to keep them. There is also some room for algo optimization (e.g. find and select next record with entry_date + 5mins while travelling over "entry_date" column. But that's another story 😀

code updated - bug fixing

let
    raw = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Xc3BDQAhCETRVjacTcRB40orhP7bWMHTev0vMGbUqBCqVDDkaVOZd2DyYvReNGSHlgS+bUQ4tm6LIGnyu4Mo4mc/czOtH1uK2B/k/gE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [UPDATE_ID = _t, ENTRY_DATE = _t, Index = _t]),
    Source = Table.TransformColumnTypes(raw,{{"ENTRY_DATE", type datetime}, {"Index", Int64.Type}}),
    rec = List.Buffer(Table.ToRecords(Source)),
    generate_filter = 
        List.Generate(
            () => rec{0} & [Filter = "Include", last_included = rec{0}[ENTRY_DATE], go = true],
            (x) => x[go],
            (x) => 
                try
                    let 
                        current_record = rec{x[Index] + 1},
                        good_to_go = Duration.Minutes(current_record[ENTRY_DATE] - x[last_included]) >= 5,
                        add_fields = 
                        if good_to_go 
                        then [Filter = "Include", last_included = current_record[ENTRY_DATE], go =  true]
                        else [Filter = "Exclude", last_included = x[last_included], go =  true]
                    in current_record & add_fields
                otherwise [go = false]
        ),
    z = Table.RemoveColumns(Table.FromRecords(generate_filter), {"last_included", "go"})
in
    z

 

 

View solution in original post

3 REPLIES 3
AlienSx
Super User
Super User

Hello, @cmbyrd your (x)=> function is just a declaration. Ones you define it - call it with some x value to get true or false as a result. You would want to get latest entry_date with filter = "included". But you are in the process of creation of this column. So this approach won't work. Try the code below with List.Generate. The idea behind it is to get list of records of your table, iterate it one by one (always having time of the latest "approved" entry_date) and generating your "filter" column.

By the way your example is not correct. There are 53 minutes between 1st and 2nd entry. 2nd must be "Included".

Another options would be to use List.Accumulate. One could even filter out (skip) those "excluded" but you decided to keep them. There is also some room for algo optimization (e.g. find and select next record with entry_date + 5mins while travelling over "entry_date" column. But that's another story 😀

code updated - bug fixing

let
    raw = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Xc3BDQAhCETRVjacTcRB40orhP7bWMHTev0vMGbUqBCqVDDkaVOZd2DyYvReNGSHlgS+bUQ4tm6LIGnyu4Mo4mc/czOtH1uK2B/k/gE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [UPDATE_ID = _t, ENTRY_DATE = _t, Index = _t]),
    Source = Table.TransformColumnTypes(raw,{{"ENTRY_DATE", type datetime}, {"Index", Int64.Type}}),
    rec = List.Buffer(Table.ToRecords(Source)),
    generate_filter = 
        List.Generate(
            () => rec{0} & [Filter = "Include", last_included = rec{0}[ENTRY_DATE], go = true],
            (x) => x[go],
            (x) => 
                try
                    let 
                        current_record = rec{x[Index] + 1},
                        good_to_go = Duration.Minutes(current_record[ENTRY_DATE] - x[last_included]) >= 5,
                        add_fields = 
                        if good_to_go 
                        then [Filter = "Include", last_included = current_record[ENTRY_DATE], go =  true]
                        else [Filter = "Exclude", last_included = x[last_included], go =  true]
                    in current_record & add_fields
                otherwise [go = false]
        ),
    z = Table.RemoveColumns(Table.FromRecords(generate_filter), {"last_included", "go"})
in
    z

 

 

Thank you so much!  Fiddled a bit to figure out exactly what was going on, I think I learned at least a little, and I've got this working after pulling my hair out for many hours.

this one is even shorter. Selects "include" records. Tables "all" contain group of records with < 5 min interval. 

 

let
    raw = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Xc3BDQAhCETRVjacTcRB40orhP7bWMHTev0vMGbUqBCqVDDkaVOZd2DyYvReNGSHlgS+bUQ4tm6LIGnyu4Mo4mc/czOtH1uK2B/k/gE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [UPDATE_ID = _t, ENTRY_DATE = _t, Index = _t]),
    Source = Table.TransformColumnTypes(raw,{{"ENTRY_DATE", type datetime}, {"Index", Int64.Type}}),
    groups = 
        Table.Group(
            Source, {"ENTRY_DATE"}, 
            {{"UPDATE_ID", each List.Max([UPDATE_ID])}, {"all", each _}},
            GroupKind.Local,
            (x, y) => Number.From(Duration.Minutes(y[ENTRY_DATE] - x[ENTRY_DATE]) >= 5)
        )
in
    groups

 

Helpful resources

Announcements
July 2024 Power BI Update

Power BI Monthly Update - July 2024

Check out the July 2024 Power BI update to learn about new features.

PBI_Carousel_NL_June

Fabric Community Update - June 2024

Get the latest Fabric updates from Build 2024, key Skills Challenge voucher deadlines, top blogs, forum posts, and product ideas.

Top Solution Authors
Top Kudoed Authors