Hi everyone. I have a dataset containing updates to records. When changes are made, and the user hits 'save', an entry is made for every field that is changed in the record. Sometimes this results in dozens of entries being recorded for what was really one update. I am attempting to do some rate limiting on the dataset to get a more accurate number of updates.
The basic logic is to include the first update in the dataset, and then include only updates that are at least 5 minutes after the previously included update; but it's been giving me fits for a while now. I've got all the preliminary filtering done to reduce the dataset to only the potential candidates, have grouped them by record identifier, and added an index column.
My current best effort:
Table.AddColumn(#"Custom 5", "finalFilter", each
    let
        fauxTable = [Grouped rows], // Grouped Rows is the table created by the previous step.
        fltrVar =
            Table.AddColumn(
                fauxTable,
                "Filter",
                each
                    if [Index] = 0 // This bit works
                    then "Include"
                    else if (x) => Duration.TotalSeconds([ENTRY_TIME] - x[ENTRY_TIME]) > (5 * 60) // This bit errors
                    // I've also tried [ENTRY_TIME] = (5 * (1/24/60)) >= x[ENTRY_TIME] with the same error
                        and x[Filter] = "Include" // even with this line commented out the section fails
                    then "Include"
                    else "Exclude") // Not sure on this one, but given the simplicity I assume it works.
    in
        fltrVar)
A dummy sample of a dataset and current output with what I believe to be all relevant columns. Desired Output and Reason are my notes, which are not in the dataset.
UPDATE_ID | ENTRY_DATE | Index | Filter | Desired Output | Reason
---|---|---|---|---|---
1 | 03Feb23 17:00:00 | 0 | Include | Include | Index = 0
8 | 03Feb23 17:53:51 | 1 | [Error] | Exclude | Less than 5 minutes since 03FEB23 17:00:00
20 | 03Feb23 17:55:03 | 2 | [Error] | Include | More than 5 minutes since 03FEB23 17:00:00
29 | 03Feb23 17:59:00 | 3 | [Error] | Exclude | Less than 5 minutes since 03FEB23 17:55:03
30 | 03Feb23 23:23:45 | 4 | [Error] | Include | More than 5 minutes since 03FEB23 17:55:03
87 | 04Feb23 09:28:40 | 5 | [Error] | Include | More than 5 minutes since 03FEB23 23:23:45
Any suggestions would be much appreciated, doubly so if they include a link to help me learn why... but at this point I'm more than willing to copy and paste a black box and get this off my plate.
Thanks!
Hello, @cmbyrd. Your (x) => function is just a declaration. Once you define it, you have to call it with some x value to get true or false as a result. You would want to look up the latest ENTRY_DATE with Filter = "Include", but you are still in the process of creating that very column, so this approach won't work. Try the code below with List.Generate. The idea behind it is to get a list of the table's records, iterate over them one by one (always carrying the time of the latest "approved" ENTRY_DATE), and generate your "Filter" column as you go.
By the way, your example is not correct: there are 53 minutes between the 1st and 2nd entries, so the 2nd must be "Include".
Another option would be to use List.Accumulate. One could even filter out (skip) the "Exclude" rows, but you decided to keep them. There is also some room for algorithm optimization (e.g. find and select the next record with ENTRY_DATE + 5 mins while travelling over the ENTRY_DATE column), but that's another story 😀
code updated - bug fixing
let
    raw = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Xc3BDQAhCETRVjacTcRB40orhP7bWMHTev0vMGbUqBCqVDDkaVOZd2DyYvReNGSHlgS+bUQ4tm6LIGnyu4Mo4mc/czOtH1uK2B/k/gE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [UPDATE_ID = _t, ENTRY_DATE = _t, Index = _t]),
    Source = Table.TransformColumnTypes(raw, {{"ENTRY_DATE", type datetime}, {"Index", Int64.Type}}),
    rec = List.Buffer(Table.ToRecords(Source)),
    generate_filter =
        List.Generate(
            () => rec{0} & [Filter = "Include", last_included = rec{0}[ENTRY_DATE], go = true],
            (x) => x[go],
            (x) =>
                try
                    let
                        current_record = rec{x[Index] + 1},
                        // Duration.TotalMinutes, not Duration.Minutes: the latter returns only the
                        // minutes component of the duration (e.g. a gap of 1h02m would yield 2).
                        good_to_go = Duration.TotalMinutes(current_record[ENTRY_DATE] - x[last_included]) >= 5,
                        add_fields =
                            if good_to_go
                            then [Filter = "Include", last_included = current_record[ENTRY_DATE], go = true]
                            else [Filter = "Exclude", last_included = x[last_included], go = true]
                    in
                        current_record & add_fields
                otherwise [go = false]
        ),
    z = Table.RemoveColumns(Table.FromRecords(generate_filter), {"last_included", "go"})
in
    z
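For reference, the List.Accumulate variant mentioned above could look something like this. This is only a sketch, untested, reusing the buffered record list `rec` from the query above; the step names `accumulate_filter` and `result` are my own, and the accumulator state is a record carrying the flagged rows plus the ENTRY_DATE of the last "Include":

```
// Sketch of the List.Accumulate alternative: the state record holds the
// output rows built so far and the timestamp of the last included entry.
accumulate_filter =
    List.Accumulate(
        rec,
        [out = {}, last_included = null],
        (state, current) =>
            let
                // First row is always included; afterwards, include only when
                // 5+ minutes have passed since the last included entry.
                include =
                    state[last_included] = null
                    or Duration.TotalMinutes(current[ENTRY_DATE] - state[last_included]) >= 5,
                flagged = current & [Filter = if include then "Include" else "Exclude"]
            in
                [
                    out = state[out] & {flagged},
                    last_included = if include then current[ENTRY_DATE] else state[last_included]
                ]
    ),
result = Table.FromRecords(accumulate_filter[out])
```

Unlike List.Generate, this walks every record exactly once with no index arithmetic and no try/otherwise termination check, at the cost of appending to a list inside the accumulator.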
Thank you so much! Fiddled a bit to figure out exactly what was going on, I think I learned at least a little, and I've got this working after pulling my hair out for many hours.
This one is even shorter. It selects the "Include" records; each row's "all" table contains the group of records that fell within the 5-minute interval.
let
    raw = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Xc3BDQAhCETRVjacTcRB40orhP7bWMHTev0vMGbUqBCqVDDkaVOZd2DyYvReNGSHlgS+bUQ4tm6LIGnyu4Mo4mc/czOtH1uK2B/k/gE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [UPDATE_ID = _t, ENTRY_DATE = _t, Index = _t]),
    Source = Table.TransformColumnTypes(raw, {{"ENTRY_DATE", type datetime}, {"Index", Int64.Type}}),
    groups =
        Table.Group(
            Source, {"ENTRY_DATE"},
            {{"UPDATE_ID", each List.Max([UPDATE_ID])}, {"all", each _}},
            GroupKind.Local,
            // Start a new local group once 5+ minutes have passed since the group's
            // first row. Duration.TotalMinutes, not Duration.Minutes, which returns
            // only the minutes component of the duration.
            (x, y) => Number.From(Duration.TotalMinutes(y[ENTRY_DATE] - x[ENTRY_DATE]) >= 5)
        )
in
    groups