The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi People,
As a newbie to Power Query language, I looking for guidance.
I have a large table (more than 60000 entries) of training records within which some people have repeated the learning activity more than once within a period of 21 days, even on the same day. Removing duplicate entries on the same day I am OK with including using the table.buffer to keep the sorted data in memory. Figuring out the formula for the other removal is not as easy, made probably more complicated by the desire to also run the check within the 21 day period e.g. someone may have initial completed the learning activity on 1 May, then repeats it on 15 May (delete this entry), repeats again on 25 May (delete this entry because its within 21 days of the previous). Only want to retain the 1 May entry.
The data is sorted by UserID, CourseID and Completion Date and the duplication check is then looking for duplicate entries within that sort. Hope that makes sense.
Have I provided enough information?
TIA. Andrew
Solved! Go to Solution.
Wait what, are you running this in Excel?
let
Source = Excel.CurrentWorkbook(),
Table61617_2 = Source{[Name="Table61617_2"]}[Content],
#"Removed Other Columns" = Table.SelectColumns(Table61617_2,{"Participant ID", "Course ID", "Course Completion Date"}),
#"Changed Type" = Table.Buffer(Table.TransformColumnTypes(#"Removed Other Columns",{{"Course Completion Date", type date}})),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Participant ID", "Course ID"}, {{"Rows", each _, type table [Course Completion Date=nullable date]}}),
Check = (tbl)=> Table.AddColumn(tbl,"Flag",(k)=> Table.RowCount(Table.SelectRows(tbl,each [Course Completion Date]<k[Course Completion Date] and [Course Completion Date]>k[Course Completion Date]-#duration(21,0,0,0)))),
#"Added Custom1" = Table.AddColumn(#"Grouped Rows", "FlagNew", each if Table.RowCount([Rows])=1 then Table.AddColumn([Rows],"Flag",each 0) else Check([Rows])),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Participant ID", "Course ID", "FlagNew"}),
#"Expanded FlagNew" = Table.ExpandTableColumn(#"Removed Other Columns1", "FlagNew", {"Course Completion Date", "Flag"}, {"Course Completion Date", "Flag"})
in
#"Expanded FlagNew"
Looks like I ran into the no headers in table problem. Going to start a new thread as recommended.
if you delete the 15 May entry then the 25 May entry has nothing to compare to. Instead of deleting rows you will want to mark them.
Thanks. That is what the original VBA was doing - it created the Keep/Delete>21 Days entries. Was hoping to avoid that:-). I tried/am trying to send the data as a proper table but keep getting an error message of too many characters. Is it still useful to try to send?
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZa9rt4gDIZvJfrmI8U2f2E/Wy/h9AwdsnWo1Hbo3ddgSIAASbtEivT4xRjz4o+PFwCCd8q83l6kPH8VrGBWAnT882Xff7w+3zoY4gpbxr7+BlD7Qri8f/vz8xKgA4NQBCxXZuMvA6AC40sGwVFU8HFht7JUJ78aI1rBMUY4zC8HsFb4+pX3NFqbog6axLguE3cCVupHMMhPMOT8aJ4fOHQU1w5f8CvEtbf2XCoM1YqUt9HVBQdI/tg317Or22B8NFL2B7paT/JtMGTMZmyuqyD10ZlITzdhWpaPx9DX9cqiORqmOrelZaQICtPdaNeuMWZOqfnapFX4GmnWtqdbzEnPTGp1BoTikr3V1aIrJR3q8j3VVp8bpOIu1x5RYny20oszj0gByqWeGXhPiZEuruET3X9IRAEkIzLdy1ZhCAX2vn/ff+0nSEjHbQ+mavsbG2KNngaxKYgmuI0OoMKo9MrBvqsA7pbbyuYAhEeJROxeF/yGsYzJvrkTdLdzK4xfC2nwsSvlgOTyRiocbbenm7CtwKqDOMH4qt3rCfZAzzzTM3d6zuiiQCZ1gG31asymC2LHdawCgrPYmwNNAUSXgF4igvGB3jZKDlDQ6i5XRnrwbm3B7Oj2FSA906OpHs8C6OGwEcp22rpNg6kCG+mFd4JvThyjqqesYPL7SBPGi73h8Yb284qYeTBqDQPajXANjy4Lfqj6s16JPbC5OgDvbS4HyGCqcGBzE6zuIKeULC0Z6/5DN8NmejzqnIPzcEPoH1W2wNA+KBQqXRmy+u+AS1NHUAZbviTd2TszNtWr+8xOsPaiexVHLbDpddPdea/CQjHdaN77/As=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Participant ID" = _t, #"Course ID" = _t, #"Course Completion" = _t, Decision = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Participant ID", Int64.Type}, {"Course ID", Int64.Type}, {"Course Completion", type date}, {"Decision", type text}},"en-GB"),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Flag", (k)=> Table.RowCount( Table.SelectRows(#"Changed Type",each [Participant ID]=k[Participant ID] and [Course ID]=k[Course ID] and [Course Completion]<k[Course Completion] and [Course Completion]>k[Course Completion]-#duration(21,0,0,0))))
in
#"Added Custom"
How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done".
Thanks very much for this:-)
I've done a test with the full data set and discovered the code only gerenates a small subset of the data. The full data will expand in each month's report. Wondering how this might be achieved?
Andrew
Please provide sanitized sample data that fully covers your issue.
Apologies for the delay. Here's the link to the full set of anaymised data https://correctionsnz.sharefile.com/d-s5fd9a416deda4f49b12e324509ca082a. As I noted, the data set will expand monthly and is always in a table named "Table61617". Thanks for your help.
Here is a preliminary version. It is not very fast but I think I can improve on it.
let
Source = Excel.Workbook(Binary.Buffer(File.Contents("C:\Users\xxx\Downloads\Anonymised Data 230523.xlsx")), null, true),
Table61617_2_Table = Source{[Item="Table61617_2",Kind="Table"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(Table61617_2_Table,{"Participant ID", "Course ID", "Course Completion Date"}),
#"Changed Type" = Table.Buffer(Table.TransformColumnTypes(#"Removed Other Columns",{{"Course Completion Date", type date}})),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Flag", (k)=> Table.RowCount( Table.SelectRows(#"Changed Type",each [Participant ID]=k[Participant ID] and [Course ID]=k[Course ID] and [Course Completion Date]<k[Course Completion Date] and [Course Completion Date]>k[Course Completion Date]-#duration(21,0,0,0))))
in
#"Added Custom"
This version performs better over larger datasets.
let
Source = Excel.Workbook(File.Contents("C:\Users\xxx\Downloads\Anonymised Data 230523.xlsx"), null, true),
Table61617_2_Table = Source{[Item="Table61617_2",Kind="Table"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(Table61617_2_Table,{"Participant ID", "Course ID", "Course Completion Date"}),
#"Changed Type" = Table.Buffer(Table.TransformColumnTypes(#"Removed Other Columns",{{"Course Completion Date", type date}})),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Participant ID", "Course ID"}, {{"Rows", each _, type table [Course Completion Date=nullable date]}}),
Check = (tbl)=> Table.AddColumn(tbl,"Flag",(k)=> Table.RowCount(Table.SelectRows(tbl,each [Course Completion Date]<k[Course Completion Date] and [Course Completion Date]>k[Course Completion Date]-#duration(21,0,0,0)))),
#"Added Custom1" = Table.AddColumn(#"Grouped Rows", "FlagNew", each if Table.RowCount([Rows])=1 then Table.AddColumn([Rows],"Flag",each 0) else Check([Rows])),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Participant ID", "Course ID", "FlagNew"}),
#"Expanded FlagNew" = Table.ExpandTableColumn(#"Removed Other Columns1", "FlagNew", {"Course Completion Date", "Flag"}, {"Course Completion Date", "Flag"})
in
#"Expanded FlagNew"
I've modified the Source based on what the editor cmoes up with then I load the Advanced editor.
let
Source = Excel.CurrentWorkbook(){[Name="Table61617_2"]}[Content],
Table61617_2_Table = Source{[Item="Table61617_2",Kind="Table"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(Table61617_2_Table,{"Participant ID", "Course ID", "Course Completion Date"}),
#"Changed Type" = Table.Buffer(Table.TransformColumnTypes(#"Removed Other Columns",{{"Course Completion Date", type date}})),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Participant ID", "Course ID"}, {{"Rows", each _, type table [Course Completion Date=nullable date]}}),
Check = (tbl)=> Table.AddColumn(tbl,"Flag",(k)=> Table.RowCount(Table.SelectRows(tbl,each [Course Completion Date]<k[Course Completion Date] and [Course Completion Date]>k[Course Completion Date]-#duration(21,0,0,0)))),
#"Added Custom1" = Table.AddColumn(#"Grouped Rows", "FlagNew", each if Table.RowCount([Rows])=1 then Table.AddColumn([Rows],"Flag",each 0) else Check([Rows])),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Participant ID", "Course ID", "FlagNew"}),
#"Expanded FlagNew" = Table.ExpandTableColumn(#"Removed Other Columns1", "FlagNew", {"Course Completion Date", "Flag"}, {"Course Completion Date", "Flag"})
in
#"Expanded FlagNew"
Getting the following error. Possibly due to the Source code?
Thanks for your support.
Wait what, are you running this in Excel?
let
Source = Excel.CurrentWorkbook(),
Table61617_2 = Source{[Name="Table61617_2"]}[Content],
#"Removed Other Columns" = Table.SelectColumns(Table61617_2,{"Participant ID", "Course ID", "Course Completion Date"}),
#"Changed Type" = Table.Buffer(Table.TransformColumnTypes(#"Removed Other Columns",{{"Course Completion Date", type date}})),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Participant ID", "Course ID"}, {{"Rows", each _, type table [Course Completion Date=nullable date]}}),
Check = (tbl)=> Table.AddColumn(tbl,"Flag",(k)=> Table.RowCount(Table.SelectRows(tbl,each [Course Completion Date]<k[Course Completion Date] and [Course Completion Date]>k[Course Completion Date]-#duration(21,0,0,0)))),
#"Added Custom1" = Table.AddColumn(#"Grouped Rows", "FlagNew", each if Table.RowCount([Rows])=1 then Table.AddColumn([Rows],"Flag",each 0) else Check([Rows])),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Participant ID", "Course ID", "FlagNew"}),
#"Expanded FlagNew" = Table.ExpandTableColumn(#"Removed Other Columns1", "FlagNew", {"Course Completion Date", "Flag"}, {"Course Completion Date", "Flag"})
in
#"Expanded FlagNew"
Marvellous. Thanks so much for your knowledge and hard work. So much appreciated.
Yes. Sorry I didn't note this.
see message 17
Thanks. Unfortunately, the current data set is 47798 rows of data with 19 columns so will be too large to paste into the forum. The row numbers will increase at least 1000 in the following month. Any other way of providing the sanitised data?
Thanks. I missed the bit about uploading data with a link. Apologies. In process of anonymising the data which might take a while so aim to post a link tomorrow. Thanks again for your help.
Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).
https://community.powerbi.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-Forum/ba-...
Please show the expected outcome based on the sample data you provided.
https://community.powerbi.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/1447523
Thanks. Below is a small snippet of the data with the first 3 columns being the criteria for the delete decision. The Action column contains the main reason for keeping or deleting, while the Reason for delete expands on that.
Participant IDCourse IDCourse Completion DateActionReason for Delete
|
Hope this works. Was getting an error that the message was too long.
Andrew
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.