Solved: Re: Help merging rows based on Time Constraints!

Neil_White · ‎07-27-2022

Ok so I have had a steep Power Query learning curve recently, but I have hit a wall and would appreciate any help getting this last bit over the line!!

I receive a CSV file of people clocking in and out of number of structures in the below format.

I need to work out the time (in min) each person was within a particular location, so far so good.

I tried grouping the columns and can work it out if it's a nice simple clock in and out once per day.

However sometimes people clock in before midnight and out the next morning as well as clocking in and out a few times during the day.

What I'm aiming for is something like the below!

Any pointers would be welcomed, but please be gentle!!

BA_Pete · ‎07-28-2022

No worries, that's what I'm here for 🙂

Try this code instead:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZJNC4JAEIb/ingOnK9d2712CgK7iwcRKQkU1P5/K9Rhs5bV08wwzMPD8JZlem3HaegTTA8pITED5sa4gTEDlREQuQG0hdzVYrzVfTfVc/c+Od3b5pEUz9n1l6H5LKpDLFhZMf/B534nl9iqgLDPpSAXHFdH+n49wgMv56JEGzFK8h+PYIwV5ngukoVjtPAWMAfBvrBs4OotwhIfCQQLsCtqK2EvEotwgLsWrl4=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, ID = _t, Date = _t, Time = _t, #"Person Group" = _t, Status = _t, #"Attendance Check Point" = _t]),
    chgTypes = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Time", type time}}),
    addNextStatusExpected = Table.AddColumn(chgTypes, "nextStatusExpected", each if [Status] = "Check Out" then "Check In" else "Check Out"),
    addDateTime = Table.AddColumn(addNextStatusExpected, "dateTime", each [Date] & [Time], type datetime),
    sort_ID_dateTime = Table.Sort(addDateTime,{{"ID", Order.Ascending}, {"dateTime", Order.Ascending}}),
    addIndex1 = Table.AddIndexColumn(sort_ID_dateTime, "Index1", 1, 1, Int64.Type),
    addIndex0 = Table.AddIndexColumn(addIndex1, "Index0", 0, 1, Int64.Type),
    mergeIndex1Index0 = Table.NestedJoin(addIndex0, {"ID", "Index1", "Status"}, addIndex0, {"ID", "Index0", "nextStatusExpected"}, "addIndex0", JoinKind.LeftOuter),
    expandIndex1Index0 = Table.ExpandTableColumn(mergeIndex1Index0, "addIndex0", {"dateTime", "Index1"}, {"addIndex0.dateTime", "addIndex0.Index1"}),
    filterRedundantRows = Table.SelectRows(expandIndex1Index0, each ([Status] = "Check In") or ([Status] = "Check Out" and [addIndex0.dateTime] = null and not List.Contains(List.Buffer(expandIndex1Index0[addIndex0.Index1]), [Index1]))),
    remOthCols = Table.SelectColumns(filterRedundantRows,{"Name", "ID", "Person Group", "Status", "Attendance Check Point", "dateTime", "addIndex0.dateTime"}),
    addTimeWorked = Table.AddColumn(remOthCols, "timeWorked", each [addIndex0.dateTime] - [dateTime])
in
    addTimeWorked

There's two key changes:

1) I added a new [nextStatusExpected] column at step 1, and I use this in the merge to force a blank record to be generated if there's two check outs or check ins in a row.

2) I've added a step called 'filterRedundantRows' which replaces our previous step 6. This is really where the smart stuff is done working out whether a row is the result of a missed check in/out, or whether it's a redundant row that's already been matched correctly.

I now get this output based on amended input data (added a double check in for person 1, and a double check out for person 2):

Let me know how you get on.

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

View solution in original post

Neil_White · ‎07-27-2022

Thanks for the quick response Pete.

I'm hoping it's easy and just the perils of self-teaching! You don't know what you don't know!

let
    Source = Folder.Files("K:\Quality\DATA SOURCES\04 - Spencer Facial\Raw"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}, {"Column9", type text}, {"Column10", type text}, {"Column11", type text}, {"Column12", type text}, {"Column13", type text}, {"Column14", type any}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each not Text.Contains([Column1], "The Spencer Group")),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each not Text.Contains([Column1], "Raw Record Report")),
    #"Filtered Rows2" = Table.SelectRows(#"Filtered Rows1", each not Text.Contains([Column1], "Total")),
    #"Promoted Headers" = Table.PromoteHeaders(#"Filtered Rows2", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Attendance_Raw Record Report_20220507_20220513.csv", type text}, {"Name#(tab)", type text}, {"ID#(tab)", type text}, {"Skin-Surface Temperature#(tab)", type text}, {"Skin-Surface Temperature Status#(tab)", type text}, {"Date#(tab)", type text}, {"Time#(tab)", type text}, {"Day of Week#(tab)", type text}, {"Person Group#(tab)", type text}, {"Status#(tab)", type text}, {"Attendance Check Point#(tab)", type text}, {"Customized Attendance Status#(tab)", type text}, {"Data Source#(tab)", type text}, {"Handling Type#(tab)", type text}, {"", type any}}),
    #"Filtered Rows3" = Table.SelectRows(#"Changed Type1", each not Text.Contains([#"Name#(tab)"], "name")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows3",{"Customized Attendance Status#(tab)", "Data Source#(tab)", "Handling Type#(tab)", "", "Skin-Surface Temperature#(tab)", "Skin-Surface Temperature Status#(tab)", "Attendance_Raw Record Report_20220507_20220513.csv"}),
    #"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"ID#(tab)", Text.Trim, type text}, {"Date#(tab)", Text.Trim, type text}, {"Time#(tab)", Text.Trim, type text}, {"Day of Week#(tab)", Text.Trim, type text}, {"Person Group#(tab)", Text.Trim, type text}, {"Status#(tab)", Text.Trim, type text}, {"Attendance Check Point#(tab)", Text.Trim, type text}})
in
    #"Trimmed Text"

BA_Pete · ‎07-27-2022

Hi,

That's not what I'm after, I'm afraid.

If you filter your table to just the rows/columns that you scren-captured for your example, you can then go to the top left in PQ and select 'Copy entire table':

You can then paste this into the 'Enter Data' area in PQ:

Once that new table has been generated, copy out the code from it the same way you just did.

This will give me a ready-made example table that I can work on.

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

Neil_White · ‎07-27-2022

Sorry Pete, blatantly out of my depth! Hope this is correct?

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ndA9C4MwEAbgvyLOgsnlq8laKBQKdhA6iINoqCJEiDr03zfSdggUSZxyd5B7eK+q0ru282QSnGYpYCAEYSGlawjOEcsBAbgGcYWEey926JqXKwr7bMwwN8vw/XvudTsmV+PK29T+5nUWCjBFt2HZr3beJYp1OWgAUUwcMWDXQM7g8Tn8U3nEtogyyiWVjIo/pyLbmofujI7KQcIRDAqdwhA/SIxBPka5xsWgEQQPJPwQTqjf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Name#(tab)" = _t, #"ID#(tab)" = _t, #"Date#(tab)" = _t, #"Time#(tab)" = _t, #"Day of Week#(tab)" = _t, #"Person Group#(tab)" = _t, #"Status#(tab)" = _t, #"Attendance Check Point#(tab)" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name#(tab)", type text}, {"ID#(tab)", Int64.Type}, {"Date#(tab)", type date}, {"Time#(tab)", type time}, {"Day of Week#(tab)", type text}, {"Person Group#(tab)", type text}, {"Status#(tab)", type text}, {"Attendance Check Point#(tab)", type text}})
in
    #"Changed Type"

BA_Pete · ‎07-28-2022

No problem. This is probably intermediate-level Power Query so I'll try and explain the steps in as much detail as possible.

First, copy this code, create a new blank query in PQ, open Advanced Editor an paste it over everything in that window. You'll then be able to follow each step:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ndCxCoMwEIDhV5HMgpe7S9JkdSoU7C4OItJKQUHt+zeFdkhpQ9IpCeE+fq5txXlct2UupCgFSiQCaaz1D5IVqAoB0T9AOzD+bNZLP09bv0+vkfo6Dreiue/+flqG90dXpsLKsf0NH+c/XSSnIsGhi1EXvKsTez8WEcDPcVasLVvF5ssiSKYGU7or0cEhOTgHpigcBnOGq3OCPdw9AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, ID = _t, Date = _t, Time = _t, #"Person Group" = _t, Status = _t, #"Attendance Check Point" = _t]),
    chgTypes = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Time", type time}}),
    addDateTime = Table.AddColumn(chgTypes, "dateTime", each [Date] & [Time], type datetime),
    sort_ID_dateTime = Table.Sort(addDateTime,{{"ID", Order.Ascending}, {"dateTime", Order.Ascending}}),
    addIndex1 = Table.AddIndexColumn(sort_ID_dateTime, "Index1", 1, 1, Int64.Type),
    addIndex0 = Table.AddIndexColumn(addIndex1, "Index0", 0, 1, Int64.Type),
    mergeIndex1Index0 = Table.NestedJoin(addIndex0, {"ID", "Index1"}, addIndex0, {"ID", "Index0"}, "addIndex0", JoinKind.LeftOuter),
    expandIndex1Index0 = Table.ExpandTableColumn(mergeIndex1Index0, "addIndex0", {"dateTime"}, {"addIndex0.dateTime"}),
    filterCheckOutRows = Table.SelectRows(expandIndex1Index0, each ([Status] = "Check In")),
    remOthCols = Table.SelectColumns(filterCheckOutRows,{"Name", "ID", "Person Group", "Attendance Check Point", "dateTime", "addIndex0.dateTime"}),
    addTimeWorked = Table.AddColumn(remOthCols, "timeWorked", each [addIndex0.dateTime] - [dateTime])
in
    addTimeWorked

1) Merge [Date] and [Time] columns into [dateTime] column. This isn't entirely necessary, but makes some subsequent steps a bit simpler.

2) Sort the table by [ID], [dateTime] ascending. This is key to ensure the entries by person are sequential/chronological.

3) Add two index columns, one starting from 1, the other from 0. These are basically new key columns, offset by one, to allow us to merge the table on itself to put the next row values alongside the previous row values.

4) Merge the table on itself on [ID], [Index1] = [ID], [Index0]. This matches the next row for the same person to ensure we on't mix up two people's records.

5) Expand the [dateTime] column from the merged column. This gets us the next [dateTime] value on the current row.

6) Filter out any rows that show "Check Out". We now have the Check Out times on our Check In rows, so we don't need these any more.

7) Remove other columns - just to tidy up a bit.

8 ) Add a [timeWorked] column which is just the difference between our new [dateTime] column an the original.

Overall, this gives the following output:

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

Neil_White · ‎07-28-2022

Thanks Pete, that's really helpful and has helped me see the power or indexing!

It does however through up an issue that it looks like people don't always check out (or in). so this method means I get two check-ins so it looks like someone has worked 24 hours when in fact they worked 30 min and then snuck off without working!

Is there a way to adapt this query so that if we don't have a check in followed by a check out in order than we show an error? Feel free to tell me to go away by the way!

BA_Pete · ‎07-28-2022

No worries, that's what I'm here for 🙂

Try this code instead:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZJNC4JAEIb/ingOnK9d2712CgK7iwcRKQkU1P5/K9Rhs5bV08wwzMPD8JZlem3HaegTTA8pITED5sa4gTEDlREQuQG0hdzVYrzVfTfVc/c+Od3b5pEUz9n1l6H5LKpDLFhZMf/B534nl9iqgLDPpSAXHFdH+n49wgMv56JEGzFK8h+PYIwV5ngukoVjtPAWMAfBvrBs4OotwhIfCQQLsCtqK2EvEotwgLsWrl4=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, ID = _t, Date = _t, Time = _t, #"Person Group" = _t, Status = _t, #"Attendance Check Point" = _t]),
    chgTypes = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Time", type time}}),
    addNextStatusExpected = Table.AddColumn(chgTypes, "nextStatusExpected", each if [Status] = "Check Out" then "Check In" else "Check Out"),
    addDateTime = Table.AddColumn(addNextStatusExpected, "dateTime", each [Date] & [Time], type datetime),
    sort_ID_dateTime = Table.Sort(addDateTime,{{"ID", Order.Ascending}, {"dateTime", Order.Ascending}}),
    addIndex1 = Table.AddIndexColumn(sort_ID_dateTime, "Index1", 1, 1, Int64.Type),
    addIndex0 = Table.AddIndexColumn(addIndex1, "Index0", 0, 1, Int64.Type),
    mergeIndex1Index0 = Table.NestedJoin(addIndex0, {"ID", "Index1", "Status"}, addIndex0, {"ID", "Index0", "nextStatusExpected"}, "addIndex0", JoinKind.LeftOuter),
    expandIndex1Index0 = Table.ExpandTableColumn(mergeIndex1Index0, "addIndex0", {"dateTime", "Index1"}, {"addIndex0.dateTime", "addIndex0.Index1"}),
    filterRedundantRows = Table.SelectRows(expandIndex1Index0, each ([Status] = "Check In") or ([Status] = "Check Out" and [addIndex0.dateTime] = null and not List.Contains(List.Buffer(expandIndex1Index0[addIndex0.Index1]), [Index1]))),
    remOthCols = Table.SelectColumns(filterRedundantRows,{"Name", "ID", "Person Group", "Status", "Attendance Check Point", "dateTime", "addIndex0.dateTime"}),
    addTimeWorked = Table.AddColumn(remOthCols, "timeWorked", each [addIndex0.dateTime] - [dateTime])
in
    addTimeWorked

There's two key changes:

1) I added a new [nextStatusExpected] column at step 1, and I use this in the merge to force a blank record to be generated if there's two check outs or check ins in a row.

2) I've added a step called 'filterRedundantRows' which replaces our previous step 6. This is really where the smart stuff is done working out whether a row is the result of a missed check in/out, or whether it's a redundant row that's already been matched correctly.

I now get this output based on amended input data (added a double check in for person 1, and a double check out for person 2):

Let me know how you get on.

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

Neil_White · ‎07-28-2022

Pete you really are a star. Really helpful but also instructional!

Thank you so much for all your help!

BA_Pete · ‎07-28-2022

No problem at all, happy to help.

I've just made a final tweak, really only cosmetic, but makes the overall output a bit more intuitive I think:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZJNC4JAEIb/ingOnK9d2712CgK7iwcRKQkU1P5/K9Rhs5bV08wwzMPD8JZlem3HaegTTA8pITED5sa4gTEDlREQuQG0hdzVYrzVfTfVc/c+Od3b5pEUz9n1l6H5LKpDLFhZMf/B534nl9iqgLDPpSAXHFdH+n49wgMv56JEGzFK8h+PYIwV5ngukoVjtPAWMAfBvrBs4OotwhIfCQQLsCtqK2EvEotwgLsWrl4=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, ID = _t, Date = _t, Time = _t, #"Person Group" = _t, Status = _t, #"Attendance Check Point" = _t]),
    chgTypes = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Time", type time}}),
    addNextStatusExpected = Table.AddColumn(chgTypes, "nextStatusExpected", each if [Status] = "Check Out" then "Check In" else "Check Out"),
    addDateTime = Table.AddColumn(addNextStatusExpected, "dateTime", each [Date] & [Time], type datetime),
    sort_ID_dateTime = Table.Sort(addDateTime,{{"ID", Order.Ascending}, {"dateTime", Order.Ascending}}),
    addIndex1 = Table.AddIndexColumn(sort_ID_dateTime, "Index1", 1, 1, Int64.Type),
    addIndex0 = Table.AddIndexColumn(addIndex1, "Index0", 0, 1, Int64.Type),
    mergeIndex1Index0 = Table.NestedJoin(addIndex0, {"ID", "Index1", "Status"}, addIndex0, {"ID", "Index0", "nextStatusExpected"}, "addIndex0", JoinKind.LeftOuter),
    expandIndex1Index0 = Table.ExpandTableColumn(mergeIndex1Index0, "addIndex0", {"dateTime", "Index1"}, {"addIndex0.dateTime", "addIndex0.Index1"}),
    filterRedundantRows = Table.SelectRows(expandIndex1Index0, each ([Status] = "Check In") or ([Status] = "Check Out" and [addIndex0.dateTime] = null and not List.Contains(List.Buffer(expandIndex1Index0[addIndex0.Index1]), [Index1]))),
    remOthCols = Table.SelectColumns(filterRedundantRows,{"Name", "ID", "Person Group", "Status", "Attendance Check Point", "dateTime", "addIndex0.dateTime"}),
    addTimeWorked = Table.AddColumn(remOthCols, "timeWorked", each [addIndex0.dateTime] - [dateTime]),
    repCheckOutEndTime = Table.ReplaceValue(addTimeWorked, each [addIndex0.dateTime], each if [addIndex0.dateTime] = null and [Status] = "Check Out" then [dateTime] else [addIndex0.dateTime],Replacer.ReplaceValue,{"addIndex0.dateTime"}),
    repCheckOutStartTime = Table.ReplaceValue(repCheckOutEndTime, each [dateTime], each if [Status] = "Check Out" then null else [dateTime],Replacer.ReplaceValue,{"dateTime"})
in
    repCheckOutStartTime

I've just added a couple of replacement steps right at the end to switch the start datetime into the end datetime field if the record is a check out with no check in. Hopefully makes it a bit clearer wat the issue is and what needs to be corrected.

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

BA_Pete · ‎07-27-2022

Hi @Neil_White ,

Can you share your initial screenshot as copyable data please?

You can just copy the part of the table you screen-captured and paste it into 'Enter Data' in PQ, then copy all the code out of Advanced Editor and paste it into a code window ( </> button ) here.

You should just be able to merge the table on itself to get in-row Clock In/Out dates/times.

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!