In this article, we will discuss how to handle some common problems when creating reports based on employee attendance sheets. Let's take a closer look at the scenarios below.
Scenario 1: Clean and extract valid records from attendance sheets
Invalid or repeated punch-ins can lead to duplicated or missing data, which affects report creation.
This requires us to eliminate useless records as much as possible and keep only each employee's valid punch records (the earliest and latest time per day).
Sample data:
Solution 1:
In DAX, we can use FIRSTNONBLANK and LASTNONBLANK to return the first/last value in a column, filtered by the current context, for which the expression is not blank.
Calculated column:
First_record =
CALCULATE (
FIRSTNONBLANK ( Table1_D[DateTime], 1 ), // MIN(Table1_D[DateTime])
ALLEXCEPT ( Table1_D, Table1_D[EmployeeID], Table1_D[Date] )
)
Last_record =
CALCULATE (
LASTNONBLANK ( Table1_D[DateTime], 1 ), // MAX(Table1_D[DateTime])
ALLEXCEPT ( Table1_D, Table1_D[EmployeeID], Table1_D[Date] )
)
We can also create a calculated table with SUMMARIZE, as below:
Table1_D2 =
SUMMARIZE (
Table1_D,
Table1_D[Date],
Table1_D[EmployeeID],
"First Record", FIRSTNONBLANK ( Table1_D[DateTime], 1 ),//MIN ( Table1_D[DateTime] )
"Last Record", LASTNONBLANK ( Table1_D[DateTime], 1 ) //MAX ( Table1_D[DateTime] )
)
Solution 2:
In Power Query, we can use the 'Group By' button to achieve the same result.
The button generates the equivalent of the following M code:
= Table.Group(#"Inserted Date", {"EmployeeID", "Date"}, {{"First_record", each List.Min([DateTime]), type nullable datetime}, {"Last_record", each List.Max([DateTime]), type nullable datetime}})
Solution 3:
Since the sample data is sorted by the DateTime field (that is, the first/last record in each group is naturally the earliest/latest record), we can also use List.First and List.Last on each group to get the first and last records.
M code:
= Table.Group(Custom1, {"EmployeeID", "Date"}, {{"First_record", each List.First([DateTime]), type datetime}, {"Last_record", each List.Last([DateTime]), type datetime}})
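Note that Solution 3 only works if the rows within each group are already ordered by DateTime. If that order is not guaranteed, a sorting step can be added before the grouping; the sketch below is not part of the original query and assumes the same step name Custom1 used above:
// sort so that List.First/List.Last return the earliest/latest punch within each EmployeeID + Date group
= Table.Sort(Custom1, {{"EmployeeID", Order.Ascending}, {"Date", Order.Ascending}, {"DateTime", Order.Ascending}})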
Scenario 2: Analyze consecutive absences for each employee
Now that the data has been filtered and extracted, we can add a custom conditional column to obtain each employee's daily attendance status (a possible sketch of such a column follows below).
The following figure is the processed sample table, which records the daily attendance of each employee. Next, we'll explore how to calculate the number of consecutive absences for each employee.
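For illustration only, the Status column could be derived with a conditional custom column in Power Query. This is a minimal sketch, not the exact step from the attachment: it assumes the per-day punch summary from Scenario 1 has already been joined to a workday calendar (so that workdays with no punches appear with a null First_record), and that the previous step is named #"Grouped Rows".
// hypothetical step: a workday with no punch record (null First_record) is treated as an absence
= Table.AddColumn(#"Grouped Rows", "Status", each if [First_record] = null then "absent" else "present", type text)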
In DAX:
Solution 1: We can use the formula below to count consecutive absences.
consecutive absences =
VAR _date =
CALCULATE (
MAX ( Table2_D[Date] ),
FILTER (
ALL ( Table2_D ),
Table2_D[Status] <> "absent"
&& Table2_D[Date] <= EARLIER ( Table2_D[Date] )
&& Table2_D[EmployeeID] = EARLIER ( Table2_D[EmployeeID] )
)
) // group and get the last non-absence date before current date
VAR _times =
CALCULATE (
COUNTROWS ( Table2_D ),
FILTER (
ALL ( Table2_D ),
Table2_D[Date] <= EARLIER ( Table2_D[Date] )
&& Table2_D[Date] > _date
&& Table2_D[EmployeeID] = EARLIER ( Table2_D[EmployeeID] )
)
) + 0 //Group and count all records between the current date and the last non-absent date
RETURN
_times
Result:
In Power Query, we can pre-group the data and then analyze absences within the grouped sub-tables.
Below are two methods based on two different groupings.
Solution 2.1:
1. Group the filtered data by 'EmployeeID'.
2. Group each sub-table obtained in step 1 by the 'Status' column.
3. Use Table.RowCount to count the rows for each status in each sub-table, then filter each sub-table to keep only records with a status of 'absent'.
4. Extract the 'Times' values from all sub-tables and use List.Max to get the maximum value.
Result:
Entire M code:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("hZJLCoNAEETvMmvFsdr4WXoBLyAuIrgNIfH+qKAQTPXUohdNPbqnHzOOoQxZQATyaHk8mqPen+W7vNYwZTcACjAFVAp47E3cayj6/7BOhc05+jnzya3IO/I0KD1gen42gNnx8krkdzdIuYFwA+bGPY3JMSXHmBwXIHaM2XEHXHpcoFZAo1aQP2RJTdMG", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [EmployeeID = _t, Date = _t, Workday = _t, Status = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"EmployeeID", Int64.Type}, {"Date", type date}, {"Workday", Int64.Type}, {"Status", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Workday] = 1)),
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"EmployeeID"}, {"Maximum consecutive absences", each
List.Max(Table.SelectRows(
Table.Group(_,"Status",{"Times",each Table.RowCount(_)},0),
each [Status]="absent")[Times])
})
in
#"Grouped Rows"
Solution 2.2:
1. Group the filtered data and count rows by 'EmployeeID' and 'Status'.
Unlike the grouping in Solution 2.1, here the 'GroupKind' parameter of Table.Group is set to GroupKind.Local (GroupKind.Global is used by default).
GroupKind.Global treats all rows with the same key value as a single group.
GroupKind.Local treats consecutive rows with the same key value as a single group.
Therefore, when the same key value appears in several non-consecutive runs, global grouping produces a single group for it, while local grouping produces a separate group for each run, as shown in the sketch after these steps.
2. Group the resulting table by 'EmployeeID' and 'Status' again and aggregate with List.Max to get the maximum value of 'Count'.
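To make the GroupKind difference concrete, here is a small self-contained illustration (the sample values are made up, not taken from the attendance data):
let
    // made-up sample: two separate runs of "absent" separated by one "present"
    Sample = Table.FromRecords({
        [Status = "absent"], [Status = "absent"],
        [Status = "present"],
        [Status = "absent"], [Status = "absent"], [Status = "absent"]
    }),
    // GroupKind.Global (the default): all "absent" rows fall into one group -> absent = 5, present = 1
    GlobalGroups = Table.Group(Sample, {"Status"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    // GroupKind.Local: each consecutive run is its own group -> absent = 2, present = 1, absent = 3
    LocalGroups = Table.Group(Sample, {"Status"}, {{"Count", each Table.RowCount(_), Int64.Type}}, GroupKind.Local)
in
    LocalGroups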
Result:
Entire M code:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("hZJLCoNAEETvMmvFsdr4WXoBLyAuIrgNIfH+qKAQTPXUohdNPbqnHzOOoQxZQATyaHk8mqPen+W7vNYwZTcACjAFVAp47E3cayj6/7BOhc05+jnzya3IO/I0KD1gen42gNnx8krkdzdIuYFwA+bGPY3JMSXHmBwXIHaM2XEHXHpcoFZAo1aQP2RJTdMG", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [EmployeeID = _t, Date = _t, Workday = _t, Status = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"EmployeeID", Int64.Type}, {"Date", type date}, {"Workday", Int64.Type}, {"Status", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Workday] = 1)),
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"EmployeeID", "Status"}, {{"Count", each Table.RowCount(_), Int64.Type}},GroupKind.Local),
#"Grouped Rows1" = Table.Group(#"Grouped Rows", {"EmployeeID", "Status"}, {{"Count", each List.Max([Count]), type number}})
in
#"Grouped Rows1"
For more information, please check the attachment. We hope you found this article helpful.
Author: Eason Fang
Reviewer: Kerry Wang & Ula Huang