Supplies are limited. Contact info@espc.tech right away to save your spot before the conference sells out.
Get your discountScore big with last-minute savings on the final tickets to FabCon Vienna. Secure your discount
Hi guys,
Having bit of a challenge here. I have a dataset with employees and dates of their illness, which looks like:
Every employee should get a separate record with Start and End date of the illness, but only if there are no missing dates in the sequence. The end result should look like this: https://imgur.com/CS0OVmm
My first thought was giving each record a unique identifier, but only of there are no missing dates in the sequences. Then i can group based on employee, unique identifier and get the Min and Max date for the start and end dates but i can't seem to figure it out.
You guys have any idea?
Thanks & let me know.
Solved! Go to Solution.
Create a column on which you can group:
let
Source = Excel.CurrentWorkbook(){[Name="sickDays"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"PersonnelNumber", Int64.Type}, {"WageType", Int64.Type},
{"Number", Int64.Type}, {"StartDate", type date}}),
//add offset Date column and Index column
offsetDate = Table.FromColumns(
Table.ToColumns(#"Changed Type") & {List.RemoveFirstN(#"Changed Type"[StartDate],1) & {null}},
type table[PersonnelNumber=Int64.Type, WageType=Int64.Type, Number=Int64.Type, StartDate=date,ShiftedDate=date]),
#"Added Index" = Table.AddIndexColumn(offsetDate, "Index", 0, 1, Int64.Type),
//Add custom column to bring over Index number if dates are not seuential
// Then fill up to create a column on which to group the dates
#"Added Custom" = Table.AddColumn(#"Added Index", "dateGroups", each
if Date.AddDays([StartDate],1) = [ShiftedDate] then null else [Index]),
#"Filled Up" = Table.FillUp(#"Added Custom",{"dateGroups"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Up",{"Index", "ShiftedDate"}),
//Group and then the first entry in StartDate will be the Start Date; the last entry will be the End Date
#"Grouped Rows" = Table.Group(#"Removed Columns", {"PersonnelNumber","dateGroups"}, {
{"Start", each List.First([StartDate]), type date},
{"End", each List.Last([StartDate]), type date}
}),
//cleanup
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"dateGroups"})
in
#"Removed Columns1"
None of that stuff is necessary. Just sort your table how you want it, then group by the date column, and then just add the optional parameter to the Group function, at the end: GroupKind.Local. This will group only contiguous values.
--Nate
@Anonymous Did you try it? I didn't think it would work and I can't get it to work here. Please post your code if you did.
MS documentation: A local group is formed from a consecutive sequence of rows from an input table with the same key value.
His date groups have consecutive key values.
In this case, isn't the key value the PersonnelNumber, but I do concede that the rows might be consecutive, but not the dates.
@Anonymous I think you are misunderstanding the problem. Your method would produce the earliest and latest dates for an employee, but that is not what the OP wants.
No.
Haven't tried on this particular dataset, but if you sort by employee ascending, then Date ascending, then group by employee and Date, and use an "All Rows" aggregation. After you've done that, go back and add , GroupKind.Local right before the final end parentheses. This makes the groups of employees until interrupted by a changing value. Then you can grab the Min and Max dates from the nested tables by adding a column with the formulas
each List.Max(_[TableColumn][Date])
and
each List.Min(_[TableColumn][Date])
Know what I mean?
--Nate
Create a column on which you can group:
let
Source = Excel.CurrentWorkbook(){[Name="sickDays"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"PersonnelNumber", Int64.Type}, {"WageType", Int64.Type},
{"Number", Int64.Type}, {"StartDate", type date}}),
//add offset Date column and Index column
offsetDate = Table.FromColumns(
Table.ToColumns(#"Changed Type") & {List.RemoveFirstN(#"Changed Type"[StartDate],1) & {null}},
type table[PersonnelNumber=Int64.Type, WageType=Int64.Type, Number=Int64.Type, StartDate=date,ShiftedDate=date]),
#"Added Index" = Table.AddIndexColumn(offsetDate, "Index", 0, 1, Int64.Type),
//Add custom column to bring over Index number if dates are not seuential
// Then fill up to create a column on which to group the dates
#"Added Custom" = Table.AddColumn(#"Added Index", "dateGroups", each
if Date.AddDays([StartDate],1) = [ShiftedDate] then null else [Index]),
#"Filled Up" = Table.FillUp(#"Added Custom",{"dateGroups"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Up",{"Index", "ShiftedDate"}),
//Group and then the first entry in StartDate will be the Start Date; the last entry will be the End Date
#"Grouped Rows" = Table.Group(#"Removed Columns", {"PersonnelNumber","dateGroups"}, {
{"Start", each List.First([StartDate]), type date},
{"End", each List.Last([StartDate]), type date}
}),
//cleanup
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"dateGroups"})
in
#"Removed Columns1"
Put following in a custom column and then expand to new Rows
= List.Dates([Start],Duration.Days([End]-[Start])+1,#duration(1,0,0,0))
To see the working-
Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test (later on when you use the query on your dataset, you will have to change the source appropriately)
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQyNjFV0lEy1jfUNzICM4xBjFgdFDkDmKShCUzWzNzCDChkpG8EkYMyUKQMTWFyhuZokiYQe0AMiOmxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [PersonnelNumber = _t, Start = _t, End = _t]), #"Changed Type" = Table.TransformColumnTypes(Source,{{"PersonnelNumber", Int64.Type}, {"Start", type date}, {"End", type date}}), #"Added Custom" = Table.AddColumn(#"Changed Type", "StartDate", each List.Dates([Start],Duration.Days([End]-[Start])+1,#duration(1,0,0,0))), #"Expanded StartDate" = Table.ExpandListColumn(#"Added Custom", "StartDate") in #"Expanded StartDate"
Hey!
Thanks for the reply. Could it be the case that i have confused you? Your end result is my starting point. I should go from single dates to Start and End dates. The end result screenshot in my initial post is what the final outcome should look like