The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends September 15. Request your voucher.
Hi,
I'm a college student currently working on a non-school related project using Excel, Power BI, and Power Query. I've got a dataset I'm trying to make sense of and could really use some advice on how to best analyze it. I won't share many specific details, but here's a brief overview of my data structure:
HTML Fake Sample Below:
Employee ID | Regular/Temporary | Full Time/Part Time | Location | Sex | Last Start | Manager (only mark yes if true) | Ethnicity | Fiscal Year | Seperation Date |
000069 | Regular | p | santa land | f | 1/2/1986 | Black | 2015 | ||
000069 | Regular | p | santa land | f | 1/2/1986 | Black | 2016 | ||
000069 | Regular | p | santa land | f | 1/2/1986 | Black | 2017 | 9/21/2017 | |
008677 | Regular | f | bowling green | f | 6/1/2015 | yes | Asian | 2015 | |
008677 | Regular | f | bowling green | f | 6/1/2015 | yes | Asian | 2016 | 6/16/2017 |
Overall Goal: I want to explore employment trends, job roles, retention, and exits, focusing on diversity over time.
How I could do this:
Mock of what I think I want:
*note (my real data spans from 2015-2023)
Employee ID | Regular/Temporary | Full Time/Part Time | Location | Sex | Last Start | Manager (only mark yes if true) | Ethnicity | Seperation Date | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 |
000069 | Regular | p | santa land | f | 1/2/1986 | Black | 9/21/2017 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
008677 | Regular | f | bowling green | f | 6/1/2015 | yes | Asian | 6/16/2017 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
I want, whatever the final product is, to be able to be used, to make pivot tables and graphs, maybe even an HR dashboard. Basically, I know I cant do that with the current way my data is structured.
My questions to the community are:
Any insights, suggestions, or examples from similar projects would be incredibly helpful. Thank you in advance for your time and assistance!
Thank you,
A very stressed out amost college graduate trying to complete internship project
Hi I've tried to create something with sample data, but I'm not sure if I understand what do you exactly need.
(I've added 2 more rows to your sample data).
v1
You can check it by creating blank query and replacing whole code with this one:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("rY/BDsIgDIZfhXBewiCxwFETX8DrsgMqkkXClrHF+PYWh4lznnQNpT8FvvSvKlpigKYFPVg3etOj6jCjCYMh3oQzHi6YnAnGtQKUuHbenK5YRck3WAiti59QZM6CFVkSi2YCHyQ9MRVIOWMmxLG9+SY44nprQ+4Be/5L3u424r6NjQlLw6sAYbqHt0l1igV4sHHIUrFs7MXbj33b2S8z/ov6mK5+AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Employee ID" = _t, #"Regular/Temporary" = _t, #"Full Time/Part Time" = _t, Location = _t, Sex = _t, #"Last Start" = _t, #"Manager#(lf)#(lf)(only mark yes if true)" = _t, Ethnicity = _t, #"Fiscal Year" = _t, #"Seperation Date" = _t]),
CleanTrimColumnNames = Table.TransformColumnNames(Source, each Text.Trim(Text.Clean(_))),
ChangedTypeUS = Table.TransformColumnTypes(CleanTrimColumnNames,{{"Last Start", type date}, {"Fiscal Year", Int64.Type}, {"Seperation Date", type date}}, "en-US"),
Ad_FiscalYearsAsColumns = List.Accumulate(
List.Sort(List.Distinct(ChangedTypeUS[Fiscal Year])),
ChangedTypeUS,
(s,c)=> Table.AddColumn(s, Text.From(c), each if Date.Year([Last Start]) <= Date.Year(#date(Number.From(c), 1, 1)) then 1 else 0, Int64.Type)
),
#"Reordered Columns" = Table.ReorderColumns(Ad_FiscalYearsAsColumns,{"Employee ID", "Last Start", "2015", "2016", "2017", "Regular/Temporary", "Full Time/Part Time", "Location", "Sex", "Manager(only mark yes if true)", "Ethnicity", "Fiscal Year", "Seperation Date"})
in
#"Reordered Columns"
v2
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("rY/BDsIgDIZfhXBewiCxwFETX8DrsgMqkkXClrHF+PYWh4lznnQNpT8FvvSvKlpigKYFPVg3etOj6jCjCYMh3oQzHi6YnAnGtQKUuHbenK5YRck3WAiti59QZM6CFVkSi2YCHyQ9MRVIOWMmxLG9+SY44nprQ+4Be/5L3u424r6NjQlLw6sAYbqHt0l1igV4sHHIUrFs7MXbj33b2S8z/ov6mK5+AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Employee ID" = _t, #"Regular/Temporary" = _t, #"Full Time/Part Time" = _t, Location = _t, Sex = _t, #"Last Start" = _t, #"Manager#(lf)#(lf)(only mark yes if true)" = _t, Ethnicity = _t, #"Fiscal Year" = _t, #"Seperation Date" = _t]),
CleanTrimColumnNames = Table.TransformColumnNames(Source, each Text.Trim(Text.Clean(_))),
ChangedTypeUS = Table.TransformColumnTypes(CleanTrimColumnNames,{{"Last Start", type date}, {"Fiscal Year", Int64.Type}, {"Seperation Date", type date}}, "en-US"),
FiscalYears = List.Buffer(List.Transform(List.Sort(List.Distinct(ChangedTypeUS[Fiscal Year])), Text.From)),
StepBack = ChangedTypeUS,
Ad_FiscalYearsAsColumns =
List.Accumulate(
FiscalYears,
StepBack,
(s,c)=> Table.AddColumn(s, c, each if Date.Year([Last Start]) <= Date.Year(#date(Number.From(c), 1, 1)) then 1 else 0, Int64.Type)
),
#"Grouped Rows" = Table.Group(Ad_FiscalYearsAsColumns, {"Employee ID"}, {{"Details", each _}}),
Result =
List.Accumulate(
FiscalYears,
#"Grouped Rows",
(s,c)=> Table.AddColumn(s, c, each Record.Field(Table.Max(Table.SelectColumns([Details], c), c), c), Int64.Type)
)
in
Result
hi @dufoq3 ,
Thank you so much for reading and responding to my post!!!! You are defintly on the right track!!!!! I should have includeded an example of the table I was envisioning as the end product. I essentially, want to condense all of the repeat Employee ID's which is why i was thinking of using the Group By function. I dont necassarily want to treat the repeating ID's as duplicates because they arent, they are unique records for a different fiscal year.
Mock of what I think I want:
*note (my real data spans from 2015-2023)
Employee ID | Regular/Temporary | Full Time/Part Time | Location | Sex | Last Start | Manager (only mark yes if true) | Ethnicity | Seperation Date | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 |
000069 | Regular | p | santa land | f | 1/2/1986 | Black | 9/21/2017 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
008677 | Regular | f | bowling green | f | 6/1/2015 | yes | Asian | 6/16/2017 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
I want, whatever the final product is, to be able to be used, to make pivot tables and graphs, maybe even an HR dashboard. Basically, I know I cant do that with the current way my data is structured.
Hi,
I would love if you could edit the code. That would be very helpful 🙂 The fiscal year runs from July 1 to June 30. I'm not quite sure what the seperation date should be greater than. But I think, it would be greater than July 1, 2015. If there are any other conditions that would be helpful for me to share, let me know. Thank you, thank you!
Try this one. Edit 2nd step YourSource = Source and refer your data.
Logic: if employee worked at least 1 day in fiscal year then then 1 else 0.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("rY/BDsIgDIZfhXBewjZdB0dNfAGvCwdUXBYJW2CL8e0tQozLPOkaSn+gfPnbNDTHAEEzetTtZJRDNWB6ZUdFjLIXPFwxC1ayQnBAiWtv1PmGtcyLCguhMvsJReYsWJFVYxGsxIagI5NDXc+YAXHq76azLWmd1jbdAXv9C7M9tMd95ztllwOvAoT4Dh9ORYgFeNR+THIbefzNO0yuH/QXj/+igruKbVKPlE8=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Employee ID" = _t, #"Regular/Temporary" = _t, #"Full Time/Part Time" = _t, Location = _t, Sex = _t, #"Last Start" = _t, #"Manager#(lf)#(lf)(only mark yes if true)" = _t, Ethnicity = _t, #"Fiscal Year" = _t, #"Seperation Date" = _t]),
YourSource = Source,
CleanTrimColumnNames = Table.TransformColumnNames(YourSource, each Text.Trim(Text.Clean(_))),
ChangedTypeUS = Table.TransformColumnTypes(CleanTrimColumnNames,{{"Last Start", type date}, {"Fiscal Year", Int64.Type}, {"Seperation Date", type date}}, "en-US"),
FiscalYears = List.Buffer(List.Transform(List.Sort(List.Distinct(ChangedTypeUS[Fiscal Year])), Text.From)),
StepBack = ChangedTypeUS,
Ad_FiscalYearsAsColumns =
List.Accumulate(
FiscalYears,
StepBack,
(s,c)=> Table.AddColumn(s, c, each
if Date.Year([Last Start]) < Number.From(c) and ( [Seperation Date] = null or Date.Year([Seperation Date]) > Number.From(c) ) then 1 else
if [Last Start] >= #date(Number.From(c), 7, 1) or ( [Seperation Date] = null or [Seperation Date] >= #date(Number.From(c), 7, 1) ) then 1 else 0 )
),
GroupedRows = Table.Group(Ad_FiscalYearsAsColumns, {"Employee ID"}, {{"Details", each _, type table}}),
Ad_FiscalYearsAsColumns2 =
List.Accumulate(
FiscalYears,
GroupedRows,
(s,c)=> Table.AddColumn(s, c, each Record.Field(Table.Last(Table.SelectColumns([Details], c)), c), Int64.Type)
),
DetailsLastRow = Table.TransformColumns(Ad_FiscalYearsAsColumns2, {{"Details", Table.Last}}),
ExpandedDetails = Table.ExpandRecordColumn(DetailsLastRow, "Details", {"Regular/Temporary", "Full Time/Part Time", "Location", "Sex", "Last Start", "Manager(only mark yes if true)", "Ethnicity", "Seperation Date"}, {"Regular/Temporary", "Full Time/Part Time", "Location", "Sex", "Last Start", "Manager(only mark yes if true)", "Ethnicity", "Seperation Date"})
in
ExpandedDetails