Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Calling all Data Engineers! Fabric Data Engineer (Exam DP-700) live sessions are back! Starting October 16th. Sign up.

Reply
Anonymous
Not applicable

merging data sets based on dates

I have two data sources that I need to merge.  The first data source contains Employee Hire information and the second contains Termination information. 

 

The Hired file contains EmployeeID, Location, StartDate (plus other irrelevant data)

The Term file contains EmployeeID, Location, TermDate ( and other irrelevant data)

 

The problem that I have is that an employee can be hired multiple times at the same location (summer help for example)....or they can be hired at multiple locations.  So they would have multiple hire records.  Likewise, they would have multiple term records.  I need to be able to merge these records into one dataset.

 

I can match on EmployeeID and Location, but I need to match the hire record to the term record that has a term date that is after the hiredate....but before the next Hire date.

 

For example...

Employee #1 has 3 records with the following hire dates. 

20190101

20190501

20190701

 

and two records with the following term dates

20190415

20190615

 

I need to merge the files so I have 3 records as such....

Employee#1      Hire 20190101    Term 20190415

Employee#1      Hire 20190501    Term 20190615

Employee#1     Hire  20190701    Term  Null

 

Any help or ideas are appreciated

1 ACCEPTED SOLUTION
Anonymous
Not applicable

@Anonymous  - Since an employee presumably can't be hired until their previous employment has terminated, you could rank the Hires and Terminations. 

@ImkeF has a great solution here. Basically, you will need to do the following in Power Query:

 

1. For each table (Hires and Terms) Go to Advanced Editor and add a step like this:

Partition = Table.Group(<Previous Step Name>, {"EmployeeId"}, {{"Partition", each Table.AddIndexColumn(Table.Sort(_,{{"StartDate", Order.Ascending}}), "Index",1,1), type table}}) 

(If it can't be guaranteed that a Term will follow each Hire, then you may also need to add Location to the Grouping.)

2. Expand the table to retrieve the columns you want (EmployeeId, StartDate, Location, and the new Index.

3. Now you can Merge the Terms with the Hires table (Left outer Join on the EmployeeId and Index columns.) This will preserve every hire row and add relevant term rows.

 

I hope this helps. If it does, please Mark as a solution.
I also appreciate Kudos.
Nathan Peterson

View solution in original post

3 REPLIES 3
Anonymous
Not applicable

@Anonymous  - Since an employee presumably can't be hired until their previous employment has terminated, you could rank the Hires and Terminations. 

@ImkeF has a great solution here. Basically, you will need to do the following in Power Query:

 

1. For each table (Hires and Terms) Go to Advanced Editor and add a step like this:

Partition = Table.Group(<Previous Step Name>, {"EmployeeId"}, {{"Partition", each Table.AddIndexColumn(Table.Sort(_,{{"StartDate", Order.Ascending}}), "Index",1,1), type table}}) 

(If it can't be guaranteed that a Term will follow each Hire, then you may also need to add Location to the Grouping.)

2. Expand the table to retrieve the columns you want (EmployeeId, StartDate, Location, and the new Index.

3. Now you can Merge the Terms with the Hires table (Left outer Join on the EmployeeId and Index columns.) This will preserve every hire row and add relevant term rows.

 

I hope this helps. If it does, please Mark as a solution.
I also appreciate Kudos.
Nathan Peterson
Anonymous
Not applicable

Thanks for the help.... this worked well.

Anonymous
Not applicable

Thanks, I will give this a try and let you know.

Helpful resources

Announcements
FabCon Global Hackathon Carousel

FabCon Global Hackathon

Join the Fabric FabCon Global Hackathon—running virtually through Nov 3. Open to all skill levels. $10,000 in prizes!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors