Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
crispybits77
Helper I
Helper I

Cross-table data manipulation in Power Query

Hi all

 

Having a headache with something, just looking for some advice. I have a data model and there are two relevant tables to this problem. STUDENTS table lists all students at a school, and YEARS table has the details of which year groups they are in in each academic year period.

 

STUDENTS table has StudentID and a bunch of other irrelevant columns. It has data going back 20 years (I know, there's a data protection/retention issue there that I'm also looking to deal with separately but that's not going to be a quick fix)

 

YEARS table has StudentID, YearGroup and AcademicYear. I currently have a step in Power Query filtering that down so that it contains no data beyond 5 years old.

 

I want to apply that same filter to the STUDENTS table so it contains no data more than 5 years old, but there is no StartDate/EndDate type data in the raw feed for that table. The only way I can think of getting it into the student table is to do some sort of lookup on the StudentID against the YEARS table, and remove any that don't return a result.

 

I'm reluctant to merge the tables as the STUDENTS table is already quite wide, and duplicating rows there to accommodate the multiple year group instances over the years will become quite unwieldy for my report builders.

 

I've tried sorting the YEARS table to most recent at the top and doing a List.PositionOf formula between the tables in Power Query  to get the most recent AcademicYear on record for that student but it's incredibly slow and I'm not sure it's fit for purpose for the structure of my data, or maybe I'm just doing it wrong.

 

If this was Excel I'd just do a IF(COUNTIF(YEARS, StudentID=StudentID, AcademicYear>2018)>0,"Don't Filter","Filter") type of formula - it seems like this shouldn't be that complicated in Power Query but everything I try slows everything down so so much...

 

Anyone got any hints, tips or workarounds for this kind of situation?

1 ACCEPTED SOLUTION
crispybits77
Helper I
Helper I

Just in case someone else stumbles across this I solved this by merging the tables into a new table, doing my filtering based on year, and then removing the years column again and deduping the table to get back down to one line per student in a new STUDENTS table that only contains students from the last 5 years.

 

I have no idea if this is best practice, but at least it pushes the workload onto the Microsoft servers instead of local machines and it achieves the result I wanted so...

View solution in original post

3 REPLIES 3
crispybits77
Helper I
Helper I

Just in case someone else stumbles across this I solved this by merging the tables into a new table, doing my filtering based on year, and then removing the years column again and deduping the table to get back down to one line per student in a new STUDENTS table that only contains students from the last 5 years.

 

I have no idea if this is best practice, but at least it pushes the workload onto the Microsoft servers instead of local machines and it achieves the result I wanted so...

crispybits77
Helper I
Helper I

Can't see an option to upload a file so not sure how detailed I can be, but here goes:

Here's sample STUDENTS (mine has a LOT more columns in there hence the reluctance to merge tables):

 

StudentID/Name/DoB/Gender/PostCode/Language/Ethnicity
1Adam6/1/2014M12345EnglishWBR
2Ben18/2/2017M23456EnglishWBR
3Charles9/3/2009M34567EnglishBBR
4Daisy23/4/2012F45678EnglishWBR
5Emily22/5/2019F34245SpanishABR
6Frank18/6/2008M54634EnglishBAS
7Georgia2/7/2004F34678EnglishBOT

 

Here's a sample of the YEARS table:

 

StudentID/YearGroup/AcademicYear
1Y22023
1Y12022
2Y32019
2Y22018
2Y12017
3Y42023
3Y32022
4Y42017
4Y32016
4Y22015
5Y62020
5Y52019
5Y42018
5Y32017
5Y22016
6Y52012
6Y42011
6Y32010
6Y22009
6Y12008
7Y32019
7Y22018
7Y12017

 

I want an extra column in the STUDENT table built in power query that will return "YES" if there is any academic year from 2018 onwards for that student and "NO" if there is not:

 

(I've highlighted the years that will trigger the "YES" red in the table above)

 

StudentID/DesiredResult
1YES
2YES
3YES
4NO
5YES
6NO
7YES

 

Edit - I am filtering out all the non-red highlighted rows in the YEARS table in one of the processing steps there, so it could just be something like "If StudentID exists in YEARS table" instead of a conditional on the AcademicYear column like I described above

danextian
Super User
Super User

Hi @crispybits77 ,

 

Rather than making us imagine what you want to achieve, please post  a  usable sample data (not an image) that covers your use case and your expected result using that sample data.










Did I answer your question? Mark my post as a solution!


Proud to be a Super User!









"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

Helpful resources

Announcements
Sept PBI Carousel

Power BI Monthly Update - September 2024

Check out the September 2024 Power BI update to learn about new features.

September Hackathon Carousel

Microsoft Fabric & AI Learning Hackathon

Learn from experts, get hands-on experience, and win awesome prizes.

Sept NL Carousel

Fabric Community Update - September 2024

Find out what's new and trending in the Fabric Community.