The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi all
Having a headache with something, just looking for some advice. I have a data model and there are two relevant tables to this problem. STUDENTS table lists all students at a school, and YEARS table has the details of which year groups they are in in each academic year period.
STUDENTS table has StudentID and a bunch of other irrelevant columns. It has data going back 20 years (I know, there's a data protection/retention issue there that I'm also looking to deal with separately but that's not going to be a quick fix)
YEARS table has StudentID, YearGroup and AcademicYear. I currently have a step in Power Query filtering that down so that it contains no data beyond 5 years old.
I want to apply that same filter to the STUDENTS table so it contains no data more than 5 years old, but there is no StartDate/EndDate type data in the raw feed for that table. The only way I can think of getting it into the student table is to do some sort of lookup on the StudentID against the YEARS table, and remove any that don't return a result.
I'm reluctant to merge the tables as the STUDENTS table is already quite wide, and duplicating rows there to accommodate the multiple year group instances over the years will become quite unwieldy for my report builders.
I've tried sorting the YEARS table to most recent at the top and doing a List.PositionOf formula between the tables in Power Query to get the most recent AcademicYear on record for that student but it's incredibly slow and I'm not sure it's fit for purpose for the structure of my data, or maybe I'm just doing it wrong.
If this was Excel I'd just do a IF(COUNTIF(YEARS, StudentID=StudentID, AcademicYear>2018)>0,"Don't Filter","Filter") type of formula - it seems like this shouldn't be that complicated in Power Query but everything I try slows everything down so so much...
Anyone got any hints, tips or workarounds for this kind of situation?
Solved! Go to Solution.
Just in case someone else stumbles across this I solved this by merging the tables into a new table, doing my filtering based on year, and then removing the years column again and deduping the table to get back down to one line per student in a new STUDENTS table that only contains students from the last 5 years.
I have no idea if this is best practice, but at least it pushes the workload onto the Microsoft servers instead of local machines and it achieves the result I wanted so...
Just in case someone else stumbles across this I solved this by merging the tables into a new table, doing my filtering based on year, and then removing the years column again and deduping the table to get back down to one line per student in a new STUDENTS table that only contains students from the last 5 years.
I have no idea if this is best practice, but at least it pushes the workload onto the Microsoft servers instead of local machines and it achieves the result I wanted so...
Can't see an option to upload a file so not sure how detailed I can be, but here goes:
Here's sample STUDENTS (mine has a LOT more columns in there hence the reluctance to merge tables):
StudentID/ | Name/ | DoB/ | Gender/ | PostCode/ | Language/ | Ethnicity |
1 | Adam | 6/1/2014 | M | 12345 | English | WBR |
2 | Ben | 18/2/2017 | M | 23456 | English | WBR |
3 | Charles | 9/3/2009 | M | 34567 | English | BBR |
4 | Daisy | 23/4/2012 | F | 45678 | English | WBR |
5 | Emily | 22/5/2019 | F | 34245 | Spanish | ABR |
6 | Frank | 18/6/2008 | M | 54634 | English | BAS |
7 | Georgia | 2/7/2004 | F | 34678 | English | BOT |
Here's a sample of the YEARS table:
StudentID/ | YearGroup/ | AcademicYear |
1 | Y2 | 2023 |
1 | Y1 | 2022 |
2 | Y3 | 2019 |
2 | Y2 | 2018 |
2 | Y1 | 2017 |
3 | Y4 | 2023 |
3 | Y3 | 2022 |
4 | Y4 | 2017 |
4 | Y3 | 2016 |
4 | Y2 | 2015 |
5 | Y6 | 2020 |
5 | Y5 | 2019 |
5 | Y4 | 2018 |
5 | Y3 | 2017 |
5 | Y2 | 2016 |
6 | Y5 | 2012 |
6 | Y4 | 2011 |
6 | Y3 | 2010 |
6 | Y2 | 2009 |
6 | Y1 | 2008 |
7 | Y3 | 2019 |
7 | Y2 | 2018 |
7 | Y1 | 2017 |
I want an extra column in the STUDENT table built in power query that will return "YES" if there is any academic year from 2018 onwards for that student and "NO" if there is not:
(I've highlighted the years that will trigger the "YES" red in the table above)
StudentID/ | DesiredResult |
1 | YES |
2 | YES |
3 | YES |
4 | NO |
5 | YES |
6 | NO |
7 | YES |
Edit - I am filtering out all the non-red highlighted rows in the YEARS table in one of the processing steps there, so it could just be something like "If StudentID exists in YEARS table" instead of a conditional on the AcademicYear column like I described above
Hi @crispybits77 ,
Rather than making us imagine what you want to achieve, please post a usable sample data (not an image) that covers your use case and your expected result using that sample data.
Proud to be a Super User!
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
110 | |
80 | |
71 | |
51 | |
50 |
User | Count |
---|---|
128 | |
122 | |
77 | |
64 | |
60 |