Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I am working on a candidate tracking report and am struggling with the logic of how to report on the number of candidates in each status (as of a selected date)
The data looks like this (simplified)
Candidate | Date | Status |
450 | 1/15/23 | A |
450 | 1/15/23 | B |
450 | 1/31/23 | C |
500 | 12/31/22 | A |
If the selected date on the report is 12/31/22 - then 1 Candidate is in Status A
on 1/15/23 1 is in Status A and 1 is in Status B
(Max status is used so for the status change on 1/15 "B" should be used
Thanks in advance.
Solved! Go to Solution.
Hi LindaJonesCT,
This scenario is a typical application of a snapshot fact table. That means, you would transform your status event table into a table that contains for all dates that you want to analyze, all candidates and all their states that were valid at that date. You could create these snapshots for every date, once a week, once a month, at every date when a new status was entered, whatever you prefer. Then you could draw a visual by status and just filter by date and show distinct count of candidates as values.
But you can also use your transactional table structure with a measure.
To do so, first, you need to make sure that selecting a date does not also implicitly select the states, otherwise you are not able to see values for states that have not been entered by any candidate at the selected date. You can achieve this by creating a date dimension table, or a status dimension table, or both. I'd start with a date table, because this also allows you to analyze dates at which no new states were assigned. The date table is just a disconnected Table containing all the dates that you want to analyze.
The simplest way to create the date table is to create a calculated table using the formular:
Dates = CALENDARAUTO()
Then change the datatype of the Date column in the new table from date/time to date.
Alternatively, you can also create a custom date table in PowerQuery.
Finally, doublecheck in data model view that there is no relationship between the new date table und your existing table.
Next you need to create a measure:
Count of Candidate with Date Table =
// Get selected date
VAR _Date = MAX ( 'Date'[Date] )
// Get selected states
VAR _States = VALUES ( 'Candidate Status Changes'[Status] )
// Per candidate, get the latest date of new stats change
VAR _CandidatesAndDates =
CALCULATETABLE (
ADDCOLUMNS (
SUMMARIZECOLUMNS (
'Candidate Status Changes'[Candidate]
),
"@LastDate",
CALCULATE (
MAX ( 'Candidate Status Changes'[Date] )
)
),
'Candidate Status Changes'[Date] <= _Date,
ALL ( 'Candidate Status Changes'[Status] )
)
// Per candidate, get max. status set at last date
VAR _CandidatesAndStatus =
ADDCOLUMNS (
_CandidatesAndDates,
"@MaxStatus",
VAR _Date = [@LastDate]
RETURN
CALCULATE (
MAX ( 'Candidate Status Changes'[Status] ),
'Candidate Status Changes'[Date] = _Date,
ALL ( 'Candidate Status Changes'[Status] )
)
)
RETURN
COUNTROWS ( FILTER ( _CandidatesAndStatus, [@MaxStatus] IN _States ) )
Now you can create a visual using status and this measure and add a date filter (from the date table) to your page and select your date of interest.
In this file you can explore both approaches, the one with the snapshot fact table and the one with the measure for your table as described above.
Hi LindaJonesCT,
This scenario is a typical application of a snapshot fact table. That means, you would transform your status event table into a table that contains for all dates that you want to analyze, all candidates and all their states that were valid at that date. You could create these snapshots for every date, once a week, once a month, at every date when a new status was entered, whatever you prefer. Then you could draw a visual by status and just filter by date and show distinct count of candidates as values.
But you can also use your transactional table structure with a measure.
To do so, first, you need to make sure that selecting a date does not also implicitly select the states, otherwise you are not able to see values for states that have not been entered by any candidate at the selected date. You can achieve this by creating a date dimension table, or a status dimension table, or both. I'd start with a date table, because this also allows you to analyze dates at which no new states were assigned. The date table is just a disconnected Table containing all the dates that you want to analyze.
The simplest way to create the date table is to create a calculated table using the formular:
Dates = CALENDARAUTO()
Then change the datatype of the Date column in the new table from date/time to date.
Alternatively, you can also create a custom date table in PowerQuery.
Finally, doublecheck in data model view that there is no relationship between the new date table und your existing table.
Next you need to create a measure:
Count of Candidate with Date Table =
// Get selected date
VAR _Date = MAX ( 'Date'[Date] )
// Get selected states
VAR _States = VALUES ( 'Candidate Status Changes'[Status] )
// Per candidate, get the latest date of new stats change
VAR _CandidatesAndDates =
CALCULATETABLE (
ADDCOLUMNS (
SUMMARIZECOLUMNS (
'Candidate Status Changes'[Candidate]
),
"@LastDate",
CALCULATE (
MAX ( 'Candidate Status Changes'[Date] )
)
),
'Candidate Status Changes'[Date] <= _Date,
ALL ( 'Candidate Status Changes'[Status] )
)
// Per candidate, get max. status set at last date
VAR _CandidatesAndStatus =
ADDCOLUMNS (
_CandidatesAndDates,
"@MaxStatus",
VAR _Date = [@LastDate]
RETURN
CALCULATE (
MAX ( 'Candidate Status Changes'[Status] ),
'Candidate Status Changes'[Date] = _Date,
ALL ( 'Candidate Status Changes'[Status] )
)
)
RETURN
COUNTROWS ( FILTER ( _CandidatesAndStatus, [@MaxStatus] IN _States ) )
Now you can create a visual using status and this measure and add a date filter (from the date table) to your page and select your date of interest.
In this file you can explore both approaches, the one with the snapshot fact table and the one with the measure for your table as described above.
Thanks Martin - I had no idea you could do that kind of work in Power Query. (I am used to doing the heavy lifting in SQL, but that's not an option for this task) The data I am working on is already several million rows - so I'm going to try the measure version first. I will let you know how I make out later today.
Yes, the size of the table can grow a lot with the snapshot structure. Therefore the DAX code is much simpler and faster. You have to expect number of candidates x number of dates to analyze rows. There is no upper row limit in Power BI, so you can always give it a try. It all depends on how well the data can be compressed.
The first step to reduce the size would usually be to reduce to snapshot frequency: Instead of once a day, maybe once a week or once a month is sufficient. Next consideration could be to limit the history: Maybe the past two or three years are sufficient. It all depends on your requirements.
I'd really be curious to hear from you whether the snapshot approach worked with your amount of data - loading duration wise as well as file size wise.
As time permits I will try the snapshot approach and post an update. Thank you so much for your help.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
46 | |
28 | |
22 | |
12 | |
8 |
User | Count |
---|---|
75 | |
52 | |
46 | |
15 | |
12 |