Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hello all - this is a repost. I've cleaned up and clarified the question as best I can in hopes of getting some answers this time I thank you all in advance for any help you are able to provide as it's something of a whopper of a question - with multiple parts. I've provided a link to a simplified, anonymized, and reduced sample of the data in question in an excel format- I can't provide the full pbix due to concerns about the highly confidential nature of our database. Each page of the excel sheet represents a different table in the database.
https://docs.google.com/spreadsheets/d/1m9x73zpdcSO-_zFb5UELXZemzR6DOB5s/edit?usp=sharing&ouid=10431...
(For now, we're going to ignore the Client in Case (CiC) table, as I think that makes it too complicated. I may ask about that as a follow up question if I get some answers to the first one.)
To help clarify this data, here is a simplified version of the above dataset - focusing specifically on what the expected/desired outcomes are:
The most important of these tables for our purposes is the "SCORE" Table (which is essentially what I am recreating/simplifying in the image above), which has a ClientinSession(CiS) ID column, the "pre/post" column, and 6 'assessment' columns (in truth there are 20, but I simplified it for the sample data). All of the assessment columns can be broken up into three 'assessment categories': Circumstances, Goals, and Satisfaction (i've kept it at 2 for the simplified version). For the purposes of this question, we dont need to worry about the individual columns so much as the categories themselves.
So what do we actually need from this data?:
1. #/% of CiS codes with at least one row on the SCORE table, and at least one non-blank value in any of the assessment columns.
2. #/% of CiS codes with at least TWO rows on the SCORE table - at least one of those rows having a "pre" value in the pre/post column and at least one of those rows having a "post" value in the pre/post column. Rows are only counted for this calculation if at least *one* entry in any of the assessment columns is non-blank.
3. Same as 1 but broken up by Assessment Category. I have attached an image as an example - marking with blue each of the rows that would need to be counted when calculating the #/% of CiS entries for the "Circumstance" assessment category. So in this case we wouldn't consider any values entered into the "Goals" or "satisfaction" categories - we only count rows with at least one non-blank value in the "Circumstances" columns.
4. Same question as 2 (two rows, a pre and a post, one non-blank assessment column), but with the same stipulations as 3 (broken up by 'assessment category' columns). Once again I have attached an example to indicate which rows would need to be counted when calculating the #/% of CiS entries for the "Circumstance" assessment category. As we can see in this example, only rows that have a Pre and Post entry for the *same* CiS ID number are included in the count - and once again we must break the count up by category - one for Circumstances, one for Goals, and one for Satisfaction.
I think this is more than enough for now, though there may be some follow up questions.
I know this is a massive ask, and will almost certainly involve a good many steps/measures to complete. But all I really need is some guidance on how to construct the DAX formulaes/measurements for each of the 4 questions...
I'll try my best to clarify where needed; don't hesitate to ask! I thank you again, in advance, for any help you can provide.
Solved! Go to Solution.
Hey @Flawn
here you will find a pbix that contains a first approach: ClientsCasesSessions.pbix
Please be aware that this is not a complete solution, answering all your questions.
I think the first step should be to create a star schema. I created the following new tables
The model I created looks like this:
First, I pivoted the assessments in the SCORE table using Power Query. I consider the SCORE table a fact table. Pivoting the assessments (creating a single column) makes it simple to count clients with at least one assessment because assessments with no value will be removed automatically.
After pivoting the SCORE table the table will look like this:
Then I created some measures; one measure is more complex, “# of clients with pre and post”; this measure is more complex because you are looking for clients in the SCORE table that have pre and post records. I use SUMMARIZE to find these clients by counting the Pre/Post values, only clients where the count is 2 are valid clients:
COUNTROWS(
FILTER(
ADDCOLUMNS(
SUMMARIZE(
'SCORE'
, 'Clients'[Client]
// , 'Pre / Post'[Pre/Post]
)
, "pre and post", CALCULATE( DISTINCTCOUNT( 'SCORE'[Pre/Post] ) )
)
, [pre and post] = 2
)
)
Hopefully, this will help you to tackle your challenge.
Regards,
Tom
Hey @Flawn
here you will find a pbix that contains a first approach: ClientsCasesSessions.pbix
Please be aware that this is not a complete solution, answering all your questions.
I think the first step should be to create a star schema. I created the following new tables
The model I created looks like this:
First, I pivoted the assessments in the SCORE table using Power Query. I consider the SCORE table a fact table. Pivoting the assessments (creating a single column) makes it simple to count clients with at least one assessment because assessments with no value will be removed automatically.
After pivoting the SCORE table the table will look like this:
Then I created some measures; one measure is more complex, “# of clients with pre and post”; this measure is more complex because you are looking for clients in the SCORE table that have pre and post records. I use SUMMARIZE to find these clients by counting the Pre/Post values, only clients where the count is 2 are valid clients:
COUNTROWS(
FILTER(
ADDCOLUMNS(
SUMMARIZE(
'SCORE'
, 'Clients'[Client]
// , 'Pre / Post'[Pre/Post]
)
, "pre and post", CALCULATE( DISTINCTCOUNT( 'SCORE'[Pre/Post] ) )
)
, [pre and post] = 2
)
)
Hopefully, this will help you to tackle your challenge.
Regards,
Tom
Hey Tom,
This looks amazing - i'm going to try it nout now.
But just to clarify - yes, this is sample data and theoretically the complete data set should have all clients in the SCORE table have a corresponding row in the CiS table. You are saying that, as that is the case, I should create a new table that combines all the unique clients from the SCORE table with the CiS and CiC table? I'm not sure what function that would perform - would you be able to clarify? I'm looking to learn, not just copy paste!
Warm Regards,
Flawn
Hey @Flawn ,
the table Clients from my sample pbix is already extracting the clients from the tables CiS and CIC, just have a look at the M code using the Advanced editor inside Power Query:
let
tblClientsInCase = Table.SelectColumns(#"CIC", {"Client In Case"}),
#"tblClientsInCaseWithoutDuplicates" = Table.Distinct(tblClientsInCase),
#"tblClientsInCaseWithoutDuplicatesNonEmpty" = Table.SelectRows(tblClientsInCaseWithoutDuplicates,
each [Client In Case] <> null and [Client In Case] <> ""),
#"Renamed Columns1" = Table.RenameColumns(#"tblClientsInCaseWithoutDuplicatesNonEmpty",{{"Client In Case", "Client"}}),
tblClientsInSession = Table.SelectColumns(#"CiS", {"Client In Session"}),
tblClientsInSessionWithoutDuplicates= Table.Distinct(tblClientsInSession),
tblClientsInSessionWithoutDuplicatesNonEmpty = Table.SelectRows(tblClientsInSessionWithoutDuplicates,
each [Client In Session] <> null and [Client In Session] <> ""),
#"Renamed Columns2" = Table.RenameColumns(#"tblClientsInSessionWithoutDuplicatesNonEmpty",{{"Client In Session", "Client"}}),
combinedTables = Table.Distinct(Table.Combine( {#"Renamed Columns1", #"Renamed Columns2"} ) )
in
combinedTables
In the above code you can see two sections, the first sectrion extracts data from the CIC table and the second section from the CiS table. Just add a third section that extract the client data from the SCORE data.
Do not forget to add the the 3rd table to the Table.Combine statement.
Of course this is only necessary, if the source data can contain clients that are only available in one of the tables, CiS, CIC, and SCORE. But this can only be answered by the people who are creating/providing the source files.
I recommend spending some time learning about the importance of the semantic model, the articles are not explaining how you extract data from multiple tables, and combine the data to create a dimension table, the articles explain, why it's important:
Regards,
Tom
Thanks again Tom, you've been an incredible help.
So far this looks like it should work great. If i'm understanding correctly all I really have to add on top of this for my 3rd and 4th question is an additional filter for each assessment type?
Hey Flawn,
exactly. This is the reason why I pivoted the SCORE table in the first place, I derived the Assessment table from this pivoted table being able to have a table that can be used to filter the SCORE table.
Of course you have to check the measures carefully 😉
Regards,
Tom
User | Count |
---|---|
96 | |
85 | |
77 | |
66 | |
63 |
User | Count |
---|---|
110 | |
95 | |
95 | |
64 | |
59 |