Solved: Calculating %/# of ID values that have a certain n...

Flawn · ‎05-05-2024

Hello all - this is a repost. I've cleaned up and clarified the question as best I can in hopes of getting some answers this time I thank you all in advance for any help you are able to provide as it's something of a whopper of a question - with multiple parts. I've provided a link to a simplified, anonymized, and reduced sample of the data in question in an excel format- I can't provide the full pbix due to concerns about the highly confidential nature of our database. Each page of the excel sheet represents a different table in the database.

https://docs.google.com/spreadsheets/d/1m9x73zpdcSO-_zFb5UELXZemzR6DOB5s/edit?usp=sharing&ouid=10431...

(For now, we're going to ignore the Client in Case (CiC) table, as I think that makes it too complicated. I may ask about that as a follow up question if I get some answers to the first one.)

To help clarify this data, here is a simplified version of the above dataset - focusing specifically on what the expected/desired outcomes are:

The most important of these tables for our purposes is the "SCORE" Table (which is essentially what I am recreating/simplifying in the image above), which has a ClientinSession(CiS) ID column, the "pre/post" column, and 6 'assessment' columns (in truth there are 20, but I simplified it for the sample data). All of the assessment columns can be broken up into three 'assessment categories': Circumstances, Goals, and Satisfaction (i've kept it at 2 for the simplified version). For the purposes of this question, we dont need to worry about the individual columns so much as the categories themselves.

So what do we actually need from this data?:

1. #/% of CiS codes with at least one row on the SCORE table, and at least one non-blank value in any of the assessment columns.
2. #/% of CiS codes with at least TWO rows on the SCORE table - at least one of those rows having a "pre" value in the pre/post column and at least one of those rows having a "post" value in the pre/post column. Rows are only counted for this calculation if at least *one* entry in any of the assessment columns is non-blank.
3. Same as 1 but broken up by Assessment Category. I have attached an image as an example - marking with blue each of the rows that would need to be counted when calculating the #/% of CiS entries for the "Circumstance" assessment category. So in this case we wouldn't consider any values entered into the "Goals" or "satisfaction" categories - we only count rows with at least one non-blank value in the "Circumstances" columns.

4. Same question as 2 (two rows, a pre and a post, one non-blank assessment column), but with the same stipulations as 3 (broken up by 'assessment category' columns). Once again I have attached an example to indicate which rows would need to be counted when calculating the #/% of CiS entries for the "Circumstance" assessment category. As we can see in this example, only rows that have a Pre and Post entry for the *same* CiS ID number are included in the count - and once again we must break the count up by category - one for Circumstances, one for Goals, and one for Satisfaction.

I think this is more than enough for now, though there may be some follow up questions.

I know this is a massive ask, and will almost certainly involve a good many steps/measures to complete. But all I really need is some guidance on how to construct the DAX formulaes/measurements for each of the 4 questions...

I'll try my best to clarify where needed; don't hesitate to ask! I thank you again, in advance, for any help you can provide.

TomMartens · ‎05-06-2024

Hey @Flawn

here you will find a pbix that contains a first approach: ClientsCasesSessions.pbix

Please be aware that this is not a complete solution, answering all your questions.

I think the first step should be to create a star schema. I created the following new tables

Clients this table is derived from all the clients in the tables CIC and CiS, there are no duplicates and empty vlaues are removed. I realized that there are clients in the SCORE table that are not present in the tables CIC and CiS. Maybe this because of creating the sample data, if not adapt the Power Query by adding a 3rd section that derives the unique clients from the SCORE table and combine all three tables to get the Clients dimension table. I created a calculated column in the SCORE table that checks if the client exists in the Client table.
Assessment this table is derived from the pivoted SCORE table, I created a 2nd column that only contains the assessment categories, this will help to properly count the numbers for your 3rd and 4th question.
Pre / Post this table is derived from the SCORE table

The model I created looks like this:

First, I pivoted the assessments in the SCORE table using Power Query. I consider the SCORE table a fact table. Pivoting the assessments (creating a single column) makes it simple to count clients with at least one assessment because assessments with no value will be removed automatically.

After pivoting the SCORE table the table will look like this:

Then I created some measures; one measure is more complex, “# of clients with pre and post”; this measure is more complex because you are looking for clients in the SCORE table that have pre and post records. I use SUMMARIZE to find these clients by counting the Pre/Post values, only clients where the count is 2 are valid clients:

COUNTROWS(
    FILTER(
        ADDCOLUMNS(
            SUMMARIZE(
                'SCORE'
                , 'Clients'[Client]
                // , 'Pre / Post'[Pre/Post]
            )
            , "pre and post", CALCULATE( DISTINCTCOUNT( 'SCORE'[Pre/Post] ) )
        )
        , [pre and post] = 2
    )
)

Hopefully, this will help you to tackle your challenge.

Regards,

Tom

Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

View solution in original post

TomMartens · ‎05-06-2024

Hey @Flawn

here you will find a pbix that contains a first approach: ClientsCasesSessions.pbix

Please be aware that this is not a complete solution, answering all your questions.

I think the first step should be to create a star schema. I created the following new tables

Clients this table is derived from all the clients in the tables CIC and CiS, there are no duplicates and empty vlaues are removed. I realized that there are clients in the SCORE table that are not present in the tables CIC and CiS. Maybe this because of creating the sample data, if not adapt the Power Query by adding a 3rd section that derives the unique clients from the SCORE table and combine all three tables to get the Clients dimension table. I created a calculated column in the SCORE table that checks if the client exists in the Client table.
Assessment this table is derived from the pivoted SCORE table, I created a 2nd column that only contains the assessment categories, this will help to properly count the numbers for your 3rd and 4th question.
Pre / Post this table is derived from the SCORE table

The model I created looks like this:

First, I pivoted the assessments in the SCORE table using Power Query. I consider the SCORE table a fact table. Pivoting the assessments (creating a single column) makes it simple to count clients with at least one assessment because assessments with no value will be removed automatically.

After pivoting the SCORE table the table will look like this:

Then I created some measures; one measure is more complex, “# of clients with pre and post”; this measure is more complex because you are looking for clients in the SCORE table that have pre and post records. I use SUMMARIZE to find these clients by counting the Pre/Post values, only clients where the count is 2 are valid clients:

COUNTROWS(
    FILTER(
        ADDCOLUMNS(
            SUMMARIZE(
                'SCORE'
                , 'Clients'[Client]
                // , 'Pre / Post'[Pre/Post]
            )
            , "pre and post", CALCULATE( DISTINCTCOUNT( 'SCORE'[Pre/Post] ) )
        )
        , [pre and post] = 2
    )
)

Hopefully, this will help you to tackle your challenge.

Regards,

Tom

Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

Flawn · ‎05-06-2024

Hey Tom,

This looks amazing - i'm going to try it nout now.

But just to clarify - yes, this is sample data and theoretically the complete data set should have all clients in the SCORE table have a corresponding row in the CiS table. You are saying that, as that is the case, I should create a new table that combines all the unique clients from the SCORE table with the CiS and CiC table? I'm not sure what function that would perform - would you be able to clarify? I'm looking to learn, not just copy paste!

Warm Regards,
Flawn

TomMartens · ‎05-06-2024

Hey @Flawn ,

the table Clients from my sample pbix is already extracting the clients from the tables CiS and CIC, just have a look at the M code using the Advanced editor inside Power Query:

let
    tblClientsInCase = Table.SelectColumns(#"CIC", {"Client In Case"}),
    #"tblClientsInCaseWithoutDuplicates" = Table.Distinct(tblClientsInCase),
    #"tblClientsInCaseWithoutDuplicatesNonEmpty" = Table.SelectRows(tblClientsInCaseWithoutDuplicates, 
        each [Client In Case] <> null and [Client In Case] <> ""),
    #"Renamed Columns1" = Table.RenameColumns(#"tblClientsInCaseWithoutDuplicatesNonEmpty",{{"Client In Case", "Client"}}),

    tblClientsInSession = Table.SelectColumns(#"CiS", {"Client In Session"}),
    tblClientsInSessionWithoutDuplicates= Table.Distinct(tblClientsInSession),
    tblClientsInSessionWithoutDuplicatesNonEmpty = Table.SelectRows(tblClientsInSessionWithoutDuplicates, 
        each [Client In Session] <> null and [Client In Session] <> ""),
    #"Renamed Columns2" = Table.RenameColumns(#"tblClientsInSessionWithoutDuplicatesNonEmpty",{{"Client In Session", "Client"}}),

    combinedTables = Table.Distinct(Table.Combine( {#"Renamed Columns1", #"Renamed Columns2"} ) )
in
    combinedTables

In the above code you can see two sections, the first sectrion extracts data from the CIC table and the second section from the CiS table. Just add a third section that extract the client data from the SCORE data.
Do not forget to add the the 3rd table to the Table.Combine statement.

Of course this is only necessary, if the source data can contain clients that are only available in one of the tables, CiS, CIC, and SCORE. But this can only be answered by the people who are creating/providing the source files.

I recommend spending some time learning about the importance of the semantic model, the articles are not explaining how you extract data from multiple tables, and combine the data to create a dimension table, the articles explain, why it's important:

Regards,

Tom

Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

Flawn · ‎05-06-2024

Thanks again Tom, you've been an incredible help.

So far this looks like it should work great. If i'm understanding correctly all I really have to add on top of this for my 3rd and 4th question is an additional filter for each assessment type?

TomMartens · ‎05-06-2024

Hey Flawn,

exactly. This is the reason why I pivoted the SCORE table in the first place, I derived the Assessment table from this pivoted table being able to have a table that can be used to filter the SCORE table.

Of course you have to check the measures carefully 😉

Regards,
Tom

Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

Calculating %/# of ID values that have a certain number of rows & are non-blank in certain columns

Helpful resources

Fabric certifications survey

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly