Solved: Only counting an ID once for each date it appears.

LR_PublicHealth · ‎02-08-2024

Hello, everyone!

Having used a different BI tool for many years, our team is making the move to Power BI. I'm steadily getting used with it, but I've hit a bit of a stumbling block. I'm sure it's something very simple, but after a search of the forum I can't quite find a question and answer that match my issue.

I have a dataset (pulled from a SQL Server using DirectQuery) of record-level patient attendance data, a date table (created using PowerQuery M) and a couple of lookup tables (.csvs stored in SharePoint). My only relationship is a many to one from a date field in my dataset to my date table.

Now, I might be going about this all wrong (and please tell me, if so) but I've created a number of measures to match-up to our KPIs using CALCULATE. For example:

Indicator 1.1 = CALCULATE(
    DISTINCTCOUNT( 'table'[ID] ),
    LEFT('table'[ID],2) = "SH",
    'table'[WaitingDays] <=2
)
/
CALCULATE(
    DISTINCTCOUNT( 'table'[ID] ),
    LEFT('table'[ID],2) = "SH"
)

The above indicator should give us the percentage of distinct patients whose IDs begin with "SH" who waited 2 days or less for their appointment. Seems to work fine.

The issue I'm having is for some indicators we need to count non-distinct IDs - like when a patient appears two or three times in a period. However, it's complicated by each ID having multiple records per date which capture other fields (it's how the client's recording system is setup, so I don't think we can escape it). This throws up an issue. For example:

Indicator 10.2 = CALCULATE(
    COUNT( 'table'[ID] ),
    'table'[Outcome] IN {"New","Repeat"}
)

This measure counts way too many patients, as they appear more than once each date.
What I need is a way to count each ID only once per date they appear in the period.

I might be approaching this totally wrong (again, although I'm not new to BI, I'm very new to PBI) so happy to change the approach if easier/more logical. Here is a mock of the data and problem:

ID	Date	Outcome
001	01/01/2024	New
001	01/01/2024	New
002	01/01/2024	New
002	01/01/2024	New
002	01/01/2024	New
003	02/01/2024	New
001	04/01/2024	Repeat
001	04/01/2024	Repeat
003	08/01/2024	Repeat
004	08/01/2024	New
004	08/01/2024	New
001	08/01/2024	Follow-up
001	08/01/2024	Follow-up

What I'd expect to see is a count of 6. As is, I'm getting every row other than the "Follow-up" outcomes.

Thanks in advance!

audreygerred · ‎02-08-2024

Hello! I entered the sample data as below:

I created the following DAX:

Count = COUNTROWS(

SUMMARIZE(

FILTER('Table',

'Table'[Outcome] = "New" || 'Table'[Outcome] = "Repeat"

),

[ID], [Date]

)

And, I get the following count of 6 🙂

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

View solution in original post

vicky_ · ‎02-08-2024

The quickest way to do this is to simply replace COUNT with DISTINCTCOUNT(), i.e

Indicator 10.2 = CALCULATE(
    DISTINCTCOUNT( 'table'[ID] ),
    'table'[Outcome] IN {"New","Repeat"}
)

But the issue with this way is that the totals (if you don't have the dates in the visual) will be incorrect, as it's counting the distinct new / repeat customers over all days

To get around this, i would recommend creating a new table just so that the totals are correct.

NEW Indicator 10.2 = 
var __newTable = ADDCOLUMNS(
    VALUES('Table'[Date]), 
    "distinct customer count",
        CALCULATE(
            DISTINCTCOUNT( 'table'[ID] ),
            'table'[Outcome] IN {"New","Repeat"}
        )
)
return SUMX(__newTable, [distinct customer count])

LR_PublicHealth · ‎02-08-2024

Hi, Vicky.

First, thanks for your time.

Yes, as you say DISTINCTCOUNT is no good in this instance as the date field will not be used in the visual.

My only concern with this is whether adding new tables impacts performance. I've been impressed with the speed of reports thus far - but with *lots* of KPIs to calculate, if I have to make a new table for each one not only might it become messy quickly but will it also impact report speed?

audreygerred · ‎02-08-2024

Hello! I entered the sample data as below:

I created the following DAX:

Count = COUNTROWS(

SUMMARIZE(

FILTER('Table',

'Table'[Outcome] = "New" || 'Table'[Outcome] = "Repeat"

),

[ID], [Date]

)

And, I get the following count of 6 🙂

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

LR_PublicHealth · ‎02-08-2024

Hi, Audrey.

Thanks for your time to respond!

I think this may work. My only "issue" is that it's throwing an error to say the date column used in SUMMARIZE doesn't appear in the table. Am I correct in thinking that I can simply switch out the date column from my date table to the original date column from my dataset and it won't affect time intelligence?

audreygerred · ‎02-08-2024

For this, it's not really doing any time intelligence (MTD, YTD, YoY, MoM, etc.), it's really just doing the following:

- FILTER is used to filter the table to include only rows where the [Outcome] is "New" or "Repeat".
- SUMMARIZE then creates a summary table with unique combinations of [ID] and [Date].
- COUNTROWS counts the number of unique combinations meeting the criteria of "New" or "Repeat" outcomes

In this case you would want the summarization to have the date field from your fact table because that is what summarize is doing, it's making another temporary table in the background with each unique combination of ID and Date if the Outcome is New or Repeat, then it is adding them together.

If you were doing any time intelligence to compare to another time period, in that meausre, you would reference the Date from your date table.

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

LR_PublicHealth · ‎02-08-2024

That's perfect; with such a good explanation, I should be able to adapt it as neccesary for each KPI measure myself.

Thanks so much for taking the time to help - it's been a big change getting to grips with PBI, but I'm very inspired by the possibilities and community that enables it!

Best wishes.

audreygerred · ‎02-08-2024

You are very welcome! Power BI is a great tool. It takes, some time but there are so many great references. Good luck on your journey!

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

LR_PublicHealth · ‎02-09-2024

Very sorry to bother you, Audrey.

All was going well until I got to a specific indicator:

Indicator 13.1 = 
COUNTROWS(
    SUMMARIZE(
        FILTER('table',
           'table'[Outcome] <> "NEW" || 'table'[Outcome] <> "REVIEW" || 'table'[Age] < 19
        ),
        'table'[ID], 'table'[Date]
    )
)

The [Age] column is another measure, like so:

Age = ROUNDDOWN(DATEDIFF('table'[BirthDate], TODAY(), DAY) / 365.25, 0)

For some reason, the indicator is counting everybody, rather than just patients who do not have those outcomes and are under 19 years of age. If I remove the age part of the filter, there is no change in the end count. Am I missing something really simple?

Thanks in advance.

audreygerred · ‎02-09-2024

Hi! Can you add a column for age in Power Query? That should fix the issue.

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Only counting an ID once for each date it appears.

Helpful resources

New forum boards available in Real-Time Intelligence.

Power BI Monthly Update - May 2024

How to Get Your Question Answered Quickly

Jumpstart your career with the Fabric Career Hub

Only counting an ID once for each date it appears.

Helpful resources

New forum boards available in Real-Time Intelligence.

Power BI Monthly Update - May 2024

How to Get Your Question Answered Quickly