Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
LR_PublicHealth
Regular Visitor

Only counting an ID once for each date it appears.

 

Hello, everyone! 

Having used a different BI tool for many years, our team is making the move to Power BI. I'm steadily getting used with it, but I've hit a bit of a stumbling block. I'm sure it's something very simple, but after a search of the forum I can't quite find a question and answer that match my issue.

I have a dataset (pulled from a SQL Server using DirectQuery) of record-level patient attendance data, a date table (created using PowerQuery M) and a couple of lookup tables (.csvs stored in SharePoint). My only relationship is a many to one from a date field in my dataset to my date table.

 

Now, I might be going about this all wrong (and please tell me, if so) but I've created a number of measures to match-up to our KPIs using CALCULATE. For example:

 

Indicator 1.1 = CALCULATE(
    DISTINCTCOUNT( 'table'[ID] ),
    LEFT('table'[ID],2) = "SH",
    'table'[WaitingDays] <=2
)
/
CALCULATE(
    DISTINCTCOUNT( 'table'[ID] ),
    LEFT('table'[ID],2) = "SH"
)

 

The above indicator should give us the percentage of distinct patients whose IDs begin with "SH" who waited 2 days or less for their appointment. Seems to work fine.

The issue I'm having is for some indicators we need to count non-distinct IDs - like when a patient appears two or three times in a period. However, it's complicated by each ID having multiple records per date which capture other fields (it's how the client's recording system is setup, so I don't think we can escape it). This throws up an issue. For example:

 

Indicator 10.2 = CALCULATE(
    COUNT( 'table'[ID] ),
    'table'[Outcome] IN {"New","Repeat"}
)

 

This measure counts way too many patients, as they appear more than once each date.
What I need is a way to count each ID only once per date they appear in the period.

 

I might be approaching this totally wrong (again, although I'm not new to BI, I'm very new to PBI) so happy to change the approach if easier/more logical. Here is a mock of the data and problem:

IDDateOutcome
00101/01/2024

New

00101/01/2024 New
00201/01/2024 New
00201/01/2024 New
00201/01/2024 New
00302/01/2024New
00104/01/2024Repeat
00104/01/2024Repeat

003

08/01/2024Repeat
00408/01/2024New
00408/01/2024New
00108/01/2024Follow-up
00108/01/2024Follow-up


What I'd expect to see is a count of 6. As is, I'm getting every row other than the "Follow-up" outcomes.

 

Thanks in advance!

1 ACCEPTED SOLUTION
audreygerred
Super User
Super User

Hello! I entered the sample data as below:

audreygerred_0-1707427595396.png

I created the following DAX: 

Count = COUNTROWS(
    SUMMARIZE(
        FILTER('Table',
            'Table'[Outcome] = "New" || 'Table'[Outcome] = "Repeat"
        ),
        [ID], [Date]
    )
)
 
And, I get the following count of 6 🙂
 audreygerred_0-1707430427487.png

 

audreygerred_1-1707427687472.png

 




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!





View solution in original post

9 REPLIES 9
vicky_
Super User
Super User

The quickest way to do this is to simply replace COUNT with DISTINCTCOUNT(), i.e

Indicator 10.2 = CALCULATE(
    DISTINCTCOUNT( 'table'[ID] ),
    'table'[Outcome] IN {"New","Repeat"}
)

But the issue with this way is that the totals (if you don't have the dates in the visual) will be incorrect, as it's counting the distinct new / repeat customers over all days

vicky__0-1707428421770.png

To get around this, i would recommend creating a new table just so that the totals are correct.

NEW Indicator 10.2 = 
var __newTable = ADDCOLUMNS(
    VALUES('Table'[Date]), 
    "distinct customer count",
        CALCULATE(
            DISTINCTCOUNT( 'table'[ID] ),
            'table'[Outcome] IN {"New","Repeat"}
        )
)
return SUMX(__newTable, [distinct customer count])

vicky__1-1707428567240.png

 

Hi, Vicky.

First, thanks for your time.

 

Yes, as you say DISTINCTCOUNT is no good in this instance as the date field will not be used in the visual.

My only concern with this is whether adding new tables impacts performance. I've been impressed with the speed of reports thus far - but with *lots* of KPIs to calculate, if I have to make a new table for each one not only might it become messy quickly but will it also impact report speed?

audreygerred
Super User
Super User

Hello! I entered the sample data as below:

audreygerred_0-1707427595396.png

I created the following DAX: 

Count = COUNTROWS(
    SUMMARIZE(
        FILTER('Table',
            'Table'[Outcome] = "New" || 'Table'[Outcome] = "Repeat"
        ),
        [ID], [Date]
    )
)
 
And, I get the following count of 6 🙂
 audreygerred_0-1707430427487.png

 

audreygerred_1-1707427687472.png

 




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!





Hi, Audrey.

 

Thanks for your time to respond!

I think this may work. My only "issue" is that it's throwing an error to say the date column used in SUMMARIZE doesn't appear in the table. Am I correct in thinking that I can simply switch out the date column from my date table to the original date column from my dataset and it won't affect time intelligence?

For this, it's not really doing any time intelligence (MTD, YTD, YoY, MoM, etc.), it's really just doing the following:

- FILTER is used to filter the table to include only rows where the [Outcome] is "New" or "Repeat".
- SUMMARIZE then creates a summary table with unique combinations of [ID] and [Date].
- COUNTROWS counts the number of unique combinations meeting the criteria of "New" or "Repeat" outcomes

 

In this case you would want the summarization to have the date field from your fact table because that is what summarize is doing, it's making another temporary table in the background with each unique combination of ID and Date if the Outcome is New or Repeat, then it is adding them together. 

audreygerred_1-1707430453265.png

 

If you were doing any time intelligence to compare to another time period, in that meausre, you would reference the Date from your date table.




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!





That's perfect; with such a good explanation, I should be able to adapt it as neccesary for each KPI measure myself.

 

Thanks so much for taking the time to help - it's been a big change getting to grips with PBI, but I'm very inspired by the possibilities and community that enables it! 

Best wishes.

You are very welcome! Power BI is a great tool. It takes, some time but there are so many great references. Good luck on your journey!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!





Very sorry to bother you, Audrey.

All was going well until I got to a specific indicator:

Indicator 13.1 = 
COUNTROWS(
    SUMMARIZE(
        FILTER('table',
           'table'[Outcome] <> "NEW" || 'table'[Outcome] <> "REVIEW" || 'table'[Age] < 19
        ),
        'table'[ID], 'table'[Date]
    )
)


The [Age] column is another measure, like so:

Age = ROUNDDOWN(DATEDIFF('table'[BirthDate], TODAY(), DAY) / 365.25, 0)

For some reason, the indicator is counting everybody, rather than just patients who do not have those outcomes and are under 19 years of age. If I remove the age part of the filter, there is no change in the end count. Am I missing something really simple?

Thanks in advance.

Hi! Can you add a column for age in Power Query? That should fix the issue. 




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!





Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.