Solved: Re: Best practice on how to aggregate based on exi...

TalkingBits · ‎10-07-2024

Hi,

I'm still quite new to dax so this might be an obvious question to some. I feel like i have googled around quite a bit but have not found a satisfying solution yet.

I have the following problem.

I have a table with pregnancies and another with children. They are related through pregnancy_id. The child table can contain multiple children for each pregnancy. I want to count the number of pregnancies (distinct pregnancy_id) that fulfil the following conditions:

1. at least one 'children criteria1' in the children table

2. absence of 'children criteria2' in the children table for all children for the given pregnancy_id,

3. existance of 'pregnancy criteria1' in the pregnancy table

4. absence of 'pregnancy criteria2' in the pregnancy table

Here is a short example:

The red cells are the ones disqualifying the pregnancy id from being included in the basis for the calculation.

So for example pregancy_id = 1 is disqualified because child_nr = 1 has child_criteria2 = TRUE.

pregnancy_id = 2 and 4 are the only one that should be included in the basis for the calculation, hence there should only be 2 distinct pregnancy_ids.

Many of my solutions that i've tried so far reach the 1 million row limitation. There are a bit more than 1 million pregnancies.

Also, are there any guidelines/ways to think on how to keep calculations in the storage engine?

TalkingBits · ‎10-20-2024

I came up with this solution that works with more than 1 million rows and is reasonably fast:

Total_with_relatedtable =
COUNTROWS (
    DISTINCT (
        SELECTCOLUMNS (
            FILTER (
                pregnancies,
                ISEMPTY (
                    FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria2] = TRUE () )
                )
                    && NOT (
                        ISEMPTY (
                            FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria1] = TRUE () )
                        )
                    )
                        && pregnancies[pregnancy_criteria2] <> TRUE ()
                        && pregnancies[pregnancy_criteria1] = TRUE ()
            ),
            "pregnancy_id", pregnancies[pregnancy_id]
        )
    )
)

View solution in original post

TalkingBits · ‎10-20-2024

I came up with this solution that works with more than 1 million rows and is reasonably fast:

Total_with_relatedtable =
COUNTROWS (
    DISTINCT (
        SELECTCOLUMNS (
            FILTER (
                pregnancies,
                ISEMPTY (
                    FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria2] = TRUE () )
                )
                    && NOT (
                        ISEMPTY (
                            FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria1] = TRUE () )
                        )
                    )
                        && pregnancies[pregnancy_criteria2] <> TRUE ()
                        && pregnancies[pregnancy_criteria1] = TRUE ()
            ),
            "pregnancy_id", pregnancies[pregnancy_id]
        )
    )
)

v-tianyich-msft · ‎10-07-2024

Hi @TalkingBits ,

Could you share some of the sample data and your expected results?

Best regards,
Community Support Team_ Scott Chang

TalkingBits · ‎10-08-2024

Now I have updated the question with a short example.

v-tianyich-msft · ‎10-09-2024

Hi @TalkingBits ,

I made simple samples and you can check the results below:

HasChildrenCriteria1 = 
CALCULATE(
    COUNTROWS('Children'),
    FILTER('Children', 'Children'[child_criteria1] = TRUE)
)

HasPregnancyCriteria1 = 
CALCULATE(
    COUNTROWS('pregnancies'),
    FILTER('pregnancies', 'pregnancies'[pregnancy_criteria1] = TRUE)
)

NoChildrenCriteria2 = 
CALCULATE(
    COUNTROWS('Children'),
    FILTER('Children', 'Children'[child_criteria2] = TRUE)
) = 0

NoPregnancyCriteria2 = 
CALCULATE(
    COUNTROWS('pregnancies'),
    FILTER('pregnancies', 'pregnancies'[pregnancy_criteria2] = FALSE)
)


Total = 
CALCULATE(
    DISTINCTCOUNT(pregnancies[pregnancy_id]),
    FILTER(
        pregnancies,
        [HasChildrenCriteria1] > 0 &&
        [NoChildrenCriteria2] &&
        [HasPregnancyCriteria1] > 0 &&
        [NoPregnancyCriteria2]
    )
)

An attachment for your reference. Hope it helps!

Best regards,
Community Support Team_ Scott Chang

If this post helps then please consider Accept it as the solution to help the other members find it more quickly.

TalkingBits · ‎10-11-2024

Thank you @v-tianyich-msftfor you solution and your example. It gives me the correct numbers and does what it should, however, it seems that it pulls too many rows into the formula engine, hence i get the 1 million rows limit error message.

This is so frustrating with dax that you have no idea what happens once you get the code working correctly. I don't consider 1 million rows being an especially large collection of data. What kind of dax should you write to keep the calculations in the storage engine? Or is it just trial and error?

v-tianyich-msft · ‎10-13-2024

Hi @TalkingBits ,

Try declaring the first four measures as var, which might reduce the memory used somewhat.

Best regards,
Community Support Team_ Scott Chang

TalkingBits · ‎10-18-2024

Can you help me out on where to declare your measures as variables? I belive that they need to be measures, since they are evaluated in the row context, which is transitioned to the filter context and evaluated once for each row. If you declare them as variables, it will only be a scalar, no?
Regardless I don't think that the problem is memory, but rather too many rows being returned into the formula engine. I have solved the problem by precalculating the measures as true false variables in the pregnancy table in the database, which is also a lot faster, but am still curious if that is the best approach.

dharmendars007 · ‎10-07-2024

Hello @TalkingBits ,

Assuming that you ahve 2 tables Pregnancy and Children

First, ensure you apply the conditions related to children in one part of the DAX code. This can be done by checking for the presence or absence of criteria1 and criteria2.

Do the same for the pregnancies table using conditions on criteria1 and criteria2

Use the below DAX code which will help you in geeting your logic..

PregnancyCount =
CALCULATE (DISTINCTCOUNT(Pregnancies[pregnancy_id]),
FILTER (
Pregnancies,
Pregnancies[pregnancy_criteria1] = TRUE &&
Pregnancies[pregnancy_criteria2] = FALSE),
FILTER (
Children,
Children[children_criteria1] = TRUE &&
Children[children_criteria2] = FALSE))

In regards to optimizing your DAX follow the below basic practices

Depending on the size of your data and query, consider creating aggregated tables in your data model for faster computation. Pre-calculate some of the metrics in the source system (if possible) to avoid overloading Power BI.

If you find this helpful , please mark it as solution which will be helpful for others and Your Kudos/Likes 👍 are much appreciated!

Thank You

Dharmendar S

TalkingBits · ‎10-07-2024

Thank you for your solution. I think I was a bit unclear regarding children_criteria2, I have updated the question. It needs to be absent for all the children, for a given pregnancy, i think in your solution a pregnancy where one child has children_criteria2 = FALSE and the other TRUE would still be included.

Best practice on how to aggregate based on existence/non-existence of criteria in related table

Helpful resources

Join us at the Microsoft Fabric Community Conference

Power BI Monthly Update - February 2025

Fabric Community Update - February 2025

Join us at the 2025 Microsoft Fabric Community Conference

Best practice on how to aggregate based on existence/non-existence of criteria in related table

Helpful resources

Join us at the Microsoft Fabric Community Conference

Power BI Monthly Update - February 2025

Fabric Community Update - February 2025