Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
TalkingBits
Frequent Visitor

Best practice on how to aggregate based on existence/non-existence of criteria in related table

Hi, 

 

I'm still quite new to dax so this might be an obvious question to some. I feel like i have googled around quite a bit but have not found a satisfying solution yet. 

 

I have the following problem. 

I have a table with pregnancies and another with children. They are related through pregnancy_id. The child table can contain multiple children for each pregnancy. I want to count the number of pregnancies (distinct pregnancy_id) that fulfil the following conditions:

1. at least one 'children criteria1' in the children table

2. absence of 'children criteria2' in the children table for all children for the given pregnancy_id, 

3. existance of 'pregnancy criteria1' in the pregnancy table

4. absence of 'pregnancy criteria2' in the pregnancy table

 

Here is a short example:

TalkingBits_0-1728370888318.png

The red cells are the ones disqualifying the pregnancy id from being included in the basis for the calculation. 

So for example pregancy_id = 1 is disqualified because child_nr = 1 has child_criteria2 = TRUE. 

pregnancy_id = 2 and 4 are the only one that should be included in the basis for the calculation, hence there should only be 2 distinct pregnancy_ids. 

 

Many of my solutions that i've tried so far reach the 1 million row limitation. There are a bit more than 1 million pregnancies. 

Also, are there any guidelines/ways to think on how to keep calculations in the storage engine? 

 

1 ACCEPTED SOLUTION
TalkingBits
Frequent Visitor

I came up with this solution that works with more than 1 million rows and is reasonably fast:

Total_with_relatedtable =
COUNTROWS (
    DISTINCT (
        SELECTCOLUMNS (
            FILTER (
                pregnancies,
                ISEMPTY (
                    FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria2] = TRUE () )
                )
                    && NOT (
                        ISEMPTY (
                            FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria1] = TRUE () )
                        )
                    )
                        && pregnancies[pregnancy_criteria2] <> TRUE ()
                        && pregnancies[pregnancy_criteria1] = TRUE ()
            ),
            "pregnancy_id", pregnancies[pregnancy_id]
        )
    )
)

 

View solution in original post

9 REPLIES 9
TalkingBits
Frequent Visitor

I came up with this solution that works with more than 1 million rows and is reasonably fast:

Total_with_relatedtable =
COUNTROWS (
    DISTINCT (
        SELECTCOLUMNS (
            FILTER (
                pregnancies,
                ISEMPTY (
                    FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria2] = TRUE () )
                )
                    && NOT (
                        ISEMPTY (
                            FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria1] = TRUE () )
                        )
                    )
                        && pregnancies[pregnancy_criteria2] <> TRUE ()
                        && pregnancies[pregnancy_criteria1] = TRUE ()
            ),
            "pregnancy_id", pregnancies[pregnancy_id]
        )
    )
)

 

v-tianyich-msft
Community Support
Community Support

Hi @TalkingBits ,

 

Could you share some of the sample data and your expected results?

 

Best regards,
Community Support Team_ Scott Chang

Now I have updated the question with a short example. 

Hi @TalkingBits ,

 

I made simple samples and you can check the results below:

vtianyichmsft_0-1728466017685.png

HasChildrenCriteria1 = 
CALCULATE(
    COUNTROWS('Children'),
    FILTER('Children', 'Children'[child_criteria1] = TRUE)
)

HasPregnancyCriteria1 = 
CALCULATE(
    COUNTROWS('pregnancies'),
    FILTER('pregnancies', 'pregnancies'[pregnancy_criteria1] = TRUE)
)

NoChildrenCriteria2 = 
CALCULATE(
    COUNTROWS('Children'),
    FILTER('Children', 'Children'[child_criteria2] = TRUE)
) = 0

NoPregnancyCriteria2 = 
CALCULATE(
    COUNTROWS('pregnancies'),
    FILTER('pregnancies', 'pregnancies'[pregnancy_criteria2] = FALSE)
)


Total = 
CALCULATE(
    DISTINCTCOUNT(pregnancies[pregnancy_id]),
    FILTER(
        pregnancies,
        [HasChildrenCriteria1] > 0 &&
        [NoChildrenCriteria2] &&
        [HasPregnancyCriteria1] > 0 &&
        [NoPregnancyCriteria2]
    )
)

 

An attachment for your reference. Hope it helps!

 

Best regards,
Community Support Team_ Scott Chang

 

If this post helps then please consider Accept it as the solution to help the other members find it more quickly.

Thank you @v-tianyich-msftfor you solution and your example. It gives me the correct numbers and does what it should, however, it seems that it pulls too many rows into the formula engine, hence i get the 1 million rows limit error message. 

This is so frustrating with dax that you have no idea what happens once you get the code working correctly. I don't consider 1 million rows being an especially large collection of data. What kind of dax should you write to keep the calculations in the storage engine? Or is it just trial and error? 

Hi @TalkingBits ,

 

Try declaring the first four measures as var, which might reduce the memory used somewhat.

 

Best regards,
Community Support Team_ Scott Chang

 

Can you help me out on where to declare your measures as variables? I belive that they need to be measures, since they are evaluated in the row context, which is transitioned to the filter context and evaluated once for each row. If you declare them as variables, it will only be a scalar, no? 
Regardless I don't think that the problem is memory, but rather too many rows being returned into the formula engine. I have solved the problem by precalculating the measures as true false variables in the pregnancy table in the database, which is also a lot faster, but am still curious if that is the best approach. 

dharmendars007
Memorable Member
Memorable Member

Hello @TalkingBits , 

 

Assuming that you ahve 2 tables Pregnancy and Children 

 

First, ensure you apply the conditions related to children in one part of the DAX code. This can be done by checking for the presence or absence of criteria1 and criteria2.

 

Do the same for the pregnancies table using conditions on criteria1 and criteria2

 

Use the below DAX code which will help you in geeting your logic..

 

PregnancyCount =
CALCULATE (DISTINCTCOUNT(Pregnancies[pregnancy_id]),
FILTER (
Pregnancies,
Pregnancies[pregnancy_criteria1] = TRUE &&
Pregnancies[pregnancy_criteria2] = FALSE),
FILTER (
Children,
Children[children_criteria1] = TRUE &&
Children[children_criteria2] = FALSE))

 

In regards to optimizing your DAX follow the below basic practices

 

Depending on the size of your data and query, consider creating aggregated tables in your data model for faster computation. Pre-calculate some of the metrics in the source system (if possible) to avoid overloading Power BI.

 

If you find this helpful , please mark it as solution which will be helpful for others and Your Kudos/Likes 👍 are much appreciated!

 

Thank You

Dharmendar S

LinkedIN 

Thank you for your solution. I think I was a bit unclear regarding children_criteria2, I have updated the question. It needs to be absent for all the children, for a given pregnancy, i think in your solution a pregnancy where one child has children_criteria2 = FALSE and the other TRUE would still be included. 

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Dec Fabric Community Survey

We want your feedback!

Your insights matter. That’s why we created a quick survey to learn about your experience finding answers to technical questions.

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.