March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hi,
I'm still quite new to dax so this might be an obvious question to some. I feel like i have googled around quite a bit but have not found a satisfying solution yet.
I have the following problem.
I have a table with pregnancies and another with children. They are related through pregnancy_id. The child table can contain multiple children for each pregnancy. I want to count the number of pregnancies (distinct pregnancy_id) that fulfil the following conditions:
1. at least one 'children criteria1' in the children table
2. absence of 'children criteria2' in the children table for all children for the given pregnancy_id,
3. existance of 'pregnancy criteria1' in the pregnancy table
4. absence of 'pregnancy criteria2' in the pregnancy table
Here is a short example:
The red cells are the ones disqualifying the pregnancy id from being included in the basis for the calculation.
So for example pregancy_id = 1 is disqualified because child_nr = 1 has child_criteria2 = TRUE.
pregnancy_id = 2 and 4 are the only one that should be included in the basis for the calculation, hence there should only be 2 distinct pregnancy_ids.
Many of my solutions that i've tried so far reach the 1 million row limitation. There are a bit more than 1 million pregnancies.
Also, are there any guidelines/ways to think on how to keep calculations in the storage engine?
Solved! Go to Solution.
I came up with this solution that works with more than 1 million rows and is reasonably fast:
Total_with_relatedtable =
COUNTROWS (
DISTINCT (
SELECTCOLUMNS (
FILTER (
pregnancies,
ISEMPTY (
FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria2] = TRUE () )
)
&& NOT (
ISEMPTY (
FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria1] = TRUE () )
)
)
&& pregnancies[pregnancy_criteria2] <> TRUE ()
&& pregnancies[pregnancy_criteria1] = TRUE ()
),
"pregnancy_id", pregnancies[pregnancy_id]
)
)
)
I came up with this solution that works with more than 1 million rows and is reasonably fast:
Total_with_relatedtable =
COUNTROWS (
DISTINCT (
SELECTCOLUMNS (
FILTER (
pregnancies,
ISEMPTY (
FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria2] = TRUE () )
)
&& NOT (
ISEMPTY (
FILTER ( RELATEDTABLE ( 'Children' ), 'Children'[child_criteria1] = TRUE () )
)
)
&& pregnancies[pregnancy_criteria2] <> TRUE ()
&& pregnancies[pregnancy_criteria1] = TRUE ()
),
"pregnancy_id", pregnancies[pregnancy_id]
)
)
)
Hi @TalkingBits ,
Could you share some of the sample data and your expected results?
Best regards,
Community Support Team_ Scott Chang
Now I have updated the question with a short example.
Hi @TalkingBits ,
I made simple samples and you can check the results below:
HasChildrenCriteria1 =
CALCULATE(
COUNTROWS('Children'),
FILTER('Children', 'Children'[child_criteria1] = TRUE)
)
HasPregnancyCriteria1 =
CALCULATE(
COUNTROWS('pregnancies'),
FILTER('pregnancies', 'pregnancies'[pregnancy_criteria1] = TRUE)
)
NoChildrenCriteria2 =
CALCULATE(
COUNTROWS('Children'),
FILTER('Children', 'Children'[child_criteria2] = TRUE)
) = 0
NoPregnancyCriteria2 =
CALCULATE(
COUNTROWS('pregnancies'),
FILTER('pregnancies', 'pregnancies'[pregnancy_criteria2] = FALSE)
)
Total =
CALCULATE(
DISTINCTCOUNT(pregnancies[pregnancy_id]),
FILTER(
pregnancies,
[HasChildrenCriteria1] > 0 &&
[NoChildrenCriteria2] &&
[HasPregnancyCriteria1] > 0 &&
[NoPregnancyCriteria2]
)
)
An attachment for your reference. Hope it helps!
Best regards,
Community Support Team_ Scott Chang
If this post helps then please consider Accept it as the solution to help the other members find it more quickly.
Thank you @v-tianyich-msftfor you solution and your example. It gives me the correct numbers and does what it should, however, it seems that it pulls too many rows into the formula engine, hence i get the 1 million rows limit error message.
This is so frustrating with dax that you have no idea what happens once you get the code working correctly. I don't consider 1 million rows being an especially large collection of data. What kind of dax should you write to keep the calculations in the storage engine? Or is it just trial and error?
Hi @TalkingBits ,
Try declaring the first four measures as var, which might reduce the memory used somewhat.
Best regards,
Community Support Team_ Scott Chang
Can you help me out on where to declare your measures as variables? I belive that they need to be measures, since they are evaluated in the row context, which is transitioned to the filter context and evaluated once for each row. If you declare them as variables, it will only be a scalar, no?
Regardless I don't think that the problem is memory, but rather too many rows being returned into the formula engine. I have solved the problem by precalculating the measures as true false variables in the pregnancy table in the database, which is also a lot faster, but am still curious if that is the best approach.
Hello @TalkingBits ,
Assuming that you ahve 2 tables Pregnancy and Children
First, ensure you apply the conditions related to children in one part of the DAX code. This can be done by checking for the presence or absence of criteria1 and criteria2.
Do the same for the pregnancies table using conditions on criteria1 and criteria2
Use the below DAX code which will help you in geeting your logic..
PregnancyCount =
CALCULATE (DISTINCTCOUNT(Pregnancies[pregnancy_id]),
FILTER (
Pregnancies,
Pregnancies[pregnancy_criteria1] = TRUE &&
Pregnancies[pregnancy_criteria2] = FALSE),
FILTER (
Children,
Children[children_criteria1] = TRUE &&
Children[children_criteria2] = FALSE))
In regards to optimizing your DAX follow the below basic practices
Depending on the size of your data and query, consider creating aggregated tables in your data model for faster computation. Pre-calculate some of the metrics in the source system (if possible) to avoid overloading Power BI.
If you find this helpful , please mark it as solution which will be helpful for others and Your Kudos/Likes 👍 are much appreciated!
Thank You
Dharmendar S
Thank you for your solution. I think I was a bit unclear regarding children_criteria2, I have updated the question. It needs to be absent for all the children, for a given pregnancy, i think in your solution a pregnancy where one child has children_criteria2 = FALSE and the other TRUE would still be included.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
User | Count |
---|---|
26 | |
17 | |
16 | |
12 | |
11 |
User | Count |
---|---|
38 | |
28 | |
27 | |
20 | |
18 |