The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hello,
I'm using a tabular model that feeds into report server power bi.
The issue is I have a table with over 18 million rows with patient ids and their test results. The client wants to export the underlying data through power bi but wants a summarized table of the patients with the last test result only.
After creating a calculated SUMMARIZE table of the patient IDs, I added colomns using: CALCULATE(FIRSTNONBLANK(RESULT,1), FILTER([Main Table],Main Table[Patient ID]=Calculated Table[Patient ID])
This worked in the beginning. But as the tables grew exponentially and the requirements became more complex, the above formula started to push processing time to the limits. Now I'm getting out of memory errors.
Is there a better way to write the above formula? I'm looking for a lookup formula that I can easily add more complex filters and logical expressions.
Would really appreciate any help.
Solved! Go to Solution.
HI @arzukari ,
Instead of creating Calculated Column, try using them in the same Summarize function.
Also, using Calculate function in Column creates rows transitions which is not recommended.
Table2 = SUMMARIZE(Table1,Table1[PatientID],"LAst Result", LASTNONBLANK(Table1[Result],True()))
Regards,
Harsh Nathani
// The easiest way to generate PatientID's is to exec:
[Calculated Table] = // calculated table
distinct( 'Main Table'[PatientID] )
// I understand you want to get the first result
// for the patient which means the result on the
// very first date recorded in the table. To get
// such a result, all of them must be ordered. I assume
// that only 1 result is possible for any patient
// on any single date. If this is not true, you have
// to have something (e.g., a time column) that would
// be used to break ties. Please steer clear of
// CALCULATE in calculated columns as this will
// slow down calculations and will in fact bring them
// to a halt for big tables.
First Result = // calculated column in the 2nd table
var __patientId = 'Calculated Table'[PatientID]
var __result =
MAXX(
// Again, this will work OK only if
// there are no ties for the same
// patient id on the same day.
TOPN(1,
filter(
'Main Table',
'Main Table'[PatientID] = __patientId
// I also assume that there are no
// blanks in the Result column. If
// there are you have to filter them
// out like:
// 'Main Table'[Result] <> BLANK()
),
'Main Table'[Result Date],
DESC
// If you have a column to break ties
// you'll put it in here as well specifing
// it's order. If it were a time column,
// you'd write
// 'Main Table'[Result Time],
// DESC
),
'Main Table'[Result]
)
return
__result
@arzukari , Not very clear with given formula. Can you share sample data and sample output in table format?
Hi @amitchandak,
So the main table (simplified) would be like this:
PatientID | Result Date | Result |
1 | 8/6/2020 | Positive |
2 | 8/7/2020 | Negative |
3 | 8/8/2020 | Positive |
1 | 8/9/2020 | Negative |
2 | 8/10/2020 | Positive |
2 | 8/6/2020 | Positive |
2 | 8/7/2020 | Positive |
3 | 8/8/2020 | Negative |
5 | 8/9/2020 | Negative |
5 | 8/10/2020 | Negative |
5 | 8/6/2020 | Negative |
4 | 8/7/2020 | Positive |
4 | 8/18/2020 | Positive |
4 | 8/19/2020 | Negative |
3 | 8/20/2020 | Negative |
2 | 8/18/2020 | Negative |
5 | 8/19/2020 | Negative |
2 | 8/20/2020 | Negative |
1 | 8/24/2020 | Positive |
2 | 8/25/2020 | Positive |
but x18 million and exponentially increasing everyday. The above table is called Table1.
Then I create a calculated table (Table2) with the formula: SUMMARIZE (Table1, PatientID) and got the below:
PatientID |
1 |
2 |
3 |
4 |
5 |
I want to add column in Table2 that gets the first result from table1. So I added a calculated column:
CALCULATE (FIRSTNONBLANK(Table1[Result],1),FILTER(Table1,Table1[PatientID]=Table2[PatientID]))
This is just one requirement out of at least 50 other columns across many similar tables that I keep tweaking. The size of data is making processing the above dax formula take a lot of time and memory.
Unfortunately, the client loves exporting the underlying data of the power BI visuals, but needs each table to be summarized in a specific way.
// The easiest way to generate PatientID's is to exec:
[Calculated Table] = // calculated table
distinct( 'Main Table'[PatientID] )
// I understand you want to get the first result
// for the patient which means the result on the
// very first date recorded in the table. To get
// such a result, all of them must be ordered. I assume
// that only 1 result is possible for any patient
// on any single date. If this is not true, you have
// to have something (e.g., a time column) that would
// be used to break ties. Please steer clear of
// CALCULATE in calculated columns as this will
// slow down calculations and will in fact bring them
// to a halt for big tables.
First Result = // calculated column in the 2nd table
var __patientId = 'Calculated Table'[PatientID]
var __result =
MAXX(
// Again, this will work OK only if
// there are no ties for the same
// patient id on the same day.
TOPN(1,
filter(
'Main Table',
'Main Table'[PatientID] = __patientId
// I also assume that there are no
// blanks in the Result column. If
// there are you have to filter them
// out like:
// 'Main Table'[Result] <> BLANK()
),
'Main Table'[Result Date],
DESC
// If you have a column to break ties
// you'll put it in here as well specifing
// it's order. If it were a time column,
// you'd write
// 'Main Table'[Result Time],
// DESC
),
'Main Table'[Result]
)
return
__result
Thank you this has been very helpful.
I applied it, processing is a bit better but still slow. Is there a way to pinpoint where the inefficiencies are?
HI @arzukari ,
Instead of creating Calculated Column, try using them in the same Summarize function.
Also, using Calculate function in Column creates rows transitions which is not recommended.
Table2 = SUMMARIZE(Table1,Table1[PatientID],"LAst Result", LASTNONBLANK(Table1[Result],True()))
Regards,
Harsh Nathani
This was very helpful. I applied it to the straight forward lookups and it helped a lot.
I'm still trying to figure out the best way to handle the more complex lookups.
@harshnathani Thank you for the response.
Adding the expression into the SUMMARIZE table works great for some of the requirements. But many of the other requirements need more logical expressions that it becomes too complicated. Is there anything I can do directly into calculated columns?
Hi @arzukari ,
Try using measures for those.
https://www.youtube.com/watch?v=SVEGfqCTodc
You can also explore the option to do some of the calculations in Power Query.
Regards,
Harsh Nathani
Did I answer your question? Mark my post as a solution! Appreciate with a Kudos!! (Click the Thumbs Up Button)
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
21 | |
18 | |
16 | |
15 | |
13 |
User | Count |
---|---|
39 | |
32 | |
22 | |
19 | |
18 |