Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now

Reply
arzukari
Frequent Visitor

Calculated Colomn taking too much memory

Hello,

 

I'm using a tabular model that feeds into report server power bi.

 

The issue is I have a table with over 18 million rows with patient ids and their test results. The client wants to export the underlying data through power bi but wants a summarized table of the patients with the last test result only.

 

After creating a calculated SUMMARIZE table of the patient IDs, I added colomns using: CALCULATE(FIRSTNONBLANK(RESULT,1), FILTER([Main Table],Main Table[Patient ID]=Calculated Table[Patient ID])

This worked in the beginning. But as the tables grew exponentially and the requirements became more complex, the above formula started to push processing time to the limits. Now I'm getting out of memory errors.

 

Is there a better way to write the above formula? I'm looking for a lookup formula that I can easily add more complex filters and logical expressions.

 

Would really appreciate any help.

2 ACCEPTED SOLUTIONS

HI @arzukari ,

 

Instead of creating Calculated Column, try using them in the same Summarize function.

 

Also, using Calculate function in Column creates rows transitions which is not recommended.

https://community.powerbi.com/t5/Community-Blog/CALCUHATE-Why-I-Don-t-Use-DAX-s-CALCULATE-Function/b...

 

Table2 = SUMMARIZE(Table1,Table1[PatientID],"LAst Result", LASTNONBLANK(Table1[Result],True()))

 

Regards,

Harsh Nathani

 

View solution in original post

Anonymous
Not applicable

 

// The easiest way to generate PatientID's is to exec:
[Calculated Table] = // calculated table
    distinct( 'Main Table'[PatientID] )

// I understand you want to get the first result
// for the patient which means the result on the
// very first date recorded in the table. To get
// such a result, all of them must be ordered. I assume
// that only 1 result is possible for any patient
// on any single date. If this is not true, you have
// to have something (e.g., a time column) that would
// be used to break ties. Please steer clear of
// CALCULATE in calculated columns as this will
// slow down calculations and will in fact bring them
// to a halt for big tables.
First Result = // calculated column in the 2nd table
var __patientId = 'Calculated Table'[PatientID]
var __result =
    MAXX(
        // Again, this will work OK only if
        // there are no ties for the same
        // patient id on the same day.
        TOPN(1,
            filter(
                'Main Table',
                'Main Table'[PatientID] = __patientId
                // I also assume that there are no
                // blanks in the Result column. If
                // there are you have to filter them
                // out like:
                // 'Main Table'[Result] <> BLANK()
            ),
            'Main Table'[Result Date],
            DESC
            // If you have a column to break ties
            // you'll put it in here as well specifing
            // it's order. If it were a time column,
            // you'd write
            // 'Main Table'[Result Time],
            // DESC
        ),
        'Main Table'[Result]
    )
return
    __result

 

View solution in original post

9 REPLIES 9
amitchandak
Super User
Super User

@arzukari , Not very clear with given formula.  Can you share sample data and sample output in table format?

Join us as experts from around the world come together to shape the future of data and AI!
At the Microsoft Analytics Community Conference, global leaders and influential voices are stepping up to share their knowledge and help you master the latest in Microsoft Fabric, Copilot, and Purview.
️ November 12th-14th, 2024
 Online Event
Register Here

Hi @amitchandak,

 

So the main table (simplified) would be like this:

PatientIDResult DateResult
18/6/2020Positive
28/7/2020Negative
38/8/2020Positive
18/9/2020Negative
28/10/2020Positive
28/6/2020Positive
28/7/2020Positive
38/8/2020Negative
58/9/2020Negative
58/10/2020Negative
58/6/2020Negative
48/7/2020Positive
48/18/2020Positive
48/19/2020Negative
38/20/2020Negative
28/18/2020Negative
58/19/2020Negative
28/20/2020Negative
18/24/2020Positive
28/25/2020

Positive

but x18 million and exponentially increasing everyday. The above table is called Table1.

 

Then I create a calculated table (Table2) with the formula: SUMMARIZE (Table1, PatientID) and got the below:

PatientID
1
2
3
4
5

 

I want to add column in Table2 that gets the first result from table1. So I added a calculated column:

CALCULATE (FIRSTNONBLANK(Table1[Result],1),FILTER(Table1,Table1[PatientID]=Table2[PatientID]))

 

This is just one requirement out of at least 50 other columns across many similar tables that I keep tweaking. The size of data is making processing the above dax formula take a lot of time and memory.

 

Unfortunately, the client loves exporting the underlying data of the power BI visuals, but needs each table to be summarized in a specific way. 

 

Anonymous
Not applicable

 

// The easiest way to generate PatientID's is to exec:
[Calculated Table] = // calculated table
    distinct( 'Main Table'[PatientID] )

// I understand you want to get the first result
// for the patient which means the result on the
// very first date recorded in the table. To get
// such a result, all of them must be ordered. I assume
// that only 1 result is possible for any patient
// on any single date. If this is not true, you have
// to have something (e.g., a time column) that would
// be used to break ties. Please steer clear of
// CALCULATE in calculated columns as this will
// slow down calculations and will in fact bring them
// to a halt for big tables.
First Result = // calculated column in the 2nd table
var __patientId = 'Calculated Table'[PatientID]
var __result =
    MAXX(
        // Again, this will work OK only if
        // there are no ties for the same
        // patient id on the same day.
        TOPN(1,
            filter(
                'Main Table',
                'Main Table'[PatientID] = __patientId
                // I also assume that there are no
                // blanks in the Result column. If
                // there are you have to filter them
                // out like:
                // 'Main Table'[Result] <> BLANK()
            ),
            'Main Table'[Result Date],
            DESC
            // If you have a column to break ties
            // you'll put it in here as well specifing
            // it's order. If it were a time column,
            // you'd write
            // 'Main Table'[Result Time],
            // DESC
        ),
        'Main Table'[Result]
    )
return
    __result

 

Thank you this has been very helpful.

I applied it, processing is a bit better but still slow. Is there a way to pinpoint where the inefficiencies are?

Anonymous
Not applicable

This is taking a lot of memory because you're doing in DAX something that should be done in Power Query. That's the first thing. Secondly, you're filtering in your DAX formulas by entire tables, which is always not only slow but bad practice, not to say horrible. You should only filter by columns (99.9999% of the time). I strongly suggest you do your calculations of the tables in Power Query but also in the right way because if you do it recklessly, the code will be slow as well. There are many articles on the Net about how to optimize Power Query code. Please get to know them. By the way, I'll show you in a sec how to write the code for Table2 so that it (should be) is much quicker (without using context transition).

HI @arzukari ,

 

Instead of creating Calculated Column, try using them in the same Summarize function.

 

Also, using Calculate function in Column creates rows transitions which is not recommended.

https://community.powerbi.com/t5/Community-Blog/CALCUHATE-Why-I-Don-t-Use-DAX-s-CALCULATE-Function/b...

 

Table2 = SUMMARIZE(Table1,Table1[PatientID],"LAst Result", LASTNONBLANK(Table1[Result],True()))

 

Regards,

Harsh Nathani

 

This was very helpful. I applied it to the straight forward lookups and it helped a lot.

 

I'm still trying to figure out the best way to handle the more complex lookups.

@harshnathani Thank you for the response.

 

Adding the expression into the SUMMARIZE table works great for some of the requirements. But many of the other requirements need more logical expressions that it becomes too complicated. Is there anything I can do directly into calculated columns?

Hi @arzukari ,

 

Try using measures for those.

 

https://www.youtube.com/watch?v=SVEGfqCTodc

 

https://www.sqlbi.com/articles/calculated-columns-and-measures-in-dax/#:~:text=A%20measure%20is%20ev...

 

You can also explore the option to do some of the calculations in Power Query.

 

Regards,
Harsh Nathani
Did I answer your question? Mark my post as a solution! Appreciate with a Kudos!! (Click the Thumbs Up Button)

Helpful resources

Announcements
OCT PBI Update Carousel

Power BI Monthly Update - October 2024

Check out the October 2024 Power BI update to learn about new features.

September Hackathon Carousel

Microsoft Fabric & AI Learning Hackathon

Learn from experts, get hands-on experience, and win awesome prizes.

October NL Carousel

Fabric Community Update - October 2024

Find out what's new and trending in the Fabric Community.

Top Kudoed Authors