True, I am a newbie to DAX. I have a dimension table 'NodeInfo', which has fields (categories) such as Application (e.g. Outlook, Word, Chrome, etc.), Computer Name, CPU count, etc.,
and a fact table called 'Telemetry' (which has timestamps, usage time, and other facts).
I am trying to RANKX the applications by UsageTime by creating a DAX calculated column that adds a rank value to each row of the NodeInfo table. Here is the simple solution I tried:
Application Rank = RANKX(
    'NodeInfo'[Application],
    CALCULATE(SUM('Telemetry'[UsageTime]) / (60 * 60 * 1000), 'Telemetry'[Used]),
    ,
    DESC,
    Dense
)
Well, after about 15 minutes on my laptop, I got an error about running out of memory.
True, the NodeInfo table has 6M rows, there are 150M rows in the Telemetry table, and there are about 2K application categories.
So I am not surprised that I am getting a combinatorial explosion. (If I am correct, the loop is 2 or 3 levels deep, probably all in the Formula Engine.) I have tried FILTER and SUMMARIZE over the table, but that did not help.
I am looking for a smarter solution. If I could control this, I would do an inner join to get the usage into NodeInfo, then aggregate by application category (SUM), and then rank on the new sum column.
What do you suggest?
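(For reference, the pre-aggregate-then-rank shape I have in mind, written as a calculated table, would be something like the untested sketch below; the names App Usage Rank, Usage Hours, and App Rank are just placeholders.)
App Usage Rank =
VAR AppUsage =
    ADDCOLUMNS(
        ALL('NodeInfo'[Application]),                                              -- one row per application (~2K rows)
        "Usage Hours", CALCULATE(SUM('Telemetry'[UsageTime])) / (60 * 60 * 1000)   -- per-app total, ms -> hours
    )
RETURN
    ADDCOLUMNS(
        AppUsage,
        "App Rank", RANKX(AppUsage, [Usage Hours], [Usage Hours], DESC, Dense)     -- rank over the small summary table
    )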
Hi @DoctorYSG ,
This may not be a problem with the DAX itself; that DAX is already written in close to its best-performing form, and further optimization is unlikely to make a qualitative difference. The out-of-memory error is most likely simply because the amount of data you have is too large.
Can you tell me what your data source is? Maybe we can try to find ways to optimize from the data source.
Best Regards,
Dino Tao
If this post helps, then please consider Accepting it as the solution to help other members find it more quickly.
Thank you for the offer of help. I am not sure there is much that can be done on the DAX side (we wanted to stress test it to see how far we can push it).
As you might guess, we are a big IT organization (probably one of MSFT's very large enterprise customers), and we are looking at finding usage times for 2K+ apps. But that is in the dimension table. I cannot share that with you, but I can tell you the Telemetry fact table is large (162M rows).
Quick question: of the two DAX approaches I listed, which is better, the one with ALLSELECTED() or the one with KEEPFILTERS()?
The Telemetry (fact) table has only 6 columns.
1. TimeKey (this is a TimeStamp), and yes, I know that VertiPaq has no RLE, hash, or value compression on Doubles. But we need it to be a TimeStamp for incremental refresh. I put in a Fabric request to see if we can get an integer variant of TimeStamp, since this only needs to be accurate to a minute. The float strikes me as an odd way to represent time, but that is for your developers.
2. NodeID (this is the foreign key back to the NodeInfo table). It is a GUID (SHA256 hash), so I can't help VertiPaq with that.
3. There is one text column with cardinality 8, so that should be handled very well by VertiPaq (it is a foreign key to another table).
4. There are three Whole Number fields that hold the facts. They should be pretty low cardinality, with lots of room for RLE compression.
We cannot use Power Query (M) to compute Rank(). Maybe in theory, but the backend database is Palantir Foundry, which keeps this in a data lake and puts hard limits on native SQL sent through the ODBC connector: it rejects any folded query that ends up reading more than 1M rows. That is a hard performance limit, so Power Query is out (we tried this a lot).
I know you encourage ETL to be done on the source machine. We want to see how far we can take this on the client. I was able to put together an ETL pipeline on Palantir Foundry in 10 minutes (they have true industrial-strength ETL, not Power Query). It does:
1. Inner join to get the fact and dimension together
2. Partitioned (by app) sum of the usage values
3. Rank the usage
Takes about 5 minutes on a medium Spark cluster.
(Yes, it really sings at this problem, but we want end users to be able to do this sort of thing in Power BI; Palantir Foundry is for data engineers.)
Hi @DoctorYSG ,
About this question:
It's hard for me to give you a precise answer, because DAX performance also depends on the computer you are using and the complexity of your data model. I would recommend downloading DAX Studio, or using Performance Analyzer and DAX Query View in Power BI Desktop, to see your DAX runtimes.
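For example, a query along the lines of the sketch below (adjust the names to your model; this is only an illustration) could be pasted into DAX Studio or DAX Query View and timed:
EVALUATE
SUMMARIZECOLUMNS(
    'NodeInfo'[Application],
    "Total Usage Hours", DIVIDE(SUM('Telemetry'[UsageTime]), 60 * 60 * 1000)   -- per-app usage, ms -> hours
)
ORDER BY [Total Usage Hours] DESC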
Best Regards,
Dino Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
@Anonymous Dino, I am definitely enjoying this on an intellectual level, and learning a lot. I hope you don't mind; here is where I am still stuck.
1. I created a column in the 'NodeInfo' table (dimension) that joins in the timing from Telemetry. In short, this lets me get all the hours of usage by user or app in the dimension (and not recompute it all the time). Model in screenshot 1. (A sketch of what that column looks like is below, after item 4.)
2. I can use that to create the matrix below (screenshot 2) that shows, by user, what their total usage is. However, if I do this for all users, it hits the performance error. (I can show it only because I set the matrix visual to show the top 100 users.)
3. I also tried to create a RANKX metric, but as you can see it always returns "1" (screenshot 2).
Rank Jobs = RANKX('NodeInfo', SUM('NodeInfo'[Usage Hours]),,DESC,Dense)
4. I tried this in the Dec-2023 Performance Analyzer (screenshot 3), and I think it is doing this for all users (not just the top 100); it returns immediately, but the result confirms that the rank is always 1.
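(For illustration only, not my exact formula: the column in item 1 is roughly the following, relying on the NodeInfo-to-Telemetry relationship and context transition, and assuming UsageTime is in milliseconds.)
Usage Hours =
CALCULATE(SUM('Telemetry'[UsageTime])) / (60 * 60 * 1000)   -- this row's telemetry total, converted ms -> hours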
Hi @DoctorYSG ,
The result is always 1, probably because of the filtering DAX applies to 'NodeInfo'.
Try to change the DAX into:
Rank Jobs = RANKX(ALL('NodeInfo'), SUM('NodeInfo'[Usage Hours]),,DESC,Dense)
Best Regards,
Dino Tao
@Anonymous What you said sounded right to me, and this is helping me understand things better. But I am now using the formula below, and it still ranks everything at "1". It does take more compute, so I can no longer look at 100 users, but for 10 it responds in 2 seconds.
Rank Usage = RANKX(ALL('NodeInfo'), SUM('NodeInfo'[Usage Hours]),,DESC,Dense)
@DoctorYSG , to me it seems like you need a measure here. A SUM in a column means the sum of the complete table's data, unless filters are used.
Try creating a measure like the one below:
Application Rank = RANKX(
    ALLSELECTED('NodeInfo'[Application]),
    CALCULATE(SUM('Telemetry'[UsageTime]) / (60 * 60 * 1000)),
    ,
    DESC,
    Dense
)
I am not sure of the role of the other columns used in the CALCULATE.
Power BI - New DAX Function: RANK - How It Differs from RANKX: https://youtu.be/TjGkF44VtDo
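(To spell out why the column version always returns 1: SUM('NodeInfo'[Usage Hours]) in a calculated column returns the grand total for every row, because there is no filter and no context transition, so every row ties at Dense rank 1. If a calculated-column version were wanted at all, ranking by the bare column value avoids that; the sketch below is only to illustrate the mechanism, not a recommendation for a 6M-row table.)
Rank Jobs =
RANKX(
    ALL('NodeInfo'),
    'NodeInfo'[Usage Hours],   -- the iterated row's own value, not a grand-total SUM
    ,
    DESC,
    Dense
)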
The other columns are not used in the CALCULATE, and now that it is a measure, they are not relevant. But the measure is also giving the same resource-limit error:
I am placing it in a matrix visual, where the row is the Application and the value is the App Rank, so I need the filter to include the application name. I don't need this scaled to hours, so I took that out.
App Rank = RANKX(
    ALLEXCEPT('NodeInfo', 'NodeInfo'[Application]),
    CALCULATE(SUM('Telemetry'[UsageTime])),
    ,
    DESC,
    Dense
)
I also tried the following, but got the same result, even if I filter down to one day (instead of 6 months).
App Rank = RANKX(
    'NodeInfo',
    CALCULATE(SUM('Telemetry'[UsageTime]), KEEPFILTERS('NodeInfo'[Application])),
    ,
    DESC,
    Dense
)