Anonymous
Not applicable

Optimizing DAX calculation for large dataset

Hi,

 

I have a dataset with 4-5M rows. I want to check both the next 5 and the previous 5 rows for a Location to see if, in either case, they all meet a condition ([PlantActive]=0.5). Each row is a reading a minute apart (hence the 5/24/60).

I tested this on a smaller dataset and it works fine, but on my main set it runs for 15+ minutes with no sign of finishing. I've tried a few different approaches, with no success getting it to run in a timely manner.

Here is what I believe was my best shot at it:

 

VAR Previous5 =
    COUNTROWS (
        FILTER (
            Table1,
            Table1[Location] = "Location1"
                && Table1[Timestamp] <= EARLIER ( Table1[Timestamp] )
                && Table1[Timestamp] > EARLIER ( Table1[Timestamp] ) - ( 5 / 24 ) / 60
                && Table1[PlantActive] = 0.5
        )
    )

VAR Next5 =
    COUNTROWS (
        FILTER (
            Table1,
            Table1[Location] = "Location1"
                && Table1[Timestamp] >= EARLIER ( Table1[Timestamp] )
                && Table1[Timestamp] < EARLIER ( Table1[Timestamp] ) + ( 5 / 24 ) / 60
                && Table1[PlantActive] = 0.5
        )
    )

RETURN
    MAX ( Previous5, Next5 )

Is there anything I'm missing that could be optimized? Or is there no way to run this faster on a large dataset?

Thank you.
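The intent of the DAX above can be sketched in plain Python for clarity. This is a minimal illustration, assuming a hypothetical single-location list of `(timestamp, plant_active)` tuples with one reading per minute, not the poster's actual model:

```python
from datetime import datetime, timedelta

def window_all_half(rows, i, minutes=5):
    """For row i, check whether every reading in the trailing `minutes`
    window, or every reading in the leading `minutes` window, is 0.5.
    `rows` is a sorted list of (timestamp, plant_active) tuples for one
    location (assumed layout), with one reading per minute."""
    ts = rows[i][0]
    delta = timedelta(minutes=minutes)
    # Trailing window: (ts - 5 min, ts]; leading window: [ts, ts + 5 min)
    prev = [v for t, v in rows if ts - delta < t <= ts]
    nxt = [v for t, v in rows if ts <= t < ts + delta]
    prev_ok = len(prev) == minutes and all(v == 0.5 for v in prev)
    next_ok = len(nxt) == minutes and all(v == 0.5 for v in nxt)
    return prev_ok or next_ok
```

Like the DAX, this rescans the whole table for every row, which is why the cost blows up on millions of rows; the run-length approaches discussed later in the thread avoid that.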
6 REPLIES
Anonymous
Not applicable

@Anonymous 

Have you considered creating calculated columns for Previous5 and Next5 instead of using variables? That way the result is calculated once, at refresh, and stored in the model, which avoids long waits during visual interactions. The downside is that this increases the dataset size.


Paul Zheng _ Community Support Team
If this post helps, please Accept it as the solution to help the other members find it more quickly.

Anonymous
Not applicable

Hi, thanks for the idea. I've tried that, but with no success; it takes ages to process.

AlB
Community Champion
Community Champion

@Anonymous 

I'll look at the explanation in detail later when I get some time, but for the time being, if you can share just the table with some mock data where I can see the result with your code and with mine, that will help spot why they are different (clearly I am doing something wrong).

 


Please accept the solution when done and consider giving a thumbs up if posts are helpful. 

Contact me privately for support with any larger-scale BI needs, tutoring, etc.

 

Anonymous
Not applicable

Sure, here is a sample that I tweaked a bit. I excluded other locations, but those would need to be included in the filter. The goal is to flag/select 14:00 through 14:18 (with 5 as the threshold, for example) because it reads 0.5 for longer than 5 minutes, and to exclude 14:21-14:23 because that run is shorter. 1s and 0s don't need any flagging.


Here is a link: https://docs.google.com/spreadsheets/d/11WtiHi0iGQpyJou4YTwxguR9JgJcf8xyu9hBvuTIcqE/edit?usp=sharing 

(Sorry, the website is not letting me post this as a table)

 

Location PlantActive Timestamp

Location1 0.0 16/04/2021 13:56
Location1 1.0 16/04/2021 13:57
Location1 1.0 16/04/2021 13:58
Location1 1.0 16/04/2021 13:59
Location1 0.5 16/04/2021 14:00
Location1 0.5 16/04/2021 14:01
Location1 0.5 16/04/2021 14:02
Location1 0.5 16/04/2021 14:03
Location1 0.5 16/04/2021 14:04
Location1 0.5 16/04/2021 14:05
Location1 0.5 16/04/2021 14:06
Location1 0.5 16/04/2021 14:07
Location1 0.5 16/04/2021 14:08
Location1 0.5 16/04/2021 14:09
Location1 0.5 16/04/2021 14:10
Location1 0.5 16/04/2021 14:11
Location1 0.5 16/04/2021 14:12
Location1 0.5 16/04/2021 14:13
Location1 0.5 16/04/2021 14:14
Location1 0.5 16/04/2021 14:15
Location1 0.5 16/04/2021 14:16
Location1 0.5 16/04/2021 14:17
Location1 0.5 16/04/2021 14:18
Location1 0.0 16/04/2021 14:19
Location1 0.0 16/04/2021 14:20
Location1 0.5 16/04/2021 14:21
Location1 0.5 16/04/2021 14:22
Location1 0.5 16/04/2021 14:23
Location1 0.0 16/04/2021 14:24
Location1 0.0 16/04/2021 14:25
Location1 0.0 16/04/2021 14:26
Location1 0.0 16/04/2021 14:27
Location1 0.0 16/04/2021 14:28
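The rule described above (flag 0.5 readings only when they belong to a long enough uninterrupted run) can be checked against this sample with a short run-length sketch. This is an illustration in Python, not the poster's DAX; `min_len` is assumed to be 5 consecutive one-minute readings, and whether the boundary should be "at least 5" or "more than 5" is a detail to confirm:

```python
from itertools import groupby

def flag_long_half_runs(values, min_len=5):
    """Given a minute-by-minute list of PlantActive values, return a
    parallel list of booleans: True where the reading is 0.5 AND part of
    an uninterrupted run of at least `min_len` consecutive 0.5 readings."""
    flags = []
    for val, group in groupby(values):
        run = list(group)
        keep = val == 0.5 and len(run) >= min_len
        flags.extend([keep] * len(run))
    return flags
```

On the sample table, the 19-minute run from 14:00 to 14:18 is flagged and the 3-minute run from 14:21 to 14:23 is not, matching the stated goal.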

AlB
Community Champion
Community Champion

Hi @Anonymous 

Can you share the dataset by any chance?

Two questions:

A) What are you ultimately trying to achieve? Perhaps we can try a different approach that is less resource-intensive?

B) What is the more restrictive constraint (the one that will filter out more rows):  [PlantActive]=0.5, [Location]="Location1" or the ones related to the TimeStamps? 

In the meantime, give this a try:

 

CalcCol =
CALCULATE (
    VAR Previous5 =
        COUNTROWS (
            FILTER (
                DISTINCT ( Table1[Timestamp] ),
                VAR current_ = Table1[Timestamp]
                RETURN
                    Table1[Timestamp] <= current_
                        && Table1[Timestamp] > ( current_ - ( ( 5 / 24 ) / 60 ) )
            )
        )
    VAR Next5 =
        COUNTROWS (
            FILTER (
                DISTINCT ( Table1[Timestamp] ),
                VAR current_ = Table1[Timestamp]
                RETURN
                    Table1[Timestamp] >= current_
                        && Table1[Timestamp] < ( current_ + ( ( 5 / 24 ) / 60 ) )
            )
        )
    RETURN
        MAX ( Previous5, Next5 ),
    Table1[Location] = "Location1",
    Table1[PlantActive] = 0.5
)

 


 

Anonymous
Not applicable

Thank you, I tried implementing that. It was a LOT faster, but it returns only the current row: the count results are either 1 (when PlantActive = 0.5) or blank. It is not picking up the next 5 or the previous 5 rows in the filter. Do you know why that is?

 

I'll see about sharing the data, if that is possible I'll send you a private message.

 

As for the questions here it is:

 

A) What are you ultimately trying to achieve? Perhaps we can try a different approach that is less resource-intensive?

 

It is quite an explanation, but I'm happy to go through it if you have the time. In this instance, we have two sensors reading and logging once every minute: a main sensor and a secondary sensor. If the main sensor is above a threshold, the equipment is active (PlantActive = 1); if it is below the threshold but the secondary sensor is active, the equipment is at half capacity (0.5); otherwise it is completely off (0).

There is, however, a delay between the two sensors that can vary (4-8 minutes). I've offset the second sensor by 6 minutes to accommodate some of that, but there can still be a couple of minutes whenever the main sensor goes below the threshold and the second sensor goes down too soon after (so not a real half capacity). What I'm trying to do is filter out those couple of minutes that are "alone" and are the result of a difference in the offset (they make up roughly 30-40% of the time the equipment reads at half capacity in the current setup).

 

In other words: if PlantActive is at 0.5 for more than 5 minutes uninterrupted (going to neither 0 nor 1), that certainly means the main sensor went down while the secondary sensor continued, and the equipment is therefore indeed at half capacity. I use the result to compute the efficiency of the equipment over time.


B) What is the more restrictive constraint (the one that will filter out more rows):  [PlantActive]=0.5, [Location]="Location1" or the ones related to the TimeStamps? 


Timestamp, I believe. There are about 12 different locations, each with a timestamp every minute for the last 2 years; that is where the 4-5M rows come from. Only one of the locations has the half-capacity setup I'm implementing, however, so that would also filter out a lot of the rows.
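Given the per-location structure described here, a single-pass run-length approach scales linearly with row count instead of rescanning a window for every row. A minimal sketch, assuming rows arrive as `(location, timestamp, plant_active)` tuples (a hypothetical layout, not the poster's model; timestamps only need to sort chronologically):

```python
from collections import defaultdict
from itertools import groupby

def flag_by_location(rows, min_minutes=5):
    """Single-pass alternative to the per-row window scan: group readings
    by location, then mark 0.5 runs of at least `min_minutes` consecutive
    readings. Returns a dict mapping (location, timestamp) -> bool flag."""
    by_loc = defaultdict(list)
    for loc, ts, val in rows:
        by_loc[loc].append((ts, val))
    flags = {}
    for loc, readings in by_loc.items():
        readings.sort()  # chronological order within each location
        # groupby collapses consecutive readings with the same value into runs
        for val, group in groupby(readings, key=lambda r: r[1]):
            run = list(group)
            keep = val == 0.5 and len(run) >= min_minutes
            for ts, _ in run:
                flags[(loc, ts)] = keep
    return flags
```

The same idea could be pushed into Power Query or a calculated column based on run identifiers, so the expensive work happens once at refresh rather than at query time.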
