
RomuloLima
New Member

Consecutive days count in huge dataset

Hello everyone.


I'm currently working with a huge dataset that has three important columns: [Serial], the identifier of a machine; [Status], a status code generated by the machine; and [Timestamp], the date the machine generated that status. It looks like this:

Serial | Status | Timestamp
1 | A | 01/06/2025
1 | B | 01/06/2025
2 | A | 01/06/2025
1 | A | 02/06/2025
2 | B | 02/06/2025

A machine can output several (different) statuses in a day.


I then created a calculated table with specific serials and specific statuses. Now I need a calculated column that tells how many consecutive days a machine output the status. For the example above, the new table should look like this:

Serial | Status | Last occurrence | Consecutive days
1 | A | 02/06/2025 | 2
2 | A | 01/06/2025 | 1

 

I managed to build [Last occurrence] properly.

 

The problem is that I can't create [Consecutive days], because the dataset I'm working with is too large. I always end up with error messages such as "the resultset of a query to external data source has exceeded the maximum allowed size" or "OLE DB or ODBC error".

 

I need an optimized way to get this calculation done.

1 ACCEPTED SOLUTION
v-veshwara-msft
Community Support

Hi @RomuloLima ,
Thanks for using Microsoft Fabric Community to post your query.

 

Calculated columns on large datasets in DirectQuery mode can often lead to performance issues or limits being hit. Specifically, when the logic involves row-by-row evaluations across millions of records, you might encounter errors such as:

"The resultset of a query to external data source has exceeded the maximum allowed size of '1000000' rows"

These occur because Analysis Services enforces a default limit of 1 million rows per intermediate query result in DirectQuery mode.

Reference: https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-use-directquery#1-million-row-limit 

 

While it's technically possible to increase this threshold in SSAS or on Premium capacities by modifying the MaxIntermediateRowsetSize setting (in msmdsrv.ini), it's important to note: even if you raise this threshold, large DirectQuery result sets can still lead to memory and performance issues, so it's usually better to redesign the logic to avoid returning such large rowsets in the first place.


To avoid these limitations and improve scalability, as mentioned by @Ritaf1983:

- Pre-aggregate data using Power Query (Remove Duplicates, Group By, etc.) during the load step.

- Create Import-mode aggregation tables for key columns like [Serial], [Status], or [Timestamp] and configure them to serve DirectQuery models (a sketch of this grain follows below).
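
As an illustration only, here is a minimal sketch of the intended aggregation grain, written as DAX (the name SerialStatusDaily is hypothetical, and 'Table' is the source table from this thread). In a pure DirectQuery model you would materialize this shape in Power Query or at the source rather than as a calculated table, so treat it as a description of the target grain, not a drop-in solution:

SerialStatusDaily =
// Hypothetical aggregation: one row per machine, status, and day,
// collapsing intraday duplicates before any consecutive-day logic runs.
SUMMARIZE (
    'Table',
    'Table'[Serial],
    'Table'[Status],
    'Table'[Timestamp]
)

Once the grain is one row per serial/status/day, any consecutive-day logic scans at most one row per day instead of every status event.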


Also, following @Ritaf1983's suggestion to use measures, I recreated the scenario using a sample table with [Serial], [Status], and [Timestamp] values, as below.

 

(Screenshot: sample table with [Serial], [Status], and [Timestamp] values)

DAX Measures used:

Last Occurrence = 
// For the current SerialStatus row (e.g., in a table visual), return the
// most recent date on which that serial reported that status.
CALCULATE (
    MAX ( 'Table'[Timestamp] ),
    FILTER (
        'Table',
        'Table'[Serial] = SELECTEDVALUE ( SerialStatus[Serial] ) &&
        'Table'[Status] = SELECTEDVALUE ( SerialStatus[Status] )
    )
)

 

Consecutive Days = 
VAR SelectedSerial = SELECTEDVALUE ( SerialStatus[Serial] )
VAR SelectedStatus = SELECTEDVALUE ( SerialStatus[Status] )
// Distinct dates on which this serial reported this status
VAR DatesTable =
    CALCULATETABLE (
        VALUES ( 'Table'[Timestamp] ),
        'Table'[Serial] = SelectedSerial,
        'Table'[Status] = SelectedStatus
    )
// Gaps-and-islands: a date minus its ascending rank is constant within a
// run of consecutive dates, so it works as a per-run group key.
VAR SortedDates =
    ADDCOLUMNS (
        DatesTable,
        "GroupKey", 'Table'[Timestamp] - RANKX ( DatesTable, 'Table'[Timestamp], , ASC )
    )
// One row per run: how many days it spans and the date it ends on
VAR Grouped =
    GROUPBY (
        SortedDates,
        [GroupKey],
        "GroupCount", COUNTX ( CURRENTGROUP (), [Timestamp] ),
        "MaxDate", MAXX ( CURRENTGROUP (), [Timestamp] )
    )
VAR LatestDate = [Last Occurrence]
// Length of the run that ends on the last occurrence date
VAR Result =
    MAXX (
        FILTER ( Grouped, [MaxDate] = LatestDate ),
        [GroupCount]
    )
RETURN Result
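
To see why subtracting the rank produces a constant key within a run of consecutive dates, here is a small self-contained DAX query (runnable in the DAX query view or DAX Studio) using made-up dates. It is only an illustration of the technique, not part of the model above:

EVALUATE
VAR Dates =
    DATATABLE ( "Timestamp", DATETIME,
        { { "2025-06-01" }, { "2025-06-02" }, { "2025-06-05" } }
    )
RETURN
    ADDCOLUMNS (
        Dates,
        "Rank", RANKX ( Dates, [Timestamp], , ASC ),
        "GroupKey", [Timestamp] - RANKX ( Dates, [Timestamp], , ASC )
    )
// 01/06 (rank 1) and 02/06 (rank 2) both yield 31/05 as GroupKey, so they
// form one island; 05/06 (rank 3) yields 02/06 and starts a new island.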

 

Output (table visual):

(Screenshot: table visual showing Serial, Status, Last Occurrence, and Consecutive Days)

 

Hope this helps. Please reach out for further assistance.

Please consider marking the helpful reply as Accepted Solution to assist others with similar issues.

Thank you.
Please find the attached .pbix for reference.

 

 


REPLIES
v-veshwara-msft
Community Support

Hi @RomuloLima ,

Following up to see if your query is resolved and if any of the responses helped.
If you still need assistance, feel free to reach out.

Thank you.

v-veshwara-msft
Community Support

Hi @RomuloLima ,
We wanted to kindly follow up regarding your query. If you need any further assistance, please reach out.
Thank you.

v-veshwara-msft
Community Support

Hi @RomuloLima ,
Just wanted to check if your query is resolved and if the response provided was helpful. If further assistance is needed, please reach out.
Thank you.

Ritaf1983
Super User

Hi @RomuloLima 

It’s not entirely clear why you need a calculated table for this.

If your goal is simply to display a result like the one shown in your screenshot (showing the last occurrence and count of consecutive days), you could use measures like:

  • MAX([Timestamp])

  • COUNTROWS(...)

These are very efficient and should work well even on large datasets (a quick sketch follows below).
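
For instance, assuming the visual already groups by 'Table'[Serial] and 'Table'[Status] (so the filter context isolates one serial/status pair per row), the measures could be as simple as the sketch below; the measure names are made up for illustration:

Last Occurrence (sketch) =
// Latest date in the current Serial/Status filter context
MAX ( 'Table'[Timestamp] )

Days With Status (sketch) =
// Distinct days with the status, not necessarily consecutive days
COUNTROWS ( VALUES ( 'Table'[Timestamp] ) )

Note that the second measure counts distinct days; limiting it to consecutive days still needs gaps-and-islands logic like the one in the accepted solution.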

However, if you’re trying to build an aggregated table for further logic and the calculated table fails due to performance/resource issues, here are some better alternatives:

1. Pre-aggregate the table at the data source (e.g., SQL) and load only the summary to Power BI.

2. Use a native SQL query during data import in Power BI to push the logic to the source system and reduce the volume upfront.
Docs: Import data using native database query

3. Apply aggregations using the Group By feature in Power Query (PQ).
Docs: Group by in Power Query

4. Use Power BI’s aggregate table capabilities
Docs: Use aggregate tables in Power BI

If this post helps, then please consider Accepting it as the solution to help the other members find it more quickly.

Regards,
Rita Fainshtein | Microsoft MVP
https://www.linkedin.com/in/rita-fainshtein/
Blog : https://www.madeiradata.com/profile/ritaf/profile
