
RomuloLima
New Member

Consecutive days count in huge dataset

Hello everyone.


I'm currently working with a huge dataset that has three important columns: [Serial], the identifier of a machine; [Status], a status code generated by the machine; and [Timestamp], the date the machine generated that status. It looks like this:

Serial | Status | Timestamp
1 | A | 01/06/2025
1 | B | 01/06/2025
2 | A | 01/06/2025
1 | A | 02/06/2025
2 | B | 02/06/2025

A machine can output several (different) statuses in a day.


I then created a calculated table with specific serials and specific statuses. Now I need a calculated column that tells how many consecutive days a machine output the status. For the example above, the new table should look like this:

Serial | Status | Last occurrence | Consecutive days
1 | A | 02/06/2025 | 2
2 | A | 01/06/2025 | 1

 

I managed to build [Last occurrence] properly.

 

The problem is that I can't create [Consecutive days], because the dataset I'm working with is too large. I always end up with error messages such as "the resultset of a query to external data source has exceeded the maximum allowed size" or "OLE DB or ODBC error".

 

I need an optimized way to get this calculation done.

1 ACCEPTED SOLUTION
v-veshwara-msft
Community Support

Hi @RomuloLima ,
Thanks for using Microsoft Fabric Community to post your query.

 

Calculated columns on large datasets in DirectQuery mode can often lead to performance issues or limits being hit. Specifically, when the logic involves row-by-row evaluations across millions of records, you might encounter errors such as:

"The resultset of a query to external data source has exceeded the maximum allowed size of '1000000' rows"

These occur because Analysis Services enforces a default limit of 1 million rows per intermediate query result in DirectQuery mode.

Reference: https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-use-directquery#1-million-row-limit 

 

While it's technically possible to increase this threshold in SSAS or on Premium capacities by modifying the MaxIntermediateRowsetSize setting (in msmdsrv.ini), it's important to note: even if you raise this threshold, large DirectQuery result sets can still lead to memory and performance issues, so it's usually better to redesign the logic to avoid returning such large rowsets in the first place.


To avoid these limitations and improve scalability, as mentioned by @Ritaf1983:

- Pre-aggregate data using Power Query (Remove Duplicates, Group By, etc.) during the load step.

- Create Import-mode aggregation tables for key columns like [Serial], [Status], or [Timestamp] and configure them to serve DirectQuery models (a sketch of this grain follows below).
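
As an illustration only, here is a minimal sketch of the intended aggregation grain, written as DAX (the name SerialStatusDaily is hypothetical, and 'Table' is the source table from this thread). In a pure DirectQuery model you would materialize this shape in Power Query or at the source rather than as a calculated table, so treat it as a description of the target grain, not a drop-in solution:

SerialStatusDaily =
// Hypothetical aggregation: one row per machine, status, and day,
// collapsing intraday duplicates before any consecutive-day logic runs.
SUMMARIZE (
    'Table',
    'Table'[Serial],
    'Table'[Status],
    'Table'[Timestamp]
)

Once the grain is one row per serial/status/day, any consecutive-day logic scans at most one row per day instead of every status event.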


Also, following @Ritaf1983's suggestion to use measures, I recreated the scenario using a sample table with [Serial], [Status], and [Timestamp] values, as below.

 

(Screenshot: sample table with [Serial], [Status], and [Timestamp] values)

DAX Measures used:

Last Occurrence = 
// For the current SerialStatus row (e.g., in a table visual), return the
// most recent date on which that serial reported that status.
CALCULATE (
    MAX ( 'Table'[Timestamp] ),
    FILTER (
        'Table',
        'Table'[Serial] = SELECTEDVALUE ( SerialStatus[Serial] ) &&
        'Table'[Status] = SELECTEDVALUE ( SerialStatus[Status] )
    )
)

 

Consecutive Days = 
VAR SelectedSerial = SELECTEDVALUE ( SerialStatus[Serial] )
VAR SelectedStatus = SELECTEDVALUE ( SerialStatus[Status] )
// Distinct dates on which this serial reported this status
VAR DatesTable =
    CALCULATETABLE (
        VALUES ( 'Table'[Timestamp] ),
        'Table'[Serial] = SelectedSerial,
        'Table'[Status] = SelectedStatus
    )
// Gaps-and-islands: a date minus its ascending rank is constant within a
// run of consecutive dates, so it works as a per-run group key.
VAR SortedDates =
    ADDCOLUMNS (
        DatesTable,
        "GroupKey", 'Table'[Timestamp] - RANKX ( DatesTable, 'Table'[Timestamp], , ASC )
    )
// One row per run: how many days it spans and the date it ends on
VAR Grouped =
    GROUPBY (
        SortedDates,
        [GroupKey],
        "GroupCount", COUNTX ( CURRENTGROUP (), [Timestamp] ),
        "MaxDate", MAXX ( CURRENTGROUP (), [Timestamp] )
    )
VAR LatestDate = [Last Occurrence]
// Length of the run that ends on the last occurrence date
VAR Result =
    MAXX (
        FILTER ( Grouped, [MaxDate] = LatestDate ),
        [GroupCount]
    )
RETURN Result
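
To see why subtracting the rank produces a constant key within a run of consecutive dates, here is a small self-contained DAX query (runnable in the DAX query view or DAX Studio) using made-up dates. It is only an illustration of the technique, not part of the model above:

EVALUATE
VAR Dates =
    DATATABLE ( "Timestamp", DATETIME,
        { { "2025-06-01" }, { "2025-06-02" }, { "2025-06-05" } }
    )
RETURN
    ADDCOLUMNS (
        Dates,
        "Rank", RANKX ( Dates, [Timestamp], , ASC ),
        "GroupKey", [Timestamp] - RANKX ( Dates, [Timestamp], , ASC )
    )
// 01/06 (rank 1) and 02/06 (rank 2) both yield 31/05 as GroupKey, so they
// form one island; 05/06 (rank 3) yields 02/06 and starts a new island.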

 

Output (table visual):

(Screenshot: table visual showing Serial, Status, Last Occurrence, and Consecutive Days)

 

Hope this helps. Please reach out for further assistance.

Please consider marking the helpful reply as Accepted Solution to assist others with similar issues.

Thank you.
Please find the attached .pbix for reference.

 

 


REPLIES
v-veshwara-msft
Community Support

Hi @RomuloLima ,

Following up to see if your query is resolved and if any of the responses helped.
If you still need assistance, feel free to reach out.

Thank you.

v-veshwara-msft
Community Support

Hi @RomuloLima ,
We wanted to kindly follow up regarding your query. If you need any further assistance, please reach out.
Thank you.

v-veshwara-msft
Community Support

Hi @RomuloLima ,
Just wanted to check if your query is resolved and if the response provided was helpful. If further assistance is needed, please reach out.
Thank you.

Ritaf1983
Super User

Hi @RomuloLima 

It’s not entirely clear why you need a calculated table for this.

If your goal is simply to display a result like the one shown in your screenshot (showing the last occurrence and count of consecutive days), you could use measures like:

  • MAX([Timestamp])

  • COUNTROWS(...)

These are very efficient and should work well even on large datasets (a quick sketch follows below).
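
For instance, assuming the visual already groups by 'Table'[Serial] and 'Table'[Status] (so the filter context isolates one serial/status pair per row), the measures could be as simple as the sketch below; the measure names are made up for illustration:

Last Occurrence (sketch) =
// Latest date in the current Serial/Status filter context
MAX ( 'Table'[Timestamp] )

Days With Status (sketch) =
// Distinct days with the status, not necessarily consecutive days
COUNTROWS ( VALUES ( 'Table'[Timestamp] ) )

Note that the second measure counts distinct days; limiting it to consecutive days still needs gaps-and-islands logic like the one in the accepted solution.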

However, if you’re trying to build an aggregated table for further logic and the calculated table fails due to performance/resource issues, here are some better alternatives:

1. Pre-aggregate the table at the data source (e.g., SQL) and load only the summary to Power BI.

2. Use a native SQL query during data import in Power BI to push the logic to the source system and reduce the volume upfront.
Docs: Import data using native database query

3. Apply aggregations using the Group By feature in Power Query (PQ).
Docs: Group by in Power Query

4. Use Power BI’s aggregate table capabilities
Docs: Use aggregate tables in Power BI

If this post helps, then please consider Accepting it as the solution to help the other members find it more quickly.

Regards,
Rita Fainshtein | Microsoft MVP
https://www.linkedin.com/in/rita-fainshtein/
Blog : https://www.madeiradata.com/profile/ritaf/profile
