Hi,
We have a 223 MB PBIX which, when published to a PRO workspace and refreshed online, returns the following error:
Resource Governing: This operation was canceled because there wasn't enough memory to finish running it. Either reduce the memory footprint of your dataset by doing things such as limiting the amount of imported data, or if using Power BI Premium, increase the memory of the Premium capacity where this dataset is hosted. More details: consumed memory 15770 MB, memory limit 15770 MB, database size before command execution 613 MB. See https://go.microsoft.com/fwlink/?linkid=2159753 to learn more.
When the exact same PBIX is published to a PPU workspace, it refreshes online successfully, taking approximately 15 minutes. It occupies approximately 192 MB in that workspace.
When refreshed on the desktop, it completes successfully in approx 20 minutes.
The credentials in all instances are the same. The data sources are only Databricks and SharePoint; nothing is on-prem, so a gateway is not involved.
The PRO workspace version had been refreshing successfully for some weeks until about 5 days ago, when it began to fail with the error above. Successful refreshes would take approx 40 minutes to complete; now they fail after about 2 hours.
I have tried deleting the dataset from the PRO workspace and republishing it, but the error still occurs.
Any suggestions on what to try next?
Thanks!
Hi @Mixednuts
In a Pro workspace, datasets are limited to 1GB in size, but the actual memory required during refresh operations can be significantly higher than the final dataset size. When refreshing a dataset, Power BI needs to:
Unzip the existing dataset into memory
Upload and process new data in memory
Create the refreshed dataset
This process can require almost double the memory needed to store the dataset itself, which explains why your 223 MB PBIX is hitting memory limits during refresh.
It looks like your data processing is memory-intensive, possibly due to complex transformations.
Try the following:
Remove unnecessary columns and rows to reduce the dataset size
Minimize calculated columns and set appropriate data types
Move transformations to the data source (Databricks) rather than performing them in Power BI
Optimize your Databricks source:
Use Delta tables in Databricks, which are optimized for performance
Run OPTIMIZE commands on your Delta tables regularly (see the sketch below this list)
Consider using SQL Warehouses instead of All-Purpose clusters for better BI workload performance
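For example, a minimal Databricks SQL maintenance sketch (the table name sales and the column customer_id are placeholders, not from your model):

    -- Compact small Delta files into larger ones so scans read less data
    OPTIMIZE sales;

    -- Additionally co-locate rows on a column Power BI frequently filters or joins on
    OPTIMIZE sales ZORDER BY (customer_id);

Running this on a schedule (e.g., nightly) keeps the table healthy without adding work to each refresh.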
Worth looking at: implement Power BI's Automatic Aggregations feature with Azure Databricks to improve performance on large datasets: https://techcommunity.microsoft.com/blog/analyticsonazure/boosting-power-bi-performance-with-azure-d...
Thank you @v-hashadapu and @nilendraFabric for your thoughts and suggestions.
While my focus had been on the Power Query processing (data extraction and transformation), I finally found that the cause was not in Power Query but in the model itself.
Typically, we use calculated columns to compute values that require queries spanning more than one of the model's tables. In this instance, we had a number of required columns whose values were determined by summarising rows from related tables into temporary table variables in DAX.
The processing time for these was not evident in any of the traces we ran; however, when refreshing on the desktop, they account for the long delay between the ingestion row count completing and the refresh actually finishing.
In a test in which I removed the calculated columns from the model, the refresh time dropped from 40 minutes when last it ran, to only 3 minutes!
I am now modifying the model to perform the calculations in the Databricks engine. This is made possible by previous work moving the Power Query transformations back into Databricks materialised views.
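For anyone curious what that pushdown looks like, here is a minimal sketch, assuming hypothetical orders and order_lines tables, with the former DAX calculated columns replaced by pre-summarised columns in a materialised view:

    -- Hypothetical: precompute per-order summaries that were previously DAX calculated columns
    CREATE OR REPLACE MATERIALIZED VIEW orders_enriched AS
    SELECT
      o.*,
      COALESCE(l.total_qty, 0)   AS total_qty,    -- was a DAX summarisation over a related table
      COALESCE(l.total_value, 0) AS total_value
    FROM orders AS o
    LEFT JOIN (
      SELECT order_id,
             SUM(qty)              AS total_qty,
             SUM(qty * unit_price) AS total_value
      FROM order_lines
      GROUP BY order_id
    ) AS l
      ON o.order_id = l.order_id;

Power BI then imports orders_enriched with a plain SELECT, so the expensive summarisation runs in Databricks rather than in the model's calculated-column phase during refresh.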
I would like to thank you both for your helpful insights, all of which contributed significantly to the final solution. However, as I can only mark one reply as the solution, I will in this case mark that of @nilendraFabric, as it suggested the issue might be the calculated columns.
Thank you both for your help. It is appreciated.
Further to this, I thought to adjust the settings on my desktop to try to simulate the memory restrictions experienced in the shared service, hoping to isolate the particular query that was the culprit.
To this end, I lowered the maximum memory allowed per simultaneous evaluation in the Desktop's Data Load options. Desktop would not let me drop the number of simultaneous evaluations below 2. I also ran separate refreshes with parallel processing turned off and turned on.
In all of these, the refresh completed successfully.
It is possible I have misunderstood the way these settings operate; however, I find it strange that none of these tests reported a memory threshold breach (particularly one at 16 GB).
Thanks,
Hi @Mixednuts , Thank you for reaching out to the Microsoft Community Forum.
Your tests and the Vertipaq Analyzer's 69.63 MB dictionary size point to Databricks query inefficiencies and PRO constraints. Since PPU and incremental refresh aren't options, optimize the Databricks query. In Power BI Desktop, use Performance Analyzer (View > Performance Analyzer) to find slow queries. Ensure query folding isn't broken (right-click a step > View Native Query; if it is greyed out, rewrite the query in Databricks as a simple SELECT over a materialized view).
Reduce the query load by fetching smaller chunks, e.g.:

    SELECT * FROM table WHERE Date >= '2024-01-01' AND Date < '2024-02-01'
    UNION ALL
    SELECT * FROM table WHERE Date >= '2024-02-01' AND Date < '2024-03-01'

(UNION ALL avoids the deduplication pass of plain UNION, since the date ranges don't overlap.) This helps keep memory below PRO's 15.77 GB limit. Enhance Databricks with Delta tables:

    OPTIMIZE table_name ZORDER BY (join_column)

This reduces memory overhead. Test in another PRO workspace to rule out contention. If it still fails, contact Microsoft support about recent PRO memory limit changes.
If this helped solve the issue, please consider marking it 'Accept as Solution' so others with similar queries may find it more easily. If not, please share the details, always happy to help.
Thank you.
Thank you @v-hashadapu and @nilendraFabric
It seems my original reply was lost; however, we have identified the cause.
On one of the tables, the model requires additional columns which are calculated by summarising a number of other model tables. It seems these calculated columns were consuming all of the remaining memory after the data had been ingested through Power Query.
Removing these columns allowed the refresh to complete: where it originally took over 40 minutes, it now takes only 3.
The problem was not in the Power Queries as we first thought, but in the post-ingestion processing of the model.
Because we had already pushed all the original table transformations from Power Query into Databricks materialised views, we are now able to push the calculated columns down into the Databricks engine as well, and we are currently refactoring that code.
Thank you both for your assistance. Your advice has been very helpful.
Thanks @nilendraFabric. Your detailed response is very much appreciated.
There are a couple of concerns which I feel cannot be explained by your observations, though.
1. The memory threshold being breached by a query is reported as approx 16 GB, yet the size of the model at the time was only 613 MB in memory, roughly a 26-fold difference. At 613 MB we are still well below the 1 GB limit, I would have thought.
2. All the queries have already been pushed down into materialised views in Databricks, so they are just simple selects. The Databricks query history shows little to no processing time, mainly client fetch time.
3. The PPU and PRO versions are from the exact same PBIX, yet the PPU version suffers no such effects and completes in 15 minutes. I will try to run a trace on the refresh under PPU to see if it offers any insights.
4. The PRO version has been refreshing successfully for months with a duration of approx 40 minutes each up until the middle of last week.
I ran the DAX Studio Vertipaq Analyser on the model, and even though there may be opportunities to reduce its size, it is not one of our biggest models.
Thank you again for your thoughts.
Hi @Mixednuts , Thank you for reaching out to the Microsoft Community Forum.
The Vertipaq Analyzer's large dictionary size and the sudden failure after months of success suggest high cardinality, PRO constraints and a growth in data volume. In Power BI Desktop, use DAX Studio's View Metrics to find high-cardinality columns, then remove or aggregate them, e.g., round timestamps to seconds or exclude unique IDs. Also go to File -> Options and Settings -> Options -> Data Load and uncheck "Enable parallel loading of tables"; this prevents memory spikes during refresh.
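Where possible, do that cardinality reduction at the source rather than in Power Query. A minimal Databricks SQL sketch, assuming a hypothetical events table whose event_ts column carries sub-second precision:

    -- Hypothetical: truncate timestamps to whole seconds so the column's dictionary shrinks in VertiPaq
    CREATE OR REPLACE VIEW events_for_bi AS
    SELECT
      date_trunc('SECOND', event_ts) AS event_ts,
      event_type,
      amount
    FROM events;

Importing events_for_bi instead of events reduces the number of distinct values, which is usually the dominant memory cost for a high-cardinality column.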
Configure incremental refresh for your largest tables using a date column. In Power Query, filter on the RangeStart and RangeEnd parameters, e.g.:

    Table.SelectRows(TableName, each [Date] >= RangeStart and [Date] < RangeEnd)

Then enable incremental refresh in the table's properties to reduce the memory load. Also schedule the PRO refresh during off-peak hours to avoid contention: in the Power BI Service, set the refresh for a low-traffic time so more memory is available.
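For what it's worth, when that filter folds, each incremental-refresh partition reaches Databricks as its own bounded query, roughly like this (table and column names are illustrative only):

    -- Approximate shape of the folded query for one monthly partition
    SELECT *
    FROM sales
    WHERE Date >= '2024-01-01' AND Date < '2024-02-01'

On a scheduled refresh, only the changed partitions are processed, so peak memory tracks the partition size rather than the whole table.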
If PRO still fails, switch to the PPU workspace. Since PPU refreshes successfully (15 minutes, 25 GB limit), delete the dataset from PRO, republish to PPU and set the same refresh schedule.
If this helped solve the issue, please consider marking it 'Accept as Solution' so others with similar queries may find it more easily. If not, please share the details, always happy to help.
Thank you.
Thank you @v-hashadapu . Appreciate the feedback.
I am still working through your suggestions.
I ran a refresh trace on the PPU version using the method described here: Visualise your Power BI Refresh - Phil Seamark on DAX
It suggested one or two candidates for refactoring based on processing times. Unfortunately it doesn't reference memory usage at all.
There has not been a substantial increase in data in recent times, so it is hard to attribute the issue to that. It is possible, though, that even a small increase could have pushed the model over the memory threshold.
At your suggestion, I turned off parallel processing; however, the memory issue persists, suggesting it is a single query causing the problem. I will look at refactoring it.
Unfortunately, we are limited in our use of PPU, so simply migrating to PPU is not an option for us. We have much larger data models (sourced from SQL Server) that do not have the same issue. We are adding Databricks to our platform architecture and want to be sure we understand its impact on our semantic models.
We also have a version of the same model drawing its data from Azure SQL Server, and it does not suffer from the same issue; it refreshes successfully several times per day.
Partitioning and incremental refresh are interesting, but I don't think our source data would support them. Something to consider in the future.
Really appreciate your input. Thank you.