I am encountering an Out of Memory error while refreshing a Dataflow Gen2 that simply reads a Parquet file. The error message is as follows:
MashupException.Error: DataFormat.Error: Parquet.Document: class parquet::ParquetStatusException (message: 'Out of memory: realloc of size 4194304 failed')
This issue started occurring recently after an internal outage two weeks ago. Prior to that, the dataflow worked fine for 7-8 months without any issues.
File Size: The Parquet file size is around 1MB per week, so it's relatively small.
Fabric Capacity: We are using F64 capacity, and I’ve verified that it’s not hitting the limits during the refresh.
Dataflow: The original dataflow is not CI/CD-based; it's just a Dataflow Gen2 reading a Parquet file without any transformations.
Staging: I have staging disabled, and still encountering the same error.
Issue Timeline: The issue began after an internal outage two weeks ago, during which the refreshes failed. Prior to that, it had been working fine for several months.
Created a New Dataflow: I created a new Dataflow Gen2 to test if the issue persists, but I’m still encountering the same memory error.
Verified Capacity: I checked the capacity metrics, and F64 should be sufficient, but I still see the error.
Disabled Staging: Staging was already disabled, so no changes were made there.
Checked File Size: The Parquet file size is small (1MB per week), so this shouldn't be a large data issue.
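For anyone who wants to double-check the sizes themselves, here is a quick sketch from a Fabric notebook; the folder name is a placeholder and it assumes a default Lakehouse is attached:
import os
# Rough size check over the weekly Parquet files ("weekly" is a placeholder folder).
folder = "/lakehouse/default/Files/weekly"
files = [f for f in os.listdir(folder) if f.endswith(".parquet")]
total_mb = sum(os.path.getsize(os.path.join(folder, f)) for f in files) / (1024 * 1024)
print(f"{len(files)} parquet files, {total_mb:.1f} MB total")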
Hi @rubayatyasmin, thank you for reaching out to the Microsoft Community Forum.
@tayloramy correctly suggested opening a Fabric notebook and reading the same Parquet file directly using:
df = spark.read.parquet("/lakehouse/Files/yourpath/yourfile.parquet")
df.limit(1).show()
If that fails, it strongly suggests the Parquet file itself is corrupted, which can trigger the "Out of memory: realloc failed" error even for small files. This test helps isolate whether the problem lies in the file or in the Dataflow Gen2 service.
Might I add that if your dataflow reads a Parquet file stored in OneLake (as most Fabric Dataflows do), the on-premises gateway won’t be involved at all. The read operation happens entirely within Fabric’s managed compute, so gateway memory isn’t a factor. Given that the problem started after an internal outage, it’s possible that the Parquet file or its metadata footer became partially corrupted. If the notebook test fails, regenerate or re-export that file; if it succeeds but the Dataflow still errors, open a Microsoft Support ticket and share all the details with them.
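As a quick check of the footer specifically, something along these lines in a notebook will fail immediately if the metadata footer is damaged, since pyarrow reads only the footer. This is just a sketch; the path is a placeholder and assumes a default Lakehouse is attached to the notebook.
import pyarrow.parquet as pq
# Read only the Parquet footer/metadata; a corrupted footer raises here ("yourpath" is a placeholder).
meta = pq.ParquetFile("/lakehouse/default/Files/yourpath/yourfile.parquet").metadata
print(meta.num_rows, meta.num_row_groups, meta.serialized_size)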
For additional confirmation, you can delete and re-upload the file to OneLake, then re-run the refresh. These steps usually clear any residual corruption from interrupted service operations.
Below is the link to help create a Microsoft Support ticket:
How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn
Please check the docs below for more information:
Differences between Dataflow Gen1 and Dataflow Gen2 - Microsoft Fabric | Microsoft Learn
Ingest Data with Dataflows in Microsoft Fabric - Training | Microsoft Learn
Dataflow Gen2 data destinations and managed settings - Microsoft Fabric | Microsoft Learn
Load data into your lakehouse with a notebook - Microsoft Fabric | Microsoft Learn
Thank you @tayloramy for your valuable response.
Hi, I already created a support ticket, but it hasn't been very helpful so far.
Hi @rubayatyasmin, I’m sorry to hear that. Although I’m not a member of the PG team, I am confident that they are working diligently to resolve it as soon as possible. Your patience and understanding are greatly appreciated. Please share any updates or insights you receive here, as this information could be helpful to others experiencing similar challenges.
Thank you.
Hi @rubayatyasmin,
What's your data source? Are you using an on-prem gateway at all?
I've seen gateway memory errors bubble up in unexpected ways before. If you're using a gateway, can you check the memory usage on the gateway server?
Are you able to read from the specific parquet file that's causing the issue? In a new notebook, try:
df = spark.read.parquet("/lakehouse/Files/yourpath/yourfile.parquet")
df.limit(1).show()
If this crashes, then the parquet file is likely bad and needs to be regenerated.
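If the source folder holds more than one file, a small loop like this sketch can help pinpoint which file is bad; the folder name is a placeholder and it assumes a default Lakehouse is attached.
import os
# Try each file on its own and report which ones fail ("yourpath" is a placeholder).
folder = "/lakehouse/default/Files/yourpath"
for name in sorted(os.listdir(folder)):
    if not name.endswith(".parquet"):
        continue
    try:
        spark.read.parquet(f"Files/yourpath/{name}").limit(1).collect()
        print("OK  ", name)
    except Exception as e:
        print("FAIL", name, type(e).__name__)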
If you found this helpful, consider giving Kudos. If I solved your problem or answered your question, mark this post as a solution.
Hi,
I’m not using an on-premises gateway. My files are stored in a Lakehouse. I considered using incremental load, but since my query isn’t folding, that option is ruled out for now.
All files are fine — when I load smaller chunks, the dataflow refreshes successfully. Currently, I have 39 files, and if I select any 36 of them, the dataflow works without issue. However, selecting more than that causes an out-of-memory error.
However, I am trying to use a notebook and pipeline to load the tables instead. It's failing so far, so I need to do some more R&D.
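Roughly, the notebook approach I'm experimenting with looks like the sketch below; the folder and table names are placeholders and it's still very much a work in progress.
# Read all weekly Parquet files at once and land them in a Lakehouse Delta table.
# "yourpath" and "weekly_data" are placeholders; assumes a default Lakehouse is attached.
df = spark.read.parquet("Files/yourpath/*.parquet")
df.write.mode("overwrite").format("delta").saveAsTable("weekly_data")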