I have a dataflow that last worked on 9/17. According to the refresh history it processed 8M rows. Yesterday I tried to run the same dataflow and received this error. Nothing has changed with the dataflow.
Append: Error Code: Mashup Exception Data Format Error, Error Details: Couldn't refresh the entity because of an issue with the mashup document MashupException.Error: DataFormat.Error: Failed to insert a table., Underlying error: Parquet: class parquet::ParquetStatusException (message: 'Out of memory: malloc of size 1610612736 failed') Details: Reason = DataFormat.Error;Message = Parquet: class parquet::ParquetStatusException (message: 'Out of memory: malloc of size 1610612736 failed');Message.Format = Parquet: class parquet::ParquetStatusException (message: 'Out of memory: malloc of size 1610612736 failed');Microsoft.Data.Mashup.Error.Context = System (Request ID: 127d0336-cc7a-406c-9256-160c05fe40b6).
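For scale, the failed allocation in the message above is a single request of exactly 1.5 GiB. A minimal Python sketch of that arithmetic follows; the `error_text` variable and the `malloc_size_bytes` helper are hypothetical names used only for illustration, not part of any Fabric tooling.

```python
import re

# Excerpt of the error text quoted above (hypothetical variable name).
error_text = (
    "Parquet: class parquet::ParquetStatusException "
    "(message: 'Out of memory: malloc of size 1610612736 failed')"
)


def malloc_size_bytes(message: str) -> int:
    """Return the byte count from the first 'malloc of size N failed' fragment."""
    match = re.search(r"malloc of size (\d+) failed", message)
    if match is None:
        raise ValueError("no malloc size found in message")
    return int(match.group(1))


size = malloc_size_bytes(error_text)
size_gib = size / 2**30  # 1 GiB = 2**30 bytes
print(f"{size} bytes = {size_gib} GiB")  # 1610612736 bytes = 1.5 GiB
```

The number says only how large the one failing request was, not how much memory the refresh had already used when it failed.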
The dataflow is taking one column from two tables, appending them together, removing duplicates, and then adding an index. How come I'm getting this error when I changed nothing with my dataflow?
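The transformation described above (append, remove duplicates, add an index) can be sketched in plain Python to show its shape. This is an illustration only, not the dataflow's actual Power Query steps: the function name, the sample values, and the 0-based index are all assumptions. In general, removing duplicates from unsorted input means tracking every distinct value seen so far, so memory use grows with the number of distinct rows; whether the Fabric mashup engine behaves this way internally is also an assumption.

```python
from itertools import chain


def append_distinct_index(col_a, col_b):
    """Append two columns, drop duplicates (first occurrence wins), add a 0-based index."""
    seen = set()   # every distinct value stays in memory until the end
    result = []
    for value in chain(col_a, col_b):
        if value not in seen:
            seen.add(value)
            result.append((len(result), value))
    return result


# Hypothetical sample data standing in for the two source columns.
rows = append_distinct_index(["a", "b", "c"], ["b", "d", "a"])
print(rows)  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
```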
Hi @Anonymous ,
Regarding the memory issue: if the configured data destination is a lakehouse, the data must be written as Parquet, so the dataflow engine buffers the data and converts it to Parquet, which is quite memory intensive.
Monitor the CPU, memory, and network usage of the dataflow job to identify any potential bottlenecks. This can help you understand if the dataflow is running out of memory due to resource constraints.
In addition, proper use of staging can improve processing performance. Refer to the following documentation:
Dataflow Gen2 data destinations and managed settings - Microsoft Fabric | Microsoft Learn
An overview of refresh history and monitoring for dataflows. - Microsoft Fabric | Microsoft Learn
Best Regards,
Adamk Kong
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
Hi, the data destination is not a lakehouse. I do not have a data destination configured.
How can I monitor the CPU, memory, and network usage of the dataflow job? The Monitoring hub provides none of these details.