Hi all, my Fabric Dataflow Gen2 run failed after 4 hours and I'm getting an error message. Is there some time limit? I'm not sure. Please help with this.
Source- Databricks
Destination- Fabric Lakehouse
Medium- Dataflow Gen2 (No transformation, only full load)
1 Table - 120M+ records, 160+ columns, Size ~18 GB
Fabric SKU- F128
Thanks!
In addition to this, I have divided the data by year and I am now running 5 parallel Dataflows, all targeting one final table in the Lakehouse. While trying this, some of my Dataflows failed with the Dataflow Gen2 error below.
semantic_table: There was a problem refreshing the dataflow: 'Couldn't refresh the entity because of an issue with the mashup document MashupException.Error: Error in comitting version., Underlying error: conflicting metadata change Details: Reason = DataSource.Error;ErrorCode = Lakehouse036;Message = conflicting metadata change;Message.Format = conflicting metadata change;Microsoft.Data.Mashup.Error.Context = User GatewayObjectId: e58a7629-e2e0-412b-8688-6f0ebc4f1e50'. Error code: 104100. (Request ID: 3cbf7621-0a97-481e-be25-97ad4ab717f6).
Can you please let me know what is causing this, and what the solution would be?
Thanks!
But I am also seeing online that the timeout is 8 hours. @v-mdharahman, can you please help with this?
Thanks
Hi @avisri,
Yes, that's right, the documented execution time limit is up to 8 hours, depending on factors like workload, SKU, and internal resource management. However, in real-world usage, especially in production environments (including with high SKUs like F64 or F128), many users have observed a hard stop at exactly 4 hours, as in your case.
Also, the best approach for transferring this amount of data is to split the load. For example, you can break the source table into smaller partitions (by date range, primary key buckets, or any natural partitioning column), then create multiple parameterized dataflows, or loop over the partitions with a pipeline, to load each partition individually and avoid hitting the execution time limit (see the sketch below).
You can also use Data Pipelines with a Copy activity. Since you're only copying data without transformations, using a Copy activity in a Fabric Data Pipeline from Databricks to the Lakehouse can be more efficient and does not have the same execution time restriction. It also handles larger datasets more gracefully.
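Purely as an illustration of the partition-by-year idea above, here is a minimal PySpark sketch written for a Fabric notebook rather than a dataflow. The workspace host, HTTP path, token, table and column names are placeholders I made up, and it assumes the Databricks JDBC driver is available to the Spark session.

```python
# Minimal sketch only, not tested against your environment: assumes a Fabric
# notebook (where `spark` is pre-defined), the Databricks JDBC driver on the
# Spark session, and placeholder host/path/token/table/column names.

year = 2021  # run once per yearly slice instead of one 18 GB pull

jdbc_url = (
    "jdbc:databricks://<workspace-host>:443/default;"
    "transportMode=http;ssl=1;httpPath=<sql-warehouse-http-path>;"
    "AuthMech=3;UID=token;PWD=<personal-access-token>"
)

# Read only one year's partition from the Databricks source table.
partition_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option(
        "query",
        f"SELECT * FROM source_schema.big_table WHERE year(event_date) = {year}",
    )
    .load()
)

# Append the slice into the Lakehouse Delta table; each yearly run adds its partition.
partition_df.write.format("delta").mode("append").saveAsTable("final_table")
```

Running the slices one after another, or orchestrating them with a pipeline loop, keeps each individual run well under the time limit.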
If I have misunderstood your needs or you still have problems, please feel free to let us know.
Best Regards,
Hammad.
So I wanted to ask you, what's the best way to transfer this amount of data through a Dataflow?
Hi @avisri,
Thanks for reaching out to the Microsoft Fabric community forum, and yes, Fabric Dataflow Gen2 currently has an effective execution time limit of 4 hours per run, even with higher SKUs like F128. Since your job ran for exactly 4:00:43 and then failed, it's very likely that it hit this hard timeout limit.
Given your scenario (18GB, 120M+ rows, 160+ columns, full load), even without transformations, this volume can push Dataflow Gen2 close to or beyond the timeout window, especially when moving from external sources like Databricks.
If I have misunderstood your needs or you still have problems, please feel free to let us know.
Best Regards,
Hammad.