Hello!
I have been attempting to use Dataflow Gen2 CI/CD with parameters, with very little success. The goal was to reuse the dataflow within a pipeline For Each activity using parallel/concurrent executions.
I have experimented with altering batch size (even going as low as 2), but have had to move to sequential execution, as it seems this functionality simply does not work reliably.
The pattern of errors I have encountered:
1) The pipeline fails when running the dataflow: Dataflow refresh job failed with status:
Failed. Error Info: { errorCode: JobInstanceStatusFailed, message: Job instance failed without detail error, requestId: 609e5561-ad0d-40ca-b5df-20d6a0dcad86 }
Oftentimes this error appears to be thrown while the pipeline is checking the dataflow refresh status; the dataflow can still succeed, yet the error stays with the pipeline run.
2) When the dataflow actually does fail, drilling into the errors on the dataflow run details simply results in the Fabric interface itself throwing an error.
I am hoping this feedback can be passed on to the product team, as it is frustrating not being able to see why things are failing when executed concurrently; these issues do not occur when executing sequentially.
Thanks!
Internally, Dataflow Gen2 jobs may hit compute or metadata contention when too many parallel refreshes are triggered in a short interval. The Dataflow execution engine sometimes fails to forward exceptions back to the parent pipeline, and the Fabric UI does not yet provide robust debugging/logging for concurrent Dataflow executions. Please note that Dataflow Gen2 is still evolving its support for robust parameter handling in parallel executions, especially when resource reuse or contention is present.
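Because exceptions are not always forwarded, the status the pipeline reports may not match the status of the dataflow refresh itself. Below is a minimal sketch of checking the refresh directly; it assumes the Fabric Job Scheduler REST endpoint (GET /v1/workspaces/{workspaceId}/items/{itemId}/jobs/instances/{jobInstanceId}) applies to Dataflow Gen2 refresh runs, and all IDs and the token are placeholders:

```python
import requests

# Placeholder values for illustration only.
WORKSPACE_ID = "<workspace-id>"
DATAFLOW_ID = "<dataflow-item-id>"
JOB_INSTANCE_ID = "<job-instance-id>"   # the dataflow refresh run to inspect
TOKEN = "<aad-bearer-token>"

# Query the job instance directly; the status returned here may differ from
# what the pipeline activity reported when exception forwarding fails.
url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{DATAFLOW_ID}/jobs/instances/{JOB_INSTANCE_ID}"
)
resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()

instance = resp.json()
# Compare this status with the pipeline activity result to confirm whether the
# refresh actually succeeded despite a JobInstanceStatusFailed error.
print(instance.get("status"), instance.get("failureReason"))
```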
As a temporary workaround, try the following:
That said, it is always ideal to follow the best practices listed below:
Please give Kudos and Accept as Solution if the reply was helpful; it will benefit other community members who face the same issue.
Thank you @Vinodh247 for the detailed response!
For others potentially running into this issue:
Throttle parallelism: Instead of completely disabling parallelism, set Batch count = 2 and introduce a Wait activity (2–5 seconds) between executions to reduce race conditions. I experimented with different batch sizes and wait times (even using rand() to randomize dataflow kick-offs); while this did improve things, it never fully resolved the issue. A sketch of the same throttling idea, applied outside the pipeline, is shown after this list.
Isolate Dataflows per iteration: If possible, clone the dataflow for testing and assign different names to each execution path to test whether the issue is caused by shared state or metadata locks. I am actually doing this in another project and it works well; the issue is that there is a limit on the number of workspace items, and it becomes extremely onerous when there are more than a handful of iterations.
Avoid Parameter Binding in Highly Parallel Jobs: Parameters in Dataflow Gen2 often get lost or mismatched when triggered concurrently. For now, externalize transformations to Notebooks or Spark job definitions if possible. This is unfortunate; hopefully a more robust and performant option will be available via Dataflow Gen2 in the future.
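To make the throttling point concrete, here is a minimal sketch of staggered, concurrency-capped refresh triggering outside the pipeline. It assumes the Fabric Job Scheduler on-demand job endpoint; the jobType value, the parameter payload shape, and all IDs/tokens are placeholders or assumptions rather than confirmed API details:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Placeholder values for illustration only.
WORKSPACE_ID = "<workspace-id>"
DATAFLOW_ID = "<dataflow-item-id>"
TOKEN = "<aad-bearer-token>"

def trigger_refresh(region: str) -> int:
    # Stagger submissions with a small random delay (jitter) so concurrent
    # refreshes do not all hit the service in the same instant.
    time.sleep(random.uniform(2, 5))

    # On-demand refresh via the Fabric Job Scheduler REST API; the jobType
    # value for Dataflow Gen2 is an assumption here.
    url = (
        f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
        f"/items/{DATAFLOW_ID}/jobs/instances?jobType=Refresh"
    )
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {TOKEN}"},
        # Hypothetical payload shape for passing a dataflow parameter per iteration.
        json={"executionData": {"parameters": {"Region": region}}},
    )
    return resp.status_code

# Cap concurrency at 2, mirroring "Batch count = 2" in the For Each activity.
regions = ["east", "west", "north", "south"]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(trigger_refresh, regions))
print(results)
```

The same cap-plus-jitter idea applies inside the pipeline itself (Batch count plus a Wait activity); the sketch simply shows it in code form for testing outside the pipeline.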
I will be sure to provide feedback via Fabric; I was not aware of that option.
Thanks again!