
MikeH_SDE
Regular Visitor

Data Replication in Dev & Test Workareas for Data Engineering

We are going to have Dev, Test, and Production Data Engineering workareas managed by deployment pipelines. We are dealing with a very large enterprise-scale telecoms data lakehouse. What are the methods and best practices for selectively replicating the lakehouse to the Test workarea, and to Dev with its multiple feature branches? We do not want to write anything to Prod tables until the pipelines are in Prod.

Thanks

Mike

1 ACCEPTED SOLUTION

Hi @MikeH_SDE,

Thanks for reaching out to the Microsoft Fabric community forum.

 

Yes, you can create a trigger configuration table that specifies which pipelines run in each environment (Dev, Test, and Production). The table can hold flags indicating which pipelines are active per environment, and the deployment process should include logic that reads this table and determines which pipelines are enabled for the current deployment context.
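As an illustrative sketch of that idea (using an in-memory SQLite table as a stand-in for a lakehouse config table; all table, column, and pipeline names here are hypothetical, not part of any Fabric API):

```python
import sqlite3

# In-memory SQLite stands in for a lakehouse config table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pipeline_config (
        pipeline_name TEXT NOT NULL,
        environment   TEXT NOT NULL,    -- 'Dev', 'Test', or 'Prod'
        enabled       INTEGER NOT NULL, -- 1 = active in this environment
        PRIMARY KEY (pipeline_name, environment)
    )
""")

# Flag which pipelines are active per environment.
conn.executemany(
    "INSERT INTO pipeline_config VALUES (?, ?, ?)",
    [
        ("ingest_cdr_daily",      "Dev",  1),
        ("ingest_cdr_daily",      "Test", 1),
        ("ingest_cdr_daily",      "Prod", 1),
        ("full_history_backfill", "Dev",  0),  # too expensive for Dev/Test
        ("full_history_backfill", "Test", 0),
        ("full_history_backfill", "Prod", 1),
    ],
)

def enabled_pipelines(env: str) -> list[str]:
    """Return the pipelines flagged as active for the given environment."""
    rows = conn.execute(
        "SELECT pipeline_name FROM pipeline_config "
        "WHERE environment = ? AND enabled = 1 ORDER BY pipeline_name",
        (env,),
    )
    return [name for (name,) in rows]
```

A deployment script or orchestrating pipeline would call `enabled_pipelines(current_env)` and trigger only those, so the Prod-only backfill never runs in Dev or Test.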

 

By using configurable pipeline controls and execution logic keyed to the deployment environment, you can avoid unnecessary data backfilling in your Dev and Test environments. This saves resources and keeps the environments clearer and easier to manage for testing and development.

 

  • Create a configuration process where each pipeline has a flag or setting that indicates its intended environments (Dev, Test, or Prod). Modify your deployment process to execute only the pipelines enabled for the current environment context.
  • For example, when deploying to Dev, run only the pipelines marked for Dev execution in your configuration. The same logic can be applied in your deployment scripts for Test and Prod.
  • Use staging tables or intermediate storage to prepare data before it is replicated to Dev and Test. This allows transformation and cleansing without touching production assets. Implement a pipeline that pulls data from the staging area into the Dev/Test environments based on the configuration settings.
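The environment-gated execution described in these steps might be sketched as follows (pure Python; the pipeline registry, environment variable name, and pipeline names are all hypothetical stand-ins for actual Fabric pipeline invocations):

```python
import os

# Hypothetical registry: pipeline name -> environments it may run in.
PIPELINES = {
    "stage_to_dev_copy":   {"Dev", "Test"},          # pulls prepared data from staging
    "incremental_load":    {"Dev", "Test", "Prod"},
    "historical_backfill": {"Prod"},                 # never backfill Dev/Test
}

def pipelines_to_run(env: str) -> list[str]:
    """Select only the pipelines flagged for the current deployment context."""
    return sorted(name for name, envs in PIPELINES.items() if env in envs)

# The deployment context would typically come from a variable set per workspace.
current_env = os.environ.get("DEPLOY_ENV", "Dev")
print(pipelines_to_run(current_env))
```

The point of the design is that the same deployed artifacts run everywhere; only the small environment flag changes per stage, so nothing writes to Prod tables until the context actually is Prod.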

For more detail, please refer to the documentation:

https://learn.microsoft.com/en-us/fabric/cicd/deployment-pipelines/deploy-content?tabs=new#deploying...

https://learn.microsoft.com/en-us/fabric/cicd/deployment-pipelines/get-started-with-deployment-pipel...

 

I hope these suggestions give you good ideas. If you need any further assistance, feel free to reach out.

 

If this post helps, then please give us Kudos and consider accepting it as a solution, to help other members find it more quickly.

 

Thank you. 

 


3 REPLIES
MikeH_SDE
Regular Visitor

@v-tsaipranay Thanks for the advice. Two things we still need to solve are:

  1. How do we make changes and fixes, such as adding new columns and populating them for historical data, or correcting bad data, without rerunning the whole pipeline history?
  2. Where do we create and keep our development assets, such as data-discovery notebooks? We will want to keep these and use them on production data for discovery tests, though they would need to be built against development data. If we promote those development, discovery, and test items with the dev pipelines, it could get messy. I am considering separate engineering and analytics workspaces: engineering would have source control and be used for pipelines, while analytics would not, and would be used by analysts who don't want the hassle of source control, and for the engineers' tests, etc.
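For question 1, one common lakehouse pattern is to evolve the schema in place and backfill only the affected rows, rather than replaying pipeline history. A minimal sketch of that pattern (SQLite here stands in for Spark SQL on Delta tables, where `ALTER TABLE ... ADD COLUMNS` and targeted `UPDATE`/`MERGE` are supported; the table and column names are made up):

```python
import sqlite3

# SQLite stands in for a Delta table (hypothetical telecoms call table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_id INTEGER PRIMARY KEY, duration_s INTEGER)")
conn.executemany("INSERT INTO calls VALUES (?, ?)", [(1, 60), (2, 125), (3, 30)])

# 1. Add the new column in place -- existing rows get NULL, no history replay.
conn.execute("ALTER TABLE calls ADD COLUMN duration_band TEXT")

# 2. Backfill only the rows that need it with a targeted UPDATE
#    (on Delta this would be an UPDATE or MERGE, not a full reload).
conn.execute("""
    UPDATE calls
    SET duration_band = CASE WHEN duration_s >= 120 THEN 'long' ELSE 'short' END
    WHERE duration_band IS NULL
""")

# 3. Correct known-bad data surgically, scoped by a predicate.
conn.execute("UPDATE calls SET duration_s = 35 WHERE call_id = 3")
```

The scoping predicates (`WHERE duration_band IS NULL`, `WHERE call_id = 3`) are what keep the fix from touching rows that are already correct.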
MikeH_SDE
Regular Visitor

I am wondering, since I believe data does not promote (only definitions and processes promote), whether we can have a trigger configuration table that sets which pipelines run in which environment.

The deployment process would change which environment column is checked, so that in Dev and Test we only run the pipelines we need. Otherwise Dev and Test would backfill everything on a test run anyway.
