Solved: Re: Medallion Architecture + Data products Life Cycle Management in MS Fabric

SergioTorrinha · ‎01-19-2024

Hi everyone!

For the last couple of days I've been investigating the recommended approach to implementing the medallion architecture (for which I came across this Microsoft documentation) and what should be set up for life cycle management of the data products that one can have/make available in MS Fabric (for which I came across this documentation, also from Microsoft).

After reading this documentation (and a couple more blog posts here and there), I am confused, and that's why I need your help to demystify things. Worth noting that I also read a couple of questions raised on similar topics in the forums here, but none of them had the extra complexity of the life cycle management environments on top.

Warning: This post might be long, so please bear with me.

In simple terms, and if I understood correctly, the suggested/recommended approach to implementing a medallion architecture is something like:

- create a domain specific to each business unit in the organization (in the extreme case, create one domain for data that cuts across the whole organization);

- inside those domains, create workspaces to hold the bronze/silver/gold layers

This will allow for more control over who can access the data layers and to organize the data, which sounds great.

However, going through the life cycle management documentation, shared previously, I understood that, at least: dev, test and prod environments should be created.

If we consider:

d - number of domains

w - number of workspaces or, if you'd like, number of data quality layers

e - number of lifecycle management environments

the number of things to manage would be something like: d * w * e = 'way too large a number of things to maintain over time'

Example: if we consider an organization with 3 business domains, 3 medallion layers (one workspace each), and 3 environments, we end up with 3 * 3 * 3 = 27 workspaces to manage, each with its own artifacts. Ugh! Too much to deal with, is it not?
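To make the d * w * e count concrete, here is a minimal sketch that enumerates one workspace per (domain, layer, environment) combination. The domain names and the naming scheme are hypothetical, chosen only for illustration; nothing here calls a Fabric API.

```python
from itertools import product

# Hypothetical names, chosen only for illustration.
domains = ["sales", "finance", "hr"]        # d = 3
layers = ["bronze", "silver", "gold"]       # w = 3
environments = ["dev", "test", "prod"]      # e = 3


def workspace_names(domains, layers, environments):
    """Return one workspace name per (domain, layer, environment) combination."""
    return [
        f"{domain}-{layer}-{env}"
        for domain, layer, env in product(domains, layers, environments)
    ]


names = workspace_names(domains, layers, environments)
print(len(names))   # d * w * e
print(names[:3])    # first few generated names
```

Adding a fourth domain or a fourth environment to the lists shows how quickly the count grows, since every factor multiplies the others.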

Now,

1 - what is wrong with my interpretation ?

2 - what approaches can one take to mitigate this complexity and still stick to the medallion architecture implementation?

Thank you.

AndyDDC · ‎01-19-2024

Great question @SergioTorrinha. The simple answer is there's nothing wrong with your interpretation. The more environments (and therefore workspaces and items) you have, and the more you separate workspaces into domains, the more the number of artifacts will explode.

But if we dive a little deeper... the reason you may want to have different domains is that different business areas may have their own data lakehouses/warehouses/semantic models etc. But they would be developed and managed by people allocated to that area - e.g. the Data Mesh pattern. Yes it could be done by a central team, but then that central team has to work across all the items in all the workspaces across the domains - not impossible, but there would be a lot of workspaces and items to manage.

Going on your example above (and visualising it), each of the domains would have its own engineering resource to provide the required functionality. This would make it easier to work on, as each group of people only has to work on and manage its specific domain environment. But... it would be a governance challenge to make sure there were frameworks and patterns shared across the teams.

A central team could develop and manage all this, but yes there would be more complexity.

v-cboorla-msft · ‎01-22-2024

Hi @SergioTorrinha

Glad that your query got resolved.

Please continue using Fabric Community for any help regarding your queries.

SergioTorrinha · ‎01-22-2024

Hi @AndyDDC !

I certainly found your post useful, but after thinking about your feedback for a while I still have some questions.

Because resources are limited in any organization, would it make sense, in order to scale down the complexity of having so many environments/areas to manage, to adopt an approach like the one explained below?

For the following, please consider a tiny team (2 or 3 people) that has to deal with all of the organization's data.

- consider a single domain where all organization data will be in;

- consider development workspaces per medallion layer. Therefore, 3 distinct development workspaces;

- consider a single UAT workspace with only silver and/or gold layers in it;

- consider a single Production workspace with only the gold layer in it

In the end, one would end up with 5 workspaces to manage.

Would this architecture make sense, and is it possible to maintain things nice and tidy like this?

EDIT:

After a bit, I realized that one could further simplify the development side of the architecture: what about having a single dev workspace with 3 lakehouses, one for each medallion layer?

In the end, one would end up with 3 workspaces to manage.
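As a rough sketch, the two layouts described above can be written down as plain mappings from workspace to the medallion layers it holds. The workspace names are hypothetical, and the sketch assumes that UAT holds both silver and gold and that each layer in a workspace is one lakehouse.

```python
# Workspace -> medallion layers it holds. All names are hypothetical.
# First proposal: one dev workspace per layer, plus UAT and Production.
layout_v1 = {
    "dev-bronze": ["bronze"],
    "dev-silver": ["silver"],
    "dev-gold": ["gold"],
    "uat": ["silver", "gold"],   # assumption: UAT holds both layers
    "prod": ["gold"],
}

# Simplified proposal: a single dev workspace with three lakehouses.
layout_v2 = {
    "dev": ["bronze", "silver", "gold"],
    "uat": ["silver", "gold"],   # assumption: UAT holds both layers
    "prod": ["gold"],
}


def summarize(layout):
    """Return (workspace count, lakehouse count), one lakehouse per layer."""
    return len(layout), sum(len(layers) for layers in layout.values())


print(summarize(layout_v1))
print(summarize(layout_v2))
```

Under these assumptions the simplification reduces the number of workspaces but not the number of lakehouses, so the data to maintain stays the same while the access boundaries become coarser.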

Also, another question that comes to mind is how one should populate the UAT and Production workspaces with data coming from the dev workspace in a safe and automated way. No, I'm not lazy, I just like to be efficient 😉

I guess this is where Copy Activity or Shortcuts could come in handy, right?

Thank you.

AndyDDC · ‎01-22-2024

You could put everything into a single Workspace really, it's your choice 🙂

But in all seriousness I wouldn't get too hung up about number of workspaces, they allow you to group together items in whatever fashion works for your team, organisation, and security posture. If the team is small, then yes fewer workspaces may work better for your back-end services (lakehouses/warehouses/pipelines/notebooks) etc.

Workspaces do offer resource boundaries though, so you may find a bottleneck if everything runs in a single workspace: Workload management - Microsoft Fabric | Microsoft Learn

SergioTorrinha · ‎01-22-2024

Thanks for the insights @AndyDDC !

Ha!Ha!Ha! I really don't think a single workspace would be ideal in the scenario I described, especially when you would like to share your data products with business users and don't want them to (unintentionally) mess up your notebooks, pipelines, staging tables or any other intermediary artifacts you might be using to generate the data products, which are the ones to be shared – I would like to have control over that, hence my suggestion to create specific workspaces for the UAT and production environments. 😊

AndyDDC · ‎01-22-2024

I really was joking about the single workspace...

Ultimately I think you've answered your own question here.

SergioTorrinha · ‎01-19-2024

@AndyDDC thanks so much for the insights!

I guess your post will give me some food for thought for the next couple of days. 😁

AndyDDC · ‎01-21-2024

Thanks @SergioTorrinha. If you found this useful, please consider marking my answer as the solution.
