Hi all,
We use multiple data sources in our company: Azure Blob Storage, an Azure SQL database, and Snowflake (this list should be consolidated in the future, but that is not what this post is about).
Our head of IT is now pushing to centralize all data in a Fabric lakehouse. This is currently being worked on, but I see a lot of problems with doing this.
Since the Fabric "shortcut" functionality is only available for ADLS Gen2 and S3, we have to set up notebooks and Data Factory pipelines to move the data into the Fabric lakehouse. A semantic model is then built on top of this.
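For context, each of our notebook loads boils down to something like this (a simplified sketch; the server, credentials, and table names are placeholders, not our real ones):

```python
# In a Fabric notebook, `spark` is provided by the runtime; no imports needed.

jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"

# Read the source table in full over JDBC...
df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Sales")    # placeholder table name
    .option("user", "<user>")          # in practice, pulled from a secret store
    .option("password", "<password>")
    .load()
)

# ...and land a second copy of it as a Delta table in the lakehouse.
df.write.format("delta").mode("overwrite").saveAsTable("Sales")
```

Every table loaded like this is one more copy to keep in sync, which leads to the problems below.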
But the problems that I see:
- Now we have two versions of the data, for example one in the SQL database and one in the Fabric lakehouse.
- We have extra infrastructure in which things can go wrong, for example the pipelines moving data from the SQL database and Snowflake to Fabric. This increases the chance that the data in Fabric is not always up to date.
- If data changes in the source system, we have to adjust it in the SQL database, in the Data Factory pipeline, in the lakehouse, and in the semantic model.
- The Fabric cloud costs are GIGANTIC since we are moving a lot of data.
What is your opinion on this? All the Fabric webinars show it done this way: move all the data into the lakehouse. Moving only part of the data to the lakehouse would remove the main advantage Fabric offers (having everything in one place).
Because of the above, I am not a fan of using Fabric as the go-to system... What do you think?
Hi @daircom ,
You’ve raised some very valid concerns, and this is a common challenge when transitioning to a centralized data architecture like Fabric.
Here are a few thoughts and potential alternatives:
Avoid Full Duplication Where Possible
Instead of moving all data into the Fabric Lakehouse, consider a hybrid approach:
Data Virtualization Layer
Data virtualization tooling (from Microsoft or third parties) can help you query data in place without physically moving it, reducing duplication and sync issues.
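For example, a notebook can query the source in place and skip persisting anything to OneLake (a minimal sketch; the connection details, table, and query are placeholders):

```python
# Query the source system directly from a Fabric notebook. The subquery
# is pushed down to the source engine, and nothing is written to OneLake.
# `spark` is provided by the Fabric notebook runtime.

jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"

# Wrapping the query as an aliased subquery makes Spark push it down.
pushdown_query = """
    (SELECT region, SUM(amount) AS revenue
     FROM dbo.Sales
     GROUP BY region) AS sales_by_region
"""

sales_by_region = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", pushdown_query)
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

sales_by_region.show()
```

The result still crosses the wire at query time, but there is no second persisted copy to keep in sync.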
Incremental Loads & Change Tracking
If you must move data, use incremental refresh and change data capture (CDC) to minimize load and reduce latency between source and Fabric.
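A watermark-based incremental load might look like this (a sketch, not production code; `modified_at`, `id`, and the table names are hypothetical):

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# 1. Find the high-water mark already present in the lakehouse copy.
target = DeltaTable.forName(spark, "Sales")
last_loaded = target.toDF().agg(F.max("modified_at")).first()[0]
# (On the very first run last_loaded is None; fall back to a full load.)

# 2. Pull only rows changed since then. For large tables, push this
#    filter into a JDBC subquery instead of filtering after the read.
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>;database=<db>")
    .option("dbtable", "dbo.Sales")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)
changes = source_df.filter(F.col("modified_at") > F.lit(last_loaded))

# 3. Upsert the changes instead of rewriting the whole table.
(
    target.alias("t")
    .merge(changes.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```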
Cost Optimization
Fabric costs can indeed spike with large data volumes, which is another reason to move only the data you actually need.
Semantic Model Layering
Try to decouple your semantic model from the physical storage layer as much as possible. This allows you to adapt the backend without constantly reworking the model.
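One concrete way to do that is a thin "gold" layer that the model binds to, so upstream tables can change without touching the model (a sketch; the table and column names are hypothetical):

```python
# The semantic model binds only to `gold_sales`; if the upstream table
# moves or is renamed, only this one statement has to change.
spark.sql("""
    CREATE OR REPLACE TABLE gold_sales AS
    SELECT id, region, amount, modified_at
    FROM Sales
""")
```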
In short, while centralizing data in Fabric Lakehouse has benefits (governance, performance, unified access), it’s not always practical or cost-effective for every scenario. A selective, hybrid strategy often provides a better balance between control, performance, and cost.
Would love to hear how others are handling this too!
Hi @daircom,
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please "Accept as Solution" and give a 'Kudos' so other members can easily find it.
Thank you,
Pavan.
Hi @daircom,
Thank you for reaching out in Microsoft Community Forum.
Thank you @ObungiNiels and @andrewsommer for the helpful responses.
I hope the information suggested by @ObungiNiels and @andrewsommer was helpful. Please let me know if you have any further questions or would like to discuss this further. If this answers your question, please "Accept as Solution" and give it a 'Kudos' so others can find it easily.
Please continue using Microsoft community forum.
Regards,
Pavan.
Hi @daircom ,
I would agree that your concerns about data being out of sync between your SQL database and Fabric can be addressed by looking into mirroring.
Apart from storing data, I would like to emphasize that Fabric shows great potential for enabling a wide variety of business users to consume reports in one consolidated place. With separate workspaces, appropriate workspace permissions for users, and OneLake security on the way, Fabric as an analytics platform can work well as the single source for analysts retrieving their data.
is this AI generated? 😅
Yes, I would agree with your head of IT that the ultimate goal should be to get all of your data into one place so that it can be used holistically.
However, Microsoft is clearly pushing us towards mirroring as the initial ingestion into the Fabric platform. Mirroring does not have the capacity unit (CU) consumption issues that pipelines do. While mirroring might not be production-ready for some of your sources, like Snowflake, it will be in the very near future.
Please mark this post as solution if it helps you. Appreciate Kudos.