daircom
Resolver II

Open discussion: Should all data be available in Fabric

Hi all,

 

We use multiple data sources in our company, ranging from Azure Blob Storage and a SQL database to Snowflake (the number of sources should be reduced in the future, but that is not what this post is about). 

Our head of IT now pushes to centralize all data in a Fabric lakehouse. This is currently being worked on, but I see a lot of problems in doing this. 

Since the Fabric "shortcut" functionality is only available for sources such as ADLS Gen2 and S3, we have to set up notebooks and Data Factory pipelines to move the data into a Fabric lakehouse. A semantic model is then built on top of this.

But the problems that I see:

- Now we have two versions of the data, for example one in the SQL database and one in the Fabric lakehouse.

- We have extra infrastructure in which things can go wrong: for example, the pipelines moving data from the SQL database and Snowflake to Fabric. This increases the chance that data in Fabric is not always up to date.

- If data changes in the source system, we have to adjust it in the SQL database, in the Data Factory pipeline, in the lakehouse, and in the semantic model.

- The Fabric cloud costs are GIGANTIC since we are moving a lot of data.

 

What is your opinion on this? All the Fabric webinars show it being done this way: move all the data into the lakehouse. Moving only part of the data would remove the main advantage Fabric offers (having everything in one place).

Because of the above, I am not a fan of using Fabric as the go-to system... What do you think? 

1 ACCEPTED SOLUTION
burakkaragoz
Community Champion

Hi @daircom ,

You’ve raised some very valid concerns, and this is a common challenge when transitioning to a centralized data architecture like Fabric.

Here are a few thoughts and potential alternatives:

  1. Avoid Full Duplication Where Possible
    Instead of moving all data into the Fabric Lakehouse, consider a hybrid approach:

    • Use shortcuts for ADLS Gen2 and S3 where possible.
    • For other sources like SQL and Snowflake, evaluate DirectQuery or virtualization options (if available in your scenario) to avoid full duplication.
  2. Data Virtualization Layer
    Tools like Microsoft Data Virtualization (or third-party solutions) can help you query data in-place without physically moving it, reducing duplication and sync issues.

  3. Incremental Loads & Change Tracking
    If you must move data, use incremental refresh and change data capture (CDC) to minimize load and reduce latency between source and Fabric.

  4. Cost Optimization
    Fabric costs can indeed spike with large data volumes. Consider:

    • Partitioning and pruning data.
    • Using smaller capacities with autoscale.
    • Reviewing pipeline frequency and data retention policies.
  5. Semantic Model Layering
    Try to decouple your semantic model from the physical storage layer as much as possible. This allows you to adapt the backend without constantly reworking the model.
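To make point 3 concrete, here is a minimal sketch of watermark-based incremental loading in plain Python. The `modified` column and row shapes are made up for illustration; in Fabric this logic would typically live in a notebook or pipeline step working against Delta tables:

```python
# Minimal sketch of watermark-based incremental loading (illustrative).
# Only rows modified after the last loaded watermark are copied, so each
# run moves a small delta instead of re-copying the full table.

def incremental_load(source_rows, target_rows, watermark_key="modified"):
    """Append source rows newer than the max watermark already in target."""
    last = max((r[watermark_key] for r in target_rows), default=None)
    new_rows = [
        r for r in source_rows
        if last is None or r[watermark_key] > last
    ]
    return target_rows + new_rows

# Example: the target already holds rows up to watermark 2,
# so only the row with watermark 3 is copied on this run.
source = [{"id": 1, "modified": 1}, {"id": 2, "modified": 2}, {"id": 3, "modified": 3}]
target = [{"id": 1, "modified": 1}, {"id": 2, "modified": 2}]
loaded = incremental_load(source, target)
```

The same pattern extends to CDC: instead of filtering on a max timestamp, you consume the source's change feed and apply inserts, updates, and deletes (e.g. via a MERGE) rather than append-only rows.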
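And on point 4, a toy sketch of why partitioning plus pruning cuts compute. This is pure Python and purely illustrative; in a lakehouse it corresponds to, for example, writing Delta tables partitioned by a date column so that filtered queries skip irrelevant files entirely:

```python
# Illustrative sketch of partition pruning: data laid out by date lets a
# query touch only the partitions it needs instead of scanning everything.
from collections import defaultdict

def partition_by(rows, key):
    """Group rows into partitions keyed by the given column."""
    parts = defaultdict(list)
    for r in rows:
        parts[r[key]].append(r)
    return dict(parts)

def pruned_scan(parts, wanted_dates):
    """Read only the partitions matching the filter, skipping the rest."""
    return [r for d in wanted_dates for r in parts.get(d, [])]

rows = [{"date": "2025-10-01", "v": 1},
        {"date": "2025-10-02", "v": 2},
        {"date": "2025-10-02", "v": 3}]
parts = partition_by(rows, "date")
hits = pruned_scan(parts, ["2025-10-02"])  # scans 1 of 2 partitions
```

The cost argument is the same at scale: pipelines and queries that only touch changed or filtered partitions consume far fewer capacity units than full-table scans.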

In short, while centralizing data in Fabric Lakehouse has benefits (governance, performance, unified access), it’s not always practical or cost-effective for every scenario. A selective, hybrid strategy often provides a better balance between control, performance, and cost.

Would love to hear how others are handling this too!

 


7 REPLIES
Anonymous
Not applicable

Hi @daircom,

I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please "Accept as Solution" and give a 'Kudos' so other members can easily find it.

Thank you,
Pavan.

Anonymous
Not applicable

Hi @daircom,

Thank you for reaching out on the Microsoft Community Forum.

Thank you @ObungiNiels and @andrewsommer for the helpful responses.

As suggested by ObungiNiels and andrewsommer, I hope this information was helpful. Please let me know if you have any further questions or if you'd like to discuss this further. If this answers your question, please "Accept as Solution" and give it a 'Kudos' so others can find it easily.

Please continue using the Microsoft Community Forum.

Regards,
Pavan.

ObungiNiels
Resolver III

Hi @daircom , 

I would agree with your concerns; the risk of data being out of sync between your SQL database and Fabric can be addressed by looking into mirroring. 

Apart from storing data, I would like to emphasize that Fabric shows great potential for enabling a wide variety of business users to consume reports in a consolidated place. With separate workspaces, corresponding workspace rights for users, and OneLake security coming up, Fabric as an analytics platform can work well as the single source for analysts when retrieving their data. 

is this AI generated? 😅

andrewsommer
Super User

Yes, I would agree with your head of IT that the ultimate goal should be to get all of your data in one place so that it can be used holistically. 

 

However, Microsoft is clearly pushing us towards mirroring as the initial ingestion into the Fabric platform. Mirroring does not have the capacity unit (CU) issues that pipelines do. While mirroring might not be production-ready for some of your sources, like Snowflake, it will be in the very near future.

 

Please mark this post as solution if it helps you. Appreciate Kudos.

 

Thanks for your answer, @andrewsommer!
