Treichs
Frequent Visitor

Cost-Efficient Fabric Architecture Guidance for Multi-System Org

Hi all,

 

I’m looking for advice on cost-effective architectural patterns in Microsoft Fabric, specifically around Capacity Unit usage.

I’m standing up Fabric at a global post-production company (~2000 staff, multiple business units), but with limited resources, low data maturity, and a tight budget. We’re working to consolidate data from several systems, including:

  • AWS RDS for PostgreSQL (via gateway)

  • D365 F&O (Finance & HR)

  • UKG Pro (Payroll)

  • Dataverse (CRM Extensions via Model Driven App)

Question:

Given these sources, my goal is to build a solid medallion architecture, but I need help understanding which approach minimizes CU consumption while still enabling:

  • Frequent incremental refresh (hourly for some sources)

  • Archiving strategies (granular vs. summarized)

  • Support for ~60 tables, some with 10M+ rows

Options I’m weighing:

  • Lakehouse vs. Warehouse for transformed and semantic data

  • When to use Dataflows Gen2 vs. Pipelines vs. Notebooks

  • Where to store analysis-ready data for lowest ongoing CU impact

Any tips, design patterns, or lessons from similar orgs would be hugely appreciated—especially if you’ve prioritized cost control in your Fabric rollout.

 

Cheers!

1 ACCEPTED SOLUTION
v-achippa
Community Support

Hi @Treichs,

 

Thank you for reaching out to Microsoft Fabric Community.

 

Thank you for sharing the detailed scenario. Based on your requirements, here is a cost-efficient and scalable way to build the Fabric architecture:

  • Use Dataflows Gen2 for ingestion; for straightforward copy-and-transform jobs it often consumes less capacity than a full Pipeline. Dataflows Gen2 supports incremental refresh, so schedule hourly loads only where needed, and it connects well to your sources (PostgreSQL, D365, Dataverse and UKG).
  • Use a Lakehouse for storage and transformations; it is more cost-effective than a Warehouse and handles large tables well. Run transformations in Notebooks and schedule them only when needed.
  • For the semantic layer, connect Power BI to the Lakehouse using Direct Lake mode; this avoids full import refreshes and keeps ongoing CU usage low.
  • Use a Warehouse only if users need full T-SQL features or run frequent ad hoc SQL queries on the data.
  • For archiving, keep raw, detailed data in the Lakehouse Bronze layer and keep cleaned, summarized data in the Silver/Gold layers. Purge raw data older than a few months to save storage and cost (a notebook-based sketch of the purge follows this list).
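
For the purge step specifically, here is a minimal PySpark sketch of the kind of scheduled Fabric notebook that could do it. The table name (bronze.transactions), the load_date column and the six-month retention window are illustrative assumptions, not details from the original scenario:

```python
# Minimal sketch for a scheduled Fabric notebook: purge Bronze rows older than
# a retention window so raw data does not accumulate storage cost.
# "bronze.transactions" and "load_date" are placeholder names; "spark" is the
# session pre-created in Fabric notebooks.
RETENTION_MONTHS = 6  # keep roughly the last six months of raw data

spark.sql(f"""
    DELETE FROM bronze.transactions
    WHERE load_date < add_months(current_date(), -{RETENTION_MONTHS})
""")

# Optionally reclaim storage afterwards; Delta keeps removed files for a
# safety window before VACUUM can physically delete them.
spark.sql("VACUUM bronze.transactions")
```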

This setup will give you a solid medallion architecture with controlled CU usage, support for large datasets, frequent refresh and future scalability.

 

 

If this post helps, please consider accepting it as the solution to help other members find it more quickly, and don't forget to give a "Kudos" – I’d truly appreciate it!

 

Thanks and regards,

Anjan Kumar Chippa


3 REPLIES

Hi @v-achippa 

 

Thanks so much for the response – much appreciated!

 

Quick follow-up question regarding two areas:

  1. Archiving Raw Data in Bronze:
    Regarding purging raw data older than a few months, which tool would you recommend for this (e.g. a Pipeline, a Notebook, etc.)?

  2. Custom Views for Excel Users:
    Currently, I'm using Dataflows Gen1 within Power BI Pro workspaces (until our Fabric capacity is further developed) to create curated views that users can connect to directly from Excel via Power Query. This works fine, but I'd love your thoughts on whether this is the best approach, or whether there's a more optimal pattern for supporting ad hoc Excel access to curated datasets without incurring high compute costs.

 

Cheers,

burakkaragoz
Community Champion

Hi @Treichs ,

You’ve provided a great overview of your scenario and the challenges around cost-efficient architecture in Microsoft Fabric, especially with Capacity Unit (CU) usage. Here are some recommendations and considerations based on your requirements:

1. Medallion Architecture and CU Optimization

  • Building a solid medallion architecture (Bronze/Silver/Gold layers) is a best practice for managing data maturity and cost. Try to keep your Bronze layer as raw as possible, and push heavy transformations to the Silver or Gold layer where data volume is reduced.
  • Store as much raw and infrequently accessed data in Lakehouse objects with Parquet format to minimize ongoing storage and CU costs. Use Warehouse only for highly curated, query-optimized data that needs strong SQL semantics or high concurrency.
  • For incremental refresh, consider partitioning your fact tables and using incremental loading via Dataflows Gen2 or Pipelines to avoid full table scans and reduce CU consumption (see the sketch after this list).
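
To make the incremental-load idea concrete, here is a minimal PySpark sketch of a watermark-based upsert into a Silver Delta table. The table names (bronze.orders, silver.orders), the key column (order_id) and the watermark column (updated_at) are assumptions for illustration only:

```python
# Minimal sketch of an incremental (watermark-based) load from Bronze into a
# Silver Delta table, avoiding full reloads. All names are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

silver = DeltaTable.forName(spark, "silver.orders")

# Highest watermark already present in Silver (seed the table on the first run,
# or treat a null watermark as "load everything").
last_loaded = (
    spark.table("silver.orders")
    .agg(F.max("updated_at").alias("wm"))
    .first()["wm"]
)

# Pull only Bronze rows newer than the watermark, then upsert by business key.
increment = (
    spark.table("bronze.orders")
    .where(F.col("updated_at") > F.lit(last_loaded))
)

(
    silver.alias("t")
    .merge(increment.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```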

2. Incremental Refresh & Archiving

  • Use Dataflows Gen2 for ETL/ELT and take advantage of incremental refresh policies. Pipelines are powerful for orchestrating complex workflows, but Dataflows are often more cost-effective for straightforward ingestion and transformation jobs.
  • Archive old/granular data to colder storage in your Lakehouse and keep only summarized or recent data in performance-critical layers (Silver/Gold or Warehouse); a sketch of this pattern follows the list.
  • Evaluate Power BI’s incremental refresh features for semantic models if you later plan to build reports on top of your Gold layer.
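
As one way to implement the archiving point above, the sketch below moves granular rows older than a hot window from Silver into a separate archive table in the Lakehouse. The table names, the order_date column and the 12-month window are assumptions, and the archive table is assumed to already exist with the same schema as the Silver table:

```python
# Sketch: move granular rows older than the hot window out of Silver into a
# cheaper "archive" Lakehouse table, so the performance-critical layer stays
# small. "archive.orders_history" is assumed to exist with the same schema
# as "silver.orders"; names and the window are placeholders.
HOT_MONTHS = 12

spark.sql(f"""
    INSERT INTO archive.orders_history
    SELECT * FROM silver.orders
    WHERE order_date < add_months(current_date(), -{HOT_MONTHS})
""")

spark.sql(f"""
    DELETE FROM silver.orders
    WHERE order_date < add_months(current_date(), -{HOT_MONTHS})
""")
```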

3. Lakehouse vs Warehouse

  • Favor Lakehouse for large-scale, transformed, or historical data that does not require low-latency SQL access or many concurrent users. Lakehouse storage is cheaper and scales better for big data.
  • Use Warehouse for interactive analytics, semantic modeling, and scenarios needing high concurrency or where you leverage features like stored procedures.
  • A mixed approach often works best: Lakehouse for storage and staging, Warehouse for semantic/serving layer.

4. Dataflows Gen2 vs Pipelines vs Notebooks

  • Dataflows Gen2: Great for repeatable, declarative ETL processes, especially with incremental refresh.
  • Pipelines: Good for orchestrating and scheduling multi-step workflows, including Dataflows, Notebooks, and external activities.
  • Notebooks: Best for advanced data processing, ML, or ad-hoc data wrangling tasks, but may have higher CU usage for large data volumes.

5. Storing Analysis-Ready Data

  • Store analysis-ready data in the Lakehouse Gold layer for cost efficiency, then load it into Warehouse only as needed for reporting (see the sketch after this list).
  • Use Parquet format wherever possible for storage and query efficiency.
  • Monitor your CU usage and adjust retention/refresh strategies as your workload grows.
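
As a concrete example of the first two bullets, the sketch below materializes a summarized Gold table as a Delta (Parquet-backed) table and compacts it for efficient reads. The source table, grouping columns and measures are illustrative assumptions:

```python
# Sketch: materialize an analysis-ready Gold table from Silver detail and
# compact it so downstream queries scan fewer, larger files.
# Table, column and measure names are placeholders.
from pyspark.sql import functions as F

(
    spark.table("silver.orders")
    .groupBy("order_date", "business_unit")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.countDistinct("order_id").alias("order_count"),
    )
    .write.mode("overwrite")
    .format("delta")
    .saveAsTable("gold.sales_daily")
)

# Compact the small files produced by repeated refreshes.
spark.sql("OPTIMIZE gold.sales_daily")
```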

6. General Cost Control Tips

  • Monitor usage and performance in the Fabric Admin Portal to spot expensive queries or workloads.
  • Use incremental refresh and partitioning aggressively to reduce compute costs.
  • Prioritize automation and orchestration to reduce manual intervention and resource wastage.

Summary Table

Layer     | Purpose                  | Storage Type        | CU Impact
----------|--------------------------|---------------------|---------------------------
Bronze    | Raw data storage         | Lakehouse           | Low (storage only)
Silver    | Cleaned/transformed data | Lakehouse           | Moderate
Gold      | Curated/analysis-ready   | Lakehouse/Warehouse | Moderate/High (on demand)
Warehouse | Reporting/semantic layer | Warehouse           | Higher (per query)

Let me know if you’d like more specific design patterns, sample architectures, or if you have more details about your data volumes and refresh needs. Happy to help further!

Hope this helps with your Fabric rollout!
