Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Level up your Power BI skills this month - build one visual each week and tell better stories with data! Get started

Reply
maverickf17
Helper I
Helper I

Expert Guidance Needed: Handling Billions of Rows in Microsoft Fabric + Power BI (Phase 2 Develop)

Hello Everyone,

I’m looking for advice from experts who have worked with large-scale datasets in Microsoft Fabric + Power BI. We are entering Phase 2 of our reporting solution, and I want to ensure the right architecture is in place from the beginning.


🔹 Current Setup (Phase 1)

  • Our data originally sat in Azure Synapse → Power BI refreshes used to take ~6 hours.

  • The data has now been migrated to Microsoft OneLake (Fabric) → refresh time has improved to ~1 hour.

  • In Phase 1, the approach taken was:

    • Import Mode in Power BI.

    • The entire raw table (billions of rows) was imported into Power BI.

    • Data modelling and transformations were handled inside Power BI instead of upstream in Fabric.

This worked for the proof of concept, but it’s clearly not scalable or efficient for long-term needs.


🔹 Phase 2 Requirements

For the next phase, here are the key requirements we want to achieve:

  1. Performance:

    • We are working with billions of rows.

    • Need refresh/query performance in minutes, not hours.

  2. Separation of concerns:

    • Power BI should not do heavy lifting.

    • All transformations, joins, and aggregations must be handled in Fabric.

    • Power BI should only consume curated, clean, star-schema-ready data.

  3. Technology stack:

    • We are fully on Microsoft Fabric (OneLake, Lakehouse, Warehouse, Dataflows Gen2).

  4. Refresh strategy:

    • We’re open to Incremental Refresh or Full Refresh depending on best practice.

    • The goal is to minimize refresh times and ensure scalability.


🔹 Options We Are Considering

Option A – Direct Lake Mode (Fabric-native)

  • Power BI connects in Direct Lake mode to Fabric tables.

  • Eliminates traditional refresh bottlenecks – near real-time with import-like performance.

  • Scales well with billions of rows.

  • Reduces maintenance overhead (no refresh partitions to manage).

Option B – Import Mode with Incremental Refresh

  • Import curated fact + dimension tables into Power BI.

  • Use Incremental Refresh to only load new/changed data.

  • Suitable for periodic loads but still requires refresh management.


🔹 What I’d Like Expert Advice On

  1. For billions of rows in Fabric, is Direct Lake Mode the clear best option, or should we still consider Incremental Refresh in Import Mode for stability?

  2. What are the best practices for modelling in Fabric (Lakehouse/Warehouse) before exposing data to Power BI?

    • Example: Should we push all transformations into Fabric pipelines and only expose star schema tables?

  3. Any recommended architecture patterns to balance:

    • Speed (minutes, not hours)

    • Scalability (billions of rows, additional columns)

    • Governance (no raw tables inside Power BI, maintain a clean semantic layer)


🔹 Key Constraints / Specs

  • Dataset size: Billions of rows (raw table + additional columns in Phase 2).

  • Current Phase 1 report: Import Mode, raw tables loaded → not sustainable.

  • Tools available: Microsoft Fabric (OneLake, Lakehouse, Warehouse, Dataflows Gen2).

  • Goal: Curated, clean data in Power BI with minimal refresh times.


Any insights, architectural suggestions, or real-world experiences would be hugely valuable.

Thanks,
Arjun 



1 ACCEPTED SOLUTION
v-agajavelly
Community Support
Community Support

Hi @maverickf17 ,
 

Direct Lake Mode Is the Game-Changer.

Your instincts about Direct Lake Mode are spot-on. It’s not just a theoretical improvement it’s a real-world performance enabler. In fact, your own accepted solution in the Microsoft Fabric Community outlines why Direct Lake is ideal for both dimension tables and massive fact tables:

  • Dimension Tables: Avoids duplication across datasets, centralizes storage, and improves maintainability.
  • Fact Tables: Handles billions of rows with no refresh delays, low memory pressure, and near real-time updates thanks to vectorized execution on Delta Parquet format.

This aligns with broader community consensus: Direct Lake offers import-like speed without the refresh overhead, making it the preferred choice for scale and simplicity.

Modelling Best Practices in Fabric

  1. Push All Transformations Upstream

Fabric is built for heavy lifting. Experts strongly advise:

  • Do all joins, filters, and aggregations in Fabric pipelines or Dataflows Gen2.
  • Expose only clean, curated star-schema tables to Power BI.

This separation of concerns ensures Power BI remains lean and focused on visualization not data prep.

  1. Use Lakehouse or Warehouse Strategically
  • Lakehouse: Ideal for flexible, large-scale storage and Direct Lake mode.
  • Warehouse: Better for structured SQL workloads and governance.

You can even mix both depending on your use case. For example, use Lakehouse for raw ingestion and Warehouse for curated semantic layers. [Performanc...soft Learn]

Architecture Patterns That Work

Star Schema + Direct Lake

  • Build dimensional models in Fabric.
  • Use Direct Lake to connect Power BI to curated tables.
  • Avoid loading raw tables into Power BI.

Partitioning & Pre-Aggregation

  • Partition large tables by date or region in Fabric notebooks or pipelines.
  • Pre-aggregate where possible to reduce query load. [Efficient...es and ...]

Use Fabric Pipelines + Dataflows Gen2

  • Automate transformations and refresh logic.
  • Keep Power BI free from ETL logic.

Performance & Scalability Tips

  • Avoid Bi-Directional Relationships unless absolutely necessary.
  • Minimize column cardinality and remove unused fields.
  • Use numeric keys for joins and indexing.
  • Turn off Auto Date/Time in Power BI to reduce model bloat.
  • Use Composite Models if you need both aggregated and detailed views. [Solved: Re...rosoft ...]

Governance & Maintainability

  • No raw tables in Power BI only expose semantic layers.
  • Use Purview for metadata and lineage tracking.
  • Centralize reusable dimensions to avoid duplication.
  • Monitor performance using Fabric’s built-in tools.  

Final Recommendation,

For your Phase 2 goals speed, scalability, and governance Direct Lake Mode is the clear winner. It simplifies architecture, reduces refresh complexity, and scales effortlessly. But it only shines when paired with strong upstream modelling in Fabric.

If you’re still evaluating, try a hybrid approach.

  • Use Direct Lake for fact tables.
  • Use Import Mode with Incremental Refresh for slowly changing dimensions or legacy compatibility.

Regards,
Akhil.

View solution in original post

4 REPLIES 4
v-agajavelly
Community Support
Community Support

Hi @maverickf17 ,

I hope the response provided helped in resolving the issue. If you still have any questions, please let us know we are happy to address.

Thanks,
Akhil.

v-agajavelly
Community Support
Community Support

Hi @maverickf17 ,

Just following up again were you able to test the Direct Lake setup in your environment? If you ran into any roadblocks while implementing it, feel free to share the details here and the community can help troubleshoot step by step.

Thanks,
Akhil.

v-agajavelly
Community Support
Community Support

Hi @maverickf17 ,

Thanks for sharing your detailed solution. Your points around Direct Lake Mode and upstream modeling in Fabric are spot on. This approach clearly addresses performance, scalability, and maintainability concerns, especially for large fact tables and centralized dimension tables.

Just checking in did you get a chance to try this setup in your environment yet? If not, happy to guide you through a step-by-step implementation or clarify any part of the recommendation to make it easier to adopt.

Thanks,
Akhil.

v-agajavelly
Community Support
Community Support

Hi @maverickf17 ,
 

Direct Lake Mode Is the Game-Changer.

Your instincts about Direct Lake Mode are spot-on. It’s not just a theoretical improvement it’s a real-world performance enabler. In fact, your own accepted solution in the Microsoft Fabric Community outlines why Direct Lake is ideal for both dimension tables and massive fact tables:

  • Dimension Tables: Avoids duplication across datasets, centralizes storage, and improves maintainability.
  • Fact Tables: Handles billions of rows with no refresh delays, low memory pressure, and near real-time updates thanks to vectorized execution on Delta Parquet format.

This aligns with broader community consensus: Direct Lake offers import-like speed without the refresh overhead, making it the preferred choice for scale and simplicity.

Modelling Best Practices in Fabric

  1. Push All Transformations Upstream

Fabric is built for heavy lifting. Experts strongly advise:

  • Do all joins, filters, and aggregations in Fabric pipelines or Dataflows Gen2.
  • Expose only clean, curated star-schema tables to Power BI.

This separation of concerns ensures Power BI remains lean and focused on visualization not data prep.

  1. Use Lakehouse or Warehouse Strategically
  • Lakehouse: Ideal for flexible, large-scale storage and Direct Lake mode.
  • Warehouse: Better for structured SQL workloads and governance.

You can even mix both depending on your use case. For example, use Lakehouse for raw ingestion and Warehouse for curated semantic layers. [Performanc...soft Learn]

Architecture Patterns That Work

Star Schema + Direct Lake

  • Build dimensional models in Fabric.
  • Use Direct Lake to connect Power BI to curated tables.
  • Avoid loading raw tables into Power BI.

Partitioning & Pre-Aggregation

  • Partition large tables by date or region in Fabric notebooks or pipelines.
  • Pre-aggregate where possible to reduce query load. [Efficient...es and ...]

Use Fabric Pipelines + Dataflows Gen2

  • Automate transformations and refresh logic.
  • Keep Power BI free from ETL logic.

Performance & Scalability Tips

  • Avoid Bi-Directional Relationships unless absolutely necessary.
  • Minimize column cardinality and remove unused fields.
  • Use numeric keys for joins and indexing.
  • Turn off Auto Date/Time in Power BI to reduce model bloat.
  • Use Composite Models if you need both aggregated and detailed views. [Solved: Re...rosoft ...]

Governance & Maintainability

  • No raw tables in Power BI only expose semantic layers.
  • Use Purview for metadata and lineage tracking.
  • Centralize reusable dimensions to avoid duplication.
  • Monitor performance using Fabric’s built-in tools.  

Final Recommendation,

For your Phase 2 goals speed, scalability, and governance Direct Lake Mode is the clear winner. It simplifies architecture, reduces refresh complexity, and scales effortlessly. But it only shines when paired with strong upstream modelling in Fabric.

If you’re still evaluating, try a hybrid approach.

  • Use Direct Lake for fact tables.
  • Use Import Mode with Incremental Refresh for slowly changing dimensions or legacy compatibility.

Regards,
Akhil.

Helpful resources

Announcements
April Power BI Update Carousel

Power BI Monthly Update - April 2026

Check out the April 2026 Power BI update to learn about new features.

Fabric SQL PBI Data Days

Data Days 2026 coming soon!

Sign up to receive a private message when registration opens and key events begin.

New to Fabric survey Carousel

New to Fabric Survey

If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.

Power BI DataViz World Championships carousel

Power BI DataViz World Championships - June 2026

A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.