Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Did you hear? There's a new SQL AI Developer certification (DP-800). Start preparing now and be one of the first to get certified. Register now

Reply
Koraganti77
Frequent Visitor

Help Needed: Optimizing a 3.5-Hour Power BI Semantic Model Refresh - What Would You Do?

The Challenge We're Facing 

Hey Fabric Community! I'm working on a challenging Power BI performance optimization case and would love to get your thoughts and ideas. Here's what we're dealing with:

Current State - The Problem

The Situation:

  • Power BI Semantic Model: Sales Order Report (mission-critical for business)
  • Current Refresh Time: 3.5 hours  (completely unacceptable)
  • Data Source: Synapse Analytics table
  • Volume: ~20 million rows, 20GB dataset
  • Schema: 127 columns (many oversized nvarchar(1000) fields)
  • Business Need: Weekly refresh frequency
  • Target: Get this down to under 30 minutes if possible

Technical Details

Source Table Characteristics:

sql
-- Current Synapse table structure
Spoiler
[XXX].[XXX_sales_order]
- 127 columns total- Many nvarchar(1000) fields that could be right-sized- Key fields: sales_order_no, product_sku, created_on, ship_to_cot- Date range: 104-week rolling window (created_on >= DATEADD(WK, -104, GETDATE()))

Current Power BI Setup:

  • Import Mode with incremental refresh enabled
  • Complex M Query with multiple calculated columns and transformations
  • Heavy business logic in M Query (CASE statements, concatenations, date calculations)
  • Sample M Query complexity: Sales Type categorization with nested conditions

Key Constraints:

  • Cannot modify Synapse source table (not in scope)
  • Must preserve existing M Query business logic (critical requirement)
  • Can migrate to Microsoft Fabric (pilot approved)
  • Have F128 capacity available
  • Weekly refresh frequency is acceptable

What We're Considering

Migration to Fabric Options:

  1. Fabric Warehouse + Direct Lake approach
  2. Fabric Lakehouse with delta tables

Specific Questions for the Community

  1. Direct Lake vs DirectQuery: Given our complex M Query transformations, which mode would perform better?
  2. Schema Optimization: Should we tackle data type optimization first, or focus on migration approach?
  3. F128 Capacity: sufficient for 20M rows with complex transformations!!!
  4. M Query Preservation: Any tips for maintaining complex business logic while optimizing performance?
  5. Incremental Refresh: Best practices for implementing this in Fabric environment?

Success Criteria

  • Primary Goal: Reduce 3.5-hour refresh to under 30 minutes or direct lake mode
  • Secondary Goals: Maintain data accuracy, preserve business logic
  • Nice to Have: Improved query performance for end users

What Would You Do? 

Have you tackled similar large-scale semantic model optimizations?

Any specific Fabric features or approaches you'd recommend?

War stories or lessons learned from similar migrations?

Performance optimization techniques that worked well for your use cases?

Looking forward to hearing your thoughts, ideas, and experiences! Every suggestion helps as we plan our optimization strategy.


 

1 ACCEPTED SOLUTION
v-echaithra
Community Support
Community Support

Hi @Koraganti77 ,

Thank you for reaching out to Microsoft Community.

Inorder to reduce the 3.5-hour refresh time of your Power BI semantic model, try migrating to Microsoft Fabric Lakehouse with Delta tables and Direct Lake mode might be effective strategy. Direct Lake leverages the VertiPaq engine, offering import-mode-like performance while avoiding full data duplication. It loads columnar data directly from Parquet-based Delta tables in OneLake, enabling high-speed access with minimal overhead. This approach far outperforms DirectQuery, which federates queries back to Synapse and performs poorly with complex M logic.
Incremental refresh for semantic models in Power BI - Power BI | Microsoft Learn
Semantic Model Refresh in Fabric: Beyond the Defaults
Direct Lake overview - Microsoft Fabric | Microsoft Learn

As a first step, you should use Dataflows Gen2 or Fabric pipelines to load your Synapse data into a Lakehouse Delta table. During this process, implement schema optimization by trimming oversized nvarchar(1000) fields, pruning unused columns, and flattening any nested structures. These changes significantly reduce memory usage and improve compression, especially by lowering column cardinality. Fabric tools like Power BI usage analytics can help identify which columns are actually needed.

To preserve your critical M Query logic, replicate it upstream using Dataflows Gen2 or Spark notebooks. This allows complex business rules such as CASE statements and transformations to be precomputed before the data hits the semantic model. If you're using semantic link, notebooks can also be a powerful tool for embedding complex logic into Delta tables while maintaining flexibility.

Once the Delta tables are structured, build your Power BI semantic model in Direct Lake mode, mapping the tables via shortcuts. While Direct Lake does not currently support calculated columns or tables that reference Direct Lake sources, these transformations should be handled upstream or refactored into DAX measures within the model. For complex logic, DAX VAR statements can improve readability and performance.

For incremental refresh, define RangeStart and RangeEnd parameters on a reliable created_on column and enable query folding to push the filter logic back to the Lakehouse. To avoid refreshing full partitions unnecessarily, implement change detection based on metadata columns like ModifiedDateTime. This will help maintain efficiency during weekly refreshes. If you need near real-time freshness, consider using Hybrid Tables for recent data.

To ensure smooth operation, take advantage of Fabric's Semantic Model Refresh Pipelines to manage refresh parallelism and retries. Monitor overall system performance using the Fabric Metrics App, which provides insight into refresh duration, query execution, memory usage, and dataset size. Tools like Analyze My Refresh can help pinpoint bottlenecks and identify optimization opportunities.

By migrating to Delta tables in Fabric Lakehouse, optimizing your schema, pushing business logic upstream, and fully utilizing Direct Lake with incremental refresh and change detection, you can expect to bring your Power BI refresh time well below the 30-minute threshold possibly eliminating refresh time altogether. This architecture not only preserves data accuracy and business rules but also offers scalable, maintainable performance moving forward.

Best Regards,
Chaithra E.

View solution in original post

3 REPLIES 3
v-echaithra
Community Support
Community Support

Hi @Koraganti77 ,

Thank you for reaching out to Microsoft Community.

Inorder to reduce the 3.5-hour refresh time of your Power BI semantic model, try migrating to Microsoft Fabric Lakehouse with Delta tables and Direct Lake mode might be effective strategy. Direct Lake leverages the VertiPaq engine, offering import-mode-like performance while avoiding full data duplication. It loads columnar data directly from Parquet-based Delta tables in OneLake, enabling high-speed access with minimal overhead. This approach far outperforms DirectQuery, which federates queries back to Synapse and performs poorly with complex M logic.
Incremental refresh for semantic models in Power BI - Power BI | Microsoft Learn
Semantic Model Refresh in Fabric: Beyond the Defaults
Direct Lake overview - Microsoft Fabric | Microsoft Learn

As a first step, you should use Dataflows Gen2 or Fabric pipelines to load your Synapse data into a Lakehouse Delta table. During this process, implement schema optimization by trimming oversized nvarchar(1000) fields, pruning unused columns, and flattening any nested structures. These changes significantly reduce memory usage and improve compression, especially by lowering column cardinality. Fabric tools like Power BI usage analytics can help identify which columns are actually needed.

To preserve your critical M Query logic, replicate it upstream using Dataflows Gen2 or Spark notebooks. This allows complex business rules such as CASE statements and transformations to be precomputed before the data hits the semantic model. If you're using semantic link, notebooks can also be a powerful tool for embedding complex logic into Delta tables while maintaining flexibility.

Once the Delta tables are structured, build your Power BI semantic model in Direct Lake mode, mapping the tables via shortcuts. While Direct Lake does not currently support calculated columns or tables that reference Direct Lake sources, these transformations should be handled upstream or refactored into DAX measures within the model. For complex logic, DAX VAR statements can improve readability and performance.

For incremental refresh, define RangeStart and RangeEnd parameters on a reliable created_on column and enable query folding to push the filter logic back to the Lakehouse. To avoid refreshing full partitions unnecessarily, implement change detection based on metadata columns like ModifiedDateTime. This will help maintain efficiency during weekly refreshes. If you need near real-time freshness, consider using Hybrid Tables for recent data.

To ensure smooth operation, take advantage of Fabric's Semantic Model Refresh Pipelines to manage refresh parallelism and retries. Monitor overall system performance using the Fabric Metrics App, which provides insight into refresh duration, query execution, memory usage, and dataset size. Tools like Analyze My Refresh can help pinpoint bottlenecks and identify optimization opportunities.

By migrating to Delta tables in Fabric Lakehouse, optimizing your schema, pushing business logic upstream, and fully utilizing Direct Lake with incremental refresh and change detection, you can expect to bring your Power BI refresh time well below the 30-minute threshold possibly eliminating refresh time altogether. This architecture not only preserves data accuracy and business rules but also offers scalable, maintainable performance moving forward.

Best Regards,
Chaithra E.

@v-echaithra , Thank you for the excellent Fabric Lakehouse + Direct Lake recommendations! Your approach is absolutely the gold standard for maximum performance and future-ready analytics. We're implementing a dual-track strategy based on your insights: Phase 1 uses Fabric Warehouse + DirectQuery to achieve immediate wins (eliminating our 3.5-hour refresh completely) while preserving our team's existing T-SQL/M Query skills for a 6-week delivery. This allows us to demonstrate Fabric value quickly with minimal cultural disruption. Phase 2 will then migrate to your recommended Lakehouse + Spark notebooks + Direct Lake architecture as part of our enterprise migration from Synapse Analytics to Fabric engagement. This phased approach lets us achieve 70-80% performance improvement immediately while building toward the 95%+ optimization your Lakehouse solution provides. Your Delta table optimization and upstream logic processing recommendations will be invaluable for our long-term implementation. Thanks for sharing such comprehensive Fabric-native best practices. 

Once again appreciate your time and support to the community.

Best

Srikanth

Thomaslleblanc
Super User
Super User

Direct Lake will not speed up reporting performance, but will help with developing the semantic model and reduce refresh because data is not imported into the model. I feel over time, this performance for reporting will improve as Microsoft invests in the Delta Table and Direct Lake approach.

Helpful resources

Announcements
April Fabric Update Carousel

Fabric Monthly Update - April 2026

Check out the April 2026 Fabric update to learn about new features.

Fabric SQL PBI Data Days

Data Days 2026 coming soon!

Sign up to receive a private message when registration opens and key events begin.

New to Fabric survey Carousel

New to Fabric Survey

If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.