Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Learn from the best! Meet the four finalists headed to the FINALS of the Power BI Dataviz World Championships! Register now

Reply
sailkitty5
Frequent Visitor

Using Aggregate Tables and Partitions

Hi- 

I'm working on a large data project with multiple fact tables that need at least 2 years of data. Fact tables are 40 mill rows + per table per year with 20+ dimension tables 

 

Some of the aggregate tables I am working with are not shrinking the row count that much. Would partitioning the Aggregate table help to increase our end user speed? Is that possible? 

 

We are already (1) query folding using native query Value.NativeQuery(Database,

 

Does "groupby" have to happen in both the SQL query AND the MQuery  to make sure the aggregate gets folded back ?

 

5 REPLIES 5
pcoley
Responsive Resident
Responsive Resident

In Power BI, aggregate tables (often set up as part of aggregations in composite models) are designed to improve query performance by precomputing summaries, but if they're not reducing row counts significantly, that suggests high cardinality in your grouping columns—meaning the aggregation isn't coarsening the data enough to make a big impact on storage or scan times. This is common with detailed dimensions or when aggregations are at a finer grain than typical queries.


Partitioning can help improve end-user query performance, particularly in Import mode or composite models with large imported aggregates. Here's why and how:

  • Performance Benefits: Power BI's Analysis Services engine (VertiPaq) supports partition elimination during queries. If you partition your aggregate table by a commonly filtered column (e.g., date, year, or month), the engine can skip scanning irrelevant partitions. For your scenario with 2+ years of data and 40M+ rows per year, this reduces the amount of data scanned in memory, leading to faster query response times—especially for visuals or reports that filter by time ranges. Without partitioning, the engine might scan the entire table, which is inefficient for large datasets.
  • Is It Possible? Absolutely, but it requires Power BI Premium (or Premium Per User) capacity, as partitioning is managed through Tabular Model Explorer in SQL Server Management Studio (SSMS), XMLA endpoints, or tools like Tabular Editor. You can't set it up directly in Power BI Desktop's UI for imported tables, but it's straightforward:
    • Base partitions on a date column (e.g., one partition per month or quarter).
    • Use incremental refresh policies to only process new or changed partitions during data refresh, which also indirectly aids performance by keeping data fresh without full reloads.
    • In DirectQuery mode, partitioning happens at the source database (e.g., SQL Server partitioning), which can similarly prune data during queries pushed back via folding.

However, partitioning primarily shines for refresh management and query pruning; it won't "shrink" row counts like better-designed aggregations would. If your aggregates aren't reducing rows much, consider revising them:

  • Increase aggregation granularity (e.g., group by higher-level dimensions like quarter instead of day).
  • Use multiple aggregation tables at different levels.
  • Test with the Performance Analyzer in Power BI Desktop to measure before/after impacts.

For your scale (multiple fact tables at 80M+ rows for 2 years, plus dimensions), combine this with other optimizations like column pruning (remove unused columns), composite models (DirectQuery for details, Import for aggregates), and DAX best practices (e.g., avoid calculated columns in large tables).


The GROUP BY does not need to happen in both the native SQL query (via Value.NativeQuery) and the subsequent M Query steps to ensure the aggregation is folded back to the source. In fact, duplicating it could be inefficient or redundant. Let's break it down:

  • How Folding Works: Query folding pushes operations from Power Query (M) back to the data source as native SQL, reducing data transfer and leveraging the database's compute power. Value.NativeQuery(source, "YOUR SQL HERE") explicitly sends your custom SQL (including any GROUP BY) directly to the database, and the result is returned to Power Query. Any M steps after this are applied in-memory on the already-aggregated result—they won't fold back unless the entire query chain is foldable.
  • Best Practice for Aggregations:
    • To maximize folding and performance, perform the GROUP BY (and as much filtering/joining as possible) directly in the SQL inside Value.NativeQuery. This ensures the aggregation happens at the source, minimizing data pulled into Power BI.
    • If you only do GROUP BY in M Query (without native SQL), it might fold if the preceding steps are foldable (e.g., simple filters or selects), translating to a GROUP BY in the auto-generated SQL. But with large datasets like yours, relying on automatic folding can be risky—use the "View Native Query" option in Power Query Editor to confirm what's being pushed.
    • If you need additional transformations after the native query (e.g., further grouping in M), do them sparingly, as they process in-memory. Test folding by right-clicking steps in the editor.

Example of using Value.NativeQuery for aggregated folding:

let Source = Sql.Database("yourserver", "yourdatabase"), AggregatedQuery = Value.NativeQuery(Source, "SELECT Dimension1, SUM(Measure) AS Total FROM FactTable GROUP BY Dimension1 WHERE Date >= '2024-01-01'") in AggregatedQuery
This pushes the entire aggregation to SQL Server (or your DB), and no further M Group By is needed unless you're combining results post-fetch.

If you're seeing poor folding, check for non-foldable steps (e.g., custom functions, merges with non-SQL sources) or database permissions. For your multi-table setup, consider parameterized native queries to handle dynamic filters.

v-echaithra
Community Support
Community Support

Hi @sailkitty5 ,

May I ask if you have resolved this issue? Please let us know if you have any further issues, we are happy to help assist you.

Thank you.

v-echaithra
Community Support
Community Support

Hi @sailkitty5 ,

May I ask if you have resolved this issue? Please let us know if you have any further issues, we are happy to help and  share the sample data in a workable format such as text, an Excel file, or a PBIX file with sample data instead of screenshots. 
.

Thank you.

v-echaithra
Community Support
Community Support

Hi @sailkitty5 ,

Partitioning can help improve query performance, but mainly through partition elimination, not by reducing row count. If your reports commonly filter by Date/Year, partitioning the aggregate table on that column allows the VertiPaq engine to scan only relevant partitions, which can improve end-user response times especially at your scale. This is typically implemented via Incremental Refresh or XMLA-based partition management (Premium/PPU required).

However, if the aggregation itself isn’t significantly reducing rows, that indicates the grouping grain is still too detailed. In that case, revisiting the aggregation design (higher level grain or multiple agg tables) will usually deliver more benefit than partitioning alone.

Regarding folding: the GROUP BY only needs to exist in the native SQL query when using Value.NativeQuery. Adding another Group By step in M is not required and will execute in-memory rather than fold back to the source. Always validate using View Native Query to confirm folding.



Best Regards,
Chaithra E.

 

Natarajan_M
Resolver II
Resolver II

Hi @sailkitty5 : can you share the size of the semantic model and whats the database you are using . 
What is the response time that you are expecting ? it depends upon the visual and underlying dax too..

40 million rows per year is not a huge data, you can still implement incremental refresh on the fact table and import it . 
if you can share more details on this the community can help better . 

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

Share feedback directly with Fabric product managers, participate in targeted research studies and influence the Fabric roadmap.

February Power BI Update Carousel

Power BI Monthly Update - February 2026

Check out the February 2026 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors