We’re thrilled to introduce Automated Table Statistics in Microsoft Fabric Data Engineering — a major upgrade that helps you get blazing-fast query performance with zero manual effort.
Whether you’re running complex joins, large aggregations, or heavy filtering workloads, Fabric’s new automated statistics will help Spark make smarter decisions, saving you time, compute, and money.
In simple terms, table statistics are summary metrics about your data that Spark uses to optimize queries. They include:
Until now, generating these statistics required running manual commands (ANALYZE TABLE) or setting up custom pipelines. With this release, Fabric collects them automatically when you create a new Delta table — no setup or tuning required.
Spark’s Cost-Based Optimizer (CBO) relies on good statistics to:
Without accurate stats, Spark can only guess, which often leads to slow or expensive query plans. With automated stats, Spark has real numbers to plan with, and in our internal benchmarks we’ve seen performance gains of up to about 45% on complex workloads.
You can also fine-tune behavior with configurations:
Enable/disable stats collection:
spark.conf.set("spark.microsoft.delta.stats.collect.extended", "true")
Enable/disable optimizer injection:
spark.conf.set("spark.microsoft.delta.stats.injection.enabled", "true")
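If you flip both switches together, a small helper keeps the keys in one place. This is a minimal PySpark sketch, assuming a Fabric notebook where `spark` is the active session. The helper name `set_automated_stats` is hypothetical; only the two configuration keys come from this post.

```python
# The two keys are quoted from this post; everything else is illustrative.
COLLECT_KEY = "spark.microsoft.delta.stats.collect.extended"
INJECT_KEY = "spark.microsoft.delta.stats.injection.enabled"


def set_automated_stats(spark, collect=True, inject=True):
    """Set both session-level switches and return the values written.

    `spark` is any object exposing `conf.set(key, value)`, such as the
    SparkSession that a notebook provides.
    """
    settings = {
        COLLECT_KEY: str(collect).lower(),  # "true" or "false"
        INJECT_KEY: str(inject).lower(),
    }
    for key, value in settings.items():
        spark.conf.set(key, value)
    return settings
```

Going by the setting names, `set_automated_stats(spark, collect=True, inject=False)` would keep collecting statistics while stopping the optimizer from using them, which may be handy when comparing query plans with and without stats.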
Want to see what’s under the hood? Fabric lets you inspect collected stats and recompute them if needed:
Check current stats (Scala):
println(spark.read.table("tableName").queryExecution.optimizedPlan.stats)
Recompute after schema changes:
StatisticsStore.recomputeStatisticsWithCompaction(spark, "tableName")
For full control, you can also use the familiar ANALYZE TABLE command.
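As a rough illustration of the manual route, the sketch below builds an ANALYZE TABLE statement using standard Spark SQL syntax and runs it through `spark.sql`. The helper names and the `columns` / `all_columns` parameters are hypothetical, and the post does not say which variants Fabric accepts for a given table, so treat this as a starting point to check against the documentation.

```python
def analyze_table_sql(table_name, columns=None, all_columns=False):
    """Build an ANALYZE TABLE statement in standard Spark SQL syntax."""
    base = f"ANALYZE TABLE {table_name} COMPUTE STATISTICS"
    if all_columns:
        # Column-level stats for every column in the table.
        return f"{base} FOR ALL COLUMNS"
    if columns:
        # Column-level stats for the listed columns only.
        return f"{base} FOR COLUMNS {', '.join(columns)}"
    # Table-level stats only.
    return base


def analyze_table(spark, table_name, columns=None, all_columns=False):
    """Run the statement on the given session and return the SQL that was sent."""
    statement = analyze_table_sql(table_name, columns, all_columns)
    spark.sql(statement)
    return statement
```

In a notebook this could be called as `analyze_table(spark, "tableName", columns=["col_a", "col_b"])`, where the column names are placeholders. Restricting the statement to the columns used in joins and filters usually keeps the scan cheaper than analyzing every column.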
As with any advanced feature, it’s important to understand the current boundaries:
We’re actively working on expanding support — including improvements for existing tables and better recompute workflows.
To learn more about automated table statistics, see the Configure and manage Automated Table Statistics in Fabric Spark documentation.