We’re thrilled to introduce Automated Table Statistics in Microsoft Fabric Data Engineering — a major upgrade that helps you get blazing-fast query performance with zero manual effort.
Whether you’re running complex joins, large aggregations, or heavy filtering workloads, Fabric’s new automated statistics will help Spark make smarter decisions, saving you time, compute, and money.
In simple terms, table statistics are summary metrics about your data that Spark uses to optimize queries. They include:
Until now, generating these statistics required running manual commands (ANALYZE TABLE) or setting up custom pipelines. With this release, Fabric collects them automatically when you create a new Delta table — no setup or tuning required.
Spark’s Cost-Based Optimizer (CBO) relies on good statistics to:
Without accurate stats, Spark can only guess, which often leads to slow or expensive query plans. With automated stats, Spark has far better information to plan with, and in our internal benchmarks we've seen performance gains of up to roughly 45% on complex workloads.
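To make the idea concrete, here is a deliberately simplified sketch of how a row-count statistic can change a join decision. It is a toy model, not Spark's actual optimizer: `TableStats`, `estimatedSizeBytes`, and `chooseJoin` are hypothetical names invented for illustration, and the only real Spark detail used is the 10 MB default of `spark.sql.autoBroadcastJoinThreshold`.

```scala
// Toy model only: Spark's real cost-based optimizer is far more involved.
object JoinChoiceSketch {
  // Default value of spark.sql.autoBroadcastJoinThreshold in Apache Spark (10 MB).
  val broadcastThresholdBytes: Long = 10L * 1024 * 1024

  // Hypothetical stand-in for the information an optimizer has about one table.
  final case class TableStats(
      rowCount: Option[Long],    // None when no statistics were collected
      avgRowBytes: Long,         // assumed average row width
      coarseSizeGuessBytes: Long // rough fallback used when rowCount is missing
  )

  // With a row count, estimate size from rows; otherwise fall back to the coarse guess.
  def estimatedSizeBytes(s: TableStats): Long =
    s.rowCount.map(_ * s.avgRowBytes).getOrElse(s.coarseSizeGuessBytes)

  // Broadcast the smaller side if its estimate fits under the threshold.
  def chooseJoin(left: TableStats, right: TableStats): String = {
    val smaller = math.min(estimatedSizeBytes(left), estimatedSizeBytes(right))
    if (smaller <= broadcastThresholdBytes) "broadcast hash join" else "sort-merge join"
  }

  def main(args: Array[String]): Unit = {
    val facts = TableStats(Some(500000000L), 100L, 50L * 1024 * 1024 * 1024)
    val dimNoStats = TableStats(None, 40L, 64L * 1024 * 1024)
    val dimWithStats = dimNoStats.copy(rowCount = Some(20000L))

    println(chooseJoin(facts, dimNoStats))   // decision made from the coarse guess
    println(chooseJoin(facts, dimWithStats)) // decision made from the row count
  }
}
```

In this sketch the same dimension table is planned differently once a row count is available, which is the kind of decision that accurate statistics are meant to improve.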
You can also fine-tune behavior with configurations:
Enable/disable stats collection:
spark.conf.set("spark.microsoft.delta.stats.collect.extended", "true")
Enable/disable optimizer injection:
spark.conf.set("spark.microsoft.delta.stats.injection.enabled", "true")
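As a minimal sketch, the same two settings can be switched off for the current session and then read back to confirm what the session is using. This assumes a Fabric Spark notebook where `spark` is the pre-created session; the `collectKey` and `injectKey` names are just local variables for readability.

```scala
// Assumes a Fabric Spark notebook, where `spark` is the pre-created SparkSession.
val collectKey = "spark.microsoft.delta.stats.collect.extended"
val injectKey  = "spark.microsoft.delta.stats.injection.enabled"

// Turn both behaviors off for the current session.
spark.conf.set(collectKey, "false")
spark.conf.set(injectKey, "false")

// Read the values back to confirm what the session is using.
println(s"$collectKey = ${spark.conf.get(collectKey)}")
println(s"$injectKey = ${spark.conf.get(injectKey)}")
```

Setting the values back to "true" restores the behavior shown above.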
Want to see what’s under the hood? Fabric lets you inspect collected stats and recompute them if needed:
Check current stats (Scala):
println(spark.read.table("tableName").queryExecution.optimizedPlan.stats)
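If the one-line printout is hard to read, the individual fields of the plan statistics can be printed separately. The sketch below assumes a Fabric Spark notebook with `spark` available and uses "tableName" as a placeholder; it relies on the standard Spark `Statistics` and `ColumnStat` classes, and which fields are populated will depend on what has actually been collected for the table.

```scala
// Assumes a Fabric Spark notebook with `spark` available; "tableName" is a placeholder.
val stats = spark.read.table("tableName").queryExecution.optimizedPlan.stats

// sizeInBytes is always present; rowCount is an Option and may be None.
val rowCountText = stats.rowCount.map(_.toString).getOrElse("unknown")
println(s"Estimated size in bytes: ${stats.sizeInBytes}")
println(s"Row count: $rowCountText")

// attributeStats maps each column to its ColumnStat, whose fields are all optional.
stats.attributeStats.foreach { case (attr, colStat) =>
  println(
    s"${attr.name}: min=${colStat.min}, max=${colStat.max}, " +
      s"nullCount=${colStat.nullCount}, distinctCount=${colStat.distinctCount}"
  )
}
```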
Recompute after schema changes:
StatisticsStore.recomputeStatisticsWithCompaction(spark, "tableName")
For full control, you can also use the familiar ANALYZE TABLE command.
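As a rough illustration, ANALYZE TABLE can be issued from a notebook through `spark.sql`. The statements below use standard Spark SQL syntax; "tableName", `col1`, and `col2` are placeholders, and the sketch assumes a Fabric Spark notebook with `spark` available.

```scala
// Assumes a Fabric Spark notebook with `spark` available and an existing table "tableName".

// Compute table-level and column-level statistics for every column.
spark.sql("ANALYZE TABLE tableName COMPUTE STATISTICS FOR ALL COLUMNS")

// Or limit the work to specific columns; col1 and col2 are placeholder column names.
spark.sql("ANALYZE TABLE tableName COMPUTE STATISTICS FOR COLUMNS col1, col2")
```

Restricting the command to the columns used in joins and filters can keep the analysis cheaper on wide tables.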
As with any advanced feature, it’s important to understand the current boundaries:
We’re actively working on expanding support — including improvements for existing tables and better recompute workflows.
To learn more about automated statistics for tables, see the Configure and manage Automated Table Statistics in Fabric Spark documentation.