Inspired by this blog entry, I've been looking into using predefined Spark resource profiles: Supercharge your workloads: write-optimized default Spark configurations in Microsoft Fabric | Micro...
The use cases seem quite straightforward, and I don't see any reason not to use ReadHeavyForPBI for our Gold layer.
But how do you decide between ReadHeavyForSpark and WriteHeavy for the Bronze and Silver layers?
For Bronze and Silver tables that will end up as facts in our Gold layer, should you use WriteHeavy?
But for tables that will end up as slowly changing dimensions, would it be best to use ReadHeavyForSpark, as we will spend more time reading them than writing to them?
Has anyone measured any of these scenarios, and come up with recommendations?
A quick description of our architecture, for context:
Good question!
If you want to validate this approach, a practical solution is to do the following (a timing sketch follows this list):
Use Fabric's Activity Runs or the Spark History server to measure duration with each profile.
Keep the data volume constant, switch profiles, and compare metrics such as CPU time, shuffle read/write, and cached memory.
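A minimal timing sketch for that comparison is below. It assumes the profile can be switched in-session through the spark.fabric.resourceProfile setting described in the linked blog (profile names and casing may differ in your environment), and the table names are placeholders; for deeper analysis you would still pull CPU time and shuffle metrics from the Spark History server rather than rely on wall-clock time alone.

```python
# Minimal sketch: run the same workload under two profiles and compare duration.
# Assumptions: the "spark.fabric.resourceProfile" setting and the profile names come
# from the linked blog; "silver_sales" and "tmp_profile_test" are placeholder tables.
import time

from pyspark.sql import SparkSession

# In a Fabric notebook the session already exists; getOrCreate() simply picks it up.
spark = SparkSession.builder.getOrCreate()

def run_pass(profile: str, source_table: str, target_table: str) -> float:
    """Run one read-and-write pass of the workload under the given resource profile."""
    spark.conf.set("spark.fabric.resourceProfile", profile)
    start = time.time()
    df = spark.read.table(source_table)                    # read side of the workload
    df.write.mode("overwrite").saveAsTable(target_table)   # write side of the workload
    return time.time() - start

# Same data volume, different profile, compare durations (and Spark History metrics).
for profile in ["writeHeavy", "readHeavyForSpark"]:
    print(f"{profile}: {run_pass(profile, 'silver_sales', 'tmp_profile_test'):.1f}s")
```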
There are no public benchmarks from Microsoft for these specific scenarios; I assume the guidance is based on early-adopter feedback and internal testing from the Fabric preview days. That said, in general:
WriteHeavy consistently reduces latency during large ingestions and merges.
ReadHeavyForSpark shows noticeable improvements in transformation-heavy pipelines, especially those with large joins.
ReadHeavyForPBI makes Power BI Direct Lake reports faster and more stable under load.
The rationale for tagging each layer with a profile is usually as follows (a configuration sketch follows the breakdown):
Bronze layer. Recommended: use WriteHeavy.
Data is typically appended.
You are not reading it often; transformations happen downstream.
Prioritise write throughput and ingestion latency.
Silver layer. Decision point: it depends on your operations per table.
Use WriteHeavy when writes dominate:
Optimise ETL throughput, especially if you are reading directly from Bronze and writing enriched data.
Use ReadHeavyForSpark when reads dominate (typically dimension tables):
These tables are typically read heavy across many processes (joins, lookups).
The frequency and cost of reads outweigh the write overhead.
Especially true for SCD Type 2, where point-in-time analysis needs frequent filtered reads.
Gold layer. Use: ReadHeavyForPBI.
Designed for consumption.
Read latency impacts user experience.
Optimised for DirectQuery and Direct Lake queries in Power BI.
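If you want to make this consistent across pipelines, one option is to pin the profile at the top of each layer's notebook. Here is a small sketch of that idea, assuming the spark.fabric.resourceProfile setting and profile names from the linked blog; the layer keys and the apply_profile helper are illustrative, not a Fabric API.

```python
# Sketch: pin the suggested profile per layer at the start of each notebook.
# Assumptions: profile names and the "spark.fabric.resourceProfile" setting follow the
# linked blog; the layer keys and apply_profile() are illustrative, not a Fabric API.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # in a Fabric notebook the session already exists

LAYER_PROFILES = {
    "bronze": "writeHeavy",                    # append-heavy ingestion
    "silver_facts": "writeHeavy",              # ETL throughput for fact-bound tables
    "silver_dimensions": "readHeavyForSpark",  # joins, lookups, SCD2 point-in-time reads
    "gold": "readHeavyForPBI",                 # Direct Lake / DirectQuery consumption
}

def apply_profile(layer: str) -> None:
    """Set the session's resource profile for the current layer's workload."""
    spark.conf.set("spark.fabric.resourceProfile", LAYER_PROFILES[layer])

apply_profile("silver_dimensions")  # e.g. at the top of a dimension-maintenance notebook
```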
Please 'Kudos' and 'Accept as Solution' if this answered your query.