<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Predefined Spark resource profiles in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Predefined-Spark-resource-profiles/m-p/4738906#M10328</link>
    <description>&lt;P&gt;Inspired by this blog entry, I've been looking into using predefined Spark resource profiles: &lt;A href="https://blog.fabric.microsoft.com/en-us/blog/supercharge-your-workloads-write-optimized-default-spark-configurations-in-microsoft-fabric?ft=All" target="_blank" rel="noopener"&gt;Supercharge your workloads: write-optimized default Spark configurations in Microsoft Fabric | Microsoft Fabric Blog | Microsoft Fabric&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The use cases seem quite straightforward, and I don't see any reason not to use ReadHeavyForPBI for our Gold layer.&lt;/P&gt;&lt;P&gt;But how do you decide between ReadHeavyForSpark and WriteHeavy for the Bronze and Silver layers?&lt;/P&gt;&lt;P&gt;For Bronze and Silver tables that will end up as facts in our Gold layer, should you use WriteHeavy?&lt;/P&gt;&lt;P&gt;And for tables that will end up as slowly changing dimensions, would it be best to use ReadHeavyForSpark, as we will spend more time reading them than writing to them?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Has anyone measured any of these scenarios and come up with recommendations?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A quick description of our architecture, for context:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;We use a medallion architecture with Bronze, Silver, and Gold Lakehouses, each in its own workspace.&lt;/LI&gt;&lt;LI&gt;We store Notebooks and Data pipelines in their own workspace that we call "process".&lt;/LI&gt;&lt;LI&gt;We process fact and dimension data for multiple business areas.&lt;/LI&gt;&lt;LI&gt;Volumes vary between hundreds of records and 100M records per month, depending on the data source.&lt;/LI&gt;&lt;/UL&gt;</description>
    <pubDate>Fri, 20 Jun 2025 15:06:15 GMT</pubDate>
    <dc:creator>Liam_McCauley</dc:creator>
    <dc:date>2025-06-20T15:06:15Z</dc:date>
    <item>
      <title>Predefined Spark resource profiles</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Predefined-Spark-resource-profiles/m-p/4738906#M10328</link>
      <description>&lt;P&gt;Inspired by this blog entry, I've been looking into using predefined Spark resource profiles: &lt;A href="https://blog.fabric.microsoft.com/en-us/blog/supercharge-your-workloads-write-optimized-default-spark-configurations-in-microsoft-fabric?ft=All" target="_blank" rel="noopener"&gt;Supercharge your workloads: write-optimized default Spark configurations in Microsoft Fabric | Microsoft Fabric Blog | Microsoft Fabric&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The use cases seem quite straightforward, and I don't see any reason not to use ReadHeavyForPBI for our Gold layer.&lt;/P&gt;&lt;P&gt;But how do you decide between ReadHeavyForSpark and WriteHeavy for the Bronze and Silver layers?&lt;/P&gt;&lt;P&gt;For Bronze and Silver tables that will end up as facts in our Gold layer, should you use WriteHeavy?&lt;/P&gt;&lt;P&gt;And for tables that will end up as slowly changing dimensions, would it be best to use ReadHeavyForSpark, as we will spend more time reading them than writing to them?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Has anyone measured any of these scenarios and come up with recommendations?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A quick description of our architecture, for context:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;We use a medallion architecture with Bronze, Silver, and Gold Lakehouses, each in its own workspace.&lt;/LI&gt;&lt;LI&gt;We store Notebooks and Data pipelines in their own workspace that we call "process".&lt;/LI&gt;&lt;LI&gt;We process fact and dimension data for multiple business areas.&lt;/LI&gt;&lt;LI&gt;Volumes vary between hundreds of records and 100M records per month, depending on the data source.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Fri, 20 Jun 2025 15:06:15 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Predefined-Spark-resource-profiles/m-p/4738906#M10328</guid>
      <dc:creator>Liam_McCauley</dc:creator>
      <dc:date>2025-06-20T15:06:15Z</dc:date>
    </item>
    <item>
      <title>Re: Predefined Spark resource profiles</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Predefined-Spark-resource-profiles/m-p/4739861#M10349</link>
      <description>&lt;P&gt;Good question!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you want to validate this approach, a practical solution is to:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;Use Fabric's Activity Runs or the Spark History Server to measure duration under each profile.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Keep the data volume constant, switch profiles, and compare metrics such as CPU Time, Shuffle Read/Write, and Cached Memory.&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;There are no public benchmarks from Microsoft for these specific scenarios, but I assume they based the profiles on early-adopter feedback and internal testing from the Fabric preview days. In general:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;WriteHeavy consistently reduces latency during large ingestions and merges.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;ReadHeavyForSpark shows noticeable improvements in transformation-heavy pipelines, especially those with large joins.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;ReadHeavyForPBI makes Power BI Direct Lake reports faster and more stable under load.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The rationale for tagging each layer is commonly as follows:&lt;/P&gt;&lt;H3&gt;Bronze Layer (Raw Ingestion)&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Recommended: WriteHeavy&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Data is typically appended.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;You are not reading it often; transformations happen downstream.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Prioritise write throughput and ingestion latency.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Silver Layer (Cleansed, Business Logic Applied)&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Decision point: depends on the operations per table.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;If the table is append-only and used in fact pipelines (e.g. transactional facts):&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Use: WriteHeavy&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Optimise for ETL throughput, especially if you are reading directly from Bronze and writing enriched data.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;If the table is dimension-like (e.g. SCDs &amp;amp; lookups) and used across many pipelines:&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Use: ReadHeavyForSpark&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;These tables are typically read-heavy across many processes (joins, lookups).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The frequency and cost of reads outweigh the write overhead.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;This is especially true for SCD Type 2, where point-in-time analysis requires frequent filtered reads.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Gold Layer (Consumption/Visualisation)&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Use: ReadHeavyForPBI&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Designed for consumption.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Read latency directly impacts user experience.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Optimised for DirectQuery and Direct Lake queries in Power BI.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Please 'Kudos' and 'Accept as Solution' if this answered your query.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jun 2025 01:23:47 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Predefined-Spark-resource-profiles/m-p/4739861#M10349</guid>
      <dc:creator>Vinodh247</dc:creator>
      <dc:date>2025-06-23T01:23:47Z</dc:date>
    </item>
    <item>
      <title>Re: Predefined Spark resource profiles</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Predefined-Spark-resource-profiles/m-p/4740544#M10361</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/683290"&gt;@Liam_McCauley&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Thanks for raising this in the Fabric Community Forum&amp;nbsp;and thanks to &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/640329"&gt;@Vinodh247&lt;/a&gt;&amp;nbsp;for the detailed input.&lt;/P&gt;
&lt;P&gt;As mentioned by &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/640329"&gt;@Vinodh247&lt;/a&gt;&amp;nbsp;, a commonly observed pattern is to use WriteHeavy for ingestion-heavy Bronze, ReadHeavyForSpark for dimension tables in Silver that are frequently read or joined, and ReadHeavyForPBI in the Gold layer where reporting performance is a focus.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For the Silver layer, selecting between WriteHeavy and ReadHeavyForSpark often depends on how the tables are accessed. Tables that are primarily written in bulk and used in fact processing tend to follow the WriteHeavy approach. In contrast, dimension tables, particularly those with SCD logic or used across multiple pipelines, may benefit from ReadHeavyForSpark due to more read-intensive operations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you're evaluating these options, comparing Spark history or Activity Runs across different profiles on the same workload can help identify which setting works best in your context.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can find the official documentation on resource profiles here:&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/fabric/data-engineering/configure-resource-profile-configurations" target="_blank"&gt;Configure Resource Profile Configurations in Microsoft Fabric - Microsoft Fabric | Microsoft Learn&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps. Please reach out if you need further assistance.&lt;/P&gt;
&lt;P&gt;Please consider marking the helpful reply as Accepted Solution to assist others with similar queries.&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jun 2025 10:26:05 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Predefined-Spark-resource-profiles/m-p/4740544#M10361</guid>
      <dc:creator>v-veshwara-msft</dc:creator>
      <dc:date>2025-06-23T10:26:05Z</dc:date>
    </item>
  </channel>
</rss>

