<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Replicating TPC-DS Benchmark in Fabric in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Replicating-TCP-DS-Benchmark-in-Fabric/m-p/4616504#M8056</link>
    <description>&lt;P&gt;I finally generated the data using the DuckDB TPC-DS extension and saved it as Parquet files in my lakehouse.&lt;/P&gt;</description>
    <pubDate>Wed, 19 Mar 2025 13:44:37 GMT</pubDate>
    <dc:creator>VictorMed</dc:creator>
    <dc:date>2025-03-19T13:44:37Z</dc:date>
    <item>
      <title>Replicating TPC-DS Benchmark in Fabric</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Replicating-TCP-DS-Benchmark-in-Fabric/m-p/4608974#M7907</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm trying to replicate the TPC-DS benchmark to compare the performance of Spark and DuckDB.&lt;/P&gt;&lt;P&gt;I'm following this repo to set up the dataset:&amp;nbsp;&lt;A href="https://github.com/BlueGranite/tpc-ds-dataset-generator/tree/master" target="_blank"&gt;GitHub - BlueGranite/tpc-ds-dataset-generator: Generate big TPC-DS datasets with Databricks&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I've created a new environment and a configuration notebook that replicates the steps from their &lt;A href="https://github.com/BlueGranite/tpc-ds-dataset-generator/blob/master/notebooks/TPC-DS-Configure.py" target="_self"&gt;configure.py&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;After creating the .sh script, I can't find a way in Fabric to configure the cluster/pool to run it as an init script, as we've done in Databricks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have 2 questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Has anybody used the TPC-DS dataset in Fabric notebooks successfully? If so, how?&lt;/LI&gt;&lt;LI&gt;How can I import spark.sql.perf into my environment? I can't find it in the public libraries or the built-in ones.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Victor&lt;/P&gt;</description>
      <pubDate>Thu, 13 Mar 2025 15:50:32 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Replicating-TCP-DS-Benchmark-in-Fabric/m-p/4608974#M7907</guid>
      <dc:creator>VictorMed</dc:creator>
      <dc:date>2025-03-13T15:50:32Z</dc:date>
    </item>
    <item>
      <title>Re: Replicating TPC-DS Benchmark in Fabric</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Replicating-TCP-DS-Benchmark-in-Fabric/m-p/4609996#M7922</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/688040"&gt;@VictorMed&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;SPAN data-teams="true"&gt;Thank you for reaching out to us on the Microsoft Fabric Community Forum.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Since &lt;SPAN&gt;spark.sql.perf&lt;/SPAN&gt; is not available in Fabric’s built-in libraries, you will need to add it manually in your Fabric notebooks. Once spark.sql.perf is available, you can generate the TPC-DS dataset.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN data-teams="true"&gt;If this post was helpful, please give us &lt;STRONG&gt;Kudos&lt;/STRONG&gt; and consider marking &lt;STRONG&gt;Accept as solution&lt;/STRONG&gt; to assist other members in finding it more easily.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Mar 2025 10:39:52 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Replicating-TCP-DS-Benchmark-in-Fabric/m-p/4609996#M7922</guid>
      <dc:creator>v-menakakota</dc:creator>
      <dc:date>2025-03-14T10:39:52Z</dc:date>
    </item>
    <item>
      <title>Re: Replicating TPC-DS Benchmark in Fabric</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Replicating-TCP-DS-Benchmark-in-Fabric/m-p/4610441#M7929</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/882994"&gt;@v-menakakota&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've manually built the spark.sql.perf.jar file on my local machine and uploaded it as a custom library to the Fabric environment.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The current issue is that I can't import it into my notebook. I get the following error:&lt;/P&gt;&lt;P&gt;----&amp;gt; 1 import spark.sql.perf&lt;BR /&gt;ModuleNotFoundError: No module named 'spark'&lt;/P&gt;</description>
      <pubDate>Fri, 14 Mar 2025 15:03:37 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Replicating-TCP-DS-Benchmark-in-Fabric/m-p/4610441#M7929</guid>
      <dc:creator>VictorMed</dc:creator>
      <dc:date>2025-03-14T15:03:37Z</dc:date>
    </item>
    <item>
      <title>Re: Replicating TPC-DS Benchmark in Fabric</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Replicating-TCP-DS-Benchmark-in-Fabric/m-p/4614454#M8005</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/688040"&gt;@VictorMed&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;Thanks for the update! Since you've manually uploaded the spark.sql.perf JAR, here are a few steps to ensure it is correctly imported in your Fabric Notebook:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Try adding the JAR to your Spark session manually, and make sure the path matches the location where you uploaded the JAR in your OneLake Files or Lakehouse.&lt;/LI&gt;
&lt;LI&gt;Since you’re getting ModuleNotFoundError: No module named 'spark', try referring to the library by its full package name, com.databricks.spark.sql.perf. If com.databricks.spark.sql.perf is not found, it likely means the JAR isn’t properly loaded into the Spark environment.&lt;/LI&gt;
&lt;/UL&gt;
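&lt;P&gt;Before involving Spark at all, it may help to confirm that the JAR you built actually contains the com.databricks.spark.sql.perf package. A JAR is a ZIP archive, so this can be checked with the Python standard library alone. The sketch below is only an illustration: the helper name jar_has_package is hypothetical, and it says nothing about whether Spark has loaded the JAR, only about what the file contains.&lt;/P&gt;
&lt;PRE&gt;
```python
import zipfile


def jar_has_package(jar_path, package):
    """Return True if the JAR at jar_path has at least one entry under package."""
    # A JAR is a ZIP archive; the package a.b.c is stored as the folder a/b/c/.
    prefix = package.replace(".", "/") + "/"
    with zipfile.ZipFile(jar_path) as jar:
        # namelist() returns every entry path in the archive.
        return any(name.startswith(prefix) for name in jar.namelist())
```
&lt;/PRE&gt;
&lt;P&gt;For example, jar_has_package(path_to_your_jar, "com.databricks.spark.sql.perf") should return True for a correctly built JAR, where path_to_your_jar is a placeholder for wherever your local copy lives. If it returns False, the build is the problem rather than the Fabric environment.&lt;/P&gt;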
&lt;P&gt;spark.sql.perf is a Scala-based benchmarking library, so a Python import in a PySpark cell will not find it; it has to be called from Scala code. Fabric notebooks default to PySpark, but a cell can be switched to Spark (Scala), for example with the %%spark magic command.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;If this post was helpful, please give us&lt;STRONG&gt; Kudos&lt;/STRONG&gt; and mark it as &lt;STRONG&gt;Accepted Solution&lt;/STRONG&gt; to assist other community members.&lt;BR /&gt;Thank you.&lt;/P&gt;
</description>
      <pubDate>Tue, 18 Mar 2025 10:26:21 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Replicating-TCP-DS-Benchmark-in-Fabric/m-p/4614454#M8005</guid>
      <dc:creator>v-menakakota</dc:creator>
      <dc:date>2025-03-18T10:26:21Z</dc:date>
    </item>
    <item>
      <title>Re: Replicating TPC-DS Benchmark in Fabric</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Replicating-TCP-DS-Benchmark-in-Fabric/m-p/4616504#M8056</link>
      <description>&lt;P&gt;I finally generated the data using the DuckDB TPC-DS extension and saved it as Parquet files in my lakehouse.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Mar 2025 13:44:37 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Replicating-TCP-DS-Benchmark-in-Fabric/m-p/4616504#M8056</guid>
      <dc:creator>VictorMed</dc:creator>
      <dc:date>2025-03-19T13:44:37Z</dc:date>
    </item>
  </channel>
</rss>

