VictorMed
Frequent Visitor

Replicating TPC-DS Benchmark in Fabric

Hi All,

 

I'm trying to replicate the TPC-DS performance test to compare the performance of Spark and DuckDB.

I'm using the following repo as a guide for setting up the dataset: GitHub - BlueGranite/tpc-ds-dataset-generator: Generate big TPC-DS datasets with Databricks

I've created a new environment and a configuration notebook that replicates the steps from their configure.py.

After creating the .sh script, I can't find a way in Fabric to configure the cluster/pool to run it as an init script, as we did in Databricks.
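For reference, here is roughly what that init script stands in for, as a sketch under the assumption (from the spark-sql-perf setup the repo follows) that its job is to build the TPC-DS dsdgen toolkit; the paths are placeholders. Run from a Fabric notebook cell, it only reaches the driver node rather than every executor, which is the gap I'm hitting:

import subprocess

# Clone and build the TPC-DS toolkit (dsdgen) that spark-sql-perf's data
# generator shells out to. A Databricks init script would do this on every
# cluster node; a notebook cell only does it on the driver.
subprocess.run(
    ["git", "clone", "https://github.com/databricks/tpcds-kit.git", "/tmp/tpcds-kit"],
    check=True,
)
subprocess.run(["make", "OS=LINUX"], cwd="/tmp/tpcds-kit/tools", check=True)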

 

I have 2 questions:

  1. Has anybody used the TPC-DS dataset in Fabric notebooks successfully? If so, how?
  2. How can I import spark.sql.perf into my environment to use it? I can't find it in the public or built-in libraries.

 

Thank you.

 

Regards,

Victor 

 

1 ACCEPTED SOLUTION
VictorMed
Frequent Visitor

Finally, I generated the data using DuckDB's TPC-DS extension and saved the tables as Parquet files in my lakehouse.
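A minimal sketch of that approach, assuming DuckDB's tpcds extension and the default lakehouse mounted at /lakehouse/default/Files (the output path and scale factor are placeholders to adapt):

import os
import duckdb

con = duckdb.connect()

# Install and load DuckDB's TPC-DS extension, then generate the tables at the
# chosen scale factor (roughly the dataset size in GB).
con.execute("INSTALL tpcds")
con.execute("LOAD tpcds")
con.execute("CALL dsdgen(sf=1)")

# Write every generated table to Parquet under the lakehouse Files area.
out_dir = "/lakehouse/default/Files/tpcds"
os.makedirs(out_dir, exist_ok=True)
for (table_name,) in con.execute("SHOW TABLES").fetchall():
    con.execute(f"COPY {table_name} TO '{out_dir}/{table_name}.parquet' (FORMAT PARQUET)")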


4 REPLIES

VictorMed
Frequent Visitor

Hi @v-menakakota ,

 

I've manually built the spark.sql.perf JAR on my local machine and uploaded it as a custom library in the Fabric environment.

 

The current issue is that I can't import it into my notebook; I get the following error:

----> 1 import spark.sql.perf
ModuleNotFoundError: No module named 'spark'

v-menakakota
Community Support

Hi @VictorMed ,

Thanks for the update! Since you've manually uploaded the spark.sql.perf JAR, here are a few steps to ensure it is correctly loaded in your Fabric notebook:

  • Add the JAR to your Spark session manually, and make sure the path matches where you've uploaded the JAR in your OneLake Files or lakehouse.
  • Since you're getting ModuleNotFoundError: No module named 'spark', try the import pattern sketched below this list. If com.databricks.spark.sql.perf is not found, it likely means the JAR isn't properly loaded into the Spark environment.
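A minimal PySpark sketch of that check; the class name below follows the spark-sql-perf project's com.databricks.spark.sql.perf package and is an assumption to adapt to your build:

from pyspark.sql import SparkSession

# In a Fabric notebook a `spark` session already exists; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

# spark-sql-perf is a Scala library, so there is no Python module to import.
# This checks whether its classes are on the driver classpath: it raises a
# py4j error wrapping java.lang.ClassNotFoundException if the JAR isn't loaded.
spark._jvm.java.lang.Class.forName("com.databricks.spark.sql.perf.tpcds.TPCDSTables")

# If the class resolves, drive the library through spark._jvm (or a Scala cell)
# rather than with `import spark.sql.perf` in Python.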

spark.sql.perf (the spark-sql-perf project) is primarily a Scala-based benchmarking library. Fabric notebooks default to PySpark, so it can't be imported as a Python module; it has to be driven from a Scala cell (for example via the %%spark magic) or through the JVM gateway as sketched above.

If this post was helpful, please give us Kudos and mark it as Accepted Solution to assist other community members.
Thank you.

v-menakakota
Community Support

Hi @VictorMed ,
Thank you for reaching out to us on the Microsoft Fabric Community Forum.


Since spark.sql.perf is not available in Fabric's built-in libraries, you can try adding it manually, for example by uploading it as a custom library to a Fabric environment and attaching that environment to your notebook. Once spark.sql.perf is present, you can run the TPC-DS dataset generation.

If this post was helpful, please give us Kudos and consider marking it as the accepted solution to help other members find it more easily.
