VictorMed
Frequent Visitor

Replicating TPC-DS Benchmark in Fabric

Hi All,

 

I'm trying to replicate the TPC-DS performance test to compare the performance of Spark and DuckDB.

I'm using the following repo to set up the dataset: GitHub - BlueGranite/tpc-ds-dataset-generator: Generate big TPC-DS datasets with Databricks

I've generated a new environment and a Configuration notebook replicating the steps from their configure.py.

After creating the .sh script, I can't find a way in Fabric to configure the cluster/pool to run it as an init script, as we did in Databricks.

 

I have 2 questions:

  1. Has anybody used the TPC-DS dataset in Fabric notebooks successfully? If so, how?
  2. How can I import spark.sql.perf into my environment to use it? I can't find it in the public or built-in libraries.

 

Thank you.

 

Regards,

Victor 

 

1 ACCEPTED SOLUTION
VictorMed
Frequent Visitor

Finally generated the data using the TPC-DS DuckDB extension and saved the tables as Parquet files in my lakehouse.


4 REPLIES 4

VictorMed
Frequent Visitor

Hi @v-menakakota ,

 

I've manually built the spark.sql.perf JAR on my local machine and uploaded it as a custom library in the Fabric environment.

 

The current issue is that I'm not able to import it in my notebook; the import fails with:

----> 1 import spark.sql.perf
ModuleNotFoundError: No module named 'spark'

Hi @VictorMed ,

Thanks for the update! Since you've manually uploaded the spark.sql.perf JAR, here are a few steps to ensure it is correctly imported in your Fabric Notebook:

  • Add the JAR to your Spark session at startup, and make sure the path matches where you uploaded it in your OneLake Files or lakehouse.
  • The ModuleNotFoundError: No module named 'spark' is expected from a Python import: spark.sql.perf is a JVM (Scala) package, not a Python module, so import spark.sql.perf can never succeed in a PySpark cell. If the com.databricks.spark.sql.perf classes aren't visible on the JVM side either, the JAR isn't loaded into the Spark session.
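One way to attach the JAR at session start is a `%%configure` cell at the top of the notebook. This is a sketch with a placeholder ABFS path (the workspace/lakehouse segments are assumptions; point it at wherever the JAR actually lives in your Files area):

```
%%configure
{
    "conf": {
        "spark.jars": "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Files/jars/spark-sql-perf.jar"
    }
}
```

Once the JAR is on the session classpath, the library's Scala classes (e.g. com.databricks.spark.sql.perf.tpcds.TPCDS) are reached from a Scala cell or through the py4j gateway (spark._jvm), not via a Python import.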

spark.sql.perf is primarily a Scala-based benchmarking library. Fabric notebooks default to PySpark, but Scala cells are available via the %%spark magic, so once the JAR is on the session classpath you can drive the library from a Scala cell (or through the py4j gateway, e.g. spark._jvm); a plain Python import will not work.

If this post was helpful, please give us Kudos and mark it as Accepted Solution to assist other community members.
Thank you.

v-menakakota
Community Support

Hi @VictorMed ,
Thank you for reaching out to us on the Microsoft Fabric Community Forum.


Since spark.sql.perf is not available in Fabric's built-in libraries, you can try installing it manually in your Fabric notebook environment. Once spark.sql.perf is present, you can run the TPC-DS dataset.

If this post was helpful, please give us Kudos and consider marking it as the Accepted Solution to help other members find it more easily.
