Advance your Data & AI career with 50 days of live learning, dataviz contests, hands-on challenges, study groups & certifications and more!
Hi All,
I'm trying to replicate the TPC-DS performance test to compare Spark and DuckDB performance.
To set up the dataset, I'm following this repo: GitHub - BlueGranite/tpc-ds-dataset-generator: Generate big TPC-DS datasets with Databricks
I've created a new environment and a Configuration notebook that replicates the steps from their configure.py.
After creating the .sh script, I can't find a way in Fabric to configure the cluster/pool to run it as an init script, as we do in Databricks.
I have 2 questions:
Thank you.
Regards,
Victor
I finally generated the data using the TPC-DS DuckDB extension and saved the tables as Parquet files in my lakehouse.
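For anyone following the same route, here is a minimal sketch of that approach, assuming the duckdb Python package is installed and its tpcds extension can be downloaded. It runs dsdgen inside DuckDB and then writes every generated table to Parquet with COPY ... (FORMAT PARQUET). The output folder /lakehouse/default/Files/tpcds/sf1 is a hypothetical choice based on where a Fabric notebook normally mounts its default lakehouse; the helper names export_statements and generate_tpcds_parquet are illustrative, not part of any library.

```python
import os

# Hypothetical target folder: assumes the notebook has a default lakehouse
# mounted at /lakehouse/default. Change it to suit your own lakehouse layout.
DEFAULT_OUT_DIR = "/lakehouse/default/Files/tpcds/sf1"


def export_statements(table_names, out_dir):
    """Build one DuckDB COPY ... TO ... (FORMAT PARQUET) statement per table."""
    statements = []
    for name in sorted(table_names):
        path = os.path.join(out_dir, f"{name}.parquet")
        # Escape single quotes so the path stays a valid SQL string literal.
        escaped = path.replace("'", "''")
        statements.append(f"COPY {name} TO '{escaped}' (FORMAT PARQUET);")
    return statements


def generate_tpcds_parquet(con, scale_factor=1, out_dir=DEFAULT_OUT_DIR):
    """Generate TPC-DS tables with DuckDB's tpcds extension and export them.

    `con` is an open DuckDB connection, e.g. duckdb.connect().
    Returns the list of table names that were exported.
    """
    con.execute("INSTALL tpcds;")
    con.execute("LOAD tpcds;")
    # dsdgen creates all TPC-DS tables in the current database.
    con.execute(f"CALL dsdgen(sf = {scale_factor});")
    os.makedirs(out_dir, exist_ok=True)
    tables = [row[0] for row in con.execute("SHOW TABLES;").fetchall()]
    for stmt in export_statements(tables, out_dir):
        con.execute(stmt)
    return tables
```

Usage would look like generate_tpcds_parquet(duckdb.connect(), scale_factor=1). The SQL-building step is kept separate from the connection so the generated statements can be inspected before anything is written.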
Hi @v-menakakota ,
I've manually built the spark.sql.perf.jar file on my local machine and uploaded it as a custom library to the Fabric environment.
The current issue is that I can't import it in my notebook; it fails with the following error:
----> 1 import spark.sql.perf
ModuleNotFoundError: No module named 'spark'
Hi @VictorMed ,
Thanks for the update! Since you've manually uploaded the spark.sql.perf JAR, here are a few steps to ensure it is correctly imported in your Fabric Notebook:
spark.sql.perf is a Scala (JVM) benchmarking library, so it cannot be loaded with a Python import, which is why the ModuleNotFoundError appears. Fabric notebooks default to PySpark, but they also support Scala cells (for example through the %%spark cell magic), and that is where a JAR like this would need to be used.
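A small sketch to illustrate the point, using only the Python standard library: Python's import system searches sys.path, not the JVM classpath where an uploaded JAR ends up, so neither spark.sql.perf nor a Scala-style package name is found. The helper name is_python_importable is hypothetical, and com.databricks.spark.sql.perf is assumed here to be the package the upstream spark-sql-perf classes live under.

```python
import importlib.util


def is_python_importable(module_name: str) -> bool:
    """Return True if Python's import system can locate `module_name`.

    Python only searches sys.path for modules and packages. A .jar uploaded
    to a Spark environment sits on the JVM classpath instead, so its Scala
    packages are never visible here.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "spark" in "spark.sql.perf")
        # does not exist, or the parent is a plain module, not a package.
        return False


for name in ("json.decoder", "spark.sql.perf", "com.databricks.spark.sql.perf"):
    print(name, is_python_importable(name))
```

If the JVM classes are needed, they have to be reached from a Scala cell rather than through a Python import.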
If this post was helpful, please give us Kudos and mark it as Accepted Solution to assist other community members.
Thank you.
Hi @VictorMed ,
Thank you for reaching out to us on the Microsoft Fabric Community Forum.
Since spark.sql.perf is not one of Fabric's built-in libraries, you can try adding it manually, for example by uploading the JAR as a custom library in the Fabric environment attached to your notebook. Once spark.sql.perf is available, you can generate the TPC-DS dataset.
If this post was helpful, please give us Kudos and consider marking Accept as solution to assist other members in finding it more easily.