Hi All,
I'm trying to replicate the TPC-DS performance test to compare the performance of Spark and DuckDB.
I'm following this repo to set up the dataset: BlueGranite/tpc-ds-dataset-generator (Generate big TPC-DS datasets with Databricks)
I've generated a new environment and a Configuration notebook replicating the steps from their configure.py.
After creating the .sh script, I can't find a way in Fabric to configure the cluster/pool to run it as an init script, as we would in Databricks.
I have 2 questions:
Thank you.
Regards,
Victor
In the end I generated the data using the TPC-DS DuckDB extension and saved it as Parquet files in my lakehouse.
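For anyone landing here later, this is roughly what that approach can look like. It is a minimal sketch, not the exact code used in this thread: the helper names `copy_statements` and `generate_tpcds`, the scale factor, and the output folder are all assumptions for illustration. It relies on DuckDB's `tpcds` extension (`CALL dsdgen(sf = ...)`) and on the notebook's default lakehouse being mounted under `/lakehouse/default/Files`.

```python
from pathlib import Path


def copy_statements(tables, out_dir):
    """Build one COPY ... (FORMAT PARQUET) statement per table name."""
    out = Path(out_dir)
    statements = []
    for name in tables:
        target = (out / f"{name}.parquet").as_posix()
        statements.append(f"COPY {name} TO '{target}' (FORMAT PARQUET)")
    return statements


def generate_tpcds(out_dir, scale_factor=1):
    """Generate TPC-DS tables in an in-memory DuckDB and export them to Parquet."""
    # Imported here so copy_statements() works even without DuckDB installed.
    import duckdb

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    con = duckdb.connect()  # in-memory database
    con.execute("INSTALL tpcds")
    con.execute("LOAD tpcds")
    # dsdgen creates and populates all TPC-DS tables at the given scale factor.
    con.execute(f"CALL dsdgen(sf = {scale_factor})")
    tables = [row[0] for row in con.execute("SHOW TABLES").fetchall()]
    for statement in copy_statements(tables, out_dir):
        con.execute(statement)
    con.close()
    return tables


# Hypothetical usage in a Fabric notebook with a default lakehouse attached:
# generate_tpcds("/lakehouse/default/Files/tpcds", scale_factor=1)
```

The data is generated in memory before it is exported, so larger scale factors may need a file-backed DuckDB database or a bigger node.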
Hi @v-menakakota ,
I've manually generated the spark.sql.perf.jar file on my local machine and uploaded as a custom library on the Fabric environment.
The current issue is I'm not able to import it into my notebook with the following error:
----> 1 import spark.sql.perf
ModuleNotFoundError: No module named 'spark'
Hi @VictorMed ,
Thanks for the update! Since you've manually uploaded the spark.sql.perf JAR, here are a few steps to ensure it is correctly imported in your Fabric Notebook:
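spark.sql.perf is primarily a Scala-based benchmarking library. A Python import statement cannot load classes from a JAR, which is why the PySpark cell fails with ModuleNotFoundError. The library has to be called from Scala code instead; Fabric notebooks support Spark (Scala) cells through the %%spark magic command, so the import should be attempted there rather than in a PySpark cell.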
If this post was helpful, please give us Kudos and mark it as Accepted Solution to assist other community members.
Thank you.
Hi @VictorMed ,
Thank you for reaching out to us on the Microsoft Fabric Community Forum.
Since spark.sql.perf is not available in Fabric’s built-in libraries, you will need to add it manually to the environment your Fabric notebook uses. Once spark.sql.perf is available, you can run the TPC-DS dataset generation.
If this post was helpful, please give us Kudos and consider marking Accept as solution to assist other members in finding it more easily.