rblevi01
New Member

Fabric Spark Job Definition Create Lakehouse Table pyspark

Should df.write.format("delta").saveAsTable("test2") be executed from a Fabric Spark Job Definition?  Or does it run on multiple nodes and attempt to create the table many times?

 

I ask because when I execute the code below, I get the error [TABLE_OR_VIEW_ALREADY_EXISTS].

 

I am sure the table does not exist in the Lakehouse associated with the Spark Job Definition.  

 

The eventual goal is to create a job that creates 1000s of tables from files.  If saveAsTable can only be run from a notebook, is there an alternative to create tables in a Lakehouse from a job?
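For the batch goal, one approach (a sketch under assumptions; the helper below is hypothetical, not part of my job) would be to loop over the files and call saveAsTable once per file, deriving a metastore-safe table name from each file path with plain Python:

```python
import os
import re

def table_name_from_path(path: str) -> str:
    # Hypothetical helper: derive a metastore-safe table name from a file
    # path, e.g. "Files/raw/My Data-2024.txt" -> "my_data_2024".
    stem = os.path.splitext(os.path.basename(path))[0]
    name = re.sub(r"[^0-9a-zA-Z_]", "_", stem).lower()
    # Table names cannot start with a digit; add a prefix if needed.
    return name if not name[0].isdigit() else f"t_{name}"

# Inside the Spark Job Definition the loop would then look roughly like:
# for path in file_paths:
#     df = spark.read.format("csv").option("header", "true").load(path)
#     df.write.format("delta").saveAsTable(table_name_from_path(path))
```
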

 

Here is the simple code that, no matter the table name, always returns an error that the table already exists.

 

from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Initialize a SparkSession
    spark = SparkSession.builder.appName("TextToTable").getOrCreate()

    # df is a Spark DataFrame containing CSV data from "Files/test.txt".
    df = spark.read.format("csv").option("header", "true").load("Files/test.txt")
    df.show()

    # Create a new table   THIS IS WHERE THE ERROR HAPPENS
    df.write.format("delta").saveAsTable("test2")

    # Stop the SparkSession
    spark.stop()

1 ACCEPTED SOLUTION
v-gchenna-msft
Community Support

Hi @rblevi01 ,

Thanks for using Fabric Community.
Apologies for the issue you are facing. 

Please try the modified code below: you need to set the write mode to overwrite.

df.write.mode('overwrite').format('delta').saveAsTable('new_table11')

I was able to execute the Spark Job Definition without any issues.

Hope this is helpful. Please let me know in case of further queries.

View solution in original post

2 REPLIES 2

Thank you, that worked.  I am not sure why overwrite has to be used for a table that does not exist, but it works and I really appreciate the response.
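For reference, a common explanation for the error on a "nonexistent" table is leftover metadata or data files at the table's location from an earlier failed run; overwrite replaces whatever is there. The four DataFrameWriter save modes behave roughly as follows when the target already exists (a plain-Python sketch of the documented semantics, not Spark code):

```python
def save_table(catalog: dict, name: str, data: list, mode: str = "errorifexists"):
    # Toy model of DataFrameWriter.mode() semantics, using a dict as a
    # stand-in "metastore" of table name -> rows.
    if name in catalog:
        if mode in ("error", "errorifexists"):   # the default mode
            raise ValueError("TABLE_OR_VIEW_ALREADY_EXISTS")
        if mode == "ignore":
            return                                # leave existing table untouched
        if mode == "append":
            catalog[name] = catalog[name] + data  # add rows to existing table
            return
    # "overwrite" (or a brand-new table) replaces whatever is there.
    catalog[name] = data

catalog = {"test2": [1]}
save_table(catalog, "test2", [2], mode="overwrite")  # replaces the stale entry
save_table(catalog, "test2", [3], mode="append")     # appends to it
```
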
