roalexan
Microsoft Employee

Create Lakehouse table using PySpark

Hello all. At a high level, I'm trying to create an EventStream that reads from an EventHub as a streaming source and writes to a Fabric Lakehouse as a destination (using this blog sample, BTW). Focusing on my problem: I'm having trouble creating the Lakehouse table that will be used as the destination. Really, I'm happy to create the table most any way (I do prefer it to be empty, so the no-code 'flow' options aren't really applicable), but I've so far been unsuccessful via Spark SQL and PySpark. For PySpark, I've run:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Explicit schema for the (empty) destination table
schema = StructType([
    StructField("Id", IntegerType(), False),
    StructField("Name", StringType(), True)
])
df = spark.createDataFrame([], schema)
df.write.save("table1")  # this is the write that fails (logs below)
It's a bit hard to debug; perhaps these two snippets from the debug logs in the portal are useful:
Operation failed: "Bad Request", 400, HEAD, http://msit-onelake.dfs.fabric.microsoft.com/cc170da0-3f1c-46d6-8020-0828c17ee4c9/0a3379a2-d39c-4654-bfef-ad916bf4b9a6/table1/_delta_log/_last_checkpoint?upn=false&action=getStatus&timeout=90
An operation with ADLS Gen2 has failed. This is typically due to a permissions issue. 1. Please ensure that for all ADLS Gen2 resources referenced in the Spark job, that the user running the code has RBAC roles "Storage Blob Data Contributor" on storage accounts the job is expected to read and write from. 2. Check the logs for this Spark application by clicking the Monitor tab in left side of the Synapse Studio UI, select "Apache Spark Applications" from the "Activities" section, and find your Spark job from this list. Inspect the logs available in the "Logs" tab in the bottom part of this page for the ADLS Gen2 storage account name that is experiencing this issue.

Is this actually an ADLS Gen2 permissions error? If so, running this notebook on Fabric, I don't know where the storage account even exists. The Spark SQL attempt resolves to a similar kind of error (a HiveException reading/writing the Metastore; a rough sketch of that attempt is at the end of this post).

Anyway, the steps I took to get to this point are pretty minimal: create a Fabric workspace, create a Fabric Lakehouse, then try to create a table using PySpark (and Spark SQL). This seems like a newbie getting-off-the-ground type of problem; if anyone has any suggestions, that would be great.
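The Spark SQL attempt was along these lines (illustrative only, run in a %%sql cell; not the exact statement, and the table/column names may differ):

%%sql
-- Rough, illustrative reconstruction of the failing Spark SQL attempt
CREATE TABLE table1 (
    Id INT,
    Name STRING
)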

1 ACCEPTED SOLUTION
v-nikhilan-msft
Community Support

Hi @roalexan 
Thanks for using Fabric Community.
1) Please attach the lakehouse to the notebook.

[screenshot: attaching the lakehouse to the notebook]


2) Then try running this code:

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Explicit schema for the empty table
schema = StructType([
    StructField("Id", IntegerType(), False),
    StructField("Name", StringType(), True)
])
df = spark.createDataFrame([], schema)

# Write as a Delta table registered in the attached lakehouse
df.write.format("delta").saveAsTable("table2")


This will create a new table named "table2" in your lakehouse.
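To confirm, you can read the (empty) table back; a quick sanity check along these lines:

# The table should exist with the expected schema and 0 rows
df2 = spark.read.table("table2")
df2.printSchema()
print(df2.count())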

[screenshot: the new table2 listed under the lakehouse Tables]

Hope this helps. Please let me know if you have any further questions. Glad to help.


3 REPLIES
roalexan
Microsoft Employee

Thanks! The solution was indeed to use code such as the above:

df.write.format("delta").saveAsTable("table2")

I'll take a closer look at docs such as here. I wasn't sure the delta format was strictly required.
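In case it's useful to others, DESCRIBE DETAIL (a Delta command) can confirm the format of the resulting table; a quick check, using the table name from above:

# Confirm the new table was written in Delta format
spark.sql("DESCRIBE DETAIL table2").select("format").show()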

Hi @roalexan 
Glad that your query got resolved. The Lakehouse currently supports only Delta format tables when creating a new table. Link1
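For completeness, the Spark SQL equivalent would be something like this (a sketch; it assumes the lakehouse is attached to the notebook, and the table name is illustrative):

%%sql
-- Create an empty Delta table in the attached lakehouse
CREATE TABLE IF NOT EXISTS table3 (
    Id INT NOT NULL,
    Name STRING
) USING DELTA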


Please continue using Fabric Community for any help regarding your queries.

