roalexan
Microsoft Employee

Create Lakehouse table using PySpark

Hello all. At a high level, I'm trying to create an EventStream that reads from an Event Hub as a streaming source and writes to a Fabric Lakehouse as a destination (following this blog sample, BTW). Focusing on my problem: I'm having trouble creating the Lakehouse table that will be used as the destination. Really, I'm happy to create the table almost any way (I do prefer it to be empty, so the no-code 'flow' options aren't really applicable), but so far I've been unsuccessful via both Spark SQL and PySpark. For PySpark, I've run:

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# define the schema for an empty two-column table
schema = StructType([
    StructField("Id", IntegerType(), False),
    StructField("Name", StringType(), True)
])

# create an empty DataFrame with that schema and attempt to write it out
# (this is the call that fails)
df = spark.createDataFrame([], schema)
df.write.save("table1")

It's a bit hard to debug, but perhaps these two snippets from the debug logs in the portal are useful:

Operation failed: "Bad Request", 400, HEAD, http://msit-onelake.dfs.fabric.microsoft.com/cc170da0-3f1c-46d6-8020-0828c17ee4c9/0a3379a2-d39c-4654-bfef-ad916bf4b9a6/table1/_delta_log/_last_checkpoint?upn=false&action=getStatus&timeout=90

An operation with ADLS Gen2 has failed. This is typically due to a permissions issue. 1. Please ensure that for all ADLS Gen2 resources referenced in the Spark job, that the user running the code has RBAC roles "Storage Blob Data Contributor" on storage accounts the job is expected to read and write from. 2. Check the logs for this Spark application by clicking the Monitor tab in left side of the Synapse Studio UI, select "Apache Spark Applications" from the "Activities" section, and find your Spark job from this list. Inspect the logs available in the "Logs" tab in the bottom part of this page for the ADLS Gen2 storage account name that is experiencing this issue.

Is this actually an ADLS Gen2 permissions error? If so, running this notebook on Fabric, I don't even know where the storage account exists. The Spark SQL attempt fails with a similar kind of error (a HiveException reading/writing the metastore).

Anyway, the steps I took to get to this point are pretty minimal: create a Fabric workspace, create a Fabric Lakehouse, then try to create a table using PySpark (and Spark SQL). This seems like a newbie getting-off-the-ground type of problem, so if anyone has any suggestions, that would be great.
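
For reference, the Spark SQL attempt looked roughly like this (a reconstruction from memory, so the exact DDL may have differed; table1 is just the example name):

# roughly the shape of the Spark SQL statement that hit the HiveException
spark.sql("""
    CREATE TABLE table1 (
        Id INT,
        Name STRING
    )
""")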

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @roalexan 
Thanks for using Fabric Community.
1) Please attach the lakehouse to the notebook.

[screenshot: adding a lakehouse to the notebook]


2) Then try running this code:

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# define the schema for the empty table
schema = StructType([
    StructField("Id", IntegerType(), False),
    StructField("Name", StringType(), True)
])

# write the empty DataFrame as a Delta table; saveAsTable registers it in the attached lakehouse
df = spark.createDataFrame([], schema)
df.write.format("delta").saveAsTable("table2")


This will create a new table named "table2" in your lakehouse.

[screenshot: table2 shown under Tables in the lakehouse explorer]
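
If you want to sanity-check the result, you can read the table back; it should come back empty with the expected schema (this assumes the same notebook session, with the lakehouse still attached):

# read the new table back and confirm it is empty with the right schema
df2 = spark.table("table2")
df2.printSchema()
print(df2.count())  # expect 0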

 

Hope this helps. Please let me know if you have any further questions. Glad to help.


3 REPLIES
roalexan
Microsoft Employee

Thanks! The solution was indeed to use code such as the above:

df.write.format("delta").saveAsTable("table2")

I will take a closer look at the docs, such as here. I wasn't sure the Delta format was strictly required.
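
For anyone else landing here, a Spark SQL equivalent should look like this (my own sketch, not taken from the docs; note the explicit USING DELTA, and table3 is just a hypothetical name):

# Spark SQL equivalent of the working PySpark write above
spark.sql("""
    CREATE TABLE IF NOT EXISTS table3 (
        Id INT,
        Name STRING
    )
    USING DELTA
""")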

Anonymous
Not applicable

Hi @roalexan
Glad that your query got resolved. The Lakehouse currently supports only Delta format tables when creating a new table. Link1

[screenshot: documentation noting that lakehouse tables use the Delta format]
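
One way to see this from a notebook, assuming a Fabric Spark runtime, is to check the session's default table format; it is expected to be delta, though it's worth verifying on your own runtime:

# check the default table format for this Spark session
print(spark.conf.get("spark.sql.sources.default"))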

Please continue using Fabric Community for any help regarding your queries.

