roalexan
Microsoft Employee

Create Lakehouse table using PySpark

Hello all. At a high level, I'm trying to create an EventStream that reads from an EventHub as a streaming source and writes to a Fabric Lakehouse as a destination (using this blog sample, BTW). Focusing on my problem: I'm having trouble creating the Lakehouse table that will be used as the destination. Really, I'm happy to create the table most any way (I do prefer it to be empty, so the no-code 'flow' options aren't really applicable), but I've so far been unsuccessful via Spark SQL and PySpark. For PySpark, I've run:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Explicit schema for the (empty) destination table
schema = StructType([
    StructField("Id", IntegerType(), False),
    StructField("Name", StringType(), True)
])
df = spark.createDataFrame([], schema)
df.write.save("table1")  # this is the write that fails (logs below)
It's a bit hard to debug; perhaps these two snippets from the debug logs in the portal are useful:
Operation failed: "Bad Request", 400, HEAD, http://msit-onelake.dfs.fabric.microsoft.com/cc170da0-3f1c-46d6-8020-0828c17ee4c9/0a3379a2-d39c-4654-bfef-ad916bf4b9a6/table1/_delta_log/_last_checkpoint?upn=false&action=getStatus&timeout=90
An operation with ADLS Gen2 has failed. This is typically due to a permissions issue. 1. Please ensure that for all ADLS Gen2 resources referenced in the Spark job, that the user running the code has RBAC roles "Storage Blob Data Contributor" on storage accounts the job is expected to read and write from. 2. Check the logs for this Spark application by clicking the Monitor tab in left side of the Synapse Studio UI, select "Apache Spark Applications" from the "Activities" section, and find your Spark job from this list. Inspect the logs available in the "Logs" tab in the bottom part of this page for the ADLS Gen2 storage account name that is experiencing this issue.

Is this actually an ADLS Gen2 permissions error? If so, running this notebook on Fabric, I don't know where the storage account even exists. The Spark SQL attempt resolves to a similar kind of error (a HiveException reading/writing the Metastore; a rough sketch of that attempt is at the end of this post).

Anyway, the steps I took to get to this point are pretty minimal: create a Fabric workspace, create a Fabric Lakehouse, then try to create a table using PySpark (and Spark SQL). This seems like a newbie getting-off-the-ground type of problem; if anyone has any suggestions, that would be great.
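The Spark SQL attempt was along these lines (illustrative only, run in a %%sql cell; not the exact statement, and the table/column names may differ):

%%sql
-- Rough, illustrative reconstruction of the failing Spark SQL attempt
CREATE TABLE table1 (
    Id INT,
    Name STRING
)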

1 ACCEPTED SOLUTION
v-nikhilan-msft
Community Support

Hi @roalexan 
Thanks for using Fabric Community.
1) Please attach the lakehouse to the notebook.

[screenshot: attaching the lakehouse to the notebook]


2) Then try running this code:

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Explicit schema for the empty table
schema = StructType([
    StructField("Id", IntegerType(), False),
    StructField("Name", StringType(), True)
])
df = spark.createDataFrame([], schema)

# Write as a Delta table registered in the attached lakehouse
df.write.format("delta").saveAsTable("table2")


This will create a new table named "table2" in your lakehouse.
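To confirm, you can read the (empty) table back; a quick sanity check along these lines:

# The table should exist with the expected schema and 0 rows
df2 = spark.read.table("table2")
df2.printSchema()
print(df2.count())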

[screenshot: the new table2 listed under the lakehouse Tables]

Hope this helps. Please let me know if you have any further questions. Glad to help.


3 REPLIES
roalexan
Microsoft Employee

Thanks! The solution was indeed to use code such as the above:

df.write.format("delta").saveAsTable("table2")

I'll take a closer look at docs such as here. I wasn't sure the delta format was strictly required.
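In case it's useful to others, DESCRIBE DETAIL (a Delta command) can confirm the format of the resulting table; a quick check, using the table name from above:

# Confirm the new table was written in Delta format
spark.sql("DESCRIBE DETAIL table2").select("format").show()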

Hi @roalexan 
Glad that your query got resolved. The Lakehouse currently supports only Delta format tables when creating a new table. Link1
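For completeness, the Spark SQL equivalent would be something like this (a sketch; it assumes the lakehouse is attached to the notebook, and the table name is illustrative):

%%sql
-- Create an empty Delta table in the attached lakehouse
CREATE TABLE IF NOT EXISTS table3 (
    Id INT NOT NULL,
    Name STRING
) USING DELTA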


Please continue using Fabric Community for any help regarding your queries.

