Even though I have set up a Lakehouse as the default for my notebook, when I try to refer to tables directly in the Spark notebook, they are not recognized. Please find the attached screen print and suggest what the reason could be.
Note: In the attached screen print, using a SELECT on a lakehouse table to create a dataframe does work.
For your code above, @SamKum, the join function requires a dataframe to join with, not a Delta Table.
With the Default Lakehouse attached to your notebook, you can reference the Delta Table to build a dataframe either via the absolute path:
absolute_path = 'abfss://00000000-0000-0000-0000-000000000000@onelake.dfs.fabric.microsoft.com/00000000-0000-0000-0000-000000000000/Tables/DimCustomer'
# Load the Delta Table from the absolute OneLake path
absolute_df = spark.read.format("delta").load(absolute_path)
display(absolute_df)
Or via the relative path:
relative_path = 'Tables/DimCustomer'
relative_df = spark.read.format("delta").load(relative_path)
display(relative_df)
Or via a Spark SQL query against the default lakehouse:
# Default query generated when you drag and drop the table into the notebook
df = spark.sql("SELECT * FROM Databard_Demo.DimCustomer LIMIT 1000")
display(df)
You are correct, using the relative path or Spark SQL is simpler. However, it requires attaching your notebook to a default lakehouse, which must then be managed should you deploy your notebook to different environments (e.g. Dev/Test/Prod). Without a default lakehouse, the Spark SQL and relative-path options won't work, and you would have to reference the absolute path.
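For example, here is a minimal sketch of how you might parameterize the absolute path per environment; the environment names, placeholder IDs, and the way the environment value is set are assumptions for illustration only:
# Hypothetical per-environment OneLake roots (placeholders, not real IDs)
environment = "dev"  # could come from a notebook parameter cell or a pipeline setting
onelake_roots = {
    "dev": "abfss://<dev-workspace-id>@onelake.dfs.fabric.microsoft.com/<dev-lakehouse-id>",
    "test": "abfss://<test-workspace-id>@onelake.dfs.fabric.microsoft.com/<test-lakehouse-id>",
    "prod": "abfss://<prod-workspace-id>@onelake.dfs.fabric.microsoft.com/<prod-lakehouse-id>",
}
# Build the absolute path for the current environment and load the table
dim_customer_path = f"{onelake_roots[environment]}/Tables/DimCustomer"
dim_customer_df = spark.read.format("delta").load(dim_customer_path)
display(dim_customer_df)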
As far as I'm aware, there is no way to reference Delta Tables directly by name in a join like in your example, at least not without first loading each Delta Table into a PySpark dataframe. The closest I've seen is loading the Delta Tables into their own dataframes and then using Spark SQL to join them. Here's an example:
# These use absolute paths, but relative paths also work when a default lakehouse is attached
delta_table_path_1 = "abfss://<your-workspace-id>@onelake.dfs.fabric.microsoft.com/<your-lakehouse-id>/<your-delta-table-path-1>"
delta_table_path_2 = "abfss://<your-workspace-id>@onelake.dfs.fabric.microsoft.com/<your-lakehouse-id>/<your-delta-table-path-2>"
# Load the Delta Tables
delta_table_df_1 = spark.read.format("delta").load(delta_table_path_1)
delta_table_df_2 = spark.read.format("delta").load(delta_table_path_2)
# Create temporary views
delta_table_df_1.createOrReplaceTempView("delta_table_1")
delta_table_df_2.createOrReplaceTempView("delta_table_2")
# Execute a SQL query to join the tables
# Assumes Customer data for demo purposes, but could be any query
joined_df = spark.sql("""
SELECT
t1.CustomerKey,
t1.FirstName,
t1.LastName,
t2.OrderKey,
t2.OrderDate
FROM
delta_table_1 t1
JOIN
delta_table_2 t2
ON
t1.CustomerKey = t2.CustomerKey
""")
# Show the result
joined_df.show()
Hope this helps! If it does, be sure to accept an answer so others can learn from this conversation.
Hi @SamKum
I wanted to check if you had the opportunity to review the information. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.
Hi @SamKum ,
Thank you for reaching out to the Microsoft community forum.
I wanted to check if you had the opportunity to review the information provided by @DataBard . Please feel free to contact us if you have any further questions. If the response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.
Hi @SamKum,
The code should be:
joined_df = sales_example.join(product_example, product_example.product_ID == sales_example.product_ID, "inner")
You should refer to the dataframe name rather than the table name.
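For context, here is a minimal end-to-end sketch of that dataframe join; the table paths and the product_ID column are assumptions based on this thread, not your actual schema:
# Load each Delta table into a dataframe first (relative paths assume a default lakehouse)
sales_example = spark.read.format("delta").load("Tables/sales")
product_example = spark.read.format("delta").load("Tables/products")
# Join the dataframes (not the table names) on the assumed product_ID column
joined_df = sales_example.join(product_example, product_example.product_ID == sales_example.product_ID, "inner")
display(joined_df)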
Hi @SamKum ,
Adding a default lakehouse makes it easier to use the UI to create dataframes, and it also lets you refer to tables by their relative path instead of having to reference the absolute path. Either way, you typically have to refer to the path along with the table itself (or lakehouse.schema.table, as in line 1 of your code).
In your case, instead of referencing 'products' directly, either use the dataframe you are creating in the example above, or use a Spark function to reference the Delta table like this:
# Read the Delta table into a dataframe (replace the path with your table's location)
delta_table_1 = spark.read.format("delta").load("/path/to/delta_table_1")
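As a follow-up sketch, you would then join that dataframe rather than the table name; the sales dataframe, its path, and the product_ID column below are illustrative assumptions:
# Illustrative only: load a sales dataframe and join it to the loaded Delta table
sales_df = spark.read.format("delta").load("/path/to/sales_table")
joined_df = sales_df.join(delta_table_1, on="product_ID", how="inner")
display(joined_df)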
Hopefully this helps! If it does, be sure to accept the answer so others in the community can learn from your question.
@DataBard, if creating a dataframe is the only way to go, is the path to the Delta table in your solution an absolute path? If yes, my attempt of creating a dataframe with a SELECT query is much simpler.
I saw a Udemy video where they demonstrated directly referring to a Delta table by name from the lakehouse attached to the notebook. Do you know if this was allowed in an earlier version, or was it never a feature?