Zoe_Guest
Regular Visitor

spark.sql returns old data that was deleted from the Lakehouse, whereas spark.read.load doesn't

I have data in a Lakehouse and I have deleted some of it. I am trying to load it from a Fabric Notebook.

 

When I use spark.sql("SELECT * FROM parquet.`<abfs_path>/Tables/<table_name>`") I get the old data I have deleted from the Lakehouse.

 

When I use spark.read.load("<abfs_path>/Tables/<table_name>") I don't get this deleted data.

 

I have to use the abfs path because I am not setting a default lakehouse, and I can't set one just to solve this.

 

Why is this old data coming up when I use spark.sql when the paths are exactly the same?

1 ACCEPTED SOLUTION

Solved by changing it to delta:

 

spark.sql("SELECT * FROM delta.`<abfs_path>/Tables/<table_name>`")


6 REPLIES
v-prasare
Community Support

@Zoe_Guest Thanks for being part of the Fabric community and making it grow.

@wardy912 Thanks for your prompt response.

Thanks,

Prashanth Are

MS Fabric community support

wardy912
Responsive Resident

df = spark.sql("""
    SELECT *
    FROM <lakehouse>.<schema>.<table>
""")
 
You can also drag the table from the left-hand side in the Lakehouse to a cell, and it will automatically add a SQL query for that table.

I can't set it as the default lakehouse, which is why I want to use the abfs path. How do you do this with the abfs path?

Solved by changing it to delta:

 

spark.sql("SELECT * FROM delta.`<abfs_path>/Tables/<table_name>`")

wardy912
Responsive Resident
Responsive Resident

The paths are the same, but you're using a different method to query them.

spark.sql("SELECT * FROM parquet.`<abfs_path>/Tables/<table_name>`")

Spark SQL - may be using cached metadata.

spark.read.load("<abfs_path>/Tables/<table_name>")

DataFrame API - reads the current state of the files.

 

You could add a cell to your notebook that clears the cache if you want to use the Spark SQL code


spark.catalog.clearCache()
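
For reference, a minimal sketch of what that cell could look like (the path is the same placeholder used earlier in the thread):

# Drop any data/metadata Spark has cached for previously read tables,
# then re-run the path-based Spark SQL query.
spark.catalog.clearCache()
df = spark.sql("SELECT * FROM parquet.`<abfs_path>/Tables/<table_name>`")
df.show()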

 

Please give a thumbs up if this helps, thanks

 

Unfortunately, clearing the cache doesn't work.

 

However, this also gets the deleted data, so I think the issue is in specifying parquet:

spark.read.format("parquet").load(_table_abfs)

I want to be able to use a SQL query and the abfs path to the data to load the data. Any ideas on how I can do this?
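
One way to get a SQL query over an abfs path without the parquet. prefix (a sketch, not from the thread, and assuming the table is Delta): load the path with the DataFrame API, which in Fabric defaults to the delta format and so respects deletes, then register a temp view and query that with SQL.

# _table_abfs is the same path variable used in the post above.
df = spark.read.load(_table_abfs)

# "my_table" is a hypothetical view name chosen for this example.
df.createOrReplaceTempView("my_table")

# Any SQL can now run against the view, backed by the Delta read.
result = spark.sql("SELECT * FROM my_table")
result.show()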
