Zoe_Guest
Regular Visitor

spark.sql returns old data that was deleted from the Lakehouse, whereas spark.read.load doesn't

I have data in a Lakehouse and I have deleted some of it. I am trying to load it from a Fabric Notebook.

 

When I use spark.sql("SELECT * FROM parquet.`<abfs_path>/Tables/<table_name>`") I get the old data I have deleted from the Lakehouse.

 

When I use spark.read.load("<abfs_path>/Tables/<table_name>") I don't get this deleted data.

 

I have to use the abfs path because I am not setting a default lakehouse, and I can't set one just to solve this.

 

Why is this old data coming up when I use spark.sql when the paths are exactly the same?

1 ACCEPTED SOLUTION

Solved by changing it to delta:

 

spark.sql("SELECT * FROM delta.`<abfs_path>/Tables/<table_name>`")


6 REPLIES
v-prasare
Community Support

@Zoe_Guest Thanks for being part of the Fabric community and making it grow.

@wardy912 Thanks for your prompt response.

Thanks,

Prashanth Are

MS Fabric community support

wardy912
Responsive Resident

df = spark.sql("""
    SELECT *
    FROM <lakehouse>.<schema>.<table>
""")
 
You can also drag the table from the left-hand side in the Lakehouse to a cell, and it will automatically add a SQL query for that table.

I can't set it as the default lakehouse, which is why I want to use the abfs path. How do you do this with the abfs path?

Solved by changing it to delta:

 

spark.sql("SELECT * FROM delta.`<abfs_path>/Tables/<table_name>`")

wardy912
Responsive Resident
Responsive Resident

The paths are the same, but you're using a different method to query them.

spark.sql("SELECT * FROM parquet.`<abfs_path>/Tables/<table_name>`")

Spark SQL - may be using cached metadata.

spark.read.load("<abfs_path>/Tables/<table_name>")

DataFrame API - reads the current state of the files.

 

You could add a cell to your notebook that clears the cache if you want to use the Spark SQL code


spark.catalog.clearCache()
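
For reference, a minimal sketch of what that cell could look like (the path is the same placeholder used earlier in the thread):

# Drop any data/metadata Spark has cached for previously read tables,
# then re-run the path-based Spark SQL query.
spark.catalog.clearCache()
df = spark.sql("SELECT * FROM parquet.`<abfs_path>/Tables/<table_name>`")
df.show()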

 

Please give a thumbs up if this helps, thanks

 

Unfortunately, clearing the cache doesn't work.

 

However, this also gets the deleted data, so I think the issue is in specifying parquet:

spark.read.format("parquet").load(_table_abfs)

I want to be able to use a SQL query and the abfs path to the data to load the data. Any ideas on how I can do this?
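
One way to get a SQL query over an abfs path without the parquet. prefix (a sketch, not from the thread, and assuming the table is Delta): load the path with the DataFrame API, which in Fabric defaults to the delta format and so respects deletes, then register a temp view and query that with SQL.

# _table_abfs is the same path variable used in the post above.
df = spark.read.load(_table_abfs)

# "my_table" is a hypothetical view name chosen for this example.
df.createOrReplaceTempView("my_table")

# Any SQL can now run against the view, backed by the Delta read.
result = spark.sql("SELECT * FROM my_table")
result.show()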
