I am using a PySpark notebook to read data from one lakehouse (not schema-enabled) and write it to another lakehouse (schema-enabled) in the same workspace... however:
it ONLY works if I use: df.write.format("delta").mode("overwrite").option("mergeSchema", "true").save("abfss://abc123@onelake.dfs.fabric.microsoft.com/def456/Tables/dim/testnewtable")
when I should be able to use: df.write.format("delta").mode("overwrite").option("mergeSchema", "true").saveAsTable("testingSchemas.dim.testnewtable")
This seems like a bug to me, as non-schema-enabled lakehouses work just fine with my second example.
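For a minimal reproduction, here is a sketch of the two write paths side by side (the IDs abc123/def456 and table names are from above; the sample dataframe is illustrative only):

    # Tiny sample dataframe to reproduce the behavior.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # Works: writing through the full ABFS path of the schema-enabled lakehouse.
    df.write.format("delta").mode("overwrite") \
        .option("mergeSchema", "true") \
        .save("abfss://abc123@onelake.dfs.fabric.microsoft.com/def456/Tables/dim/testnewtable")

    # Fails with three-part naming (see the discussion below):
    df.write.format("delta").mode("overwrite") \
        .option("mergeSchema", "true") \
        .saveAsTable("testingSchemas.dim.testnewtable")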
Hi @kely
My testing result is somewhat different. When I attach a schema-enabled lakehouse to the notebook as the default lakehouse and use the saveAsTable function to save a dataframe to it, it works fine with both three-part and two-part naming.
However, when the destination schema-enabled lakehouse is not attached as the default lakehouse of the notebook, it fails with the error "AnalysisException: Couldn't find a catalog to handle the identifier LHschema.raw.newTableOne."
It seems that saveAsTable always looks for the schema and table in the default lakehouse. With three-part naming, saveAsTable treats the first part as a schema within the default lakehouse, when it is actually the name of another lakehouse. This is what leads to the failure.
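To illustrate, a minimal sketch using the names from my test above, where LHschema is a schema-enabled lakehouse containing a schema named raw:

    # With LHschema attached as the notebook's DEFAULT lakehouse,
    # both identifiers resolve against it and the writes succeed:
    df.write.format("delta").mode("overwrite").saveAsTable("raw.newTableOne")            # two-part
    df.write.format("delta").mode("overwrite").saveAsTable("LHschema.raw.newTableOne")   # three-part

    # With a different lakehouse as the default, the same three-part call fails:
    # AnalysisException: Couldn't find a catalog to handle the identifier
    # LHschema.raw.newTableOne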
Best Regards,
Jing
Community Support Team
Thanks @v-jingzhan-msft!
So if I am in a notebook connected to a 'bronze' lakehouse, and I am writing to a 'gold' lakehouse that is schema-enabled... is this not a valid use case?
i.e., when I write my gold table, I would use goldlakehouse.schema.table?
Hi @kely
My previous reply is based on my testing results; I haven't found any documentation that confirms them. So I'm not sure whether this is expected behavior, a limitation, or something that might be improved in the future.
If your notebook is currently using a 'bronze' lakehouse as the default lakehouse, using saveAsTable with 'goldlakehouse.schema.table' is probably not valid. Switching the default lakehouse of a running notebook usually requires terminating the session first, and if I understand correctly, one session can have only one default lakehouse. So at least one of the lakehouses needs to be addressed by its ABFS path in the notebook.
As of now, I recommend using ABFS paths and the save function to write the dataframe in your case. This is more reliable.
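A minimal sketch of this approach (the names in angle brackets are placeholders; the GUID form from the original post works as well):

    # Address the destination lakehouse by its full ABFS path, so the write
    # does not depend on which lakehouse is the notebook's default.
    gold_path = ("abfss://<workspace>@onelake.dfs.fabric.microsoft.com"
                 "/<gold_lakehouse>/Tables/dim/testnewtable")
    df.write.format("delta").mode("overwrite") \
        .option("mergeSchema", "true") \
        .save(gold_path)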
Best Regards,
Jing
@kely thanks for sharing your workaround.
I guess in the future we will all be using schema-enabled lakehouses, once they reach general availability (GA).
But we will have many existing lakehouses that are not schema-enabled.
So we will need to be able to work with both at the same time, as you're trying to do.
I hope this will be supported out of the box across schema-enabled and non-schema-enabled lakehouses in the future.
This could be a public preview limitation. There are a few items in the limitations section of the documentation about part-naming conventions; it may be that this particular issue simply isn't documented yet.
It may be useful to create an Idea to make sure this functionality is being looked at.