We are starting to write to the "Lakehouse" from pyspark in Fabric and it isn't going well.
Sometimes the Lakehouse will properly recognize a table and sometimes it won't.
Notice below that many of the delta tables written from PySpark are presented in an "Unidentified" folder. I haven't found a pattern or explanation for this.
Are lakehouses still in "preview"? Is there any way to kick the lakehouse and get it to properly recognize tables? Is there any reason why our tables are being saved into this state of purgatory, instead of appearing as a normal table?
Here is an example of a notebook where we are writing to delta, in this part of the lakehouse ("Tables")
... is there a better API that allows the data to be validated as a Fabric table, during the write operation?
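A minimal sketch of the kind of write that produces this behavior, assuming a Fabric notebook where a `spark` session and a DataFrame already exist (the table name here is hypothetical):

```python
# Sketch for a Fabric notebook; names are hypothetical.

table_name = "my_table"
delta_path = f"Tables/{table_name}"  # the managed "Tables" area of the lakehouse

def write_unregistered(df):
    # Path-based write: this produces valid Delta files on disk, but it does
    # not register anything in the metastore, so the result can show up
    # under the "Unidentified" folder instead of as a normal table.
    df.write.format("delta").mode("overwrite").save(delta_path)
```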
Solved! Go to Solution.
Hi @dbeavon3 ,
Thank you for reaching out to us on the Microsoft Fabric Community Forum.
I have also encountered the same issue. When the code is not structured properly, the table gets placed in the "Unidentified" folder instead of the "Tables" section.
The Lakehouse doesn't always register Delta tables written from PySpark as managed tables automatically. For complex data, tables may appear with a delay, but a simple refresh usually resolves the issue.
If this post was helpful, please give us Kudos and consider marking Accept as solution to assist other members in finding it more easily.
@v-menakakota
Thanks for the details. How do we see the metastore information, aside from visual inspection?
Is there an API to export that information? Or will it appear in git, if I integrate with source control?
Please let me know how to take next steps to investigate. Ultimately I'd like to find a pattern and have some very specific observations to share with Mindtree support (not just screenshots). Once I've gathered those, I'm hoping they will fix the bugs in their Spark and Lakehouse stack. This stuff is very buggy, and I hope that some day my efforts won't be spent on these bugs (roughly 10-20% of my time in Fabric).
Hi @dbeavon3 ,
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you and Regards,
Menaka.
Hi @dbeavon3 ,
You can utilize this specific Fabric REST API to retrieve a list of tables within a workspace, along with their respective formats.
Lakehouse management API - Microsoft Fabric | Microsoft Learn
This might assist you in that process.
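A rough sketch of calling that endpoint with plain `requests`. The workspace/lakehouse IDs and the bearer token are placeholders, and the exact response field names should be checked against the linked documentation:

```python
import requests

def tables_url(workspace_id, lakehouse_id):
    # List Tables endpoint of the Lakehouse management API (see linked docs).
    return (f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
            f"/lakehouses/{lakehouse_id}/tables")

def list_lakehouse_tables(token, workspace_id, lakehouse_id):
    # The caller must supply a valid Fabric API bearer token.
    resp = requests.get(tables_url(workspace_id, lakehouse_id),
                        headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    # The payload carries the table list (name, type, format); confirm the
    # exact field names against the linked documentation.
    return resp.json().get("data", [])
```

Comparing this list against the folders visible on disk should show which Delta folders never got a table entry.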
If this post was helpful, please give us Kudos and consider marking Accept as solution to assist other members in finding it more easily.
Yes, this API should allow me to better understand why some things are registered as normal tables and others are not. Thanks for the tip.
Hello @dbeavon3
Using `.save()` instead of `.saveAsTable()` can result in Delta tables not being properly registered in the metastore, leading them to appear under the “Unidentified” folder.
• The `.save()` method writes data to storage but does not register it as a managed table.
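A minimal sketch of the difference, assuming a Fabric notebook where a `spark` session already exists (the table name "dim_customer" is hypothetical):

```python
# Sketch for a Fabric notebook; the table name is hypothetical.

def write_by_path(df):
    # Writes Delta files to storage only; nothing is registered in the
    # metastore, so the lakehouse may show the folder as "Unidentified".
    df.write.format("delta").mode("overwrite").save("Tables/dim_customer")

def write_as_managed_table(df):
    # Registers "dim_customer" as a managed Delta table in the metastore,
    # so it should appear under "Tables" in the lakehouse explorer.
    df.write.format("delta").mode("overwrite").saveAsTable("dim_customer")
```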
Hi @nilendraFabric
Thanks for the response.
It seems super redundant to write to "delta" in the "/Tables" folder, and still not get the desired result. Adding "saveAsTable" seems like I'm just repeating myself for the third time.
>> does not register it as a managed table.
I can see this by browsing the Lakehouse in the Power BI service, and looking at the visual representation. Aside from visual inspection, is there another place I can compare the differences between something that is NOT registered (ie "Unidentified") and something that is registered (ie "managed table")? Is there a metastore representation when can be retrieved, and will show me the list of all registered items? If I set up the git integration, will that push the "metastore" to a git repo, where it can be examined?
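One non-visual check that may help from inside a notebook: ask the Spark catalog itself which tables it knows about, and compare that list against the Delta folders on disk. A sketch, assuming a Fabric notebook where `spark` already exists:

```python
# Sketch for a Fabric notebook (`spark` already exists there).

def registered_table_names(spark, db="default"):
    # spark.catalog.listTables() reflects the metastore: any Delta folder
    # present under the Tables area but missing from this list is an
    # "Unidentified" candidate (Delta files without a metastore entry).
    return sorted(t.name for t in spark.catalog.listTables(db))
```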
.. I feel like I'm poking around in the dark. I suppose I could also send all my data out to a "real" storage account in Azure and avoid some of the proprietary stuff that is going on in Fabric. I don't actually need all the bells and whistles (like automatic semantic models and SQL endpoints). All that stuff probably adds to the cost without providing any real-world value above and beyond the table-blobs themselves.