March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Hi everyone,
I am looking for guidance and thoughts on best practices for using Fabric, specifically Lakehouses/Warehouses, and for ingesting data with Dataflow Gen2 or any other options you can recommend.
I want to ingest .xml files stored in a SharePoint document library into the Lakehouse. The only way I have found to do this is with a Dataflow Gen2: connect to the SharePoint library, open each .xml file's binary content with Xml.Tables, expand all the needed columns, and load the resulting tables into the Lakehouse.
Does anyone know of another way to parse these .xml files that is (a lot) faster than Dataflow Gen2? What are my options? Would using OneLake help me get or load the files faster? Should I use PySpark to extract the tables from the .xml files, or maybe even SQL? We are talking about hundreds of .xml files, ranging from 1 MB to 100 MB and up to 5 GB.
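For reference, the Python/PySpark route I am asking about might look roughly like the sketch below. It is only an illustration under assumptions: the `<orders>`/`<order>` sample, the `iter_records` helper, and the idea that each repeated element becomes one table row are all hypothetical, and it does not cover how the files would get out of SharePoint in the first place. It uses the standard library's `xml.etree.ElementTree.iterparse`, which reads the file incrementally, so the multi-GB files would not need to be parsed into memory in one go.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield one dict per <record_tag> element, reading the XML incrementally.

    `source` may be a file path or a binary file-like object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # Flatten the element's attributes and direct children into one row.
            row = dict(elem.attrib)
            for child in elem:
                row[child.tag] = (child.text or "").strip()
            yield row
            # Drop the processed element's contents to keep memory use low.
            elem.clear()


# Hypothetical sample standing in for one of the SharePoint files.
sample = b"""<orders>
  <order id="1"><customer>Ann</customer><total>10.50</total></order>
  <order id="2"><customer>Bob</customer><total>7.25</total></order>
</orders>"""

rows = list(iter_records(io.BytesIO(sample), "order"))
print(rows)
```

In a notebook, the same generator could presumably be run over each file in turn and the resulting rows turned into a DataFrame before writing them to a Lakehouse table, but I have not tested that part.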
Would love to get some extra eyes and experiences on this 🙂
Thanks!
Hi @SuperFiets_
Thanks for using Fabric Community.
You can refer to this link for using pipelines:
Microsoft Fabric - Ingest XML into Lakehouse - Dan Ambler
Hope this helps. Please let me know if you have any further questions.
Hi,
Thanks for your response!
How would I implement the use case you linked in a pipeline that gets the data from SharePoint instead of an API? And, more importantly, would this work for multiple .xml files?
Hi @SuperFiets_
The pipeline approach above cannot be used with a SharePoint folder. The best option would be to use Dataflow Gen2.
Thanks
Thank you, in that case I will be using dataflows for now.