Has anyone been able to read XML files in a notebook using PySpark yet? I loaded the spark-xml_2.12-0.16.0.jar library and am trying to run the code below, but it does not seem to recognize the package. I have the same configuration in an Azure Synapse notebook and it works perfectly. The interesting thing is that this does work in Fabric if I read the XML file using Scala instead.
I just tried this on the new 2.2 runtime as well and no luck.
Code:
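(The snippet itself did not survive the page export; below is a minimal sketch of the kind of PySpark read being attempted, with a placeholder path and row tag.)

df = (spark.read
    .format("com.databricks.spark.xml")   # provided by spark-xml_2.12-0.16.0.jar
    .option("rowTag", "book")             # placeholder row tag
    .load("Files/demo.xml"))              # placeholder path
df.show(10)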
One way that I found is:
1 - Create an environment
2 - Upload the file spark-xml_2.12-0.17.0.jar
3 - Open your notebook, choose Spark (Scala) as the language, and then place the code below:
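(The snippet itself did not survive the export either; presumably it matches the Scala read shown later in the thread, with a placeholder path and row tag.)

%%spark
val df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")  // placeholder row tag
  .load("Files/demo.xml")    // placeholder path
df.show(10)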
That is where I have been loading it.
Can you please share your workspace ID and artifact ID? We'd like to check whether this is an issue on our side or the error is by design. It would be great if you could also share the code snippet along with it, so we can understand why the issue occurs.
You can send us this information by email to AzCommunity[at]Microsoft[dot]com with the details below:
Email subject: <Attn - v-nikhilan-msft: Spark XML does not work with pyspark>
Thanks.
Hi @Joshrodgers123 ,
Thanks for providing the information. I have given the details to the internal team. I will update you once I hear back from them.
Appreciate your patience.
To work with XML files from PySpark, we have to use the spark-xml package (Link1).
Try using the Scala API
%%spark
val df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")
  .load("file:///synfs/nb_resource/builtin/demo.xml")
df.show(10)
You can find the tutorial here: Link2
So basically, the format must be .format("com.databricks.spark.xml").
I have already installed that package. The code you provided is Scala, which does work. PySpark does not work, though.
Hi @Joshrodgers123
Apologies for the delay in response.
I would request you to go ahead with Microsoft support for this. Please raise a support ticket at this link: https://support.fabric.microsoft.com/en-US/support/.
Once you have opened the support ticket, please share the support case # here so that we can keep an eye on it.
Thanks
Here is the support ticket: 2311150040007106
@Josh Did you get a reply on how to use spark-xml with PySpark in Fabric? Thanks
It doesn't seem to be supported with PySpark. I got it working by loading the data with Scala and then doing my transformations with PySpark.
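(A minimal sketch of that hand-off, assuming the spark-xml jar is attached to the environment; the path, row tag, and view name are placeholders. Read the XML in a %%spark Scala cell and register a temp view, which a later PySpark cell can pick up because both cells share the same Spark session.)

%%spark
val df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")              // placeholder row tag
  .load("Files/demo.xml")                // placeholder path
df.createOrReplaceTempView("books_xml")  // hypothetical view name

A subsequent PySpark cell can then continue with df = spark.table("books_xml") and do the transformations in Python.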
My workaround is loading into a Pandas DataFrame and then converting it to a PySpark DataFrame before writing to Delta tables.
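(A minimal sketch of that Pandas route, assuming pandas 1.3+ for read_xml; the file path, XPath, and table name are placeholders.)

import pandas as pd

pdf = pd.read_xml("/lakehouse/default/Files/demo.xml", xpath=".//book")  # parse XML on the driver
df = spark.createDataFrame(pdf)                                          # convert Pandas -> PySpark DataFrame
df.write.format("delta").mode("overwrite").saveAsTable("books")          # write to a Delta table

Note that this parses the whole file on the driver, so it only suits files that fit in memory.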
Hi @Joshrodgers123
Thanks for the details. We expect you to keep using this forum and also motivate others to do the same.
Thanks
Hi @Joshrodgers123 ,
Thanks for using Fabric Community.
Apologies for the issue you have been facing.
We are reaching out to the internal team to get more information related to your query and will get back to you as soon as we have an update.
Appreciate your patience.
Could you please try uploading the .jar file in library management, installing it, and then using it in the notebook?
Please upload the .jar file there and try running the PySpark code.
Hope this helps. Please let us know if you have any further questions.
Can you provide a link to the .jar file or to the webpage where we can download it please?