Joshrodgers123
Advocate V

Spark XML does not work with pyspark

Has anyone been able to read XML files in a notebook using PySpark yet? I loaded the spark-xml_2.12-0.16.0.jar library and am trying to run the code below, but it does not seem to recognize the package. I have the same configuration in an Azure Synapse notebook and it works perfectly. The interesting thing is that this does work in Fabric if I read the XML file using Scala instead.

 

I just tried this on the new 2.2 runtime as well and no luck.

 

Code:

df = spark.read.format("xml").option("rowTag", "BillOfLading").load("Files/Freight/kls/raw/KACC20230724.xml")
 
Error: 
Py4JJavaError: An error occurred while calling o5568.load. : org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: xml. Please find packages at `https://spark.apache.org/third-party-projects.html`.
1 ACCEPTED SOLUTION
Fgarcia1986
Regular Visitor

 

One way that I found is:

1 - Create an environment


2 - Upload the file spark-xml_2.12-0.17.0.jar


 

 

 

Open your notebook, choose Spark (Scala) as the language, and then place the code below:

%%configure -f
{"conf": {"spark.jars.packages": "com.databricks:spark-xml_2.12:0.16.0"}}

 

IMPORTANT: This must be the first code in the session. You can use the Workspace Default environment; you don't have to use the environment you created. I don't know why, but it worked.
 
Then you can change your language to PySpark (Python) and read XML.
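For illustration, a minimal sketch of that follow-up PySpark cell, reusing the rowTag and path from the original post (the short name "xml" may also resolve once the package is loaded):

# PySpark cell, run after the %%configure cell above
df = (spark.read
          .format("com.databricks.spark.xml")   # data source provided by spark-xml
          .option("rowTag", "BillOfLading")     # XML element that maps to one row
          .load("Files/Freight/kls/raw/KACC20230724.xml"))
df.show(5)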
 
It takes 2 to 3 minutes to execute.
 
Let me know if you have any doubts.
 
I hope this works for everyone.
 
Cheers


16 REPLIES
Joshrodgers123
Advocate V

That is where I have been loading it. 

Anonymous
Not applicable

Hi  @Joshrodgers123 

Can you please share your workspace ID and artifact ID? We'd like to check whether this is an issue on our side or the error occurs by design. It would be great if you could also share the code snippet along with it, as we would like to understand why there is an issue.

You can send us this information by email to AzCommunity[at]Microsoft[dot]com with the details below:

Email subject: <Attn - v-nikhilan-msft  :Spark XML does not work with pyspark>

 

Thanks.

 

Hi @Anonymous, I have emailed all of the requested details. Thanks.

Anonymous
Not applicable

Hi @Joshrodgers123 ,
Thanks for providing the information. I have given the details to the internal team. I will update you once I hear back from them.
Appreciate your patience.

Anonymous
Not applicable

Hi  @Joshrodgers123 

To work with XML files in PySpark, we have to use the spark-xml package (Link1).

Try using the Scala API:

%%spark

val df = spark.read
                .format("com.databricks.spark.xml")
                .option("rowTag", "book")
                .load("file:///synfs/nb_resource/builtin/demo.xml")

df.show(10)

 

 You can find the tutorial here: Link2

 

So basically, the format must be .format("com.databricks.spark.xml").

 

 

I have already installed that package. The code you provided is Scala, which does work. PySpark does not work though.

Anonymous
Not applicable

Hi @Joshrodgers123 
Apologies for the delay in response.

I would request you to go ahead with Microsoft support for this. Please raise a support ticket using this link: https://support.fabric.microsoft.com/en-US/support/.

Also, once you have opened the support ticket, please share the support case # here so that we can keep an eye on it.

Thanks

Here is the support ticket: 2311150040007106

@Josh Did you get a reply on how to use spark-xml with PySpark in Fabric? Thanks

It doesn't seem to be supported with PySpark. I got it working by loading the data with Scala and then doing my transformations with PySpark.
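For anyone following along, a rough sketch of that Scala-then-PySpark pattern (assuming the spark-xml jar is attached to the session; the temp view and output table names are illustrative):

%%spark
// Scala cell: load the XML and expose it to other languages via a temp view
val xmlDf = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "BillOfLading")
  .load("Files/Freight/kls/raw/KACC20230724.xml")
xmlDf.createOrReplaceTempView("bill_of_lading_raw")

%%pyspark
# PySpark cell: continue the transformations against the shared temp view
df = spark.table("bill_of_lading_raw")
df.write.format("delta").mode("overwrite").saveAsTable("bill_of_lading")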

My workaround is loading the file into a pandas DataFrame and then converting it to a PySpark DataFrame before writing to Delta tables.
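A minimal sketch of that pandas route (assuming pandas 1.3+ for read_xml, a default lakehouse attached so the local /lakehouse path resolves, and a hypothetical target table name):

import pandas as pd

# Parse the XML with pandas, one row per <BillOfLading> element
pdf = pd.read_xml("/lakehouse/default/Files/Freight/kls/raw/KACC20230724.xml",
                  xpath=".//BillOfLading")

# Convert to a Spark DataFrame and write it out as a Delta table
sdf = spark.createDataFrame(pdf)
sdf.write.format("delta").mode("overwrite").saveAsTable("bill_of_lading")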

Anonymous
Not applicable

Hi @Joshrodgers123 
Thanks for the details. We expect you to keep using this forum and also motivate others to do the same.
Thanks

Anonymous
Not applicable

Hi @Joshrodgers123 ,

Thanks for using Fabric Community.

Apologies for the issue you have been facing. 

We are reaching out to the internal team to get more information related to your query and will get back to you as soon as we have an update.

Appreciate your patience.

Anonymous
Not applicable

Hi @Joshrodgers123 

Could you please try uploading the .jar file in library management, installing it, and then using it in the notebook?


Please upload the .jar file there and try running the PySpark code.
Hope this helps. Please let us know if you have any further questions.

Can you provide a link to the .jar file or to the webpage where we can download it, please?
