friend1
Frequent Visitor

How to decode protobuf data with a Fabric PySpark notebook using the from_protobuf() method?

Hi there,

I am trying to stream event data from an Azure Event Hub into a lakehouse using Spark Structured Streaming from within a Fabric notebook. The event data is protobuf-serialized and base64-encoded. I wanted to use the `from_protobuf()` method (Protobuf Data Source Guide - Spark 3.5.4 Documentation) to decode the payload.

However, I am getting the following error message: 

"Spark Protobuf libraries not found in class path. Try one of the following. 1. Include the Protobuf library and its dependencies with in the spark-submit command as $ bin/spark-submit --packages org.apache.spark:spark-protobuf:3.4.3.5.3.20241016.1 ... 2. Download the JAR of the artifact from Maven Central http://search.maven.org/, Group Id = org.apache.spark, Artifact Id = spark-protobuf, Version = 3.4.3.5.3.20241016.1. Then, include the jar in the spark-submit command as $ bin/spark-submit --jars <spark-protobuf.jar> ..."


Now my question is: how do I do this in a Fabric notebook environment? Is there a way to include the mentioned library?

Also: I have two Python modules containing the classes generated from the .proto schemas, which are required to decode the payload. Where do I have to put these so that I can hand them to the `from_protobuf()` method?


Looking forward to any ideas on this! Thanks a lot and best, flo.

7 REPLIES
v-pbandela-msft
Community Support

Hi @friend1,

I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please mark it as "Accept as Solution" and give a "Kudos" so other members can easily find it.

Thank you,
Pavan.

v-pbandela-msft
Community Support

Hi @friend1,

I wanted to follow up since we haven't heard back from you regarding our last response. We hope your issue has been resolved.
If the community member's answer resolved your query, please mark it as "Accept as Solution" and select "Yes" if it was helpful.
If you need any further assistance, feel free to reach out.

Please continue using Microsoft community forum.

Thank you,
Pavan.

v-pbandela-msft
Community Support

Hi @friend1,

Thank you for reaching out in Microsoft Community Forum.

We would like to ask whether the solution offered by @nilendraFabric has resolved your issue. If you have discovered an alternative approach, we encourage you to share it with the community to assist others facing similar challenges.
Should you find the response helpful, please mark it as "Accept as Solution" and add "Kudos". This recognition benefits other members seeking solutions to related queries.

Regards,
Pavan.

Hi,

No, the issue has not been solved yet. I haven't had time to try the suggested solution. Also, my question above contains a second part — where to put the Python classes — that has not been addressed at all...

Hi @friend1,

Thank you for reaching out in Microsoft Community Forum.

As suggested by @nilendraFabric, we hope your issue has been resolved. If you have discovered an alternative approach, we encourage you to share it with the community to assist others facing similar challenges.
Should you find the response helpful, please mark it as "Accept as Solution" and add "Kudos". This recognition benefits other members seeking solutions to related queries.

Please continue using Microsoft community forum.

Regards,
Pavan.

Hello @friend1 

You can upload them here:

[screenshot: nilendraFabric_0-1737989571354.png]

Thanks
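One thing worth noting for the second part of the question: PySpark's `from_protobuf()` does not consume the generated Python classes at all — it takes a compiled descriptor file (or a binary descriptor set) plus the message name. A possible sketch, assuming your schema file is named `my_events.proto` (illustrative name), would be to compile a descriptor with `protoc` and upload that file to the lakehouse Files area instead:

```shell
# Hedged sketch: compile a descriptor set that from_protobuf() can read.
# --include_imports bundles any imported .proto definitions into the output.
protoc --include_imports \
       --descriptor_set_out=my_events.desc \
       my_events.proto
```

You would then reference the uploaded `.desc` file's lakehouse path via the `descFilePath` argument of `from_protobuf()`; the generated Python modules are only needed if you decode messages with plain Python code outside Spark.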

nilendraFabric
Community Champion

Hello @friend1 


You can make `from_protobuf()` work in a Fabric notebook by adding Spark’s Protobuf library as a custom JAR library in your Fabric environment, then referencing your descriptor file (or binary descriptor) in your notebook. Here is a general workflow:

 

1. Obtain the spark-protobuf JAR
• Download the JAR from a public source such as Maven Central. For example, the group/artifact can be `org.apache.spark:spark-protobuf_2.12:x.x.x` (the version should match your Spark runtime).
2. Add the JAR as a custom library
• In Fabric, open your environment for editing (for example, via Workspace → Your Environment → Libraries).
• Upload the spark-protobuf JAR as a custom library.
• Select Publish to finalize the addition of that JAR into your environment, so that spark-protobuf is on the classpath when your notebook session starts.

https://learn.microsoft.com/en-us/fabric/data-engineering/library-management
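Once the JAR is published to the environment, the decoding itself could look roughly like the sketch below. This is a hedged example, not a tested solution: the path, message name, connection settings, and the `"eventhubs"` source format (which requires the separate azure-event-hubs-spark connector) are all assumptions you would adapt to your setup.

```python
# Hedged sketch: decode base64-encoded protobuf events in a Fabric PySpark
# notebook. Assumes the spark-protobuf JAR is attached to the environment,
# a descriptor file compiled with `protoc --descriptor_set_out` has been
# uploaded to the lakehouse, and the message type is called "EventPayload"
# (all illustrative names).
from pyspark.sql.functions import col, unbase64
from pyspark.sql.protobuf.functions import from_protobuf

desc_path = "/lakehouse/default/Files/schemas/my_events.desc"  # hypothetical path

# Read the raw stream from the Event Hub (connection config defined elsewhere).
raw = (
    spark.readStream
    .format("eventhubs")          # assumes the Event Hubs Spark connector
    .options(**event_hub_conf)    # hypothetical dict with your connection string
    .load()
)

# Undo the base64 encoding first, then decode the protobuf payload using the
# descriptor file rather than the generated Python classes.
decoded = raw.select(
    from_protobuf(
        unbase64(col("body").cast("string")),
        "EventPayload",           # fully qualified message name in the schema
        descFilePath=desc_path,
    ).alias("event")
).select("event.*")
```

From there, `decoded.writeStream.format("delta")...` would land the events in a lakehouse table as usual.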

If this helps, please accept the solution.

Thanks
