<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Issues executing notebook using custom databricks library uploaded in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Issues-executing-notebook-using-custom-databricks-library/m-p/4237432#M4545</link>
    <description>&lt;P&gt;Perfect, works perfectly in my test case... now to try it in my real world scenarios&lt;/P&gt;</description>
    <pubDate>Thu, 10 Oct 2024 17:44:24 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2024-10-10T17:44:24Z</dc:date>
    <item>
      <title>Issues executing notebook using custom databricks library uploaded</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Issues-executing-notebook-using-custom-databricks-library/m-p/4235380#M4523</link>
      <description>&lt;P&gt;I have been trying to process xml content using pyspark and dataframes as per the solution in the post&amp;nbsp;&lt;A href="https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-XML-does-not-work-with-pyspark/td-p/3515934" target="_blank" rel="noopener"&gt;https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-XML-does-not-work-with-pyspark/td-p/3515934&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am encountering some execution errors in the notebook. As per the solution, the first code element in the notebook is&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;%%configure -f
{"conf": {"spark.jars.packages": "com.databricks:spark-xml_2-13-0.18.0"}}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Depending on how I execute this, I get two different errors.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;a) I connect to the spark instance first in the notebook. This takes 2 to 3 minutes to start up due to the loading of the custom environment with the databricks library. Then I execute the code fragment in the notebook:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;SparkCoreError/UnexpectedSessionState: Livy session has failed. Error code: SparkCoreError/UnexpectedSessionState. SessionInfo.State from SparkCore is Error: Encountered an unexpected session state Dead while waiting for session to become Idle.  Error description: Spark_User_Requirements_IllegalArgumentException. Source: System.&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;b) I execute the code fragment first, which in turn connects to the spark instance using the custom environment. After 2 or 3 minutes I get this error:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;invalidHttpRequestToLivy: [TooManyRequestsForCapacity] This spark job can't be run because you have hit a spark compute or API rate limit. To run this spark job, cancel an active Spark job through the Monitoring hub, choose a larger capacity SKU, or try again later. HTTP status code: 430 {Learn more} HTTP status code: 430.&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there a workaround? I can't imagine capacity is the real problem.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any thoughts appreciated.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2024 14:41:19 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Issues-executing-notebook-using-custom-databricks-library/m-p/4235380#M4523</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-10-09T14:41:19Z</dc:date>
    </item>
    <item>
      <title>Re: Issues executing notebook using custom databricks library uploaded</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Issues-executing-notebook-using-custom-databricks-library/m-p/4236571#M4535</link>
      <description>&lt;P&gt;Hi&amp;nbsp;@Anonymous&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A simple workaround is to use Pandas to read data from the xml file into a Pandas dataframe, then convert the Pandas dataframe into a Spark dataframe. For example,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vjingzhanmsft_0-1728548009077.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1180840i808F1EF4902B9A8D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="vjingzhanmsft_0-1728548009077.png" alt="vjingzhanmsft_0-1728548009077.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
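&lt;P&gt;A minimal sketch of the pandas route above, assuming pandas 1.3 or newer is available in the runtime; the function name, file path, and row XPath below are illustrative placeholders, not part of the original reply:&lt;/P&gt;

```python
import pandas as pd

def xml_to_spark_df(spark, source, row_xpath="./*"):
    # Hypothetical helper: read the XML with pandas first, then hand the
    # result to Spark, so no spark-xml jar (and no %%configure cell) is needed.
    pdf = pd.read_xml(source, xpath=row_xpath, parser="etree")  # stdlib parser; lxml also works
    return spark.createDataFrame(pdf)
```

&lt;P&gt;In a Fabric notebook the built-in session can be passed directly, e.g. df = xml_to_spark_df(spark, "Files/data.xml", ".//record").&lt;/P&gt;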
&lt;P&gt;Best Regards,&lt;BR /&gt;Jing&lt;BR /&gt;&lt;EM&gt;If this post helps, please Accept it as Solution to help other members find it. Appreciate your Kudos! &lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Oct 2024 08:16:35 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Issues-executing-notebook-using-custom-databricks-library/m-p/4236571#M4535</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-10-10T08:16:35Z</dc:date>
    </item>
    <item>
      <title>Re: Issues executing notebook using custom databricks library uploaded</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Issues-executing-notebook-using-custom-databricks-library/m-p/4237432#M4545</link>
      <description>&lt;P&gt;Perfect, works perfectly in my test case... now to try it in my real world scenarios&lt;/P&gt;</description>
      <pubDate>Thu, 10 Oct 2024 17:44:24 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Issues-executing-notebook-using-custom-databricks-library/m-p/4237432#M4545</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-10-10T17:44:24Z</dc:date>
    </item>
  </channel>
</rss>

