As announced in the previous blog post on this topic, the Fabric Apache Spark Diagnostic Emitter for Logs and Metrics has recently entered preview, enabling you to emit Spark logs and metrics to various Azure destinations.
In this blog post, you will learn how to build a centralized Spark monitoring solution using Fabric Real-Time Intelligence. To do this, we will connect the Fabric Spark diagnostic emitter directly to a Fabric Eventstream and an Eventhouse.
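To set up the Spark monitoring solution, you need a workspace backed by a Fabric capacity in which to deploy the artifacts used in the steps below: an Eventhouse (e.g., eh_sparklogs), an Eventstream (e.g., es_sparklogging), a Spark Environment, and a Spark Notebook.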
[Figure: example of what your workspace should look like when all these artifacts are deployed]
Open the Eventstream (e.g., es_sparklogging) you just created and add a new source of type Custom App. Once the source is created, open the Details section at the bottom and, under Event Hub -> Keys, take note of the Connection String - Primary Key. You will need it for the Step 3 configuration.
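Before pasting the connection string into the Environment, it can help to check that it was copied completely. The snippet below is a minimal sketch, assuming the string follows the usual Event Hubs layout of semicolon-separated `Key=Value` pairs (`Endpoint`, `SharedAccessKeyName`, `SharedAccessKey`, `EntityPath`). The helper names and the sample values are hypothetical and only serve as illustration.

```python
# Keys we assume an Event Hubs-style connection string carries.
REQUIRED_KEYS = ("Endpoint", "SharedAccessKeyName", "SharedAccessKey", "EntityPath")


def parse_connection_string(conn_str):
    """Split a semicolon-separated Key=Value connection string into a dict."""
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: base64 keys often end with "=" padding.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def validate_connection_string(conn_str):
    """Return the parsed parts, or raise ValueError if something is missing."""
    parts = parse_connection_string(conn_str)
    missing = [key for key in REQUIRED_KEYS if key not in parts]
    if missing:
        raise ValueError(f"Connection string is missing: {', '.join(missing)}")
    if not parts["Endpoint"].startswith("sb://"):
        raise ValueError("Endpoint is expected to start with sb://")
    return parts
```

A truncated copy (for example, one that lost its `EntityPath` segment) raises a `ValueError` instead of failing silently later, when the notebook runs and no logs arrive.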
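Open the Spark Environment, go to the Spark Properties tab, and add the diagnostic emitter properties, using the Eventstream connection string you noted in Step 2 as the emitter's secret.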
Make sure you Save and Publish the changes to the Environment after configuring the properties.
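As a rough illustration of what the emitter configuration looks like, here is a minimal sketch that assembles the properties as key/value pairs. The property names follow the pattern we recall from the documentation linked at the end of this post (`spark.synapse.diagnostic.emitter.<name>.*`); treat them as assumptions and verify them against that page before use. The emitter name `MyEventHub` and the helper function are hypothetical.

```python
def build_emitter_properties(emitter_name, connection_string,
                             categories=("Log", "EventLog", "Metrics")):
    """Assemble Spark properties for one diagnostic emitter.

    Property names are assumed from the linked documentation; verify them
    there before copying them into the Environment.
    """
    prefix = f"spark.synapse.diagnostic.emitter.{emitter_name}"
    return {
        # Name(s) of the emitters to enable.
        "spark.synapse.diagnostic.emitters": emitter_name,
        # Destination type: the Eventstream custom app exposes an Event Hub endpoint.
        f"{prefix}.type": "AzureEventHub",
        # Which categories of diagnostics to emit.
        f"{prefix}.categories": ",".join(categories),
        # Connection String - Primary Key noted in Step 2.
        f"{prefix}.secret": connection_string,
    }


def render_properties(properties):
    """Render the properties as 'key: value' lines for easy copying."""
    return "\n".join(f"{key}: {value}" for key, value in properties.items())
```

Calling `render_properties(build_emitter_properties("MyEventHub", "<connection-string>"))` yields one line per property, which you can enter in the Spark Properties tab. Note that the connection string is a secret, so avoid committing the rendered output to source control.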
Open the Spark Notebook and write some sample code; a single code cell containing 1+1 is enough. In the Notebook's navigation bar, make sure you attach the Environment you set up in Step 3.
Run the notebook. If you have set up the Environment correctly, logs and metrics will start flowing to the Eventstream that was set up in Step 2.
Reopen the Eventstream (e.g., es_sparklogging) and add a destination -> KQL Database. Select the Eventhouse (e.g., eh_sparklogs) that you set up in Step 1. If you haven’t created a table yet, you can create one in the UI; because logs should now be present in the Eventstream, the table schema can be derived from them. If there are no logs in the Eventstream, the most likely cause is an error in the Spark Environment properties configured in Step 3.
Open the Eventhouse and go to the table where the Spark logs and metrics are stored. Selecting the table in the UI shows a preview of the data and when data was last ingested.
Now that we have data in the Eventhouse, we can query the table using KQL. For example, in order to consume all the Spark Metrics, you can run the following query:
AllSparkLogs
| where category == "Metrics"
| evaluate bag_unpack(properties, "p_")
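The bag_unpack plugin expands the properties column, which contains the more detailed information, into one column per top-level property. The second argument ("p_") is a prefix added to each generated column name.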
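To make the effect of this query concrete, here is a minimal Python sketch that mimics it on in-memory rows: keep only the rows whose category is "Metrics", then expand each top-level key of the properties bag into its own prefixed column. This is an illustration of the idea, not how Eventhouse implements the plugin, and any property names you pass in (such as a metric name or value) are hypothetical.

```python
def bag_unpack(rows, column, prefix=""):
    """Expand the top-level keys of a dict-valued column into prefixed columns."""
    unpacked = []
    for row in rows:
        # Keep every column except the bag itself.
        flat = {key: value for key, value in row.items() if key != column}
        bag = row.get(column) or {}  # a missing or empty bag adds no columns
        for key, value in bag.items():
            flat[f"{prefix}{key}"] = value
        unpacked.append(flat)
    return unpacked


def metrics_view(rows):
    """Mimic: | where category == "Metrics" | evaluate bag_unpack(properties, "p_")."""
    metrics = [row for row in rows if row.get("category") == "Metrics"]
    return bag_unpack(metrics, "properties", prefix="p_")
```

Only the first level of the bag is expanded; nested objects stay as single values in their prefixed column.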
In this blog post, you have seen how simple it is to set up a Spark monitoring solution in Fabric using Real-Time Intelligence. You can apply the same approach to emit the Spark logs and metrics generated by all Notebooks in your Fabric estate, not just one, to an Eventhouse. This gives Fabric data engineers a consistent, integrated monitoring experience with enhanced diagnostic and performance tuning capabilities.
For more information, please visit Collect your Apache Spark applications logs and metrics using Azure Event Hubs - Microsoft Fabric | ....
I would like to express sincere gratitude to Jenny Jiang for the collaboration on this blog post and her outstanding work on delivering this feature.