Fabric Apache Spark Diagnostic Emitter for Logs and Metrics is now in public preview. This new feature allows Apache Spark users to collect Spark logs, job events, and metrics from their Spark applications and send them to various destinations, including Azure Event Hubs, Azure Storage, and Azure Log Analytics. It provides robust support for monitoring and troubleshooting Spark applications, enhancing your visibility into application performance.
What Does the Diagnostic Emitter Do?
The Fabric Apache Spark Diagnostic Emitter enables Apache Spark applications to emit critical logs and metrics that can be used for real-time monitoring, analysis, and troubleshooting. Whether you're sending logs to Azure Event Hubs, Azure Storage, or Azure Log Analytics, this emitter simplifies the process, allowing you to collect data seamlessly and store it in your preferred destinations.
Key Benefits of the Apache Spark Diagnostic Emitter
Below is a quick step-by-step guide for the one-time configuration of the destination for collecting logs and metrics.
To begin, you’ll need an Azure Event Hubs instance, an Azure Log Analytics workspace, or an Azure Blob Storage account, based on your preference. If you don’t already have one, you can quickly create one via the Azure portal.
Next, you'll need to create a Fabric Environment Artifact in Microsoft Fabric and configure it with the required Spark properties.
Here are some example key configuration properties available for the diagnostic emitter:
spark.synapse.diagnostic.emitters: Comma-separated names of diagnostic emitters.
spark.synapse.diagnostic.emitter.<destination>.type: The destination type (e.g., AzureEventHub).
spark.synapse.diagnostic.emitter.<destination>.categories: The log categories to be collected (e.g., DriverLog, ExecutorLog, EventLog, Metrics).
spark.synapse.diagnostic.emitter.<destination>.secret: The Azure Event Hubs connection string.
spark.synapse.diagnostic.emitter.<destination>.secret.keyVault: Azure Key Vault name for storing the connection string.
For a full list of configuration options, refer to the official documentation below.
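To make the naming pattern of these properties concrete, here is a minimal Python sketch that assembles them for a single emitter. The helper `build_emitter_properties` and the emitter name `MyEventHub` are hypothetical and not part of Fabric; the sketch also assumes that categories are joined with commas in the same way as emitter names, and the connection string is left as a placeholder. The printed `key: value` layout is only for readability, so check the official documentation for the exact format your environment expects.

```python
def build_emitter_properties(name, dest_type, categories, secret=None, key_vault=None):
    """Return the Spark properties for one diagnostic emitter as a dict.

    `name` replaces the <destination> placeholder in the property keys.
    """
    prefix = f"spark.synapse.diagnostic.emitter.{name}"
    props = {
        "spark.synapse.diagnostic.emitters": name,
        f"{prefix}.type": dest_type,
        # Assumption: categories are comma-separated, like emitter names.
        f"{prefix}.categories": ",".join(categories),
    }
    # Only emit the secret-related keys that were actually supplied.
    if secret is not None:
        props[f"{prefix}.secret"] = secret
    if key_vault is not None:
        props[f"{prefix}.secret.keyVault"] = key_vault
    return props


props = build_emitter_properties(
    name="MyEventHub",               # hypothetical emitter name
    dest_type="AzureEventHub",
    categories=["DriverLog", "ExecutorLog", "EventLog", "Metrics"],
    secret="<connection-string>",    # placeholder, never hard-code a real secret
)

for key, value in props.items():
    print(f"{key}: {value}")
```

Generating the keys from one emitter name keeps the `<destination>` segment consistent across all properties, which is an easy thing to get wrong when typing them by hand.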
Once configured, attach your environment artifact to a Notebook or Spark Job Definition.
After this configuration, you can run your Notebooks or Spark jobs as you normally do. Logs and metrics from your Apache Spark applications are sent to your preferred destination, where you can collect and analyze them. This feature simplifies monitoring and debugging, allowing you to focus on your core business logic. Additionally, you can query and aggregate the data in Azure Monitor and create custom alerts that evaluate logs and metrics at regular intervals and trigger when your defined criteria are met.
Log Data Sample
Here is a sample log record in JSON format, showing how Spark logs and metrics are captured:
{
  "timestamp": "2024-09-06T03:09:37.235Z",
  "category": "Log|EventLog|Metrics",
  "fabricLivyId": "<fabric-livy-id>",
  "applicationId": "<application-id>",
  "applicationName": "<application-name>",
  "executorId": "<driver-or-executor-id>",
  "properties": {
    "message": "Initialized BlockManager: BlockManagerId(1, vm-04b22223, 34319, None)",
    "logger_name": "org.apache.spark.storage.BlockManager",
    "level": "INFO"
  }
}
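Once records like this arrive at your destination, a consumer can parse them with any JSON library. The following is a minimal Python sketch, assuming each record is delivered as a single JSON document with the fields shown in the sample; the `summarize` helper is hypothetical, and the placeholder values are copied from the sample rather than from real output.

```python
import json

# Sample record copied from the example above; placeholders left as-is.
sample = """
{
  "timestamp": "2024-09-06T03:09:37.235Z",
  "category": "Log|EventLog|Metrics",
  "fabricLivyId": "<fabric-livy-id>",
  "applicationId": "<application-id>",
  "applicationName": "<application-name>",
  "executorId": "<driver-or-executor-id>",
  "properties": {
    "message": "Initialized BlockManager: BlockManagerId(1, vm-04b22223, 34319, None)",
    "logger_name": "org.apache.spark.storage.BlockManager",
    "level": "INFO"
  }
}
"""


def summarize(record):
    """Format one record as a single line: timestamp, level, logger, message."""
    props = record.get("properties", {})
    return (
        f'{record["timestamp"]} [{props.get("level", "?")}] '
        f'{props.get("logger_name", "?")}: {props.get("message", "")}'
    )


record = json.loads(sample)
print(summarize(record))
```

Reading the nested `properties` fields with `.get` keeps the helper tolerant of records that lack a `level` or `logger_name`, which may be the case for event or metric records.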
Stay tuned for more updates, and happy coding!
Related documents: