Efficient log management with Microsoft Fabric
In the era of digital transformation, managing and analyzing log files in real time is essential for maintaining application health, security, and performance. Many third-party solutions in this area let you collect, process, store, analyze, and act upon this data source. As your systems scale, however, those solutions can become very costly, because their cost model is based on the amount of data they ingest rather than on the value delivered to the customer.
This blog post explores a robust architecture that leverages the Microsoft Fabric SaaS platform, focusing on its Real-Time Intelligence capabilities, for efficient log file collection, processing, and analysis.
The use cases range from simple troubleshooting of application errors to more advanced scenarios such as detecting application trends. Examples include detecting slowly degrading performance (for instance, the average user session for specific activities lasting longer than expected) and more proactive monitoring, where you define log-based KPIs and monitor those KPIs to generate alerts.
Because Fabric provides a complete separation between compute and storage you can grow your data without necessarily growing your compute costs.
The proposed architecture integrates Microsoft Fabric’s Real-Time Intelligence (Real-Time Hub) with your source log files to create a seamless, near real-time log collection solution.
It is based on Microsoft Fabric, a Microsoft SaaS solution that provides a unified suite of analytical experiences. Fabric is a modern data/AI platform based on unified and open data formats (Parquet/Delta). It supports both classical data management experiences, such as Lakehouse and Warehouse at scale, and real-time intelligence, all on a lake-centric SaaS platform for simplified analytics. Fabric's open foundation with built-in governance enables you to connect to various clouds and tools while maintaining data trust.
This is a very high-level overview of Real-Time Intelligence within Fabric.
Since Fabric is a SaaS solution, all the components can be used without deploying any infrastructure in advance. With the click of a button and a few simple configurations, you can customize the relevant components for this solution.
The main components used in this solution are Data pipeline, OneLake, and Eventhouse.
Our data source for the example is taken from this git repo:
https://github.com/logpai/loghub/tree/master/Spark
The files were stored in an S3 bucket to demonstrate how simple the flow is, regardless of where your data source is located.
| 16/07/26 12:00:30 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 59219. |
| 16/07/26 12:00:30 INFO spark.SparkEnv: Registering MapOutputTracker |
| 16/07/26 12:00:30 INFO spark.SparkEnv: Registering BlockManagerMaster |
| 16/07/26 12:00:30 INFO storage.DiskBlockManager: Created local directory at /opt/hdfs/nodemanager/usercache/curi/appcache/application_1460011102909_0176/blockmgr-5ea750cb-dd00-4593-8b55-4fec98723714 |
| 16/07/26 12:00:30 INFO storage.MemoryStore: MemoryStore started with capacity 2.4 GB |
The first challenge to solve is how to bring the log files from your system into Fabric. This is the log collection phase, and many solutions exist for it, each with its pros and cons.
In Fabric, the standard way to bring data in is the Copy activity in ADF, whose Fabric SaaS version is called Data pipeline. Data pipeline is a low-code/no-code tool for managing and automating the movement and transformation of data within Microsoft Fabric. It is a serverless ETL tool with more than 100 connectors, enabling integration with a wide variety of data sources, including databases, cloud services, file systems, and more.
In addition, it supports an on-premises agent called the self-hosted integration runtime. This agent, which you install on a VM, acts as a bridge that lets you run your pipeline on a local VM and secures the connection from your on-premises network to the cloud.
Let's describe our solution's data pipeline in more detail.
Bear in mind that ADF is very flexible and supports reading at scale from a wide range of data sources and files. It also integrates with blob storage from all major cloud vendors, such as S3, GCS, and Oracle Cloud, as well as file systems, FTP/SFTP, and so on, so it is not an issue at all if your files are generated outside Azure.
Visualization of Fabric Data Pipeline
Visualization of Fabric first Activity: Copy from Source Bucket to Lakehouse
Once the log files land in Azure Blob storage, Eventstream can be used to trigger the Data pipeline that handles the data preparation and loading phase.
What is the main purpose of the data preparation phase?
After the log files land in storage, and before they are loaded into the real-time logs database (the KQL database), it might be necessary to apply some basic transformations to the data. The reasons for doing so vary.
Examples
In our case we run a PySpark notebook that reads the files from a OneLake folder, fixes the issue of new lines inside a row, and creates new files in another OneLake folder. We call this notebook with a base parameter named log_path that defines the location in OneLake of the log files to read.
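The notebook itself is not shown in this post, so the following is only a minimal sketch of the line-merging idea in plain Python. It assumes that every new record starts with a `yy/MM/dd HH:mm:ss` timestamp, as in the Spark sample above, and that any other line (for example a stack-trace frame) continues the previous record. The names `RECORD_START` and `merge_multiline_records` are hypothetical.

```python
import re

# Assumption: a new log record always starts with a "yy/MM/dd HH:mm:ss " prefix,
# as in the Spark sample lines. Any other line is treated as a continuation.
RECORD_START = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def merge_multiline_records(lines, joiner=" "):
    """Fold continuation lines (e.g. stack-trace frames) into the preceding record."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if RECORD_START.match(line) or not records:
            # A timestamped line (or the very first line) opens a new record.
            records.append(line)
        else:
            # Anything else is appended to the record that precedes it.
            records[-1] = records[-1] + joiner + line.strip()
    return records
```

In a notebook, a function like this could be applied to the lines of each file under `log_path` before the merged records are written to the output folder. How the files are read and written depends on your Lakehouse layout and is left out here.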
Visualization of Fabric second Activity - Running the Notebook
Visualization of Fabric last Activity - Loading to Eventhouse.
For this step, log collection and preparation, we broke the work into three data pipeline activities.
First, we needed to create a KQL database with a table to hold the log records.
A KQL database is a scalable and efficient storage solution for log files, optimized for high-volume data ingestion and retrieval. Eventhouses and KQL databases operate on a fully managed Kusto engine. With an Eventhouse or KQL database, you can expect available compute for your analytics within 5 to 10 seconds. The compute resources grow with your data analytic needs.
Log Ingestion to KQL Database with Update Policy
We can separate the ETL transformation logic into what happens to the data before it reaches the Eventhouse KQL database and what happens after. Before it reaches the database, the only transformation we apply is the notebook called from the data pipeline, which handles the logic of merging new lines. This cannot easily be done as part of the database ingestion logic: when we try to load files in which a record field contains new lines, the convention of one record per line is broken, and the ingestion process creates a separate table row for each line of an exception stack trace.
On the other hand, we might need to define basic transformation rules, such as date formatting, type conversion (string to number), parsing and extracting an interesting value from a string with a regular expression, or creating a JSON (dynamic type) value from a hierarchical string (an XML or JSON string, for example). For all these transformations we can use what is called an update policy, which lets us define simple ETL logic inside the KQL database, as explained in our documentation: Update policy overview.
During this step we create a new table called logparsed from the lograw table that will be our base table for the log queries.
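In the article this parsing is done by a KQL update policy, which is not reproduced here. Purely to illustrate the kind of rules involved (date formatting, field extraction with a regular expression), here is a hedged Python sketch that splits one Spark sample line into fields. The field names `formattedDatetime`, `log_module`, and `message` follow the queries later in this post. `log_level` and the function name `parse_log_line` are assumptions.

```python
import re
from datetime import datetime

# Assumed line layout, based on the Spark sample:
#   "<yy/MM/dd HH:mm:ss> <LEVEL> <module>: <message>"
LINE_PATTERN = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<module>[^:\s]+): "
    r"(?P<message>.*)$",
    re.DOTALL,  # a merged record may still contain embedded line breaks
)


def parse_log_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return {
        # "16/07/26 12:00:30" is year/month/day, i.e. 26 July 2016
        "formattedDatetime": datetime.strptime(match.group("ts"), "%y/%m/%d %H:%M:%S"),
        "log_level": match.group("level"),
        "log_module": match.group("module"),
        "message": match.group("message"),
    }
```

A sketch like this can be handy for validating parsing rules against sample files before encoding the same rules in the update policy.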
Query log files
After the data is ingested and transformed, it lands in a basic, schematized logs table: logparsed. In general, some common fields are mapped to their own columns, such as log level (INFO/ERROR/DEBUG), log category, log timestamp (a datetime-typed column), and log message, which can be either a simple error string or a complex JSON-formatted string. In the latter case it is usually preferable to convert the message to the dynamic type, which brings additional benefits such as simpler query logic and reduced data processing, and can avoid expensive joins.
The following are simple queries on the logparsed table.
| Category | Purpose | KQL Query |
| --- | --- | --- |
| Troubleshooting | Look for an error in a specific datetime range | `logsparsed \| where message contains "Exception" and formattedDatetime between ( datetime(2016-07-26T12:10:00) .. datetime(2016-07-26T12:20:00) )` |
| Statistics | Basic statistics: min/max timestamp of log events | `logsparsed \| summarize minTimestamp=min(formattedDatetime), maxTimestamp=max(formattedDatetime)` |
| Statistics | Exceptions stats | `logsparsed \| extend exceptionType = case(message contains "java.io.IOException","IOException", message contains "java.lang.IllegalStateException","IllegalStateException", message contains "org.apache.spark.rpc.RpcTimeoutException","RpcTimeoutException", message contains "org.apache.spark.SparkException","SparkException", message contains "Exception","Other Exceptions", "No Exception") \| where exceptionType != "No Exception" \| summarize count() by exceptionType` |
| Log Module Stats | Check module distribution | `logsparsed \| summarize count() by log_module \| order by count_ desc \| take 10` |
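To make the branch order of the exceptions query easier to follow, here is a small Python sketch that mirrors its `case()` logic offline. It is an illustration only, not part of the solution. The names `EXCEPTION_MARKERS`, `classify_exception`, and `exception_stats` are hypothetical. As in the query, the first matching branch wins, and matching ignores case because KQL's `contains` operator is case-insensitive.

```python
from collections import Counter

# Same order as the case() branches in the query: the first match wins,
# so the generic "Exception" marker must come last.
EXCEPTION_MARKERS = [
    ("java.io.IOException", "IOException"),
    ("java.lang.IllegalStateException", "IllegalStateException"),
    ("org.apache.spark.rpc.RpcTimeoutException", "RpcTimeoutException"),
    ("org.apache.spark.SparkException", "SparkException"),
    ("Exception", "Other Exceptions"),
]


def classify_exception(message):
    """Map a log message to an exception label, like the query's case() expression."""
    lowered = message.lower()  # KQL `contains` is case-insensitive
    for marker, label in EXCEPTION_MARKERS:
        if marker.lower() in lowered:
            return label
    return "No Exception"


def exception_stats(messages):
    """Count messages per exception label, skipping messages without an exception."""
    labels = (classify_exception(message) for message in messages)
    return Counter(label for label in labels if label != "No Exception")
```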
Real-Time Dashboards
After querying the logs, it is possible to visualize the query results in Real-Time dashboards.
After adding the queries to tiles inside the dashboard, this is a typical dashboard we can easily build:
Real-Time dashboards can be configured to refresh in real time, and you can very easily configure how often the queries and visualizations refresh. At the extreme, the refresh can be continuous.
There are many more capabilities in the Real-Time Dashboard, such as data exploration, alerting using Data Activator, and conditional formatting (changing item colors based on KPI thresholds), and the list keeps growing.
Machine Learning Models
Kusto supports time series analysis out of the box, enabling, for example, anomaly detection and clustering, and you can explore data directly in Real-Time Dashboard tiles. If that isn't enough, you can always mirror the data of your KQL tables into OneLake in Delta Parquet format by selecting OneLake availability.
This configuration creates another copy of your data in the open Delta Parquet format, which is then available to any SparkML model for whatever machine learning exploration and modeling you wish. There is no additional storage cost to turn on OneLake availability.
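As a rough offline illustration of the anomaly-detection idea mentioned above, the sketch below flags outliers in a series of per-interval error counts using a modified z-score (median and median absolute deviation). This is a deliberately simple stand-in, not the algorithm behind Kusto's built-in time series functions, and the function name, threshold, and input shape are all assumptions.

```python
from statistics import median


def flag_anomalies(counts, threshold=3.5):
    """Return the indexes of counts whose modified z-score exceeds the threshold."""
    if not counts:
        return []
    center = median(counts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(value - center) for value in counts)
    if mad == 0:
        return []  # no measurable spread, so nothing can be scored
    return [
        index
        for index, value in enumerate(counts)
        # 0.6745 rescales the MAD so the score is comparable to a standard z-score
        if 0.6745 * abs(value - center) / mad > threshold
    ]
```

For example, given error counts per minute exported from the mirrored table, `flag_anomalies` returns the positions of the minutes that deviate strongly from the typical level. Seasonality and trend are ignored, which is one reason a purpose-built time series function is usually the better choice.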
A well-designed Real-Time Intelligence solution for log file management using Microsoft Fabric and Eventhouse can significantly enhance an organization’s ability to monitor, analyze, and respond to log events. By leveraging modern technologies and best practices, organizations can gain valuable insights and maintain robust system performance and security.