greglow

Microsoft Fabric Real Time Intelligence - Mind the Gap!

I work with a lot of clients that are creating analytic systems. They all collect large numbers of data points and analyze them. Most do a fairly good job with it.

 

One area that many also handle well is detecting anomalies, i.e., data that is out of the ordinary.

 

But another area that I see very few handling well is the data that is missing, rather than just the data that is present. There's a huge difference between data that arrived and looks odd, and data that never arrived at all.

 

Unsplash image by AXP Photography

 

One tool that's great at working with streams of data is the Real Time Intelligence workload for Microsoft Fabric. And it's also great at working with data that is missing from those streams.

Gaps can be valuable

It's just as important to spend time thinking about what you can infer from a lack of data as it is to analyze the data that is present.

Back in 2008, I built some StreamInsight samples for Microsoft. One modeled a toll booth on a multi-lane highway.

 

Unsplash image from Red John

It provided a good example of the value of missing data.

  • If no tags are being read on any lane, then there is an issue with the highway itself. Perhaps it's shut. Perhaps there is a major incident.
  • If no tag is being read on one lane, but other lanes are registering tags, there might be a traffic accident or breakdown in that lane.

I work with a number of sawmills across the country, and it's not hard to imagine the streams of data produced by the machinery at those sites. Once again, though, making sense of missing data is critical. A missing lug from a processing system can be important.
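 

As a sketch of what gap detection over a stream like this might look like, the KQL below counts tag reads per lane in fixed windows and flags windows where nothing arrived at all. The TollTagReads table and its LaneId and Timestamp columns are made up for the example, not an existing schema.

    // Hypothetical table TollTagReads with columns LaneId (string) and Timestamp (datetime).
    // Build a per-lane time series over the last hour in 1-minute bins,
    // filling empty bins with 0 so that "nothing arrived" becomes a visible value.
    TollTagReads
    | make-series Reads = count() default = 0
        on Timestamp from ago(1h) to now() step 1m
        by LaneId
    | mv-expand Timestamp to typeof(datetime), Reads to typeof(long)
    | where Reads == 0
    | project LaneId, EmptyMinute = Timestamp

If every lane shows empty minutes at the same time, that points at the highway itself; if only one lane does, it points at that lane.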

Categories of Missing Data

 

I see two basic categories of missing data:

 

Category 1: Obviously missing data

 

At times, it's obvious that data is missing.

 

Unsplash image by Tanja Tepavac

In that case, there is some repetitive source of data and you have an expectation that all the data should have arrived.

Signal to Noise Ratio

One challenge with this is the signal to noise ratio. Every day, I receive an email notifying me that backups of my various systems have completed. I'm sure the vendors of the backup code think it's really important for me to know that.

 

But if I receive a large number of these, and they always say the same thing ("It succeeded"), how long is it before I simply ignore them, or worse, create a rule to push them directly into a folder?

Would I even notice if one of them didn't come each day?

 

What I really want to know is whether the action failed. But if that event is rare, its absence is basically indistinguishable from the notification system being broken. At least with the daily notifications, I do know that the notification system is still working.
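 

One way to invert that, sketched below in KQL, is to treat the daily success email as a heartbeat and alert when it goes quiet. The BackupNotifications table and its SystemName and Timestamp columns are assumptions for the example.

    // Hypothetical table BackupNotifications with columns SystemName (string) and Timestamp (datetime).
    // Rather than reading every "It succeeded" message, flag any system whose
    // most recent notification is older than the expected daily cadence.
    BackupNotifications
    | summarize LastSeen = max(Timestamp) by SystemName
    | where LastSeen < ago(25h)   // a day plus some slack for late runs
    | project SystemName, LastSeen

A query like this turns the flood of success messages into a single question: who has gone silent?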

 

Category 2: Inferred Missing Data

The second category of missing data is when you are not certain that data is missing, but you can infer that it's likely:

 

Unsplash image by Phil Hearing

You can tell from a pattern in the data that it's likely another value should have existed, or did previously exist and was somehow lost.

 

This is where the power of analytics starts to come in.
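 

One simple way to infer those likely gaps, sketched below, is to compare the interval between consecutive events against the cadence the stream normally shows. The MachineEvents table, its columns, and the 5-minute threshold are all illustrative assumptions.

    // Hypothetical table MachineEvents with columns DeviceId (string) and Timestamp (datetime).
    // Sort per device, then compare each event with the previous one from the same device;
    // an interval far beyond the usual cadence suggests data that should have existed but never arrived.
    MachineEvents
    | order by DeviceId asc, Timestamp asc
    | extend PrevDevice = prev(DeviceId), PrevTimestamp = prev(Timestamp)
    | where PrevDevice == DeviceId
    | extend GapSeconds = datetime_diff('second', Timestamp, PrevTimestamp)
    | where GapSeconds > 300   // assumed normal cadence is well under 5 minutes
    | project DeviceId, GapStart = PrevTimestamp, GapEnd = Timestamp, GapSeconds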

Embrace the Fuzziness

I've often talked about fuzziness in data. IT people love data that's all neat and precise, yet end-users live in a world that's fuzzy.

 

Unsplash image by Aswin Karuvally

 

As IT people, we need to learn to love the fuzziness. That's where the real power and interesting outcomes lie.

Storing Event Streams in Microsoft Fabric

With Microsoft Fabric, we can source event streams, process them, and save them into a number of locations.

The typical destinations are:

  • Lakehouse
  • Table in an Eventhouse (via a KQL database)

Fabric has great transformations you can use on the data that has arrived. But instead of storing just what did occur, I encourage you to consider whether there is value to be gained from what did not happen. Once you derive that, consider storing the detected or inferred gap data in a destination that you can then apply analytics to.
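 

As a sketch of that last step, the KQL management command below writes the gaps detected by the earlier toll-lane query into a table in the same KQL database, where your regular analytics can then run over them. The DetectedGaps table name is an assumption, and in practice you might schedule this or land the derived data via an Eventstream destination instead.

    // Hypothetical destination table DetectedGaps in the Eventhouse KQL database.
    // .set-or-append creates the table if it doesn't exist and appends the query results.
    .set-or-append DetectedGaps <|
        TollTagReads
        | make-series Reads = count() default = 0
            on Timestamp from ago(1d) to now() step 5m
            by LaneId
        | mv-expand Timestamp to typeof(datetime), Reads to typeof(long)
        | where Reads == 0
        | project LaneId, GapStart = Timestamp, DetectedAt = now()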