I work with a lot of clients that are creating analytic systems. They all collect large numbers of data points and analyze them. Most do a fairly good job with it.
One area that many also handle well is detecting anomalies, i.e., data that is out of the ordinary.
But an area that I see very few handling well is the data that is missing, rather than just the data that is present. There's a huge difference between data that arrived and is odd, and data that just didn't arrive at all.
One tool that's great at working with streams of data is the Real-Time Intelligence workload for Microsoft Fabric. And it's also great at working with data that is missing from those streams.
It's just as important to spend time thinking about what you can infer from a lack of data as it is to analyze data that is present.
Back in 2008, I built some StreamInsight samples for Microsoft. One was a toll booth on a multi-lane highway.
It provided a good example of the value of missing data.
I work with a number of sawmills across the country, and it's not hard to imagine the streams of data that are produced by the machinery at the sites. Once again though, making sense of missing data is critical. A missing lug from a processing system can be important.
I see two basic categories of missing data:
Category 1: Obviously Missing Data
At times, it's obvious that data is missing.
In that case, there is some repetitive source of data and you have an expectation that all the data should have arrived.
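To make that concrete, here's a minimal sketch of checking a repetitive source for slots where nothing arrived. It's plain Python rather than anything Fabric-specific, and the function name find_missing_slots, the one-minute interval, and the sample timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta

def find_missing_slots(arrivals, start, end, interval):
    """Return the expected time slots in [start, end) that saw no arrival.

    Hypothetical helper: assumes the source should emit one event per interval.
    """
    # Map each arrival to the start of the slot it falls into.
    seen = {
        start + ((ts - start) // interval) * interval
        for ts in arrivals
        if start <= ts < end
    }
    missing = []
    slot = start
    while slot < end:
        if slot not in seen:
            missing.append(slot)
        slot += interval
    return missing

# Illustrative data: one reading per minute is expected, minute 3 never arrived.
start = datetime(2025, 1, 1, 0, 0)
arrivals = [start + timedelta(minutes=m) for m in (0, 1, 2, 4, 5)]
gaps = find_missing_slots(
    arrivals, start, start + timedelta(minutes=6), timedelta(minutes=1)
)
print(gaps)  # [datetime.datetime(2025, 1, 1, 0, 3)]
```

The output is a list of what did not happen, which is exactly the data that usually never gets recorded anywhere.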
Signal to Noise Ratio
One challenge with this is the signal to noise ratio. Every day, I receive an email notifying me that backups of my various systems have completed. I'm sure the vendors of the backup code think it's really important for me to know that.
But if I receive a large number of these, and they always say the same thing, "It succeeded", how long is it before I simply ignore them, or worse, create a rule to push them directly into a folder?
Would I even notice if one of them didn't come each day?
What I really want to know is if the action failed. But if that event is rare, it's basically indistinguishable from the notification system being broken. At least with the daily notifications, I do know that the notification system is still working.
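One way to get the best of both is to keep the routine "it succeeded" messages but let code watch for the one that doesn't turn up. Here's a rough sketch of that idea, assuming you already track the last time each source was heard from. The source names and the 26-hour threshold are invented for the example.

```python
from datetime import datetime, timedelta

def silent_sources(last_seen, now, max_silence):
    """Return the names of sources whose latest message is older than max_silence."""
    return sorted(
        name for name, ts in last_seen.items() if now - ts > max_silence
    )

# Illustrative data: daily backups, with a little slack allowed (26 hours).
now = datetime(2025, 1, 10, 9, 0)
last_seen = {
    "backup-sql": datetime(2025, 1, 10, 2, 0),    # reported this morning
    "backup-files": datetime(2025, 1, 8, 2, 0),   # silent for over two days
}
overdue = silent_sources(last_seen, now, timedelta(hours=26))
print(overdue)  # ['backup-files']
```

With something like this in place, the daily messages can go straight to a folder, and the only thing that reaches a person is the silence.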
Category 2: Inferred Missing Data
The second category of missing data is when you are not certain that data is missing, but you can infer that it's likely:
You can tell from a pattern in the data that it's likely that another value should have existed, or did previously exist and was somehow lost.
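As a simple illustration of inferring from a pattern, suppose events normally arrive at a fairly regular pace. A gap of roughly two typical intervals then suggests one event was lost, roughly three suggests two, and so on. The sketch below uses the median gap as the "typical" interval. That choice, the tolerance value, and the sample readings are assumptions for the example, not a recommended method.

```python
from statistics import median

def infer_missing(timestamps, tolerance=0.25):
    """Estimate how many events are missing between consecutive readings.

    Returns a list of (gap_start, gap_end, estimated_missing_count).
    Assumes timestamps are sorted numbers (for example, seconds).
    """
    pairs = list(zip(timestamps, timestamps[1:]))
    gaps = [b - a for a, b in pairs]
    if not gaps:
        return []
    typical = median(gaps)
    if typical <= 0:
        return []
    inferred = []
    for (a, b), gap in zip(pairs, gaps):
        # A gap of about k typical intervals suggests k - 1 missing events.
        k = round(gap / typical)
        if k >= 2 and abs(gap - k * typical) <= tolerance * typical:
            inferred.append((a, b, k - 1))
    return inferred

# Illustrative readings, roughly every 2 seconds, with two suspicious gaps.
readings = [0.0, 2.0, 4.1, 8.0, 10.0, 16.1]
result = infer_missing(readings)
print(result)  # [(4.1, 8.0, 1), (10.0, 16.1, 2)]
```

Note that the result is an estimate, not a fact. The code can only say that a value was probably expected, which is the fuzziness this category is all about.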
This is where the power of analytics starts to come in.
I've often talked about fuzziness in data. IT people love data that's all neat and precise, yet end-users live in a world that's fuzzy.
As IT people, we need to learn to love the fuzziness. That's where the real power and interesting outcomes lie.
With Microsoft Fabric, we can source event streams, process them, and save them into a number of locations.
The typical destinations are:
Fabric has great transformations you can use on the data that has arrived. But instead of storing just what did occur, I encourage you to consider if there is value that can be gained from what did not happen. Once you derive that, you might consider storing that detected or inferred gap data in a destination that you then apply analytics on.
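If you do decide to store what did not happen, it helps to give those gaps a shape of their own. The sketch below builds a small JSON record for a detected or inferred gap. The field names, the "missing" and "inferred" labels, and the toll-lane source name are hypothetical, and how the record is written to a Fabric destination is deliberately left out.

```python
import json
from datetime import datetime, timezone

def to_gap_record(source, expected_at, detected_at, kind):
    """Build a JSON-serializable record for something that did not arrive.

    Hypothetical schema: kind is "missing" (obviously absent) or
    "inferred" (probably absent, judged from a pattern).
    """
    if kind not in ("missing", "inferred"):
        raise ValueError("kind must be 'missing' or 'inferred'")
    return {
        "source": source,
        "expected_at": expected_at.isoformat(),
        "detected_at": detected_at.isoformat(),
        "kind": kind,
    }

record = to_gap_record(
    "toll-lane-3",  # illustrative source name
    datetime(2025, 1, 1, 0, 3, tzinfo=timezone.utc),
    datetime(2025, 1, 1, 0, 5, tzinfo=timezone.utc),
    "missing",
)
line = json.dumps(record)
print(line)
```

Keeping the kind on each record means later analytics can treat certain gaps and likely gaps differently.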