SavioFerraz
Kudo Kingpin

Complex RTI Issue: Eventstream Failing to Enforce Schema When Ingesting High-Volume IoT Messages

Hi everyone,

I’m running into a challenging Real-Time Intelligence issue related to schema enforcement and message drift inside an Eventstream, and I’d appreciate help understanding whether this is expected behavior, a configuration limitation, or a bug.

🟦 Scenario

We have an IoT setup publishing data to Azure Event Hubs.
The messages follow a defined schema, but occasionally:

New fields appear

Field ordering changes

Some devices send null or missing fields

A small subset of devices sends string values where numbers are expected

The Event Hub input feeds a Fabric Eventstream, which then routes data to:

A KQL database (hot store)

A Lakehouse (cold store)

A Real-Time Dashboard

Everything works fine until schema drift happens.

🟥 Problem

When messages with drift arrive, the Eventstream intermittently:

Drops messages silently (only visible in diagnostic logs)

Fails the downstream KQL ingestion with:

Type mismatch: expected real but received string

Column 'temperature' missing in source payload

Applies schema incorrectly, creating unexpected fields such as temperature_1

Stops updating the Real-Time dashboard until the schema is reconciled

Fails the Lakehouse ingestion with Delta errors like:

Inconsistent column types detected across input batches

This results in partial data, out-of-order ingestion, or full pipeline stalls.

🟩 What I’ve tried

Enabled "Schema inference" in Eventstream → still fails on mixed types

Disabled schema enforcement → downstream Warehouse/KQL breaks

Built a mapping transformation in Eventstream → fails when fields are missing

Used a KQL update policy to coerce types → stops ingestion when undefined fields appear (a simplified sketch follows this list)

Added a Dataflow Gen2 as transformer → introduces latency, defeating the RTI purpose

Created a custom pre-cleaning Azure Function → works but is costly and adds complexity
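
For reference, the coercion attempt was roughly shaped like this. This is a simplified sketch with illustrative table and column names, not our exact objects:

// Staging table with the typed columns the Eventstream output mapping expects (illustrative)
.create table TelemetryStaging (deviceId: string, ts: datetime, temperature: string)

// Coercion into the typed target table
.create-or-alter function CoerceTelemetry() {
    TelemetryStaging
    | project DeviceId = deviceId,
              Timestamp = ts,
              Temperature = toreal(temperature)   // returns null when the value is not numeric
}

.alter table Telemetry policy update
@'[{"IsEnabled": true, "Source": "TelemetryStaging", "Query": "CoerceTelemetry()", "IsTransactional": true}]'

With IsTransactional set to true, any failure of the policy query aborts the source ingestion batch, which is consistent with the stalls we see when unexpected fields show up.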

Questions

Does Real-Time Intelligence currently support schema drift at scale, or does every upstream schema variation require manual adjustment?

Is there a recommended RTI pattern for handling IoT messages where fields may be missing, extra, or incorrectly typed?

Are Eventstream transformations supposed to guarantee schema consistency, or is type validation delegated to the downstream sinks?

Is this behavior expected with KQL ingestion, or is this a bug in RTI’s schema reconciliation?

Is the only robust workaround implementing an external cleansing function (Azure Function or Stream Analytics) before data reaches Eventstream?

This is impacting our ability to run true real-time analytics with thousands of devices, so any guidance or validation is extremely appreciated.

Thanks in advance!

1 ACCEPTED SOLUTION
kustortininja
Microsoft Employee

As you mention, your schema and values change, and it sounds like the way your Eventstream is defined today enforces that schema.

 

But a question: if you are sending the data to an Event Hub first, why are you sending it to an Eventstream? If the Event Hub is not behind a private endpoint, just send it directly to Eventhouse; there is no need for the Eventstream. Get data from Azure Event Hubs - Microsoft Fabric | Microsoft Learn

 

If the Event Hub is behind a private endpoint, you will need the Eventstream, though. Instead of sending it to three destinations (Eventhouse, Lakehouse, and Real-Time Dashboard), send the data just to Eventhouse using Direct Ingestion, and do not put any transformations in the Eventstream. When you configure Direct Ingestion, on the screen where it asks you for mapping, change the nested JSON levels down to 0 so the entire JSON object lands in a single column of the row, then do your casting, type checking, and everything else through update policies. Mirror the data into Lakehouse using Eventhouse OneLake availability instead of dual-writing. This saves you the overhead of having to transform the data in two places.

https://learn.microsoft.com/en-us/fabric/real-time-intelligence/media/get-data-eventstream/inspect-d...
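
To make the shape of this concrete, here is a minimal sketch with illustrative names (a raw landing table, a level-0 JSON mapping, and an update policy that does the typing downstream); adapt the names to your own setup:

// Landing table: the whole event goes into a single dynamic column
.create table TelemetryRaw (Payload: dynamic)

// JSON mapping with nesting level 0: map the entire object to Payload
.create table TelemetryRaw ingestion json mapping "RawMapping" '[{"column":"Payload","path":"$","datatype":"dynamic"}]'

// Typed target table
.create table Telemetry (DeviceId: string, Timestamp: datetime, Temperature: real)

// Casting happens here; tostring()/todatetime()/toreal() return null instead of failing
.create-or-alter function ExpandTelemetry() {
    TelemetryRaw
    | project DeviceId = tostring(Payload.deviceId),
              Timestamp = todatetime(Payload.timestamp),
              Temperature = toreal(Payload.temperature)
}

// Non-transactional update policy: a bad record never blocks the raw ingestion
.alter table Telemetry policy update
@'[{"IsEnabled": true, "Source": "TelemetryRaw", "Query": "ExpandTelemetry()", "IsTransactional": false}]'

New or missing fields simply appear (or not) inside Payload, so schema drift never fails the landing table, and you can evolve the typed table on your own schedule.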


2 REPLIES
svelde
Most Valuable Professional

Hello @SavioFerraz 

 

The answer given by @kustortininja is correct. 

 

Regarding the direct 'Data connection' without Eventstream, consider the Event Hubs option:

(screenshot: the Event Hubs option in the Eventhouse data connection dialog)

In the past, I showed this in a blog post.

 

I prefer to use the bronze-silver-gold medallion architecture, where we ingest the incoming (Event Hub/IoT Hub) messages as-is into a dynamic column via a table mapping in Direct Ingest.

So it's ELT (extract, load, transform) instead of ETL. 

So each message is ingested as-is in the bronze layer. 

From the original message, you can also extract separate values like device id, device type, timestamp, etc., if these help with the table update policies towards the silver layer, where you create typed columns.

Using an Activator, you could test for corrupt messages (e.g. by counting the number of incoming messages and the number of transformed messages per timespan), as sketched below.
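
A sketch of that check, assuming bronze/silver table names like BronzeTelemetry and SilverTelemetry (the result could feed an Activator alert):

// Rows that arrived in bronze but never made it to silver in the same 5-minute window
BronzeTelemetry
| summarize Ingested = count() by Window = bin(ingestion_time(), 5m)
| join kind=leftouter (
    SilverTelemetry
    | summarize Transformed = count() by Window = bin(ingestion_time(), 5m)
) on Window
| extend Dropped = Ingested - coalesce(Transformed, 0)
| where Dropped > 0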

 

Check out Eventhouse shortcuts in Lakehouse to make an Eventhouse table available as Lakehouse table.

 

If this answer helps, please upvote it.

