chetanhiwale
Helper II

How to drop duplicates in event stream using event processor ?

I am currently exploring the different ways to drop duplicates in an eventstream via the event processor.
My understanding is that by using group by we can at least identify the duplicates.

Note: I need to drop the duplicates in the eventstream only.


9 REPLIES
v-csrikanth
Community Support

Hi @chetanhiwale 

I wanted to follow up since I haven't heard from you in a while. Have you had a chance to try the suggested solutions?
If your issue is resolved, please consider marking the post as solved. However, if you're still facing challenges, feel free to share the details, and we'll be happy to assist you further.
Looking forward to your response!


Best Regards,
Community Support Team _ C Srikanth.

svelde
Advocate I

Hello @chetanhiwale ,

Thank you for your question at this Fabric community forum.

If the ingested data is sent to an Eventhouse KQL database table, you can also consider ignoring duplicates in the Eventhouse by using a materialized view.

Consider this example based on the Weather data service:

.create materialized-view with (lookback=20m, lookback_column="EventProcessedUtcTime", backfill=true, docString="Unique Weather data entries", folder="MaterializedViews") WeatherDataset on table WeatherData
{
  WeatherData
  | summarize arg_max(EventProcessedUtcTime, *) by dateTime
}

Here, only the row with the latest EventProcessedUtcTime is kept for each dateTime value.
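
As a quick illustration (the filter and row limit below are placeholders, not part of the original example), the view can then be queried like any table, or via the materialized_view() function if you only want the already-materialized part:

WeatherDataset
| where EventProcessedUtcTime > ago(1h)

// Only the already-materialized part; may not include the newest records yet:
materialized_view('WeatherDataset')
| take 10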

The table on which this view is based has a limited retention time, to keep the database storage size to a minimum:

.alter-merge table WeatherData policy retention softdelete = 1h recoverability = disabled
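
If you want to double-check that the policy took effect, the current setting can be inspected with:

.show table WeatherData policy retention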

Of course, this only works for this particular kind of time-series database.

 

If you find this reply helpful for your problem, please consider hitting the upvote or accept button...


v-csrikanth
Community Support

Hi @chetanhiwale 

We haven't heard from you since your last response and just wanted to check whether the provided solution has worked for you. If it has, please Accept it as Solution to help others in the community.
Thank you.

If the above information is helpful, please give us Kudos and mark the response as Accepted as solution.
Best Regards,
Community Support Team _ C Srikanth.

v-csrikanth
Community Support

Hi @chetanhiwale 

Thank you for being part of the Microsoft Fabric Community.

As highlighted by @datacoffee, the proposed approach appears to address your requirements effectively. Could you please confirm whether your issue has been resolved?
If you are still facing any challenges, kindly provide further details, and we will be happy to assist you.

If the above information is helpful, please give us Kudos and mark the response as Accepted as solution.
Best Regards,
Community Support Team _ C Srikanth.

datacoffee
Most Valuable Professional

Hi
You can build a windowing function with the "group by" transformation.
This gives you a choice of different windowing functions, and you can set a time span for the window to look at and remove the duplicates (group the rows).

It is not the best option to remove them here, though - I would remove them in the Eventhouse using an update policy or a materialized view. That way you stay out of the windowing function and can have a faster stream of data to the Eventhouse. The update policy or materialized view is also quicker than the Eventstream at data processing 🙂
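
For the update-policy route, here is a rough sketch borrowing the WeatherData example from earlier in the thread (the function name and target table are hypothetical, and the target table is assumed to already exist with the same schema as WeatherData):

// 1) A function that keeps only the latest row per dateTime key within a batch:
.create function DedupeWeather() {
    WeatherData
    | summarize arg_max(EventProcessedUtcTime, *) by dateTime
}

// 2) Attach it as an update policy on the (pre-created) target table:
.alter table WeatherDataDeduped policy update
@'[{"IsEnabled": true, "Source": "WeatherData", "Query": "DedupeWeather()", "IsTransactional": true}]'

Note that an update policy only sees the batch currently being ingested, so this removes duplicates within a batch; duplicates arriving in separate batches would still need a materialized view to filter out.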



If you find this reply helpful for your problem, please consider hitting the accept button...
----------------
Blog: https://dcode.bi
KQL ref guide: https://aka.bi/kql
LinkedIn: https://aka.bi/follow

Hi @datacoffee, thanks for the answer.
Yes, we can drop duplicates in the Eventhouse, but our need is to drop them in the eventstream only.
In the group by, which operation are you suggesting? Can you please specify?
Using count we can identify the duplicates, but removing them seems like a different task to me.

 

datacoffee
Most Valuable Professional

Sure

 

In the Eventstream you can add a transformation task - here you will find the "group by" operator.

In this task you can add a windowing function and set up the needed aggregation.

 

I hope you can find it 😊


If you find this reply helpful for your problem, please consider hitting the accept button...
----------------
Blog: https://dcode.bi
KQL ref guide: https://aka.bi/kql
LinkedIn: https://aka.bi/follow

Yeah, @datacoffee is correct. You can find the group by operation in the drop-down list of transformations.

 

When you configure it, select 'Average' as the aggregation method and choose which column to aggregate on. Then select all the remaining columns under Group operation by.

 

This is how you drop duplicates with the current no-code experience.

 

Once Eventstream supports writing SQL code (coming soon), you will be able to use the following code to remove duplicates. The inner query collapses events that share the same Value, DeviceId, and arrival timestamp, which drops exact duplicates before the windowed aggregation:

WITH Temp AS (
    SELECT Value, DeviceId
    FROM Input TIMESTAMP BY Time
    GROUP BY Value, DeviceId, System.Timestamp()
)

SELECT
    AVG(Value) AS AverageValue, DeviceId
INTO Output
FROM Temp
GROUP BY DeviceId, TumblingWindow(minute, 5)
