chetanhiwale
Helper II

How to drop duplicates in event stream using event processor ?

I am currently exploring the different ways to drop duplicates in an eventstream via the event processor.
My understanding is that by using group by we can at least identify the duplicates.

Note: I need to drop the duplicates in the eventstream only.


9 REPLIES
v-csrikanth
Community Support

Hi @chetanhiwale 

I wanted to follow up since I haven't heard from you in a while. Have you had a chance to try the suggested solutions?
If your issue is resolved, please consider marking the post as solved. However, if you're still facing challenges, feel free to share the details, and we'll be happy to assist you further.
Looking forward to your response!


Best Regards,
Community Support Team _ C Srikanth.

svelde
Advocate I

Hello @chetanhiwale ,

Thank you for your question at this Fabric community forum.

If the ingested data is sent to an Eventhouse KQL database table, you can also consider ignoring duplicates in the Eventhouse by using a materialized view.

Consider this example based on the Weather data service:

.create materialized-view with (lookback=20m, lookback_column="EventProcessedUtcTime", backfill=true, docString="Unique Weather data entries", folder="MaterializedViews") WeatherDataset on table WeatherData
{
  WeatherData
  | summarize arg_max(EventProcessedUtcTime, *) by dateTime
}

Here, only the row with the latest EventProcessedUtcTime is kept for each dateTime value.
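
As a quick illustration (the filter and row limit below are placeholders, not part of the original example), the view can then be queried like any table, or via the materialized_view() function if you only want the already-materialized part:

WeatherDataset
| where EventProcessedUtcTime > ago(1h)

// Only the already-materialized part; may not include the newest records yet:
materialized_view('WeatherDataset')
| take 10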

The table on which this view is based has a limited retention time, to keep the database storage size to a minimum:

.alter-merge table WeatherData policy retention softdelete = 1h recoverability = disabled
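
If you want to double-check that the policy took effect, the current setting can be inspected with:

.show table WeatherData policy retention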

Of course, this only works for this particular kind of time-series database.

 

If you find this reply helpful for your problem, please consider hitting the upvote or accept button...


v-csrikanth
Community Support

Hi @chetanhiwale 

We haven't heard from you since your last response and just wanted to check whether the provided solution has worked for you. If it has, please Accept it as Solution to help others in the community.
Thank you.

If the above information is helpful, please give us Kudos and mark the response as Accepted as solution.
Best Regards,
Community Support Team _ C Srikanth.

v-csrikanth
Community Support

Hi @chetanhiwale 

Thank you for being part of the Microsoft Fabric Community.

As highlighted by @datacoffee, the proposed approach appears to address your requirements effectively. Could you please confirm whether your issue has been resolved?
If you are still facing any challenges, kindly provide further details, and we will be happy to assist you.

If the above information is helpful, please give us Kudos and mark the response as Accepted as solution.
Best Regards,
Community Support Team _ C Srikanth.

datacoffee
Most Valuable Professional

Hi
You can build a windowing function with the "group by" transformation.
This gives you a choice of different windowing functions, and you can set a time span for the window to look at and remove the duplicates (group the rows).

It is not the best option to remove them here, though - I would remove them in the Eventhouse using an update policy or a materialized view. That way you stay out of the windowing function and can have a faster stream of data to the Eventhouse. The update policy or materialized view is also quicker than the Eventstream at data processing 🙂
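
For the update-policy route, here is a rough sketch borrowing the WeatherData example from earlier in the thread (the function name and target table are hypothetical, and the target table is assumed to already exist with the same schema as WeatherData):

// 1) A function that keeps only the latest row per dateTime key within a batch:
.create function DedupeWeather() {
    WeatherData
    | summarize arg_max(EventProcessedUtcTime, *) by dateTime
}

// 2) Attach it as an update policy on the (pre-created) target table:
.alter table WeatherDataDeduped policy update
@'[{"IsEnabled": true, "Source": "WeatherData", "Query": "DedupeWeather()", "IsTransactional": true}]'

Note that an update policy only sees the batch currently being ingested, so this removes duplicates within a batch; duplicates arriving in separate batches would still need a materialized view to filter out.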



If you find this reply helpful for your problem, please consider hitting the accept button...
----------------
Blog: https://dcode.bi
KQL ref guide: https://aka.bi/kql
LinkedIn: https://aka.bi/follow

Hi @datacoffee, thanks for the answer.
Yes, we can drop duplicates in the Eventhouse, but our need is to drop them in the eventstream only.
In the group by, which operation are you suggesting? Can you please specify?
Using count we can identify the duplicates, but removing them seems like a different task to me.

 

datacoffee
Most Valuable Professional

Sure

 

In the Eventstream you can add a transformation task - here you will find the "group by" operator.

In this task you can add a windowing function and set up the needed aggregation.

 

I hope you can find it 😊


If you find this reply helpful for your problem, please consider hitting the accept button...
----------------
Blog: https://dcode.bi
KQL ref guide: https://aka.bi/kql
LinkedIn: https://aka.bi/follow

Yeah, @datacoffee is correct. You can find the group by operation in the drop-down list of transformations.

 

When you configure it, select 'Average' as the aggregation method and choose which column to aggregate on. Then select all the remaining columns under Group operation by.

 

This is how you drop duplicates with the current no-code experience.

 

Once Eventstream supports writing SQL code (coming soon), you will be able to use the following code to remove duplicates. The inner query collapses events that share the same Value, DeviceId, and arrival timestamp, which drops exact duplicates before the windowed aggregation:

WITH Temp AS (
    SELECT Value, DeviceId
    FROM Input TIMESTAMP BY Time
    GROUP BY Value, DeviceId, System.Timestamp()
)

SELECT
    AVG(Value) AS AverageValue, DeviceId
INTO Output
FROM Temp
GROUP BY DeviceId, TumblingWindow(minute, 5)
