gtzanakis
Advocate I

Deduplication of incrementally refreshed Dataflow tables

Hi all,

 

We just got Premium and I'm toying around with incremental refresh on Dataflow tables for the first time.

 

I have observed something rather weird. I created a table and set its refresh to incremental, configured so that it stores rows from the last 18 months and refreshes rows from the past 7 days, just to be on the super safe side with certain late-arriving changes.

 

However, I have not configured any primary keys (yet). Because of that, I expected to see duplicates after a couple of refreshes. Yet there are no duplicates! Is Dataflow smart enough to deduplicate on timestamps or something? Or does it hash each row? How does it handle duplicate rows when it re-scans data from the last 7 days that it has already ingested before?!

1 ACCEPTED SOLUTION
gtzanakis
Advocate I

Thanks for the reply Clara!

 

My table's query actually had no transformations, so there was no deduplication going on.

 

From the section "Merge partitions" in the documentation you provided I sort of came to the conclusion that the incremental partitions are being replaced, thus there is no need for deduplication. 

 

Thank you again!
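For anyone landing here later, the conclusion above can be illustrated with a minimal Python sketch. It only models replace-versus-append semantics and is not how the Power BI service is implemented: the `refresh` function, the daily partition granularity, and the sample rows are all assumptions made up for this example.

```python
from collections import defaultdict
from datetime import date, timedelta


def refresh(partitions, source_rows, today, refresh_days=7):
    """Reload every daily partition inside the refresh window from the source.

    Hypothetical model: `partitions` maps a date to the list of rows stored
    for that day. Nothing here is a real Power BI API.
    """
    window_start = today - timedelta(days=refresh_days - 1)

    # Group the source rows that fall inside the refresh window by day.
    fresh = defaultdict(list)
    for row_date, payload in source_rows:
        if window_start <= row_date <= today:
            fresh[row_date].append((row_date, payload))

    day = window_start
    while day <= today:
        # Replace, never append: the old contents of the partition are dropped.
        partitions[day] = fresh.get(day, [])
        day += timedelta(days=1)
    return partitions


# Made-up source rows; note that no primary key is involved anywhere.
source = [
    (date(2025, 3, 1), "a"),
    (date(2025, 3, 2), "b"),
    (date(2025, 3, 2), "c"),
]

store = {}
refresh(store, source, today=date(2025, 3, 3))
refresh(store, source, today=date(2025, 3, 4))  # overlapping window re-reads the same rows

total_rows = sum(len(rows) for rows in store.values())
print(total_rows)
```

Both refreshes read the same three source rows, but because each day's partition is overwritten rather than appended to, the store still holds three rows afterwards. An append-only model would have needed a key or a hash to avoid doubling them.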


2 REPLIES 2
v-kaiyue-msft
Community Support

Hi @gtzanakis ,

 

A dataflow is a collection of tables that are created and managed in a workspace in the Power BI service. Power BI dataflows provide a self-service data preparation and management experience for business analysts, enabling them to collect business data from a variety of sources and to cleanse and transform it.

 

So, please check the configuration and the steps you have applied in Power Query; an inadvertent operation may be deduplicating the data. If you can provide more details, it will help us resolve the issue.

Using incremental refresh with dataflows - Power Query | Microsoft Learn

 

If this does not address your question, please clarify in a follow-up reply.

 

Best Regards,

Clara Gong

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
