kallens
Frequent Visitor

DataflowGen2 in a Data Pipeline is intermittently inserting data twice despite uniqueness measures

Hello,

 

I am building a Data Pipeline to update my table in a data warehouse with a rolling 31 days of data. 

  • The data pipeline deletes the last 31 days of data 
  • Then it calls on a DataFlow Gen 2 from power query to bring in 31 days of new data and append it to the table where the data was just deleted
    • The flow has an anti-join in place so that it checks for dates already in the table and doesn't insert data if the date is already present
  • The problem I am having is that sometimes the Data Pipeline will insert double the amount of data in the DataFlow Gen 2 but this is not happening consistently
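For readers following along, the delete-then-append flow described above can be sketched in plain Python. This is only an illustration of the intended logic, not the actual pipeline or Power Query code: the `refresh_window` function, the row layout, and the example date are all assumptions made for the sketch.

```python
from datetime import date, timedelta

def refresh_window(table, incoming, today, days=31):
    """Delete the last `days` days from `table`, then append rows from
    `incoming` whose date is not already present (left anti-join on date)."""
    cutoff = today - timedelta(days=days)
    # Step 1 (pipeline delete activity): keep only rows older than the window.
    kept = [row for row in table if row["date"] < cutoff]
    # Step 2 (dataflow): anti-join on date against what remains in the table.
    existing_dates = {row["date"] for row in kept}
    new_rows = [row for row in incoming if row["date"] not in existing_dates]
    return kept + new_rows

# Hypothetical data: 60 days of history, 31 days of fresh rows.
today = date(2024, 12, 13)
table = [{"date": today - timedelta(days=d), "value": d} for d in range(1, 61)]
incoming = [{"date": today - timedelta(days=d), "value": d * 10} for d in range(1, 32)]

refreshed = refresh_window(table, incoming, today)
rerun = refresh_window(refreshed, incoming, today)
print(len(refreshed), len(rerun))
```

When each step sees a consistent view of the table, rerunning the refresh leaves the row count unchanged, which is the behavior the pipeline is expected to have.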

For example, the flow ran last night on a schedule and inserted 75,123 rows of data (green arrow), which is double what it should be. When I reran the flow this morning, it correctly inserted 38,171 rows of new data (pink arrow).

EDIT TO ADD: the previous night, the data inserted just fine, not duplicated!

 

This only happens when the Dataflow gen 2 is run within the Data Pipeline - when I run the flow manually in Power Query it works fine. 

 

I want to use the Data Pipeline so that it only inserts data when data has successfully been deleted. Any idea as to why this is happening?

 

kallens_0-1734116878032.png

kallens_0-1734117054394.png

 

6 REPLIES
kallens
Frequent Visitor

Update: I was never able to work out why this was happening, so I rebuilt the pipeline differently: the Power Query with the API call now replaces the contents of a separate table in the warehouse, and the pipeline appends to the main table from there, instead of appending directly from the Power Query. I have not received duplicate data with this method. I think it had something to do with the Append statement.
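A minimal sketch of that staging-table pattern, using Python's built-in `sqlite3` purely for illustration. A Fabric warehouse would use T-SQL and pipeline activities instead, and the table names `staging` and `target`, the columns, and the `load` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (day TEXT, value INTEGER)")
conn.execute("CREATE TABLE target (day TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO target VALUES (?, ?)",
    [("2024-12-10", 1), ("2024-12-11", 2)],
)
conn.commit()

def load(conn, fresh_rows, window_start):
    """Replace staging with the fresh extract, then swap the window in target."""
    with conn:  # commits on success, rolls back if any statement fails
        # Replace (not append) the staging table with the new extract.
        conn.execute("DELETE FROM staging")
        conn.executemany("INSERT INTO staging VALUES (?, ?)", fresh_rows)
        # Delete the rolling window from the target, then append from staging.
        conn.execute("DELETE FROM target WHERE day >= ?", (window_start,))
        conn.execute(
            "INSERT INTO target SELECT day, value FROM staging WHERE day >= ?",
            (window_start,),
        )

fresh = [("2024-12-11", 20), ("2024-12-12", 30)]
load(conn, fresh, "2024-12-11")
load(conn, fresh, "2024-12-11")  # a rerun replaces the window instead of doubling it
rows = conn.execute("SELECT day, value FROM target ORDER BY day").fetchall()
print(rows)
```

Because the staging table is replaced on every run, a repeated or retried load starts from the same extract rather than adding to rows left over from an earlier attempt.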

Anonymous
Not applicable

Hi @kallens ,

 

What are the results of your API calls to import data into Power Query?

 

If the problem persists, please provide the relevant screenshot information with a description and I'll get back to you as soon as possible.

 

Best regards,

Adamk Kong

Thanks Adamk. Right now the results continue to be inconsistent in Fabric via the Pipeline and the Power Query, and when I check my API call it seems to produce the correct number of rows, i.e. the unduplicated count. I am using Supermetrics to create and generate the API query and results. Here are some screenshots with supporting documentation.

 

slide 2.png

slide 3.png

slide 4.png

slide 5.png

slide 7.png

  • When I run them in Supermetrics Query Manager the results are typically around ~40k rows and ~190 rows respectively (see slides 5&6)
  • When I put it into Power Query and run results in Power Query to get a row count it’s the same, around ~40k rows (slide 2)
  • My Power Query has a left anti-join that checks for dates already in my destination table so that it doesn’t add any dates that already exist (this is more of a failsafe)
  • It’s set to append to a table I have set up in my Microsoft Fabric data warehouse
  • I also have a Data Pipeline in place that then deletes the last 31 days worth of data from the table each night and then upon that success, it calls on the Power Query data flow to insert the new latest 31 days of data into the table (slide 7)
  • I currently have the Data Pipeline set to run at 1AM Pacific Time each night
  • The problem I am experiencing: sometimes (intermittently, not every time) the data added to my table from the Power Query flow is exactly duplicated
    • For example, last night the row count added to Power Query was 86,262 – which I suspected was duplicated
    • When I re-ran the Data Pipeline manually this morning it successfully deleted those ~86k rows and then inserted the correct amount of rows which is 43,131 – exactly half of what was inserted last night
  • I know that my pipeline or flow isn’t running twice or concurrently because I have the left-anti join set up to not insert any dates that already exist into the table
  • I also tested having the Power Query run on its own every night independent of the Data Pipeline and still got duplicate results
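One thing worth testing against the reasoning above: an anti-join on date only compares incoming rows with dates already in the destination, so it cannot catch rows that arrive twice within the same batch. The sketch below illustrates that point under a purely hypothetical assumption that the source occasionally returns every row twice. The helper names and the `campaign` key column are invented for the example and are not taken from the actual dataflow.

```python
from datetime import date

def anti_join_on_date(incoming, existing_dates):
    """Keep incoming rows whose date is not already in the destination."""
    return [row for row in incoming if row["date"] not in existing_dates]

def dedupe(rows, key_fields):
    """Keep the first row seen for each key (an extra guard within one batch)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

batch = [{"date": date(2024, 12, d), "campaign": "a", "clicks": d} for d in range(1, 4)]
doubled_batch = batch + batch  # hypothetical: the extract contains every row twice

existing_dates = set()  # the rolling window was just deleted, so no dates match
passed = anti_join_on_date(doubled_batch, existing_dates)
deduped = dedupe(passed, ["date", "campaign"])

print(len(passed))   # 6: the anti-join lets both copies through
print(len(deduped))  # 3: de-duplicating on a row key removes the extra copies
```

If something like this is the cause, comparing the row count of the extract itself on a bad night against a good night would show it, independent of the anti-join.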

 

So far I haven’t been able to recreate this duplication when I manually run the data query/pipeline myself during the day. It seems to only happen on my overnight schedules. Is there something about the time at which I am running it that could be causing the data to come through twice?

 

Do you have any other hypotheses as to why this could be happening? I have set my Data Pipeline to run again today on a schedule to see if time impacts it. And to see if it’s always when it’s from a schedule or if I can get it to duplicate when I manually trigger it.

 

I appreciate your help and time and anything you can suggest for me to try and test!

 

kallens
Frequent Visitor

I stand corrected: the pipeline is not the only place where the data from the Power Query Dataflow Gen2 duplicates. I have had it running independently on a schedule outside the data pipeline and got duplicated results the last 2 nights as well. 😞

 

I have a measure in place to not insert data into the table if the date already exists, so it shouldn't be loading in twice. I might need to check the results coming from the API call I am using to get the data into the power query.

lbendlin
Super User
Super User

what do you consider "today"  and "night"?  Wonder if there are timezones other than UTC involved.

I have the data pipeline set to run at 1 AM Pacific Time Zone (that is the 'night') and then 'today' was when I ran it around 10:23 AM Pacific Time.

 

What is strange is that the previous night, on its scheduled run, the data pipeline ran as expected and inserted the correct amount of data, with no duplicates, and there were no material changes to the pipeline between those times.
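To check the timezone question concretely, one option is to compute the start of the 31-day window in both Pacific time and UTC for each run time. This is a rough sketch with assumptions: the run date is an example, Pacific time is modeled as a fixed UTC-8 offset (standard time; daylight time would be UTC-7), and `window_start` is a hypothetical helper, not something from the pipeline.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Standard Time as a fixed offset (no daylight saving handling).
PST = timezone(timedelta(hours=-8), "PST")

def window_start(now, days=31):
    """First date of the rolling window, in the timezone carried by `now`."""
    return (now - timedelta(days=days)).date()

runs = {
    "scheduled 1:00 AM": datetime(2024, 12, 13, 1, 0, tzinfo=PST),
    "manual 10:23 AM": datetime(2024, 12, 13, 10, 23, tzinfo=PST),
    "hypothetical 8:00 PM": datetime(2024, 12, 13, 20, 0, tzinfo=PST),
}

for label, run in runs.items():
    as_utc = run.astimezone(timezone.utc)
    same = window_start(run) == window_start(as_utc)
    print(label, "| Pacific:", window_start(run), "| UTC:", window_start(as_utc), "| same:", same)
```

Under these assumptions the 1 AM and 10:23 AM runs land on the same calendar date in both zones, so the delete window and the load window would agree. Only a run late in the Pacific evening, after UTC has rolled over to the next day, would see the two windows shift by a day.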
