Hello,
I am building a Data Pipeline to update my table in a data warehouse with a rolling 31 days of data.
For example, the flow ran last night on a schedule and inserted 75,123 rows of data (green arrow), roughly double what it should be. When I reran the flow this morning, it correctly inserted 38,171 rows of new data (pink arrow).
EDIT TO ADD: The previous night, the data was inserted correctly, with no duplicates!
This only happens when the Dataflow Gen2 is run within the Data Pipeline; when I run the flow manually in Power Query, it works fine.
I want to use the Data Pipeline so that it only inserts data after the old data has been successfully deleted. Any idea why this is happening?
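The "only insert after a successful delete" requirement is usually expressed as one transaction wrapping both the delete and the insert. Below is a minimal sketch in Python, using the standard-library `sqlite3` module as a local stand-in for the warehouse (an assumption: it does not use any Fabric or Data Pipeline API). The table `daily_metrics`, its `day`/`value` columns, and the helpers `rolling_window` and `replace_window` are hypothetical, and the window is assumed to be the 31 calendar days ending the day before the run.

```python
import sqlite3
from datetime import date, timedelta


def rolling_window(run_date: date, days: int = 31) -> tuple[str, str]:
    """Return (start, end) ISO dates for `days` calendar days ending the day before run_date.

    Assumption: the window excludes the run date itself.
    """
    end = run_date - timedelta(days=1)
    start = end - timedelta(days=days - 1)
    return start.isoformat(), end.isoformat()


def replace_window(conn: sqlite3.Connection, rows, run_date: date) -> None:
    """Delete the current window and insert fresh rows as one unit of work."""
    start, end = rolling_window(run_date)
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the DELETE is never kept without the matching INSERT.
    with conn:
        conn.execute("DELETE FROM daily_metrics WHERE day BETWEEN ? AND ?", (start, end))
        conn.executemany("INSERT INTO daily_metrics (day, value) VALUES (?, ?)", rows)


# Demo against an in-memory database (hypothetical table and values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_metrics (day TEXT, value INTEGER)")
conn.executemany("INSERT INTO daily_metrics VALUES (?, ?)", [("2025-06-01", 1), ("2025-06-02", 2)])
conn.commit()

print(rolling_window(date(2025, 6, 10)))  # ('2025-05-10', '2025-06-09')
replace_window(conn, [("2025-06-01", 10), ("2025-06-02", 20)], date(2025, 6, 10))
print(conn.execute("SELECT COUNT(*) FROM daily_metrics").fetchone()[0])  # 2
```

If the insert raises (for example, a malformed row), the context manager rolls back the delete as well, so the table keeps its previous contents instead of ending up empty or half-loaded.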
Update: I was never able to determine why this was happening, so I rebuilt the pipeline to replace the contents of a separate table in the warehouse and then append from there, instead of appending directly from the Power Query output with the API call. I have not received duplicate data with this method. I think it had something to do with the Append statement.
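The staging approach from the update can be sketched the same way: overwrite a separate table with the latest pull, then append from it only the dates the target does not already contain. This is a minimal Python sketch with `sqlite3` standing in for the warehouse; `staging_metrics`, `daily_metrics`, and `load_via_staging` are hypothetical names, and this is not necessarily how the pipeline in the thread was built.

```python
import sqlite3


def load_via_staging(conn: sqlite3.Connection, fresh_rows) -> None:
    """Replace the staging table, then append only new days to the target."""
    with conn:  # commit both steps together, or roll both back on error
        # Step 1: "replace" the staging table with the latest pull.
        conn.execute("DELETE FROM staging_metrics")
        conn.executemany("INSERT INTO staging_metrics (day, value) VALUES (?, ?)", fresh_rows)
        # Step 2: append to the target only the days it does not already contain.
        conn.execute(
            """
            INSERT INTO daily_metrics (day, value)
            SELECT s.day, s.value
            FROM staging_metrics AS s
            WHERE NOT EXISTS (SELECT 1 FROM daily_metrics AS t WHERE t.day = s.day)
            """
        )


# Demo with hypothetical tables: the target already holds 2025-06-01.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_metrics (day TEXT, value INTEGER)")
conn.execute("CREATE TABLE staging_metrics (day TEXT, value INTEGER)")
conn.execute("INSERT INTO daily_metrics VALUES ('2025-06-01', 1)")
conn.commit()

rows = [("2025-06-01", 1), ("2025-06-02", 2)]
load_via_staging(conn, rows)
load_via_staging(conn, rows)  # a second run adds nothing
print(conn.execute("SELECT COUNT(*) FROM daily_metrics").fetchone()[0])  # 2
```

Because the append is guarded by `NOT EXISTS`, rerunning the load does not create duplicates; the trade-off is that any day that already has rows in the target is skipped entirely, so late-arriving rows for that day are not picked up.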
Hi @kallens ,
What are the results of your API calls to import data into Power Query?
If the problem persists, please provide the relevant screenshot information with a description and I'll get back to you as soon as possible.
Best regards,
Adamk Kong
Thanks Adamk. Right now the results continue to be inconsistent in Fabric, both via the Pipeline and in Power Query, and when I check my API call it seems to return the correct number of rows, i.e., the non-duplicated count. I am using Supermetrics to create and generate the API query and results. Here are some screenshots with supporting documentation.
So far I haven’t been able to reproduce this duplication when I run the dataflow/pipeline manually during the day. It seems to happen only on my overnight schedules. Could the time at which I run it cause the data to come through twice?
Do you have any other hypotheses as to why this could be happening? I have set my Data Pipeline to run on a schedule again today to see whether the time of day affects it, and whether the duplication always comes from a scheduled run or can also happen when I trigger it manually.
I appreciate your help and time and anything you can suggest for me to try and test!
I stand corrected: the duplication from the Power Query Dataflow Gen2 does not happen only in the pipeline. I have also had the dataflow running independently on a schedule outside the Data Pipeline, and it produced duplicated results the last 2 nights as well. 😞
I have a measure in place to not insert data into the table if the date already exists, so it shouldn't be loading twice. I might need to check the results coming from the API call I use to get the data into Power Query.
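To check the API output itself, one option is to count fully identical rows before they reach the table. A minimal sketch in Python, assuming the response has already been parsed into a list of dicts; the field names `date`, `campaign`, and `clicks` are hypothetical placeholders, not the actual Supermetrics schema, and `duplicate_rows` is an illustrative helper.

```python
from collections import Counter


def duplicate_rows(rows: list[dict]) -> dict[tuple, int]:
    """Map each row that occurs more than once to its occurrence count."""
    # Sort each row's items so that key order does not affect equality.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical parsed API response with one repeated row.
api_rows = [
    {"date": "2025-06-01", "campaign": "A", "clicks": 5},
    {"date": "2025-06-01", "campaign": "A", "clicks": 5},
    {"date": "2025-06-01", "campaign": "B", "clicks": 3},
]
dupes = duplicate_rows(api_rows)
# Total rows vs. number of extra copies.
print(len(api_rows), sum(n - 1 for n in dupes.values()))  # 3 1
```

This only flags rows that are identical in every field; if the source can legitimately return identical rows, compare against a key that should be unique instead.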
What do you consider "today" and "night"? I wonder if time zones other than UTC are involved.
The Data Pipeline is scheduled to run at 1 AM Pacific Time (that is the 'night'), and 'today' was when I ran it at around 10:23 AM Pacific Time.
What is strange is that the previous night, on its scheduled run, the Data Pipeline ran as expected and inserted the correct amount of data with no duplicates, and there were no material changes to the pipeline between those runs.
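To check whether a UTC/Pacific date boundary could shift a date-based window between the 1 AM scheduled run and the 10:23 AM manual run, both times can be converted to UTC. A minimal sketch in Python, assuming both runs fell on the same hypothetical June 2025 date, when Pacific Time is on daylight time (UTC-7); the fixed offset and the helper `local_and_utc_dates` are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: June dates, so Pacific Time is on daylight time (PDT, UTC-7).
# In winter (PST) the offset would be UTC-8.
PDT = timezone(timedelta(hours=-7), "PDT")


def local_and_utc_dates(run: datetime):
    """Return the calendar date of a run in its own zone and in UTC."""
    return run.date(), run.astimezone(timezone.utc).date()


night = datetime(2025, 6, 10, 1, 0, tzinfo=PDT)      # 1 AM scheduled run
morning = datetime(2025, 6, 10, 10, 23, tzinfo=PDT)  # 10:23 AM manual run

for run in (night, morning):
    print(run.astimezone(timezone.utc).isoformat(), local_and_utc_dates(run))
```

Under these assumptions both runs fall on the same calendar date in Pacific time and in UTC, so that boundary alone would not move a date-based window between them; the two dates only diverge for runs between 5 PM and midnight PDT (4 PM and midnight PST). This checks only those two clocks, not any time zone the data source itself may use.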