Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
BeyzaKzlky
Frequent Visitor

Dataflow Gen2 Incremental Refresh and Data Refresh

Hi all,

I have a problem again with dataflow gen2. 

I'm getting my data from on-prem sql server with dataflow, and because there is no Incremental refresh on gen2, I set standard scheduled refresh daily on dataflow. 

But I realized that this refresh duplicates data on the lakehouse table. 

Is it a bug or is there any option to prevent this I couldn't see?

Thanks

1 ACCEPTED SOLUTION

Hi!

 

My name is Jeroen Luitwieler, and I’m a Senior Product Manager on the Dataflow team. While this guide shows how to use an OData source, it’s just an example. You can use this method with your on-premises SQL Server without any issues. If you want to perform a full refresh of your data every time you refresh, you can change the current data destination setting from “append” to “replace”.

 

Hope this resolves your issue! 😁

View solution in original post

11 REPLIES 11
ebjim
Helper IV
Helper IV

Incremental updates is similar to implementing SCD type 2. This might be useful:

https://www.youtube.com/watch?v=pRI4-aCUXHI

v-nikhilan-msft
Community Support
Community Support

Hi @BeyzaKzlky 
Thanks for using Fabric Community.
As I understand the above question, you are trying to achieve incremental load using Dataflow Gen2. 
Incrementally amassing data in a data destination requires a technique to load only new or updated data into your data destination. This technique can be done by using a query to filter the data based on the data destination.
Please refer to this link for more information: https://learn.microsoft.com/en-us/fabric/data-factory/tutorial-setup-incremental-refresh-with-datafl...

Hope this helps. Please let us know if you have any further queries.

Hi @v-nikhilan-msft , 

Thanks for your response. I have seen the learning page which you have shared with me, but it's OData source. Unfortunately not proper for my case. 

I'm getting data from on-prem sql server. 

And also there is a data duplication problem with dataflow like I have mentioned above. 

 

Hi!

 

My name is Jeroen Luitwieler, and I’m a Senior Product Manager on the Dataflow team. While this guide shows how to use an OData source, it’s just an example. You can use this method with your on-premises SQL Server without any issues. If you want to perform a full refresh of your data every time you refresh, you can change the current data destination setting from “append” to “replace”.

 

Hope this resolves your issue! 😁

I can confirm that it works - with that example i made incremental refreshes from on prem sql server: query warehouse for max date (or entry no, if Your tables have them) and filter source sql server, check that it works correctly and query folding works. I have also implemented pipelines and stored procedures to delete some data from warehouse and refresh a week or a month old of data.

Hi, 

I double checked your metioned setting this morning. I'm already selecting replace for tables. 

Have a nice day. 

Hi @BeyzaKzlky 
Apologies for the issue you have been facing. Are you still facing this issue?
If yes we will respond back with more details and try to help.
Thanks.

Hi,

Thank you very much for your response. 

Yes I know the option which you have mentioned, but we cannot see that option since yesterday, also data destination selection is greyed out like below:

BeyzaKzlky_0-1705680934354.png

 

Also, we set scheduled refresh from the settings of dataflow:

BeyzaKzlky_1-1705681021245.png

and also I want to mention any other problem scheduled refresh is not triggered randomly. 

Is there any limit to show data on data transformation in dataflow? It cuts the operation while merging (inner join) multiple tables after 2 million rows.

I will try the mentioned example. 

Thanks for your help.

Hi @BeyzaKzlky ! 

 

It is correct what @fabricator1 mentioned and if a data destination is set you can edit it in the left bottom corner. However, the issue you are facing with not triggering a refresh if it was scheduled is quite interesting. Can you create a support ticket when you face this issue again so that our support team can look into it? There should also be no row limit for the operations that you are perfroming. You may want to consider to split the ingestion of the data and the merge in two separate queries so that the lakehouse/warehouse compute can engage and help you perform the inner join faster. 

Add data destination is greyed out if the query already has a destination.

It is possible to edit the Destination settings by clicking on the settings icon (cog wheel) in the bottom right corner of the screen:

fabricator1_0-1705684444700.png


I don't have information to answer about the other questions, someone else answer please 🙂

I think you can use that link from @v-nikhilan-msft also with on-prem sql server.


That procedure is only to append rows to the Lakehouse table.
Not update (change) existing data in the Lakehouse table. 
So it's called Incrementally amass data.


I think the main thinking behind that procedure is that you connect your Dataflow Gen2 to your Lakehouse table, to find out what is the newest row in your Lakehouse table. Then you use that information to filter your on-prem sql server data, so you ask the sql server to only return rows which are newer than your newest row in the Lakehouse. Then you append this new data from sql server into the Lakehouse.
When using that procedure, it shouldn't become duplicate rows, because your query to the SQL server is only asking for new rows.

There is an idea for Incremental refresh (which includes both append and updates to existing data), please vote: 
Microsoft Idea

Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayFBCUpdateCarousel

Fabric Monthly Update - May 2024

Check out the May 2024 Fabric update to learn about new features.

Top Solution Authors