Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
Learning_Fabric
New Member

Append two tables based on latest date column

Hi, I am new to Data Factory and need help with the following use case:

Simplified version is that I have two tables in a lake house, [table A] where weekly data is staged and [table B] the main table where the weekly data needs to be appended. 

I need a way to only append Table A's data to B if the date in A is later than the latest date in B.

 

Can someone please advise what would be the easiest way to achieve this? do I need a pipeline/dataflow/notebook?

 

Data in Table A can arrive at any point during the week. A pipeline is scheduled to bring in the weekly data file and overwrite Table A whenever the file arrives. This bit is done. I now need to check the date and append it to the main table if needed.

 

Thank you!

1 ACCEPTED SOLUTION
frithjof_v
Super User
Super User

I guess this is a similar scenario: https://learn.microsoft.com/en-us/fabric/data-factory/tutorial-setup-incremental-refresh-with-datafl...

 

If you choose to follow this tutorial, you will probably need to tweak it a bit to suit your case.

View solution in original post

4 REPLIES 4
v-yilong-msft
Community Support
Community Support

Hi @Learning_Fabric ,

Your solution is great, @frithjof_v . It worked like a charm! Here I have another idea in mind, and I would like to share it for reference.

I think when you use pipeline you can change the join type to suit your needs.

vyilongmsft_0-1722906903006.png

vyilongmsft_1-1722906983980.png

In other areas, you can refer to what frithjof_v said for specifics.

 

 

 

Best Regards

Yilong Zhou

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

frithjof_v
Super User
Super User

I guess this is a similar scenario: https://learn.microsoft.com/en-us/fabric/data-factory/tutorial-setup-incremental-refresh-with-datafl...

 

If you choose to follow this tutorial, you will probably need to tweak it a bit to suit your case.

Thank you for pointing me to this solution! It was exactly what I needed! Worked perfectly for my usecase 🙂

 

frithjof_v
Super User
Super User

I think you can achieve this by using any of the tools which you mentioned.

 

So I think you can use a notebook, data pipeline or dataflow gen 2.

You need to schedule it to run perhaps once each day.

 

You need to query both table A and table B. 

 

PS! You should not query the SQL Analytics Endpoint, because the SQL Analytics Endpoint can have some delay. You should query the Lakehouse tables / OneLake directly to get the current data.

 

Then you compare the max value in the Date column in each table.

 

If the max date in A is later than the max date in B, then you append A into B.

Otherwise do nothing.

 

 

What is the easiest way to implement this, depends on your current skillset. This can easily be done in Dataflow Gen2. It may not be the most performant and it may use more CUs on your capacity. But I think it should be easy to do this in Dataflow Gen2.

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Jan NL Carousel

Fabric Community Update - January 2025

Find out what's new and trending in the Fabric community.