Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
Learning_Fabric
New Member

Append two tables based on latest date column

Hi, I am new to Data Factory and need help with the following use case:

Simplified version is that I have two tables in a lake house, [table A] where weekly data is staged and [table B] the main table where the weekly data needs to be appended. 

I need a way to only append Table A's data to B if the date in A is later than the latest date in B.

 

Can someone please advise what would be the easiest way to achieve this? do I need a pipeline/dataflow/notebook?

 

Data in Table A can arrive at any point during the week. A pipeline is scheduled to bring in the weekly data file and overwrite Table A whenever the file arrives. This bit is done. I now need to check the date and append it to the main table if needed.

 

Thank you!

1 ACCEPTED SOLUTION
frithjof_v
Super User
Super User

I guess this is a similar scenario: https://learn.microsoft.com/en-us/fabric/data-factory/tutorial-setup-incremental-refresh-with-datafl...

 

If you choose to follow this tutorial, you will probably need to tweak it a bit to suit your case.

View solution in original post

4 REPLIES 4
Anonymous
Not applicable

Hi @Learning_Fabric ,

Your solution is great, @frithjof_v . It worked like a charm! Here I have another idea in mind, and I would like to share it for reference.

I think when you use pipeline you can change the join type to suit your needs.

vyilongmsft_0-1722906903006.png

vyilongmsft_1-1722906983980.png

In other areas, you can refer to what frithjof_v said for specifics.

 

 

 

Best Regards

Yilong Zhou

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

frithjof_v
Super User
Super User

I guess this is a similar scenario: https://learn.microsoft.com/en-us/fabric/data-factory/tutorial-setup-incremental-refresh-with-datafl...

 

If you choose to follow this tutorial, you will probably need to tweak it a bit to suit your case.

Thank you for pointing me to this solution! It was exactly what I needed! Worked perfectly for my usecase 🙂

 

frithjof_v
Super User
Super User

I think you can achieve this by using any of the tools which you mentioned.

 

So I think you can use a notebook, data pipeline or dataflow gen 2.

You need to schedule it to run perhaps once each day.

 

You need to query both table A and table B. 

 

PS! You should not query the SQL Analytics Endpoint, because the SQL Analytics Endpoint can have some delay. You should query the Lakehouse tables / OneLake directly to get the current data.

 

Then you compare the max value in the Date column in each table.

 

If the max date in A is later than the max date in B, then you append A into B.

Otherwise do nothing.

 

 

What is the easiest way to implement this, depends on your current skillset. This can easily be done in Dataflow Gen2. It may not be the most performant and it may use more CUs on your capacity. But I think it should be easy to do this in Dataflow Gen2.

Helpful resources

Announcements
Fabric July 2025 Monthly Update Carousel

Fabric Monthly Update - July 2025

Check out the July 2025 Fabric update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.