Hi, I am new to Data Factory and need help with the following use case:
The simplified version is that I have two tables in a lakehouse: [table A], where weekly data is staged, and [table B], the main table to which the weekly data needs to be appended.
I need a way to only append Table A's data to B if the date in A is later than the latest date in B.
Can someone please advise what would be the easiest way to achieve this? Do I need a pipeline/dataflow/notebook?
Data in Table A can arrive at any point during the week. A pipeline is scheduled to bring in the weekly data file and overwrite Table A whenever the file arrives. This bit is done. I now need to check the date and append it to the main table if needed.
Thank you!
I guess this is a similar scenario: https://learn.microsoft.com/en-us/fabric/data-factory/tutorial-setup-incremental-refresh-with-datafl...
If you choose to follow this tutorial, you will probably need to tweak it a bit to suit your case.
Hi @Learning_Fabric ,
Your solution is great, @frithjof_v. It worked like a charm! I have another idea in mind that I would like to share for reference.
I think when you use a pipeline, you can change the join type to suit your needs.
For the other details, you can refer to what @frithjof_v said.
Best Regards
Yilong Zhou
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Thank you for pointing me to this solution! It was exactly what I needed! Worked perfectly for my use case 🙂
I think you can achieve this with any of the tools you mentioned: a notebook, a data pipeline, or a Dataflow Gen2.
You need to schedule it to run perhaps once each day.
You need to query both table A and table B.
PS! You should not query the SQL analytics endpoint, because it can have some delay. Query the Lakehouse tables / OneLake directly to get the current data.
Then you compare the max value in the Date column in each table.
If the max date in A is later than the max date in B, then you append A into B.
Otherwise do nothing.
The easiest way to implement this depends on your current skill set. It can easily be done in Dataflow Gen2; that may not be the most performant option and it may use more CUs on your capacity, but I think it should be easy to build there.
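If you prefer the notebook route, here is a minimal PySpark sketch of the logic described above. The table names (table_a, table_b) and the Date column are assumptions for illustration; adjust them to match your lakehouse.

```python
from pyspark.sql import functions as F

# Assumed names -- placeholders for illustration; adjust to your lakehouse.
staging_table = "table_a"   # weekly staged data
main_table = "table_b"      # main table to append into
date_col = "Date"

# Read both Delta tables directly from the lakehouse
# (a Fabric notebook provides the `spark` session by default).
df_a = spark.read.table(staging_table)
df_b = spark.read.table(main_table)

# Find the latest date in each table.
max_a = df_a.agg(F.max(date_col)).collect()[0][0]
max_b = df_b.agg(F.max(date_col)).collect()[0][0]

# Append only when the staged data is newer than anything already in the main table.
# (If the main table is empty, max_b is None and we append.)
if max_a is not None and (max_b is None or max_a > max_b):
    df_a.write.mode("append").saveAsTable(main_table)
    print(f"Appended rows from {staging_table} to {main_table}.")
else:
    print("No new data to append.")
```

You could then schedule this notebook (directly or from a data pipeline) to run once a day, as mentioned above.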