adigkarth
Frequent Visitor

Reprocess data present in ADLS

Hi,

We have a requirement to reprocess old data using a Data Factory pipeline. Here are the details:

Storage is in ADLS Gen2.

Landing zone (data is stored in the same format as we receive it from the source; data is loaded from SQL Server to ADLS Gen2 using a data pipeline copy activity).

Bronze layer (data from the landing zone is copied to the bronze layer and converted to Delta tables; this is done using Azure Databricks notebooks that run PySpark code).

Silver and gold layers (run Databricks notebook Python code).

Our requirement is this: we receive data daily through files. The landing zone keeps an archive of that data for 7 days, whereas the bronze layer is truncated and reloaded every day.

We need to build reprocess logic where, if we pass a date as a parameter, it triggers the flow, picks up the old files for that date, and starts processing from the landing zone. Could you please help me with this?
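
For illustration, a minimal sketch of the kind of parameterized reprocess entry point described above, assuming the landing zone is date-partitioned and the date reaches a Databricks notebook as a widget parameter; the storage account, source, and table names are hypothetical placeholders:

# Hypothetical Databricks notebook invoked by the pipeline with a
# "process_date" base parameter (e.g. "2024-01-15").
dbutils.widgets.text("process_date", "")
process_date = dbutils.widgets.get("process_date")

# Assumes a Year/Month/Day hierarchy in the landing zone; adjust to
# your actual folder layout.
year, month, day = process_date.split("-")
landing_path = (
    f"abfss://landing@<storageaccount>.dfs.core.windows.net/"
    f"<source>/{year}/{month}/{day}/"
)

# Reload that day's files and overwrite the bronze Delta table,
# mirroring the daily truncate-and-load behaviour.
df = spark.read.format("json").load(landing_path)
df.write.format("delta").mode("overwrite").saveAsTable("bronze.<table>")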

1 ACCEPTED SOLUTION
jwinchell40
Resolver III

@adigkarth - How is your landing zone structured? Are you using a hierarchy that records when the file(s) were ingested into the landing zone (e.g., Year -> Month -> Day)?

 

When you read from a file path in the landing zone (whether fixed or dynamic), you can access some metadata as part of the process, including the modification date of each file. You can then use a filter to drop any data from before the seed date you passed in.

 

df = spark.read.format('json').load(<path>).select("*", "_metadata.file_modification_time")

or

df = spark.read.format('json').load(<path>).select("*").filter("_metadata.file_modification_time > '<date>'")
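
Putting that together, a minimal end-to-end sketch, assuming the seed date arrives as a notebook widget and <path> points at the landing zone (the widget name is hypothetical):

# Hypothetical widget carrying the reprocess/seed date, e.g. "2024-01-15".
dbutils.widgets.text("seed_date", "")
seed_date = dbutils.widgets.get("seed_date")

# _metadata is the hidden per-file metadata column Spark exposes for
# file-based sources (Spark 3.2+); the filter keeps only files modified
# on or after the seed date.
df = (spark.read.format("json")
      .load("<path>")
      .select("*", "_metadata.file_modification_time")
      .filter(f"_metadata.file_modification_time >= '{seed_date}'"))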


2 REPLIES 2
v-nikhilan-msft
Community Support

Hi @adigkarth 
Thanks for using Fabric Community.
Could you please confirm whether your question relates to Microsoft Fabric or Azure Data Factory?

 
