Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
HamidBee
Power Participant
Power Participant

How can I get Synapse Notebook to only process newly added data?

I have data that I plan on uploading to an Azure storage account. My plan is to create a pipeline in Synapse Studio, which will include an Apache Notebook (Using PySpark). The primary objective is to have the Notebook process the data and then save it to a lake database.
 

The data will be uploaded to an Azure storage container following this format for example: 2022/Week1/Week1.xlsx and 2023/Week10/Week10.xlsx. Initially, I will store and process all historical data in the storage account. After that, the data will be processed and added to the lake database on a weekly basis. Now, the question is, what is the most efficient method to enable the Azure pipeline or the Notebook to identify and process only the newly added data?.

1 REPLY 1
HimanshuS-msft
Microsoft Employee
Microsoft Employee

Hello @HamidBee 
Thanks for using the Fabric community.
I believe you will have to use the below function in the notebook to get the weeknumber dynamically every week . 

weekofyear(df.colname)

 


Thanks
HImanshu

Helpful resources

Announcements
November Fabric Update Carousel

Fabric Monthly Update - November 2025

Check out the November 2025 Fabric update to learn about new features.

Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.