
HamidBee
Power Participant

How can I get Synapse Notebook to only process newly added data?

I have data that I plan to upload to an Azure storage account. My plan is to create a pipeline in Synapse Studio that includes an Apache Spark notebook (using PySpark). The primary objective is to have the notebook process the data and then save it to a lake database.

The data will be uploaded to an Azure storage container in a format such as 2022/Week1/Week1.xlsx and 2023/Week10/Week10.xlsx. Initially, I will store and process all historical data in the storage account. After that, new data will be processed and added to the lake database on a weekly basis. What is the most efficient way to have the Azure pipeline or the notebook identify and process only the newly added data?

1 REPLY
HimanshuS-msft
Microsoft Employee

Hello @HamidBee,
Thanks for using the Fabric community.
I believe you will need to use the function below in the notebook to get the week number dynamically each week.

from pyspark.sql.functions import weekofyear
weekofyear(df.colname)  # colname is a date or timestamp column in df
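As a rough illustration of how a week number could be turned into the folder path from the question, here is a minimal sketch in plain Python. It assumes the uploaded folders follow ISO 8601 week numbering (the same convention Spark's weekofyear uses) and that the pipeline runs in the same week as the upload; neither is confirmed in this thread. The helper name weekly_path is hypothetical.

```python
from datetime import date


def weekly_path(run_date: date) -> str:
    """Build a path such as '2023/Week10/Week10.xlsx' for the week containing run_date.

    Assumption: folder names use ISO 8601 week numbers (hypothetical convention).
    """
    # isocalendar() returns (ISO year, ISO week number, ISO weekday).
    iso_year, iso_week, _ = run_date.isocalendar()
    return f"{iso_year}/Week{iso_week}/Week{iso_week}.xlsx"


# Example: build the path for the week containing 8 March 2023.
print(weekly_path(date(2023, 3, 8)))
```

The resulting string could then be passed to the notebook as a parameter so that only that week's file is read. Note that the ISO year can differ from the calendar year for dates near 1 January, so the folder convention should be checked against the real uploads.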

 


Thanks
Himanshu
