
HamidBee
Impactful Individual

What Activities are best to use to build this pipeline?

I have weekly data that I would like to load into a serverless SQL pool. The data arrives as an Excel file. My plan so far:

1. Create a database in the serverless SQL pool with a table for my data.

2. Upload the Excel File to an Azure Storage Account.

3. Use an Azure Synapse Spark notebook to query the data, transform it with PySpark, and then write it to the database in the serverless SQL pool.

4. Create a pipeline using the Synapse notebook and schedule it to run once a week.

5. At the end of each week, append the new data to the Azure Storage Account.

 

Since each week brings roughly 70,000 rows of data, I just want a way for only the new data to be read, transformed, and inserted into the database.
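One common pattern for the "only new rows" requirement is a watermark: persist the maximum date (or ID) already loaded, and filter each weekly file against it before inserting. In PySpark this is a `.filter(...)` on the DataFrame; the plain-Python sketch below (with a hypothetical `event_date` column) just illustrates the logic.

```python
from datetime import date

def new_rows(rows, watermark):
    """Keep only rows strictly newer than the last loaded date."""
    return [r for r in rows if r["event_date"] > watermark]

# Hypothetical weekly batch: one row already loaded, one genuinely new.
rows = [
    {"id": 1, "event_date": date(2024, 6, 1)},
    {"id": 2, "event_date": date(2024, 6, 8)},
]
fresh = new_rows(rows, watermark=date(2024, 6, 1))
# After a successful load, persist the max event_date seen as the new
# watermark for next week's run.
```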

 

My question is:

 

A) What Activities are best to use to build this pipeline?

1 Reply
GeethaT-MSFT
Community Support

Hi @HamidBee, thanks for posting your question in the Microsoft Fabric Community.

You can use the Synapse Notebook activity in the pipeline to achieve this. Write the ETL code in PySpark in a Synapse notebook, then use the Notebook activity to run it as part of the pipeline.

The advantage of the Notebook activity is that you can write and test the ETL code in the same environment, then schedule the pipeline to run the notebook at a specific time.
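For the weekly schedule, the pipeline would typically be attached to a schedule trigger. A sketch of such a trigger definition, assuming hypothetical resource and pipeline names:

```json
{
  "name": "WeeklyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Week",
        "interval": 1,
        "startTime": "2024-07-01T06:00:00Z",
        "timeZone": "UTC",
        "schedule": { "weekDays": ["Monday"] }
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "WeeklyExcelLoad",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

In Synapse Studio the same schedule can be configured through the trigger UI without editing JSON directly.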

Regards

Geetha
