
Srisakthi
Resolver II

Best practice for loading data into Snowflake

Hello Everyone,

 

Can you please suggest best practices for loading data from SQL Server/Oracle to Snowflake using a Microsoft Fabric data pipeline?

Case 1: Multiple tables at a time, with a huge volume of data in each table

Case 2: Multiple tables at a time, with a small volume of data in each table

 

What is the recommended approach for both performance and capacity unit (CU) consumption?

 

Regards,

Srisakthi

1 ACCEPTED SOLUTION
v-shamiliv
Community Support

Hi @Srisakthi 


Thank you for reaching out to the Microsoft Fabric Community Forum.

  • For high-volume data loads: focus on parallel loading with bulk-loading techniques (COPY INTO), use larger virtual warehouses for performance, and batch the processing to minimize compute and storage costs (see the sketch after this list).
  • For low-volume data loads: reduce compute overhead with smaller virtual warehouses, sequential loads, and incremental loading (Streams and Tasks). Also consider Snowpipe for real-time or near-real-time loading to optimize cost and compute resource usage.
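
To make both bullets concrete, here is a minimal Snowflake SQL sketch: a COPY INTO bulk load for the high-volume case and a Snowpipe pipe for the low-volume case. All object names (my_parquet_format, my_stage, TARGET_TABLE, load_wh, my_pipe) and the storage URL are placeholders, and it assumes the Fabric pipeline first lands the extracted tables as Parquet files in the staged location:

    -- Placeholder file format and external stage; credentials are omitted and
    -- would come from a SAS token or storage integration in practice.
    CREATE FILE FORMAT IF NOT EXISTS my_parquet_format TYPE = PARQUET;

    CREATE STAGE IF NOT EXISTS my_stage
      URL = 'azure://<storage-account>.blob.core.windows.net/landing/'
      FILE_FORMAT = my_parquet_format;

    -- Size the warehouse up for the bulk load, then back down to control cost.
    ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'LARGE';

    -- COPY INTO parallelizes across files, so many moderate-sized Parquet
    -- files load faster than one very large file.
    COPY INTO TARGET_TABLE
      FROM @my_stage/orders/
      FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format')
      MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

    ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'XSMALL';

    -- Low-volume case: a pipe loads new files as they arrive, instead of
    -- keeping a warehouse running for small batches.
    CREATE PIPE IF NOT EXISTS my_pipe AUTO_INGEST = TRUE AS
      COPY INTO TARGET_TABLE
      FROM @my_stage/orders/
      FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format');

Snowflake's general guidance is to aim for staged files of roughly 100-250 MB compressed, so COPY INTO's per-file parallelism is actually exploited.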

    If this solution helps, please consider giving us Kudos and accepting it as the solution so that it may assist other members in the community.
    Thank you.

 

 


4 REPLIES
Ray_Minds
Responsive Resident

Hi @Srisakthi 

a) First, create a config table that stores the start and end of the date range for each table and indicates whether to run a full or an incremental load, based on your datasets (see the sketch after this list).

b) Land the data as Parquet with Snappy compression, which helps optimize performance; save the data as small chunk files.

c) If you are comfortable with PySpark, that is the better option; otherwise, use Copy activities in the pipelines.
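
To illustrate the config table from step a), here is one possible shape in Snowflake SQL. The table and column names are invented for the example; in practice a Fabric Lookup activity would read these rows and a ForEach activity would drive one copy per row:

    -- Hypothetical metadata table driving the pipeline; one row per source table.
    CREATE TABLE IF NOT EXISTS ETL_CONFIG (
        source_table     STRING,        -- table to extract from SQL Server/Oracle
        load_type        STRING,        -- 'FULL' or 'INCREMENTAL'
        watermark_column STRING,        -- column that bounds the incremental range
        range_start      TIMESTAMP_NTZ, -- start of the date range to load
        range_end        TIMESTAMP_NTZ  -- end of the date range to load
    );

    INSERT INTO ETL_CONFIG VALUES
        ('SALES.ORDERS',    'INCREMENTAL', 'MODIFIED_AT', '2025-01-01', '2025-02-01'),
        ('SALES.CUSTOMERS', 'FULL',        NULL,          NULL,         NULL);

After each successful run, the pipeline would advance range_start/range_end to the new watermark, so only new or changed rows are copied on the next run.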

 


If this post helps, please consider accepting it as the solution to help other members find it more quickly.

v-shamiliv
Community Support

Hi @Srisakthi 
We haven’t heard back since our last response and wanted to check if your query has been resolved. If not, please feel free to reach out for further assistance. If it has been resolved, kindly mark the helpful reply as the solution to make it easier for others to find. A kudos would also be greatly appreciated!
Thank you.

v-shamiliv
Community Support

Hi @Srisakthi 
Thank you for your patience. Please let us know if anything was helpful to you, so that we can convert it into a formal answer. If so, we would appreciate it if you could accept it as a solution and drop a 'Kudos' so other members can find it more easily.
Thank you.
