This time we’re going bigger than ever. Fabric, Power BI, SQL, AI and more. We're covering it all. You won't want to miss it.
Learn moreGet Fabric Certified for FREE during AI Skills Fest. This week only. Secure your voucher now.
Hello Fabric Community,
I am using Microsoft Fabric Data Pipelines – Copy Job activity to load data from SQL Server into a Lakehouse.
I would like to understand Fabric’s behaviour regarding backend batching / chunking during data extraction from SQL Server.
When copying data from SQL Server using a Copy Job activity, does Fabric automatically split large result sets into internal batches (for example, by primary key or row ranges) to improve performance?
If a SQL query returns a very large number of rows (millions or more), does Fabric:
From a performance and scalability perspective:
Are there any official best‑practice recommendations from the Fabric team regarding:
Any clarification or official documentation references would be greatly appreciated.
Thanks in advance!
Solved! Go to Solution.
Hi @pallaravikiran9,
Hope you're doing well!
This is my recommandationfor your use case :
Source: SQL Server (billions of rows)
Target: Fabric Lakehouse
- Use Dynamic Range Partitioning on a numeric ID column
- Set explicit partition count (start at 8, tune from there)
- Set DIUs to 16–32 for large tables
- Consider staging to Parquet in Lakehouse, then processing with Spark
- Don't rely on default single-stream for billion-row tables
- Add retry policies on the Copy Activity for resilience
Here's the docs :
1. https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance
2. https://learn.microsoft.com/en-us/azure/data-factory/connector-sql-server
3. https://learn.microsoft.com/en-us/fabric/data-factory/copy-job-overview
Hope this helps. Feel free to ask me questions if needed, and don’t forget to Accept as Solution if this guidance worked for you. That's motivate me to keep helping.
Best regards,
Oussama (Data Consultant & Fabric's Expert)
Did my response help you? Clicking Kudos is a small gesture that goes a long way, it encourages contributors and helps the community thrive!
✅ Did I answer your question? Please mark my post as a Solution, it helps others find the answer faster.
Senior Data & BI Consultant · Microsoft Fabric & Power BI Specialist
Hey @pallaravikiran9,
Appreciate the question structure; very helpful.
By default, Fabric doesn't auto-batch large queries, intead a single thread is used. For 1.5 billion records, default settings take around 2 hours to run. So, for larger datasets we should explicitly enable parallelism.
To enable parallel reads, in the Copy Activity, go to Source > Advanced > Partition option, choose:
Physical partitions give the most balanced result across duration, capacity units, and source compute overhead.
Give this Microsoft doc a read, it answers all of your queries in brief along with real test results by developers: https://learn.microsoft.com/en-us/fabric/data-factory/copy-performance-sql-databases
There is one more option "Enable Staging", but it is relevant for Fabric Warehouses. For your SQL Server to Lakehouse setup, this option is not required.
Hope this helps!
Best,
Harshit
hi @pallaravikiran9 great question, let me try to answer it in "chunks":
From a performance and scalability perspective, Is it recommended / required to explicitly design batching at the pipeline level (for example using parameterised queries with ForEach loops)? Or does Fabric already handle this optimisation internally for SQL Server sources?
RA: If your goal is high throughput for one large table (e.g. table size > ~50GB or rows > ~100M), I would not consider batching at pipeline level (just enable partitioning + parallel copy).
Pipeline-level batching (ForEach, parameterized WHERE clauses) is an additional layer if you need more control or want to orchestrate multiple copy operations concurrently.
https://learn.microsoft.com/en-us/fabric/data-factory/copy-activity-performance-and-scalability-guid...
Are there any official best‑practice recommendations from the Fabric team regarding Batch size, ID‑based vs Date‑based batching and Concurrency settings when using Copy Job with SQL Server?
RA: For batch size, I don't think there is a “read batch size” concept for SQL Server source. Batch size is only relevant for writes (destination) and this is managed automatically by Fabric.
Regarding ID-based vs Date-based batching, prioritise dynamic partitioning with integer columns (int, bigint, etc.), then date / datetime columns (if integer columns don't exist). Please use a well-distributed column (avoid skew).
For concurrency settings, there are 3 levers to play with:
- Degree of copy parallelism = number of parallel threads / SQL queries
- Partitioning = multiple parallel reads from SQL Server
- Pipeline-level batching = multiple Copy activities in parallel (ForEach)
https://learn.microsoft.com/en-us/fabric/data-factory/connector-sql-database-copy-activity
Appreciate if you can 'Kudos' and/or 'Accept as Solution' if this answered your query.
Hi @pallaravikiran9,
Thank you for reaching out to the Microsoft Fabric Community Forum. Also, thanks to @rizalard0684, @stoic-harsh, @oussamahaimoud, for those inputs on this thread.
Has your issue been resolved? If the response provided by the community member @rizalard0684, @stoic-harsh, @oussamahaimoud, addressed your query, could you please confirm? It helps us ensure that the solutions provided are effective and beneficial for everyone.
Hope this helps clarify things and let me know what you find after giving these steps a try happy to help you investigate this further.
Thank you for using the Microsoft Community Forum.
Hi @pallaravikiran9,
Just checking in to see if the issue has been resolved on your end. If the earlier suggestions helped, that’s great to hear! And if you’re still facing challenges, feel free to share more details happy to assist further.
Thank you.
hi @pallaravikiran9 great question, let me try to answer it in "chunks":
From a performance and scalability perspective, Is it recommended / required to explicitly design batching at the pipeline level (for example using parameterised queries with ForEach loops)? Or does Fabric already handle this optimisation internally for SQL Server sources?
RA: If your goal is high throughput for one large table (e.g. table size > ~50GB or rows > ~100M), I would not consider batching at pipeline level (just enable partitioning + parallel copy).
Pipeline-level batching (ForEach, parameterized WHERE clauses) is an additional layer if you need more control or want to orchestrate multiple copy operations concurrently.
https://learn.microsoft.com/en-us/fabric/data-factory/copy-activity-performance-and-scalability-guid...
Are there any official best‑practice recommendations from the Fabric team regarding Batch size, ID‑based vs Date‑based batching and Concurrency settings when using Copy Job with SQL Server?
RA: For batch size, I don't think there is a “read batch size” concept for SQL Server source. Batch size is only relevant for writes (destination) and this is managed automatically by Fabric.
Regarding ID-based vs Date-based batching, prioritise dynamic partitioning with integer columns (int, bigint, etc.), then date / datetime columns (if integer columns don't exist). Please use a well-distributed column (avoid skew).
For concurrency settings, there are 3 levers to play with:
- Degree of copy parallelism = number of parallel threads / SQL queries
- Partitioning = multiple parallel reads from SQL Server
- Pipeline-level batching = multiple Copy activities in parallel (ForEach)
https://learn.microsoft.com/en-us/fabric/data-factory/connector-sql-database-copy-activity
Appreciate if you can 'Kudos' and/or 'Accept as Solution' if this answered your query.
Hey @pallaravikiran9,
Appreciate the question structure; very helpful.
By default, Fabric doesn't auto-batch large queries, intead a single thread is used. For 1.5 billion records, default settings take around 2 hours to run. So, for larger datasets we should explicitly enable parallelism.
To enable parallel reads, in the Copy Activity, go to Source > Advanced > Partition option, choose:
Physical partitions give the most balanced result across duration, capacity units, and source compute overhead.
Give this Microsoft doc a read, it answers all of your queries in brief along with real test results by developers: https://learn.microsoft.com/en-us/fabric/data-factory/copy-performance-sql-databases
There is one more option "Enable Staging", but it is relevant for Fabric Warehouses. For your SQL Server to Lakehouse setup, this option is not required.
Hope this helps!
Best,
Harshit
Hi @pallaravikiran9,
Hope you're doing well!
This is my recommandationfor your use case :
Source: SQL Server (billions of rows)
Target: Fabric Lakehouse
- Use Dynamic Range Partitioning on a numeric ID column
- Set explicit partition count (start at 8, tune from there)
- Set DIUs to 16–32 for large tables
- Consider staging to Parquet in Lakehouse, then processing with Spark
- Don't rely on default single-stream for billion-row tables
- Add retry policies on the Copy Activity for resilience
Here's the docs :
1. https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance
2. https://learn.microsoft.com/en-us/azure/data-factory/connector-sql-server
3. https://learn.microsoft.com/en-us/fabric/data-factory/copy-job-overview
Hope this helps. Feel free to ask me questions if needed, and don’t forget to Accept as Solution if this guidance worked for you. That's motivate me to keep helping.
Best regards,
Oussama (Data Consultant & Fabric's Expert)
Did my response help you? Clicking Kudos is a small gesture that goes a long way, it encourages contributors and helps the community thrive!
✅ Did I answer your question? Please mark my post as a Solution, it helps others find the answer faster.
Senior Data & BI Consultant · Microsoft Fabric & Power BI Specialist
Check out the June 2026 Fabric update to learn about new features.
Sign up to receive a private message when registration opens and key events begin.
| User | Count |
|---|---|
| 28 | |
| 23 | |
| 17 | |
| 15 | |
| 13 |