Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during AI Skills Fest. This week only. Secure your voucher now.

Reply
pallaravikiran9
New Member

Does Microsoft Fabric Apply Automatic Backend Batching When Copying Data from SQL Server?

Hello Fabric Community,

I am using Microsoft Fabric Data Pipelines – Copy Job activity to load data from SQL Server into a Lakehouse.

I would like to understand Fabric’s behaviour regarding backend batching / chunking during data extraction from SQL Server.

My Questions:

  1. When copying data from SQL Server using a Copy Job activity, does Fabric automatically split large result sets into internal batches (for example, by primary key or row ranges) to improve performance?

  2. If a SQL query returns a very large number of rows (millions or more), does Fabric:

    • Execute the query once and stream all rows, or
    • Internally generate multiple SQL executions to optimise throughput?
  3. From a performance and scalability perspective:

    • Is it recommended / required to explicitly design batching at the pipeline level (for example using parameterised queries with ForEach loops)?
    • Or does Fabric already handle this optimisation internally for SQL Server sources?
  4. Are there any official best‑practice recommendations from the Fabric team regarding:

    • Batch size
    • ID‑based vs Date‑based batching
    • Concurrency settings when using Copy Job with SQL Server?

Context:

  • Source: SQL Server (row‑based tables)
  • Target: Fabric Lakehouse / OneLake
  • Activity: Copy Job
  • Volume: Large tables (millions to billions of rows)

Any clarification or official documentation references would be greatly appreciated.

Thanks in advance!

3 ACCEPTED SOLUTIONS
oussamahaimoud
Solution Sage
Solution Sage

Hi @pallaravikiran9,

Hope you're doing well!

 

This is my recommandationfor your use case :

 

Source: SQL Server (billions of rows)
Target: Fabric Lakehouse

Use Dynamic Range Partitioning on a numeric ID column
- Set explicit partition count (start at 8, tune from there)
- Set DIUs to 16–32 for large tables
- Consider staging to Parquet in Lakehouse, then processing with Spark
- Don't rely on default single-stream for billion-row tables
- Add retry policies on the Copy Activity for resilience

 

Here's the docs : 

1. https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance

2. https://learn.microsoft.com/en-us/azure/data-factory/connector-sql-server

3. https://learn.microsoft.com/en-us/fabric/data-factory/copy-job-overview

 

Hope this helps. Feel free to ask me questions if needed, and don’t forget to Accept as Solution if this guidance worked for you. That's motivate me to keep helping.

 

Best regards,

Oussama (Data Consultant & Fabric's Expert)


  Did my response help you? Clicking Kudos is a small gesture that goes a long way, it encourages contributors and helps the community thrive!


Did I answer your question? Please mark my post as a Solution, it helps others find the answer faster.


Senior Data & BI Consultant · Microsoft Fabric & Power BI Specialist


Connect with me on LinkedIn

View solution in original post

stoic-harsh
Solution Supplier
Solution Supplier

Hey @pallaravikiran9,

Appreciate the question structure; very helpful.

 

By default, Fabric doesn't auto-batch large queries, intead a single thread is used. For 1.5 billion records, default settings take around 2 hours to run. So, for larger datasets we should explicitly enable parallelism.

 

To enable parallel reads, in the Copy Activity, go to Source > Advanced > Partition option, choose:

  • Physical partitions of table. If your SQL Server table already has partitions defined. Zero config, Fabric auto-detects them. Best choice when available.
  • Dynamic range. If no physical partitions exist. Fabric splits by a numeric/date column. Fabric auto-detects them, if you don't specify.
  • Then set Degree of Copy Parallelism in the Settings tab. Start with 4–8 and tune from there.

Physical partitions give the most balanced result across duration, capacity units, and source compute overhead.

 

Give this Microsoft doc a read, it answers all of your queries in brief along with real test results by developers: https://learn.microsoft.com/en-us/fabric/data-factory/copy-performance-sql-databases

 

There is one more option "Enable Staging", but it is relevant for Fabric Warehouses. For your SQL Server to Lakehouse setup, this option is not required.

Hope this helps!

Best,

Harshit

View solution in original post

rizalard0684
Resolver II
Resolver II

hi @pallaravikiran9 great question, let me try to answer it in "chunks":

 

  1. When copying data from SQL Server using a Copy Job activity, does Fabric automatically split large result sets into internal batches (for example, by primary key or row ranges) to improve performance?
    RA: Fabric has the capability (for SQL Server connector) to chunk a large SQL Server query into multiple SQL statements. This is supported by "Partition option" in the built-in configuration of SQL Server connector. This is not done automatically.
    https://learn.microsoft.com/en-us/fabric/data-factory/connector-sql-server-copy-activity

     

  2. If a SQL query returns a very large number of rows (millions or more), does Fabric execute the query once and stream all rows, or internally generate multiple SQL executions to optimise throughput?
    RA: Depends if 
    partitioned copy is enabled or not, assuming it is enabled, you can set "Degree of copy parallelism" as well which allows for multiple SQL executions.
    https://learn.microsoft.com/en-us/fabric/data-factory/connector-sql-server-copy-activity

  3. From a performance and scalability perspective, Is it recommended / required to explicitly design batching at the pipeline level (for example using parameterised queries with ForEach loops)? Or does Fabric already handle this optimisation internally for SQL Server sources?
    RA: If your goal is high throughput for one large table (e.g. table size > ~50GB or rows > ~100M), I would not consider batching at pipeline level (just enable partitioning + parallel copy). 
    Pipeline-level batching (ForEach, parameterized WHERE clauses) is an additional layer if you need more control or want to orchestrate multiple copy operations concurrently. 
    https://learn.microsoft.com/en-us/fabric/data-factory/copy-activity-performance-and-scalability-guid...

  4. Are there any official best‑practice recommendations from the Fabric team regarding Batch size, ID‑based vs Date‑based batching and Concurrency settings when using Copy Job with SQL Server?
    RA: For batch size, I don't think there is a “read batch size” concept for SQL Server source. Batch size is only relevant for writes (destination) and this is managed automatically by Fabric.
    Regarding ID-based vs Date-based batching, prioritise d
    ynamic partitioning with integer columns (int, bigint, etc.), then date / datetime columns (if integer columns don't exist). Please use a well-distributed column (avoid skew).
    For concurrency settings, there are 3 levers to play with:
    - Degree of copy parallelism = number of parallel threads / SQL queries
    - Partitioning = multiple parallel reads from SQL Server

    - Pipeline-level batching = multiple Copy activities in parallel (ForEach)
    https://learn.microsoft.com/en-us/fabric/data-factory/connector-sql-database-copy-activity

Appreciate if you can 'Kudos' and/or 'Accept as Solution' if this answered your query.

View solution in original post

5 REPLIES 5
v-kpoloju-msft
Community Support
Community Support

Hi @pallaravikiran9

Thank you for reaching out to the Microsoft Fabric Community Forum. Also, thanks to @rizalard0684@stoic-harsh@oussamahaimoud, for those inputs on this thread.

Has your issue been resolved? If the response provided by the community member @rizalard0684@stoic-harsh@oussamahaimoud, addressed your query, could you please confirm? It helps us ensure that the solutions provided are effective and beneficial for everyone.

Hope this helps clarify things and let me know what you find after giving these steps a try happy to help you investigate this further.

Thank you for using the Microsoft Community Forum.

Hi @pallaravikiran9

Just checking in to see if the issue has been resolved on your end. If the earlier suggestions helped, that’s great to hear! And if you’re still facing challenges, feel free to share more details happy to assist further.

Thank you.

rizalard0684
Resolver II
Resolver II

hi @pallaravikiran9 great question, let me try to answer it in "chunks":

 

  1. When copying data from SQL Server using a Copy Job activity, does Fabric automatically split large result sets into internal batches (for example, by primary key or row ranges) to improve performance?
    RA: Fabric has the capability (for SQL Server connector) to chunk a large SQL Server query into multiple SQL statements. This is supported by "Partition option" in the built-in configuration of SQL Server connector. This is not done automatically.
    https://learn.microsoft.com/en-us/fabric/data-factory/connector-sql-server-copy-activity

     

  2. If a SQL query returns a very large number of rows (millions or more), does Fabric execute the query once and stream all rows, or internally generate multiple SQL executions to optimise throughput?
    RA: Depends if 
    partitioned copy is enabled or not, assuming it is enabled, you can set "Degree of copy parallelism" as well which allows for multiple SQL executions.
    https://learn.microsoft.com/en-us/fabric/data-factory/connector-sql-server-copy-activity

  3. From a performance and scalability perspective, Is it recommended / required to explicitly design batching at the pipeline level (for example using parameterised queries with ForEach loops)? Or does Fabric already handle this optimisation internally for SQL Server sources?
    RA: If your goal is high throughput for one large table (e.g. table size > ~50GB or rows > ~100M), I would not consider batching at pipeline level (just enable partitioning + parallel copy). 
    Pipeline-level batching (ForEach, parameterized WHERE clauses) is an additional layer if you need more control or want to orchestrate multiple copy operations concurrently. 
    https://learn.microsoft.com/en-us/fabric/data-factory/copy-activity-performance-and-scalability-guid...

  4. Are there any official best‑practice recommendations from the Fabric team regarding Batch size, ID‑based vs Date‑based batching and Concurrency settings when using Copy Job with SQL Server?
    RA: For batch size, I don't think there is a “read batch size” concept for SQL Server source. Batch size is only relevant for writes (destination) and this is managed automatically by Fabric.
    Regarding ID-based vs Date-based batching, prioritise d
    ynamic partitioning with integer columns (int, bigint, etc.), then date / datetime columns (if integer columns don't exist). Please use a well-distributed column (avoid skew).
    For concurrency settings, there are 3 levers to play with:
    - Degree of copy parallelism = number of parallel threads / SQL queries
    - Partitioning = multiple parallel reads from SQL Server

    - Pipeline-level batching = multiple Copy activities in parallel (ForEach)
    https://learn.microsoft.com/en-us/fabric/data-factory/connector-sql-database-copy-activity

Appreciate if you can 'Kudos' and/or 'Accept as Solution' if this answered your query.

stoic-harsh
Solution Supplier
Solution Supplier

Hey @pallaravikiran9,

Appreciate the question structure; very helpful.

 

By default, Fabric doesn't auto-batch large queries, intead a single thread is used. For 1.5 billion records, default settings take around 2 hours to run. So, for larger datasets we should explicitly enable parallelism.

 

To enable parallel reads, in the Copy Activity, go to Source > Advanced > Partition option, choose:

  • Physical partitions of table. If your SQL Server table already has partitions defined. Zero config, Fabric auto-detects them. Best choice when available.
  • Dynamic range. If no physical partitions exist. Fabric splits by a numeric/date column. Fabric auto-detects them, if you don't specify.
  • Then set Degree of Copy Parallelism in the Settings tab. Start with 4–8 and tune from there.

Physical partitions give the most balanced result across duration, capacity units, and source compute overhead.

 

Give this Microsoft doc a read, it answers all of your queries in brief along with real test results by developers: https://learn.microsoft.com/en-us/fabric/data-factory/copy-performance-sql-databases

 

There is one more option "Enable Staging", but it is relevant for Fabric Warehouses. For your SQL Server to Lakehouse setup, this option is not required.

Hope this helps!

Best,

Harshit

oussamahaimoud
Solution Sage
Solution Sage

Hi @pallaravikiran9,

Hope you're doing well!

 

This is my recommandationfor your use case :

 

Source: SQL Server (billions of rows)
Target: Fabric Lakehouse

Use Dynamic Range Partitioning on a numeric ID column
- Set explicit partition count (start at 8, tune from there)
- Set DIUs to 16–32 for large tables
- Consider staging to Parquet in Lakehouse, then processing with Spark
- Don't rely on default single-stream for billion-row tables
- Add retry policies on the Copy Activity for resilience

 

Here's the docs : 

1. https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance

2. https://learn.microsoft.com/en-us/azure/data-factory/connector-sql-server

3. https://learn.microsoft.com/en-us/fabric/data-factory/copy-job-overview

 

Hope this helps. Feel free to ask me questions if needed, and don’t forget to Accept as Solution if this guidance worked for you. That's motivate me to keep helping.

 

Best regards,

Oussama (Data Consultant & Fabric's Expert)


  Did my response help you? Clicking Kudos is a small gesture that goes a long way, it encourages contributors and helps the community thrive!


Did I answer your question? Please mark my post as a Solution, it helps others find the answer faster.


Senior Data & BI Consultant · Microsoft Fabric & Power BI Specialist


Connect with me on LinkedIn

Helpful resources

Announcements
June Fabric Update Carousel

Fabric Monthly Update - June 2026

Check out the June 2026 Fabric update to learn about new features.

Fabric SQL PBI Data Days

Data Days 2026 coming soon!

Sign up to receive a private message when registration opens and key events begin.

New to Fabric survey Carousel

New to Fabric Survey

If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.