Hi @htrivedi ,
Thanks for reaching out to Microsoft Fabric Community.
From the description, the error comes from Spark compute saturation. F2 is the smallest Fabric capacity, with very limited Spark concurrency and memory. While 275 MB and 600k rows are not large for bigger capacities, the UI-based "Load to Table" process launches a Spark job with schema inference and implicit type conversions. On F2 this can exceed the available resources and cause the spike you observed.
To answer your questions,
a) What is the recommended method to load similar large files of size (275 MB CSV, 600k rows)?
The more reliable method is to copy the CSV into the Files section of your Lakehouse and then use a Notebook or Dataflow Gen2 to write it into a Delta table. This provides more control over schema inference, partitioning and write behavior. For example:
# Read the CSV from the Lakehouse Files section (first row as header)
df = spark.read.option("header", True).csv("Files/file.csv")
# Write it out as a Delta table
df.write.format("delta").mode("overwrite").saveAsTable("TableName")
This avoids some of the overhead that comes with the UI process.
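If schema inference itself is part of the overhead, a variation worth trying is to declare the schema up front so Spark skips the inference pass. This is only a sketch; the column names and types below are placeholders, not your actual file:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType

# Placeholder schema - replace with the real column names and types from your CSV
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("amount", DoubleType(), True),
])

# With an explicit schema, Spark does not need a separate pass over the file to infer types
df = spark.read.option("header", True).schema(schema).csv("Files/file.csv")
df.write.format("delta").mode("overwrite").saveAsTable("TableName")
This keeps the load to a single read over the file, which tends to be lighter on a small capacity.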
b) Are such large loads supported by Fabric UI using F2 capacity?
In practice, UI loads on F2 work reliably with smaller files, as you noticed with your 1 MB file. At 275 MB, the Spark job created by the UI can hit compute limits and fail. So while it is technically supported, it is not reliable on F2 for files of this size. As suggested by @tayloramy, you can try copying the file into the Lakehouse in binary form first and then parsing it into a Delta table afterward, which may help.
c) Which changes to default configuration are needed to enable success for such large loads?
There are no Spark configuration settings exposed on F2. To improve reliability, you could split the file into smaller chunks before loading (a rough sketch follows below), or use the binary copy approach mentioned earlier. If this workload will be repeated, moving to a higher capacity such as F4 gives more Spark vCores and memory to handle the load. It is also worth checking whether other processes are consuming capacity at the same time; if the capacity is under sustained load, restarting it can sometimes help clear up resources.
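If you want to try the chunking route from a notebook, a minimal sketch could look like the following. The paths and the 100k-row chunk size are only assumptions to adjust for your environment:
import os
import pandas as pd

# Assumed Lakehouse mount paths - adjust to your workspace
source_path = "/lakehouse/default/Files/file.csv"
output_dir = "/lakehouse/default/Files/chunks"
os.makedirs(output_dir, exist_ok=True)

# Stream the CSV in 100k-row pieces and write each piece as its own smaller file
for i, chunk in enumerate(pd.read_csv(source_path, chunksize=100_000)):
    chunk.to_csv(f"{output_dir}/file_part_{i}.csv", index=False)
Each smaller file can then be loaded individually, either through the UI or with the notebook approach above.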
d) What are the typical file sizes (MB) and corresponding methods (UI, Notebooks etc) to be used for Fabric loads using F2 capacity?
As a general guideline, UI-based loads are best for small files. Files in the 100 MB to 1 GB range are more reliably handled through Notebooks or Dataflow Gen2, where you have more control over the process. Larger files, such as multi-GB CSVs or multi-million-row datasets, are best managed on higher capacities such as F4 or F8.
For further reference you may find these useful:
Solved: Should queries on lower F SKU's fail or just be sl... - Microsoft Fabric Community
Solved: Re: Guidance Needed for Importing Large Datasets i... - Microsoft Fabric Community
Concurrency limits and queueing in Apache Spark for Fabric - Microsoft Fabric | Microsoft Learn
Understand your Fabric capacity throttling - Microsoft Fabric | Microsoft Learn
Hope this helps. Please reach out for further assistance.
Thank you.
Hi @htrivedi,
That's really interesting. I often use smaller capacities to load data into a Lakehouse table.
I have CSV files of the same size myself, and they don't fail.
600,000 rows should not be a problem.
In my opinion, there must be another cause.
Perhaps you can share some more details with us.
For example: how exactly is the data being loaded?
And a second question: is anything else already running on the F2 beforehand?
Best regards
Hi @htrivedi
You can use two methods to load big data:
1. Fabric Embedded ("pay as you go"): while the cluster is running, increase the capacity size to F64, and then scale it back down afterwards (the underlying virtual machines are adjusted).
2. Fabric Capacity: for large CSV files you need at least an F64 capacity, and you have to pay for a one-year subscription.
Which method would you like to use?
Hi @htrivedi,
You can use the Fabric Capacity Estimator | Microsoft Fabric to estimate capacities.
Are you doing any transformations to the data while loading it? I've noticed that doing transformations in the copy job creates a massive spike in compute use.
Another thing to try is copying the CSV files as binary into the lakehouse files, and then parsing them into tables once they're in the lakehouse.
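If it helps to picture that second step, a minimal sketch of parsing the copied files from a notebook (the folder and table names are just placeholders) might be:
# Read every CSV the binary copy dropped into this folder (placeholder path)
df = spark.read.option("header", True).csv("Files/raw_csv/")
# Append the parsed rows into a Delta table
df.write.format("delta").mode("append").saveAsTable("TableName")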
If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.
Hello @tayloramy
Thank you for your response.
We are not performing any explicit transformations. I am sure there are implicit transformations going on (e.g. converting an ASCII number from the CSV file into a numeric datatype).
Notably, the smaller files (approx. 1 MB) were loaded successfully.
Hi @htrivedi,
Try putting your requirements into the capacity estimator and see what it spits out.
I'm also curious to see if copying the data as binary works any better.
I don't have much real-world experience with capacities that small; all of mine are F64, so I would love to learn what the limits are and whether the binary copy helps.
If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.