I would like to suggest adding a file-splitting capability for ingestion, perhaps for a particular set of formats. This would help users who need to load a csv/scsv/txt file that is larger than the current 4 GB limit. Consider adding this option for the Azure Storage source in the new Get Data interface of KQL Database.
Here's an example of using the split command in WSL on a 13 GB semicolon-delimited text file that doesn't contain multi-line values.
split -l 76923077 measurements.txt measurements_split_
I divided 1 billion rows by 13 with a calculator to get the line count per file, then rounded up to the next whole number (76,923,077) so that I get exactly 13 files of ~1 GB each.
This is the run time of the command.
real 3m9.343s
user 0m14.828s
sys 2m15.161s
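Here's another way to accomplish the same thing with split, this time computing the line count in the script instead of by hand.
# Calculate the number of lines per file, rounding up (ceiling division)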
total_lines=$(wc -l < measurements.txt)
num_files=13
((lines_per_file = (total_lines + num_files - 1) / num_files))
# Split the file, maintaining lines
split --lines=${lines_per_file} measurements.txt measurements_split_
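As a possible shortcut, GNU split can do this chunking without counting lines first. The sketch below assumes GNU coreutils (as shipped in most WSL distributions) and shows two options: `-n l/N`, which produces N pieces without breaking any line, and `-C SIZE`, which caps each piece at SIZE bytes of complete lines. The sample file, the `by_count_` and `by_size_` prefixes, and the 1024-byte cap are hypothetical stand-ins so the example runs anywhere; for the real file you would point `input` at measurements.txt and use a cap such as 1G.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical small stand-in for measurements.txt (1000 semicolon-delimited rows).
workdir=$(mktemp -d)
input="$workdir/sample.txt"
seq 1 1000 | sed 's/$/;42.0/' > "$input"

# Option 1: ask for a fixed number of pieces; "l/N" never splits a line.
num_files=13
split -n "l/${num_files}" "$input" "$workdir/by_count_"

# Option 2: cap the size of each piece; -C keeps lines whole.
# Assumption: 1024 bytes suits the tiny sample; use e.g. 1G for real data.
max_bytes=1024
split -C "$max_bytes" "$input" "$workdir/by_size_"

# Sanity checks: number of pieces, largest piece, and total lines preserved.
pieces=$(ls "$workdir"/by_count_* | wc -l)
largest=0
for piece in "$workdir"/by_size_*; do
  size=$(wc -c < "$piece")
  if [ "$size" -gt "$largest" ]; then largest=$size; fi
done
lines_by_count=$(cat "$workdir"/by_count_* | wc -l)
lines_by_size=$(cat "$workdir"/by_size_* | wc -l)

echo "pieces=${pieces} largest=${largest} lines=${lines_by_count}/${lines_by_size}"
```

The size-based option maps most directly onto an ingestion limit, since it bounds each piece in bytes rather than relying on rows being roughly the same length.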