
Splitting of files for ingestion

I would like to suggest adding a file-splitting capability for ingestion, perhaps for a particular set of formats. This would help users who need to load a csv/scsv/txt file that is larger than the current 4GB limit. Consider adding this option for the Azure Storage source in the new Get Data interface of KQL Database.


Here's an example of using the split command in WSL on a 13GB text file that is semicolon-delimited and contains no multi-line values.

split -l 76923077 measurements.txt measurements_split_

I divided 1 billion rows by 13 files on a calculator to get the line count (about 76,923,076.9) and rounded up to the next whole number, which gives 13 files of ~1GB each.

This is the run time of the command.

real 3m9.343s
user 0m14.828s
sys 2m15.161s


Here's another way to do the same with split, computing the line count in the shell instead of by hand.

# Calculate the number of lines per file (ceiling division)
total_lines=$(wc -l < measurements.txt)
num_files=13
lines_per_file=$(( (total_lines + num_files - 1) / num_files ))

# Split the file on line boundaries
split --lines="${lines_per_file}" measurements.txt measurements_split_
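
Before uploading the pieces, it may be worth checking that nothing was lost and that every piece is under the ingestion limit. The following is only a sketch: verify_split is a hypothetical helper name, not part of split or of KQL Database, and it assumes the pieces share a common prefix and that the limit is expressed in bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of split): confirm that the pieces together
# contain every line of the original and that each piece is below a byte limit.
verify_split() {
  local original=$1 prefix=$2 limit_bytes=$3
  local original_lines pieces_lines piece size

  # $(( ... )) normalizes any padding that wc may add on some systems.
  original_lines=$(( $(wc -l < "$original") ))
  # Count the lines of all pieces that match the prefix.
  pieces_lines=$(( $(cat "${prefix}"* | wc -l) ))

  if [ "$original_lines" -ne "$pieces_lines" ]; then
    echo "line count mismatch: $original_lines vs $pieces_lines" >&2
    return 1
  fi

  for piece in "${prefix}"*; do
    size=$(( $(wc -c < "$piece") ))
    if [ "$size" -ge "$limit_bytes" ]; then
      echo "$piece is $size bytes, at or over the limit" >&2
      return 1
    fi
  done

  echo "ok: $original_lines lines across all pieces"
}
```

For the file above, the call would look like verify_split measurements.txt measurements_split_ $((4 * 1024 * 1024 * 1024)), assuming the 4GB limit means 4 GiB; adjust the byte value if the service counts differently.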

Status: Under Review