Today I Learned

A Hashrocket project

Split large file into multiple smaller files

Unix has a handy tool, split, for breaking a file into multiple pieces. This can be useful as a precursor to parallelized processing of a large file. Let's say you have gigabytes of log files you need to search through; splitting the files into smaller chunks is one way to approach the problem.

> seq 10 > large_file.txt
> split -l2 large_file.txt smaller_file_
> ls -1
large_file.txt
smaller_file_aa
smaller_file_ab
smaller_file_ac
smaller_file_ad
smaller_file_ae

First, I created a “large” file with ten lines. Then, I split that file into files with the prefix smaller_file_. The -l2 option tells split to start a new file every 2 lines, so 10 lines yield 5 files. The suffixes it adds (“aa”, “ab”, …) sort lexicographically, so we can reconstruct the original file with cat and a glob.
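split can also chunk by byte count instead of line count, which helps when lines are very long or the file is binary. A minimal sketch using the -b flag (the 10-byte chunk size is just for illustration):

```shell
# Make a small sample file (21 bytes), then split it into 10-byte pieces.
seq 10 > large_file.txt
split -b 10 large_file.txt chunk_
# 21 bytes at 10 bytes per chunk produces 3 files: chunk_aa, chunk_ab, chunk_ac
ls -1 chunk_*
```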

> cat smaller_file*
1
2
3
4
5
6
7
8
9
10
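Once the chunks exist, you can fan the search out across them. Here's a sketch assuming an xargs that supports -P for parallel processes (GNU and BSD both do); the pattern and process count are arbitrary:

```shell
# Recreate the chunks from the example above.
seq 10 > large_file.txt
split -l2 large_file.txt smaller_file_
# Search all chunks with up to 4 concurrent grep processes;
# -H prefixes each match with the file it came from.
printf '%s\n' smaller_file_* | xargs -P4 -I{} grep -H '7' {}
```

With two lines per chunk, the match for 7 lands in smaller_file_ad.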