The commands below are a quick-and-dirty approach; they work for percentages above 50% (if you only want to split into two files).
1) split 70% based on lines
split -l $(( $(wc -l < filename) * 70 / 100 )) filename
2) split 70% based on bytes
split -b $(( $(wc -c < filename) * 70 / 100 )) filename
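As a quick sanity check, here is a minimal run of the line-based variant (sample.txt and the part_ prefix are invented for the demo):

```shell
# Demo: split a 10-line file 70/30 by lines.
# 70% of 10 lines is 7, so split writes a 7-line part_aa
# and a 3-line part_ab (aa/ab are split's default suffixes).
seq 10 > sample.txt
split -l $(( $(wc -l < sample.txt) * 70 / 100 )) sample.txt part_
wc -l part_aa part_ab
```

The same arithmetic works for the byte-based variant with wc -c and split -b.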
And here's a nice, simple gawk one-liner:
$ gawk '/^\[/{ match($0, /^\[ (.+) \]/, k) } { print > (k[1] ".txt") }' entry.txt
This will work for any file size, irrespective of the number of lines in each entry, as long as each entry header looks like [ blahblah blah blah ]. Notice the space just after the opening [ and just before the closing ].
EXPLANATION:
awk and gawk read an input file line by line. As each line is read, its contents are saved in the $0 variable. Here, we are telling gawk to match anything within square brackets and save its match into the array k.
So, every time that regular expression is matched, that is, for every header in your file, k[1] will have the matched region of the line. Namely, "entry1", "entry2" or "entry3" or "entryN".
Finally, we print each line into a file called <whatever value k[1] currently has>.txt, i.e. entry1.txt, entry2.txt ... entryN.txt.
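The match()-into-an-array form is gawk-only; if gawk is not available, the same idea can be sketched in portable awk by stripping the brackets with sub() instead (the entry.txt contents below are an invented example):

```shell
# Portable-awk sketch of the entry splitter:
# on a header line, extract the name between "[ " and " ]";
# every line (header included) is then written to <name>.txt.
cat > entry.txt <<'EOF'
[ entry1 ]
first entry body
[ entry2 ]
second entry body
EOF

awk '/^\[ .* \]/ { name = $0; sub(/^\[ /, "", name); sub(/ \].*/, "", name) }
     { print > (name ".txt") }' entry.txt
```

Note the parentheses around (name ".txt"): POSIX awk requires them when the redirection target is a concatenation.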
This method will be much faster than perl for larger files.
Best Answer

Simple awk command: RS defines " as the record separator and NR is the record number. If the record number is divisible by 2 (because the leading " produces an extra, empty first record, so the quoted contents end up in the even-numbered records), then print the current record $0 into a numbered File #.
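The exact command is not shown above, so here is a hedged sketch of the described approach (quoted.txt and the File# names are assumptions for the demo): setting RS to the double quote makes awk split the input at every ", so the text inside each quoted pair lands in an even-numbered record, which is then written to its own file.

```shell
# Sketch: split each double-quoted string into its own file.
cat > quoted.txt <<'EOF'
"first record" junk "second record" more junk "third record"
EOF

# RS="\"" splits on every quote; record 1 is the (empty) text before
# the first quote, so quoted contents are records 2, 4, 6, ...
# NR/2 numbers the output files File1, File2, File3.
awk 'BEGIN { RS = "\"" } NR % 2 == 0 { printf "%s", $0 > ("File" NR/2) }' quoted.txt
```

printf is used instead of print so no trailing newline is appended to each file.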