The commands below are a quick-and-dirty approach; they work for percentages above 50% (if you only want to split into two files).
1) split 70% based on lines
split -l $(( $(wc -l < filename) * 70 / 100 )) filename
2) split 70% based on bytes
split -b $(( $(wc -c < filename) * 70 / 100 )) filename
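As a quick sanity check, here is a minimal run of the line-based variant (sample.txt and the part_ prefix are invented for the demo):

```shell
# Demo: split a 10-line file 70/30 by lines.
# 70% of 10 lines is 7, so split writes a 7-line part_aa
# and a 3-line part_ab (aa/ab are split's default suffixes).
seq 10 > sample.txt
split -l $(( $(wc -l < sample.txt) * 70 / 100 )) sample.txt part_
wc -l part_aa part_ab
```

The same arithmetic works for the byte-based variant with wc -c and split -b.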
And here's a nice, simple gawk one-liner:
$ gawk '/^\[/{ match($0, /^\[ (.+) \]/, k) } { print > (k[1] ".txt") }' entry.txt
This will work for any file size, irrespective of the number of lines in each entry, as long as each entry header looks like [ blahblah blah blah ]. Notice the space just after the opening [ and just before the closing ].
EXPLANATION:
awk and gawk read an input file line by line. As each line is read, its contents are saved in the $0 variable. Here, we are telling gawk to match anything within square brackets and save its match into the array k.
So, every time that regular expression is matched, that is, for every header in your file, k[1] will have the matched region of the line. Namely, "entry1", "entry2" or "entry3" or "entryN".
Finally, we print each line into a file called <whatever value k[1] currently has>.txt, i.e. entry1.txt, entry2.txt ... entryN.txt.
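The match()-into-an-array form is gawk-only; if gawk is not available, the same idea can be sketched in portable awk by stripping the brackets with sub() instead (the entry.txt contents below are an invented example):

```shell
# Portable-awk sketch of the entry splitter:
# on a header line, extract the name between "[ " and " ]";
# every line (header included) is then written to <name>.txt.
cat > entry.txt <<'EOF'
[ entry1 ]
first entry body
[ entry2 ]
second entry body
EOF

awk '/^\[ .* \]/ { name = $0; sub(/^\[ /, "", name); sub(/ \].*/, "", name) }
     { print > (name ".txt") }' entry.txt
```

Note the parentheses around (name ".txt"): POSIX awk requires them when the redirection target is a concatenation.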
This method will be much faster than perl for larger files.
Best Answer

Simple awk command: RS defines " as the record separator and NR is the record number. If the record number is divisible by 2 (because the leading " produces an extra, empty first record, so the quoted contents end up in the even-numbered records), then print the current record $0 into a numbered File #.
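The exact command is not shown above, so here is a hedged sketch of the described approach (quoted.txt and the File# names are assumptions for the demo): setting RS to the double quote makes awk split the input at every ", so the text inside each quoted pair lands in an even-numbered record, which is then written to its own file.

```shell
# Sketch: split each double-quoted string into its own file.
cat > quoted.txt <<'EOF'
"first record" junk "second record" more junk "third record"
EOF

# RS="\"" splits on every quote; record 1 is the (empty) text before
# the first quote, so quoted contents are records 2, 4, 6, ...
# NR/2 numbers the output files File1, File2, File3.
awk 'BEGIN { RS = "\"" } NR % 2 == 0 { printf "%s", $0 > ("File" NR/2) }' quoted.txt
```

printf is used instead of print so no trailing newline is appended to each file.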