How to Split a File by Keyword Boundaries

I have a vcf file that contains numerous vcards.

When importing the vcf file to outlook it seems to only import the first vcard.

Hence I want to split them up.

Given that a vcard starts with

BEGIN:VCARD

and ends with

END:VCARD

What is the best way to split each vcard into its own file?

Thanks

UPDATE

Thanks for all the responses. As with most questions of this nature, there are various ways to skin a cat. Here's the reasoning behind the one I chose.

ROUND-UP

Here's a roundup of what I liked from each answer and what drove me to select one of them.

csplit: I really liked the conciseness of this method. I just wish it were also able to set the file extension.
gawk: It did everything I asked of it.
parallel: Worked, but I had to install new things. (It also decided to make a new /bin dir in my home dir.)
perl: I liked that it created vcf files based on the contact's name, but the -o option didn't really work.

Conclusion

So the first one to go was perl, because it was a bit broken.
Next was parallel, because I had to install new things.
Next was csplit because, as far as I can see, it can't create extensions on the output files.
So the award goes to gawk, for being a utility that's readily available and versatile enough that I can chop and change the file name a bit. Bonus marks for cmp too 🙂
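On the csplit extension point: GNU csplit does have a -b (--suffix-format) option that takes a printf-style suffix, so an extension can be appended there. The following is only a sketch, assuming GNU coreutils csplit (-b, -z and the '{*}' repeat count are GNU extensions and may be missing elsewhere) and a made-up three-card sample.vcf.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical three-card sample file; real vCards carry more fields.
for name in Alice Bob Carol; do
    printf 'BEGIN:VCARD\nVERSION:3.0\nFN:%s\nEND:VCARD\n' "$name"
done > sample.vcf

# -s: quiet, -z: drop the empty piece before the first BEGIN:VCARD,
# -f: output prefix, -b: printf-style suffix (this is where .vcf comes from),
# '{*}': repeat the pattern until the input is exhausted.
csplit -s -z -f card_ -b '%02d.vcf' sample.vcf '/^BEGIN:VCARD/' '{*}'

ls card_*.vcf
```

With these options the numbering starts at card_00.vcf rather than card_01.vcf.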

    $ curl -O https://raw.githubusercontent.com/qtproject/qt-mobility/d7f10927176b8c3603efaaceb721b00af5e8605b/demos/qmlcontacts/contents/example.vcf
    $ gawk ' /BEGIN:VCARD/ { close(fn); ++a; fn=sprintf("card_%02d.vcf", a);
        print "Writing: ", fn } { print $0 > fn; } ' example.vcf
    Writing:  card_01.vcf
    Writing:  card_02.vcf
    Writing:  card_03.vcf
    Writing:  card_04.vcf
    Writing:  card_05.vcf
    Writing:  card_06.vcf
    Writing:  card_07.vcf
    Writing:  card_08.vcf
    Writing:  card_09.vcf
    $ cat card_0* > all.vcf
    $ cmp example.vcf all.vcf
    $ echo $?
    0

Details

The awk line works like this: a is a counter that is incremented on each BEGIN:VCARD line; at the same time, the output filename is constructed with sprintf and stored in fn. Every input line ($0) is then written to the current file (named fn).

The final echo $? prints 0, which means that cmp succeeded, i.e. the concatenation of all the single files is identical to the original example.vcf.
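For trying this without a network connection, here is a commented multi-line variant of the same idea. It is a sketch under a few assumptions: the three-card sample.vcf is made up, plain awk is assumed to behave like gawk for these constructs, and, unlike the one-liner, the pattern is anchored with ^ and lines before the first BEGIN:VCARD are skipped.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical three-card sample; real vCards carry more fields.
for name in Alice Bob Carol; do
    printf 'BEGIN:VCARD\nVERSION:3.0\nFN:%s\nEND:VCARD\n' "$name"
done > sample.vcf

awk '
    /^BEGIN:VCARD/ {
        if (fn != "") close(fn)          # release the previous card file
        fn = sprintf("card_%02d.vcf", ++a)
    }
    fn != "" { print > fn }              # write the line to the current card
' sample.vcf

# Same round-trip check as above: the pieces must add up to the input.
cat card_*.vcf > all.vcf
cmp sample.vcf all.vcf && echo "round trip OK"
```

Changing the sprintf format string is the place to adjust the file naming.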

Note that output redirection in awk works differently than in the shell. With > fn, awk first checks whether the file is already open. If it is, awk appends to it. If it is not, awk opens and truncates it.
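The difference is easy to see side by side. A small sketch, using throwaway file names (shell.txt and awk.txt are just placeholders) in a scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shell: each command reopens and truncates the file, so only "two" survives.
echo one > shell.txt
echo two > shell.txt

# awk: the file is truncated once, on first use, and then stays open,
# so both lines end up in it.
awk 'BEGIN { print "one" > "awk.txt"; print "two" > "awk.txt" }'

wc -l < shell.txt   # 1 line
wc -l < awk.txt     # 2 lines
```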

Because of this redirection logic, we have to explicitly close the implicitly opened files; otherwise the call would hit the open-file limit when the input file contains many records.
