Text Processing with Perl – How to Replace All Semicolons After the First One

perl, text processing

This problem comes from my attempt to export questions and their answers from an Excel file into a .txt file that the Anki flashcard program can import, as described here.
I cannot have more than two fields, so I need to merge all the options into a single field.

The data is saved as CSV from LibreOffice, with a semicolon as the field separator (the only distinction the manual mentions), as instructed in the Anki manual:

Question ipsun; option 1 ; option 2 ; option 3 ; option 4 ; ... ; option n
Question ipsun; option 1 ; option 2 ; option 3 ; option 4 ; ... ; option n
...

where each entry, with all its options, is on one line, i.e. one "flashcard". Within a card, the front is the part before the first semicolon and the back is everything after it. The second flashcard is on the next line, and so on.

Desired output, which should be in UTF-8:

Question ipsun; option 1 | option 2 | option 3 | option 4 | ... | option n
Question ipsun; option 1 | option 2 | option 3 | option 4 | ... | option n
...

My pseudocode in Perl based on this answer

perl -00 -pe 's/;/\0/; s/;/ |/g; s/\0/;/' file

Commented

perl -00 -pe '   # each record is separated by blank lines (-00)
                 # read the file a record at a time and auto-print (-p)
    s/;/\0/;    # turn the first semicolon into a null byte
    s/;/ |/g;     # replace all other semicolons with " |"
    s/\0/;/     # restore the first semicolon
' file

How can I replace all semicolons after the first one on each line?

Best Answer

sed 'y/|;/\n|/;s/|/;/;y/\n/|/' <<\IN
Question ipsun; option 1 ; option 2 ; option 3 ; option 4 ; ... ; option n
IN

Note that this does not use a regexp to handle the majority of the replacements, but rather uses a more basic (and far more performant) translation function to do so - and does so in a POSIX portable fashion. This should work on any machine with a POSIX sed installed.

It translates ; semicolons to | pipes and | pipes to \newlines simultaneously. The | pipes are set aside as \newlines in case any occur on an input line. It then s///ubstitutes the first occurring | pipe for a ; semicolon, and then translates all \newlines to | pipes - thus restoring any it might have set aside to robustly handle the single s///ubstitution.
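To make the set-aside step concrete, here is a small sketch that runs the same sed script on a line that already contains a pipe before the first semicolon. The function name join_options and the sample card are made up for illustration; only the sed script itself comes from the answer.

```shell
# Hypothetical helper wrapping the answer's sed script unchanged.
join_options() {
    sed 'y/|;/\n|/;s/|/;/;y/\n/|/'
}

# Made-up sample card with a literal pipe inside the question text.
printf '%s\n' 'Is a | b valid?; yes ; no ; maybe' | join_options
# prints: Is a | b valid?; yes | no | maybe
```

The pipe in the question text survives because it spends the s/// step disguised as a newline, so the substitution can only hit the pipe that used to be the first semicolon.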

While I use a <<\IN here-document for the sake of copy/pastable demonstration, you should probably use <infile >outfile.
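A file-based run might look roughly like the sketch below. The names cards.csv and cards.txt are placeholders, not names from the question, and the two sample cards are invented; the scratch directory only keeps the example self-contained.

```shell
# Placeholder file names in a scratch directory (assumptions for illustration).
workdir=$(mktemp -d)
printf '%s\n' \
    'Question 1; option 1 ; option 2 ; option 3' \
    'Question 2; option 1 ; option 2' > "$workdir/cards.csv"

# Same sed script as above, reading from one file and writing to another.
sed 'y/|;/\n|/;s/|/;/;y/\n/|/' < "$workdir/cards.csv" > "$workdir/cards.txt"

cat "$workdir/cards.txt"
# prints:
# Question 1; option 1 | option 2 | option 3
# Question 2; option 1 | option 2
```

Since sed works line by line here, each card keeps its own first semicolon regardless of how many cards the file holds.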

OUTPUT:

Question ipsun; option 1 | option 2 | option 3 | option 4 | ... | option n