Text Processing – Remove Newlines After Empty Line

Tags: perl, sed, text processing, tr

Data

4. Alendronic acid
A. Antiosteoporotic agent. 
B. Inhibit osteoclast formation and function by inhibiting FPPS enzyme, so increase bone mass. 
C. Osteoporosis in combination with vitamin D. 

5. Aminophylline
A. Methylxanthine. Less potent and shorter-acting bronchodilator than Theophylline. 
B. Phosphodiesterase (PDE) inhibitor, so increase cAMP so affecting calcium so relaxes respiratory SM and dilates bronchi/bronchioles. 
C. Last option of asthma attack, COPD, Reversible airways obstruction. 

which I want to turn into the following (and later also without the empty line, as explained in the pseudocode below):

4. Alendronic acid
A. Antiosteoporotic agent. B. Inhibit osteoclast formation and function by inhibiting FPPS enzyme, so increase bone mass. C. Osteoporosis in combination with vitamin D. 

5. Aminophylline
A. Methylxanthine. Less potent and shorter-acting bronchodilator than Theophylline. B. Phosphodiesterase (PDE) inhibitor, so increase cAMP so affecting calcium so relaxes respiratory SM and dilates bronchi/bronchioles. C. Last option of asthma attack, COPD, Reversible airways obstruction. 

My attempt was originally based on the idea of removing all empty lines with gsed -n "s/^$//;t;p;", but that is no longer possible.

Pseudocode

  • remove all newlines (but not the empty lines) with tr '\n' ' ' (everything is now on one line, which is a problem because tr swallows the empty lines too!)
  • replace every A. with \nA. using sed 's#A.#\nA.#'
  • remove all empty lines with gsed -n "s/^$//;t;p;"

Pseudocode in summary

cat                                 \
     10.6.2015.tex                  \
                                    \
| tr '\n' ' '                       \
                                    \
| sed 's#A.#\nA.#'                  \
                                    \
| gsed -n "s/^$//;t;p;"             \
                                    \
> 10.6.2015_quizlet.tex

This is wrong, however, because of the logical mistake in the first step: tr also removes the newlines of the empty lines.
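A quick way to see the mistake, as a sketch on made-up data (the sample function and its contents are hypothetical, not taken from the .tex file): once tr has run, the stream contains no newline at all, so the empty lines that separated the records can no longer be found by any later step.

```shell
# Hypothetical two-record sample in the same shape as the data above
# (the trailing spaces before each newline mimic the original lines).
sample() {
    printf '4. First drug\nA. one. \nB. two. \n\n5. Second drug\nA. three. \n'
}

# Count newline characters before and after the tr step.
# tr -d ' ' strips the padding that some wc implementations add.
before=$(sample | wc -l | tr -d ' ')
after=$(sample | tr '\n' ' ' | wc -l | tr -d ' ')

echo "newlines before tr: $before"
echo "newlines after tr:  $after"
```

Because the count drops to zero, the later sed and gsed steps have no record boundaries left to work with. That is why reading the file a paragraph at a time is the safer route.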

How can I remove the newlines after an empty line with Perl/sed/tr?

Best Answer

I would use perl or awk to read the data a paragraph at a time, and remove all but the first newline:

perl -00 -pe '$\="\n\n"; s/\n/\0/; s/\n//g; s/\0/\n/' file

Commented

perl -00 -pe '   # each record is separated by blank lines (-00)
                 # read the file a record at a time and auto-print (-p)
    $\="\n\n";   # auto-append 2 newlines to each record
    s/\n/\0/;    # turn the first newline into a null byte
    s/\n//g;     # remove all other newlines
    s/\0/\n/     # restore the first newline
' file

Similarly, with awk:

awk -v RS= -F'\n' '{print $1; for (i=2; i<=NF; i++) printf "%s", $i; print ""; print ""}' file