Bash – How to delete all unordered lines from a text file

Tags: awk, bash, sed, text processing

Overview

Consider an ordered list interspersed with unordered elements, e.g.:

Alligator
Ant
Falcon <--
Baboon
Badger
Armadillo <--
Caiman
Cat

How can this list be processed so that all unordered elements are deleted? E.g.:

Alligator
Ant
Baboon
Badger
Caiman
Cat

Some more information

The unordered elements always occur singly; the ordered elements come in groups of at least 2 lines. The general pattern is:

ordered
ordered
ordered
unordered <--
ordered
ordered
unordered <--
ordered
ordered

The unordered elements can sort either after the ordered element that follows them…

A
B
F <---
D
E

…or before it:

A
C
B <---
D
E

To make matters even more difficult, the elements can be both upper- and lowercase and may contain diacritics (e.g. ä, ö, à).


Is there any way to accomplish this with bash?

Best Answer

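This works as long as the last line of the file is itself an ordered element (the END block prints it unchecked). Note that IGNORECASE is specific to GNU awk (gawk):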

awk 'BEGIN {IGNORECASE=1}; NR==1 {lastline=$0; next;}; {if($0>lastline) {print lastline; '\
'lastline2=lastline; lastline=$0;} else if ($0>lastline2) lastline=$0; }; '\
'END {print lastline;}' file1.txt
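The one-liner is dense, so here is the same logic spelled out as a commented sketch. It is a variant, not the answer's exact code: it compares lowercased copies made with tolower() instead of relying on IGNORECASE, so it should also run under awk implementations other than gawk. The function name filter_ordered and the variable names (prev, prev_key, kept_key) are made up for illustration. How diacritics compare depends on your awk and locale, so treat that part as untested.

```shell
# filter_ordered FILE: print FILE without the single out-of-order lines.
# Assumes the last line of FILE is an ordered element.
filter_ordered() {
    awk '
    NR == 1 { prev = $0; prev_key = tolower($0); next }   # first line is the first candidate
    {
        key = tolower($0)              # case-insensitive comparison key
        if (key > prev_key) {
            # Current line sorts after the candidate, so the candidate was in order.
            print prev
            kept_key = prev_key        # remember the last line actually printed
            prev = $0; prev_key = key
        } else if (key > kept_key) {
            # Candidate was too high (e.g. "Falcon"); replace it with the current line.
            prev = $0; prev_key = key
        }
        # Otherwise the current line is too low (e.g. "Armadillo") and is skipped.
    }
    END { if (NR > 0) print prev }     # the final candidate is printed unchecked
    ' "$1"
}
```

Called as `filter_ordered file1.txt` on the sample list from the question, this should print Alligator, Ant, Baboon, Badger, Caiman and Cat. One thing to be aware of: for the short A, C, B, D, E example, this logic (like the one-liner) keeps B and drops C. The result is still in order, but it is not the line the question marks as the outlier.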

Old version (with bugs, kept for comparison):

awk 'BEGIN {IGNORECASE=1}; NR==1 {lastline=$0; next;}; '\
'{if($0>lastline) print lastline; lastline=$0;}; END {print lastline;}' file
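To see where the old version goes wrong, it helps to run its logic on the sample list from the question. The sketch below wraps it in a hypothetical function old_filter; as an assumption for portability it uses tolower() rather than IGNORECASE, which does not change the outcome on this input.

```shell
# old_filter FILE: the earlier, buggy logic, kept only to show how it fails.
old_filter() {
    awk '
    NR == 1 { prev = $0; next }
    {
        # Print the previous line only if the current one sorts after it ...
        if (tolower($0) > tolower(prev)) print prev
        # ... but always move on, even when the current line is the outlier.
        prev = $0
    }
    END { if (NR > 0) print prev }
    ' "$1"
}
```

On the sample list this should print Alligator, Ant, Baboon, Armadillo, Caiman and Cat. Falcon is removed correctly, but Badger is lost and Armadillo survives: whenever a line sorts before its predecessor, the old logic discards the predecessor, which is only right when the predecessor was the outlier.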