I have a long text file (a tab-file for stardict-editor) which consists of lines in the following format:
word1 some text
word1 some other text
word2 more text
word3 even more
and would like to convert it to
word1 some text<br>some other text
word2 more text
word3 even more
This means that subsequent lines (the file is sorted) which start with the same word should be merged into a single one (here the definitions are separated with <br>). Lines with the same beginning can also appear more often than just twice. The character which separates word and definition is a tab character and is unique on each line. word1, word2, word3 are of course placeholders for something arbitrary (anything except tab and newline characters) which I don't know in advance.
I can think of a longer piece of Perl code which does this, but wonder if there is a short solution in Perl or something for the command line. Any ideas?
Best Answer
This is a standard task for awk:
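The answer's original script did not survive, so here is a minimal sketch of the approach it describes, assuming a file named input.tab in the tab-separated format from the question. It collects definitions per headword in an array, so it works even on unsorted input while preserving first-appearance order:

```shell
# Merge all definitions for each headword with <br>,
# keeping headwords in order of first appearance.
awk -F'\t' '
    $1 in def { def[$1] = def[$1] "<br>" $2; next }  # seen before: append
    { order[++n] = $1; def[$1] = $2 }                # new headword: remember order
    END { for (i = 1; i <= n; i++) print order[i] "\t" def[order[i]] }
' input.tab
```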
If the file is sorted by the first word on each line, the script can be simpler:
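A sketch of that simpler variant (again assuming input.tab): since equal headwords are adjacent in a sorted file, it only needs to compare each line's first field with the previous one, no arrays required:

```shell
# Sorted input: equal headwords are adjacent, so just compare
# the current first field with the previous line's.
awk -F'\t' '
    $1 == prev { printf "<br>%s", $2; next }   # same headword: append definition
    { if (NR > 1) print ""                     # finish the previous output line
      printf "%s\t%s", $1, $2; prev = $1 }     # start a new output line
    END { if (NR > 0) print "" }               # final newline
' input.tab
```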
Or just bash:
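A pure-bash sketch of the same idea (the answer's own code is missing; input.tab and the variable names are assumptions). It relies on the tab being unique per line, so read splits each line into exactly one headword and one definition:

```shell
#!/usr/bin/env bash
# Merge consecutive lines sharing a headword; assumes sorted input.tab.
prev=
while IFS=$'\t' read -r word def; do
    if [[ $word == "$prev" ]]; then
        printf '<br>%s' "$def"          # same headword: append definition
    else
        [[ -n $prev ]] && printf '\n'   # finish the previous output line
        printf '%s\t%s' "$word" "$def"  # start a new output line
        prev=$word
    fi
done < input.tab
printf '\n'
```

For large files the awk versions will be noticeably faster; the bash loop is mainly useful if you want to avoid external tools entirely.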