Ubuntu – Create csv from inconsistent text file

awk, command line, csv, sed, text processing

I have a file of loosely structured records, each consisting of 3 or 4 lines of text. Records are mostly separated by a blank line, but not always; the last line of each record always starts with the word "Added". I would like to produce a CSV file with each record on one line, preceded by its line number. So far I have only managed to produce a concatenation of all records, separated by an arbitrary number of spaces and a redundant comma.

Logically I am trying to achieve the following:

Read line
if line is blank, delete it
else if line starts with 'Added', keep the newline at the end
else replace the newline with ','
endif

Sample data:

Peter Green  
Space Monkey at Area 51  
Joined  
Added by SF 3 weeks ago  
Will Rossiter  
Joined  
Added by SF 3 weeks ago

Dean Matthews  
Guitarist at Blues  
Joined  
Added by SF 3 weeks ago  
Hobbit Mak  
Farnborough, United Kingdom  
Joined  
Added by SF 3 weeks ago  

Keneth W Moorfield  
THE STOREMAN  
Joined  
Added by SF 3 weeks ago  
Mick Georgious  
Software Engineer  
Joined  
Added by SF 3 weeks ago

Best Answer

Try:

awk '/./{ printf "%s%s", $0, (/Added/?"\n":",") }' data

Using your sample input data:

$ awk '/./{printf "%s%s",$0,(/Added/?"\n":",")}' data
Peter Green,Space Monkey at Area 51,Joined,Added by SF 3 weeks ago
Will Rossiter,Joined,Added by SF 3 weeks ago
Dean Matthews,Guitarist at Blues,Joined,Added by SF 3 weeks ago
Hobbit Mak,Farnborough, United Kingdom,Joined,Added by SF 3 weeks ago
Keneth W Moorfield,THE STOREMAN,Joined,Added by SF 3 weeks ago
Mick Georgious,Software Engineer,Joined,Added by SF 3 weeks ago

How it works:

  • /./{...}

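    The regex /./ matches any line that contains at least one character, so the commands in curly braces run only for non-empty lines. In other words, this ignores empty lines. Note that a line consisting solely of spaces still counts as non-empty.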

  • printf "%s%s",$0,(/Added/?"\n":",")

    This prints the line, denoted $0, followed by a newline if the line matches the regex Added, or by a comma otherwise.
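The one-liner above does not add the line number the question asks for, and a field that itself contains a comma (such as "Farnborough, United Kingdom") ends up split across two CSV columns. The following is a rough sketch of one possible extension, not part of the original answer. It assumes that "line number" means a running record count, that fields containing a comma or double quote should be wrapped in double quotes in the usual CSV manner, and that trailing whitespace should be dropped. It also anchors the match as ^Added, since the question says the last line of each record starts with that word. The temporary file and the variable names (rec, field, n, out) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input, a shortened copy of the data in the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/data" <<'EOF'
Peter Green
Space Monkey at Area 51
Joined
Added by SF 3 weeks ago
Will Rossiter
Joined
Added by SF 3 weeks ago

Hobbit Mak
Farnborough, United Kingdom
Joined
Added by SF 3 weeks ago
EOF

out=$(awk '
  { sub(/[ \t\r]+$/, "") }            # drop trailing whitespace
  $0 == "" { next }                   # skip lines that are now empty
  {
    last = ($0 ~ /^Added/)            # does this line end the record?
    field = $0
    if (field ~ /[",]/) {             # quote fields that would break the CSV
      gsub(/"/, "\"\"", field)        # double any embedded double quotes
      field = "\"" field "\""
    }
    rec = (rec == "" ? field : rec "," field)
    if (last) {                       # emit the record with its running number
      n++
      print n "," rec
      rec = ""
    }
  }
' "$tmpdir/data")

printf '%s\n' "$out"
rm -rf "$tmpdir"
```

With this sample the third record should come out as 3,Hobbit Mak,"Farnborough, United Kingdom",Joined,Added by SF 3 weeks ago. To run it on a real file, pass that file to awk in place of the temporary one.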
