Lum – `cut`: selecting columns containing a string

columns, cut, regular expression, terminal, text processing

I have a big file with several columns on each line. I'm familiar with using cut -f -d to select specific columns by their number.

I checked the manual for cut and it doesn't seem that there's a way to regex match columns.

What I want to do specifically is:

  • select the 2nd column of every line
  • and also select every column that contains the string "hello" (there may be none; when present, it can be in any column or columns, and not necessarily the same ones on each line)

What's the most convenient terminal tool for this operation?

EDIT:

Simplified example

x ID23 a b c hello1
x ID47 hello2 a b c
x ID49 hello3 a b hello4
x ID53 a b c d

The result I would want is:

ID23 hello1
ID47 hello2
ID49 hello3 hello4

or alternatively:

ID23 hello1
ID47 hello2
ID49 hello3 hello4
ID53

To elaborate the example given:

  • Columns are defined by one space
  • whether lines without the string get printed is not really important; I can just grep for "hello" if necessary
  • we can assume the string "hello" will never be in column 1 or 2.

Best Answer

If one space at the end of the line doesn't hurt you much:

$ awk '{for(i=1;i<=NF;i++) if(i==2 || $i~"hello") printf "%s ", $i; print ""}' file

ID23 hello1 
ID47 hello2 
ID49 hello3 hello4 
ID53 

This doesn't assume anything about the position of the "hello" string.
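If the trailing space does matter, one possible variant builds each output line in a variable and only puts a separator between fields. This is a minimal sketch, not part of the original answer: the function name `select_cols` is made up for illustration, and `index()` is used so that "hello" is matched as a plain substring rather than as a regex.

```shell
# select_cols (hypothetical helper): print field 2 plus every field that
# contains the substring "hello", joined by single spaces, no trailing space.
# Reads the files given as arguments, or standard input if there are none.
select_cols() {
    awk '{
        out = ""; sep = ""
        for (i = 1; i <= NF; i++)
            if (i == 2 || index($i, "hello")) {
                out = out sep $i   # sep is empty before the first kept field
                sep = " "
            }
        print out
    }' "$@"
}
```

Call it as `select_cols file`. To get the first form of the desired output (only lines that contain the string), pre-filter the input with `grep hello file | select_cols`.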
