Find text between tab (\t) as a delimiter

awk sed text-processing

I thought this would be simple, but I can't figure out how to do it.

Scenario

I have a single .csv file with the columns id_user, text, and id_group, where the columns are delimited by tabs, like this:

"123456789"        "Here's the field of the text, also contains comma"        "10"
"987456321"        "Here's the field of the text, also contains comma"        "10"
"123654789"        "Here's the field of the text, also contains comma"        "11"
"987456123"        "Here's the field of the text, also contains comma"        "11"

How can I extract the text column?

Attempt

awk

I was looking for a way to specify the delimiter that print $n uses. If I could do that, one option would be:

$ awk -d '\t' '{print $2}' file.csv | sed -e 's/"//gp'

where -d would set the delimiter for print, and sed would strip out the " characters.

Best Answer

TAB delimiter

cut

You do not need sed or awk; a simple cut will do, because cut splits on tabs by default:

cut -f2 infile

awk

If you want to use awk, you can supply the delimiter either with the -F option or as an FS= assignment placed after the program:

awk -F '\t' '{ print $2 }' infile

Or:

awk '{ print $2 }' FS='\t' infile

Output in all cases:

"Here's the field of the text, also contains comma"
"Here's the field of the text, also contains comma"
"Here's the field of the text, also contains comma"
"Here's the field of the text, also contains comma"
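
The question also wanted the surrounding double quotes removed. A minimal sketch of two ways to do that while still splitting on tabs, assuming a throwaway sample file created with mktemp (the names infile, text, result_cut, and result_awk are illustrative and not part of the original answer):

```shell
# Build a two-line sample file; printf expands \t into real tab characters.
infile=$(mktemp)
text="\"Here's the field of the text, also contains comma\""
printf '%s\t%s\t%s\n' '"123456789"' "$text" '"10"' > "$infile"
printf '%s\t%s\t%s\n' '"987456321"' "$text" '"10"' >> "$infile"

# cut takes the second tab-separated field; tr -d deletes every double quote.
result_cut=$(cut -f2 "$infile" | tr -d '"')

# awk alone: split on tabs, remove the quotes from field 2, then print it.
result_awk=$(awk -F '\t' '{ gsub(/"/, "", $2); print $2 }' "$infile")

printf '%s\n' "$result_cut"
rm -f "$infile"
```

Both variants delete every double quote in the field, so they are only appropriate when the text itself never contains one.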

Quote delimiter

If the double-quotes in the file are consistent, i.e. no embedded double-quotes in fields, you could use them as the delimiter and avoid having them in the output, e.g.:

cut

cut -d\" -f4 infile

awk

awk -F\" '{ print $4 }' infile

Output in both cases:

Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
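
The field number 4 may look surprising, so here is a small sketch of how a line breaks up when " is the separator. The sample line and the variable names (line, id, text, group, nfields) are made up for illustration:

```shell
# One sample record; printf expands \t into real tab characters.
line=$(printf '"123456789"\t"some text, with comma"\t"10"')

# With " as the separator, field 1 is the empty string before the first quote,
# field 2 is the id, field 3 is the tab between columns, field 4 is the text,
# field 5 is the next tab, field 6 is the group, and field 7 is empty again.
id=$(printf '%s\n' "$line" | awk -F'"' '{ print $2 }')
text=$(printf '%s\n' "$line" | awk -F'"' '{ print $4 }')
group=$(printf '%s\n' "$line" | awk -F'"' '{ print $6 }')
nfields=$(printf '%s\n' "$line" | awk -F'"' '{ print NF }')

printf '%s | %s | %s | %s fields\n' "$id" "$text" "$group" "$nfields"
```

An embedded double quote inside the text would shift these positions, which is why this approach depends on the quoting being consistent.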