Find text between tab (\t) as a delimiter

Tags: awk, sed, text processing

I thought this would be simple, but I can't figure out how to do it.

Scenario

I have a single .csv file with the columns id_user, text, and id_group, where the columns are delimited by tabs, like this:

"123456789"        "Here's the field of the text, also contains comma"        "10"
"987456321"        "Here's the field of the text, also contains comma"        "10"
"123654789"        "Here's the field of the text, also contains comma"        "11"
"987456123"        "Here's the field of the text, also contains comma"        "11"

How can I extract the text field?

Attempt

awk

I was looking for a way to tell awk which delimiter to use for print $n. If that were possible, one option would be:

$ awk -d '\t' '{print $2}' file.csv | sed -e 's/"//gp'

where -d would set the delimiter used by print, and sed would strip the double quotes (").

Best Answer

TAB delimiter

cut

You do not need sed or awk, a simple cut will do:

cut -f2 infile

awk

If you want to use awk, you can supply the delimiter either with the -F option or as an FS= assignment placed after the program:

awk -F '\t' '{ print $2 }' infile

Or:

awk '{ print $2 }' FS='\t' infile

Output in all cases:

"Here's the field of the text, also contains comma"
"Here's the field of the text, also contains comma"
"Here's the field of the text, also contains comma"
"Here's the field of the text, also contains comma"
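
The tab-based commands above can be checked without the original file. The following is a minimal sketch; the temporary file and its two rows are made up for illustration and are not the asker's data.

```shell
# Build a small tab-separated sample (hypothetical data, temporary file).
tmp=$(mktemp)
printf '"123456789"\t"Here is text, with a comma"\t"10"\n' >  "$tmp"
printf '"987456321"\t"More text, with a comma"\t"11"\n'    >> "$tmp"

# cut splits on TAB by default, so no -d option is needed.
with_cut=$(cut -f2 "$tmp")

# awk must be told to split on TAB only; by default it splits on any whitespace.
with_awk=$(awk -F '\t' '{ print $2 }' "$tmp")

printf '%s\n' "$with_cut"
rm -f "$tmp"
```

Both variables should hold the same two quoted text fields, commas included, because the comma is never treated as a separator here.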

Quote delimiter

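If the double quotes in the file are consistent, i.e. there are no embedded double quotes in the fields, you can use them as the delimiter and keep them out of the output. With " as the delimiter, field 1 is the empty string before the first quote, field 2 is the id, field 3 is the tab between the columns, and field 4 is the text, e.g.: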

cut

cut -d\" -f4 infile

awk

awk -F\" '{ print $4 }' infile

Output in both cases:

Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
Here's the field of the text, also contains comma
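
If a text field may itself contain double quotes, the quote-as-delimiter trick no longer works. One possible alternative, sketched here on a made-up sample line, is to split on tabs and strip only the first and last quote of the field:

```shell
# One hypothetical row whose text field contains embedded double quotes.
tmp=$(mktemp)
printf '"1"\t"She said "hi", then left"\t"10"\n' > "$tmp"

# Split on TAB, copy field 2, and remove only its leading and trailing quote.
text=$(awk -F '\t' '{ f = $2; sub(/^"/, "", f); sub(/"$/, "", f); print f }' "$tmp")

printf '%s\n' "$text"   # prints: She said "hi", then left
rm -f "$tmp"
```

Working on a copy (f) instead of $2 keeps the original record untouched, which matters if the rest of the line is printed later.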

Related Solutions

Shell – How to use a shell command to only show the first column and last column in a text file

Almost there. Just put both column references next to each other.

cat logfile | sed 's/|/ /' | awk '{print $1, $8}'

Also note that you don't need cat here.

sed 's/|/ /' logfile | awk '{print $1, $8}'

Also note that you can tell awk that the column separator is | instead of blanks, so you don't need sed either.

awk -F '|' '{print $1, $8}' logfile

As per suggestions by Caleb, if you want a solution that still outputs the last field, even if there are not exactly eight, you can use $NF.

awk -F '|' '{print $1, $NF}' logfile
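
As a quick illustration of why $NF is more robust than a fixed $8, here is a small sketch with two made-up lines of different lengths (the file is temporary and the data is hypothetical):

```shell
# Two hypothetical |-separated lines with 3 and 5 fields.
tmp=$(mktemp)
printf 'a|b|c\n'     >  "$tmp"
printf 'x|y|z|w|v\n' >> "$tmp"

# NF is the number of fields in the current line, so $NF is always the last one.
first_last=$(awk -F '|' '{ print $1, $NF }' "$tmp")

printf '%s\n' "$first_last"
rm -f "$tmp"
```

This prints "a c" and "x v", whereas $8 would be empty on both lines.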

Also, if you want the output to retain the | separator instead of using a space, you can specify the output field separator (OFS). Unfortunately, it's a bit clumsier than just using the -F flag, but here are three approaches.

You can assign the input and output field separators in awk itself, in the BEGIN block.
```
awk 'BEGIN {FS = OFS = "|"} {print $1, $8}' logfile
```
You can assign these variables when calling awk from the command line, via the -v flag.
```
awk -v 'FS=|' -v 'OFS=|' '{print $1, $8}' logfile
```
or simply:
```
awk -F '|' '{print $1 "|" $8}' logfile
```
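
To see the effect of OFS in isolation, the following sketch runs the same print statement with and without it on a single made-up line (the input string is hypothetical):

```shell
line='a|b|c|d|e|f|g|h'

# Without OFS, the comma in print inserts the default separator, a single space.
plain=$(printf '%s\n' "$line" | awk -F '|' '{ print $1, $8 }')

# With OFS set, the same comma inserts "|" instead.
piped=$(printf '%s\n' "$line" | awk 'BEGIN { FS = OFS = "|" } { print $1, $8 }')

printf '%s\n%s\n' "$plain" "$piped"
```

The first result is "a h" and the second is "a|h".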

AWK – How to Replace Content of Specific Column in Tab Delimited File

You need to set the output field separator (OFS) to a tab (\t).

One way to do it is with the -v option:

awk -vOFS='\t' '{$3 = "AD"; print}' file

Another possibility is to set it inside awk, for example in the BEGIN block:

awk 'BEGIN{OFS="\t"}{$3 = "AD"; print}' file

If you don't set OFS, awk uses its default output field separator, a single space, when it rebuilds the line after a field is modified.
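
Note that the commands above leave the input separator at its default, so a field containing spaces would be split further. A hedged sketch of the difference, on a made-up line whose tab-separated fields contain spaces:

```shell
# One hypothetical row: three tab-separated fields, two of them containing spaces.
tmp=$(mktemp)
printf 'id 1\tfirst name\told\n' > "$tmp"

# Only OFS set: input is split on any whitespace, so "first" is treated as field 3.
only_ofs=$(awk 'BEGIN { OFS = "\t" } { $3 = "AD"; print }' "$tmp")

# FS and OFS both set to TAB: the third tab-separated column is replaced.
both=$(awk 'BEGIN { FS = OFS = "\t" } { $3 = "AD"; print }' "$tmp")

printf '%s\n%s\n' "$only_ofs" "$both"
rm -f "$tmp"
```

If the fields never contain spaces, both variants give the same result, so setting FS as well is only needed for data like this sample.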