Shell – Is it a bug for join with -t\t

quoting, shell

I have a problem with the join command. "The default join field is the first, delimited by whitespace" (quoted from join --help). However, my tab-delimited files have a field that contains sentences, so I want to join the two files using -t\t (I also tried -t "\t", which reported errors under Cygwin but not under CentOS). Unexpectedly, the command printed the fields on two consecutive lines. I have already processed both files with dos2unix and sort.

An example of the output follows. The 1st and 3rd lines are from file1, and the 2nd and 4th lines are from file2. The 1st and 2nd lines should appear on the same line. However, if -t\t is used, they appear on two consecutive lines (as below); without -t, they appear on the same line.

LM00089 0.6281  0       Q27888  L-lactate dehydrogenase
LM00089 gi|2497622|sp|Q27888|LDH_CAEEL  0.6281  0.422
LM00136 0.3219  0.376741        O62619  Pyruvate kinase
LM00136 gi|27923979|sp|O62619|KPYK_DROME        0.3219  0.111

I want to know whether this is a bug or whether I made a mistake.

Best Answer

-t \t passes t as the separator: an unquoted backslash always takes the next character literally (except when the next character is a newline). -t "\t" passes the two characters \t as the separator; different versions of join may behave differently when you pass multiple characters.
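To see what the shell actually hands to join, it can help to print the arguments. The following is a minimal sketch; show_args is a hypothetical helper defined here for illustration, not part of join or of any standard tool.

```shell
# show_args is a hypothetical helper: it prints each argument it
# receives on its own line, wrapped in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Unquoted backslash: the shell removes it and keeps the plain t.
show_args -t\t       # one argument:  [-tt]
show_args -t \t      # two arguments: [-t] and [t]

# Inside double or single quotes the backslash survives, so the
# argument is the two characters \ and t, not a tab.
show_args -t "\t"    # two arguments: [-t] and [\t]
show_args -t '\t'    # two arguments: [-t] and [\t]
```

In none of these four cases does the command receive an actual tab character.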

To pass a tab from bash, use -t $'\t'. The $'…' syntax mimics the feature of C and many other languages where \ followed by a letter designates a control character, and \ can be followed by octal digits.

Another way is to put a literal tab in your script (between single or double quotes). This isn't very readable.

If you need portability to all POSIX shells such as dash, use

tab=$(printf '\t')
join -t "$tab" …

or directly join -t "$(printf '\t')" ….
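As an end-to-end illustration of the portable form, here is a small sketch that joins two tab-delimited files. The file names and their contents are made up for the example (they are not the files from the question), and mktemp -d is assumed to be available, which is common but not required by POSIX.

```shell
# Store a literal tab; command substitution strips only trailing newlines.
tab=$(printf '\t')

# Illustrative sample data (hypothetical), already sorted on the first field.
dir=$(mktemp -d)
printf 'LM1\tfirst sentence here\nLM2\tsecond one\n' > "$dir/a.tsv"
printf 'LM1\t0.5\nLM2\t0.7\n' > "$dir/b.tsv"

# Join on the first field, using the tab as the only field separator,
# so the spaces inside the sentence field are left alone.
joined=$(join -t "$tab" "$dir/a.tsv" "$dir/b.tsv")
printf '%s\n' "$joined"

rm -r "$dir"
```

Each output line holds the join field, then the remaining fields of the first file, then those of the second, all separated by tabs.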
