If you have column(1)
, an old BSD tool, try column -t
, for pretty-printing tables.
To ensure empty cells are displayed, you could try the approach of inserting a single space in each empty cell (recognizable by two consecutive tabs). The idea is column(1)
should give the space character its own column but being a single character in width it should not affect the table dimensions or be visible in the output to humans.
generate_tsv |
awk '/\t\t/ { for (i = 0; i < 2; i++) gsub(/\t\t/, "\t \t") } 1' |
column -t -s $'\t'
The extra awk
inserted in the pipeline does the inserting of spaces into each empty cell, as described. 2 passes are necessary to handle 2 consecutive empty cells (\t\t\t
).
This will give you all lines whose 9th column matches MEF2
:
awk -F"\t" '$9~/MEF2/' file > output
Assuming your file is always tab-delimited, this will work and you can rest safely.
That's as close to 0 margin of error as you'll ever get.
If, however, you have tried importing into something like R (presumably using read.table("file",sep="\t")
) and that didn't work, you might have some lines with different numbers of fields (see the end for how to check that). If so, assuming you are always interested in the last field, you can use $(NF)
in awk
to print the last field, no matter how many fields there are:
awk -F"\t" '$(NF)~/MEF2/' file > output
If you still feel the need to check, you can simply extract all lines that match MEF2
, irrespective of where the match is, then compare the results:
grep MEF2 file > output2
Once you have that, you can use wc
to check if they have the same number of lines. If they don't, find where they differ by running
grep -vFf output output2
That command will print any lines in output2 that are not present in output1. If there are any, most probably they will have MEF2
somewhere in the line but not in the 9th field. If it's in the 9th field, then you know that your file is not tab separated and there's something wrong with your data.
The awk
above is probably the simplest solution but here are a few others that do the same thing:
Perl
perl -F"\t" -lane '$F[8]=~/MEF2/ && print' file
sed
(this one might match wrong lines if you have more than 9 fields)
sed -n '/\t.*\t.*\t.*\t.*\t.*\t.*\t.*\t.*MEF2.*/p' file
grep
grep -P '^.+?\t.*\t.*\t.*\t.*\t.*\t.*\t.*\t.*MEF2.*' file
If these don't all produce the same output, you know there's an issue with your file. One more thing you can check is to make sure that all your lines have 9 fields. If they don't, you know there's an issue:
awk -F"\t" 'NF!=9' file
The above will print all lines that don't have exactly 9 tab-separated fields. If there is output, the lines it prints are problematic.
Best Answer
If your fields can never contain whitespace, an empty field means either a tab as a first character (
^\t
), a tab as the last character (\t$
) or two consecutive tabs (\t\t
). You could therefore filter out lines containing any of those:If you can have whitespace, things get more complex. If your fields can begin with spaces, use this instead (it considers a field with only spaces to be empty):
The change filters out lines matching a tab followed by 0 or more spaces and then either another tab or the end of the line.
That will also fail if the last field contains nothing but spaces. To avoid that too, use
perl
with the-F
and-a
options to split input into the@F
array, telling it to print unless one of the fields is empty (/^$/
):