I have a text file as (for example):
apple V$NFY_Q6_rc V=0.741
apple V$HOXA7_01_rc V=0.454
apple V$ALPHACP1_01_rc V=0.695
orange V$SP4_Q5 V=0.747
grapes V$SP1_Q2_01 V=0.677
grapes V$SP1_Q6_01_rc V=0.685
grapes V$SP1_Q6_rc V=0.884
I want to extract the rows that have the highest V value (in column 3) for each fruit. My output should look like:
apple V$NFY_Q6_rc V=0.741
orange V$SP4_Q5 V=0.747
grapes V$SP1_Q6_rc V=0.884
I have several files like this in different subdirectories.
After subsetting, I would use the following code to get the number of occurences of a particular string in column 2.
perl -lanE '$str=$F[1]; $f="/home/$str/list/$str.txt"; $c=`grep -c "$str" "$f"`;chomp($c);$x=0;$x++ if $c;say "$str\t$x\t$c"' file2
this would result in the following output. Let's assume I am searching for the string "SP4" in column 2 of this file:
X X in file? number of times it occurs
NFA 0 0
SP4 1 2
NFATC1 0 0
I want the V value (from above to be included along with this table)
X X in file? number of times it occurs V value
NFA 0 0
SP4 1 2 0.747
NFATC1 0 0
P.S.: For further details on the perl program in between, please refer to this link:
http://stackoverflow.com/questions/23109490/search-for-occurrence-of-a-string-in-another-file-in-a-particular-column
Best Answer
If order is not important, two passes of
sort
should do the trick. In the first pass, sort by field 1 followed by reverse-sorting by the numeric portion of field 3 (starting position 4). The-b
modifier causes leading blanks to be ignored. Pipe this to a secondsort
that returns one record for every unique value in field 1, but this time specify the stable sort(-s
) modifier to ensure that the record with the highest value in field 3 (which had bubbled up to the top for each value in field 1 courtesy the previous sort) is returned