I have a table that looks like this:
name something
1 100036498|F|0--20:T>G something
2 100036501|F|0--44:C>T something
3 100036501|F|0-44:C>T-44:C>T something
4 100036508|F|0--66:T>G something
5 100036508|F|0-66:T>G-66:T>G something
6 100036511|F|0-19:G>A-19:G>A something
7 100036516|F|0--15:T>G something
8 100036516|F|0-15:T>G-15:T>G something
... ....
I added the line numbers to make my question easier to follow. Some pairs of rows begin with the same number, such as rows 2 and 3, 4 and 5, and 7 and 8. Other rows are unique, such as rows 1 and 6. I would like to keep only the rows that have a pair, or in other words remove the lines that do not have a pair, to get a table like this one:
name something
2 100036501|F|0--44:C>T something
3 100036501|F|0-44:C>T-44:C>T something
4 100036508|F|0--66:T>G something
5 100036508|F|0-66:T>G-66:T>G something
7 100036516|F|0--15:T>G something
8 100036516|F|0-15:T>G-15:T>G something
... ....
I want something like the opposite of the Linux command uniq, taking into account only the number at the start of the first column and not the rest after the | symbol.
Do you know how to do it?
Below is the same first table, with the columns separated by one space and without the header, to make it easier to copy.
100036498|F|0--20:T>G something
100036501|F|0--44:C>T something
100036501|F|0-44:C>T-44:C>T something
100036508|F|0--66:T>G something
100036508|F|0-66:T>G-66:T>G something
100036511|F|0-19:G>A-19:G>A something
100036516|F|0--15:T>G something
100036516|F|0-15:T>G-15:T>G something
Best Answer
This is an awk solution that keeps the lines whose leading number is repeated more than once. If you want only those repeated exactly two times, change >1 to ==2.
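As an illustration only, here is a minimal sketch of a two-pass awk approach that matches the description above. It is not necessarily the answer's exact command. The file name input.txt and the function name keep_paired are hypothetical, and the sketch assumes the key is everything before the first | on each line.

```shell
# Hypothetical file name; holds the headerless sample from the question.
cat > input.txt <<'EOF'
100036498|F|0--20:T>G something
100036501|F|0--44:C>T something
100036501|F|0-44:C>T-44:C>T something
100036508|F|0--66:T>G something
100036508|F|0-66:T>G-66:T>G something
100036511|F|0-19:G>A-19:G>A something
100036516|F|0--15:T>G something
100036516|F|0-15:T>G-15:T>G something
EOF

# Hypothetical helper: reads the same file twice.
# Pass 1 (NR==FNR): count how often each key (text before the first "|") occurs.
# Pass 2: print only the lines whose key was counted more than once.
keep_paired() {
    awk -F'|' 'NR==FNR { count[$1]++; next } count[$1] > 1' "$1" "$1"
}

keep_paired input.txt
```

With the sample data this should leave only the rows for 100036501, 100036508 and 100036516. Reading the file twice avoids relying on the input being sorted, and changing count[$1] > 1 to count[$1] == 2 would keep only keys that occur exactly twice.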