Shell – Bash – pair each line of file

shell-script, text processing

This question is strongly related to this and this question. I have a file that contains several lines where each line is a path to a file. Now I want to pair each line with each different line (not itself). Also a pair A B is equal to a B A pair for my purposes, so only one of these combinations should be produced.

Example

files.dat looks like this, in a shorthand notation where each letter stands for a file path (absolute or relative):

a
b
c
d
e

Then my result should look something like this:

a b
a c
a d
a e
b c
b d
b e
c d
c e
d e

Preferably I would like to solve this in bash. Unlike the other questions, my file list is rather small (about 200 lines), so using loops is fine and RAM capacity poses no problem.

Best Answer

Use this command:

awk '{ name[$1]++ }    # remember each distinct path as an array index
    END { PROCINFO["sorted_in"] = "@ind_str_asc"    # gawk only: visit indices in sorted string order
        for (v1 in name) for (v2 in name) if (v1 < v2) print v1, v2 }
    ' files.dat

PROCINFO["sorted_in"] is a gawk extension. If your awk doesn’t support it, leave out the PROCINFO["sorted_in"] = "@ind_str_asc" assignment and pipe the output into sort (if you want the output sorted).

(This does not require the input to be sorted.)
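Since the question asks for bash, the same pairing can also be sketched with two nested loops over an array. This is a minimal sketch, not part of the answer above: pair_lines is a hypothetical helper name, and mapfile assumes bash 4 or later. Unlike the awk version it pairs by position, so the output follows the input order and duplicate lines are not collapsed.

```shell
#!/usr/bin/env bash
# pair_lines FILE: print every unordered pair of lines from FILE, one pair per line.
# (pair_lines is a hypothetical helper name; mapfile needs bash 4+.)
pair_lines() {
    local -a lines
    local i j
    mapfile -t lines < "$1"                    # one array element per input line
    for ((i = 0; i < ${#lines[@]}; i++)); do
        # start at i + 1 so a line is never paired with itself
        # and "a b" is never repeated as "b a"
        for ((j = i + 1; j < ${#lines[@]}; j++)); do
            printf '%s %s\n' "${lines[i]}" "${lines[j]}"
        done
    done
}
```

Called as pair_lines files.dat, this prints a b, a c, ... d e for the five-line example. With about 200 lines it produces 19900 pairs, which is well within what a shell loop handles comfortably.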

Related Solutions

Merge two files line by line with the delimiter triple pipe symbol “|||”

With POSIX paste:

:|paste -d ' ||| ' fileA - - - - fileB

paste concatenates corresponding lines of all input files. Here we have six files: fileA, four dummy files read from standard input (-), and fileB.

The delimiter list consists of a space, three pipes, and a space, in that order; paste cycles through it from left to right.

For the first line of the six files, fileA is joined to the first dummy file (which is empty, thanks to the no-op : command) with a space, producing line1-fileA<space>.

The first dummy file is joined to the second with a pipe, producing line1-fileA |, then the second dummy file to the third, producing line1-fileA ||, and the third dummy file to the fourth, producing line1-fileA |||.

Finally the fourth dummy file is joined to fileB with a space, producing line1-fileA ||| line1-fileB.

These steps are repeated for every line, giving you the expected result.

The :| form saves typing and is mainly useful in an interactive shell. In a script, you should use:

</dev/null paste -d ' ||| ' fileA - - - - fileB

to prevent a subshell from being spawned.
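The delimiter cycling described above can be checked on two small throwaway files. The following is only an illustrative sketch: merge_triple_pipe is a hypothetical wrapper name, and the file contents are made up for the demonstration.

```shell
#!/usr/bin/env bash
# merge_triple_pipe A B: join A and B line by line with " ||| " between them.
# (merge_triple_pipe is a hypothetical wrapper name, not from the answer.)
merge_triple_pipe() {
    # The four "-" operands read from /dev/null, so they contribute empty
    # fields; only the five delimiters (space, |, |, |, space) show up.
    paste -d ' ||| ' "$1" - - - - "$2" </dev/null
}

# Demonstration on made-up sample data in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' alpha beta > "$dir/fileA"
printf '%s\n' one two   > "$dir/fileB"
merge_triple_pipe "$dir/fileA" "$dir/fileB"
# prints:
# alpha ||| one
# beta ||| two
rm -r "$dir"
```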

How to add 10 lines from a file (file2) to another one after 2 lines (file1)

You could use ed!

ed -s file1 <<< $'2r !head -10 file2\nw\nq'

This tells ed to edit file1 with three commands:

on line 2, read in the output of the command head -10 file2 and insert it
write the file out
quit ed

With GNU sed (using the e extension, which runs a shell command and writes its output before the addressed line, here line 3):

sed -i '3e head -10 file2' file1
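Both commands above edit file1 in place. To preview the result first, the same splice can be sketched with plain head and tail, which is a different technique from the ed and sed answers and writes to standard output instead of modifying anything. splice_preview and its parameters are hypothetical names chosen for this sketch.

```shell
#!/usr/bin/env bash
# splice_preview TARGET SOURCE [AFTER] [COUNT]
# Print TARGET with the first COUNT lines of SOURCE inserted after line AFTER.
# Defaults (AFTER=2, COUNT=10) mirror the question; nothing is edited in place.
# (splice_preview is a hypothetical helper name.)
splice_preview() {
    local target=$1 source=$2 after=${3:-2} count=${4:-10}
    head -n "$after" "$target"              # lines 1..AFTER of the target
    head -n "$count" "$source"              # the block being inserted
    tail -n "+$((after + 1))" "$target"     # the rest of the target
}
```

For example, splice_preview file1 file2 | less shows what file1 would look like after the insertion.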

Extended solution, to iterate through file2

The script below is a for loop that repeats the ed idea as many times as there are transcr_ blocks in file1. Each time through the loop, we calculate three items:

the line number in file1 after which ed inserts the block
the line number for sed to start reading from file2
the line number for sed to stop reading from file2

Item #1 is spelled out more clearly as: 10*(N-1) + 2*N, which I reduced to 12*N - 10.

Items #2 and #3 are spelled out more clearly as 10*(N-1) + 1 through 10*N, which I reduced to 10*N - 9 through 10*N.

I replaced the head command with the more flexible & powerful sed command for picking out blocks of lines from file2.

This rewrites file1 on every pass through the loop, that is, $times times in total.

# how many times we need to insert blocks
times=$(grep -c transcr_ file1)
for((index=1;index <= times; index++));
do
  printf "%dr !sed -n %d,%dp file2\nw\nq\n" $((12 * index - 10)) $((10 * index - 9)) $(( 10 * index ))  |
    ed -s file1
done
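Because the loop rewrites file1 on every pass, it can be worth checking the line arithmetic first. The sketch below only prints the ed read command that each iteration would send, without running ed or touching any file; ed_script_for is a hypothetical helper name introduced for this dry run.

```shell
#!/usr/bin/env bash
# ed_script_for N: print the ed script the loop would use on iteration N.
# (ed_script_for is a hypothetical helper name; nothing is executed or edited.)
ed_script_for() {
    local n=$1
    # insert after line 12*N - 10 of file1; take lines 10*N - 9 .. 10*N of file2
    printf '%dr !sed -n %d,%dp file2\nw\nq\n' \
        $((12 * n - 10)) $((10 * n - 9)) $((10 * n))
}

# Show only the read command for the first three iterations.
for n in 1 2 3; do
    ed_script_for "$n" | head -n 1
done
# prints:
# 2r !sed -n 1,10p file2
# 14r !sed -n 11,20p file2
# 26r !sed -n 21,30p file2
```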
