Ubuntu – Text processing Aptly output file

awk sed text-processing

I have a text file made from the output of the repository management tool aptly, which lists my published repositories, from which I need to extract information.

The file format is as follows:

Published repositories:
 * test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirror [xenial-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {multiverse: [xenial-multiverse_20190311]: Snapshot from mirror [xenial-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {restricted: [xenial-restricted_20190311]: Snapshot from mirror [xenial-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {universe: [xenial-universe_20190311]: Snapshot from mirror [xenial-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}
 * test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: Snapshot from mirror [xenial-security-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {multiverse: [xenial-security-multiverse_20190311]: Snapshot from mirror [xenial-security-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {restricted: [xenial-security-restricted_20190311]: Snapshot from mirror [xenial-security-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {universe: [xenial-security-universe_20190311]: Snapshot from mirror [xenial-security-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}
 * test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {multiverse: [trusty-multiverse_20190312]: Snapshot from mirror [trusty-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {restricted: [trusty-restricted_20190312]: Snapshot from mirror [trusty-restricted]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {universe: [trusty-universe_20190312]: Snapshot from mirror [trusty-universe]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}
...

The last line of the output ends with a newline.

The "Published repositories:" line is not required.

For each of the lines starting with ' *' I need to remove the extraneous information, leaving only the repository name and the snapshot names. There is no way to do this in aptly itself. The desired output for the first of these lines is:

test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]

The square brackets are not essential either, so a solution that either retains or removes them is fine. I'd prefer a sed or awk solution, but anything that works would be highly appreciated.

Best Answer

A Perl approach:

$ perl -lne 'next unless /^\s*\*\s*(\S+)/; $n=$1; @k=(/\{.+?:\s*\[(.+?)\]/g); print "$n @k"' file 
test_repo_one/xenial xenial-main_20190311 xenial-multiverse_20190311 xenial-restricted_20190311 xenial-universe_20190311
test_repo_one/xenial-security xenial-security-main_20190311 xenial-security-multiverse_20190311 xenial-security-restricted_20190311 xenial-security-universe_20190311
test_repo_two/trusty trusty-main_20190312 trusty-multiverse_20190312 trusty-restricted_20190312 trusty-universe_20190312

Explanation

perl -lne: read the input file line by line (-n), strip the trailing newline from each line (-l), and run the script given by -e on each line. The -l also appends a newline to each print call.
next unless /^\s*\*\s*(\S+)/; : find the name of the repo. The line must start with 0 or more whitespace characters (^\s*), then a * (\*), then 0 or more whitespace characters again; the stretch of non-whitespace characters that follows (\S+) is the repo name. If the line doesn't match this regex, next skips to the next line, which also drops the "Published repositories:" header.
$n=$1 : save what was captured by the match above (the (\S+) in parentheses, available as $1) as $n.
@k=(/\{.+?:\s*\[(.+?)\]/g): find every place where we have a {, then as few characters as possible up to a : (.+? is non-greedy), then optional whitespace and a [, and capture everything between the [ and the next ]. Save all captured strings in the array @k.
print "$n @k" : finally, print the name of the repo ($n) followed by the array @k from above. Inside double quotes, Perl joins the array elements with spaces.

If you prefer to have the square brackets included, you can use:

$ perl -lne 'next unless /^\s*\*\s*(\S+)/; $n=$1; @k=(/\{.+?:\s*(\[.+?\])/g); print "$n @k"' file 
test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
test_repo_one/xenial-security [xenial-security-main_20190311] [xenial-security-multiverse_20190311] [xenial-security-restricted_20190311] [xenial-security-universe_20190311]
test_repo_two/trusty [trusty-main_20190312] [trusty-multiverse_20190312] [trusty-restricted_20190312] [trusty-universe_20190312]
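
Since the question asked for sed or awk, the same extraction can probably be sketched in plain awk as well. This is only a sketch under a few assumptions: the repo name is the second whitespace-separated field of each ' *' line, and every snapshot name is the first bracketed token after a "{component:" prefix, as in the sample above. The function name extract_snapshots is hypothetical.

```shell
# extract_snapshots: hypothetical helper; reads the named files or stdin.
extract_snapshots() {
    awk '
    /^[[:space:]]*\*/ {
        out  = $2      # repo name, e.g. test_repo_one/xenial
        line = $0
        # Repeatedly look for "{component: [snapshot]" in what is left of the line
        while (match(line, /[{][^:]+: *\[[^]]+\]/)) {
            chunk = substr(line, RSTART, RLENGTH)
            # Keep only the "[snapshot]" part of the match
            out  = out " " substr(chunk, index(chunk, "["))
            line = substr(line, RSTART + RLENGTH)
        }
        print out
    }' "$@"
}
```

Called as extract_snapshots file, this should print the same bracketed lines as the second Perl command. Lines that do not start with a * (such as the header) are skipped because the pattern does not match them.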

Related Solutions

Ubuntu – Print text file every three line start at 2nd line

Something like this:

awk 'NR % 3 == 2'

Test

sh-3.2$ more test
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

Result

sh-3.2$ awk 'NR % 3 == 2' < test
2
5
8
11
14
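
The one-liner works because awk prints every line for which the pattern is true, and NR is the current line number: NR % 3 is 2 for lines 2, 5, 8 and so on. A slightly more general sketch, assuming you want every n-th line beginning at a given line, could look like this (the function name every_nth and the variable names n and start are hypothetical):

```shell
# every_nth N START: print every N-th line of stdin, beginning at line START.
every_nth() {
    awk -v n="$1" -v start="$2" 'NR >= start && (NR - start) % n == 0'
}
```

With the test file above, every_nth 3 2 < test should give the same five lines as the original command.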

Ubuntu – Parsing a file using text processing tools

With awk. The command below checks every field on every line and writes the non-zero fields to one file and the zero fields to another (out1 and out2 in my example). At the end of each input line, a newline is also written to both output files.

awk '{for(i=1;i<=NF;i++) {if($i!=0) {printf "%s ",$i > "out1"} else {printf "%s ",$i > "out2"}; if (i==NF) {printf "\n" > "out1"; printf "\n" > "out2"} }}' foo

Example

The input file

cat foo

1140.271257 0.002288454025 0.002763420728 0.004142512599 0 0 0 0 0 0 0 0 0 0 0 
1479.704769 0.00146621631 0.003190634646 0.003672029231 0 0 0 0 0 0 0 0 0 0 0 
1663.276205 0.003379552854 0.04643209167 0.0539399155 0 0 0 0 0 0 0 0 0 0 0 0

The command

awk '{for(i=1;i<=NF;i++) {if($i!=0) {printf "%s ",$i > "out1"} else {printf "%s ",$i > "out2"}; if (i==NF) {printf "\n" > "out1"; printf "\n" > "out2"} }}' foo

The output files

cat out1

1140.271257 0.002288454025 0.002763420728 0.004142512599 
1479.704769 0.00146621631 0.003190634646 0.003672029231 
1663.276205 0.003379552854 0.04643209167 0.0539399155

cat out2

0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0
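
For readability, the same idea can be laid out over several lines with the output file names passed in as variables. This is only a sketch: the function name split_zero_fields and its argument order are assumptions, and unlike the one-liner it also writes a newline for completely empty input lines.

```shell
# split_zero_fields INPUT NONZERO_OUT ZERO_OUT (hypothetical helper)
split_zero_fields() {
    awk -v nonzero="$2" -v zero="$3" '
    {
        for (i = 1; i <= NF; i++) {
            # Numeric comparison, so "0" and "0.0" both count as zero
            if ($i != 0) {
                printf "%s ", $i > nonzero
            } else {
                printf "%s ", $i > zero
            }
        }
        # End the current row in both output files
        printf "\n" > nonzero
        printf "\n" > zero
    }' "$1"
}
```

For the example above this would be called as split_zero_fields foo out1 out2.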
