Ubuntu – Text processing Aptly output file

awksedtext processing

I have a text file made from the output of the repository management tool aptly, which lists my published repositories, from which I need to extract information.

The file format is as follows:

Published repositories:
 * test_repo_one/xenial [i386,amd64] publishes {main: [xenial-main_20190311]: Snapshot from mirror [xenial-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {multiverse: [xenial-multiverse_20190311]: Snapshot from mirror [xenial-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {restricted: [xenial-restricted_20190311]: Snapshot from mirror [xenial-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}, {universe: [xenial-universe_20190311]: Snapshot from mirror [xenial-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial [src]}
 * test_repo_one/xenial-security [i386,amd64] publishes {main: [xenial-security-main_20190311]: Snapshot from mirror [xenial-security-main]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {multiverse: [xenial-security-multiverse_20190311]: Snapshot from mirror [xenial-security-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {restricted: [xenial-security-restricted_20190311]: Snapshot from mirror [xenial-security-restricted]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}, {universe: [xenial-security-universe_20190311]: Snapshot from mirror [xenial-security-universe]: http//gb.archive.ubuntu.com/ubuntu/ xenial-security[src]}
 * test_repo_two/trusty [i386,amd64] publishes {main: [trusty-main_20190312]: Snapshot from mirror [trusty-main]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {multiverse: [trusty-multiverse_20190312]: Snapshot from mirror [trusty-multiverse]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {restricted: [trusty-restricted_20190312]: Snapshot from mirror [trusty-restricted]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}, {universe: [trusty-universe_20190312]: Snapshot from mirror [trusty-universe]: http//gb.archive.ubuntu.com/ubuntu/ trusty[src]}
...

The last line of the output ends in a new line.

The "Published repositories:" line is not required.

For each of the lines starting ' *' I need to remove extraneous information, leaving only snapshot names. There is no way to do this in aptly. The desired output for the first of these lines is.

test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]

The square brackets are not essential either so a solution that retains or removes these is fine. I'd prefer a sed or awk solution but anything that works would be highly appreciated.

Best Answer

A Perl approach:

$ perl -lne 'next unless /^\s*\*\s*(\S+)/; $n=$1; @k=(/\{.+?:\s*\[(.+?)\]/g); print "$n @k"' file 
test_repo_one/xenial xenial-main_20190311 xenial-multiverse_20190311 xenial-restricted_20190311 xenial-universe_20190311
test_repo_one/xenial-security xenial-security-main_20190311 xenial-security-multiverse_20190311 xenial-security-restricted_20190311 xenial-security-universe_20190311
test_repo_two/trusty trusty-main_20190312 trusty-multiverse_20190312 trusty-restricted_20190312 trusty-universe_20190312

Explanation

  • perl -lne: read the input file line by line (-n), remove trailing newlines (-l) and run the script given by -e on each line. The -l also adds a newline to each print call.
  • next unless /^\s*\*\s*(\S+)/; : find the name of the repo, so the first stretch of non-whitespace characters (\S+) on a line that starts with 0 or more whitespace characters (^\s*), then a * (\*), and 0 or more whitespace characters again. The longest stretch of non-whitespace after that is what we want. If this line doesn't match this regex, the next will move us onto the next line.
  • $n=$1 : save what was captured by the match above (the (\S+) in parentheses, $1) as $n.
  • @k=(/\{.+?:\s*\[(.+?)\]/g): find all cases where we have a {, any other characters and then a :, followed by whitespace and a [ and capture anything between the [ and the ]. Save all matching strings in the array @k.
  • print "$n @k" : finally, print the name of the repo, the $n, and the array @k from above.

If you prefer to have the square brackets included, you can use:

$ perl -lne 'next unless /^\s*\*\s*(\S+)/; $n=$1; @k=(/\{.+?:\s*(\[.+?\])/g); print "$n @k"' file 
test_repo_one/xenial [xenial-main_20190311] [xenial-multiverse_20190311] [xenial-restricted_20190311] [xenial-universe_20190311]
test_repo_one/xenial-security [xenial-security-main_20190311] [xenial-security-multiverse_20190311] [xenial-security-restricted_20190311] [xenial-security-universe_20190311]
test_repo_two/trusty [trusty-main_20190312] [trusty-multiverse_20190312] [trusty-restricted_20190312] [trusty-universe_20190312]