Create a JSON file of all installed dpkg software

Tags: dpkg, json, package-management, text processing

I'm trying to collect all packages installed through dpkg in a JSON file.

I tried this script:

echo [ > installed_packages.json
dpkg-query -W -f '{"name":"${binary:Package}","version":"${Version}","short_description":"${binary:Summary}","description":"${Description}","author":"${Maintainer}","location":"${Filename}","status":{"want":"${db:Status-Want}","status":"${db:Status-Status}","eflag":"${db:Status-Eflag}"},"dependencies":"${Depends}","tags":"${Depends}"},\n' >> installed_packages.json
echo ] >> installed_packages.json

However, I quickly noticed that the placeholder values are not escaped for JSON, and that some fields (such as the dependencies) need further processing to be really useful.

So I was thinking about getting a plain list of package names with dpkg-query -W -f '${binary:Package}\n', iterating over it, and processing every field individually. However, I'm worried that roughly ten dpkg-query calls per package would severely hurt performance.

So how could I achieve this as portably as possible? (The script will become part of a monitoring tool that runs on many different machines. Support for other package managers will follow.)

EDIT:

Since the placeholders seem to be designed to be RFC 822 compliant (and other software, like apt-cache show <package>, produces RFC 822 compliant output anyway), I think an sh solution that turns RFC 822 into JSON would be ideal.

EDIT 2:

I just noticed that this would be nice, but sadly it doesn't make processing individual values any easier.

So something like converting RFC 822 into properly escaped variables is what could make everything work.
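A minimal sketch of that idea, written in Ruby rather than sh and assuming the input consists of RFC 822-style stanzas separated by blank lines (the layout printed by apt-cache show). The helper name parse_stanzas and the sample text are hypothetical. Each stanza becomes a hash, so individual values can be processed before the JSON library takes care of the escaping.

```ruby
require 'json'

# parse_stanzas is a hypothetical helper, not part of any library.
# It turns RFC 822-style stanzas into an array of field => value hashes.
def parse_stanzas(text)
  text.split(/\n{2,}/).map do |stanza|
    fields = {}
    current = nil
    stanza.each_line do |raw|
      line = raw.chomp
      if line =~ /\A[ \t]/ && current
        # Continuation line: append to the previous field.
        # A lone "." stands for an empty line inside a long description.
        cont = line.strip
        fields[current] << "\n" << (cont == '.' ? '' : cont)
      elsif (m = line.match(/\A([^:\s]+):\s*(.*)\z/))
        current = m[1]
        fields[current] = m[2].dup
      end
    end
    fields
  end.reject(&:empty?)
end

# Made-up sample input, for illustration only.
sample = <<SAMPLE
Package: demo
Version: 1.0-1
Description: short "quoted" summary
 First line of the long text.
 .
 Second paragraph.

Package: other
Version: 2.0
SAMPLE

# JSON.pretty_generate escapes quotes and newlines in the values.
puts JSON.pretty_generate(parse_stanzas(sample))
```

Because the whole input is read once and split in memory, this keeps the single-call behaviour; the parsed hashes can be filtered or reshaped before they are serialized.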

EDIT 3:

Repeatedly calling dpkg-query absolutely kills performance. With a single call, the script runs in far less than a second. Running it once for every package makes the script take well over 30 s at 100% CPU. That is not acceptable…

Best Answer

A few months ago I wrote a simple Ruby script to do a very similar job for our monitoring tool. I only needed the name and the version; for this answer I added short_description and author. Other fields may require more processing. It's a starting point that you can build on if you wish.

#!/usr/bin/env ruby

require 'open3'
# json provides JSON.pretty_generate, which is used for the output at the end
require 'json'

allpkgs = {}
# Edit this command to serve your own purposes
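# Fields are tab-separated: a summary may contain ';' but should not contain a tab
cmd = "dpkg-query -W -f='${binary:Package}\t${Version}\t${binary:Summary}\t${Maintainer}\n'"

dpkgout, stderr, status = Open3.capture3(cmd)
abort("dpkg-query failed: #{stderr}") unless status.success?

dpkgout.split("\n").each do |line|
  name, version, summary, maintainer = line.split("\t", 4)
  allpkgs[name] = { 'version' => version, 'short_description' => summary, 'author' => maintainer }
end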

# pretty-printed JSON; use 'puts JSON.generate(allpkgs)' for compact output
puts JSON.pretty_generate(allpkgs)
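
As an example of the extra processing another field might need, here is a rough sketch for the dependencies. It assumes the usual Depends syntax (comma-separated groups, "|" between alternatives, and an optional version constraint in parentheses) and ignores rarer forms such as architecture restrictions in square brackets. The helper name parse_depends and the output keys are my own choices, not anything dpkg defines.

```ruby
require 'json'

# parse_depends is a hypothetical helper. It turns a Depends value into
# nested arrays: the outer array holds the comma-separated groups (all
# required), each inner array holds the "|" alternatives (any one suffices).
DEP_PATTERN = /\A([^\s(:]+)(?::([^\s(]+))?\s*(?:\(\s*(<<|<=|=|>=|>>|<|>)\s*([^\s)]+)\s*\))?\z/

def parse_depends(value)
  value.to_s.split(',').map(&:strip).reject(&:empty?).map do |group|
    group.split('|').map do |alt|
      alt = alt.strip
      m = DEP_PATTERN.match(alt)
      # Fall back to the raw text if the entry uses syntax not covered here.
      next({ 'name' => alt }) unless m
      entry = { 'name' => m[1], 'arch' => m[2], 'relation' => m[3], 'version' => m[4] }
      entry.reject { |_, v| v.nil? }
    end
  end
end

# Made-up sample value, for illustration only.
puts JSON.pretty_generate(parse_depends('libc6 (>= 2.34), debconf (>= 0.5) | debconf-2.0'))
```

To use it with the script above, ${Depends} could be added as a further tab-separated field to the format string and the result of parse_depends stored under a 'dependencies' key.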