I'm building a resource that references man pages, and I'm wondering if anyone knows of a way to access man pages as structured data. My current approach is to do a lot of regexing, but this is tedious and prone to errors.
I'm not an expert on *nix, but what I understand about man pages is that they are basically text files with a particular syntax that is parsable by the man command. This makes me a little skeptical that there is an easy way to, say, access a list of the options or flags. But maybe there's a way to do it that I don't know.
Best Answer
You might peek at how the fish shell builds its completions from the man pages, in particular how __fish_complete_man works. An easier option, assuming groff is available, might be to emit HTML and then use one of the multitude of HTML parsers out there to get what you want: render the man page as HTML, then select on it with XPath to obtain the list of flags in the SYNOPSIS section; using CSS selectors might be more hip these days. However, the HTML generated is not very structured.
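A rough Python sketch of the HTML route, for illustration only. It assumes groff is installed with its HTML output device, that section headings come out as h2 elements and flags as bold text, and that option dashes may be emitted as the Unicode minus sign; check this against the HTML your groff actually produces. The names SectionFlagParser, extract_flags, and render_man_html are made up for this example, and it uses the standard library's html.parser rather than XPath.

```python
import gzip
import subprocess
from html.parser import HTMLParser


class SectionFlagParser(HTMLParser):
    """Collect bold text that looks like a flag inside one named section."""

    def __init__(self, section="SYNOPSIS"):
        super().__init__(convert_charrefs=True)
        self.section = section
        self.in_heading = False
        self.in_bold = False
        self.heading_text = ""
        self.current_section = None
        self.flags = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":  # assumption: groff renders section titles as <h2>
            self.in_heading = True
            self.heading_text = ""
        elif tag == "b":
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_heading = False
            self.current_section = self.heading_text.strip()
        elif tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        if self.in_heading:
            self.heading_text += data
        elif self.in_bold and self.current_section == self.section:
            # Assumption: dashes may arrive as U+2212 (minus sign).
            text = data.replace("\u2212", "-").strip()
            if text.startswith("-") and text not in self.flags:
                self.flags.append(text)


def extract_flags(html, section="SYNOPSIS"):
    """Return the flag-like bold strings found in the given section."""
    parser = SectionFlagParser(section)
    parser.feed(html)
    parser.close()
    return parser.flags


def render_man_html(path):
    """Render a man page source file (optionally gzipped) to HTML via groff."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        source = fh.read()
    result = subprocess.run(
        ["groff", "-mandoc", "-Thtml"],
        input=source, capture_output=True, check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Something like extract_flags(render_man_html(path)) with the path to a page's source file would then give the flags. Pages whose SYNOPSIS only says [OPTION]... will yield an empty list, in which case passing a different section name, such as section="DESCRIPTION" or "OPTIONS", may work better.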