Grepping, awking, sedding, and piping are the day-to-day routine of any user of a Unix-like operating system, be it on the command line or inside a shell script. I will collectively call such command chains filters from now on.
In essence, when working with "standard" Unix CLI programs and shell builtins (collectively called commands from now on), a filter works correctly only if every step receives and produces a precisely expected format on stdin, stdout, and stderr. In the following, I call this expected format of a command the API of that command.
As someone with a web development background, I see this kind of data collection and processing as technically comparable to web scraping, a technique that becomes very unstable whenever there is the slightest change in data presentation.
My question now relates to the stability of Unix command APIs.
- Do commands in Unix-like operating systems adhere to a formal standard with respect to their input and output?
- Have there been instances in history where an update to some important command broke a filter that was built using an older version of that command?
- Have Unix commands matured over time to the point that they can no longer change in a way that would break some filter?
- In case filters may break from time to time due to changing command APIs, how can I as a developer protect my filters against this problem?
Best Answer
The POSIX 2008 standard has a section describing "Shell and Utilities". Generally, if you stick to that, your scripts should be fairly future-proof, except possibly for deprecations. Those hardly happen overnight, though, so you should have plenty of time to update your scripts.
In some cases where the output format of a single utility varies widely across platforms and versions, the POSIX standard may include an option, typically called `-p` or `-P`, which specifies a guaranteed and predictable output format. An example of this is the `time` utility, which has widely varying implementations. If you need a stable API/output format, you would use `time -p`.
If you need to use a filter utility that is not covered by the POSIX standard, then you are pretty much at the mercy of the distribution packagers / upstream developers, just as you are at the mercy of the remote web developers when doing web scraping.
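To make the portable-format options mentioned above concrete, here is a minimal sketch of two filters that rely only on POSIX-specified output. The function names `elapsed_seconds` and `available_kib` are made up for illustration. The second function uses `df -P`, which is not mentioned above but is another POSIX utility with a portable-output option. Both assume a POSIX `awk`, and the `df` parsing assumes the filesystem name contains no whitespace.

```shell
#!/bin/sh
# Sketch only: both helper names below are hypothetical.

# Print the wall-clock seconds a command took, parsed from `time -p`.
# POSIX specifies that `time -p` writes "real", "user" and "sys" lines
# to stderr, so the "real" line can be selected by its first field.
elapsed_seconds() {
    # The command runs in a child sh so that its own output can be
    # discarded without also discarding the timing report.
    { time -p sh -c '"$@" >/dev/null 2>&1' sh "$@"; } 2>&1 |
        awk '$1 == "real" { print $2 }'
}

# Print the free space, in 1024-byte units, of the filesystem holding
# the given path. With -P, POSIX guarantees one header line followed by
# one line per filesystem; the fourth field is the free space, and -k
# selects 1024-byte units.
available_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

available_kib /
```

Calling `elapsed_seconds sleep 1` would be the equivalent usage for the timing helper. Selecting fields by name or by position in a format the standard pins down is what keeps such filters from depending on one implementation's default layout.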