Shell – How stable are Unix shell “stdin/stdout APIs”?

api, command line, filter, shell, standard

Grepping, awking, sedding, and piping are part of the day-to-day routine of any user of a Unix-like operating system, whether on the command line or inside a shell script (collectively called filters from now on).

At their essence, when working with "standard" Unix CLI programs and shell builtins (collectively called commands from now on), filters need a precise expected format for stdin, stdout, and stderr in each filter step in order to work correctly. In the following, I call this precise expected format of a command the API of that command.

As someone with a web development background, I compare this kind of data collection and processing to web scraping, a technique that is very unstable whenever there is the slightest change in data presentation.

My question now relates to the stability of Unix command APIs.

  1. Do commands in Unix-like operating systems adhere to a formal standard with respect to their input and output?
  2. Have there been instances in history where an update to some important command broke a filter that was built using an older version of that command?
  3. Have Unix commands matured to the point that they can no longer change in a way that would break a filter?
  4. If filters may break from time to time due to changing command APIs, how can I as a developer protect my filters against this problem?

Best Answer

The POSIX 2008 standard has a volume called "Shell and Utilities" that specifies the shell and the standard utilities, including their options and, in many cases, their output formats. Generally, if you stick to that, your scripts should be fairly future-proof, except possibly for deprecations, but those hardly happen overnight, so you should have plenty of time to update your scripts.

In some cases where the output format of a single utility varies widely across platforms and versions, the POSIX standard includes an option, typically called -p or -P, that specifies a guaranteed and predictable output format. An example of this is the time utility, which has widely varying implementations. If you need a stable API/output format, you would use time -p.
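As a minimal sketch of what relying on time -p can look like in a script: with -p, POSIX fixes the report as three lines ("real", "user", "sys", each followed by a number of seconds) written to standard error. The helper names parse_real and elapsed_real are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# parse_real (hypothetical helper): read a `time -p` report on stdin and
# print the elapsed wall-clock seconds. With -p the report consists of the
# lines "real <seconds>", "user <seconds>" and "sys <seconds>".
parse_real() {
    awk '$1 == "real" { print $2 }'
}

# elapsed_real (hypothetical helper): run a command, discard its stdout,
# and print only the elapsed seconds. The report goes to stderr, so stderr
# is redirected into the pipe; anything the command itself writes to
# stderr ends up in the pipe as well.
elapsed_real() {
    { time -p "$@" >/dev/null; } 2>&1 | parse_real
}
```

For example, `elapsed_real sleep 1` should print a single number slightly above one second. Because the parser only looks for the line whose first field is "real", it does not depend on the locale-specific or shell-specific layout that plain time produces without -p.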

If you need to use a filter utility that is not covered by the POSIX standard, then you are pretty much at the mercy of the distribution packagers / upstream developers, just as you are at the mercy of the remote web developers when doing web scraping.
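One possible mitigation, which is an assumption on my part and not something the standard prescribes, is to validate a command's output before a filter consumes it, so that a format change fails loudly instead of silently producing wrong data. The sketch below uses only POSIX grep options (-E, -v, -q); the helper name expect_format is hypothetical.

```shell
#!/bin/sh
# expect_format (hypothetical helper): pass stdin through unchanged only if
# every line matches the extended regular expression given as $1.
# Otherwise print a diagnostic to stderr and return a non-zero status.
# Note: plain sh has no local variables, so `input` is a global here.
expect_format() {
    input=$(cat)
    # grep -v selects lines that do NOT match; -q only sets the exit status.
    if printf '%s\n' "$input" | grep -Evq -- "$1"; then
        echo "expect_format: output does not match '$1'" >&2
        return 1
    fi
    printf '%s\n' "$input"
}
```

A usage along the lines of `count=$(some_tool | expect_format '^[0-9]+$') || exit 1` (where some_tool stands for any utility outside POSIX) stops the script as soon as the tool's output no longer looks like a plain number. Empty output is also rejected by this pattern, which is usually what you want for a value you intend to compute with.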
