Shell – Why are POSIX mandatory utilities not built into the shell

posix shell shell-builtin shell-script utilities

The purpose of this question is to answer a curiosity, not to solve a particular computing problem. The question is: Why are POSIX mandatory utilities not commonly built into shell implementations?

For example, I have a script that basically reads a few small text files and checks that they are properly formatted, but it takes 27 seconds to run on my machine because of a significant amount of string manipulation. This string manipulation spawns thousands of new processes by calling various utilities, hence the slowness. I am pretty confident that if some of the utilities were built in, namely grep, sed, cut, tr, and expr, then the script would run in a second or less (based on my experience in C).
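To make the cost concrete, here is a minimal sketch (not the actual script from the question, which is not shown) of how some simple uses of cut, sed, expr, and grep can be replaced by POSIX parameter expansion, shell arithmetic, and `case`, all of which run inside the shell without creating a process. The sample line and all variable names are hypothetical, and this only covers simple cases; it is not a general replacement for those utilities.

```shell
#!/bin/sh
# Hypothetical input line; the real files from the question are not shown.
line='name=example.conf'

# External version: each command substitution forks at least one process.
key_ext=$(printf '%s\n' "$line" | cut -d= -f1)
val_ext=$(printf '%s\n' "$line" | cut -d= -f2)
len_ext=$(expr "$val_ext" : '.*')

# In-shell version: POSIX parameter expansion and arithmetic, no new process.
key=${line%%=*}        # remove the longest suffix starting at the first '='
val=${line#*=}         # remove the shortest prefix up to and including the first '='
base=${val%.conf}      # roughly: sed 's/\.conf$//'
len=${#val}            # roughly: expr "$val" : '.*'
next=$((len + 1))      # roughly: expr "$len" + 1

# Pattern test without grep: case does glob matching inside the shell.
case $line in
    *=*) format=ok ;;
    *)   format=bad ;;
esac

printf '%s %s %s %s %s %s\n' "$key" "$val" "$base" "$len" "$next" "$format"
```

With the sample line above this prints `name example.conf example 12 13 ok`. Whether rewriting along these lines would bring a 27-second script down to a second depends on how much of the work fits these simple patterns.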

It seems there would be a lot of situations where building these utilities in would make the difference between whether or not a solution in shell script has acceptable performance.

Obviously, there is a reason it was chosen not to make these utilities built in. Maybe having one version of a utility at a system level avoids having multiple unequal versions of that utility being used by various shells. I really can't think of many other reasons to keep the overhead of creating so many new processes, and POSIX defines enough about the utilities that it does not seem like much of a problem to have different implementations, so long as they are each POSIX compliant. At least not as big a problem as the inefficiency of having so many processes.

Best Answer

Shell scripts are not expected to run at that kind of speed. If you want to speed up your script, try rewriting it in Perl. If that is still too slow, then you'll have to move to a statically typed language such as Java or C, or write a C module for Perl that handles the parts which are too slow.

Shell is the first level of prototyping. If you can prove the concept in shell, then move to a better scripting language that can do more bounds checking, which would take acres of code in shell.

A Unix OS is expected to include many small programs, each doing a well-defined task, which together make up a larger picture. This is a good thing, as it compartmentalises bigger programs. Take a look at qmail, for example, and compare it with sendmail. qmail is made of many programs:

http://www.nrg4u.com/qmail/the-big-qmail-picture-103-p1.gif

Exploiting the network daemon would not help you exploit the queue manager.