I looked up a Unix book, the man and the wikipedia page for tr
but could not find a reason why it was designed/implemented in such way that it does not read from file but strictly only from the standard input. For instance, tools such as wc
, grep
, sed
, and awk
all will happily read input from file if provided or from standard input. Was/Is there a compelling reason to design tr
this way?
Why does the tr command not read from file
historytr
Related Solutions
Based on the error message that you get, I don't think /dev/urandom is the problem. If it were, I'd expect an error like "no such file or directory".
I searched for the error message you got and found this, which seems like it might be relevant to your issue: http://nerdbynature.de/s9y/2010/04/11/tr-Illegal-byte-sequence
Basically, specify the locale by prepending the tr
command with LC_CTYPE=C
:
LC_CTYPE=C tr -dc A-Za-z0-9_\!\@\#\$\%\^\&\*\(\)-+= < /dev/urandom | head -c 32 | xargs
The difference is only in the wording of the padding behavior in the V4-V5 manual - but the behavior is the same throughout. As it stands the results of the V5 implementation is
identical to that of the System V one, which is itself identical to the GNU tr
behavior with the --truncate-set1
option. Furthermore, "truncating set1 to the lenght of set2" gives the same result as "padding string2 with corresponding characters from string1". It means the same thing in practice. Let's demonstrate this.
First, you need not be a developer to try to compile this. Compare the source code with the almost identical PWB/Unix version. You will see the only difference being reliance on the "modern" stdio.h assets basically, so I've stripped the source of its references to inbuf
, fout
, dup
and flush
and replaced it with what PWB/Unix does - but this in no way should alter the behavior as the algorithms remain untouched. I've annotated the trivial changes I've made from the original:
#include <stdio.h> <------ added
int dflag = 0; <------ added "=" sign to those
int sflag = 0;
int cflag = 0;
int save = 0;
char code[256];
char squeez[256];
char vect[256];
struct string { int last, max, rep; char *p; } string1, string2;
FILE *input; <------ part of the stdio framework I guess;
main(argc,argv)
char **argv;
{
int i, j;
int c, d;
char *compl;
string1.last = string2.last = 0;
string1.max = string2.max = 0;
string1.rep = string2.rep = 0;
string1.p = string2.p = "";
if(--argc>0) {
argv++;
if(*argv[0]=='-'&&argv[0][1]!=0) {
while(*++argv[0])
switch(*argv[0]) {
case 'c':
cflag++;
continue;
case 'd':
dflag++;
continue;
case 's':
sflag++;
continue;
}
argc--;
argv++;
}
}
if(argc>0) string1.p = argv[0];
if(argc>1) string2.p = argv[1];
for(i=0; i<256; i++)
code[i] = vect[i] = 0;
if(cflag) {
while(c = next(&string1))
vect[c&0377] = 1;
j = 0;
for(i=1; i<256; i++)
if(vect[i]==0) vect[j++] = i;
vect[j] = 0;
compl = vect;
}
for(i=0; i<256; i++)
squeez[i] = 0;
for(;;){
if(cflag) c = *compl++;
else c = next(&string1);
if(c==0) break;
d = next(&string2);
if(d==0) d = c;
code[c&0377] = d;
squeez[d&0377] = 1;
}
while(d = next(&string2))
squeez[d&0377] = 1;
squeez[0] = 1;
for(i=0;i<256;i++) {
if(code[i]==0) code[i] = i;
else if(dflag) code[i] = 0;
}
input = stdin; <------ again stdio
while((c=getc(input)) != EOF ) { <------
if(c == 0) continue;
if(c = code[c&0377]&0377)
if(!sflag || c!=save || !squeez[c&0377])
putchar(save = c);
}
}
next(s)
struct string *s;
{
int a, b, c, n;
int base;
if(--s->rep > 0) return(s->last);
if(s->last < s->max) return(++s->last);
if(*s->p=='[') {
nextc(s);
s->last = a = nextc(s);
s->max = 0;
switch(nextc(s)) {
case '-':
b = nextc(s);
if(b<a || *s->p++!=']')
goto error;
s->max = b;
return(a);
case '*':
base = (*s->p=='0')?8:10;
n = 0;
while((c = *s->p)>='0' && c<'0'+base) {
n = base*n + c - '0';
s->p++;
}
if(*s->p++!=']') goto error;
if(n==0) n = 1000;
s->rep = n;
return(a);
default:
error:
write(1,"Bad string\n",11);
exit(0); <------original was exit();
}
}
return(nextc(s));
}
nextc(s)
struct string *s;
{
int c, i, n;
c = *s->p++;
if(c=='\\') {
i = n = 0;
while(i<3 && (c = *s->p)>='0' && c<='7') {
n = n*8 + c - '0';
i++;
s->p++;
}
if(i>0) c = n;
else c = *s->p++;
}
if(c==0) *--s->p = 0;
return(c&0377);
}
So cc tr.c
compiles:
tr.c: In function ‘next’:
tr.c:118:4: warning: incompatible implicit declaration of built-in function ‘exit’
[enabled by default]
exit(0);
^
But a.out is there and works, so let's now compare the padding behavior of the two programs we have:
GNU tr
#tr 0123456789 d
0123456789 input
dddddddddd output <----- BSD classic behavior
#tr 0123456789 d123456789 <----- padding set2 with set1 explicitly
0123456789 i
d123456789 o
01234567890123456789 i
d123456789d123456789 o
#tr -t 0123456789 d <----- --truncate-set1 i.e. System V behavior
0123456789 i
d123456789 o <----- concretely, this is what is meant by a result
0012 i where set2 was padded with set1
dd12 o
#tr -t 0123456789 d123456789 <----- padding set2 with set1 explicitly
0123456789 i
d123456789 o <----- note this is identical to the last results
Unix V5 tr + stdio mod
#./a.out 0123456789 d <----- our compiled version with the classic example
0123456789 i
d123456789 o
./a.out 0123456789 d123456789 <----- padding set2 with set1 explicitly
0123456789 i
d123456789 o
So our V5 version behaves exactly like the System V version in that respect. Furthermore explicitly padding set2 with set1 yields the same result for all implementations because it insures that set1 and set2 have the same number of elements (and it's when you don't have this that results vary historically).
Finally, explicitly padding or having tr
pad set2 with set1
as described in the original V4-V5 manuals means the same thing as truncating set1 to the length of set2
insofar as results are concerned - it IS the classic System V implementation for padding and yields the same results. V5 tr
is not a different implementation, despite the difference in the man pages.
Best Answer
The UNIX philosophy advocates for "small, sharp tools", so the answer is that reading from a file would be bloat contrary to the UNIX philosophy. As to why
wc
,grep
,sed
,awk
, etc do read from files, the answer is that they all have features that require more than one input or input selection or otherwise require direct access to the files. Astr
is not commonly used for those reasons you are left with one of the following forms to meet your needs;