Hash format and source
The format of the password hash is $<type>$<salt>$<hash>
, where <type>
5
is an SHA-256 based hash. The salt is usually at least 8 characters, (and is in the examples in the question) so the sixth character is part of the salt.
Those hashes are likely generated by a version of the shadow tool suite (src package shadow
in Debian, shadow-utils
in CentOS)
I tried to find out why, exactly, the code biases the slash. (thanks to @thrig for originally digging up the code.)
TLDR: It's a bit interesting, but doesn't matter.
The code generating the salt
In libmisc/salt.c
, we find the gensalt
function that calls l64a
in a loop:
strcat (salt, l64a (random()));
do {
strcat (salt, l64a (random()));
} while (strlen (salt) < salt_size);
The loop takes a random number from random()
, turns it into a piece of a string, and concatenates that to the string forming the salt. Repeat until enough characters are collected.
What happens in l64a
is more interesting though. The inner loop generates one character at a time from the input value (which came from random()
):
for (i = 0; value != 0 && i < 6; i++) {
digit = value & 0x3f;
if (digit < 2) {
*s = digit + '.';
} else if (digit < 12) {
*s = digit + '0' - 2;
} else if (digit < 38) {
*s = digit + 'A' - 12;
} else {
*s = digit + 'a' - 38;
}
value >>= 6;
s++;
}
The first line of the loop (digit = value & 0x3f
) picks six bits from the input value, and the if
clauses turn the value formed by those into a character. (.
for zero, /
for a one, 0
for a two, etc.)
l64a
takes a long
but the values output by random()
are limited to RAND_MAX
, which appears to be 2147483647 or 2^31 - 1 on glibc. So, the value that goes to l64a
is a random number of 31 bits. By taking 6 bits at a time or a 31 bit value, we get five reasonably evenly distributed characters, plus a sixth that only comes from one bit!
The last character generated by l64a
cannot be a .
, however, since the loop also has the condition value != 0
, and instead of a .
as sixth character, l64a
returns only five characters. Hence, half the time, the sixth character is a /
, and half the time l64a
returns five or fewer characters. In the latter case, a following l64a
can also generate a slash in the first positions, so in a full salt, the sixth character should be a slash a bit more than half the time.
The code also has a function to randomize the length of the salt, it's 8 to 16 bytes. The same bias for the slash character happens also with further calls to l64a
which would cause the 11th and 12th character to also have a slash more often than anything else. The 100 salts presented in the question have 46 slashes in the sixth position, and 13 and 15 in the 11th and 12th position, respectively. (a bit less than half of the salts are shorter than 11 characters).
On Debian
On Debian, I couldn't reproduce this with a straight chpasswd
as shown in the question. But chpasswd -c SHA256
shows the same behaviour. According to the manual, the default action, without -c
, is to let PAM handle the hashing, so apparently PAM on Debian at least uses a different code to generate the salt. I didn't look at the PAM code on any distribution, however.
(The previous version of this answer stated the effect didn't appear on Debian. That wasn't correct.)
Significance, and requirements for salts
Does it matter, though? As @RemcoGerlich commented, it's pretty much only a question of encoding. It will effectively fix some bits of the salt to zero, but it's likely that this will have no significant effect in this case, since the origin of those bits is this call to srandom
in seedRNG
:
srandom (tv.tv_sec ^ tv.tv_usec ^ getpid ());
This is a variant of ye olde custom of seeding an RNG with the current time. (tv_sec
and tv_usec
are the seconds and microseconds of the current time, getpid()
gives the process id if the running process.)
As the time and PIDs are not very unpredictable, the amount of randomness here is likely not larger than what the encoding can hold.
The time and PID is not something you'd like to create keys with, but might be unpredictable enough for salts. Salts must be distinct to prevent brute-force testing multiple password hashes with a single calculation, but should also be unpredictable, to prevent or slow down targeted precomputation, which could be used to shorten the time from getting the password hashes to getting the actual passwords.
Even with the slight issues, as long as the algorithm doesn't generate the same salt for different passwords, it should be fine. And it doesn't seem to, even when generating a couple dozen in a loop, as the list in the question shows.
Also, the code in question isn't used for anything but generating salts for passwords, so there are no implications about problems elsewhere.
For salts, see also, e.g. this on Stack Overflow and this on security.SE.
Conclusion
In conclusion, there's nothing wrong with your system. Making sure your passwords are any good, and not used on unrelated systems is more useful to think about.
Best Answer
Yes. You're correct. Each steps can be split in minor tasks as well, but you describe the overall algorithm.
Follow a couple of articles describing in details the login process. [1] [2]
Note that this is only about the plain password, not mentioning PAM system. [3]