Why Bash Shell Doesn’t Warn About Arithmetic Overflow

arithmetic, bash, shell

There are limits to the arithmetic evaluation capabilities of the bash shell. The manual is succinct about this aspect of shell arithmetic, stating:

Evaluation is done in fixed-width integers with no check for overflow,
though division by 0 is trapped and flagged as an error. The operators
and their precedence, associativity, and values are the same as in the
C language.

Which fixed-width integer this refers to really comes down to which data type is used (the specifics of why are beyond the scope of this question), but the limit value is expressed in /usr/include/limits.h in this fashion:

#  if __WORDSIZE == 64
#   define ULONG_MAX     18446744073709551615UL
#  ifdef __USE_ISOC99
#  define LLONG_MAX       9223372036854775807LL
#  define ULLONG_MAX    18446744073709551615ULL

And once you know that, you can confirm it like so:

# getconf -a | grep -i 'long'
LONG_BIT                           64
ULONG_MAX                          18446744073709551615

This is a 64-bit integer, and it translates directly into the shell's arithmetic evaluation:

# echo $(((2**63)-1)); echo $((2**63)); echo $(((2**63)+1)); echo $((2**64))
9223372036854775807        //the practical usable limit for your everyday use
-9223372036854775808       //you're that much "away" from 2^64
-9223372036854775807     
0
# echo $((9223372036854775808+9223372036854775807))
-1

So between 2^63 and 2^64-1, you get negative integers showing you how far off from ULONG_MAX you are¹. When the evaluation reaches that limit and overflows, whatever the magnitude, you get no warning, and that part of the evaluation is reset to 0, which may yield some unusual behavior with something like right-associative exponentiation, for instance:

echo $((6**6**6))                      0   // 6^46656 overflows to 0
echo $((6**6**6**6))                   1   // 6^(6^46656) = 6^0 = 1
echo $((6**6**6**6**6))                6   // 6^(6^(6^46656)) = 6^(6^0) = 6^1 = 6
echo $((6**6**6**6**6**6))         46656   // 6^(6^(6^(6^46656))) = 6^(6^1) = 6^6 = 46656
echo $((6**6**6**6**6**6**6))          0   // 6^(6^(6^(6^(6^46656)))) = 6^(6^6) = 6^46656, overflows to 0
...
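
A quick sanity check on those zeros with an arbitrary-precision tool (bc, which I come back to below), assuming the overflow simply keeps the low 64 bits of the true result: the exact value of 6^46656 contains the factor 2^46656, so it is a multiple of 2^64 and its low 64 bits really are all zero:

# echo '(6^46656) % (2^64)' | bc
0

After that, each longer chain just collapses through 6^0 = 1, 6^1 = 6, 6^6 = 46656, and so on, as annotated above.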

Using sh -c 'command' doesn't change anything, so I have to assume this is normal, standards-compliant output. Now that I think I have a basic but concrete understanding of the arithmetic range and limits and of what they mean for expression evaluation in the shell, I thought I would quickly peek at what data types other software on Linux uses. I used some bash sources I had on hand to complement the input of this command:

{ shopt -s globstar; for i in /path/to/source_bash-4.2/include/**/*.h /usr/include/**/*.h; do grep -HE '\b(([UL])|(UL)|())LONG|\bFLOAT|\bDOUBLE|\bINT' "$i"; done; } | grep -iE 'bash.*max'

bash-4.2/include/typemax.h:#    define LLONG_MAX   TYPE_MAXIMUM(long long int)
bash-4.2/include/typemax.h:#    define ULLONG_MAX  TYPE_MAXIMUM(unsigned long long int)
bash-4.2/include/typemax.h:#    define INT_MAX     TYPE_MAXIMUM(int)

There's more output (the surrounding #if lines, for instance), and I could run the same search for a command like awk too. I notice that the regular expression I used doesn't catch anything for the arbitrary-precision tools I have, such as bc and dc.
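
Those tools are not bound by the fixed-width limit at all, which is easy to see side by side (the shell wraps to 0 where bc keeps going):

# echo $((2**64)); echo '2^64' | bc
0
18446744073709551616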


Questions

  1. What is the rationale for not warning you (like awk does when evaluating 2^1024) when your arithmetic evaluation overflows? Why are negative integers exposed to the end user for evaluations that land between 2^63 and 2^64-1?
  2. I have read somewhere that some flavors of UNIX can interactively change ULONG_MAX. Has anyone heard of this?
  3. If someone arbitrarily changes the value of the unsigned integer maximum in limits.h, then recompiles bash, what can we expect will happen?

Note

1. I wanted to illustrate more clearly what I saw, as it is very simple empirical stuff. What I noticed is that:

  • (a) Any evaluation whose result is <= 2^63-1 is correct.
  • (b) Any evaluation whose result is >= 2^63, up to 2^64, gives a negative
    integer:

    • The range of that integer is x to y, where x = -9223372036854775808 and y = 0.

Considering this, an evaluation like (b) can be expressed as (2^63-1) plus something within x..y. For instance, if we literally ask the shell to evaluate (2^63-1) + 100002 (it could be any suitably small number), we get -9223372036854675807. I'm just stating the obvious I guess, but this also means that the two following expressions:

  • (2^63-1) + 100002, AND;
  • (2^63-1) + (LLONG_MAX - {what the shell gives us for (2^63-1) + 100002,
    which is -9223372036854675807}), which, using positive values, is:

    • (2^63-1) + (9223372036854775807 - 9223372036854675807 = 100000)
    • = 9223372036854775807 + 100000

are very close indeed. The second expression is "2" apart from (2^63-1) + 100002, i.e. what we're evaluating. This is what I mean by "you get negative integers showing you how far off from 2^64 you are". With those negative integers and knowledge of the limits, you cannot finish the evaluation within the x..y range in the bash shell, but you can elsewhere: the data is usable up to 2^64 in that sense (I could add it up on paper or use it in bc). Beyond that, however, the behavior is similar to that of 6^6^6, as the limit is reached in the way described above in the question.
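
To make the note concrete, the reconstruction can be done mechanically: for a true result between 2^63 and 2^64-1, what the shell prints is simply that result minus 2^64, which bc confirms:

# echo $(( (2**63 - 1) + 100002 ))
-9223372036854675807
# echo '(2^63 - 1) + 100002 - 2^64' | bc
-9223372036854675807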

Best Answer

So between 2^63 and 2^64-1, you get negative integers showing you how far off from ULONG_MAX you are.

No. How do you figure that? By your own example, the max is:

> max=$((2**63 - 1)); echo $max
9223372036854775807

If "overflow" meant "you get negative integers showing you how far off from ULONG_MAX you are", then if we add one to that, shouldn't we get -1? But instead:

> echo $(($max + 1))
-9223372036854775808

Perhaps you mean this is a number you can add to $max to get a negative difference, since:

> echo $(($max + 1 + $max))
-1

But this does not in fact continue to hold true:

> echo $(($max + 2 + $max))
0

This is because the system uses two's complement to implement signed integers.1 The value resulting from an overflow is NOT an attempt to provide you with a difference, a negative difference, etc. It is literally the result of truncating a value to a limited number of bits, then having it interpreted as a two's complement signed integer. For example, the reason $(($max + 1 + $max)) comes out as -1 is because the highest value in two's complement has all bits set except the highest bit (which indicates negative); adding two of those together effectively shifts everything one bit to the left, so you end up with (if the size were 16 bits, and not 64):

11111111 11111110

The high (sign) bit is now set because it carried over in the addition. If you add one more (00000000 00000001) to that, you then have all bits set, which in two's complement is -1.
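
You can see that bit pattern directly from the shell, since bash's printf builtin will reinterpret the signed result in hex or as an unsigned value (just an illustration of the reinterpretation; the arithmetic expansion itself never exposes this):

> printf '%x\n' $(($max + 1 + $max))
ffffffffffffffff
> printf '%u\n' $(($max + 1 + $max))
18446744073709551615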

I think that partially answers the second half of your first question -- "Why are the negative integers...exposed to the end user?". First, because that is the correct value according to the rules of 64-bit two's complement numbers. This is the conventional practice of most (other) general purpose high level programming languages (I cannot think of one that does not do this), so bash is adhering to convention. Which is also the answer to the first part of the first question -- "What's the rationale?": this is the norm in the specification of programming languages.

WRT the 2nd question, I have not heard of systems which interactively change ULONG_MAX.

If someone arbitrarily changes the value of the unsigned integer maximum in limits.h, then recompiles bash, what can we expect will happen?

It would not make any difference to how the arithmetic comes out, because this is not an arbitrary value that is used to configure the system -- it's a convenience value that stores an immutable constant reflecting the hardware. By analogy, you could redefine c to be 55 mph, but the speed of light will still be 186,000 miles per second. c is not a number used to configure the universe -- it's a deduction about the nature of the universe.

ULONG_MAX is exactly the same. It is deduced/calculated based on the nature of N-bit numbers. Changing it in limits.h would be a very bad idea if that constant is used somewhere assuming it is supposed to represent the reality of the system.
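
Put differently, ULONG_MAX for an N-bit word is just 2^N - 1, so as a quick sketch (leaning on getconf and bc, both already used above) you can rederive it instead of looking it up:

> echo "2^$(getconf LONG_BIT) - 1" | bc
18446744073709551615

which matches the ULONG_MAX value that getconf reported in the question.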

And you cannot change the reality imposed by your hardware.


1. I don't think that this (the means of integer representation) is actually guaranteed by bash, since it depends on the underlying C implementation and hardware, and standard C does not guarantee it. However, this is what is used on most normal modern computers.