Undefined Behavior Is Really Undefined
Undefined Behavior is the expression used in C and C++ to describe a situation in which, basically, “anything goes”. Historically, UB covered cases where existing C compilers (and architectures) acted in irreconcilable ways, and the standard committee, in its infinite wisdom, decided not to decide (that is, not to rule out any of the competing implementations). UB also covered scenarios that the standard, while usually quite exhaustive, had failed to anticipate with a prescribed behavior. A third, and increasingly important, meaning of UB is the following: UB represents opportunities for optimizations. And C and C++ developers really love optimizations; they demand with great insistence that compilers do their best to make the code fast.
Here is a classic example. Consider the following code:
void
foo(double *src, int *dst)
{
    int i;

    for (i = 0; i < 4; i ++) {
        dst[i] = (int)src[i];
    }
}
Compile this code on a 64-bit x86 platform running Linux, with GCC (an up-to-date Ubuntu 18.04; the GCC version is 7.3.0). We want full optimization, and then a look at the assembly output, hence the options “-W -Wall -O9 -S” (the “-O9” argument selects the maximum optimization level of GCC, which is in practice equivalent to “-O3”, although there have been some GCC forks that defined higher levels). Here is the result:
        .file   "zap.c"
        .text
        .p2align 4,,15
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        movupd  (%rdi), %xmm0
        movupd  16(%rdi), %xmm1
        cvttpd2dq       %xmm0, %xmm0
        cvttpd2dq       %xmm1, %xmm1
        punpcklqdq      %xmm1, %xmm0
        movups  %xmm0, (%rsi)
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .ident  "GCC: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0"
        .section        .note.GNU-stack,"",@progbits
Each of the first two movupd opcodes reads two double values into a 128-bit SSE2 register (a double is 64 bits; an SSE2 register can contain two of them). In other words, the four source values are read, and only then do the conversions to int occur (the cvttpd2dq opcodes). The punpcklqdq opcode packs the four resulting 32-bit integers into a single SSE2 register (%xmm0), which is then written into RAM (movups). Notice the important bit: this C program nominally requests the following sequence of accesses to RAM:
- Read the first double value at src[0].
- Write the first int value into dst[0].
- Read the second double value at src[1].
- Write the second int value into dst[1].
- Read the third double value at src[2].
- Write the third int value into dst[2].
- Read the fourth double value at src[3].
- Write the fourth int value into dst[3].
However, what the C program asks for is only defined within the context of the abstract machine specified by the C standard; how things happen on the physical machine may differ. The compiler is free to reorder or modify the operations, as long as the result still matches the semantics of the abstract machine (this is known as the “as-if rule”). Notably, in this case, the compiled code does things in a different order:
- Read the first double value at src[0].
- Read the second double value at src[1].
- Read the third double value at src[2].
- Read the fourth double value at src[3].
- Write the first int value into dst[0].
- Write the second int value into dst[1].
- Write the third int value into dst[2].
- Write the fourth int value into dst[3].
This is C: everything in memory is ultimately bytes (i.e. unsigned char slots, in practice octets), and arbitrary pointer arithmetic is supported. In particular, the src and dst pointers might be made by the caller to point to memory areas that overlap in some way (this is known as “aliasing”). Thus, the order of reads and writes may matter, in case some bytes are written to, then read back again. To fully conform to the abstract behavior defined by the C program, the C compiler would have to interleave the reads and writes, and make the code do a full round-trip to RAM at each loop iteration. The resulting code would be larger, and much slower. C developers would wail and whine.
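To make the hazard concrete, here is a hypothetical caller (my own sketch, not part of the original example) that makes src and dst overlap. In the nominal interleaved order, the write to dst[0] would land on bytes of src[1] before src[1] is read; in the vectorized order, it would not:

#include <string.h>

/* foo() is the conversion function shown above. */
void foo(double *src, int *dst);

void
overlap_demo(void)
{
    double buf[4];
    unsigned char *p = (unsigned char *)buf;

    memset(buf, 0, sizeof buf);
    /* dst starts 8 bytes into the same storage as src, so dst[0]
     * shares bytes with src[1]. This call breaks the strict aliasing
     * rule described below, which is exactly why the compiler is
     * allowed to ignore the possibility. */
    foo(buf, (int *)(p + 8));
}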
Fortunately, undefined behavior comes to the rescue. The C standard specifies that values “cannot” be accessed through pointers that do not match the effective type of the value; in plain words, that if a value is written into dst[0], with dst being a pointer to int, then the corresponding bytes cannot be read back through src[1], where src is a pointer to double, because that would be accessing what has become an int value through a pointer to an incompatible type. Such an access would imply undefined behavior. In the ISO 9899:1999 standard (aka “C99”), this is specified in section 6.5, subclause 7 (in the more recent revision 9899:2018, aka “C17”, the wording is unchanged); this is also known as strict aliasing. The consequence here is that the C compiler is allowed to assume that accesses which would imply UB by breaking the aliasing rules do not happen; thus, the compiler can reorder the reads and writes in any way it wants, since they should not overlap. This is how the code gets optimized.
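Note that strict aliasing does not forbid inspecting the bytes of an object altogether; it constrains how you may do it. Here is a minimal sketch of my own (assuming, as on the platforms discussed here, that float and uint32_t are both 32 bits) contrasting the UB way with the sanctioned way:

#include <stdint.h>
#include <string.h>

/* UB: accesses a float object through a pointer to an incompatible
 * type, in violation of the effective type rule (C99 6.5, subclause 7). */
uint32_t
bits_ub(const float *f)
{
    return *(const uint32_t *)f;
}

/* Well-defined: memcpy() copies the object representation byte by
 * byte; mainstream compilers optimize this to a single register move. */
uint32_t
bits_ok(const float *f)
{
    uint32_t u;

    memcpy(&u, f, sizeof u);
    return u;
}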
This is in a nutshell what UB means: the compiler can assume that UB does not happen, and produce code under that assumption. In the case of strict aliasing rules, UB allows for important optimizations, which would be hard to obtain otherwise as long as aliasing could occur. More generally, in the code generation routines used by the compiler, each instruction has dependencies that restrict the opcode scheduling algorithm: an instruction cannot be issued before the instructions it depends upon, or after the instructions that depend on it. In the example above, UB removes the dependencies between the writes to dst[] and the “subsequent” reads from src[]: such a dependency can exist only if a UB-implying access exists. In a similar way, the notion of UB can allow the compiler to simply remove code that cannot be reached without going through a UB condition.
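A classic illustration of that removal effect (a minimal sketch of my own, involving a NULL pointer rather than aliasing) is the following:

/* Since p is dereferenced on the first line, the compiler may assume
 * that p is not NULL (otherwise UB already occurred); the test below
 * is then "always false" and may be deleted, along with its branch. */
int
first_byte(const char *p)
{
    int c = *p;

    if (p == NULL) {
        return -1;
    }
    return c;
}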
This is all well and dandy, but the same behavior is also occasionally felt as the compiler backstabbing the developer. A commonly encountered comment is that “the compiler uses UB as a pretext to crash my code”. Suppose that somebody is writing code that adds integer values together, and fears overflows; such things have happened in Bitcoin. The developer might think along the following lines: the CPU uses two’s complement representation of integers, and therefore, if an overflow occurs, this will be because the result is truncated to the type width, e.g. 32 bits. This means that the result of an overflow is predictable, and can be tested. Our developer will write this:
#include <stdio.h>
#include <stdlib.h>

int
add(int x, int y, int *z)
{
    int r = x + y;

    if (x > 0 && y > 0 && r < x) {
        return 0;
    }
    if (x < 0 && y < 0 && r > x) {
        return 0;
    }
    *z = r;
    return 1;
}

int
main(int argc, char *argv[])
{
    int x, y, z;

    if (argc != 3) {
        return EXIT_FAILURE;
    }
    x = atoi(argv[1]);
    y = atoi(argv[2]);
    if (add(x, y, &z)) {
        printf("%d\n", z);
    } else {
        printf("overflow!\n");
    }
    return 0;
}
Now let’s try to compile that with GCC:
$ gcc -W -Wall -O9 testadd.c
$ ./a.out 17 42
59
$ ./a.out 2000000000 1500000000
overflow!
OK, this seems to work. Let’s try again with another compiler, Clang (version 6.0.0 on my system):
$ clang -W -Wall -O3 testadd.c
$ ./a.out 17 42
59
$ ./a.out 2000000000 1500000000
-794967296
Wut?
It turns out that when an operation on signed integer types has a result which is not representable in the target type, we enter UB territory. But the compiler can assume that UB does not happen. In particular, when optimizing the expression “x > 0 && y > 0 && r < x”, the compiler infers that since x and y are strictly positive, the third test cannot be true (their sum cannot be lower than either), and the whole thing can be skipped. In other words, since the overflow is UB, it “cannot happen” from the compiler’s perspective, and all instructions that depend upon that condition can be removed. The overflow detection mechanism has simply disappeared.
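The robust fix is to test the operands before performing the addition, so that no UB is ever exercised and there is nothing for the compiler to “optimize” away. A sketch of a corrected add(), using the standard INT_MAX and INT_MIN limits:

#include <limits.h>

int
add(int x, int y, int *z)
{
    /* Reject the operation if x + y would fall outside of
     * [INT_MIN, INT_MAX]; these comparisons cannot themselves
     * overflow. */
    if ((y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y)) {
        return 0;
    }
    *z = x + y;
    return 1;
}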
The assumption that computations use “wraparound semantics” (as the CPU opcodes actually do) has never been standard for signed types; it was merely traditional, at a time when compilers were not smart enough to optimize code based on range analysis. It is possible to force Clang and GCC to apply wraparound semantics to signed types with the special flag -fwrapv (on Microsoft Visual C, you might use -d2UndefIntOverflow-, as described in this blog entry). This is however relatively fragile: the flag may be forgotten when the code is reused in another project or on another architecture.
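For reference, recompiling the test program above with that flag restores the intended behavior: with wraparound guaranteed, the truncated sum really is lower than x, and the detection code must be kept (the output below is what the guaranteed semantics mandate):

$ clang -W -Wall -O3 -fwrapv testadd.c
$ ./a.out 2000000000 1500000000
overflow!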
That overflows on signed types imply UB is not well-known. In the C99 and C17 standards, it is described in section 6.5, subclause 5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
However, for unsigned types, modular semantics are guaranteed. Section 6.2.5, subclause 9, specifies that:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
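This guarantee is easy to demonstrate; the following snippet prints 0 on every conforming implementation, with no UB involved:

#include <limits.h>
#include <stdio.h>

int
main(void)
{
    unsigned int x = UINT_MAX;

    /* Defined behavior: the sum is reduced modulo UINT_MAX + 1. */
    printf("%u\n", x + 1u);
    return 0;
}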
Another example of UB with signed types happens with divisions. It is well-known that division by zero is not mathematically defined, and thus dividing by zero leads to undefined behavior, as per the standard. On an x86 CPU, if the idiv opcode is invoked with a denominator equal to zero, a CPU exception occurs. CPU exceptions are like interrupt requests; they are handled by the operating system. On Unix-like systems such as Linux, the CPU exception triggered by idiv is translated into a SIGFPE signal delivered to the process, and the default handler kills the process (never mind that “FPE” means “floating-point exception”, while idiv works with integers). However, there is another situation that also implies UB. Consider this code:
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
    int x, y;

    if (argc != 3) {
        return EXIT_FAILURE;
    }
    x = atoi(argv[1]);
    y = atoi(argv[2]);
    printf("%d\n", x / y);
    return 0;
}
And then try it:
$ gcc -W -Wall -O testdiv.c
$ ./a.out 42 17
2
$ ./a.out -2147483648 -1
zsh: floating point exception (core dumped) ./a.out -2147483648 -1
Indeed, on that machine (still my usual x86 running Linux), the int type can represent values from -2147483648 to +2147483647. When dividing -2147483648 by -1, the result should be +2147483648, but that does not fit in an int. Therefore, the behavior is undefined. Anything can happen. In this case, the process dies. On other systems, you could get a different outcome, especially on small CPUs that do not have a division opcode; on such architectures, division is done in software, with a routine normally provided by the compiler, and that routine may do whatever it wants on UB, because that’s exactly what UB means.
As a side note, a SIGFPE can also be obtained under the same conditions with the modulo operator (%). Indeed, internally, the same idiv opcode is used to compute both the quotient and the remainder, hence the same CPU exception is triggered. Interestingly, as per the C99 standard, the “INT_MIN % -1” expression would not have invoked UB, because the mathematical result is well defined (it is zero) and it certainly fits in the representable range of the target type. In the C17 standard, the wording of section 6.5.5, subclause 6, was augmented to cover that case and align the standard with the reality of common hardware platforms.
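A division routine that wants to avoid UB must therefore guard against both conditions; a minimal sketch:

#include <limits.h>

/* Returns 0 (failure) instead of triggering UB on a zero divisor
 * or on the INT_MIN / -1 overflow case. */
int
div_checked(int x, int y, int *q)
{
    if (y == 0 || (x == INT_MIN && y == -1)) {
        return 0;
    }
    *q = x / y;
    return 1;
}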
There are many subtle conditions that can lead to UB. Look at this code:
#include <stdio.h>
#include <stdlib.h>

unsigned short
mul(unsigned short x, unsigned short y)
{
    return x * y;
}

int
main(int argc, char *argv[])
{
    int x, y;

    if (argc != 3) {
        return EXIT_FAILURE;
    }
    x = atoi(argv[1]);
    y = atoi(argv[2]);
    printf("%d\n", mul(x, y));
    return 0;
}
As per the C standard, what is this C program allowed to print, if invoked with parameters “45000” and “50000”?
- A) 18048
- B) 2250000000
- C) God save the Queen
And the answer is… all of the above! You might have told yourself that since unsigned short is an unsigned type, it should work with wraparound semantics, modulo 65536, because on an x86 CPU that type has size exactly 16 bits (as per the standard, it could be larger than that, but in practice it is a 16-bit type). Thus, while the product is mathematically 2250000000, it should be reduced modulo 65536, yielding 18048. However, this would be forgetting about integer promotions. As per the C standard (section 6.3.1.1, subclause 2), when the operands are of a type strictly smaller than int, and all values of that type can be represented in an int without loss (which is our case here: on my x86 running Linux, an int is 32 bits and can certainly hold all values from 0 to 65535), then both operands are converted to int, and the operation is performed on int values. In particular, the product is computed as an int, and only coerced back to unsigned short when returning from the function (this is where the modulo 65536 occurs). The problem is that the mathematical result before that coercion is 2250000000, and that value does not fit in an int, which is a signed type. Hence UB. Therefore, anything can happen afterwards, including a spontaneous display of British patriotism.
In practice, though, you will get 18048 with usual compilers, because there is no currently known and implemented optimization that could leverage this particular UB in this specific program (though one can imagine other, more contrived scenarios in which that UB really wreaks havoc on execution).
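If the intent really is a multiplication modulo 65536, the portable fix is to dodge the promotion to a signed type by converting the operands to an unsigned type first; a sketch of a corrected mul():

unsigned short
mul(unsigned short x, unsigned short y)
{
    /* The product is computed on unsigned int, with defined modular
     * semantics; the conversion back to unsigned short then performs
     * the reduction modulo 65536 (for a 16-bit unsigned short). */
    return (unsigned short)((unsigned int)x * (unsigned int)y);
}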
A final example, this one in C++:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <array>

int
main(int argc, char *argv[])
{
    std::array<char, 16> tmp;
    int i;

    if (argc < 2) {
        return EXIT_FAILURE;
    }
    memset(tmp.data(), 0, 16);
    if (strlen(argv[1]) < 16) {
        strcpy(tmp.data(), argv[1]);
    }
    for (i = 0; i < 17; i ++) {
        printf(" %02x", tmp[i]);
    }
    printf("\n");
}
This is not a classic “strcpy() is bad bad bad” example. Indeed, in the code above, the strcpy() is performed only if the source string is small enough, including the terminating zero. Moreover, the array contents are explicitly initialized to zero, so all array bytes have a set value, whether the argument string is large or not. However, the final loop is incorrect, in that it reads one more byte than it should.
Let’s try it:
$ g++ -W -Wall -O9 testvec.c
$ ./a.out foo
66 6f 6f 00 00 00 00 00 00 00 00 00 00 00 00 00 10 58 ffffffca ff
ffffac ffffffc0 55 00 00 00 ffffff80 71 34 ffffff99 07 ffffffba ff
ffffea ffffffd0 ffffffe5 44 ffffff83 fffffffd 7f 00 00 00 00 00 00
00 00 00 00 10 58 ffffffca ffffffac ffffffc0 55 00 00 ffffff97 7b
12 1b ffffffa1 7f 00 00 02 00 00 00 00 00 00 00 ffffffd8 ffffffe5
44 ffffff83 fffffffd 7f 00 00 00 ffffff80 00 00 02 00 00 00 60 56
(...)
62 64 3d 30 30
zsh: segmentation fault (core dumped) ./a.out foo
Wut++?
Naively, you could say: OK, it is reading one byte out of bounds; but that should be no big deal, since that byte is still on the stack, and that memory is mapped, so there should be no issue except an extra seventeenth number with a somewhat unpredictable value. The final loop should still print exactly 17 integers (in hexadecimal) and exit cleanly.
But the compiler sees things differently. It knows perfectly well that the seventeenth access invokes UB. Within the logic of the compiler, this means that any instruction beyond that one is in limbo: things after UB are not required to exist (technically, even previous instructions can be affected, because UB can be retroactive). In particular, the compiler simply omits the condition test on the loop index, and the loop runs forever; or, more accurately, until the reads go beyond the pages allocated for the stack, at which point a SIGSEGV occurs.
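The repair is simply to keep the index within the array; casting each byte to unsigned char also avoids the ffffff sequences visible in the output above, which come from char being signed on this platform. The final loop becomes:

for (size_t i = 0; i < tmp.size(); i ++) {
    printf(" %02x", (unsigned char)tmp[i]);
}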
Amusingly, if GCC is invoked with less aggressive optimizations, it will warn in a quite explicit way:
$ g++ -W -Wall -O1 testvec.c
testvec.c: In function ‘int main(int, char**)’:
testvec.c:20:15: warning: iteration 16 invokes undefined behavior [-Waggressive-loop-optimizations]
printf(" %02x", tmp[i]);
~~~~~~^~~~~~~~~~~~~~~~~
testvec.c:19:19: note: within this loop
for (i = 0; i < 17; i ++) {
~~^~~~
But at the -O9 level, somehow, the warning disappears. This may be related to the more forceful loop unrolling that happens at high optimization levels. Arguably, the lack of warning is a GCC bug; as per the standard, what GCC does is fully conformant either way, and no “diagnostic” is required in that situation.
Conclusion: if you write code in C or C++, take great care to always avoid undefined behavior conditions, even in situations that seem “obviously harmless”. Using unsigned integer types helps with arithmetic computations, because they have guaranteed modular semantics (but you can still run into trouble with integer promotions). Another possible stance, strangely unpopular, might be: don’t write code in C or C++. This is not always an option, for a variety of reasons. But in situations where you can choose between languages, e.g. when starting a new project on a platform where Go or Rust or Java or other languages are supported, it can pay off not to reach for C as the “default language”. All choices of tools, including programming languages, imply trade-offs; the pitfalls of C programming, notably UB conditions on arithmetic on signed types, tend to imply long-run maintenance efforts that are often underestimated.