Undefined behavior

From cppreference.com

The C language standard precisely specifies the observable behavior of C language programs, except for the ones in the following categories:

  • undefined behavior - there are no restrictions on the behavior of the program. Examples of undefined behavior are memory accesses outside of array bounds, signed integer overflow, null pointer dereference, modification of the same scalar more than once in an expression without sequence points, access to an object through a pointer of a different type, etc. Compilers are not required to diagnose undefined behavior (although many simple situations are diagnosed), and the compiled program is not required to do anything meaningful.
  • unspecified behavior - two or more behaviors are permitted and the implementation is not required to document the effects of each behavior. Examples include the order of evaluation of operands and whether identical string literals are distinct. Each unspecified behavior results in one of a set of valid results and may produce a different result when repeated in the same program.
  • implementation-defined behavior - unspecified behavior where each implementation documents how the choice is made. Examples include the number of bits in a byte and whether signed integer right shift is arithmetic or logical.
  • locale-specific behavior - implementation-defined behavior that depends on the currently chosen locale. For example, whether islower returns true for any character other than the 26 lowercase Latin letters.

(Note: Strictly conforming programs do not depend on any unspecified, undefined, or implementation-defined behavior.)
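As a rough illustration of the categories above, the following sketch touches two implementation-defined choices and one unspecified one. The function names (bits_per_byte, shift_negative, trace, sum_traced) are made up for this example and do not come from the standard.

```c
#include <limits.h>
#include <stdio.h>

/* Implementation-defined: the number of bits in a byte (at least 8). */
int bits_per_byte(void)
{
    return CHAR_BIT;
}

/* Implementation-defined: the result of right-shifting a negative value. */
int shift_negative(void)
{
    return -8 >> 1;
}

/* Hypothetical helper: prints which operand is being evaluated. */
static int trace(const char *name, int value)
{
    printf("%s ", name);
    return value;
}

/* Unspecified: the two operands of + may be evaluated in either order,
   so "left right" and "right left" are both valid outputs.
   The sum itself is 3 either way. */
int sum_traced(void)
{
    return trace("left", 1) + trace("right", 2);
}
```

A program that relies only on the sum, and not on the order of the printed names, is unaffected by the unspecified evaluation order.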

Compilers are required to issue diagnostic messages (either errors or warnings) for any program that violates a C syntax rule or semantic constraint, even if its behavior is specified as undefined or implementation-defined, or if the compiler provides a language extension that allows it to accept such a program. Diagnostics for undefined behavior are not otherwise required.

UB and optimization

Because correct C programs are free of undefined behavior, compilers may assume that it never occurs. A program that actually has UB may therefore produce unexpected results when compiled with optimization enabled. For example:

Signed overflow

int foo(int x) {
    return x+1 > x; // either true or UB due to signed overflow
}

may be compiled as

foo:
        movl    $1, %eax
        ret
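One way to avoid the overflow is to test against INT_MAX before adding. The sketch below assumes a hypothetical helper named checked_increment; it is an illustration, not a function from the standard library.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores x + 1 in *result and returns true,
   or returns false without performing the addition if it would overflow. */
bool checked_increment(int x, int *result)
{
    if (x == INT_MAX)
        return false;   /* x + 1 is not representable in int */
    *result = x + 1;
    return true;
}
```

Because the comparison happens before the addition, no signed overflow is ever evaluated, so the compiler cannot fold the check away on the grounds of UB.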

Access out of bounds

int table[4] = {0};
int exists_in_table(int v)
{
    // return true in one of the first 4 iterations or UB due to out-of-bounds access
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return 1;
    }
    return 0;
}

may be compiled as

exists_in_table:
        movl    $1, %eax
        ret
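A corrected version of the lookup might derive the loop bound from the array itself, as in this sketch (the local name count is introduced here for illustration):

```c
#include <stddef.h>

static int table[4] = {0};

/* Same lookup as above, but the bound is derived from the array,
   so the index stays within 0..3. */
int exists_in_table(int v)
{
    const size_t count = sizeof table / sizeof table[0];
    for (size_t i = 0; i < count; i++) {
        if (table[i] == v) return 1;
    }
    return 0;
}
```

With every access in bounds, the function has to return 0 when no element matches, so it can no longer be reduced to an unconditional return of 1.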

Uninitialized scalar

bool p; // uninitialized local variable
if(p) // UB access to uninitialized scalar
    puts("p is true");
if(!p) // UB access to uninitialized scalar
    puts("p is false");

May produce the following output (observed with an older version of gcc):

p is true
p is false

Similarly, in the following function a is read uninitialized when x is zero:

size_t f(int x)
{
    size_t a;
    if(x) // either x nonzero or UB
        a = 42;
    return a; // UB if x is zero, because a is then read uninitialized
}

may be compiled as

f:
        mov     eax, 42
        ret
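Giving the variable a value on every path removes the UB. The sketch below uses a hypothetical name, f_initialized, and picks 0 as the fallback value purely for illustration.

```c
#include <stddef.h>

/* Hypothetical variant of f: a has a defined value on every path. */
size_t f_initialized(int x)
{
    size_t a = 0;   /* fallback value chosen for this sketch */
    if (x)
        a = 42;
    return a;
}
```

With the initializer in place, the compiler has to preserve the test of x, since both outcomes are now well-defined.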

Invalid scalar

int f(void) {
  _Bool b = 0;
  unsigned char* p =(unsigned char*)&b;
  *p = 10;
  // reading from b is now UB
  return b == 0;
}

may be compiled as

f():
        movl    $11, %eax
        ret
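If a raw byte has to become a _Bool, converting the value instead of overwriting the object representation keeps the result valid. This is a minimal sketch; bool_from_byte and f_checked are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical helper: converts an arbitrary byte to a valid _Bool.
   The comparison yields 0 for a zero byte and 1 otherwise. */
bool bool_from_byte(unsigned char raw)
{
    return raw != 0;
}

int f_checked(void)
{
    bool b = bool_from_byte(10);   /* b holds 1, a valid value */
    return b == 0;                 /* well-defined: returns 0 */
}
```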

Null pointer dereference

int foo(int* p) {
    int x = *p;
    if(!p) return x; // Either UB above or this branch is never taken
    else return 0;
}
int bar(void) {
    int* p = NULL;
    return *p;       // Unconditional UB
}

may be compiled as (foo with gcc, bar with clang)

foo:
        xorl    %eax, %eax
        ret
bar:
        retq
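Testing the pointer before dereferencing it keeps both branches meaningful. The following sketch assumes a hypothetical function foo_checked that takes a caller-supplied fallback value.

```c
#include <stddef.h>

/* Hypothetical variant of foo: the null test happens before the dereference. */
int foo_checked(const int *p, int fallback)
{
    if (p == NULL)
        return fallback;
    return *p;
}
```

Since no dereference precedes the test, the compiler cannot infer that p is non-null and has to keep the check.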

Access to pointer passed to realloc

The output shown can be observed when compiling with clang.

#include <stdio.h>
#include <stdlib.h>
int main(void) {
    int *p = (int*)malloc(sizeof(int));
    int *q = (int*)realloc(p, sizeof(int));
    *p = 1; // UB access to a pointer that was passed to realloc
    *q = 2;
    if (p == q) // UB access to a pointer that was passed to realloc
        printf("%d%d\n", *p, *q);
}

Possible output:

12
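A well-defined alternative uses only the pointer returned by realloc once the call succeeds. The sketch below is one possible arrangement; realloc_demo is a hypothetical name, and growing the block to two elements is an assumption made for the example.

```c
#include <stdlib.h>

/* Returns 3 on success, or -1 if an allocation fails. */
int realloc_demo(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = 1;

    int *q = realloc(p, 2 * sizeof *q);
    if (q == NULL) {
        free(p);        /* on failure the original block is still valid */
        return -1;
    }
    p = q;              /* from here on, only the returned pointer is used */

    p[1] = 2;
    int sum = p[0] + p[1];
    free(p);
    return sum;
}
```

The old value of p is never read or dereferenced after a successful realloc, so it does not matter whether the block was moved.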

Infinite loop without side-effects

The output shown can be observed when compiling with clang.

#include <stdio.h>

int fermat(void) {
  const int MAX = 1000;
  int a=1,b=1,c=1;
  // Endless loop with no side effects is UB
  while (1) {
    if (((a*a*a) == ((b*b*b)+(c*c*c)))) return 1;
    a++;
    if (a>MAX) { a=1; b++; }
    if (b>MAX) { b=1; c++; }
    if (c>MAX) { c=1;}
  }
  return 0;
}

int main(void) {
  if (fermat())
    puts("Fermat's Last Theorem has been disproved.");
  else
    puts("Fermat's Last Theorem has not been disproved.");
}

Possible output:

Fermat's Last Theorem has been disproved.
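One way to keep such a search well-defined is to bound it, so that every loop condition eventually fails. This is a sketch under that assumption; fermat_bounded is a hypothetical name, and long long is used so the cubes do not overflow for moderate bounds.

```c
/* Hypothetical bounded variant: searches 1..max for a, b, c
   and always terminates. */
int fermat_bounded(int max)
{
    for (long long a = 1; a <= max; a++)
        for (long long b = 1; b <= max; b++)
            for (long long c = 1; c <= max; c++)
                if (a * a * a == b * b * b + c * c * c)
                    return 1;
    return 0;
}
```

Because the loops terminate on their own, the compiler has no grounds to skip them, and the function returns 0 when no counterexample exists within the bound.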

References

  • C11 standard (ISO/IEC 9899:2011):
      • 3.4 Behavior (p: 3-4)
      • 4/2 Undefined behavior (p: 8)