Nov 24, 2021 • Research, Rust

Do we really need Undefined Behavior?

I recently published a blog post on why Undefined Behavior is actually not a bad idea. Coincidentally, this is just a few weeks after the publication of this paper by Victor Yodaiken which basically argues that Undefined Behavior (UB for short) made C unusable for one of its core audiences, OS developers. Here I refer to the typical modern interpretation of UB: assumptions the compiler may trust, without bounds on what happens if they are violated. The paper makes many good points, but I think the author is throwing out the baby with the bathwater by concluding that we should entirely get rid of this kind of Undefined Behavior. The point of this blog post is to argue that we do need UB by showing that even some of the most basic optimizations that all compilers perform require this far-reaching notion of Undefined Behavior.

To avoid ambiguity, I will refer to the above notion of UB as “unrestricted UB”. The alternative interpretation of UB promoted by Yodaiken is what one might call “platform-specific UB”. This requires that even programs with Undefined Behavior should behave in a consistent way: for example, the result of an out-of-bounds write may be ‘unpredictable’ in that it may either not actually happen or mutate some data somewhere. However, if a write occurs, the program must still behave in a way that is consistent with performing a write to the given address on the target platform. (At least, that is my understanding. I hope I am not misrepresenting their position here. The paper does not go into a lot of detail on how the situation could be improved, but it mentions proposals “where compilers map source operations to well-defined instruction sequences, in either a virtual or real machine, from which compiler optimisations may not observably stray”.)¹

Examples of unrestricted UB

So what is the problem with platform-specific UB? First of all, it does not reflect what the major compilers actually do in practice. I have seen claims in the past that GCC and LLVM are the only compilers making use of unrestricted UB; this is simply not true. Here is an example of ICC performing such an optimization (based on example code by Yodaiken):

#include <stdlib.h>
#include <stdio.h>

int main () {
  int *i = malloc(sizeof(int));
  *i = 1;
  int *j = malloc(sizeof(int));
  *j = 1;
  int *k = malloc(sizeof(int));
  *k = 1;

  // 32 bytes past j: out of bounds for j (UB); in the run shown below this is the address of k
  int *x = j+(32/4);
  *x = 40;
  printf("*i=%d (%p) *j=%d (%p) *k=%d (%p)  *x=%d (%p)", *i, i, *j, j, *k, k, *x, x);
}

This program prints the values and addresses of a few pointers. The concrete addresses are different on each execution, but the pattern is always the same:

*i=1 (0x1aef2a0) *j=1 (0x1aef2c0) *k=1 (0x1aef2e0)  *x=40 (0x1aef2e0)

Notice how k and x point to the same address (0x1aef2e0 in this particular execution), but seem to contain different values. This is impossible under “platform-specific UB”: no sequence of target platform operations can lead to a situation where the same address contains two different values.² This example demonstrates that even ICC with -O1 already requires unrestricted UB. (For completeness’ sake, here is a similar example for GCC; at the time of writing, i and x have the same address but different values. And here is an example for clang/LLVM; this time it is again k and x that behave inconsistently. godbolt supports MSVC but does not seem to be willing to execute the generated programs; still, I have no doubt that similar examples can be found for this compiler.)

What about niche compilers specifically built for reliable software? In their paper, Yodaiken claims that the verified C compiler CompCert “does not do any undefined behavior based optimization” (with a footnote saying “Except for assuming objects do not overlap in memory”; I am not quite sure what exactly is meant by this). This is incorrect. First of all, since CompCert has a proof of correctness, we can have a look at its specification to see what exactly it promises to its users—and that specification quite clearly follows the “unrestricted UB” approach, allowing the compiled program to produce arbitrary results if the source program has Undefined Behavior. Secondly, while CompCert’s optimizer is very limited, it is still powerful enough that we can actually demonstrate inconsistent behavior for UB programs in practice:

#include <stdio.h>

int y, x;

int f(void)
{
  y = 0;
  *(&x + 1) = 1;  // store through a one-past-the-end pointer: UB, even if it lands on y
  return y;
}

int main()
{
  int eq = (&x+1 == &y);
  if (eq) {
    printf("%d ", f());
    printf("%d\n", y);
  }
  return 0;
}

(Putting the result of the comparison into a local variable eq prevents CompCert from optimizing away the entire conditional.) This program, after being compiled with CompCert, prints “0 1”. Again, the program prints “the same thing” twice, in this case the value stored at y, and yet it produces two different results. CompCert exploited UB in a way that leads to a situation which should be “impossible” on the underlying machine.
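A source-level model of the transformation may make this concrete. This is a sketch under assumptions, not CompCert's actual output: the names f_as_written and f_as_optimized are made up for illustration, the two stores go through pointer parameters so that the aliasing case is well-defined C, and the guess is that the optimizer forwarded the stored constant 0 to the return instead of reloading y.

```c
/* Model of f as written in the source: the value of y is reloaded
 * from memory after the second store. (Hypothetical name.) */
int f_as_written(int *py, int *pq)
{
  *py = 0;
  *pq = 1;     /* stands in for *(&x + 1) = 1 */
  return *py;  /* reload: sees the second store if pq == py */
}

/* Model of f after the assumed optimization: the constant 0 stored
 * to *py is forwarded to the return. This is only correct if the
 * store through pq can never hit *py. (Hypothetical name.) */
int f_as_optimized(int *py, int *pq)
{
  *py = 0;
  *pq = 1;
  return 0;    /* no reload */
}
```

Called with the addresses of two distinct variables, both versions return 0. Passing the same address for both parameters, which models the out-of-bounds store landing on y, makes f_as_written return 1, while f_as_optimized still returns 0 and leaves 1 in memory: the same “0 1” pattern as above.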

Platform-specific UB is not an option

Both of these examples highlight a fundamental problem with “platform-specific UB”: any out-of-bounds write could potentially modify any other variable (at least any variable that has an address in memory). This can make even the most basic parts of high-quality code generation, such as register allocation, tricky or impossible: a variable that has its address taken has to be re-loaded from that same address any time an out-of-bounds write might have happened, since that write might just have hit the right address to change this variable’s value. This applies even if the address has not yet been leaked to the outside world, as the first example shows. This is probably why there is hardly any compiler that follows the platform-specific interpretation of UB. (I say “hardly any” without knowing a counterexample, but I would not be surprised if some compilers for high-assurance embedded code are so simple that platform-specific UB is sufficient for them. But that is hardly representative of how C is used, and as we have seen with CompCert, even some high-assurance compilers do rely on unrestricted UB.)
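The register allocation point can be illustrated with a small sketch. Both function names are hypothetical, and the volatile qualifier in the second function is only a modeling device: it forces a load and a store of acc on every access, which approximates what a compiler would have to emit if any store through out might legitimately change acc.

```c
#include <stddef.h>

/* Fills out[0..n) with squares and returns their sum.
 * acc has its address taken, but the address never leaves the
 * function. Under unrestricted UB the compiler may assume the
 * stores to out[i] cannot touch acc and keep it in a register. */
long fill_and_sum(int *out, size_t n)
{
  long acc = 0;
  long *acc_addr = &acc;
  for (size_t i = 0; i < n; i++) {
    out[i] = (int)(i * i);
    *acc_addr += out[i];
  }
  return acc;
}

/* Model of the code required under platform-specific UB: acc lives
 * in memory and is re-loaded after every store to out[i], because
 * an out-of-bounds store might have hit it. */
long fill_and_sum_reloading(int *out, size_t n)
{
  volatile long acc = 0;
  for (size_t i = 0; i < n; i++) {
    out[i] = (int)(i * i);
    acc = acc + out[i];  /* one load and one store per iteration */
  }
  return acc;
}
```

For in-bounds calls, such as a four-element buffer with n equal to 4, both versions compute the same sum. The difference is only in how much memory traffic the compiler is forced to keep.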

I honestly think trying to write a highly optimizing compiler based on a different interpretation of UB would be a worthwhile experiment. We sorely lack data on how big the performance gain of exploiting UB actually is. However, I strongly doubt that the result would even come close to the performance of the most widely used compilers today, and programmers who can accept such a big performance hit would probably not use C to begin with. Certainly, any proposal for requiring compilers to curtail their exploitation of UB must come with evidence that this would even be possible while keeping C a viable language for performance-sensitive code.

To conclude, I fully agree with Yodaiken that C has a problem, and that reliably writing C has become incredibly hard since undefined behavior is so difficult to avoid. It is certainly worth reducing the number of things that can cause UB in C, and developing practical tools to detect more advanced kinds of UB such as strict aliasing violations. I also wonder whether strict aliasing can be made more compatible with low-level programming patterns, or whether C should provide alternative means of alias control to programmers, such as restrict (not that its specification doesn’t have its own set of problems, but an opt-in mechanism like restrict seems fundamentally more suited when the goal is to ensure compatibility with existing code).
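As a minimal sketch of what “opt-in” means here, compare the two functions below. Their names are hypothetical. The first uses restrict, so the caller promises that dst and src do not overlap and the compiler may rely on that promise. The second makes no promise and must stay correct even when both pointers refer to the same array.

```c
#include <stddef.h>

/* Opt-in alias control: restrict tells the compiler that dst and
 * src do not overlap. Calling this with overlapping arrays is UB. */
void scale_into(int *restrict dst, const int *restrict src,
                size_t n, int factor)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] * factor;
}

/* No promise: dst and src may point to the same array, so the
 * compiler has to assume that each store to dst[i] may change src. */
void scale_into_may_alias(int *dst, const int *src,
                          size_t n, int factor)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] * factor;
}
```

Existing code that scales an array in place keeps working with scale_into_may_alias, and only call sites that can actually guarantee disjoint buffers switch to scale_into.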

However, I do not think this problem can be solved with a platform-specific interpretation of UB. That would declare all but the most basic C compilers as non-compliant. We need to find some middle ground that actually permits compilers to meaningfully optimize the code, while also enabling programmers to actually write standards-compliant programs. I am not involved in the work that happens here on the C side, but for Rust, I think we can achieve this through a combination of being diligent about how much UB we really need, using language and API design to make it easier for the programmer to be aware of UB requirements imposed by the code they write, and providing tools that help programmers determine if their code exhibits UB or not.

¹ The paper also cites C committee proposal N2769. However, N2769 explicitly says that a + 1 < a can still be optimized to false, while Yodaiken mentions this as an undesirable optimization. In fact, N2769 says it is okay and of “great value” to “assume the absence of UB”. I admit I do not understand the distinction N2769 makes between “assuming the absence of UB” and “making assumptions about the result of UB”, but it seems clear that Yodaiken goes even further than N2769 in restricting UB-based optimizations.
² I assume N2769 would also not be happy with this outcome of our example program.

Posted on Ralf's Ramblings on Nov 24, 2021.