Nov 18, 2021 • Research, RustEditsPermalink

Undefined Behavior deserves a better reputation

This is a cross-post of an article that I wrote for the SIGPLAN blog.

“Undefined Behavior” often has a bad reputation. People see it as an excuse compiler writers use to break code, or an excuse by a lazy language designer to not complete the specification and properly define all this behavior. But what, really, is Undefined Behavior, and is it as bad as its reputation? In this blog post, I will look at this topic from a PL perspective, and argue that Undefined Behavior (or UB for short) is a valuable tool in a language designer’s toolbox, and that it can be used responsibly to convey more of the programmer’s insight about their code to the compiler with the goal of enabling more optimizations. I will also explain why I spent a significant amount of time adding more UB to Rust.

A simple example

In the best PL tradition, let us consider an artificial example to demonstrate the benefit of UB. Imagine we want to implement a function that returns the element in the middle of an array. If we are using Rust, we would probably write something like this:

fn mid(data: &[i32]) -> Option<i32> {
  if data.is_empty() { return None; }
  return Some(data[data.len()/2]);
}

The argument is of type &[i32], which is called a “slice” and consists of a pointer to some array and information about how long the array is. mid itself returns an integer wrapped in Option (corresponding to Maybe in Haskell) to properly signal the case where the array is empty. In the non-empty case, it computes the index in the middle of data, and returns that element.

Now imagine this function is called in a tight loop in the benchmark for our next paper, so performance really matters. Is there any performance improvement we can hope to achieve in this function? It might seem like mid already does the absolute minimum amount of work required for the task, but there is some hidden cost in the array access data[_]: the compiler has to insert a bounds-check here to ensure that we do not access data beyond the size of the array that data points to. But as the programmer we know that bounds-check to be entirely unnecessary, since data.len()/2 will always be smaller than data.len()! Wouldn’t it be great if there was a way to tell the compiler about this, such that we can be sure no bounds check happens?

Here is one way to accomplish that in Rust:

fn mid(data: &[i32]) -> Option<i32> {
  if data.is_empty() { return None; }
  match data.get(data.len()/2) {
    Some(&x) => return Some(x),
    None => unsafe { unreachable_unchecked() }
  }
}

We are now using the get operation to access the array, which returns an Option that is None for out-of-bounds accesses. And in case we get None, we call a special function unreachable_unchecked which makes a binding promise to the compiler that this piece of code is unreachable. The keyword unsafe here indicates that what we are doing is not covered by the type safety guarantees of the language: the compiler will not actually check that the promise we made holds true, it will just trust us on that. (The phrase “unchecked” is a Rust idiom; this is the “unchecked” version of unreachable, which inserts a run-time check that safely aborts the program should this code ever be reached – or, to be more precise, it triggers a Rust panic.)

After some inlining, the relevant part of this code looks as follows:

  let idx = data.len()/2;
  if idx < data.len() { // Automatically inserted bounds-check.
    ... // Access the array at `idx`.
  } else {
    unreachable_unchecked()
  }

Since we told the compiler that the else branch is unreachable, it is easy to optimize away the conditional, so we end up with just a direct access to element idx in the array. Problem solved! (In fact, Rust provides get_unchecked as an alternative to get where the caller has to promise that the index is in-bounds, so a Rust programmer would just write data.get_unchecked(data.len()/2) to implement mid efficiently.)

I expect some readers will not be happy with the way I achieved the desired optimization in the initial example, and argue that the compiler should be smart enough to do this automatically. I will get back to this point later; for now, just note that the latest stable version of Rust at the time of writing does not perform this optimization (as indicated by the call to panic_bounds_check).

Where is the Undefined Behavior?

Hang on, you might say at this point, wasn’t this blog post supposed to be about Undefined Behavior? That term did not even appear in the discussion of the example! Indeed, I was a bit sneaky and used different terminology that I think better captures a constructive way to think about Undefined Behavior. In the typical terminology, I would have said that calling the special function unreachable_unchecked causes immediate Undefined Behavior. Following the definition in the latest C standard (also shared by C++), the standard “imposes no requirements” on programs that exhibit Undefined Behavior. The compiler can hence basically replace the else branch by whatever code it wants, famously including “to make demons fly out of your nose”, but also including just executing the then branch instead.

This line of reasoning leads to the same result, but it paints an unnecessarily antagonistic picture of compiler writers. It makes it sound like compilers use complicated analyses to detect Undefined Behavior. Once they find UB, they have an excuse to emit broken code and hide behind the standard should anyone complain. This is not what actually happens. As we have seen in our example, the compiler really has no idea if this code has Undefined Behavior or not – all it does is perform optimizations that are correct under the extra assumption that there is no Undefined Behavior.

UB is a double-edged sword

Another reaction you might have is that unreachable_unchecked is not a “typical” example of UB. Most people probably associate that term with C or C++, which do not even have unreachable_unchecked (though many compilers provide an intrinsic with the same effect, e.g., __builtin_unreachable in GCC). So it may seem like I picked a strange example. Shouldn’t I be talking about how, say, signed integer overflow is UB?

This is the right time to admit that I am not going to defend all UB in C/C++. I think UB as a concept is a great idea, and unreachable_unchecked is the “most pure” form of UB that shows how it can be used by the programmer to convey extra information to the compiler – but I also think that C and C++ are massively overusing UB. Of course, it is easy to say this with the benefit of hindsight; the first C compilers were extremely simple and today’s use of UB for optimizations only emerged over time. It took a while for the implications of the modern interpretation of UB in the standard to become clear – and C and C++, being very successful languages, have massive existing codebases which makes it super hard to revise any prior decision. This post is about defending and promoting UB as a concept, not UB in C/C++.

Speaking of signed integer overflow, I think this is actually a good example for how to not use UB. An innocent-looking + turns into a promise of the programmer that this addition will never overflow, but the programmer probably will not carefully do a mental no-overflow proof for every single addition in their program. Instead, + could perform overflow checks or well-defined wrap-around, and the language could provide an unchecked_add function where overflows are UB. This lets the programmer opt in to providing extra no-overflow promises, to be used in situations where it is really beneficial for performance that the compiler can make this assumption (such as this example). Basically, I am considering this a language (and library) design problem: UB is a sharp knife; when used well it gets the job done better, but it can also hurt a lot when used without enough care.

Language and library design is not everything that can be used to tame UB, however. Good tooling can also make a big difference: if programmers can easily run their programs in “UB-checking mode”, they can write tests to at least ensure the absence of UB for certain inputs. (Shameless plug: I am working on Miri, a tool that provides exactly this for Rust.) Library authors can run their test suites with such a tool, and the tool can also be used exploratively to learn about what exactly is and is not UB in the first place. I think this is absolutely crucial, and language designers should design UB in a way that makes UB-checking tools more feasible. For the examples of UB we have seen so far (unreachable_unchecked, get_unchecked, and unchecked_add), this is obviously trivial.

How far can we push UB?

That said, not all UB is that simple to teach and test. Even Rust, with its benefit of learning from several decades of experience with UB in C and C++, has UB that is a lot more subtle than this. The most glaring example of this is probably UB related to incorrect aliasing of mutable references. (Other, less extreme examples would be UB due to using uninitialized memory, or UB due to data races.)

The Rust type system ensures that mutable references never alias any other reference that is currently being used in the program, i.e., they never point to the same memory as any other reference. This is a juicy guarantee for compiler writers, because while reordering memory accesses is often beneficial, it can be very hard to figure out if the transformation is even allowed – if two accesses alias, then their original order must be preserved.

However, unsafe code in Rust could easily create aliasing mutable references. So what can we do? We make the programmer promise that they do not do this! This is a lot like saying “the programmer promises that unreachable_unchecked is never called”, so we can put on our UB lens and say that it is Undefined Behavior to have aliasing mutable references.

The devil is of course in the details of defining what exactly this means. Stacked Borrows (part of my PhD thesis and also described in a series of blog posts: v1.0, v2.0, v2.1) goes into all that detail by giving an operational semantics that exactly defines the promises programmers have to make. And that semantics is non-trivial! According to Stacked Borrows, the following code has UB:

let x = &mut 42; // Safely create a reference.
let xptr = x as *mut i32; // Turn that reference into a raw (unchecked) pointer.
let x1 = unsafe { &mut *xptr }; // Turn the pointer back into a reference...
let x2 = unsafe { &mut *xptr }; // ...twice, so uniqueness is violated.
*x1 = 0; // Undefined Behavior!

The reason this code has UB is that creating x2 makes a promise that this is the unique reference created from xptr, so the previously created x1 is invalidated when x2 gets created. This means future uses of x1 are Undefined Behavior.

So this raises the question: can we really expect every author of unsafe Rust code to internalize Stacked Borrows to the extent that they can faithfully promise to the Rust compiler that their code will comply by this bespoke set of rules? Is it a good idea to interpret &mut expr as a promise that all aliasing was carefully checked and this reference is definitely unique? As with other UB, we can help programmers by providing tools; Miri contains an implementation of Stacked Borrows which both helps us to evaluate whether actual Rust code is compatible (or can reasonably be made compatible) with Stacked Borrows, and it helps Rust programmers by giving them a way to at least test for aliasing violations, and to interactively play with the semantics to gain a better understanding. I think that puts us in a pretty good spot overall, but some people still argue that Stacked Borrows goes too far and Rust will end up in a situation similar to the one C and C++ find themselves in – where too few programmers actually know how to write UB-free code, and a significant amount of the code people rely on exhibits UB.

Stacked Borrows is not part of the Rust spec, and is not the final word for aliasing-related UB in Rust. So there is still the chance that future revisions of this model can be made to better align with programmer intuition. The above code might get accepted because x2 is not actually being used to access memory. Or maybe &mut expr should only make such promises when used outside an unsafe block – but then, should adding unsafe really change the semantics of the program? As usual, language design is a game of trade-offs.

Conclusion

I have presented Undefined Behavior as a tool that enables the programmer to write code that the compiler cannot check for correctness, and argued that – used responsibly – it is a useful component in a language designer’s toolbox.

As I alluded to earlier, the “obvious” alternative would be to make the compiler smarter. However, real programs are typically a lot more complicated than my simple example (which already outsmarts Rust’s LLVM backend), and the reasoning required to justify an optimization can become arbitrarily complicated. Language designers should acknowledge that optimizers have their limitations and give programmers the tools they need to help the optimizer. Indeed, I think the fact that Rust combines a clever type checker with the idea of using unsafe code for the cases where the type checker is not clever enough is crucial for its success: unsafe is not a bug; it is a feature without which Rust would not be able to make systems programming safer in practice. It is also worth mentioning that many languages that we all know and love provide comparable “trusted” operations or annotations, e.g., Obj.magic in OCaml or the rewrite rules in GHC. Rust only differs in how prevalent unsafe code is in the ecosystem (and in emphasizing the importance of encapsulating such code within safe APIs).

In closing, I would like to propose that “Undefined Behavior” might need a rebranding. The term focuses on the negative case, when really all we ever care about as programmers or compiler authors is that programs do not have Undefined Behavior. Can we get rid of this double negation? Maybe we should talk about “ensuring Well-Defined Behavior” instead of “avoiding Undefined Behavior”.

To sum up: most of the time, ensuring Well-Defined Behavior is the responsibility of the type system, but as language designers we should not rule out the idea of sharing that responsibility with the programmer.

Thanks to Anish Athalye and Adrian Sampson for feedback on earlier drafts of this post.

Posted on Ralf's Ramblings on Nov 18, 2021.
Comments? Drop me a mail or leave a note on reddit!