Aug 22, 2018 • Internship, Rust • Edits • Permalink

Two Kinds of Invariants: Safety and Validity

When talking about the Rust type system in the context of unsafe code, the discussion often revolves around invariants: Properties that must always hold, because the language generally assumes that they do. In fact, an important part of the mission of the Unsafe Code Guidelines strike force is to deepen our understanding of what these invariants are.

However, in my view, there is also more than one invariant, matching the fact that there are (at least) two distinct parties relying on these invariants: The compiler, and (authors of) safely usable code. This came up often enough in recent discussions that I think it is worth writing it down properly once, so I can just link here in the future. :)

Safety Invariants

The most prominent invariant in Rust is probably what I will call the safety invariant: This is the invariant that every safe function can assume holds. This invariant depends on the type, so I will also speak of data being safe at type T, meaning the data satisfies the safety invariant. (In fact, types come with more than one safety invariant to appropriately handle sharing, but that is not our concern this time.)

For example, when you write a function fn foo(x: bool), you can assume that x is safe for bool: It is either true or false. This means that every safe operation on bool (in particular, case distinction) can be performed without running into undefined behavior.

In fact, this is the main point of the safety invariant:

The safety invariant ensures that safe code, no matter what it does, executes safely – it cannot cause undefined behavior when working on this data.

Based on that, it should not be hard to fill in the invariant for some more types. For example, &bool is an aligned, non-NULL reference to allocated read-only memory (because safe code can dereference) such that the data stored in that memory is safe at bool (because safe code can work on that bool after dereferencing). This kind of “structural recursion” is happening everywhere for safety invariants: x is safe for (T, U) if x.0 is safe for T and x.1 is safe for U. As far as enums is concerned, for example x is safe for Option<T> if either the discriminant is None, or it is Some and the data is safe for T. Similarly, x is never safe for ! – there just is no valid discriminant that x could have.

When we talk about integer types like i32, it is tempting to say that all data is safe at i32. However, that does not actually work: As I discussed in a previous post, the “bytes” that make up data have more possible states than just 0..256. In particular, they can be uninitialized (undef or poison in LLVM). Some operations like if x != 0 { ... } are undefined behavior if x is uninitialized, so sticking with our theme that the safety invariant must be strong enough to support safe execution of safe code, we have to declare that x is only safe at i32 if it is initialized.

For reference types, some of the details are harder to define, but we can still roughly say that a pointer is safe for &'a mut T if it is aligned, non-NULL and points to allocated memory that, for lifetime 'a, no other pointer accesses, and such that the data stored in memory at that location is safe for T. Similarly (and ignoring interior mutability), a pointer is safe for &'a T if it is aligned, non-NULL and points to allocated memory that is not mutated for lifetime 'a and such that the data stored in memory is safe for T. The safety of &bool that we discussed above is just an instance of this general definition. In particular, it is impossible to have a safe &! because it would have to point to a safe !, which does not exist.

The safety invariant gets particularly interesting for higher-order data like function pointers or trait objects. For example, when is f safe at type fn(bool) -> i32? We can again figure this out by asking: What can safe code do with f? The only thing safe code can do is call the function! So, f is safe at fn(bool) -> i32 if, whenever you call that function with data that is safe at bool, the function runs without causing undefined behavior, and if it returns (as opposed to diverging or unwinding), then it returns data that is safe at i32.¹ This generalizes to arbitrary function types in the expected manner.

Custom Safety Invariants

So far, we have talked about built-in types only. However, safety invariants become really interesting once we talk about user-defined types. A key property of the Rust type system is that users can define their own safety invariants. (This is by far not exclusive to the Rust type system, of course, but it is a property not covered by the usual discussions about “type soundness”. See this talk by Derek Dreyer if you want dive a bit deeper.)

To pick one example (that I already used in one of my first blog posts), let us look at Vec. Vec is roughly defined as follows (the real definition just groups the first two fields together as a RawVec):

pub struct Vec<T> {
    ptr: Unique<T>, // pointing to the heap-allocated backing store holding all the data
    cap: usize, // number of elements that fit the backing store
    len: usize, // number of elements that are actually initialized in the backing store
}

From what we said so far, the safety invariant for Vec<T> (structurally demanding safety of its fields) would be not very useful – Unique<T> is just a non-NULL raw pointer, and usize can be anything initialized.

For Vec to work, however, ptr will be pointing to valid memory of size cap * mem::size_of::<T>(), and the first len items in that memory must be safe at T. This is an invariant that all the unsafe code implementing Vec maintains, and it is the invariant that the safe API surface of Vec expects the outside world to uphold. The reason this works is that the fields mentioned in this invariant are all private, so safe code cannot break the invariant and use that broken invariant to cause havoc. Again, the safety invariant is about ensuring safe execution of safe code. Unsafe code can of course break this invariant, but then it is just Doing It Wrong (TM).

Thanks to privacy and an abstraction barrier, types in Rust can define their own safety invariant, and they can then expect the outside world to respect that invariant. As we have seen with Vec, when generic types are involved, these custom safety invariants will often have a “structural” element in that being safe at Vec<T> is defined in terms of being safe at T.

An Invariant for the Compiler

We want the compiler to make some type-based assumptions about data – in fact, we already do that: Option<bool> is stored in a way that exploits that bool only has two possible values. When passing around an enum with repr(u32) (or any other integer type), we tell LLVM that it may assume the value to be one of the enum variants. For reference types, we tell LLVM that they are always aligned and dereferencable. And there are probably more of these attributes we are using that I am not aware of.

So, we need an invariant that the compiler may assume always holds, justifying the optimizations we are already doing (and more we want to do in the future). There is no a priori reason that the invariant the compiler relies on is the same as the safety invariant that safe code relies on. To make it easier to talk about this distinction, let us call the compiler-facing invariant validity, so that we can ask if some data is valid at type T. Because the compiler assumes that all data is always valid at its given type, we make it instantaneous undefined behavior to ever violate validity.

The validity invariant is exploited by the compiler to perform optimizations (e.g. enum layout optimizations).

Now, can’t we just say that the validity invariant is the same as the safety invariant, and be done? I think there are several problems with that approach.

First of all, notice that for user-defined invariants, this plainly does not work. For example, if you look at the code that grows a Vec, you can see that it first updates the ptr and then updates the cap. Between these two updates, the safety invariant of Vec is violated. But that’s okay, because this is carefully controlled unsafe code – and by the time the Vec reaches safe code again, the invariant is reestablished. (And because we can assume there are no data races, there is no problem with concurrency either.)

Unsafe code only has to uphold safety invariants at the boundaries to safe code.

Notice that the “boundary” is not necessarily where the unsafe block ends. The boundary occurs where the unsafe code provides a public API that safe code is intended to use – that might be at the module boundary, or it might even be at the crate level. That is where safe code should be able to rely on safety, so that it can interact with the unsafe code without triggering undefined behavior, and hence that is where the safety invariants come into play.

This is in strong contrast to validity, which must always hold. Layout optimizations and LLVM’s attributes are in effect throughout unsafe code, so it is never okay to ever have invalid data. (With the sole restriction of data which the compiler statically knows is not initialized: If you write let b: bool;, that data in b is kept inaccessible even to unsafe code, and it does not have to satisfy any invariant. This works because the compiler knows about b not being initialized.)

Unsafe code must always uphold validity invariants.

So we clearly cannot just pick the same invariant for both, or else it would be impossible to write Vec. We might want to just ignore user-defined invariants when it comes to validity, but I think that would be ill-advised.

First of all, validity is part of the definition of undefined behavior. I think we want to eventually check for as much undefined behavior as is possible, and when we define undefined behavior, we should strive to make it as “checkable” as we can. If you go back up and look at safety for types like fn(bool) -> i32, you can see that this is inherently not checkable – checking if f is safe at fn(bool) -> i32 requires solving the halting problem!

Moreover, when I did my evaluation of Types as Contracts last year (which made an attempt to enforce a fairly strong invariant, albeit still weaker than how we defined safety above, throughout the entire program execution), I found plenty of cases of unsafe code temporarily violating the safety invariant. For example, Arc’s destructor calls Layout::for_value with a reference to a struct that contains a deallocated Box. Plenty of unsafe code passes around &mut T where the T is not yet fully initialized. We could decide that all of that code is incorrect and should use raw pointers, but I think it is too late for that.

Validity Invariants

Instead, one thing I would like to do in the near future is get a better understanding of what our validity invariants are. First of all, this is a topic that comes up again and again in unsafe code discussions. Secondly, because validity invariants are supposed to be checkable, we can talk about them precisely without having to give everyone PhD-level courses in separation logic (if you read my previous posts on invariants, some of which I linked to above and which were all about safety invariants², you know what I mean). We can just talk about which checks one would have to do e.g. in Miri to enforce the invariant. And finally, validity invariants have “wiggle room”.

With that last point, I mean that we can make a conservative choice now (making as few things valid as we can), and later expand the amount of data we consider valid without breaking existing code. Weakening the invariant can only lead to code that used to have undefined behavior, now being well-defined. This is in contrast to safety invariants, which we cannot weaken or strengthen – some unsafe code might consume data assuming the invariant and hence break if we make it weaker, another piece of code might produce data that must satisfy the invariant and hence break if we make it stronger.

So, let us look at some examples, and how the validity invariant is constrained by optimizations that we are already performing. If Rust’s layout algorithm exploits an invariant for enum optimizations (e.g. Option<&i32>), then our hands are bound to make that part of validity or that layout is just incorrect. As already mentioned, we also pass on a bunch of attributes to LLVM telling it about various things we know, based on the variable’s type. And finally, all safe data must also be valid, so the safety invariant also imposes a constraint.

For bool, we stick to the same invariant as for safety: It must be true or false. This invariant is already exploited by enum layout optimizations, so violating it at any time can break your code. More generally speaking, validity requires correct enum discriminants. One consequence of this is that no data is valid at !.

Similar to safety, validity is structural – so validity for (T, U) requires that the first field is valid at T, and the second field is valid at U.

What about references? This is where it gets interesting. For a type like &i32, we tell LLVM that it is aligned and non-NULL, and we exploit some of that with enum layout optimizations, so that is something validity must require. But do we want to require that an &bool always points to a valid bool? As the examples I mentioned in the previous section showed, there is unsafe code out there that this would break (and people bring up examples for this every time uninitialized data is discussed). It is often useful to be able to call a method that takes &mut on a partially initialized data structure if you know what that method does because it is part of the same crate. Doing all that with raw pointers is much less ergonomic, and making people write less ergonomic code will inadvertently lead to more bugs. Moreover, the benefits of making this validity for reference structural (i.e., &T is valid only if it points to a valid T) seem slim (though @comex came up with an optimization that would require this).

A consequence of this is would be that while &! does not have a safe inhabitant, it is possible to have data that is valid for &!. After all, it does not have to point to a valid ! (which would be impossible).

One point that we already assume, however, is that references are dereferencable. We even tell LLVM that this is the case. It seems natural to make that part of validity. However, it turns out that if we adapt Stacked Borrows, we do not need to: The aliasing model already implicitly enforces all references to be dereferencable (concretely, references get “activated” when they come in scope, which affects “shadow data” that is stored together with each byte in memory and can only succeed if the reference points to allocated memory). For theoretical reasons, I would like not to make dereferencability part of validity. I think it is desirable if a value that is valid at some point, will always be considered valid for the rest of program execution. That is true for invariants like “a bool must be true or false”, but it is not true when we also require a reference to be dereferencable and then that memory gets deallocated. On some level, it doesn’t even matter where this fact that references are dereferencable comes from, so I should probably not waste as much space on this discussion. ;)

Another interesting case is validity of a union. This has been discussed a lot, and my personal opinion is that validity makes no restrictions whatsoever for a union. They do not have to be valid according to any one of their variants, and can contain arbitrary (even uninitialized) data – but no firm decision has been made yet.

As one last example, there are open questions even around the validity for types as simple as i32. When we talked about safety, I explained that uninitialized data is not safe at i32. But is it valid? My gut feeling is that it should not be (i.e., validity should require that i32 be initialized to some value), but I cannot demonstrate any tangible benefit and there are good examples for allowing uninitialized data in a u8.

Conclusion

I have talked about two kinds of invariants that come with every type, the safety invariant and the validity invariant. For unsafe code authors, the slogan summarizing this post is:

You must always be valid, but you only must be safe in safe code.

I think we have enough experience writing unsafe code at this point that we can reasonably discuss which validity invariants make sense and which do not – and I think that it is high time that we do so, because many unsafe code authors are wondering about these exact things all the time.

The plan is to open issues in the UCG RFCs repo soon-ish, one issue for each type family that we need to make decisions on wrt. validity. Meanwhile, if you have any comments or questions, feel free to join us in the forum!

Footnotes

This is not fully precise yet; beyond the return value we also have to take into account side-effects that might be observable by other functions. But those details are not the point of this post. ↩
I am aware that I inadvertently used the term “valid at type T” in those prior posts. It is hard to come up with consistent terminology… ↩

Posted on Ralf's Ramblings on Aug 22, 2018.
Comments? Drop me a mail or leave a note in the forum!