Jan 31, 2018 • Research, RustPermalink

Sharing for a Lifetime

This post is about an aspect of the RustBelt paper. Concretely, I’d like to share some of our thoughts on the nature of types and shared references. Let’s see how well this goes. :)

Shared references are an extremely powerful mechanism in the Rust type system, and we’ve had quite some trouble finding a good way of handling them in our formal model. In this post, I will present the model that we came up with. Let’s get started by discussing what a “model” of a type looks like, and what’s so hard about handling shared references.

Immutable Sharing

The reason shared references are so subtle is interior mutability. This fundamentally changes the nature of shared references. To see how, I will briefly assume that Rust does not permit interior mutability, and describe how a model of Rust types could roughly look like.

What’s in a Type?

I’ve already written about the idea of “semantic types”, which essentially boils down to the concept that types can impose additional invariants on their fields, like self.cap >= self.len (for Vec) or !is_zero(self.0) (for NonZero). So, whenever I say “(semantic) type”, I really mean “a type’s (syntactic) definition (aka the data layout) and the additional invariants that the module imposes on this type”. Two types may be syntactically equal but have different invariants, and then clearly casting from one to the other would be horribly unsafe – so, these would be completely different types despite their syntactic similarity. What really matters for a type is the answer to the question: Given some sequence of bytes, is this sequence a valid inhabitant of the type?1 If we can answer this question for every Rust type, we have found a “(semantic) model” of the Rust type system. The (syntactic) data layout imposes some basic structure on the permitted sequences (like their length), but the invariants can arbitrarily refine the set of sequences that are allowed at the given type by imposing additional requirements. (Indeed, refinement types are a way to reflect these additional restrictions into the type system, but that’s not what I want to talk about today.)

For example, the sequence of four 0 bytes is valid at i32 but not at NonZero<i32>. As another example, a pointer is valid at Box<i32> if it points to four bytes of owned memory2, and these four bytes are a valid i32. Finally, a pointer is valid at &'a mut T if it points to size_of::<T>() many bytes of memory3 that are borrowed for lifetime 'a, and those bytes are valid at T.

So, the path to handling shared references in our model seems clear: We have to find out when a pointer is valid at &'a T. Since we ignore interior mutability, we can answer this question fairly easily (at least if don’t want to go fully formal on this, which right now, we do not ;): While the lifetime 'a lasts, the pointer has to point to size_of::<T>() many immutable bytes of memory (meaning the bytes do not change during this lifetime; we impose no restrictions on earlier or later modifications); and moreover, those bytes are valid at T.

Given this model, we can now explain why *x works for any x: &'a T if T: Copy: The access can only happen during the lifetime 'a, so we can dereference the pointer. Moreover, since T: Copy, we know that we can copy the bytes and have both sequences (the original and the copy) be valid at T (remember that “being a valid sequence” could impose ownership requirements like Box<i32> does, so this kind of duplication is not possible in general for all T). Finally, since we know the memory is immutable, we can do this access even though we don’t fully own the memory: We can rely on everyone else (i.e., other threads) also just performing read operations on this memory. Overall, we conclude that we did not violate any of the invariants imposed on x, and the data we copied is a valid T.

Interior Mutability

Interior mutability however breaks this model of &'a T: We just cannot declare that the pointed-to memory is read-only in general. If we did, Cell::set and many more operations would be blatantly unsafe. Indeed this is the reason shared references are not called “immutable references” even though they are generally considered the dual of mutable references. (My personal thinking is that mutable references should really be called “unique references” to instill the point that the distinction between the two is about uniqueness, not about mutability. Oh well.)

So, with our old model broken, what could we do instead? Let us consider some examples.

&Cell<T>

Why is Cell::set a safe operation? The reason is twofold. First of all, Cell<T> is not Sync, so it is not possible to send an &Cell<T> from one thread to another. As a consequence, all shared references to a given Cell always reside in the same thread. This already excludes the possibility of data races. Secondly, Cell<T> does not permit obtaining a deep pointer into the data. So, changing a Cell<Result<bool, i32>> from Ok(true) to Err(42) cannot invalidate any pointers that still expect a bool in the Result.

Coming back to our model, we could say that &'a Cell<T> is a pointer that points to size_of::<T>() many bytes of memory that, for the duration of 'a, may be accessed by any code running in the current thread, but not by any other code; and moreover, those bytes are valid at T. This definition forces &Cell<T> to not be Send (because the model refers to the “current thread”, so changing that thread could violate the invariant) and hence Cell<T> must not be Sync. It also rules out pointers into a Cell<Result<bool, i32>>: Say we would want to have a pointer to the Ok data, which would have type &Cell<bool>. We could not show that this pointer is actually valid at the given type, because anyone in the current thread could modify the (outer) Cell<Result<bool, i32>> at any time, and if they write an Err(42) in it they will violate the requirements that an &Cell<bool> only ever points to a valid bool! So, both of the restrictions that we discussed above follow directly from the invariant for &Cell<T>.

&Mutex<T>

Coming to Mutex, the justification for why Mutex::lock and MutexGuard::deref_mut are safe operations is again different: A Mutex implements mutual exclusion (hence the name), so only one thread can ever hold the lock. As a result, giving that thread mutable access to the inner data cannot result in any data races or other conflicts with other threads.

In the model (and I’m going to gloss over more and more details here), we will say that &'a Mutex<T> is valid if everyone accessing the memory it points to during the lifetime 'a follows the locking discipline. A MutexGuard is valid if we actually own the lock of the corresponding Mutex. Together, this is sufficient to justify safety of all the Mutex and MutexGuard operations.

Sharing is Different

We now come back to the question: When is a pointer valid at type &'a T? As we have seen above, the answer to this question actually depends significantly on the type T! For i32 or bool, our old read-only model still works, but for Cell or Mutex, we need entirely different invariants that are very specific to the respective type.

To be able to handle this diversity, we have to change our idea of what a type is. I’ve said above that a type is all about the question: Given some sequence of bytes, is this sequence a valid inhabitant of the type? We now extend this by saying a type also has to be able to answer the question: Given a pointer and a lifetime, is this pointer a valid shared reference to this type for the given lifetime? In other words, every type gets to pick for itself what it means to share this type.

While I let that sink in, let me remark that this also explains why Rust has Send and Sync, and why neither implies the other even though they seem so closely related. This has certainly puzzled me when I first learned about these marker traits. The reason is that T and &T are really very different types; the author of T picks the invariants for both of these types (mostly) independently – so one of them being safe to send to another thread has no bearing on the other one.

The main consequence of this choice is that for every type with extra invariants, we have to pick two invariants:

  1. An invariant specifying the sequences of bytes that are valid for the owned case (this includes mutable references), and
  2. An invariant specifying the pointers that are valid for the shared case at a given lifetime.

In principle, this applies to all types, even types like Vec that don’t really care about interior mutability themselves. However, because we have to handle types like &Vec<Mutex<T>>, we cannot just skip this problem; interior mutability as a feature “infects” the entire type system. This should however usually not be a big problem: &'a Vec<T> can just have the invariant that len and cap are read-only, and all the initialized elements of the vector are themselves shared for lifetime 'a. This strategy of just using the shared invariant of the “inner type” and making the rest read-only should be applicable to the vast majority of types.

Unsharing

Of course, the two invariants making up a type cannot be entirely unrelated. First of all, given some y: T, it must be possible to somehow “start sharing” it and obtain &y: &T. Moreover, if T: Copy, then it must be possible to read *x where x: &T and obtain a copy of a T. This was automatically the case in our first model where sharing was uniformly defined as being read-only memory, but now that we made sharing “configurable”, we have to explicitly require this property.

However, that’s not enough: At some point, sharing will end, and some part of the program expects the owned invariant to hold again. Somehow, everything that happened when sharing started must be un-done – I think of this as “un-sharing”. This is tricky because shared references can be, well, shared (d’oh). Even for the simple case of read-only sharing, we need to make sure that nobody still uses the old read-only access to the memory covered by the shared reference. In Rust, this works because the lifetime of these old shared references will have expired when the “un-sharing” happens.

This is why, in our model, sharing is inherently connected to a lifetime. If you go back to what I wrote above about a type being two invariants, you will notice that the second invariant is about sharing an inhabitant of T for some lifetime. There is no sharing without a lifetime. Whenever you start sharing something (i.e., whenever a &x is created from previously owned data), our model demands that you announce the lifetime for which this sharing will last. Once the lifetime ends, un-sharing happens automatically, so we don’t need to explicitly talk about it in our model. (Of course, the proof effort related to this does not just go away. Indeed we now have to follow the rules for what one can do with things that are only owned for some lifetime, and those rules are restrictive enough to guarantee that un-sharing is always possible.)

Conclusion

I have explained the idea of types having a separate “sharing invariant” that we came up with in the RustBelt paper to be able to handle interior mutability. To my knowledge, this is the only formal treatment of interior mutability to date, so we were all kind of stabbing in the dark until we found an approach that worked. This is not so say that the last word on interior mutability has been spoken; there may well be other ways of justifying why Mutex<T> or Cell<T> are safe.

Either way, this model has certainly helped me not only in the formal work on RustBelt, but also in thinking about the Rust type system itself. For example, I briefly wondered why Cell is not covariant. Indeed, if T <: U, then it is perfectly safe to turn a Cell<T> into a Cell<U> – that is to say, whenever some sequence of bytes is valid at Cell<T>, it is also valid at Cell<U>. (In fact, the owned invariant for Cell<T> is exactly the same as the one for T!) That’s precisely what we need for covariance, right? Well, not exactly – because our types have two invariants, we also have to require that whenever a pointer is a shared Cell<T>, it is a shared Cell<U>, and that is decisively not true! So, owned Cells are covariant, but shared Cells are not. One could think of an extension to the Rust type system that separates owned subtyping from shared subtyping, and then Cell could indeed be “owned-covariant”.

In closing this post, I’d like to leave you with some more general thoughts. One way to explain the role of shared references in the Rust type system more abstractly is that &T switches T into “shared mode”. Every type comes with two modes, “owned” and “shared”, and we typically think only of the “owned” mode because that’s the one that’s relevant for x: T. However, Rust provides a way to switch into the other mode, and types can have a completely different “face” there – for example, Cell<T> and T are equal in “owned” mode, but very different in “shared” mode. I don’t know of any other type system having a comparable concept. However, I’ve also not seen Rust’s shared references being discussed as a super-special magic feature (though, after doing this research, I think they are!) – it’s rather the mutable references that seem special due to their unique nature (and this was a revolution, called mutpocalypse). So, maybe this is some extremely ingenious idea that the Rust devs had without even realizing, or maybe it’s just some artifact of our model that’s not worth losing many words about. (I do have my opinion, of course. ;)

And then there is the question: Why exactly two modes? Would it be worth having more? I honestly don’t know, but I had to think of this a while back when I read this post by glaebhoerl, the first part of which I would translate as “Cell could be considered a mode”.

If you have any comments or thoughts on this, please join the discussion in the Rust forums! I’d also be interested in feedback on how understandable this post is; this is my first attempt at translating research results into a blog post.

  1. I should mention at this point that “sequences of bytes” is a subtle concept because of complications like uninitialized memory and pointers. I hope to come back to this in a future post; it should not matter much here. 

  2. I will also mostly ignore how the concept of ownership can be made more precise. If you want to dig deeper, the keyword is separation logic. We had to extend separation logic to be able to handle borrowing as well, but the details of all of this don’t matter for this post. That’s the plan, anyway. 

  3. Yes, we do ignore unsized types here. 

Posted on Ralf's Ramblings on Jan 31, 2018.
Comments? Drop me a mail or leave a note in the forum!