Over the last couple of weeks of my internship, I have been working on a proposal for the “Unsafe Code Guidelines”. I got some very encouraging feedback at the Mozilla All-Hands, and now I’d like to put this proposal out there and start having a discussion. Warning: Long post is long. You have been warned.
If you need more context on this topic, I suggest you read my post on unsafe code guidelines and undefined behavior first.
1 It’s in the Types
Consider the following simple function:
As discussed in my previous post, we would like an unsafe Rust program that calls this function with aliasing pointers to be UB.
I’d like to propose that the fundamental reason this is UB is the type of the function and, in particular, its arguments:
&mut expresses uniqueness, so the memory pointed to by an
&mut must not alias with anything else.
Borrowing some terminology from related fields, we can see the function type as stating a contract: The
&mut say that
simple expects to be called with non-aliasing pointers.
If we violate the contract, all bets are off.
To realize this idea, we need to define, for every type, what is the contract associated with that type. When is some piece of storage valid at any given type?
For some types, that’s simple:
i32 is any fully initialized four-byte value.
bool is one byte, either
Compound types like
(T, U) are defined in terms of their constituent types: The first component must be a valid
T, the second component a valid
Where it gets interesting is the reference types:
&mut T says that we have a (non-null, properly aligned) pointer.
Whatever the pointer points to must be a valid
Moreover, the memory range covered by this pointer (whose extent is given by the size of
T) is exclusively accessed by this function – no concurrent thread performs any access, and when we call unknown code and do not pass on the reference, we can rely on that unknown code not accessing this memory either.
&T, we can rely on no-one performing any write access to this memory.
At the same time, the contract for
&T also says that we ourselves will not write to this memory either.
This is a nice example showing that contracts go both ways: The function can rely on the other party (the caller) having done their part by providing valid arguments, but the function also has to make guarantees itself.
Another example of a guarantee provided by functions is their promise to provide a valid return value when they are done.
My proposal for the unsafe code guidelines then is to validate the contract described by a function’s types at the function boundaries, and to make it UB for the contract to be violated.
simple would validate
y immediately after being called, and it would be UB for them to alias.
2 Specification Framework
I also think that tooling to detect UB is of paramount importance […]. In fact, specifying a dynamic UB checker is a very good way to specify UB! Such a specification would describe the additional state that is needed at run-time to then check at every operation whether we are running into UB.
Following this, I will in the following describe how we could check the contracts expressed by Rust’s types for violations. I am considering a MIR program; if a Rust program is supposed to be checked, it has to be translated to MIR first. (You can read up on my explanation of why I think this is a good way to proceed.) The validation of reference types (in particular, their exclusive / read-only nature) will need some extra “instrumented” state – also see this previous post on the topic.1 The extra state in my proposal consists of something akin to a reader-writer lock for every memory location. I am first going to describe these locks and how they affect program behavior, before explaining contract validation.
This is not (yet) a proper specification in that some important details are still left open. The goal is to lay down a framework as the basis for discussion. Many things already work as expected, and hopefully the remaining issues can be sorted out by tuning some of the details.
2.1 Memory Locks
Every location comes with an additional reader-writer “lock”.
These locks have nothing to do with concurrency, in fact they are more like
Locks are used to verify the guarantees provided by references:
Shared references point to memory that is not mutated by anyone, and mutable references point to memory that is not accessed (read from or written to) by anyone else.
Notably, however, unlocked memory is free to be accessed by anyone – nobody is making any particular assumptions about this memory.
To track these locks, we store a
locks: Vec<LockInfo> for every location, with
LockInfo defined as:
(Notice that an implementation of this specification is free to track locks in a different way, as long as the effective behavior is the same. The reason I am giving concrete types here is just to make the specification more precise.)
When a lock is acquired, we record which stack frame is the owner of this lock (starting at 0 for
main, and counting up).
Now, on every read access, we check whether there is a write lock of this location held by another function.
If yes, the access is UB.
Furthermore, write accesses are UB if there is a read lock held or if there is a write lock held by another function.
Finally, acquiring a lock makes sure that the new lock is not in conflict with existing locks:
Acquiring a read lock is UB if there is a write lock held by anyone, including the function requesting a read lock; and acquiring a write lock is UB is there is any lock held by anyone.
Locks are typically released automatically when they expire.
To this end, the function acquiring a lock can optionally specify the lifetime of the lock.
Not specifying a lifetime implies that the lock is held until the function returns.
For the purpose of this specification, it does not matter what exactly a lifetime is; it is only relevant at which point it ends.
Lucky enough, MIR recently gained
EndRegion statements that we can use for this purpose:
EndRegion(lft) is executed, all locks with lifetime
lft are released.
Beyond this, a write lock can also be explicitly released by the function that acquired it, even before the lock’s lifetime ends.
(It does not turn out to ever be necessary to do this for read locks in my proposal.)
There is one more mechanism we need to introduce: Acquired write locks can be suspended for a lifetime.
This means the lock is temporarily released, but meant to be reacquired later.
To this end, the
LockInfo is removed from the location’s
locks and put into a
HashMap<Lifetime, (Location, LockInfo)> which is stored somewhere separately from the program’s memory.
EndRegion corresponding to the given lifetime is reached, the lock is reacquired and put back into the
locks of the given location.
With the lock mechanism established, we can now perform type validation in MIR.
To this end, we add a new
Validate statement to the language.
This statement takes a list of
(Lvalue, Ty) pairs to be validated, and a validation operation.
The operation can be acquire, suspend (for a lifetime) or release, indicating whether memory locks are acquired, suspended, or released.
Before explaining type validation in general, let us look at some examples.
Validating function arguments
Here’s the sample code from above with validation annotated explicitly:
For simplicity’s sake, I just pretend that validation commands were present not just in MIR, but also in the Rust surface language. Also, while validation takes pairs of lvalues and types, the types can be inferred, so I am only listing lvalues here.
Let us now see what happens when
I mentioned above that we want this case to be UB, because we want to be able to reorder the two stores.
Indeed, it will turn out that the validation describes above triggers UB here.
We perform acquire validation of the arguments immediately after a function is called.
This validates the function’s contract with the caller.
x is of reference type, so validation will follow the reference and recursively validate whatever
*x is valid at type
The recursive call then acquires the write lock for
*x and checks that memory is initialized.
The lock is acquired for the entire duration of the function call, based on the fact that the lifetime of the reference outlives the function.
This completes validating
y proceeds exactly the same way.
In particular, validation will acquire a write lock for
y alias, this will overlap the write lock that was acquired for
This is how we get the UB that we were looking for.
In other words, acquiring locks for everything we can reach through references implicitly checks for aliasing, making sure that mutable references do not overlap with anything else.
Validating function calls
The second example demonstrates how validation, and memory locks in particular, let us reason about the actions of unknown external functions:
We would like to optimize this code and eliminate the write to
y at the end of the function, arguing that we are just writing back the value that’s already stored there.
The reason we think we can do that is that we hold an exclusive reference to
We do not pass
fun, and there most not be any other reference to this memory, so
fun cannot touch
And indeed, the validation rules make it so that it is UB for
fun to access
Furthermore, we want to eliminate the load from
x at the end, and instead re-use
x_val, arguing that since
x is a shared reference, it is UB for
nop to change its value.
In the beginning of
example, during acquire validation, we acquire the read lock for
*x and the write lock for
When we pass a mutable reference to
fun, we explicitly acknowledge that
fun has the right to modify this memory.
This means that we have to release any write locks that we may hold on
z, which is done by performing release validation on all function arguments.
x, this doesn’t actually do anything as we only hold a read lock, but the write lock on
z is released here.
(We assume there is no implicit reborrowing of
z going on here. The curly braces make it so that
z is moved, not borrowed, to
fun is called, we still hold the read lock for
x and the write lock for
fun were to access
y or write to
x, we know that’s UB – so we can rely on this not to happen and perform our desired optimizations.
Notice that this optimization is extremely hard to perform e.g. in C, because to know that
fun cannot access
y, the compiler needs to do whole-program alias analysis – a notoriously hard problem.
In contrast, Rust’s type system allows us to perform such optimizations intra-procedurally, without looking at other functions’ code at all.
Validating taking a reference
In the last example, we deliberately moved
fun to side-step any discussions about borrowing.
However, typically, you would actually reborrow
As we will see in this example, to properly model transferring ownership of (re)borrowed memory to other functions, we will perform suspend and acquired validation around the operation that takes a reference.
To better demonstrate this, let us also say that
fun actually returns a reference with the same lifetime as
Overall, our code now looks like this (with all reborrowing being explicit):
Again we would like to eliminate the write to
y and the read from
To eliminate the write to
y as being redundant, we have to prove, in particular, that the write to
z_inner cannot affect
As usual, we start by acquiring the read lock for
*x and the write lock for
z gets reborrowed to be passed to
fun with lifetime
'inner, we suspend our lock of
*z for this lifetime.
This models the fact that, no matter what we do with the newly created reference, we know that when
'inner ends, we get our exclusive access back.
After the suspension, we don’t hold a lock any more for what is now
*z_for_fun, so we immediately acquire a write lock again.
However, following the type of
z_for_fun, we will acquire this lock only for lifetime
For this to work, validation needs to know the non-erased type of the lvalues – all the lifetime information must be present, so we can acquire the lock for the right lifetime.
(Suspending and then acquiring the write lock may sound somewhat redundant.
However, notice that, if we were to take a shared reference, we would now acquire a read lock rather than a write lock.
Furthermore, even here, the overall effect is far from a no-op: The suspended lock will stay around even if the reacquired lock is released. In fact, this is exactly what happens next.)
Next, we call
At this point, we release our write lock on
*z_for_fun again, as usual when passing a mutable reference to a function.
(The acquire-release of
z_for_fun is indeed redundant here, but notice that the two validation statements arise from two separate MIR operations: taking a reference and calling a function.
These operations just happen to occur immediately after one another here.)
However, the suspended lock of
*z stays around – so while we gave up control of
z_for_fun) for lifetime
'inner, we still know that we will get control back when
fun is now free to do whatever it wants, as long as it does not violate our still-existing write lock of
*y and the read lock of
fun returns, we obtain a new mutable reference.
At this point, we again run acquire validation.
In this case, we want to ensure that
z_inner does not alias
The locks ensure that
fun cannot possibly change
y, but we (
example) can – and if writing to
z_inner were to change
*y, that access would come from the stack frame holding the lock on
y, so it would be permitted!
However, we do want to still optimize away the write to
y at the end of
example as being redundant.
To this end, we acquire-validate all data returned to us from other functions, making sure that references we obtain this way to not conflict with other references we already hold.
Following the type of
z_inner, we only acquire this lock for
Another way to look at acquiring the return value is that we are validating the contract we have with
fun: The contract says that
fun should return a reference to us that comes with the uniqueness guarantees of
&mut, so we better make sure that
fun actually delivered what it said it would.
Multiple things happen now:
First, our lock of
z_inner expires and is hence released.
(We already released the lock of
z_for_fun ourselves, so nothing is to be done there.)
Next, remember that we suspended our lock of
'inner; so now is the time that this lock gets reacquired.
In other words, as the lifetime of the reborrow from
z expires, we release all locks of memory that is tied to that lifetime, and then re-gain full exclusive access to
When we come to the write to
*y, we know that this write is redundant because (a) we held the write lock for
*y all the time, so nobody else could have written to it without triggering UB, and (b) we know
z_inner does not alias with
y (or else we would already have UB), so we ourselves have not written to
Hence, we may optimize away the write.
The examples should already give you a good idea of what
It is time to look at this statement in some more depth.
Remember that it takes a validation operation and a list of lvalue-type pairs as arguments.
The most important validation operation is acquire validation.
As the name indicates, this is about acquiring the appropriate locks.
It also otherwise makes sure that values are valid at the given type, e.g., references have to be non-NULL and aligned.
The core of acquire validation is an operation with a signature roughly like this:
Why are mutability and lifetime showing up here?
The reason is that reference types have a “transitive” effect down the type hierarchy:
&T makes everything reachable through this reference read-only, even if
Similarly, in case of nested references, it is the outermost reference that defines the lifetime for which access through this pointer is granted.
(The compiler makes sure that inner references always have longer lifetimes than outer references.)
As we have seen in the examples,
acquire_validate recursively traverses the type hierarchy.
When it encounters a primitive type like
i32, it acquires a lock of
size_of::<ty> many locations starting at
lval for lifetime
lft – this will be a write or read lock depending on whether
mutbl says this data is mutable or not.2
In other words, memory is locked “at the leaves” of the type structure, rather than locking immediately when we see a reference type.
We will come back later to why this is necessary.
Validation also makes sure that the value stored in memory is valid at the given type.
u32, this just amounts to checking that the memory is properly initialized.
For compound types like tuples,
enum, validation proceeds recursively on the fields.
In particular, the enum discriminant is checked to be in-range (in particular, nothing can pass validation at an empty enum type).
However, because the recursive validation will only lock memory that’s actually covered by a field, we also have to acquire the appropriate lock for padding bytes and the enum discriminant here.
Finally, when encountering a box or reference type, two things happen.
First, the pointer itself is stored in memory somewhere; this memory has to be locked just like the validation of
i32 locks the memory used to store the integer.
The pointer is also checked to be non-NULL and properly aligned for the type it points to.
Furthermore, validation proceeds recursively after dereferencing the pointer.
Crucially, for the case of validating a reference, the
lft for this recursive call are taking the type into account:
None and this reference’s lifetime ends within the function (i.e., there is a corresponding
EndRegion somewhere), it is now set to the reference’s lifetime.
mutbl was mutable, it becomes immutable when following a shared reference.
Acquire validation is performed at the beginning of every function (acquiring the function arguments) and upon returning from another function (acquiring the return value). These are the places when we rely on other functions providing valid data to us.
There are two other validation operations besides acquire validation: release validation, and suspend validation.
Both operations recursively follow the type just like acquire validation, and for all memory they visit, they release/suspend write locks.
Read locks are not touched, which corresponds to the fact that shared references are
Copy and hence never “given up”:
Shared references always remain usable until their lifetime expires, at which point the read lock is released automatically.
Release validation occurs before calling a function, when the function arguments are released. This grants the functions write access to memory that we are explicitly passing along.
Suspend validation occurs before taking a reference, using the lifetime of the new reference. This encodes the fact that when the lifetime ends (and hence the lock is being reacquired), we are the exclusive owner of this memory again.
Update: It has been pointed out to me that another way for values to flow back into a function is through mutable references. Good catch! I will need to extend the validation machinery to also perform acquire validation when a write lock is reacquired. /Update
To summarize, validation statements are emitted during MIR construction in the following places:
- At the beginning of the function, we perform acquire validation of the arguments.
- When taking a reference, we suspend the referee for the reference lifetime and acquire the referent.
- When calling another function, we release the arguments and acquire the return value.
I hope you now have an idea of how memory locks and validation enforce the assumptions that are encoded in Rust’s types.
One topic that needs special treatment in this context is interior mutability.
If a function takes an
x: &Cell<i32>, following the rules above, it will acquire a read lock of
*x for the duration of the function call, making
Clearly, we do not want to do that – calling
x.set will actually mutate
*x, and mutating through a shared reference is exactly the point of using
Lucky enough, the compiler already says that interior mutability is only allowed via
We can use this for our purposes: To adjust validation for interior mutability, we stop our recursive descent and do not do anything when reaching an
mutbl indicates we are in immutable mode –
&mut UnsafeCell is not affected.
In particular, no locks are acquired.
This justifies calling
set on a shared reference and having the value changed.
Of course, it also means we cannot do some of the optimizations we discussed above – but that’s actually exactly what we want!
These optimizations are unsound in the presence of interior mutability.
Notice that for this to work, it is crucial that we acquire locks “at the leaves” of the type structure, and not just when we encounter a reference.
For example, if we validate something at type
&(i32, Cell<i32>), we do want to acquire a read lock for the first component, but not for the second.
2.5 Unsafe Code
There’s one last issue that I want to cover before (finally) concluding this post: Unsafe code. So far, we have only considered entirely safe code, and we have seen how validation and memory locks enforce the guarantees associated with reference type. The type checker already ensures that safe code will never trigger a lock failure – actually, you can see the validation statements as some form of “dynamic borrow checking”, enforcing the same rules as the borrow checker does statically. The only way that these checks could ever fail is for unsafe code to feed “bad” values into safe code, and that is exactly the point: These proposed “unsafe code guidelines” restrict what unsafe code can do when interfacing with safe code.
However, within unsafe code, we have to give programmers a little more leeway.
We cannot always make as strong assumptions as we did in safe code:
For example, let us look at
mem::swap (somewhat desugared):
In the beginning, we acquire the write lock for
y, establishing that the two pointers do not alias – this is exactly as expected.
However, when we release the arguments to
ptr::swap_nonoverlaping, nothing actually happens:
We are only passing raw pointers that do not come with any guarantees.
In particular, we keep the write locks on
Now, that’s a problem –
ptr::swap_nonoverlaping will of course write to both of them when it swaps their content, triggering UB.
First of all, let me note that this example actually demonstrates that we are validating the borrow checking gurantees correctly.
As far as the borrow checker is concerned,
y have never been borrowed or moved to anyone else, they were fully owned by
swap all the time.
The trouble is that the ownership transfer to
ptr::swap_nonoverlaping is not described by its type, and hence cannot be known to the compiler.
This is possible because
ptr::swap_nonoverlapping is an unsafe function that can make more assumptions than its type says, and rely on the caller to uphold these assumptions.
The compiler has no way to know what these assumptions are.
To accommodate such code, I suggest that we emit fewer validation instruction when building MIR for functions containing unsafe blocks. For example, we could imagine that if a function contains an unsafe block, instead of the usual validation, we only perform an acquire in the beginning of the function, immediately followed by a release. This would verify that our arguments are well-formed, but would otherwise acknowledge that we just do not know how ownership is flowing inside such a function. Of course, this also disables some optimizations – but maybe these are optimizations that we do not want to perform anyway in unsafe code, because they could end up breaking the program.
Anyway, this part of my proposal is certainly not yet final. This topic needs input from many people so as to safely cover the unsafe code people write while still (hopefully) permitting the optimizations people expect to happen. One potential middle-ground here is having a function attribute that controls which validation statements are emitted, or even providing some form of intrinsics for programmers to explicitly perform validation.
I have described a mechanism to validate the contract expressed by Rust’s types, in particular, the uniqueness and read-only guarantees provided by reference types. This validation describes what (safe) code can assume about its variables at run-time, and it enables some code motion optimizations that would otherwise require whole-program alias analysis.
I said before that I think providing a way to detect UB is crucial, and the framework laid out above is designed with this in mind.
In fact, I have been working on a miri-based implementation of this mechanism:
This rustc branch adds the new
Validate statement to MIR and adds a MIR pass to insert validation statements as described above, while this miri branch adds memory locks to miri and implements type-based validation.
(Both of these branches are subject to frequent rebasing, so expect force pushes if you track them.)
The implementation is not yet complete, and some of the terminology changed (it turns out writing things down helps you find good names :), so don’t be confused if it looks a little different.
Other than that though, feel free to play around and provide feedback.
I hope the implementation can help us check whether unsafe code follows the rules I laid down here, and whether these rules match programmer intuition.
(In fact, this is how I figured out that
mem::swap violates the most strict interpretation of the rules – I did expect a violation somewhere in libstd’s unsafe code, but had no idea where it would show up.)
We could even have a test suite to aid in fine-tuning the rules.
I also said that “we should strive for programmers’ intuition agreeing with the […] compiler”.
This is of course hard to measure, but one property that I think goes a long way here is that in this proposal, pointers can remain “thin”.
I probably should write a whole separate blog post on this topic, but the short version is that in my proposal, if a function has permission to access certain region of memory, then it does not matter how it performs the access – if there are multiple pointers to this region around, they are all equally valid to use.
This is in contrast to e.g. C’s
restrict or type-based alias analysis, which make a distinction based on which pointer is used to access certain regions of memory.
Such models are fairly hard to write checkers for, as witness by there being no sanitizer.
Moreover, they typically (from what I have seen so far) have some trouble supporting integer-pointer casts.
If pointers have an “identity” beyond the address they point to, then which “identity” does a pointer that’s casted from an integer obtain?
I am not saying it is impossible to solve this issue, but it is a source of complication.
It is also quickly a source of programmer confusion.
In my experience, programmers that use pointers casted from integers typically like to think of pointers as being “just integers” and not having any “identity” beyond the address they point to (and indeed, the fact that such casts work seems to indicate that they are right).
I would be much less worried about unsafe Rust code being broken by optimizations if we can make it so that this mental model is actually right.
Wouldn’t it be great if we could add “Has the simple model of pointers that everybody incorrectly thinks C has” to the list of Rust selling points?
I think this aligns well with Rust’s slogan “Hacking without fear” – and without compromise; remember we still get all the optimizations!
You have reached the end of this blog post.
Thanks a lot for reading.
If you have any comments, please raise them – the entire purpose of this post is to get the proposal out there so that we can see if this approach can suit the needs of Rust programmers and compiler writers alike.
If there are explanations you think are missing or confusing, please bring that up as well; I may then update this post accordingly or add a note in the forum for any ammendments that turned out to be necessary.
The concrete proposal will sure need some fixing here and there.
It is also not complete yet.
I already mentioned that there are open questions around unsafe code.
Other areas of future work include
static variables and the
Nevertheless, my hope is that the general approach of a contract-like, type-driven validation mechanism ends up being useful.
So, keep the comments flowing – and safe hacking.
Update: I did some refactoring of the post, reordering sections 2.2 and 2.3 to hopefully make it all flow better. Thanks for all the helpful feedback!
If at this point you are under the impression that all these previous posts have been collectively building up towards this proposal, you are not entirely wrong. ↩
For simplicity, we pretend lvalues are just locations in memory. Actually, lvalues can also refer to variables not in memory (like non-addressable stack locals), but these do not need to be locked. Furthermore, lvalues for unsized types carry the additional information stored in a fat pointer, i.e., a length (for slices) or a vtable (for trait objects); this is handled appropriately when validating the respective types. ↩