Ralf's Ramblings, by Ralf Jung (feed: https://www.ralfj.de/blog/feed.xml, updated 2024-03-21T19:53:07+01:00)
Google Open Source Peer Bonus (2023-12-27): https://www.ralfj.de/blog/2023/12/27/open-source-peer-bonus.html
<p>We are all used to spam emails, supposedly from Google, that say “You won” and that I just need to send all my data somewhere to receive my lottery payout.
When I recently received an email about Google’s “Open Source Peer Bonus” program, I almost discarded it as yet another version of that kind of spam.
But it turns out sometimes these emails are real!
Meanwhile the <a href="https://opensource.googleblog.com/2023/12/google-open-source-peer-bonus-program-announces-second-group-of-2023-winners.html">official announcement</a> has been released which lists me as a recipient of this bonus as a thank you for my work on Rust.
So this one time, it wasn’t spam after all!</p>
<!-- MORE -->
<p>Thanks a lot to Google for this program and the $250 reward; it is great to see open source work honored this way.
I have donated the amount in full to <a href="https://noyb.eu/en">noyb</a>, who I’m sure will be using it <a href="https://noyb.eu/en/noyb-win-first-major-fine-eu-1-million-using-google-analytics">for good</a>.</p>
<p><strong>Update (2024-01-07):</strong>
In fact, this is already my second Google Open Source Peer Bonus.
The first was in the <a href="https://opensource.googleblog.com/2023/05/google-open-source-peer-bonus-program-announces-first-group-of-winners-2023.html">first half of 2023</a>.
Due to issues with the payment process, it took a while for that bonus to be transferred, but I can confirm that it has now arrived in my bank account.
I will have to find a suitable not-for-profit to donate this to… or it might be noyb again, we will see.
<strong>/Update</strong></p>
Talk about Undefined Behavior, unsafe Rust, and Miri (2023-06-13): https://www.ralfj.de/blog/2023/06/13/undefined-behavior-talk.html
<p>I recently gave a talk at a local Rust meetup in Zürich about Undefined Behavior, unsafe Rust, and Miri.
The recording is available <a href="https://www.youtube.com/watch?v=svR0p6fSUYY">here</a>.
It targets an audience that is familiar with Rust but not with the nasty details of unsafe code, so I hope many of you will enjoy it!
Have fun. :)</p>
From Stacks to Trees: A new aliasing model for Rust (2023-06-02): https://www.ralfj.de/blog/2023/06/02/tree-borrows.html
<p>Since last fall, <a href="https://perso.crans.org/vanille/">Neven</a> has been doing an internship to develop a new aliasing model for Rust: Tree Borrows.
Hang on a second, I hear you say – doesn’t Rust already have an aliasing model?
Isn’t there this “Stacked Borrows” that Ralf keeps talking about?
Indeed there is, but Stacked Borrows is just one proposal for a possible aliasing model – and it <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues?q=is%3Aopen+is%3Aissue+label%3AA-stacked-borrows">has its problems</a>.
The purpose of Tree Borrows is to take the lessons learned from Stacked Borrows to build a new model with fewer issues, and to take some different design decisions such that we get an idea of some of the trade-offs and fine-tuning we might do with these models before deciding on the official model for Rust.</p>
<p>Neven has written a detailed introduction to Tree Borrows <a href="https://perso.crans.org/vanille/treebor/">on his blog</a>, which you should go read first.
He also presented this work at a recent RFMIG meeting, so you can <a href="https://www.youtube.com/watch?v=zQ76zLXesxA">watch his talk here</a>.
In this post, I will focus on the differences from Stacked Borrows.
I assume you already know Stacked Borrows and want to understand what changes with Tree Borrows and why.</p>
<!-- MORE -->
<p>As a short-hand, I will sometimes write SB for Stacked Borrows and TB for Tree Borrows.</p>
<h2 id="two-phase-borrows">Two-phase borrows</h2>
<p>The main novelty in Tree Borrows is that it comes with proper support for two-phase borrows.
Two-phase borrows are a mechanism introduced with NLL which allows code like the following to be accepted:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">two_phase</span><span class="p">(</span><span class="k">mut</span> <span class="n">x</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">usize</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">x</span><span class="nf">.push</span><span class="p">(</span><span class="n">x</span><span class="nf">.len</span><span class="p">());</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The reason this code is tricky is that it desugars to something like this:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">two_phase</span><span class="p">(</span><span class="k">mut</span> <span class="n">x</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">usize</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">arg0</span> <span class="o">=</span> <span class="o">&</span><span class="k">mut</span> <span class="n">x</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">arg1</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">len</span><span class="p">(</span><span class="o">&</span><span class="n">x</span><span class="p">);</span>
    <span class="nn">Vec</span><span class="p">::</span><span class="nf">push</span><span class="p">(</span><span class="n">arg0</span><span class="p">,</span> <span class="n">arg1</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This code clearly violates the regular borrow checking rules since <code class="language-plaintext highlighter-rouge">x</code> is mutably borrowed to <code class="language-plaintext highlighter-rouge">arg0</code> when we call <code class="language-plaintext highlighter-rouge">x.len()</code>!
And yet, the compiler will accept this code.
The way this works is that the <code class="language-plaintext highlighter-rouge">&mut x</code> stored in <code class="language-plaintext highlighter-rouge">arg0</code> is split into two phases:
in the <em>reservation</em> phase, <code class="language-plaintext highlighter-rouge">x</code> can still be read via other references.
Only when we actually need to write to <code class="language-plaintext highlighter-rouge">arg0</code> (or call a function that might write to it) will the reference be “activated”, and it is from that point onwards (until the end of the lifetime of the borrow) that no access via other references is allowed.
For more details, see <a href="https://github.com/rust-lang/rfcs/blob/master/text/2025-nested-method-calls.md">the RFC</a> and <a href="https://rustc-dev-guide.rust-lang.org/borrow_check/two_phase_borrows.html">the rustc-dev-guide chapter on two-phase borrows</a>.
The only point relevant for this blog post is that when borrowing happens implicitly for a method call (such as <code class="language-plaintext highlighter-rouge">x.push(...)</code>), Rust will treat this as a two-phase borrow.
When you write <code class="language-plaintext highlighter-rouge">&mut</code> in your code, it is treated as a regular mutable reference without a “reservation” phase.</p>
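<p>To make the distinction concrete, here is a minimal sketch. The function names <code class="language-plaintext highlighter-rouge">push_len</code> and <code class="language-plaintext highlighter-rouge">push_len_explicit</code> are made up for illustration. The first function relies on the implicit two-phase borrow of the method receiver; the second uses an explicit <code class="language-plaintext highlighter-rouge">&mut</code> and therefore has to perform the read before the mutable reference is created.</p>

```rust
/// Pushes the current length onto the vector. The receiver of `push` is
/// borrowed implicitly, so the borrow is two-phase and `v.len()` may still
/// read `v` during the reservation phase.
fn push_len(mut v: Vec<usize>) -> Vec<usize> {
    v.push(v.len());
    v
}

/// The same operation with an explicit `&mut`, which is a regular mutable
/// borrow without a reservation phase.
fn push_len_explicit(mut v: Vec<usize>) -> Vec<usize> {
    // The read has to happen before the mutable reference exists.
    let len = v.len();
    let r = &mut v;
    // Writing `r.push(v.len())` here instead is rejected by the borrow
    // checker (error E0502): `v` is still mutably borrowed through `r`.
    r.push(len);
    v
}
```

<p>Both functions compute the same result, for example <code class="language-plaintext highlighter-rouge">push_len(vec![10, 20])</code> yields <code class="language-plaintext highlighter-rouge">[10, 20, 2]</code>; only the first one depends on two-phase borrows to be accepted.</p>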
<p>For the aliasing model, two-phase borrows are a big problem: by the time <code class="language-plaintext highlighter-rouge">x.len()</code> gets executed, <code class="language-plaintext highlighter-rouge">arg0</code> already exists, and as a mutable reference it really isn’t supposed to allow reads through other pointers.
Therefore Stacked Borrows just <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/85">gives up</a> here and basically treats two-phase borrows like raw pointers.
That is of course unsatisfying, so for Tree Borrows we are adding proper support for two-phase borrows.
What’s more, we are treating <em>all</em> mutable references as two-phase borrows: this is more permissive than what the borrow checker accepts, but lets us treat mutable references entirely uniformly.
(This is a point we might want to tweak, but as we will see soon this decision actually has some major unexpected benefits.)</p>
<p>This is why we need a tree in the first place: <code class="language-plaintext highlighter-rouge">arg0</code> and the reference passed to <code class="language-plaintext highlighter-rouge">Vec::len</code> are both children of <code class="language-plaintext highlighter-rouge">x</code>.
A stack is no longer sufficient to represent the parent-child relationships here.
Once the use of a tree is established, modeling of two-phase borrows is fairly intuitive: they start out in a <code class="language-plaintext highlighter-rouge">Reserved</code> state which tolerates reads from other, unrelated pointers.
Only when the reference (or one of its children) is written to for the first time does its state transition to <code class="language-plaintext highlighter-rouge">Active</code>; from then on, reads from other, unrelated pointers are no longer accepted.
(See Neven’s post for more details. In particular note that there is one unpleasant surprise lurking here: if there are <code class="language-plaintext highlighter-rouge">UnsafeCell</code> involved, then a reserved mutable reference actually has to tolerate <em>mutation</em> via unrelated pointers!
In other words, the aliasing rules of <code class="language-plaintext highlighter-rouge">&mut T</code> are now affected by the presence of <code class="language-plaintext highlighter-rouge">UnsafeCell</code>. I don’t think people realized this when two-phase borrows were introduced, but it also seems hard to avoid so even with hindsight, it is not clear what the alternative would have been.)</p>
<h2 id="delayed-uniqueness-of-mutable-references">Delayed uniqueness of mutable references</h2>
<p>One of the most common sources of Stacked Borrows issues is its <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/133">very eager enforcement of uniqueness of mutable references</a>.
For example, the following code is illegal under Stacked Borrows:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">a</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">];</span>
<span class="k">let</span> <span class="n">from</span> <span class="o">=</span> <span class="n">a</span><span class="nf">.as_ptr</span><span class="p">();</span>
<span class="k">let</span> <span class="n">to</span> <span class="o">=</span> <span class="n">a</span><span class="nf">.as_mut_ptr</span><span class="p">()</span><span class="nf">.add</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="c1">// `from` gets invalidated here</span>
<span class="nn">std</span><span class="p">::</span><span class="nn">ptr</span><span class="p">::</span><span class="nf">copy_nonoverlapping</span><span class="p">(</span><span class="n">from</span><span class="p">,</span> <span class="n">to</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
</code></pre></div></div>
<p>The reason it is illegal is that <code class="language-plaintext highlighter-rouge">as_mut_ptr</code> takes <code class="language-plaintext highlighter-rouge">&mut self</code>, which asserts unique access to the entire array, therefore invalidating the previously created <code class="language-plaintext highlighter-rouge">from</code> pointer.
In Tree Borrows, however, that <code class="language-plaintext highlighter-rouge">&mut self</code> is a two-phase borrow! <code class="language-plaintext highlighter-rouge">as_mut_ptr</code> does not actually perform any writes, so the reference remains reserved and never gets activated.
That means the <code class="language-plaintext highlighter-rouge">from</code> pointer remains valid and the entire program is well-defined.
The call to <code class="language-plaintext highlighter-rouge">as_mut_ptr</code> is treated like a read of <code class="language-plaintext highlighter-rouge">*self</code>, but <code class="language-plaintext highlighter-rouge">from</code> (and the shared reference it is derived from) are perfectly fine with reads via unrelated pointers.</p>
<p>It happens to be the case that swapping the <code class="language-plaintext highlighter-rouge">from</code> and <code class="language-plaintext highlighter-rouge">to</code> lines actually makes this code work in Stacked Borrows.
However, this is not for a good reason: this is a consequence of the rather not-stack-like rule in SB which says that on a read, we merely <em>disable all <code class="language-plaintext highlighter-rouge">Unique</code></em> above the tag used for the access, but we keep raw pointers derived from those <code class="language-plaintext highlighter-rouge">Unique</code> pointers enabled.
Basically, raw pointers can live longer than the mutable references they are derived from, which is highly non-intuitive and potentially problematic for program analyses.
With TB, the swapped program is still fine, but for a different reason:
when <code class="language-plaintext highlighter-rouge">to</code> gets created first, it remains a reserved two-phase borrow.
This means that creating a shared reference and deriving <code class="language-plaintext highlighter-rouge">from</code> from it (which acts like a read on <code class="language-plaintext highlighter-rouge">self</code>) is fine; reserved two-phase borrows tolerate reads via unrelated pointers.
Only when <code class="language-plaintext highlighter-rouge">to</code> is written to does it (or rather the <code class="language-plaintext highlighter-rouge">&mut self</code> it was created from) become an active mutable reference that requires uniqueness, but that is after <code class="language-plaintext highlighter-rouge">as_ptr</code> returns so there is no conflicting <code class="language-plaintext highlighter-rouge">&self</code> reference.</p>
<p>It turns out that consistently using two-phase borrows lets us entirely eliminate this hacky SB rule and also fix one of the most common sources of UB under SB.
I didn’t expect this at all, so this is a happy little accident. :)</p>
<p>However, note that the following program is fine under SB but invalid under TB:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">a</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">];</span>
<span class="k">let</span> <span class="n">to</span> <span class="o">=</span> <span class="n">a</span><span class="nf">.as_mut_ptr</span><span class="p">()</span><span class="nf">.add</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="n">to</span><span class="nf">.write</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="k">let</span> <span class="n">from</span> <span class="o">=</span> <span class="n">a</span><span class="nf">.as_ptr</span><span class="p">();</span>
<span class="nn">std</span><span class="p">::</span><span class="nn">ptr</span><span class="p">::</span><span class="nf">copy_nonoverlapping</span><span class="p">(</span><span class="n">from</span><span class="p">,</span> <span class="n">to</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
</code></pre></div></div>
<p>Here, the write to <code class="language-plaintext highlighter-rouge">to</code> activates the two-phase borrow, so uniqueness is enforced.
That means the <code class="language-plaintext highlighter-rouge">&self</code> created for <code class="language-plaintext highlighter-rouge">as_ptr</code> (which is considered reading all of <code class="language-plaintext highlighter-rouge">self</code>) is incompatible with <code class="language-plaintext highlighter-rouge">to</code>, and so <code class="language-plaintext highlighter-rouge">to</code> is invalidated (well, it is made read-only) when <code class="language-plaintext highlighter-rouge">from</code> gets created.
So far, we do not have evidence that this pattern is common in the wild.
The way to avoid issues like the code above is to <em>set up all your raw pointers before you start doing anything</em>.
Under TB, calling reference-receiving methods like <code class="language-plaintext highlighter-rouge">as_ptr</code> and <code class="language-plaintext highlighter-rouge">as_mut_ptr</code> and using the raw pointers they return on disjoint locations is fine even if these references overlap, but you must call all those methods before the first write to a raw pointer.
Once the first write happens, creating more references can cause aliasing violations.</p>
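<p>One way to follow this advice is to derive every raw pointer from a single base pointer, so that no further reference is created once writes have started. The following is only a sketch of that idea; the function name <code class="language-plaintext highlighter-rouge">copy_first_to_second</code> is hypothetical, and it reworks the last example so that the write to <code class="language-plaintext highlighter-rouge">to</code> no longer precedes the creation of a reference.</p>

```rust
/// Copies element 0 over element 1, deriving both raw pointers from a
/// single `as_mut_ptr` call so that no new reference to `a` is created
/// after the first write through a raw pointer.
fn copy_first_to_second(mut a: [i32; 2]) -> [i32; 2] {
    // The only reference-creating call: it happens before any write.
    let base = a.as_mut_ptr();
    // SAFETY: indices 0 and 1 are in bounds of `a`, the two one-element
    // ranges do not overlap, and `a` is not accessed through any other
    // path while the raw pointers are in use.
    unsafe {
        let from = base as *const i32;
        let to = base.add(1);
        to.write(0);
        std::ptr::copy_nonoverlapping(from, to, 1);
    }
    a
}
```

<p>Since <code class="language-plaintext highlighter-rouge">from</code> and <code class="language-plaintext highlighter-rouge">to</code> share the same origin, the write through <code class="language-plaintext highlighter-rouge">to</code> does not compete with a later <code class="language-plaintext highlighter-rouge">&self</code> reference. A quick way to check a variation of this is to run it in Miri with <code class="language-plaintext highlighter-rouge">MIRIFLAGS=-Zmiri-tree-borrows</code>.</p>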
<h2 id="no-strict-confinement-of-the-accessible-memory-range">No strict confinement of the accessible memory range</h2>
<p>The other major source of trouble with Stacked Borrows is <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/134">restricting raw pointers to the type and mutability they are initially created with</a>.
Under SB, when a reference is cast to <code class="language-plaintext highlighter-rouge">*mut T</code>, the resulting raw pointer is confined to access only the memory covered by <code class="language-plaintext highlighter-rouge">T</code>.
This regularly trips people up when they take a raw pointer to one element of an array (or one field of a struct) and then use pointer arithmetic to access neighboring elements.
Moreover, when a reference is cast to <code class="language-plaintext highlighter-rouge">*const T</code>, it is actually read-only, even if the reference was mutable!
Many people expect <code class="language-plaintext highlighter-rouge">*const</code> vs <code class="language-plaintext highlighter-rouge">*mut</code> not to matter for aliasing, so this is a regular source of confusion.</p>
<p>Under TB, we resolve this by no longer doing any retagging for reference-to-raw-pointer casts.
A raw pointer simply uses the same tag as the parent reference it is derived from, thereby inheriting its mutability and the range of addresses it can access.
Moreover, references are not strictly confined to the memory range described by their type:
when a <code class="language-plaintext highlighter-rouge">&mut T</code> (or <code class="language-plaintext highlighter-rouge">&T</code>) gets created from a parent pointer, we initially record that the new reference is allowed to access the memory range described by <code class="language-plaintext highlighter-rouge">T</code> (and we consider this a read access for that memory range).
However, we also perform <em>lazy initialization</em>: when a memory location outside this initial range is accessed, we check if the parent pointer would have had access to that location, and if so then we also give the child the same access.
This is repeated recursively until we find a parent that has sufficient access, or we reach the root of the tree.</p>
<p>This means TB is compatible with <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/243"><code class="language-plaintext highlighter-rouge">container_of</code>-style pointer arithmetic</a> and <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/276"><code class="language-plaintext highlighter-rouge">extern</code> types</a>, overcoming two more SB limitations.</p>
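<p>As an illustration of the array case mentioned above, here is a small sketch (the function name <code class="language-plaintext highlighter-rouge">sum_via_first_element</code> is made up). It creates a raw pointer from a reference to a single element and then steps to the neighboring elements. Going by the rules described in this section, this is the kind of code that SB rejects and that TB accepts thanks to lazy initialization; treat that classification as my reading of the text, not as a guarantee.</p>

```rust
/// Sums a three-element array, starting from a raw pointer to its first
/// element and reaching the neighbors with pointer arithmetic.
fn sum_via_first_element(a: &mut [u32; 3]) -> u32 {
    // The reference `&mut a[0]` only covers a single `u32`.
    let first: *mut u32 = &mut a[0];
    let mut total = 0;
    for i in 0..3 {
        // SAFETY: `first.add(i)` stays within the three elements of `a`.
        // Assumption: the aliasing model lets `first` access elements 1
        // and 2 because its parent `a` has access to them, as described
        // for Tree Borrows in the text.
        total += unsafe { first.add(i).read() };
    }
    total
}
```

<p>Code that needs to be accepted by both models can take the pointer from the whole array instead, for example with <code class="language-plaintext highlighter-rouge">a.as_mut_ptr()</code>, so that the pointer covers every element from the start.</p>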
<p>This also means that the following code becomes legal under TB:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">let</span> <span class="n">ptr</span> <span class="o">=</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ptr</span><span class="p">::</span><span class="nd">addr_of_mut!</span><span class="p">(</span><span class="n">x</span><span class="p">);</span>
<span class="n">x</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">ptr</span><span class="nf">.read</span><span class="p">();</span>
</code></pre></div></div>
<p>Under SB, <code class="language-plaintext highlighter-rouge">ptr</code> and direct access to the local <code class="language-plaintext highlighter-rouge">x</code> used two different tags, so writing to the local invalidated all pointers to it.
Under TB, this is no longer the case; a raw pointer directly created to the local is allowed to alias arbitrarily with direct accesses to the local.</p>
<p>Arguably the TB behavior is more intuitive, but it means we can no longer use writes to local variables as a signal that all possible aliases have been invalidated.
However, note that TB only allows this if there is an <code class="language-plaintext highlighter-rouge">addr_of_mut</code> (or <code class="language-plaintext highlighter-rouge">addr_of</code>) immediately in the body of a function!
If a reference <code class="language-plaintext highlighter-rouge">&mut x</code> is created, and then some other function derives a raw pointer from that, those raw pointers <em>do</em> get invalidated on the next write to <code class="language-plaintext highlighter-rouge">x</code>.
So to me this is a perfect compromise: code that uses raw pointers has a lower risk of UB, but code that does not use raw pointers (which is easy to see syntactically) can be optimized as much as with SB.</p>
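<p>For reference, here is the snippet from above wrapped into a complete function, as a sketch. The function name <code class="language-plaintext highlighter-rouge">write_local_then_read_raw</code> is hypothetical, and the comments restate the rules from this section instead of adding new ones.</p>

```rust
use std::ptr::addr_of_mut;

/// Creates a raw pointer directly to a local, writes to the local, and
/// then reads the new value back through the raw pointer.
fn write_local_then_read_raw() -> i32 {
    let mut x = 0;
    // The raw pointer is created directly from the local, without going
    // through an intermediate `&mut x`.
    let ptr = addr_of_mut!(x);
    // Direct write to the local. Per the text, SB invalidates `ptr` here,
    // while TB lets `ptr` alias with direct accesses to `x`.
    x = 1;
    // SAFETY: `ptr` points to the live, initialized local `x`.
    // Assumption: the aliasing model is Tree Borrows as described above.
    unsafe { ptr.read() }
}
```

<p>The function returns the value written through the local. Had the pointer been derived from a <code class="language-plaintext highlighter-rouge">&mut x</code> by some other function, the write to <code class="language-plaintext highlighter-rouge">x</code> would invalidate it under TB as well.</p>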
<p>Note that this entire approach in TB relies on TB <em>not</em> needing the stack-violating hack mentioned in the previous section.
If raw pointers in SB just inherited their parent tag, then they would get invalidated together with the unique pointer they are derived from, disallowing all the code that this hack was specifically added to support.
This means that backporting these improvements to SB is unlikely to be possible.</p>
<h2 id="unsafecell"><code class="language-plaintext highlighter-rouge">UnsafeCell</code></h2>
<p>The handling of <code class="language-plaintext highlighter-rouge">UnsafeCell</code> also changed quite a bit with TB.
First of all, another <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/303">major issue</a> with SB was fixed: turning an <code class="language-plaintext highlighter-rouge">&i32</code> into an <code class="language-plaintext highlighter-rouge">&Cell&lt;i32&gt;</code> <em>and then never writing to it</em> is finally allowed.
This falls out of how TB handles the aliasing allowed with <code class="language-plaintext highlighter-rouge">UnsafeCell</code>: such references are treated like casts to raw pointers, so reborrowing an <code class="language-plaintext highlighter-rouge">&Cell&lt;i32&gt;</code> just inherits the tag (and therefore the permissions) of the parent pointer.</p>
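<p>A minimal sketch of that pattern follows. The function name <code class="language-plaintext highlighter-rouge">read_as_cell</code> is made up, and whether the cast is acceptable depends on the aliasing model, so the safety comment states the Tree Borrows assumption explicitly.</p>

```rust
use std::cell::Cell;

/// Views a shared `i32` as a `Cell<i32>` and only ever reads through it.
fn read_as_cell(r: &i32) -> i32 {
    // SAFETY: `Cell<i32>` has the same in-memory representation as `i32`,
    // and nothing is written through `cell`.
    // Assumption: the aliasing model is Tree Borrows, where the new
    // reference inherits the read-only permission of `r`; per the text,
    // Stacked Borrows rejects this cast even without a write.
    let cell: &Cell<i32> = unsafe { &*(r as *const i32 as *const Cell<i32>) };
    cell.get()
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">cell.set(...)</code> in this function would write through a pointer derived from a read-only reference, which is not allowed under either model.</p>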
<p>More controversially, TB also changes how precisely things become read-only when an <code class="language-plaintext highlighter-rouge">&T</code> involves <code class="language-plaintext highlighter-rouge">UnsafeCell</code> somewhere inside <code class="language-plaintext highlighter-rouge">T</code>.
In particular, for <code class="language-plaintext highlighter-rouge">&(i32, Cell&lt;i32&gt;)</code>, TB allows mutating <em>both</em> fields, including the first field which is a regular <code class="language-plaintext highlighter-rouge">i32</code>, since it just treats the entire reference as “this allows aliasing”.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>
In contrast, SB actually figured out that the first 4 bytes are read-only and only the last 4 bytes allow mutation via aliased pointers.</p>
<p>The reason for this design decision is that the general philosophy with TB was to err on the side of allowing more code and having less UB (the opposite of the direction I took with SB).
This is a deliberate choice to uncover as much of the design space as we can with these two models.
Of course we wanted to make sure that TB still allows all the desired optimizations, and still has enough UB to justify the LLVM IR that rustc generates – those were our “lower bounds” for the minimum amount of UB we need.
And it turns out that under these constraints, we can support <code class="language-plaintext highlighter-rouge">UnsafeCell</code> with a fairly simple approach: for the aliasing rules of <code class="language-plaintext highlighter-rouge">&T</code>, there are only 2 cases.
Either there is no <code class="language-plaintext highlighter-rouge">UnsafeCell</code> anywhere, then this reference is read-only, or else the reference allows aliasing.
As someone who thinks a lot about proving theorems about the full Rust semantics including its aliasing model, this approach seemed pleasingly simple. :)</p>
<p>I expected this decision to be somewhat controversial, but the amount of pushback we received has still been surprising.
The good news is that this is far from set in stone: we can <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/403">change TB to treat <code class="language-plaintext highlighter-rouge">UnsafeCell</code> more like SB did</a>.
Unlike the previously described differences, this one is entirely independent of our other design choices.
While I prefer the TB approach, the way things currently stand, I do expect that we will end up with SB-like <code class="language-plaintext highlighter-rouge">UnsafeCell</code> treatment eventually.</p>
<h2 id="what-about-optimizations">What about optimizations?</h2>
<p>I have written a lot about how TB differs from SB in terms of which coding patterns are UB.
But what about the other side of the coin, the optimizations?
Clearly, since SB has more UB, we have to expect TB to allow fewer optimizations.
And indeed there is a major class of optimizations that TB loses: speculative writes, i.e. inserting writes in code paths that would not previously have written to this location.
This is a powerful optimization and I was quite happy that SB could pull it off, but it also comes at a major cost: mutable references have to be “immediately unique”.
Given how common of a problem “overeager uniqueness” is, my current inclination is that we most likely would rather make all that code legal than allow speculative writes.
We still have extremely powerful optimization principles around reads, and when the code <em>does</em> perform a write that gives rise to even more optimizations, so my feeling is that insisting on speculative writes is just pushing things too far.</p>
<p>On another front, TB actually allows a set of crucial optimizations that SB ruled out by accident: reordering of reads!
The issue with SB is that if we start with “read mutable reference, then read shared reference”, and then reorder to “read shared reference, then read mutable reference”, then in the new program, reading the shared reference might invalidate the mutable reference – so the reordering might have introduced UB!
This optimization is possible without having any special aliasing model, so SB not allowing it is a rather embarrassing problem.
If it weren’t for the stack-violating hack that already came up several times above, I think there would be a fairly easy way of fixing this problem in SB, but alas, that hack is load-bearing and too much existing code is UB if we remove it.
Meanwhile, TB does not need any such hack, so we can do the Right Thing (TM): when doing a read, unrelated mutable references are not entirely disabled, they are just made read-only.
This means that “read shared reference, then read mutable reference” is equivalent to “read mutable reference, then read shared reference” and the optimization is saved.
(A consequence of this is that retags can also be reordered with each other, since they also act as reads. Hence the order in which you set up various pointers cannot matter, until you do the first write access with one of them.)</p>
<h2 id="future-possibility-unique">Future possibility: <code class="language-plaintext highlighter-rouge">Unique</code></h2>
<p>Tree Borrows paves the way for an extension that we have not yet implemented, but that I am quite excited to explore: giving meaning to <code class="language-plaintext highlighter-rouge">Unique</code>.
<code class="language-plaintext highlighter-rouge">Unique</code> is a private type in the Rust standard library that was originally meant to express <code class="language-plaintext highlighter-rouge">noalias</code> requirements.
However, it was never actually wired up to emit that attribute on the LLVM level.
<code class="language-plaintext highlighter-rouge">Unique</code> is mainly used in two places in the standard library: <code class="language-plaintext highlighter-rouge">Box</code> and <code class="language-plaintext highlighter-rouge">Vec</code>.
SB (and TB) treat <code class="language-plaintext highlighter-rouge">Box</code> specially (matching rustc itself), but not <code class="language-plaintext highlighter-rouge">Unique</code>, so <code class="language-plaintext highlighter-rouge">Vec</code> does not come with any aliasing requirements.
And indeed the SB approach totally does not work for <code class="language-plaintext highlighter-rouge">Vec</code>, since we don’t actually know how much memory to make unique here.
However, with TB we have lazy initialization, so we don’t need to commit to a memory range upfront – we can make it unique “when accessed”.
This means we can explore giving meaning to the <code class="language-plaintext highlighter-rouge">Unique</code> in <code class="language-plaintext highlighter-rouge">Vec</code>.</p>
<p>Now, this might not actually work.
People actually do blatantly-aliasing things with <code class="language-plaintext highlighter-rouge">Vec</code>, e.g. to implement arenas.
On the other hand, <code class="language-plaintext highlighter-rouge">Vec</code>’s uniqueness would only come in when it is moved or passed <em>by value</em>, and only for the memory ranges that are actually being accessed.
So it is quite possible that this is compatible with arenas.
I think the best way to find out is to implement <code class="language-plaintext highlighter-rouge">Unique</code> semantics behind a flag and experiment.
If that works out, we might even be able to remove all special handling of <code class="language-plaintext highlighter-rouge">Box</code> and rely on the fact that <code class="language-plaintext highlighter-rouge">Box</code> is defined as a newtype over <code class="language-plaintext highlighter-rouge">Unique</code>.
This would slightly reduce the optimization potential (<code class="language-plaintext highlighter-rouge">Box&lt;T&gt;</code> is known to point to a memory range at least the size of <code class="language-plaintext highlighter-rouge">T</code>, whereas <code class="language-plaintext highlighter-rouge">Unique</code> has no such information), but making <code class="language-plaintext highlighter-rouge">Box</code> less magic is a long-standing quest so this might be an acceptable trade-off.</p>
<p>I should note that there are many people who think neither <code class="language-plaintext highlighter-rouge">Box</code> nor <code class="language-plaintext highlighter-rouge">Vec</code> should have any aliasing requirements. I think it’s worth first exploring whether we can have aliasing requirements which are sufficiently light-weight that they are compatible with common coding patterns, but even if we end up saying <code class="language-plaintext highlighter-rouge">Box</code> and <code class="language-plaintext highlighter-rouge">Vec</code> behave like raw pointers, it can still be useful to have <code class="language-plaintext highlighter-rouge">Unique</code> in our toolbox and expose it for unsafe code authors to eke out the last bits of performance.</p>
<h2 id="conclusion">Conclusion</h2>
<p>These are the major differences between Stacked Borrows and Tree Borrows.
As you can see, almost all of them are cases where TB allows more code than SB, and indeed TB fixes what I consider to be SB’s two biggest problems: overeager uniqueness for mutable references, and confining references and raw pointers to the size of the type they are created with.
This is great news for unsafe code authors!</p>
<p>What TB <em>doesn’t</em> change is the presence of “protectors” to enforce that certain references remain valid for the duration of an entire function call (whether they are used again or not); protectors are absolutely required to justify the LLVM <code class="language-plaintext highlighter-rouge">noalias</code> annotations we would like to emit and they also do enable some stronger optimizations than what would otherwise be possible.
I do expect protectors to be the main remaining source of unexpected UB from Tree Borrows, and I don’t think there is a lot of wiggle-room that we have here, so this might just be a case where we have to tell programmers to adjust their code, and invest in documentation material to make this subtle issue as widely known as possible.</p>
<p>Neven has implemented Tree Borrows in Miri, so you can play around with it and check your own code by setting <code class="language-plaintext highlighter-rouge">MIRIFLAGS=-Zmiri-tree-borrows</code>.
If you run into any surprises or concerns, please let us know!
The <a href="https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem">t-opsem Zulip</a> and the <a href="https://github.com/rust-lang/unsafe-code-guidelines/">UCG issue tracker</a> are good places for such questions.</p>
<p>That’s all I got, thanks for reading – and a shout out to Neven for doing all the actual work here (and for giving feedback on this blog post), supervising this project has been a lot of fun!
Remember to read <a href="https://perso.crans.org/vanille/treebor/">his write up</a> and <a href="https://www.youtube.com/watch?v=zQ76zLXesxA">watch his talk</a>.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>This does not mean that we bless such mutation! It just means that the compiler cannot use immutability of the first field for its optimizations. Basically, immutability of that field becomes a <a href="/blog/2018/08/22/two-kinds-of-invariants.html">safety invariant instead of a validity invariant</a>: when you call foreign code, you can still rely on it not mutating that field, but within the privacy of your own code you are allowed to mutate it. See <a href="https://www.reddit.com/r/rust/comments/13y8a9b/comment/jmlvgun/">my response here</a> for some more background. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
cargo careful: run your Rust code with extra careful debug checking2022-09-26T00:00:00+02:00https://www.ralfj.de/blog/2022/09/26/cargo-careful.html
<p>Did you know that the standard library is full of useful checks that users never get to see?
There are plenty of debug assertions in the standard library that will do things like check that <code class="language-plaintext highlighter-rouge">char::from_u32_unchecked</code> is called on a valid <code class="language-plaintext highlighter-rouge">char</code>, that <code class="language-plaintext highlighter-rouge">CStr::from_bytes_with_nul_unchecked</code> does not have internal nul bytes, or that pointer functions such as <code class="language-plaintext highlighter-rouge">copy</code> or <code class="language-plaintext highlighter-rouge">copy_nonoverlapping</code> are called on suitably aligned non-null (and non-overlapping) pointers.
However, the regular standard library that is distributed by rustup is compiled without debug assertions, so there is no easy way for users to benefit from all this extra checking.</p>
<!-- MORE -->
<p><a href="https://github.com/RalfJung/cargo-careful"><code class="language-plaintext highlighter-rouge">cargo careful</code></a> is here to close this gap:
when invoked the first time, it builds a standard library with debug assertions from source, and then runs your program or test suite with that standard library.
Installing <code class="language-plaintext highlighter-rouge">cargo careful</code> is as easy as <code class="language-plaintext highlighter-rouge">cargo install cargo-careful</code>, and then you can do <code class="language-plaintext highlighter-rouge">cargo +nightly careful run</code>/<code class="language-plaintext highlighter-rouge">cargo +nightly careful test</code> to execute your binary crates and test suites with an extra amount of debug checking.</p>
<p>This will naturally be slower than a regular debug or release build, but it is <em>much</em> faster than executing your program in <a href="https://github.com/rust-lang/miri">Miri</a> and still helps find some Undefined Behavior.
Unlike Miri, it is fully FFI-compatible (though the code behind the FFI barrier is completely unchecked).
Of course Miri is much more thorough and <code class="language-plaintext highlighter-rouge">cargo careful</code> will miss many problems (for instance, it cannot detect out-of-bounds pointer arithmetic – but it <em>does</em> perform bounds checking on <code class="language-plaintext highlighter-rouge">get_unchecked</code> slice accesses).</p>
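<p>To illustrate the kind of check this is about, here is a minimal sketch of the pattern: an unchecked operation guarded by a debug assertion that only exists when debug assertions are enabled.
The helpers <code class="language-plaintext highlighter-rouge">element_unchecked</code> and <code class="language-plaintext highlighter-rouge">last_element</code> are hypothetical and not part of the standard library; the real checks in the standard library are written differently, but follow the same spirit.</p>

```rust
/// Hypothetical helper (not part of std): returns the element at `idx`
/// without a bounds check in builds that have debug assertions disabled.
///
/// # Safety
/// `idx` must be less than `slice.len()`.
unsafe fn element_unchecked(slice: &[u32], idx: usize) -> u32 {
    // This check is compiled in only when debug assertions are enabled,
    // similar in spirit to the checks that `cargo careful` turns on in std.
    debug_assert!(idx < slice.len(), "index {} out of bounds", idx);
    // SAFETY: the caller promises that `idx` is in bounds.
    unsafe { *slice.get_unchecked(idx) }
}

/// Safe wrapper that establishes the precondition before calling the
/// unchecked helper.
fn last_element(slice: &[u32]) -> Option<u32> {
    if slice.is_empty() {
        None
    } else {
        // SAFETY: the slice is non-empty, so `len() - 1` is in bounds.
        Some(unsafe { element_unchecked(slice, slice.len() - 1) })
    }
}
```

<p>A caller that violates the precondition gets a panic in a build with debug assertions, and silent Undefined Behavior otherwise; that difference is exactly why running against a standard library built with debug assertions can be worthwhile.</p>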
<p>Note that for now, some of these checks (in particular for raw pointer methods) cause an abrupt abort of the program via SIGILL without a nice error message or backtrace.
There are probably ways to improve this in the future.
Meanwhile, if you have some <code class="language-plaintext highlighter-rouge">unsafe</code> code that for one reason or another you cannot test with Miri, give <a href="https://github.com/RalfJung/cargo-careful"><code class="language-plaintext highlighter-rouge">cargo careful</code></a> a try and let me know how it is doing. :)</p>
<p><em>By the way, I am soon <a href="/blog/2022/08/16/eth.html">starting as a professor at ETH Zürich</a>, so if you are interested in working with me on programming language theory as a master student, PhD student, or post-doc, then please <a href="https://research.ralfj.de/contact.html">reach out</a>!</em></p>
A New Beginning2022-08-16T00:00:00+02:00https://www.ralfj.de/blog/2022/08/16/eth.html
<p>I have some very exciting news to share: starting November 1st, I will work at ETH Zürich as an assistant professor!
Becoming a professor in the first place is a dream come true, and becoming a professor at a place like ETH Zürich is not something I even dared to dream of.
I still cannot quite believe that this is actually happening (I will be a <em>professor</em>?!??), but <a href="https://twitter.com/CSatETH/status/1548944615285350400">the news is out</a> so I guess this is real. :D</p>
<!-- MORE -->
<p>I feel excited and terrified in about equal parts.
Excited by all the new possibilities, by the prospect of working with students and inspiring the next generation of researchers;
terrified by all the responsibility and the prospect of having to stand in a classroom and give a lecture in only a few months’ time.
But somehow everyone else seems confident that I can do this, so I guess I’ll just play along and hope that I will not prove them wrong…</p>
<p>I am also humbled and eternally thankful for being given this opportunity.
Being able to work in an environment like ETH is a privilege beyond imagination, and I don’t know how I got so lucky.
I probably used up all my Karma points for the rest of my life, and will do my best to honor this privilege.
I feel hugely indebted to everyone I worked with, first and foremost of course my PhD advisor <a href="https://people.mpi-sws.org/~dreyer/">Derek Dreyer</a>.
But I would also like to specifically call out the Rust community, because I don’t think this would have happened without Rust – thanks to <em>everyone</em> who contributed to this language that I am essentially building my career on<sup id="fnref:rust" role="doc-noteref"><a href="#fn:rust" class="footnote" rel="footnote">1</a></sup>, and thanks in particular to everyone who indulged in my ideas for how Rust should approach unsafe code and helped me shape that corner of the language.</p>
<p>So what’s next?
I will soon finish my post-doc at MIT and move back to Europe, and then move to Zürich in October.
And then I will have to figure out how this being-a-professor thing works. ;)
My first main priority is building a research group: the “Programming Language Foundations Lab”<sup id="fnref:lab" role="doc-noteref"><a href="#fn:lab" class="footnote" rel="footnote">2</a></sup>.
So if you are interested in doing a PhD or post-doc working on, well, programming language foundations, and in particular formal foundations for Rust, or if you are an ETH student interested in a Master Thesis in that area – please <a href="https://research.ralfj.de/contact.html">reach out</a>!
I am still figuring out how to do things like hiring people and finding suitable projects, but there is no shortage of open problems that need solving and theorems that need proving. :)</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:rust" role="doc-endnote">
<p>Before anyone gets worried, I also have some <a href="https://iris-project.org/">ideas</a> I want to pursue that are unrelated to Rust. But Rust is currently by far the biggest inspiration for new research problems for me, and without Rust I don’t think my research would be anywhere near as applied and impactful as it is today, which I am sure played a key role in the decision of ETH to hire me. <a href="#fnref:rust" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:lab" role="doc-endnote">
<p>Yes, I have a lab coat. I don’t usually wear it though… and if you want to see me wear it, that will cost you some beer. <a href="#fnref:lab" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Announcing: MiniRust2022-08-08T00:00:00+02:00https://www.ralfj.de/blog/2022/08/08/minirust.html
<p>I have been thinking about the semantics of Rust – as in, the intended behavior of Rust programs when executed, in particular those containing unsafe code – a lot.
Probably too much.
But all of these thoughts are just in my head, which is not very useful when someone else wants to try and figure out how some tricky bit of unsafe Rust code behaves.
As part of the <a href="https://github.com/rust-lang/unsafe-code-guidelines/">Unsafe Code Guidelines</a> project, we often get questions asking whether a <em>concrete</em> piece of code is fine or whether it has Undefined Behavior.
But clearly, that doesn’t scale: there are just too many questions to be asked, and figuring out the semantics by interacting with an oracle with many-day latency is rather frustrating.
We have <a href="https://github.com/rust-lang/miri/">Miri</a>, which is a much quicker oracle, but it’s also not always right and even then, it can just answer questions of the form “is this particular program fine”; users have to do all the work of figuring out the model that <em>generates</em> those answers themselves.</p>
<!-- MORE -->
<p>So I have promised for a long time to find some more holistic way to write down my thoughts on unsafe Rust semantics.
I thought I could do it in 2021, but I, uh, “slightly” missed that deadline… but better late than never!
At long last, I can finally present to you: <a href="https://github.com/RalfJung/minirust"><strong>MiniRust</strong></a>.<sup id="fnref:name" role="doc-noteref"><a href="#fn:name" class="footnote" rel="footnote">1</a></sup></p>
<p>The purpose of MiniRust is to describe the semantics of an interesting fragment of Rust in a way that is both precise and understandable to as many people as possible.
These goals are somewhat at odds with each other – the most precise definitions, e.g. carried out in the Coq Proof Assistant, tend to not be very accessible.
English language, on the other hand, is not very precise.
So my compromise solution is to write down the semantics in a format that is hopefully known to everyone who could be interested: in Rust code.
Specifically, MiniRust is specified by a <em>reference interpreter</em> that describes the step-by-step process of executing a MiniRust program, <em>including</em> checking at each step whether the program has Undefined Behavior.</p>
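<p>To give a rough feeling for what “a reference interpreter that checks for Undefined Behavior at each step” means, here is a toy sketch in ordinary Rust.
Everything in it (the <code class="language-plaintext highlighter-rouge">Instr</code> instruction set, the <code class="language-plaintext highlighter-rouge">Stop</code> type, the rule that overflow is UB) is made up for illustration and is not taken from MiniRust, whose actual definitions are far richer.</p>

```rust
/// Hypothetical toy instruction set for a tiny stack machine.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Instr {
    Push(i16),
    Add,
    Div,
}

/// Reasons why execution cannot continue.
#[derive(Debug, PartialEq)]
enum Stop {
    /// The program violated a rule of the (toy) language.
    UndefinedBehavior(&'static str),
    /// The program is malformed; this is not UB, just not a valid program.
    IllFormed(&'static str),
}

struct Machine {
    stack: Vec<i16>,
}

impl Machine {
    fn pop(&mut self) -> Result<i16, Stop> {
        self.stack.pop().ok_or(Stop::IllFormed("operand stack is empty"))
    }

    /// Executes a single instruction, checking for UB as part of the step.
    fn step(&mut self, instr: Instr) -> Result<(), Stop> {
        match instr {
            Instr::Push(n) => self.stack.push(n),
            Instr::Add => {
                // The top of the stack is the right-hand operand.
                let (b, a) = (self.pop()?, self.pop()?);
                // Toy rule: overflow is UB in this made-up language.
                let sum = a
                    .checked_add(b)
                    .ok_or(Stop::UndefinedBehavior("overflow in Add"))?;
                self.stack.push(sum);
            }
            Instr::Div => {
                let (b, a) = (self.pop()?, self.pop()?);
                // `checked_div` fails on division by zero and on overflow.
                let quotient = a
                    .checked_div(b)
                    .ok_or(Stop::UndefinedBehavior("invalid division"))?;
                self.stack.push(quotient);
            }
        }
        Ok(())
    }
}

/// Runs the whole program and returns the value left on top of the stack.
fn run(program: &[Instr]) -> Result<Option<i16>, Stop> {
    let mut machine = Machine { stack: Vec::new() };
    for &instr in program {
        machine.step(instr)?;
    }
    Ok(machine.stack.last().copied())
}
```

<p>The point of the sketch is that “is this program UB?” stops being a question for an oracle: the answer is whatever <code class="language-plaintext highlighter-rouge">step</code> returns, and the rules can be read off its source code.</p>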
<p>“Hold on”, I hear a <a href="https://fasterthanli.me/articles/">Cool Bear</a> say, “you are defining Rust in Rust code? Isn’t that cyclic?”<sup id="fnref:bear" role="doc-noteref"><a href="#fn:bear" class="footnote" rel="footnote">2</a></sup>
Well, yes and no. It’s not <em>really</em> Rust code.
It’s what I call “pseudo Rust”, uses only a tiny fragment of the language (in particular, no <code class="language-plaintext highlighter-rouge">unsafe</code>), and then extends the language with some conveniences to make things less verbose.
The idea is that anyone who knows Rust should immediately be able to understand what this code means; eventually, if this idea pans out, we can hopefully also have tooling to translate pseudo Rust into “real” languages – in particular, real Rust and Coq.
Translating it to real Rust means we can actually execute the reference interpreter and test it, and translating it to Coq means we can start proving theorems about it.
But I am getting waaaay ahead of myself, these are rather long-term plans.</p>
<p><strong>Update (2023-02-13):</strong> “Pseudo Rust” has now been renamed to “specr lang”, the language of the work-in-progress “specr” tool that can translate specr lang into Rust code to make specifications executable. <strong>/Update</strong></p>
<p>So, if you want to look into my brain to see how I see Rust programs, then please go check out <a href="https://github.com/RalfJung/minirust">MiniRust</a>.
The README explains the scope and goals, the general structure, and the details of <del>pseudo Rust</del> specr lang, as well as a comparison with some related efforts.</p>
<p>In particular I find that the concept of “places” and “values”, which can be rather mysterious, becomes a lot clearer when spelled out like that, but that might just be me.
I hasten to add that this is <em>very early work-in-progress</em>, and it is <em>my own personal experiment</em>, not necessarily reflecting the views of anyone else.
It is also <em>far from feature-complete</em>, in fact it has just barely enough to be interesting.
There are lots of small things missing (like integers that aren’t exactly 2 bytes in size, or tuples that don’t have exactly 2 elements), but the biggest omission by far is the total lack of an aliasing model.
And unsized types. And concurrency. And probably other things.</p>
<p>On the other hand, there are many things that it <em>can</em> explain in full precision:</p>
<ul>
<li>validity invariants, and how they arise from the mapping between a high-level concept of “values” and a low-level concept of “sequences of bytes”</li>
<li>the basic idea of provenance tracking the “allocation” a pointer points to, and how that interacts with pointer arithmetic (including <code class="language-plaintext highlighter-rouge">offset</code> and <code class="language-plaintext highlighter-rouge">wrapping_offset</code>)</li>
<li>how pointer provenance behaves when doing transmutation between pointers and integers</li>
<li>what happens when <em>casting</em> between pointers and integers</li>
<li>padding (that’s why tuples can have 2 elements, so there can be padding between them)</li>
</ul>
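<p>As a small illustration of the first point, here is a sketch of how a validity invariant can fall out of the mapping between values and bytes, using <code class="language-plaintext highlighter-rouge">bool</code> as the example.
The <code class="language-plaintext highlighter-rouge">AbstractByte</code> type and both functions are simplified stand-ins that I made up for this post; they are not copied from MiniRust.</p>

```rust
/// Hypothetical model of one byte of memory: `None` means "uninitialized".
type AbstractByte = Option<u8>;

/// Every `bool` value has a representation as a byte.
fn encode_bool(value: bool) -> AbstractByte {
    Some(if value { 1 } else { 0 })
}

/// Not every byte represents a `bool`. Returning `None` here means
/// "this byte does not satisfy the validity invariant of `bool`".
fn decode_bool(byte: AbstractByte) -> Option<bool> {
    match byte {
        Some(0) => Some(false),
        Some(1) => Some(true),
        // Any other initialized byte, and any uninitialized byte, is invalid.
        _ => None,
    }
}
```

<p>In this style, the validity invariant is not a separate list of rules: a sequence of bytes is valid for a type exactly when decoding it succeeds.</p>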
<p>If you are not used to reading interpreter source code, then I guess this can be rather jarring, and there is certainly a <em>lot</em> of work that could and should be done to make this more accessible.
(Like, examples. I hear people like examples.)
But just being able to talk about these questions with precision <em>at all</em> has already led to some interesting discussions in the UCG WG, some of which made me change my mind – thanks in particular to @digama0, @JakobDegen, and @alercah for engaging deeply with my ideas.
So for now it is serving its purpose, and maybe some of you can find it useful, too.
Hopefully we can even use this as a starting place for seriously tackling the issue of an <em>official</em> specification of Rust.
More on that soon. :)</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:name" role="doc-endnote">
<p>I am beginning to wonder if this name was a bad choice. Naming is not my strong suit. Maybe “CoreRust” would have been better? Alas… <a href="#fnref:name" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:bear" role="doc-endnote">
<p>Thanks to fasterthanlime for facilitating the bear’s appearance on this blog. <a href="#fnref:bear" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
The last two years in Miri2022-07-02T00:00:00+02:00https://www.ralfj.de/blog/2022/07/02/miri.html
<p>It has been <a href="/blog/2020/09/28/miri.html">almost two years</a> since my last Miri status update.
A lot has happened in the meantime that I would like to tell you all about!
If you are using Miri, you might also be seeing new errors in code that previously worked fine; read on for more details on that.</p>
<p>For the uninitiated, <a href="https://github.com/rust-lang/miri/">Miri</a> is an interpreter that runs your Rust code and checks if it triggers any <a href="https://doc.rust-lang.org/reference/behavior-considered-undefined.html">Undefined Behavior</a> (UB for short).
You can think of it as a very thorough (and very slow) version of valgrind/ASan/TSan/UBSan:
Miri will detect when your program uses uninitialized memory incorrectly, performs out-of-bounds memory accesses or pointer arithmetic, causes a data race, violates key language invariants, does not ensure proper pointer alignment, or causes incorrect aliasing.
As such, it is most helpful when writing unsafe code, as it aids in ensuring that you follow all the rules required for unsafe code to be correct and safe.
Miri also detects memory leaks, i.e., it informs you at the end of program execution if there is any memory that was not deallocated properly.</p>
<!-- MORE -->
<p>Moreover, Miri is able to run code for other targets: for example, you might be developing code on x86_64, a 64-bit little-endian architecture.
When you do low-level bit manipulation, it is easy to introduce bugs that only show up on 32-bit systems or big-endian architectures.
You can run Miri with <code class="language-plaintext highlighter-rouge">--target i686-unknown-linux-gnu</code> and <code class="language-plaintext highlighter-rouge">--target mips64-unknown-linux-gnuabi64</code> to test your code in those situations – and this will work even if your host OS is macOS or Windows!</p>
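<p>Here is a minimal sketch of the kind of bug this can uncover.
The “wire format” with a 4-byte little-endian length prefix and both function names are hypothetical, chosen only for illustration.</p>

```rust
/// Reads a length prefix that the (hypothetical) format defines as
/// little-endian. This gives the same answer on every target.
fn read_len_portable(header: [u8; 4]) -> u32 {
    u32::from_le_bytes(header)
}

/// Reads the same prefix using the byte order of the target. This happens
/// to agree with the portable version on little-endian targets only.
fn read_len_host_dependent(header: [u8; 4]) -> u32 {
    u32::from_ne_bytes(header)
}
```

<p>On x86_64 both functions return <code class="language-plaintext highlighter-rouge">1</code> for the bytes <code class="language-plaintext highlighter-rouge">[1, 0, 0, 0]</code>, so a test suite that only ever runs there cannot tell them apart; on a big-endian target the second one returns <code class="language-plaintext highlighter-rouge">1 << 24</code> instead.</p>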
<p>That said, it’s not all roses and rainbows.
Since Miri just knows how to interpret Rust code, it will get stuck when you call into C code.
Miri knows how to execute a certain small set of well-known C functions (e.g. to access environment variables or open files), but it is still easy to run into an “unsupported operation” error due to missing C library implementations.
In many cases you should still be able to write tests that cover the remaining code that does not need to, for example, directly access the network;
but I also hope that Miri will keep growing its support for key platform APIs.</p>
<h2 id="miri-progress">Miri progress</h2>
<p>So, what progress has Miri made in the last two years?</p>
<h3 id="concurrency">Concurrency</h3>
<p>The story of concurrency in Miri continues to surprise me: I had not even planned for Miri to support concurrency, but people just keep showing up and implementing one part of it after the other, so now we have pretty good support for finding concurrency bugs!</p>
<p>In that spirit, @JCTyblaidd implemented a data race detector.
So if your code does not use appropriate atomic operations to make sure all accesses are suitably synchronized, Miri will now detect that problem and report Undefined Behavior.
<a href="https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=2dc29d339658dd2e1e74b84fcffc3926">Here’s a demo</a>.
(Click that link and then select “Tools - Miri” to see this in action.)
Our data race error reports could be improved a lot (in particular they only show one of the two conflicting accesses involved in a data race), but they are still useful and have already found several data races in the wild.</p>
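<p>For comparison, here is a sketch of what “suitably synchronized” can look like: several threads incrementing a shared counter through an atomic rather than through unsynchronized writes.
The function name <code class="language-plaintext highlighter-rouge">count_in_parallel</code> is made up for this example.</p>

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `threads` threads that each increment a shared counter
/// `increments` times, and returns the final value.
fn count_in_parallel(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // An atomic read-modify-write: no increment gets lost,
                    // and there is no data race on the counter.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        // Joining synchronizes with the end of each thread.
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

<p>If the counter were instead a plain integer written through a raw pointer from all threads, the writes would race, and that is the kind of problem the data race detector is meant to report.</p>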
<p>@thomcc changed our <code class="language-plaintext highlighter-rouge">compare_exchange_weak</code> implementation so that it randomly just fails with 80% probability.
(The exact rate is adjustable via <code class="language-plaintext highlighter-rouge">-Zmiri-compare-exchange-weak-failure-rate=<x></code>.)
<a href="https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=0af8443b760985ce01640135ffb83749">Here’s a demo</a>.
This is super useful to find issues where code uses <code class="language-plaintext highlighter-rouge">compare_exchange_weak</code> but cannot handle spurious failures, since those are very unlikely to occur in the wild.</p>
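<p>For reference, here is a sketch of a use of <code class="language-plaintext highlighter-rouge">compare_exchange_weak</code> that does tolerate spurious failures: the operation sits in a loop that retries with the value it observed.
The helper <code class="language-plaintext highlighter-rouge">increment_below</code> is hypothetical.</p>

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper: increments `counter` unless it has already reached
/// `limit`. Returns whether the increment happened.
fn increment_below(counter: &AtomicU32, limit: u32) -> bool {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        if current >= limit {
            return false;
        }
        match counter.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(_) => return true,
            // Either another thread changed the value, or the operation
            // failed spuriously. Both cases are handled by trying again
            // with the value that was actually observed.
            Err(observed) => current = observed,
        }
    }
}
```

<p>Code that calls <code class="language-plaintext highlighter-rouge">compare_exchange_weak</code> once and treats <code class="language-plaintext highlighter-rouge">Err</code> as “someone else changed the value” is the kind of code that the high failure rate is meant to flush out.</p>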
<p>@henryboisdequin added support for the atomic <code class="language-plaintext highlighter-rouge">fetch_min</code> and <code class="language-plaintext highlighter-rouge">fetch_max</code> operations, completing our support of the <code class="language-plaintext highlighter-rouge">Atomic*</code> types.</p>
<p>And finally, @cbeuw showed up and added “weak memory emulation”.
This means that when you do an atomic load, you might not observe the latest value written to that location; instead, a previous value can be returned.
<a href="https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=7809a750bda54f0cc458b81823c79db7">Here’s a demo</a>.
This happens on real hardware, so having this supported in Miri helps to find more potential bugs.
The caveat is that Miri still cannot produce <em>all</em> the behaviors that the actual program might exhibit.
Also, the C++20 revision of the C++ memory model disallowed some possible behaviors that were previously allowed, but Miri might produce those behaviors – there is currently no known algorithm that would prevent that.
This should be very rare though.</p>
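<p>A classic pattern where this matters is message passing.
The following sketch (with a made-up function name) publishes a value through a flag using release/acquire orderings, so the reader is guaranteed to see the value once it has seen the flag.</p>

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// One thread writes a value and then sets a flag; the calling thread
/// waits for the flag and then reads the value.
fn message_passing() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);
            // Release: the store to `data` above becomes visible to any
            // thread that observes `true` with an acquire load.
            ready.store(true, Ordering::Release);
        })
    };

    // Acquire: pairs with the release store in the producer.
    while !ready.load(Ordering::Acquire) {
        thread::yield_now();
    }
    let value = data.load(Ordering::Relaxed);
    producer.join().unwrap();
    value
}
```

<p>With these orderings the function always returns <code class="language-plaintext highlighter-rouge">42</code>.
If both accesses to <code class="language-plaintext highlighter-rouge">ready</code> were weakened to <code class="language-plaintext highlighter-rouge">Relaxed</code>, the memory model would also allow the stale value <code class="language-plaintext highlighter-rouge">0</code>, which is the sort of outcome that weak memory emulation may now surface.</p>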
<p>I then just put the icing on the cake by fixing some long-standing issues in our scheduler, so that it no longer gets stuck in spin loops.
Miri now has a chance to preempt the running thread at the end of each basic block; the preemption probability is 1% but you can adjust it (using <code class="language-plaintext highlighter-rouge">-Zmiri-preemption-rate=<x></code>).</p>
<p>All of this made our concurrency support sufficiently solid that it no longer shows any warning about being “experimental”.
For example, it has already found a <a href="https://github.com/rust-lang/rust/issues/98498">data race in the standard library</a>.
I can barely express how happy and proud I am that I had to do basically none of this work. :)</p>
<p>One warning though: several of the improvements mentioned above rely on doing random choices.
So, it is now more likely than before that Miri will work fine one day, and then show an error after some seemingly inconsequential change to the program the next day.
I will get back to these problems later.</p>
<h3 id="pointer-provenance-and-stacked-borrows">Pointer provenance and Stacked Borrows</h3>
<p>One of the most subtle aspects of Miri is <a href="https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md">Stacked Borrows</a>.
The aliasing model is already quite complicated, and actually debugging what happens when Miri finds an aliasing violation in your code can be pretty tricky.
However, @saethlin made this a lot easier!
The error messages now show a lot more detail and point to several relevant locations in the code: not only where the bad access happened, but also where the pointer tag used for that access was created, and where that tag was invalidated.
I am very impressed by how good some of these errors are, just <a href="https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e831ed0f262e039dda4d24c159a9f5b0">check this out</a>.</p>
<p>Another big thing that happened recently is the entire <a href="https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance">“Strict Provenance” story</a>.
I am super excited by these developments, because they offer the chance to fix some long-standing open problems in Miri:
the issues with “untagged” raw pointers in Stacked Borrows, and Miri not properly supporting integer-to-pointer casts.</p>
<p>After a lot of work by @carbotaniuman and myself, the situation now is as follows:</p>
<ul>
<li>Miri always properly tags raw pointers.
So there are no longer any counter-intuitive behaviors caused by Miri “mixing up” two raw pointers that point to the same address, but were computed in a different way.
(We had a <code class="language-plaintext highlighter-rouge">-Zmiri-tag-raw-pointers</code> flag for a while that also achieves this; that flag is now on-by-default.)</li>
<li>If you do not use any integer-to-pointer casts, then you can stop reading here!
You can pass <code class="language-plaintext highlighter-rouge">-Zmiri-strict-provenance</code> to Miri to ensure that this is indeed the case.</li>
<li>If you <em>are</em> using integer-to-pointer casts, then Miri will warn about that. You now have two options.
<ul>
<li>The ideal solution is to avoid using integer-to-pointer casts, and to follow Strict Provenance instead.
The <a href="https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance">pointer library docs</a> explain in more detail what exactly that means.
Note that the APIs described there are still unstable, but a <a href="https://crates.io/crates/sptr">polyfill</a> is available for stable Rust.
Also see <a href="https://gankra.github.io/blah/tower-of-weakenings/">Gankra’s blog post</a> and <a href="/blog/2022/04/11/provenance-exposed.html">my own blog post</a> for some more background on this subject.</li>
<li>If the casts are in code you do not control, or if you cannot currently avoid integer-to-pointer casts, you can pass <code class="language-plaintext highlighter-rouge">-Zmiri-permissive-provenance</code> to Miri to silence the warning.
Know that this means that Miri might miss some bugs in your code:
integer-to-pointer casts make it impossible to precisely track which pointer came from where, so Miri will conservatively accept some code that actually should be rejected.</li>
</ul>
</li>
</ul>
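<p>To make the first option a bit more concrete, here is a sketch of pointer tagging that never casts an integer back to a pointer.
The helper <code class="language-plaintext highlighter-rouge">with_addr_sketch</code> is hypothetical and written with stable pointer arithmetic only; it is meant to convey the idea behind the Strict Provenance APIs, not to reproduce them.</p>

```rust
/// Hypothetical helper (not a std API): returns a pointer whose address is
/// `addr` but which is still derived from `ptr`, so it keeps `ptr`'s
/// provenance.
fn with_addr_sketch<T>(ptr: *const T, addr: usize) -> *const T {
    // Pointer-to-integer cast: this only reads the current address.
    let current = ptr as usize;
    // Signed distance from the current address to the requested one.
    let delta = (addr as isize).wrapping_sub(current as isize);
    // Byte-wise pointer arithmetic on the original pointer; no integer is
    // ever cast back to a pointer.
    ptr.cast::<u8>().wrapping_offset(delta).cast::<T>()
}

/// Stores a one-bit tag in the (otherwise unused) lowest address bit.
fn set_low_bit(ptr: *const u32) -> *const u32 {
    with_addr_sketch(ptr, (ptr as usize) | 1)
}

/// Removes the tag again, yielding a pointer that can be dereferenced.
fn clear_low_bit(ptr: *const u32) -> *const u32 {
    with_addr_sketch(ptr, (ptr as usize) & !1)
}
```

<p>The tagged pointer must not be dereferenced, but after clearing the tag the result is a pointer that still “remembers” where it came from, which is what allows a tool like Miri to keep tracking it precisely.</p>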
<p>This is overall much better than previously – there is nothing funky going on with raw pointers any more, and we should never incorrectly report UB any more even when integer-to-pointer casts are used.
:-)</p>
<h3 id="other-areas">Other areas</h3>
<p>Concurrency and pointer aliasing are the two big changes, but there is also a long tail of smaller changes that together make Miri a heck of a lot more useful than it used to be:</p>
<ul>
<li>@teryror made Miri support doctests, so now <code class="language-plaintext highlighter-rouge">cargo miri test</code> will also check your doctests for UB!</li>
<li>@Smittyvb fixed our fast-math intrinsics to properly report UB when they are used on non-finite values.</li>
<li>@hyd-dev added “symbol resolution” support to Miri, so if one part of your Rust code defines a function with a given <code class="language-plaintext highlighter-rouge">link_name</code>, and another piece of Rust code imports that function via an <code class="language-plaintext highlighter-rouge">extern</code> block, Miri now knows how to find the right function implementation.</li>
<li>@atsmtat added a <code class="language-plaintext highlighter-rouge">-Zmiri-isolation-error=<action></code> flag so when a function call is rejected due to isolation, evaluation can continue by reporting an error code to the interpreted program.</li>
<li>@landaire added <code class="language-plaintext highlighter-rouge">-Zmiri-panic-on-unsupported</code>, which makes Miri raise a panic rather than stopping evaluation when an unsupported system function is encountered.
This can be useful to keep going with the next test in a test suite.
However, it also raises panics where usually that would be impossible, which can lead to surprising behavior.</li>
<li>@DrMeepster added support for running programs that use the <code class="language-plaintext highlighter-rouge">#[start]</code> attribute, and @oli-obk made that work even for targets without <code class="language-plaintext highlighter-rouge">libstd</code>.
(You need to set <code class="language-plaintext highlighter-rouge">MIRI_NO_STD=1</code> to make the latter work.)</li>
<li>@DrMeepster also implemented support for the <code class="language-plaintext highlighter-rouge">#[global_allocator]</code> attribute.</li>
<li>@camelid made Miri optionally detect UB due to uninitialized integers, which has since become the default.</li>
<li>@saethlin made our errors more readable by pruning irrelevant details from the backtraces.</li>
<li>I have implemented support for calling methods on types like <code class="language-plaintext highlighter-rouge">Pin<Box<dyn Trait>></code>.</li>
<li>@oli-obk fixed our handling of types like <code class="language-plaintext highlighter-rouge">MaybeUninit<u64></code>, where previously we did not properly support only <em>some</em> of the bytes being initialized.</li>
</ul>
<p>We also improved our platform API and intrinsic support:</p>
<ul>
<li>Thanks to @m-ou-se Miri now supports the Linux futex APIs used by the Rust standard library.
This was crucial for std’s <code class="language-plaintext highlighter-rouge">park()</code> and <code class="language-plaintext highlighter-rouge">unpark()</code>, but meanwhile is also used for many other synchronization primitives.</li>
<li>On the file system side, @Aaron1011 implemented <code class="language-plaintext highlighter-rouge">readlink</code>, which makes <code class="language-plaintext highlighter-rouge">std::fs::read_link</code> work on Linux and macOS.</li>
<li>@asquared31415 made the three-argument form of <code class="language-plaintext highlighter-rouge">open</code> work.</li>
<li>@tavianator implemented <code class="language-plaintext highlighter-rouge">readdir64</code> so we can still list directories on Linux (the Rust standard library was changed to use that function rather than <code class="language-plaintext highlighter-rouge">readdir64_r</code>).</li>
<li>@Aaron1011 has also improved the rendering of panic backtraces inside the interpreter.</li>
<li>@frewsxcv implemented the missing bits to make the aarch64-apple-darwin target work in Miri.</li>
<li>I implemented the intrinsics required by <code class="language-plaintext highlighter-rouge">std::simd</code>, so portable-simd code should work with Miri.
It will not be very fast, though…</li>
<li>@V0ldek made our Windows <code class="language-plaintext highlighter-rouge">GetSystemInfo</code> shim work in more situations.</li>
<li>@saethlin added support for <code class="language-plaintext highlighter-rouge">*_COARSE</code> clocks on Linux.</li>
<li>@InfRandomness has started on getting Miri to work on FreeBSD targets (but this support is still incomplete).</li>
</ul>
<h3 id="bugfixes-and-cleanup">Bugfixes and cleanup</h3>
<p>And of course there were tons of bugfixes.
I want to particularly call out @hyd-dev who fixed a <em>lot</em> of issues in our <code class="language-plaintext highlighter-rouge">cargo miri</code> frontend.
@dtolnay did a lot of code cleanup, making Miri pass clippy’s critical eyes and ensuring all our tests are properly formatted.
And last but not least, @oli-obk completely re-wrote our test suite so that we can finally actually test the full output of Miri.</p>
<p>I have probably forgotten to mention something interesting as well.
<a href="https://github.com/rust-lang/miri/graphs/contributors?from=2020-09-29&to=2022-07-02&type=c">See here</a> for the full list of amazing people who contributed to Miri since my last update.
I cannot thank all of you enough! <3</p>
<h2 id="help-miri-suddenly-says-my-code-is-broken">Help, Miri suddenly says my code is broken</h2>
<p>Several of the changes mentioned above, in particular with regard to concurrency and Stacked Borrows, mean that Miri is now able to detect more problems than before.
On the one hand, that’s of course great, but on the other hand, it can mean that when you re-run Miri on some code that seemed fine, it might suddenly complain!
And because of all the non-determinism, it might also be the case that Miri <em>sometimes</em> complains, and sometimes doesn’t (or that it works fine locally but complains on CI).
What can you do when that happens?</p>
<p>If Miri shows a new Stacked Borrows error, then that is probably caused by raw pointers now being properly tagged.
The new Stacked Borrows messages should make it easier than before to diagnose these problems, but in the end this still remains a case-by-case issue.
For example, <a href="https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=14ddec169895d111578ec96757df95d1">this program</a> will print:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error: Undefined Behavior: attempting a read access using <3255> at alloc1770[0x4], but that tag does not exist in the borrow stack for this location
 --> src/main.rs:4:25
  |
4 |     let _val = unsafe { *ptr.add(1) }; // ...and use it to access the *second* element.
  |                         ^^^^^^^^^^^
  |                         |
  |                         attempting a read access using <3255> at alloc1770[0x4], but that tag does not exist in the borrow stack for this location
  |                         this error occurs as part of an access at alloc1770[0x4..0x8]
  |
  = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
  = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <3255> was created by a retag at offsets [0x0..0x4]
 --> src/main.rs:3:15
  |
3 |     let ptr = &x[0] as *const i32; // We create a pointer to the *first* element...
  |               ^^^^^
  = note: backtrace:
  = note: inside `main` at src/main.rs:4:25
</code></pre></div></div>
<p>In this case, the clue is in the offsets: note that the tag was created for offsets <code class="language-plaintext highlighter-rouge">[0x0..0x4]</code> (as usual in Rust, this <em>excludes</em> <code class="language-plaintext highlighter-rouge">0x4</code>), and the access was at <code class="language-plaintext highlighter-rouge">alloc1770[0x4]</code>.
The pointer was thus used outside the offset range for which its tag (<code class="language-plaintext highlighter-rouge"><3255></code>) is valid.
The fix is to use <code class="language-plaintext highlighter-rouge">x.as_ptr()</code> rather than <code class="language-plaintext highlighter-rouge">&x[0] as *const i32</code> to get a pointer that is valid for the entire array.</p>
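<p>As a minimal sketch of that fix: the helper below is reconstructed from the two source lines quoted in the error message, so its name (<code class="language-plaintext highlighter-rouge">second_element</code>) and the array contents are assumptions and not the linked playground code.
It obtains the pointer via <code class="language-plaintext highlighter-rouge">as_ptr()</code>, so the pointer is derived from the whole array rather than from its first element only:</p>

```rust
/// Reads the second element of `x` through a raw pointer.
/// (Hypothetical helper for illustration; not taken from the linked playground.)
fn second_element(x: &[i32; 2]) -> i32 {
    // `as_ptr` yields a pointer derived from the entire array, so it may be
    // used for both elements, not just for the first one.
    let ptr = x.as_ptr();
    // SAFETY: `ptr.add(1)` stays inside the two-element array `x`.
    unsafe { *ptr.add(1) }
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">second_element(&[10, 20])</code> reads the second element without leaving the range the pointer was created for.</p>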
<p>If the error only shows up sometimes, then it probably has something to do with concurrency.
Miri is not <em>truly</em> random, but uses a pseudo-random number generator to make all concurrency-related choices (such as when to schedule another thread).
This means you can explore various different possible choices by passing different <em>seeds</em> for Miri to use for its pseudo-random number generator.
The following little shell snippet runs Miri with many different seeds, which is a great way to locally reproduce a failure that you saw on CI but have trouble triggering yourself:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for SEED in $({ echo obase=16; seq 0 255; } | bc); do
  echo "Trying seed: $SEED"
  MIRIFLAGS=-Zmiri-seed=$SEED cargo miri test || { echo "Failing seed: $SEED"; break; };
done
</code></pre></div></div>
<p>It is important that you use exactly the same <code class="language-plaintext highlighter-rouge">MIRIFLAGS</code> as CI to ensure the failure can even happen!
It is also a good idea to use a filter with <code class="language-plaintext highlighter-rouge">cargo miri test FILTER</code> to ensure only the test you care about is being run.</p>
<p>Once you have confirmed that this is indeed a non-deterministic test failure, you can narrow it down further by reducing Miri’s non-determinism:</p>
<ul>
<li>You can pass <code class="language-plaintext highlighter-rouge">-Zmiri-preemption-rate=0</code> to make the scheduler non-preemptive (only schedule to other threads when a thread explicitly yields).
This <em>can</em> lead to infinite loops if there are spin-loops that do not yield, but if it makes the problem go away, then the problem needs some very particular scheduling decisions to surface, which might help you track down its source.</li>
<li>You can also pass <code class="language-plaintext highlighter-rouge">-Zmiri-disable-weak-memory-emulation</code> which has the effect of making atomic loads always return the latest value stored in that location.
If that makes the problem go away, then the issue is likely caused by insufficient synchronization somewhere. It might be a missing fence, or a <code class="language-plaintext highlighter-rouge">Relaxed</code> access that should be <code class="language-plaintext highlighter-rouge">Release</code>/<code class="language-plaintext highlighter-rouge">Acquire</code>.</li>
<li>Finally, <code class="language-plaintext highlighter-rouge">-Zmiri-compare-exchange-weak-failure-rate=0</code> makes <code class="language-plaintext highlighter-rouge">compare_exchange_weak</code> behave exactly like <code class="language-plaintext highlighter-rouge">compare_exchange</code>.
If that makes the problem go away, then some code using <code class="language-plaintext highlighter-rouge">compare_exchange_weak</code> is not properly handling spurious failures.</li>
</ul>
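<p>To illustrate the last point, here is a small sketch of what “properly handling spurious failures” usually looks like: the <code class="language-plaintext highlighter-rouge">compare_exchange_weak</code> call sits in a loop that retries on <em>any</em> failure.
The function <code class="language-plaintext highlighter-rouge">atomic_double</code> and the chosen memory orderings are assumptions made for this example; they are not taken from Miri or from any particular crate.</p>

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Atomically doubles the stored value and returns the previous one.
/// (Hypothetical example function, not part of Miri or the standard library.)
fn atomic_double(value: &AtomicU32) -> u32 {
    let mut current = value.load(Ordering::Relaxed);
    loop {
        let new = current.wrapping_mul(2);
        match value.compare_exchange_weak(current, new, Ordering::AcqRel, Ordering::Relaxed) {
            // The exchange happened; `previous` is the value we replaced.
            Ok(previous) => return previous,
            // The exchange failed, either because another thread changed the
            // value or spuriously. Both cases are handled the same way:
            // retry with the value that was actually observed.
            Err(observed) => current = observed,
        }
    }
}
```

<p>Code that calls <code class="language-plaintext highlighter-rouge">compare_exchange_weak</code> once and treats <code class="language-plaintext highlighter-rouge">Err</code> as “someone else changed the value” is the kind of code that the default failure rate is likely to trip up.</p>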
<p>Passing all of these flags will make Miri’s concurrency entirely deterministic.
That can be useful to avoid non-deterministic test failures, but note that this will also mask many real-world bugs.
Those test failures are often real, even if they can be hard to track down!</p>
<p>If you are still having trouble, feel free to come visit us in our <a href="https://rust-lang.zulipchat.com/#narrow/stream/269128-miri">Zulip stream</a>, which is the official communication channel for Miri.</p>
<p>By the way, if you are still disabling some tests on Miri because Miri used to not support panics/concurrency, it’s time to give those tests another try. :)
So this is a good opportunity to go over your <code class="language-plaintext highlighter-rouge">cfg(miri)</code> and similar attributes and re-evaluate if they are still needed.</p>
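<p>For illustration, this is roughly what such a leftover attribute might look like.
The test below is made up for this sketch, and <code class="language-plaintext highlighter-rouge">#[cfg_attr(miri, ignore)]</code> is just one common way of skipping a test under Miri; your crate may use a different form.</p>

```rust
use std::thread;

/// Spawns a thread that panics and reports whether the panic was observed
/// by the parent. (Hypothetical stand-in for the code under test.)
fn child_panic_is_reported() -> bool {
    let handle = thread::spawn(|| {
        panic!("boom");
    });
    // `join` returns `Err` if the child thread panicked.
    handle.join().is_err()
}

// A test like this uses both panics and threads. If it still carries an
// attribute like the one commented out below, try removing it.
// #[cfg_attr(miri, ignore)]
#[test]
fn panicking_thread_is_detected() {
    assert!(child_panic_is_reported());
}
```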
<h2 id="using-miri">Using Miri</h2>
<p>If this post made you curious and you want to give Miri a try, here’s how to do that.
Assuming you have a crate with some unsafe code, and you already have a test suite (you are testing your unsafe code, right?), you can just install Miri (<code class="language-plaintext highlighter-rouge">rustup +nightly component add miri</code>) and then run <code class="language-plaintext highlighter-rouge">cargo +nightly miri test</code> to execute all tests in Miri.
Note that this requires the nightly toolchain as Miri is still an experimental tool.</p>
<p>Miri is very slow, so it is likely that some tests will take way too long to be feasible.
You can adjust iteration counts in Miri without affecting non-Miri testing as follows:</p>
<figure class="highlight"><pre><code class="language-rust" data-lang="rust"><span class="k">let</span> <span class="n">limit</span> <span class="o">=</span> <span class="k">if</span> <span class="nd">cfg!</span><span class="p">(</span><span class="n">miri</span><span class="p">)</span> <span class="p">{</span> <span class="mi">10</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="mi">10_000</span> <span class="p">};</span></code></pre></figure>
<p>If your test suite needs to access OS facilities such as timers or the file system, set <code class="language-plaintext highlighter-rouge">MIRIFLAGS=-Zmiri-disable-isolation</code> to enable those.
(Miri will tell you when that is necessary.)
If your test suite runs into an unsupported operation, please <a href="https://github.com/rust-lang/miri/issues">report an issue</a>.
However, note that we can only really support sufficiently “generic” operations – like accessing file systems and network sockets.
To implement things like <code class="language-plaintext highlighter-rouge">Py_IsInitialized</code> would mean putting a Python interpreter into Miri; that is not going to happen. ;)</p>
<p>If you want to add Miri to your CI to ensure your test suite keeps working in Miri, please consult our <a href="https://github.com/rust-lang/miri/#running-miri-on-ci">README</a>.
That document is also a great starting point for any other questions you might have.</p>
<p>Miri is also integrated into the <a href="https://play.rust-lang.org/">Rust Playground</a>: you can select Miri in the “Tools” menu to check the code for Undefined Behavior.</p>
<p>If Miri complains about your code and you do not understand why, we are happy to help!
The best place to ask for support is our <a href="https://rust-lang.zulipchat.com/#narrow/stream/269128-miri">Zulip stream</a>.
Questions are much easier to answer if you manage to reproduce the problem in a small self-contained bit of example code (ideally on the playground), but feel free to ask even if you do not know how to reduce the problem.</p>
<h2 id="helping-miri">Helping Miri</h2>
<p>If you want to help improve Miri, that’s awesome!
The <a href="https://github.com/rust-lang/miri/issues">issue tracker</a> is a good place to start; the list of issues is short enough that you can just browse through it rather quickly to see if anything piques your interest.
The ones that are particularly suited for getting started are marked with a green label, but notice that even “E-easy” issues can require some amount of Rust experience – Miri is not a good codebase for your first steps in Rust.
Another good starting point is to try to implement the missing bit of functionality that keeps your test suite from working.
If you need any mentoring, just <a href="https://rust-lang.zulipchat.com/#narrow/stream/269128-miri">get in touch</a>. :)</p>
<p>That’s it for now.
I am totally blown away by how many people (and companies!) are already using and even contributing to Miri.
This endeavor of re-shaping the way we approach correctness of unsafe code has been way more successful than my wildest dreams.
I hope Miri can also help you to ensure correctness of your unsafe code, and I am excited for what the next year(s) of Miri development will bring. :D</p>
Pointers Are Complicated III, or: Pointer-integer casts exposed2022-04-11T00:00:00+02:00https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html
<p>In my <a href="/blog/2020/12/14/provenance.html">previous blog post on pointer provenance</a>, I have shown that not thinking carefully about pointers can lead to a compiler that is internally inconsistent:
programs that are intended to be well-behaved get miscompiled by a sequence of optimizations, each of which seems intuitively correct in isolation.
We thus have to remove or at least restrict at least one of these optimizations.
In this post I will continue that trend with another example, and then I will lay down my general thoughts on how this relates to the recent <a href="https://github.com/rust-lang/rust/issues/95228">Strict Provenance</a> proposal, what it could mean for Rust more generally, and compare with C’s PNVI-ae-udi.
We will end on a very hopeful note about what this could all mean for Rust’s memory model.
There’s a lot of information packed into this post, so better find a comfortable reading position. :)</p>
<!-- MORE -->
<p>In case you don’t know what I mean by “pointer provenance”, you can either read that previous blog post or the <a href="https://doc.rust-lang.org/nightly/core/ptr/index.html#provenance">Strict Provenance documentation</a>.
The gist of it is that a pointer consists not only of the address that it points to in memory, but also of its <em>provenance</em>: an extra piece of “shadow state” that is carried along with each pointer and that tracks which memory the pointer has permission to access and when.
This is required to make sense of restrictions like “use-after-free is Undefined Behavior, even if you checked that there is a new allocation at the same address as the old one”.
Architectures like CHERI make this “shadow state” explicit (pointers are bigger than usual so that they can explicitly track which part of memory they are allowed to access),
but even when compiling for AMD64 CPUs, compilers act “as if” pointers had such extra state – it is part of the specification, part of the Abstract Machine, even if it is not part of the target CPU.</p>
<h2 id="dead-cast-elimination-considered-harmful">Dead cast elimination considered harmful</h2>
<p>The key ingredient that will help us understand the nuances of provenance is <code class="language-plaintext highlighter-rouge">restrict</code>, a C keyword to promise that a given pointer <code class="language-plaintext highlighter-rouge">x</code> does not alias any other pointer not derived from <code class="language-plaintext highlighter-rouge">x</code>.<sup id="fnref:restrict" role="doc-noteref"><a href="#fn:restrict" class="footnote" rel="footnote">1</a></sup>
This is comparable to the promise that a <code class="language-plaintext highlighter-rouge">&mut T</code> in Rust is unique.
However, just like last time, we want to consider the limits that <code class="language-plaintext highlighter-rouge">restrict</code> combined with integer-pointer casts put on an optimizing compiler – so the actual programming language that we have to be concerned with is the IR of that compiler.
Nevertheless, I will use the more familiar C syntax to write down this example; you should think of this as just being notation for the “obvious” equivalent function in LLVM IR, where <code class="language-plaintext highlighter-rouge">restrict</code> is expressed via <code class="language-plaintext highlighter-rouge">noalias</code>.
Of course, if we learn that the IR has to put some limitations on what code may do, this also applies to the surface language – so we will be talking about all three (Rust, C, LLVM) quite a bit.</p>
<p>With all that out of the way, consider the following program:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include</span> <span class="cpf"><stdio.h></span><span class="cp">
#include</span> <span class="cpf"><stdint.h></span><span class="cp">
</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">uwu</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">y</span><span class="p">)</span> <span class="p">{</span>
  <span class="o">*</span><span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="kt">uintptr_t</span> <span class="n">xaddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">x</span><span class="p">;</span>
  <span class="kt">int</span> <span class="o">*</span><span class="n">y2</span> <span class="o">=</span> <span class="n">y</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span>
  <span class="kt">uintptr_t</span> <span class="n">y2addr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">y2</span><span class="p">;</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">xaddr</span> <span class="o">==</span> <span class="n">y2addr</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="o">*</span><span class="p">)</span><span class="n">xaddr</span><span class="p">;</span>
    <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
  <span class="p">}</span>
  <span class="k">return</span> <span class="o">*</span><span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
  <span class="kt">int</span> <span class="n">i</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">};</span>
  <span class="kt">int</span> <span class="n">res</span> <span class="o">=</span> <span class="n">uwu</span><span class="p">(</span><span class="o">&</span><span class="n">i</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="o">&</span><span class="n">i</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
  <span class="c1">// Always prints 1.</span>
  <span class="n">printf</span><span class="p">(</span><span class="s">"%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">res</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>This function takes as argument two <code class="language-plaintext highlighter-rouge">restrict</code> pointers <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code>. We first write <code class="language-plaintext highlighter-rouge">0</code> into <code class="language-plaintext highlighter-rouge">*x</code>.
Then we compute <code class="language-plaintext highlighter-rouge">y2</code> as pointing to the <code class="language-plaintext highlighter-rouge">int</code> right before <code class="language-plaintext highlighter-rouge">*y</code>, and cast that and <code class="language-plaintext highlighter-rouge">x</code> to integers.
If the addresses we get are the same, we cast <code class="language-plaintext highlighter-rouge">xaddr</code> back to a pointer and write <code class="language-plaintext highlighter-rouge">1</code> to it.
Finally, we return the value stored in <code class="language-plaintext highlighter-rouge">*x</code>.</p>
<p>The <code class="language-plaintext highlighter-rouge">main</code> function simply calls <code class="language-plaintext highlighter-rouge">uwu</code> with two pointers pointing to the first two elements of an array.
Note, in particular, that this <em>will</em> make <code class="language-plaintext highlighter-rouge">xaddr</code> and <code class="language-plaintext highlighter-rouge">y2addr</code> always equal!
<code class="language-plaintext highlighter-rouge">&i[1] - 1</code> denotes the same address as <code class="language-plaintext highlighter-rouge">&i[0]</code>.</p>
<p>Now, let us imagine we run a few seemingly obvious optimizations on <code class="language-plaintext highlighter-rouge">uwu</code>:</p>
<ul>
<li>Inside the <code class="language-plaintext highlighter-rouge">if</code>, we can replace <code class="language-plaintext highlighter-rouge">xaddr</code> by <code class="language-plaintext highlighter-rouge">y2addr</code> since they are both equal integers.</li>
<li>Since this is a <code class="language-plaintext highlighter-rouge">static</code> function and the only caller makes <code class="language-plaintext highlighter-rouge">y2addr</code> always equal to <code class="language-plaintext highlighter-rouge">xaddr</code>, we know that the conditional in the <code class="language-plaintext highlighter-rouge">if</code> will always evaluate to <code class="language-plaintext highlighter-rouge">true</code>. We thus remove the test. (Alternatively, the same transformation can happen by inlining <code class="language-plaintext highlighter-rouge">uwu</code> into <code class="language-plaintext highlighter-rouge">main</code> while preserving the alias information, which <a href="https://lists.llvm.org/pipermail/llvm-dev/2019-March/131127.html">LLVM explicitly aims for</a>.)</li>
<li>Finally, we observe that <code class="language-plaintext highlighter-rouge">xaddr</code> is unused, so we can remove it entirely.</li>
</ul>
<p><code class="language-plaintext highlighter-rouge">uwu</code> now looks as follows:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">static</span> <span class="kt">int</span> <span class="nf">uwu</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">y</span><span class="p">)</span> <span class="p">{</span>
  <span class="o">*</span><span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="kt">int</span> <span class="o">*</span><span class="n">y2</span> <span class="o">=</span> <span class="n">y</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span>
  <span class="kt">uintptr_t</span> <span class="n">y2addr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">y2</span><span class="p">;</span>
  <span class="kt">int</span> <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="o">*</span><span class="p">)</span><span class="n">y2addr</span><span class="p">;</span> <span class="c1">// <-- using y2addr</span>
  <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
  <span class="k">return</span> <span class="o">*</span><span class="n">x</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>This might still look harmless.
However, we can do even more!
Notice how this function now consists of a store of <code class="language-plaintext highlighter-rouge">0</code> to <code class="language-plaintext highlighter-rouge">*x</code>, then a bunch of code <em>that does not involve <code class="language-plaintext highlighter-rouge">x</code> at all</em>, and then a load from <code class="language-plaintext highlighter-rouge">*x</code>.
Since <code class="language-plaintext highlighter-rouge">x</code> is a <code class="language-plaintext highlighter-rouge">restrict</code> pointer, this “code that does not involve <code class="language-plaintext highlighter-rouge">x</code>” cannot possibly mutate <code class="language-plaintext highlighter-rouge">*x</code>, as that would be a violation of the <code class="language-plaintext highlighter-rouge">restrict</code>/<code class="language-plaintext highlighter-rouge">noalias</code> guarantee.
Hence we can optimize the <code class="language-plaintext highlighter-rouge">return *x</code> to <code class="language-plaintext highlighter-rouge">return 0</code>.
This kind of optimization is the primary reason to have <code class="language-plaintext highlighter-rouge">restrict</code> annotations in the first place, so this should be uncontroversial.
Formally speaking: only pointers “derived from” <code class="language-plaintext highlighter-rouge">x</code> may access <code class="language-plaintext highlighter-rouge">*x</code>, and while the details of defining “derived from” are nasty, it should be clear that doing a bunch of operations that literally don’t involve <code class="language-plaintext highlighter-rouge">x</code> at all cannot by any stretch of the imagination produce a result that is “derived from” <code class="language-plaintext highlighter-rouge">x</code>.
(If they could, <code class="language-plaintext highlighter-rouge">restrict</code> would be basically worthless.)</p>
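<p>Since <code class="language-plaintext highlighter-rouge">restrict</code> is comparable to the uniqueness promise of <code class="language-plaintext highlighter-rouge">&mut T</code>, the same store/unrelated-code/load pattern can be sketched in safe Rust.
The function name below is made up for this illustration, and the sketch only shows the shape of the reasoning; it does not demonstrate that any particular compiler version actually performs the optimization.</p>

```rust
/// Rust analogue of "store to `*x`, run code not involving `x`, load from `*x`".
/// (Hypothetical function for illustration.)
fn store_other_load(x: &mut i32, y: &mut i32) -> i32 {
    *x = 0;
    // This write goes through `y`. Two `&mut` references cannot alias,
    // so this cannot change `*x`.
    *y = 1;
    // A compiler is therefore allowed to treat this load as the constant 0.
    *x
}
```

<p>In safe Rust the borrow checker rules out calling this function with two references to the same location, which is exactly the situation the C example constructs with <code class="language-plaintext highlighter-rouge">&i[0]</code> and <code class="language-plaintext highlighter-rouge">&i[1] - 1</code>.</p>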
<p>Now, the whole program looks like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">static</span> <span class="kt">int</span> <span class="nf">uwu</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">y</span><span class="p">)</span> <span class="p">{</span>
  <span class="o">*</span><span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="kt">int</span> <span class="o">*</span><span class="n">y2</span> <span class="o">=</span> <span class="n">y</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span>
  <span class="kt">uintptr_t</span> <span class="n">y2addr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">y2</span><span class="p">;</span>
  <span class="kt">int</span> <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="o">*</span><span class="p">)</span><span class="n">y2addr</span><span class="p">;</span>
  <span class="o">*</span><span class="n">ptr</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// <-- hard-coded return value</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
  <span class="kt">int</span> <span class="n">i</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">};</span>
  <span class="kt">int</span> <span class="n">res</span> <span class="o">=</span> <span class="n">uwu</span><span class="p">(</span><span class="o">&</span><span class="n">i</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="o">&</span><span class="n">i</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
  <span class="c1">// Now this prints 0!</span>
  <span class="n">printf</span><span class="p">(</span><span class="s">"%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">res</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>We started out with a program that always prints <code class="language-plaintext highlighter-rouge">1</code>, and ended up with a program that always prints <code class="language-plaintext highlighter-rouge">0</code>.
This is bad news. Our optimizations changed program behavior. That must not happen! What went wrong?</p>
<p>Fundamentally, this is the same situation as in the previous blog post: this example demonstrates that either the original program already had Undefined Behavior, or (at least) one of the optimizations is wrong.
However, the only possibly suspicious part of the original program is a pointer-integer-pointer round-trip – and if casting integers to pointers is allowed, <em>surely</em> that must work.
I will, for the rest of this post, assume that replacing <code class="language-plaintext highlighter-rouge">x</code> by <code class="language-plaintext highlighter-rouge">(int*)(uintptr_t)x</code> is always allowed.
So, which of the optimizations is the wrong one?</p>
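<p>For readers who think in Rust rather than C, the round trip in question can be sketched with <code class="language-plaintext highlighter-rouge">as</code> casts.
The helper name is an assumption made for this example, and the sketch relies on the same assumption as the text above, namely that a pointer obtained from such a round trip may be used like the original one.</p>

```rust
/// Writes `value` through a pointer that went through an integer round trip.
/// (Hypothetical helper for illustration.)
fn write_via_round_trip(target: &mut i32, value: i32) {
    let ptr: *mut i32 = target;
    // Pointer-to-integer cast, the analogue of `(uintptr_t)x`.
    let addr = ptr as usize;
    // Integer-to-pointer cast, the analogue of `(int*)xaddr`.
    let ptr2 = addr as *mut i32;
    // SAFETY: `ptr2` has the same address as `ptr`, which points to a live
    // `i32` that this function has exclusive access to. This relies on the
    // round trip yielding a pointer that may be used like `ptr`.
    unsafe { *ptr2 = value };
}
```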
<h2 id="the-blame-game">The blame game</h2>
<p>Remember what I said earlier about <code class="language-plaintext highlighter-rouge">restrict</code> and how it matters which pointer <code class="language-plaintext highlighter-rouge">ptr</code> is “derived from”?
If we follow this lead, it may seem like the bogus optimization is the one that replaced <code class="language-plaintext highlighter-rouge">xaddr</code> by <code class="language-plaintext highlighter-rouge">y2addr</code>.
After this transformation, <code class="language-plaintext highlighter-rouge">ptr</code> is obviously “derived from” <code class="language-plaintext highlighter-rouge">y2</code> (and thus transitively from <code class="language-plaintext highlighter-rouge">y</code>) and not <code class="language-plaintext highlighter-rouge">x</code>, and so obviously <code class="language-plaintext highlighter-rouge">uwu</code> (as called from <code class="language-plaintext highlighter-rouge">main</code>) is wrong since we are doing two memory accesses (at least one of which is a write) to the same location, using two pointers that are “derived from” different <code class="language-plaintext highlighter-rouge">restrict</code> pointers!</p>
<p>However, that optimization doesn’t even have anything to do with pointers.
It just replaces one equal integer by another!
How can that possibly be incorrect?</p>
<p>What this example shows is that the notion of one value being “derived from” another is not very meaningful when considering an optimizing compiler.<sup id="fnref:consume" role="doc-noteref"><a href="#fn:consume" class="footnote" rel="footnote">2</a></sup>
It <em>is</em> possible to “fix” this problem and have a notion of “derived from” that works correctly even with pointer-integer round-trips.
However, this requires saying that not only pointers but also <em>integers carry provenance</em>, such that casting a pointer to an integer can preserve the provenance.
We solved one problem and created many new ones.
For one, we have to stop doing optimizations that replace one <code class="language-plaintext highlighter-rouge">==</code>-equal integer by another, unless we know they carry no provenance.
(Alternatively we could say <code class="language-plaintext highlighter-rouge">==</code>-comparing such integers is Undefined Behavior. But clearly we want to allow people to <code class="language-plaintext highlighter-rouge">==</code>-compare integers they obtained from pointer-integer casts, so this is not an option.)
That seems like a bad deal, since the code that benefits from such optimizations doesn’t even do anything shady – it is the pointer-manipulating code that is causing trouble.
The list doesn’t end here though, and because of that, this option was discarded by the C standardization process during its provenance work, and they ended up picking a “PNVI” model – provenance <em>not</em> via integers.
I think Rust should follow suit.</p>
<p>But, if it’s not the replacement of <code class="language-plaintext highlighter-rouge">xaddr</code> by <code class="language-plaintext highlighter-rouge">y2addr</code> that is wrong, then which optimization <em>is</em> the wrong one?
I will argue that the incorrect optimization is the one that removed <code class="language-plaintext highlighter-rouge">xaddr</code>.
More specifically, the bad step was removing the cast <code class="language-plaintext highlighter-rouge">(uintptr_t)x</code>, irrespective of whether the result of that cast is used or not.
Had this cast been preserved, it would have been a marker for the compiler to know that “the <code class="language-plaintext highlighter-rouge">restrict</code> guarantee of <code class="language-plaintext highlighter-rouge">x</code> ends here”, and it would not have done the final optimization of making <code class="language-plaintext highlighter-rouge">uwu</code> always return <code class="language-plaintext highlighter-rouge">0</code>.</p>
<h2 id="casts-have-a-side-effect">Casts have a side-effect</h2>
<p>How can it <em>not</em> be correct to remove an operation if its result is unused?
If we take a step back, then in general, the answer is simple – if calling <code class="language-plaintext highlighter-rouge">foo()</code> has some side-effect on the global state, like changing the value of a global variable, then of course we have to keep the call to <code class="language-plaintext highlighter-rouge">foo</code> around even if we ignore its return value.
But in this case, the operation in question is <code class="language-plaintext highlighter-rouge">(uintptr_t)x</code>, which has no side-effect – right?</p>
<p>Wrong.
This is exactly the key lesson that this example teaches us: casting a pointer to an integer <em>has a side-effect</em>, and that side-effect has to be preserved even if we don’t care about the result of the cast (in this case, the reason we don’t care is that we <em>already know</em> that <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y2</code> will cast to the same <code class="language-plaintext highlighter-rouge">uintptr_t</code>).</p>
<p>To explain what that side-effect is, we have to get deep into the pointer provenance mindset.
<code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are both pointers, so they carry provenance that tracks which memory they have permission to access.
Specifically, <code class="language-plaintext highlighter-rouge">x</code> has permission to access <code class="language-plaintext highlighter-rouge">i[0]</code> (declared in <code class="language-plaintext highlighter-rouge">main</code>), and <code class="language-plaintext highlighter-rouge">y</code> has permission to access <code class="language-plaintext highlighter-rouge">i[1]</code>.<sup id="fnref:dyn" role="doc-noteref"><a href="#fn:dyn" class="footnote" rel="footnote">3</a></sup>
<code class="language-plaintext highlighter-rouge">y2</code> just inherits the permission from <code class="language-plaintext highlighter-rouge">y</code>.</p>
<p>But which permission does <code class="language-plaintext highlighter-rouge">ptr</code> get?
Since integers do not carry provenance, the details of this permission information are lost during a pointer-integer cast, and have to somehow be ‘restored’ at the integer-pointer cast.
And that is exactly the point where our problems begin.
In the original program, we argued that doing a pointer-integer-pointer round-trip is allowed (as is the intention of the C standard).
It follows that <code class="language-plaintext highlighter-rouge">ptr</code> must pick up the permission from <code class="language-plaintext highlighter-rouge">x</code> (or else the write to <code class="language-plaintext highlighter-rouge">*ptr</code> would be Undefined Behavior: <code class="language-plaintext highlighter-rouge">x</code> is <code class="language-plaintext highlighter-rouge">restrict</code>, nothing else can access that memory).
However, in the final program, <code class="language-plaintext highlighter-rouge">x</code> plays literally no role in computing <code class="language-plaintext highlighter-rouge">ptr</code>!
It would be a disaster to say that <code class="language-plaintext highlighter-rouge">ptr</code> could pick up the permission of <code class="language-plaintext highlighter-rouge">x</code> – just imagine all that <code class="language-plaintext highlighter-rouge">y</code>-manipulating code is moved into a different function.
Do we have to assume that any function we call can just do a cast to “steal” <code class="language-plaintext highlighter-rouge">x</code>’s permission?
That would entirely defeat the point of <code class="language-plaintext highlighter-rouge">restrict</code> and make <code class="language-plaintext highlighter-rouge">noalias</code> optimizations basically impossible.</p>
<p>But how can it be okay for <code class="language-plaintext highlighter-rouge">ptr</code> to pick up <code class="language-plaintext highlighter-rouge">x</code>’s permission in the original program, and <em>not</em> okay for it to pick up the same permission in the final program?
The key difference is that in the original program, <code class="language-plaintext highlighter-rouge">x</code> <em>has been cast to an integer</em>.
When you cast a pointer to an integer, you are basically declaring that its permission is “up for grabs”, and any future integer-pointer cast may end up endowing the resulting pointer with this permission.
We say that the permission has been “exposed”.
And <em>that</em> is the side-effect that <code class="language-plaintext highlighter-rouge">(uintptr_t)x</code> has!</p>
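<p>For readers who think in Rust rather than C, here is a minimal sketch of such a round-trip using plain <code class="language-plaintext highlighter-rouge">as</code> casts. The function name <code class="language-plaintext highlighter-rouge">write_via_round_trip</code> is made up for this illustration; the comments mark where the exposure happens and where the exposed permission is picked up again.</p>

```rust
/// Writes 42 through a pointer obtained by a pointer-integer-pointer round-trip.
/// (Hypothetical helper, only meant to illustrate the "expose" side-effect.)
fn write_via_round_trip(target: &mut i32) {
    let p: *mut i32 = target;
    // Pointer-integer cast: yields the address and, as a side-effect,
    // marks the permission (provenance) of `p` as "exposed".
    let addr = p as usize;
    // Integer-pointer cast: the new pointer may pick up any previously
    // exposed permission, here the one of `p`.
    let q = addr as *mut i32;
    // SAFETY: `q` has the address of `*target`, and the permission of `p`
    // was exposed above, so `q` may be used for this write.
    unsafe { *q = 42 };
}
```

<p>If the compiler deleted the first cast because <code class="language-plaintext highlighter-rouge">addr</code> could be computed some other way, the second cast would have nothing to pick up. That is exactly the situation in the final program of the example.</p>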
<p>Yes, this way of resolving the conflict <em>does</em> mean we will lose some optimizations.
We <em>have to</em> lose some optimization, as the example shows.
However, the crucial difference from the previous section is that <em>only code which casts pointers to integers is affected</em>.
This means we can keep the performance cost localized to code that does ‘tricky things’ around pointers – that code needs the compiler to be a bit conservative, but all the other code can be optimized without regard for the subtleties of pointer-integer-pointer round-trips.
(Specifically, <em>both</em> pointer-integer and integer-pointer casts have to be treated as impure operations, but for different reasons.
Pointer-integer casts have a side-effect as we have seen.
Integer-pointer casts are <em>non-deterministic</em> – they can produce different results even for identical inputs.
I moved the discussion of this point into the appendix below.)</p>
<h2 id="strict-provenance-pointer-integer-casts-without-side-effects">Strict provenance: pointer-integer casts <em>without</em> side-effects</h2>
<p>This may sound like bad news for low-level coding tricks like pointer tagging (storing a flag in the lowest bit of a pointer).
Do we have to optimize this code less just because of corner cases like the above?
As it turns out, no we don’t – there are some situations where it is perfectly fine to do a pointer-integer cast <em>without</em> having the “exposure” side-effect.
Specifically, this is the case if we never intend to cast the integer back to a pointer!
That might seem like a niche case, but it turns out that most of the time, we can avoid ‘bare’ integer-pointer casts, and instead use an operation like <a href="https://doc.rust-lang.org/nightly/std/primitive.pointer.html#method.with_addr"><code class="language-plaintext highlighter-rouge">with_addr</code></a> that explicitly specifies which provenance to use for the newly created pointer.<sup id="fnref:with_addr" role="doc-noteref"><a href="#fn:with_addr" class="footnote" rel="footnote">4</a></sup>
This is more than enough for low-level pointer shenanigans like pointer tagging, as <a href="https://gankra.github.io/blah/tower-of-weakenings/#strict-provenance-no-more-getting-lucky">Gankra demonstrated</a>.
Rust’s <a href="https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance">Strict Provenance experiment</a> aims to determine whether we can use operations like <code class="language-plaintext highlighter-rouge">with_addr</code> to replace basically all integer-pointer casts.</p>
<p>As part of Strict Provenance, Rust now has a second way of casting pointers to integers, <code class="language-plaintext highlighter-rouge">ptr.addr()</code>, which does <em>not</em> “expose” the permission of the underlying pointer, and hence can be treated like a pure operation!<sup id="fnref:experiment" role="doc-noteref"><a href="#fn:experiment" class="footnote" rel="footnote">5</a></sup>
We can do shenanigans on the integer representation of a pointer <em>and</em> have all these juicy optimizations, as long as we don’t expect bare integer-pointer casts to work.
As a bonus, this also makes Rust work nicely on CHERI <em>without</em> a 128-bit wide <code class="language-plaintext highlighter-rouge">usize</code>, and it helps Miri, too.</p>
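<p>To make the pointer tagging case concrete, here is a small sketch that stores a one-bit flag in the lowest address bit without ever exposing anything. It assumes a toolchain where <code class="language-plaintext highlighter-rouge">addr</code> and <code class="language-plaintext highlighter-rouge">with_addr</code> are available on raw pointers (stable since Rust 1.84, as far as I know), and it also uses <code class="language-plaintext highlighter-rouge">map_addr</code>, a sibling method from the same API family that is not discussed above. The helper names and <code class="language-plaintext highlighter-rouge">TAG_MASK</code> are made up for this example.</p>

```rust
/// Bit mask for a one-bit tag stored in the lowest address bit.
/// This relies on `u32` having an alignment of at least 2, so that bit
/// is always zero in a properly aligned pointer.
const TAG_MASK: usize = 0b1;

/// Returns a copy of `ptr` whose lowest address bit is set to `tag`.
fn set_tag(ptr: *mut u32, tag: bool) -> *mut u32 {
    // `map_addr` changes the address but keeps the provenance of `ptr`.
    ptr.map_addr(|addr| (addr & !TAG_MASK) | usize::from(tag))
}

/// Reads the tag. `addr` returns the address without exposing provenance.
fn tag_of(ptr: *mut u32) -> bool {
    (ptr.addr() & TAG_MASK) != 0
}

/// Clears the tag so that the pointer can be dereferenced again.
fn untagged(ptr: *mut u32) -> *mut u32 {
    // `with_addr` says explicitly which provenance the result gets: that of `ptr`.
    ptr.with_addr(ptr.addr() & !TAG_MASK)
}
```

<p>No integer-pointer cast appears anywhere, so there is nothing for the compiler to be conservative about. A tagged pointer must of course go through <code class="language-plaintext highlighter-rouge">untagged</code> before it is dereferenced.</p>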
<p>But that is not the focus of this blog post; Gankra has <a href="https://gankra.github.io/blah/tower-of-weakenings/">already written most of what there is to say here</a>.
For this blog post, we are happy with what we learned about casts between pointers and integers.
We have found a way to resolve the conflict uncovered by the example, while keeping performance cost (due to lost optimizations) confined to just the code that is truly ambiguous, and even found alternative APIs that can be used to replace most (all?) uses of ambiguous integer-pointer casts.
All is well that ends well?
Unfortunately, no – we are not quite done yet with pointer provenance nightmares.</p>
<h2 id="lets-do-some-transmutation-magic">Let’s do some transmutation magic</h2>
<p>Languages like C or Rust typically allow programmers to re-interpret the underlying representation of a value at a different type.
In Rust, this is often called “transmutation”; in C, a common term for this is “type punning”.
The easiest way to do this in Rust is via the <a href="https://doc.rust-lang.org/std/mem/fn.transmute.html"><code class="language-plaintext highlighter-rouge">mem::transmute</code></a> function, but alternatively transmutation is possible via <code class="language-plaintext highlighter-rouge">union</code>s or by casting a <code class="language-plaintext highlighter-rouge">*mut T</code> raw pointer to <code class="language-plaintext highlighter-rouge">*mut U</code>.
In C, the easiest way is to use a <code class="language-plaintext highlighter-rouge">memcpy</code> between variables of different types, but <code class="language-plaintext highlighter-rouge">union</code>-based type punning is also sometimes allowed, as is loading data of arbitrary type using a character-typed pointer.
(Other kinds of pointer-based type punning are forbidden by C’s strict aliasing rules, but Rust has no such restriction.)
The next question we are going to treat in this blog post is: what happens when we transmute a pointer to an integer?</p>
<p>Basically, imagine the original example after we replace the two casts (computing <code class="language-plaintext highlighter-rouge">xaddr</code> and <code class="language-plaintext highlighter-rouge">y2addr</code>) with a call to a function like</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">static</span> <span class="kt">uintptr_t</span> <span class="nf">transmute_memcpy</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">ptr</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">uintptr_t</span> <span class="n">res</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">res</span><span class="p">,</span> <span class="o">&</span><span class="n">ptr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">uintptr_t</span><span class="p">));</span>
<span class="k">return</span> <span class="n">res</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>or</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">static</span> <span class="kt">uintptr_t</span> <span class="nf">transmute_union</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">ptr</span><span class="p">)</span> <span class="p">{</span>
<span class="k">typedef</span> <span class="k">union</span> <span class="p">{</span> <span class="kt">uintptr_t</span> <span class="n">res</span><span class="p">;</span> <span class="kt">int</span> <span class="o">*</span><span class="n">ptr</span><span class="p">;</span> <span class="p">}</span> <span class="n">Transmute</span><span class="p">;</span>
<span class="n">Transmute</span> <span class="n">t</span><span class="p">;</span>
<span class="n">t</span><span class="p">.</span><span class="n">ptr</span> <span class="o">=</span> <span class="n">ptr</span><span class="p">;</span>
<span class="k">return</span> <span class="n">t</span><span class="p">.</span><span class="n">res</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>All the same optimizations still apply – right?
This requires a compiler that can “see through” <code class="language-plaintext highlighter-rouge">memcpy</code> or union field accesses, but that does not seem too much to ask.
But now we have the same contradiction as before!
Either the original program already has Undefined Behavior, or one of the optimizations is incorrect.</p>
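<p>For reference, this is roughly what the same kind of helper looks like in Rust, using the three transmutation mechanisms mentioned earlier. The function names are made up for this sketch, and the code is only meant to show the mechanisms, not to recommend them.</p>

```rust
use std::mem;

/// Pointer-integer transmutation via `mem::transmute`.
fn transmute_fn(ptr: *const i32) -> usize {
    // SAFETY: both types have the same size (checked at compile time),
    // and every initialized bit pattern is a valid `usize`.
    unsafe { mem::transmute::<*const i32, usize>(ptr) }
}

/// Pointer-integer transmutation via a `union`, mirroring the C helper.
fn transmute_union(ptr: *const i32) -> usize {
    #[repr(C)]
    union Transmute {
        ptr: *const i32,
        res: usize,
    }
    let t = Transmute { ptr };
    // SAFETY: the field `ptr` was initialized and overlaps `res` entirely.
    unsafe { t.res }
}

/// Pointer-integer transmutation via a raw pointer cast:
/// an ordinary integer load from memory that happens to hold a pointer.
fn transmute_load(ptr: *const i32) -> usize {
    let ptr_to_ptr: *const *const i32 = &ptr;
    // SAFETY: `ptr_to_ptr` points to a live, initialized, pointer-sized
    // value, and `usize` has the same size and alignment as a thin pointer.
    unsafe { *ptr_to_ptr.cast::<usize>() }
}
```

<p>Note how unremarkable the last variant looks: it is just a load of a <code class="language-plaintext highlighter-rouge">usize</code> through a raw pointer.</p>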
<p>Previously, we resolved this conundrum by saying that removing the “dead cast” <code class="language-plaintext highlighter-rouge">(uintptr_t)x</code> whose result is unused was incorrect, because that cast had the side-effect of “exposing” the permission of <code class="language-plaintext highlighter-rouge">x</code> to be picked up by future integer-pointer casts.
We could apply the same solution again, but this time, we would have to say that a <code class="language-plaintext highlighter-rouge">union</code> access (at integer type) or a <code class="language-plaintext highlighter-rouge">memcpy</code> (to an integer) can have an “expose” side-effect and hence cannot be entirely removed even if its result is unused.
And that sounds quite bad!
<code class="language-plaintext highlighter-rouge">(uintptr_t)x</code> only happens in code that does tricky things with pointers, so urging the compiler to be careful and optimize a bit less seems like a good idea (and at least in Rust, <code class="language-plaintext highlighter-rouge">x.addr()</code> even provides a way to opt-out of this side-effect).
However, <code class="language-plaintext highlighter-rouge">union</code> and <code class="language-plaintext highlighter-rouge">memcpy</code> are all over the place.
Do we now have to treat <em>all</em> of them as having side-effects?
In Rust, due to the lack of a strict aliasing restriction (or in C with <code class="language-plaintext highlighter-rouge">-fno-strict-aliasing</code>), things get even worse, since literally <em>any</em> load of an integer from a raw pointer might be doing a pointer-integer transmutation and thus have the “expose” side-effect!</p>
<p>To me, and speaking from a Rust perspective, that sounds like a bad idea.
Sure, we want to make it as easy as possible to write low-level code in Rust, and that code sometimes has to do unspeakable things with pointers.
But we <em>don’t</em> want the <em>entire ecosystem</em> to carry the cost of that decision by making it harder to remove every raw pointer load everywhere!
So what are the alternatives?</p>
<p>Well, I would argue that the alternative is to treat the original program (after translation to Rust) as having Undefined Behavior.
There are, to my knowledge, generally two reasons why people might want to transmute a pointer to an integer:</p>
<ul>
<li>Chaining many <code class="language-plaintext highlighter-rouge">as</code> casts is annoying, so calling <code class="language-plaintext highlighter-rouge">mem::transmute</code> might be shorter.</li>
<li>The code doesn’t actually care about the <em>integer</em> per se, it just needs <em>some way</em> to hold arbitrary data in a container of a given type.</li>
</ul>
<p>The first kind of code should just use <code class="language-plaintext highlighter-rouge">as</code> casts, and we should do what we can (via lints, for example) to identify such code and get it to use casts instead.<sup id="fnref:compat" role="doc-noteref"><a href="#fn:compat" class="footnote" rel="footnote">6</a></sup>
Maybe we can adjust the cast rules to remove the need for chaining, or add some <a href="https://doc.rust-lang.org/nightly/std/primitive.pointer.html#method.expose_addr">helper methods</a> that can be used instead.</p>
<p>The second kind of code should not use integers!
Putting arbitrary data into an integer type is already somewhat suspicious due to the trouble around padding (if we want to make use of those shiny new <code class="language-plaintext highlighter-rouge">noundef</code> annotations that LLVM offers, we have to disallow transmuting data with padding to integer types).
The right type to use for holding arbitrary data is <code class="language-plaintext highlighter-rouge">MaybeUninit</code>, so e.g. <code class="language-plaintext highlighter-rouge">[MaybeUninit<u8>; 1024]</code> for up to 1KiB of arbitrary data.
<code class="language-plaintext highlighter-rouge">MaybeUninit</code> can also hold pointers with their provenance without any trouble.</p>
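<p>As a small sketch of what that looks like in practice, the following code parks a pointer in a byte buffer of <code class="language-plaintext highlighter-rouge">MaybeUninit</code> elements and gets it back out. The names <code class="language-plaintext highlighter-rouge">PtrBytes</code>, <code class="language-plaintext highlighter-rouge">stash</code> and <code class="language-plaintext highlighter-rouge">unstash</code> are invented for this example.</p>

```rust
use std::mem::{self, size_of, MaybeUninit};

/// Untyped storage that is exactly large enough for one thin pointer.
type PtrBytes = [MaybeUninit<u8>; size_of::<*const i32>()];

/// Stores a pointer as "arbitrary data" without going through an integer.
fn stash(ptr: *const i32) -> PtrBytes {
    // SAFETY: the sizes match, and `MaybeUninit<u8>` accepts any byte,
    // including bytes that carry provenance.
    unsafe { mem::transmute::<*const i32, PtrBytes>(ptr) }
}

/// Recovers the pointer that was stored by `stash`.
fn unstash(bytes: PtrBytes) -> *const i32 {
    // SAFETY: `bytes` must come from `stash`, so it holds the complete,
    // unmodified representation of a `*const i32`.
    unsafe { mem::transmute::<PtrBytes, *const i32>(bytes) }
}
```

<p>Since no integer type is involved at any point, the pointer that comes out of <code class="language-plaintext highlighter-rouge">unstash</code> still has the provenance that went in and can be dereferenced like the original.</p>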
<p>Because of that, I think we should move towards discouraging, deprecating, or even entirely disallowing pointer-integer transmutation in Rust.
That means a cast is the only legal way to turn a pointer into an integer, and after the discussion above we have our casts covered.
A <a href="https://github.com/rust-lang/rust/pull/95547">first careful step</a> has recently been taken on this journey; the <code class="language-plaintext highlighter-rouge">mem::transmute</code> documentation now cautions against using this function to turn pointers into integers.</p>
<p><strong>Update (2022-09-14):</strong> After a lot more discussion, the current model pursued by the Unsafe Code Guidelines WG is to say that pointer-to-integer transmutation is permitted, but just strips provenance without exposing it.
That means the program with the casts replaced by transmutation is UB, because the <code class="language-plaintext highlighter-rouge">ptr</code> it ends up dereferencing has invalid provenance.
However, the transmutation itself is not UB.
Basically, pointer-to-integer transmutation is equivalent to <a href="https://doc.rust-lang.org/nightly/std/primitive.pointer.html#method.addr">the <code class="language-plaintext highlighter-rouge">addr</code> method</a>, with all its caveats – in particular, transmuting a pointer to an integer and back is like calling <code class="language-plaintext highlighter-rouge">addr</code> and then calling <a href="https://doc.rust-lang.org/nightly/std/ptr/fn.invalid.html"><code class="language-plaintext highlighter-rouge">ptr::invalid</code></a>.
That is a <em>lossy</em> round-trip: it loses provenance information, making the resulting pointer invalid to dereference.
It is lossy even if we use a regular integer-to-pointer cast (or <code class="language-plaintext highlighter-rouge">from_exposed_addr</code>) for the conversion back to a pointer, since the original provenance might never have been exposed.
Compared to declaring the transmutation itself UB, this model has some nice properties that help compiler optimizations (such as removing unnecessary store-load round-trips). <strong>/Update</strong></p>
<h2 id="a-new-hope-for-rust">A new hope for Rust</h2>
<p>All in all, while the situation may be very complicated, I am actually more hopeful than ever that we can have both – a precise memory model for Rust <em>and</em> all the optimizations we can hope for!
The three core pillars of this approach are:</p>
<ul>
<li>making pointer-integer casts “expose” the pointer’s provenance,</li>
<li>offering <code class="language-plaintext highlighter-rouge">ptr.addr()</code> to learn a pointer’s address <em>without</em> exposing its provenance,</li>
<li>and making pointer-integer transmutation round-trips lossy (such that the resulting pointer cannot be dereferenced).</li>
</ul>
<p>Together, they imply that we can optimize “nice” code (that follows Strict Provenance, and does not “expose” or use integer-pointer casts) perfectly, without any risk of breaking code that does use pointer-integer round-trips.
In the easiest possible approach, the compiler can simply treat pointer-integer and integer-pointer casts as calls to some opaque external function.
Even if the rest of the compiler literally entirely ignores the existence of pointer-integer round-trips, it will still support such code correctly!</p>
<p>However, it’s not just compilers and optimizers that benefit from this approach.
One of my biggest quests is giving a <a href="https://plv.mpi-sws.org/rustbelt/stacked-borrows/">precise model</a> of the Rust aliasing rules, and that task has just gotten infinitely easier.
I used to worry <em>a lot</em> about pointer-integer round-trips while developing Stacked Borrows.
This is the entire reason why all of this “untagged pointer” mess exists.</p>
<p>Under this brave new world, I can entirely ignore pointer-integer round-trips when designing memory models for Rust.
Once that design is done, support for pointer-integer round-trips can be added as follows:</p>
<ul>
<li>When a pointer is cast to an integer, its provenance (whatever information it is that the model attaches to pointers – in Stacked Borrows, this is called the pointer’s <em>tag</em>) is marked as “exposed”.</li>
<li>When an integer is cast to a pointer, we <em>guess</em> the provenance that the new pointer should have from among all the provenances that have been previously marked as “exposed”.
(And I mean <em>all</em> of them, not just the ones that have been exposed “at the same address” or anything like that. People will inevitably do imperfect round-trips where the integer is being offset before being cast back to a pointer, and we should support that. As far as I know, this doesn’t really cost us anything in terms of optimizations.)</li>
</ul>
<p>This “guess” does not need to be described by an algorithm.
Through the magic that is formally known as <a href="https://en.wikipedia.org/wiki/Angelic_non-determinism">angelic non-determinism</a>, we can just wave our hands and say “the guess will be maximally in the programmer’s favor”: if <em>any</em> possible choice of (previously exposed) provenance makes the program work, then that is the provenance the new pointer will get.
Only if <em>all</em> choices lead to Undefined Behavior do we consider the program to be ill-defined.
This may sound like cheating, but it is actually a legit technique in formal specifications.</p>
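<p>To get some intuition for the two rules, here is a deliberately tiny toy model. It is an assumption-laden sketch and not how Miri or any actual specification is written: provenance is reduced to "which allocation", aliasing is ignored entirely, and the guess is checked against a single access, whereas the real rule makes the choice at the cast and it has to work out for <em>all</em> later uses of the pointer. All names (<code class="language-plaintext highlighter-rouge">Machine</code>, <code class="language-plaintext highlighter-rouge">expose</code>, <code class="language-plaintext highlighter-rouge">guess</code>, ...) are made up.</p>

```rust
use std::collections::HashSet;

/// Identifies one allocation; in this toy model, this *is* the provenance.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct AllocId(usize);

/// Address range covered by one allocation.
struct Allocation {
    base: usize,
    size: usize,
}

/// Toy abstract machine that tracks which provenances have been exposed.
#[derive(Default)]
struct Machine {
    allocations: Vec<Allocation>,
    exposed: HashSet<AllocId>,
}

impl Machine {
    /// Registers an allocation and returns its provenance.
    fn allocate(&mut self, base: usize, size: usize) -> AllocId {
        self.allocations.push(Allocation { base, size });
        AllocId(self.allocations.len() - 1)
    }

    /// The side-effect of a pointer-integer cast.
    fn expose(&mut self, id: AllocId) {
        self.exposed.insert(id);
    }

    /// Does provenance `id` permit accessing `len` bytes at address `addr`?
    fn permits(&self, id: AllocId, addr: usize, len: usize) -> bool {
        let alloc = &self.allocations[id.0];
        addr >= alloc.base && len <= alloc.size && addr - alloc.base <= alloc.size - len
    }

    /// The guess of an integer-pointer cast, for one intended access:
    /// if any exposed provenance makes the access work, we get that one.
    /// `None` means every choice fails, i.e. Undefined Behavior.
    fn guess(&self, addr: usize, len: usize) -> Option<AllocId> {
        self.exposed
            .iter()
            .copied()
            .find(|&id| self.permits(id, addr, len))
    }
}
```

<p>With two adjacent allocations at addresses 100 and 104, <code class="language-plaintext highlighter-rouge">guess(100, 4)</code> only succeeds after the first allocation has been exposed, no matter how the integer 100 was computed. Note that <code class="language-plaintext highlighter-rouge">guess</code> searches the entire exposed set, not just provenances exposed at that particular address.</p>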
<p>Also note how it’s really <em>just</em> the integer-pointer casts that are making things so complicated here.
If it weren’t for them, we would not even need all that “exposure” machinery.
Pointer-integer casts on their own are perfectly fine!
That’s why <a href="https://doc.rust-lang.org/nightly/std/primitive.pointer.html#method.addr"><code class="language-plaintext highlighter-rouge">addr</code></a>+<a href="https://doc.rust-lang.org/nightly/std/primitive.pointer.html#method.with_addr"><code class="language-plaintext highlighter-rouge">with_addr</code></a> is such a nice API from a memory model perspective.<sup id="fnref:fake_alloc" role="doc-noteref"><a href="#fn:fake_alloc" class="footnote" rel="footnote">7</a></sup></p>
<p>This approach <em>does</em> have the disadvantage that it becomes near impossible to write a tool like Miri that precisely matches the specification, since Miri cannot possibly implement this “guessing” accurately.
However, Miri can still properly check code that uses Strict Provenance operations, so hopefully this is just yet another incentive (besides the more precise specification and better optimization potential) for programmers to move their code away from integer-pointer casts and towards Strict Provenance.
And who knows, maybe there <em>is</em> a clever way that Miri can actually get reasonably close to checking this model?
It doesn’t have to be perfect to be useful.</p>
<p>What I particularly like about this approach is that it makes pointer-integer round-trips a purely local concern.
With an approach like Stacked Borrows “untagged pointers”, <em>every</em> memory operation has to define how it handles such pointers.
Complexity increases globally, and even when reasoning about Strict Provenance code we have to keep in mind that some pointers in other parts of the program might be “untagged”.
In contrast, this “guessing maximally in your favor”-based approach is entirely local; code that does not syntactically contain exposing pointer-integer or integer-pointer casts can literally forget that such casts exist at all.
This is true both for programmers thinking about their <code class="language-plaintext highlighter-rouge">unsafe</code> code, and for compiler authors thinking about optimizations.
Compositionality at its finest!</p>
<h2 id="but-what-about-c">But what about C?</h2>
<p>I have talked a lot about my vision for “solving” pointer provenance in Rust.
What about other languages?
As you might have heard, C is moving towards making <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2676.pdf">PNVI-ae-udi</a> an official recommendation for how to interpret the C memory model.
With C having so much more legacy code to care about and many more stakeholders than Rust does, this is an impressive achievement!
How does it compare to all I said above?</p>
<p>First of all, the “ae” part of the name refers to “address-exposed” – that’s exactly the same mechanism as what I described above!
In fact, I have taken the liberty to use their terminology.
So, on this front, I see Rust and C as moving in the same direction, which is great.
(Now we just need to get LLVM to also move in that direction.)
I should mention that PNVI-ae-udi does <em>not</em> account for the <code class="language-plaintext highlighter-rouge">restrict</code> modifier of C, so in a sense it is solving an easier problem than the Rust memory model which has no choice but to contend with interesting questions around aliasing restrictions.
However, if/when a more precise model of C with <code class="language-plaintext highlighter-rouge">restrict</code> emerges, I don’t think they will be moving away from the “address-exposed” model – to the contrary, as I just argued this model means we can specify <code class="language-plaintext highlighter-rouge">restrict</code> without giving a thought to pointer-integer round-trips.</p>
<p>The “udi” part of the name means “user disambiguation”, and is basically the mechanism by which an integer-pointer cast in C “guesses” the provenance it has to pick up.
The details of this are complicated, but the end-to-end effect is basically exactly the same as in the “best possible guess” model I have described above!
Here, too, my vision for Rust aligns very well with the direction C is taking.
(The set of valid guesses in C is just a lot more restricted since they do not have <code class="language-plaintext highlighter-rouge">wrapping_offset</code>, and the model does not cover <code class="language-plaintext highlighter-rouge">restrict</code>.
That means they can actually feasibly give an algorithm for how to do the guessing.
They don’t have to invoke scary terms like “angelic non-determinism”, but the end result is the same – and to me, the fact that it is equivalent to angelic non-determinism is what justifies this as a reasonable semantics.
Presenting this as a concrete algorithm to pick a suitable provenance is then just a stylistic choice.)
Kudos go to Michael Sammler for opening my eyes to this interpretation of “user disambiguation”, and arguing that angelic non-determinism might not be such a crazy idea after all.</p>
<p>What is left is the question of how to handle pointer-integer transmutation, and this is where the roads are forking.
PNVI-ae-udi explicitly says loading from a union field at integer type exposes the provenance of the pointer being loaded, if any.
So, the example with <code class="language-plaintext highlighter-rouge">transmute_union</code> would be allowed, meaning the optimization of removing the “dead” load from the <code class="language-plaintext highlighter-rouge">union</code> would <em>not</em> (in general) be allowed.
Same for <code class="language-plaintext highlighter-rouge">transmute_memcpy</code>, where the proposal says that when we access the contents of <code class="language-plaintext highlighter-rouge">res</code> at type <code class="language-plaintext highlighter-rouge">uintptr_t</code>, that will again implicitly expose the provenance of the pointer.</p>
<p>I think there are several reasons why this choice makes sense for C, that do not apply to Rust:</p>
<ul>
<li>There is a <em>lot</em> of legacy code. A <em>LOT</em>.</li>
<li>There is no alternative like <code class="language-plaintext highlighter-rouge">MaybeUninit</code> that could be used to hold data without losing provenance.</li>
<li>Strict aliasing means that not <em>all</em> loads at integer type have to worry about provenance; only loads at character type are affected.</li>
</ul>
<p>On the other hand, I am afraid that this choice might come with a significant cost in terms of lost optimizations.
As the example above shows, the compiler has to be very careful when removing any operation that can expose a provenance, since there might be integer-pointer casts later that rely on this.
(Of course, until this is actually implemented in GCC or LLVM, it will be hard to know the actual cost.)
Because of all that, I think it is reasonable for Rust to make a different choice here.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This was a long post, but I hope you found it worth reading. :)
To summarize, my concrete calls for action in Rust are:</p>
<ul>
<li>Code that uses pointer-integer transmutation round-trips should migrate to regular casts or <code class="language-plaintext highlighter-rouge">MaybeUninit</code> transmutation ASAP.
I think we should declare pointer-integer transmutation as “losing” provenance, so code that assumes a lossless transmutation round-trip has Undefined Behavior.</li>
<li>Code that uses pointer-integer or integer-pointer <em>casts</em> might consider migrating to the Strict Provenance APIs.
You can do this even on stable with <a href="https://crates.io/crates/sptr">this polyfill crate</a>.
However, such code <em>is and remains</em> well-defined. It just might not be optimized as well as one could hope, it might not compile on CHERI, and Miri will probably miss some bugs.
If there are important use-cases not covered by Strict Provenance, we’d like to hear about them!</li>
</ul>
<p>This is a large undertaking and will require a lot of work!
However, at the end of this road is a language with a coherent, well-defined memory model <em>and</em> support for doing unspeakable things to pointers <em>without</em> incurring a (reasoning or optimization) cost on code that is perfectly nice to its pointers.
Let us work towards this future together. :)</p>
<h2 id="appendix">Appendix</h2>
<h4 id="integer-pointer-casts-are-not-pure-either">Integer-pointer casts are not pure, either</h4>
<p>I promised an example of how integer-pointer casts are “impure”, in the sense that two casts with the same input integer can produce different pointers:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">static</span> <span class="kt">int</span> <span class="nf">uwu</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="kr">restrict</span> <span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="o">*</span><span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="o">*</span><span class="n">y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uintptr_t</span> <span class="n">xaddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">x</span><span class="p">;</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">y2</span> <span class="o">=</span> <span class="n">y</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="kt">uintptr_t</span> <span class="n">y2addr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="n">y2</span><span class="p">;</span>
<span class="n">assert</span><span class="p">(</span><span class="n">xaddr</span> <span class="o">==</span> <span class="n">y2addr</span><span class="p">);</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">xcopy</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="o">*</span><span class="p">)</span><span class="n">xaddr</span><span class="p">;</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">y2copy</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="o">*</span><span class="p">)</span><span class="n">y2addr</span><span class="p">;</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">ycopy</span> <span class="o">=</span> <span class="n">y2copy</span><span class="o">+</span><span class="mi">1</span><span class="p">;</span>
<span class="k">return</span> <span class="o">*</span><span class="n">xcopy</span> <span class="o">+</span> <span class="o">*</span><span class="n">ycopy</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">i</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">};</span>
<span class="n">uwu</span><span class="p">(</span><span class="o">&</span><span class="n">i</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="o">&</span><span class="n">i</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
<span class="p">}</span></code></pre></figure>
<p>If we ignore the pointer-integer round-trips, this uses <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">xcopy</code> to access <code class="language-plaintext highlighter-rouge">i[0]</code>, while using <code class="language-plaintext highlighter-rouge">y</code> and <code class="language-plaintext highlighter-rouge">ycopy</code> to access <code class="language-plaintext highlighter-rouge">i[1]</code>, so this should be uncontroversial.
<code class="language-plaintext highlighter-rouge">ycopy</code> is computed via <code class="language-plaintext highlighter-rouge">(y-1)+1</code>, but hopefully nobody disagrees with that.
Then we just add some pointer-integer round-trips.</p>
<p>But now, consider that <code class="language-plaintext highlighter-rouge">(int*)xaddr</code> and <code class="language-plaintext highlighter-rouge">(int*)y2addr</code> take the same integer as input!
If the compiler were to treat integer-pointer casts as a pure, deterministic operation, it could replace <code class="language-plaintext highlighter-rouge">(int*)y2addr</code> by <code class="language-plaintext highlighter-rouge">xcopy</code>.
However, that would mean <code class="language-plaintext highlighter-rouge">xcopy</code> and <code class="language-plaintext highlighter-rouge">ycopy</code> have the same provenance!
And there exists no provenance in this program that has access to both <code class="language-plaintext highlighter-rouge">i[0]</code> and <code class="language-plaintext highlighter-rouge">i[1]</code>.
So, either the cast has to synthesize a new provenance that has never been seen before, or doing common subexpression elimination on integer-pointer casts is wrong.</p>
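<p>To make the problematic replacement concrete, here is a hand-written sketch of what the function would look like if the second cast were replaced by the result of the first. This is not actual compiler output; the name <code class="language-plaintext highlighter-rouge">uwu_after_cse</code> is made up for illustration, and the only changed line is marked.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hand-applied sketch of common subexpression elimination on the two
   integer-pointer casts. NOT actual compiler output. */
static int uwu_after_cse(int *restrict x, int *restrict y) {
  *x = 0;
  *y = 0;
  uintptr_t xaddr = (uintptr_t)x;
  int *y2 = y-1;
  uintptr_t y2addr = (uintptr_t)y2;
  assert(xaddr == y2addr);
  int *xcopy = (int*)xaddr;
  int *y2copy = xcopy;        /* was: (int*)y2addr */
  int *ycopy = y2copy+1;      /* now derived from xcopy, not from y */
  return *xcopy + *ycopy;     /* ycopy reads i[1], which y has written */
}
```

<p>In this version, <code class="language-plaintext highlighter-rouge">ycopy</code> is visibly computed from <code class="language-plaintext highlighter-rouge">xcopy</code>, so whatever provenance the first cast picked is now also used for the access to <code class="language-plaintext highlighter-rouge">i[1]</code>.</p>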
<p>My personal stance is that we should not let the cast synthesize a new provenance.
This would entirely lose the benefit I discussed above of making pointer-integer round-trips a <em>local</em> concern – if these round-trips produce new, never-before-seen kinds of provenance, then the entire rest of the memory model has to define how it deals with those provenances.
We already have no choice but to treat pointer-integer casts as an operation with side-effects; let’s just do the same with integer-pointer casts and remain sure that no matter what the aliasing rules are, they will work fine even in the presence of pointer-integer round-trips.</p>
<p>That said, under this model integer-pointer casts still have no side-effect, in the sense that just removing them (if their result is unused) is fine.
Hence, it <em>could</em> make sense to implicitly perform integer-pointer casts in some situations, like when an integer value (without provenance) is used in a pointer operation (due to an integer-to-pointer transmutation).
This breaks some optimizations like load fusion (turning two loads into one assumes the same provenance was picked both times), but most optimizations (in particular dead code elimination) are unaffected.</p>
<h4 id="what-about-llvm">What about LLVM?</h4>
<p>I discussed above how my vision for Rust relates to the direction C is moving towards.
What does that mean for the design space of LLVM?
Which changes would have to be made to fix (potential) miscompilations in LLVM and to make it compatible with these ideas for C and/or Rust?
Here’s the list of open problems I am aware of:</p>
<ul>
<li>LLVM would have to stop <a href="https://github.com/llvm/llvm-project/issues/33896">removing <code class="language-plaintext highlighter-rouge">inttoptr(ptrtoint(_))</code></a> and stop doing <a href="https://github.com/llvm/llvm-project/issues/34577">replacement of <code class="language-plaintext highlighter-rouge">==</code>-equal pointers</a>.</li>
<li>As the first example shows, LLVM also needs to treat <code class="language-plaintext highlighter-rouge">ptrtoint</code> as a side-effecting operation that has to be kept around even when its result is unused. (Of course, as with everything I say here, there can be special cases where the old optimizations are still correct, but they need extra justification.)</li>
<li>I think LLVM should also treat <code class="language-plaintext highlighter-rouge">inttoptr</code> as a side-effecting (and, in particular, non-deterministic) operation, as per the last example. However, this could possibly be avoided with a <code class="language-plaintext highlighter-rouge">noalias</code> model that specifically accounts for new kinds of provenance being synthesized by casts. (I am being vague here since I don’t know what that provenance needs to look like.)</li>
</ul>
<p>So far, this all applies to LLVM as a Rust and C backend equally, so I don’t think there are any good alternatives.
On the plus side, adopting this strategy for <code class="language-plaintext highlighter-rouge">inttoptr</code> and <code class="language-plaintext highlighter-rouge">ptrtoint</code> means that the recent LLVM <a href="https://lists.llvm.org/pipermail/llvm-dev/2019-March/131127.html">“Full Restrict Support”</a> can also handle pointer-integer round-trips “for free”!</p>
<p>Adding <code class="language-plaintext highlighter-rouge">with_addr</code>/<code class="language-plaintext highlighter-rouge">copy_alloc_id</code> to LLVM is not strictly necessary, since it can be implemented with <code class="language-plaintext highlighter-rouge">getelementptr</code> (without <code class="language-plaintext highlighter-rouge">inbounds</code>).
However, optimizations don’t seem to always deal well with that pattern, so it might still be a good idea to add this as a primitive operation to LLVM.</p>
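<p>As a rough illustration of the “pointer arithmetic instead of a cast” pattern, here is a minimal C sketch. The function name and signature are assumptions made for this example, not an existing C API. Unlike <code class="language-plaintext highlighter-rouge">getelementptr</code> without <code class="language-plaintext highlighter-rouge">inbounds</code>, C pointer arithmetic is only defined while the result stays inside the same object (or one past its end), and C has no way to get the address without a pointer-integer cast, so this is only an approximation.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C analogue of with_addr / copy_alloc_id: produce a pointer
   that is derived from `ptr` but has the address `addr`.
   Assumes a flat address space where uintptr_t arithmetic matches byte
   offsets, and that `addr` lies within the object `ptr` points into. */
static inline void *with_addr(void *ptr, uintptr_t addr) {
  /* Distance from ptr's own address to the target address; the unsigned
     subtraction wraps, and the conversion recovers a negative distance on
     the usual two's complement platforms. */
  ptrdiff_t offset = (ptrdiff_t)(addr - (uintptr_t)ptr);
  /* No integer-pointer cast here: the result is computed from ptr itself. */
  return (char *)ptr + offset;
}
```

<p>For example, given <code class="language-plaintext highlighter-rouge">int buf[4]</code>, calling <code class="language-plaintext highlighter-rouge">with_addr(&amp;buf[0], (uintptr_t)&amp;buf[3])</code> yields a pointer to <code class="language-plaintext highlighter-rouge">buf[3]</code> that was obtained purely by offsetting <code class="language-plaintext highlighter-rouge">&amp;buf[0]</code>.</p>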
<p>Where things become more subtle is around pointer-integer transmutation.
If LLVM wants to keep doing replacement of <code class="language-plaintext highlighter-rouge">==</code>-equal integers (which I strongly assume to be the case), <em>something</em> needs to give: my first example, with casts replaced by transmutation, shows a miscompilation.
If we focus on doing an <code class="language-plaintext highlighter-rouge">i64</code> load of a pointer value (e.g. as in the LLVM IR produced by <code class="language-plaintext highlighter-rouge">transmute_union</code>, or pointer-based transmutation in Rust), what are the options?
Here are the ones I have seen so far (but there might be more, of course):</p>
<ol>
<li>The load could be said to behave like <code class="language-plaintext highlighter-rouge">ptrtoint</code>. This means it strips provenance and as a side-effect, it also exposes the pointer.</li>
<li>The load could be said to just strip provenance <em>without</em> exposing the pointer.</li>
<li>The load could be simply UB or return <code class="language-plaintext highlighter-rouge">poison</code>.</li>
<li>The load could produce an integer with provenance, <em>and moreover</em> any computation on such an integer (including <code class="language-plaintext highlighter-rouge">icmp</code>) is UB (or returns <code class="language-plaintext highlighter-rouge">poison</code>).
This has some subtle consequences, but they might be mostly harmless. For example, <code class="language-plaintext highlighter-rouge">x</code> can no longer be replaced by <code class="language-plaintext highlighter-rouge">x+0</code>.
We cannot assume that it is safe to compare arbitrary <code class="language-plaintext highlighter-rouge">i64</code> and branch on the result, even if they are <code class="language-plaintext highlighter-rouge">noundef</code>. Or maybe <code class="language-plaintext highlighter-rouge">noundef</code> also excludes provenance?
This is certainly the least obvious alternative.</li>
</ol>
<p>Except for the first option, these all say that my example with transmutation instead of the pointer-integer casts is UB, which avoids the optimization problems that arise from accepting that example.
That is fine for my vision for Rust, but a problem for C with PNVI-ae-udi.
Only the first option is compatible with that, but that option also means entirely removing a load is non-trivial even if its result is unused!
I hope we can avoid that cost for Rust.</p>
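<p>For readers who want to see the difference between a cast and a transmutation at the source level, here is a small C sketch. The function names are made up for this example. On mainstream platforms all three functions return the same integer, but the C standard does not guarantee that; the point is only that the second and third never perform a cast, so in LLVM IR they would typically show up as an integer-typed load of pointer data rather than a <code class="language-plaintext highlighter-rouge">ptrtoint</code>.</p>

```c
#include <stdint.h>
#include <string.h>

/* Pointer-integer cast: the C counterpart of ptrtoint. */
static uintptr_t addr_by_cast(int *p) {
  return (uintptr_t)p;
}

/* Pointer-integer transmutation: copy the pointer's bytes into an integer. */
static uintptr_t addr_by_memcpy(int *p) {
  uintptr_t n;
  _Static_assert(sizeof n == sizeof p, "sketch assumes equal sizes");
  memcpy(&n, &p, sizeof n);
  return n;
}

/* The same transmutation, this time through a union. */
static uintptr_t addr_by_union(int *p) {
  union { int *ptr; uintptr_t addr; } u;
  u.ptr = p;
  return u.addr;  /* reads the pointer's bytes at integer type */
}
```

<p>Which of the options above applies decides whether the last two functions “expose” the pointer the way the first one does.</p>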
<p>Another interesting difference between these options is whether the resulting semantics are “monotone” with respect to provenance: is “increasing” the provenance of a value (i.e., letting it access more memory) a legal program transformation?
With the last two options, it is not, since adding provenance to a value that did not have it can introduce Undefined Behavior.
The first two options are “monotone” in this sense, which seems like a nice property.
(This is comparable to how the semantics are “monotone” with respect to <code class="language-plaintext highlighter-rouge">undef</code> and <code class="language-plaintext highlighter-rouge">poison</code>: replacing either of them by a fixed value is a legal program transformation. For <code class="language-plaintext highlighter-rouge">undef</code>/<code class="language-plaintext highlighter-rouge">poison</code> this is crucially important, for provenance it seems more like a sanity check of the semantics.)</p>
<p>In all of these cases except the last one, LLVM would probably need something like a <a href="https://gist.github.com/georgemitenkov/3def898b8845c2cc161bd216cbbdb81f">byte type</a> so that a load of arbitrary data (including a pointer with provenance) can be done without losing the provenance attached to the data.</p>
<p>A similar question arises for doing a pointer-typed load of a bare integer (integer-pointer transmutation):</p>
<ol>
<li>The load could have the effects of an <code class="language-plaintext highlighter-rouge">inttoptr</code>. This is less clearly bad than a <code class="language-plaintext highlighter-rouge">ptrtoint</code>, but is still tricky since (at least without extra work) <code class="language-plaintext highlighter-rouge">inttoptr</code> is non-deterministic and depends on the global set of exposed provenances (so, it cannot be easily reordered up across potentially exposing operations).
I also have <a href="https://github.com/rust-lang/unsafe-code-guidelines/issues/286#issuecomment-860189806">another example</a> showing that if <em>both</em> pointer-integer transmutation and integer-pointer transmutation work like the corresponding casts (i.e., if the first of my options is picked for both loads of pointers at integer type, and integers at pointer type), then more optimizations fail:
removing a store that follows a load and just writes back the same value that has just been loaded is no longer correct.
Yet, I think this is what PNVI-ae-udi mandates. Again I hope Rust can opt out of this.</li>
<li>The load could create a pointer with “invalid” provenance.
That means transmutation of a pointer to an integer and back produces a pointer that cannot be used to access memory, but avoids all the analysis difficulties that come with an <code class="language-plaintext highlighter-rouge">inttoptr</code>.
This is what I think would be best for Rust.</li>
<li>The load could produce <code class="language-plaintext highlighter-rouge">poison</code>, but I see no good reason for doing that.</li>
</ol>
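<p>The reverse direction can be sketched in C in the same way. Again, the function names are assumptions for illustration. Both functions return a pointer with the requested address on mainstream platforms; what differs between the options above is whether the pointer produced by the second function may be used to access memory.</p>

```c
#include <stdint.h>
#include <string.h>

/* Integer-pointer cast: the C counterpart of inttoptr. */
static int *ptr_by_cast(uintptr_t addr) {
  return (int *)addr;
}

/* Integer-pointer transmutation: reinterpret the integer's bytes as a pointer. */
static int *ptr_by_memcpy(uintptr_t addr) {
  int *p;
  _Static_assert(sizeof p == sizeof addr, "sketch assumes equal sizes");
  memcpy(&p, &addr, sizeof p);
  return p;  /* under the "invalid provenance" option, this must not be dereferenced */
}
```

<p>A caller that only compares or prints the result of <code class="language-plaintext highlighter-rouge">ptr_by_memcpy</code> is unaffected by the choice; only dereferencing it is at stake.</p>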
<p>Since LLVM generally errs on the side of delaying UB as long as possible if that is not in conflict with optimizations, the second option for both questions feels most “on-brand” to me personally – but in the end, these are some hard choices that the LLVM community will have to make.
I can help evaluate these trade-offs by giving structure to the design space and pointing out the inevitable consequences of certain decisions, but I can only lend a hand here – while I think and care a lot about LLVM semantics, I haven’t done any direct work on LLVM myself.
I am also not enough of an expert on which optimizations are important and on the performance impact of the various options here, so I hope we can get people with that kind of background involved in the discussion as well.
For the sake of the entire ecosystem I mostly hope that LLVM will make <em>some</em> choice so that we can, eventually, leave this limbo state we are currently in.</p>
<h4 id="footnotes">Footnotes</h4>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:restrict" role="doc-endnote">
<p>The exact semantics of <code class="language-plaintext highlighter-rouge">restrict</code> are subtle and I am not aware of a formal definition. (Sadly, the one in the C standard does not really work, as you can see when you try to apply it to my example.) My understanding is as follows: <code class="language-plaintext highlighter-rouge">restrict</code> promises that this pointer, and all pointers derived from it, will not be used to perform memory accesses that <em>conflict</em> with any access done by pointers outside of that set. A “conflict” arises when two memory accesses overlap and at least one of them is a write. This promise is scoped to the duration of the function call when <code class="language-plaintext highlighter-rouge">restrict</code> appears in an argument type; I have no good idea for what the scope of the promise is in other situations. <a href="#fnref:restrict" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:consume" role="doc-endnote">
<p>This is, in fact, a common problem – it is what makes the <code class="language-plaintext highlighter-rouge">consume</code> memory order for atomic accesses basically impossible to specify in a programming language! While instruction sets often have very explicit rules about which instructions are assumed to “depend” on which previous instructions, that notion is hard to rationalize in a language where the compiler can replace <code class="language-plaintext highlighter-rouge">a + (b-a)</code> by <code class="language-plaintext highlighter-rouge">b</code> – and thus <em>remove</em> dependencies from the program. <a href="#fnref:consume" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:dyn" role="doc-endnote">
<p>As mentioned in a previous footnote, this is not actually how <code class="language-plaintext highlighter-rouge">restrict</code> works. The exact set of locations these pointers can access is determined <em>dynamically</em>, and the only constraint is that they cannot be used to access <em>the same location</em> (except if both are just doing a load). However, I carefully picked this example so that these subtleties should not change anything. <a href="#fnref:dyn" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:with_addr" role="doc-endnote">
<p><code class="language-plaintext highlighter-rouge">with_addr</code> has been unstably added to the Rust standard library very recently. Such an operation has been floating around in various discussions in the Rust community for quite a while, and it has even made it into <a href="https://iris-project.org/pdfs/2022-popl-vip.pdf">an academic paper</a> under the name of <code class="language-plaintext highlighter-rouge">copy_alloc_id</code>. Who knows, maybe one day it will find its way into the C standard as well. :) <a href="#fnref:with_addr" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:experiment" role="doc-endnote">
<p>My lawyers advised me to say that all of this is provisional and the specification for <code class="language-plaintext highlighter-rouge">addr</code> and all other Strict Provenance operations might change until their eventual stabilization. <a href="#fnref:experiment" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:compat" role="doc-endnote">
<p>We could even, if we are really desperate, decide to special-case <code class="language-plaintext highlighter-rouge">mem::transmute::<*const T, usize></code> (and likewise for <code class="language-plaintext highlighter-rouge">*mut T</code>) and declare that it <em>does</em> have the “expose” side-effect if the current crate is using some old edition. Sometimes, you have to do ugly things to move forwards. This would not apply to <code class="language-plaintext highlighter-rouge">union</code>- or raw-pointer-based transmutation. <a href="#fnref:compat" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:fake_alloc" role="doc-endnote">
<p>Even more specifically, it’s integer-pointer casts as part of a pointer-integer round-trip that are a problem. If you are just casting an integer constant to a pointer because on your platform that’s where some fixed memory region lies, and if that memory is entirely outside of the global, stack, and heap allocations that the Rust language itself is aware of, we can still be friends. <a href="#fnref:fake_alloc" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Do we really need Undefined Behavior?2021-11-24T00:00:00+01:00https://www.ralfj.de/blog/2021/11/24/ub-necessary.html
<p>I recently published a <a href="/blog/2021/11/18/ub-good-idea.html">blog post on why Undefined Behavior is actually not a bad idea</a>.
Coincidentally, this is just a few weeks after the publication of <a href="https://dl.acm.org/doi/pdf/10.1145/3477113.3487274">this paper by Victor Yodaiken</a> which basically argues that Undefined Behavior (UB for short) made C unusable for one of its core audiences, OS developers.
Here I refer to the typical modern interpretation of UB: assumptions the compiler may trust, without bounds on what happens if they are violated.
The paper makes many good points, but I think the author is throwing out the baby with the bathwater by concluding that we should entirely get rid of this kind of Undefined Behavior.
The point of this blog post is to argue that we do need UB by showing that even some of the most basic optimizations that all compilers perform require this far-reaching notion of Undefined Behavior.</p>
<!-- MORE -->
<p>To avoid ambiguity, I will refer to the above notion of UB as “unrestricted UB”.
The alternative interpretation of UB promoted by Yodaiken is what one might call “platform-specific UB”.
This requires that even programs with Undefined Behavior should behave in a consistent way: for example, the result of an out-of-bounds write may be ‘unpredictable’ in that it may either not actually happen or mutate some data somewhere.
However, if a write occurs, the program must still behave in a way that is consistent with performing a write to the given address in the target platform.
(At least, that is my understanding. I hope I am not misrepresenting their position here. The paper does not go into a lot of detail on how the situation could be improved, but it mentions proposals “where compilers map source operations to well-defined instruction sequences, in either a virtual or real machine, from which compiler optimisations may not observably stray”.)<sup id="fnref:N2769" role="doc-noteref"><a href="#fn:N2769" class="footnote" rel="footnote">1</a></sup></p>
<h2 id="examples-of-unrestricted-ub">Examples of unrestricted UB</h2>
<p>So what is the problem with platform-specific UB?
First of all, it does not reflect what the major compilers actually do in practice.
I have seen claims in the past that GCC and LLVM are the only compilers making use of unrestricted UB; this is simply not true.
Here is an <a href="https://godbolt.org/z/j18oW6YaE">example of ICC performing such an optimization</a> (based on example code by Yodaiken):</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include</span> <span class="cpf"><stdlib.h></span><span class="cp">
#include</span> <span class="cpf"><stdio.h></span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span> <span class="p">()</span> <span class="p">{</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">i</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">));</span>
<span class="o">*</span><span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">j</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">));</span>
<span class="o">*</span><span class="n">j</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">k</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">));</span>
<span class="o">*</span><span class="n">k</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">int</span> <span class="o">*</span><span class="n">x</span> <span class="o">=</span> <span class="n">j</span><span class="o">+</span><span class="p">(</span><span class="mi">32</span><span class="o">/</span><span class="mi">4</span><span class="p">);</span>
<span class="o">*</span><span class="n">x</span> <span class="o">=</span> <span class="mi">40</span><span class="p">;</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"*i=%d (%p) *j=%d (%p) *k=%d (%p) *x=%d (%p)"</span><span class="p">,</span> <span class="o">*</span><span class="n">i</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="o">*</span><span class="n">j</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="o">*</span><span class="n">k</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="o">*</span><span class="n">x</span><span class="p">,</span> <span class="n">x</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>This program prints the values and addresses of a few pointers.
The concrete addresses are different on each execution, but the pattern is always the same:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>*i=1 (0x1aef2a0) *j=1 (0x1aef2c0) *k=1 (0x1aef2e0) *x=40 (0x1aef2e0)
</code></pre></div></div>
<p>Notice how <code class="language-plaintext highlighter-rouge">k</code> and <code class="language-plaintext highlighter-rouge">x</code> point to the same address (<code class="language-plaintext highlighter-rouge">0x1aef2e0</code> in this particular execution), but seem to contain different values.
This is impossible under “platform-specific UB”: no sequence of target platform operations can lead to a situation where the same address contains two different values.<sup id="fnref:N2769-2" role="doc-noteref"><a href="#fn:N2769-2" class="footnote" rel="footnote">2</a></sup>
This example demonstrates that even ICC with <code class="language-plaintext highlighter-rouge">-O1</code> already requires unrestricted UB.
(For completeness’ sake, <a href="https://godbolt.org/z/c8qaWhnEG">here is a similar example for GCC</a>; at the time of writing, <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">x</code> have the same address but different values.
And <a href="https://godbolt.org/z/r8TM7Ga8q">here is an example for clang/LLVM</a>, this time it’s again <code class="language-plaintext highlighter-rouge">k</code> and <code class="language-plaintext highlighter-rouge">x</code> that behave inconsistently.
godbolt supports MSVC but does not seem to be willing to execute the generated programs; still, I have no doubt that similar examples can be found for this compiler.)</p>
<p>What about niche compilers specifically built for reliable software?
In their paper, Yodaiken claims that the verified C compiler CompCert “does not do any undefined behavior based optimization”
(with a footnote saying “Except for assuming objects do not overlap in memory”; I am not quite sure what exactly is meant by this).
This is incorrect.
First of all, since CompCert has a proof of correctness, we can have a look at its specification to see what exactly it promises to its users—and that specification quite clearly follows the “unrestricted UB” approach, allowing the compiled program to produce arbitrary results if the source program has Undefined Behavior.
Secondly, while CompCert’s optimizer is very limited, it is still powerful enough that we can actually demonstrate inconsistent behavior for UB programs in practice:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include</span> <span class="cpf"><stdio.h></span><span class="cp">
</span>
<span class="kt">int</span> <span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">;</span>
<span class="kt">int</span> <span class="nf">f</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="o">*</span><span class="p">(</span><span class="o">&</span><span class="n">x</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">return</span> <span class="n">y</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">eq</span> <span class="o">=</span> <span class="p">(</span><span class="o">&</span><span class="n">x</span><span class="o">+</span><span class="mi">1</span> <span class="o">==</span> <span class="o">&</span><span class="n">y</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">eq</span><span class="p">)</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%d "</span><span class="p">,</span> <span class="n">f</span><span class="p">());</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">y</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>(Putting the result of the comparison into a local variable <code class="language-plaintext highlighter-rouge">eq</code> prevents CompCert from optimizing away the entire conditional.)
This program, after being compiled with CompCert, prints “0 1”.
Again, this is printing “the same thing” twice, in this case the value stored at <code class="language-plaintext highlighter-rouge">y</code>, and produces two different results.
CompCert exploited UB in a way that leads to a situation which should be “impossible” on the underlying machine.</p>
<h2 id="platform-specific-ub-is-not-an-option">Platform-specific UB is not an option</h2>
<p>Both of these examples highlight a fundamental problem with “platform-specific UB”: <em>any</em> out-of-bounds write could potentially modify any other variable (at least any variable that has an address in memory).
This can make even the most basic parts of high-quality code generation, such as register allocation, tricky or impossible: a variable that has its address taken has to be re-loaded from that same address any time an out-of-bounds write might have happened, since that write might just have hit the right address to change this variable’s value.
This applies even if the address has not yet been leaked to the outside world, as the first example shows.
This is probably why there is hardly any compiler that follows the platform-specific interpretation of UB.
(I say “hardly any” without knowing a counterexample, but I would not be surprised if some compilers for high-assurance embedded code are so simple that platform-specific UB is sufficient for them. But that is hardly representative of how C is used, and as we have seen with CompCert, even some high-assurance compilers do rely on unrestricted UB.)</p>
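<p>The register allocation problem can be illustrated with a small sketch. The function below is a made-up example, not taken from the paper. Under unrestricted UB, a compiler may assume that the writes to <code class="language-plaintext highlighter-rouge">buf</code> never touch <code class="language-plaintext highlighter-rouge">count</code> and keep the counter in a register for the whole loop. Under platform-specific UB, a write past the end of <code class="language-plaintext highlighter-rouge">buf</code> could land on the stack slot of <code class="language-plaintext highlighter-rouge">count</code>, so the counter would have to live in memory and be re-loaded after every store.</p>

```c
#include <stddef.h>

/* Hypothetical helper for illustration: fills buf[0..len) and counts the writes. */
static int fill_and_count(int *buf, size_t len) {
  int count = 0;
  int *count_ptr = &count;  /* address taken, but never handed to anyone else */
  for (size_t i = 0; i < len; i++) {
    buf[i] = (int)i;        /* out of bounds if the caller passes a bad len */
    *count_ptr += 1;        /* may stay in a register only if buf[i] cannot alias count */
  }
  return count;
}
```

<p>For a correct caller the result is simply <code class="language-plaintext highlighter-rouge">len</code>; the two interpretations of UB only differ in how much freedom the compiler has in getting there.</p>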
<p>I honestly think <em>trying</em> to write a highly optimizing compiler based on a different interpretation of UB would be a worthwhile experiment.
We sorely lack data on how big the performance gain of exploiting UB actually is.
However, I strongly doubt that the result would even come close to the most widely used compilers today—and programmers that can accept such a big performance hit would probably not use C to begin with.
Certainly, any proposal for <em>requiring</em> compilers to curtail their exploitation of UB must come with evidence that this would even be possible while keeping C a viable language for performance-sensitive code.</p>
<p>To conclude, I fully agree with Yodaiken that C has a problem, and that reliably writing C has become incredibly hard since undefined behavior is so difficult to avoid.
It is certainly worth reducing the number of things that can cause UB in C, and developing practical tools to detect more advanced kinds of UB such as strict aliasing violations.
I also wonder whether strict aliasing can be made more compatible with low-level programming patterns—or whether C should provide alternative means of alias control to programmers, such as <code class="language-plaintext highlighter-rouge">restrict</code> (not that its specification doesn’t have its own set of problems, but an opt-in mechanism like <code class="language-plaintext highlighter-rouge">restrict</code> seems fundamentally more suited when the goal is to ensure compatibility with existing code).</p>
<p>However, I do not think this problem can be solved with a platform-specific interpretation of UB.
That would declare all but the most basic C compilers as non-compliant.
We need to find some middle ground that actually permits compilers to meaningfully optimize the code, while also enabling programmers to actually write standards-compliant programs.
I am not involved in the work that happens here on the C side, but for Rust, I think we can achieve this through a combination of being diligent about how much UB we really need, using language and API design to make it easier for the programmer to be aware of UB requirements imposed by the code they write, and providing <a href="https://github.com/rust-lang/miri/">tools</a> that help programmers determine if their code exhibits UB or not.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:N2769" role="doc-endnote">
<p>The paper also cites <a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2769.pdf">C committee proposal N2769</a>. However, N2769 explicitly says that <code class="language-plaintext highlighter-rouge">a + 1 < a</code> can still be optimized to <code class="language-plaintext highlighter-rouge">false</code>, while Yodaiken mentions this as an undesirable optimization. In fact, N2769 says it is okay and of “great value” to “assume the absence of UB”. I admit I do not understand the distinction N2769 makes between “assuming the absence of UB” and “making assumptions about the result of UB”, but it seems clear that Yodaiken goes even further than N2769 in restricting UB-based optimizations. <a href="#fnref:N2769" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:N2769-2" role="doc-endnote">
<p>I assume N2769 would also not be happy with this outcome of our example program. <a href="#fnref:N2769-2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Undefined Behavior deserves a better reputation2021-11-18T00:00:00+01:00https://www.ralfj.de/blog/2021/11/18/ub-good-idea.html
<p><em>This is a cross-post of an <a href="https://blog.sigplan.org/2021/11/18/undefined-behavior-deserves-a-better-reputation/">article that I wrote for the SIGPLAN blog</a>.</em></p>
<p>“Undefined Behavior” often has a bad reputation. People see it as an excuse compiler writers use to break code, or an excuse by a lazy language designer to not complete the specification and properly define all this behavior.
But what, really, is Undefined Behavior, and is it as bad as its reputation?
In this blog post, I will look at this topic from a PL perspective, and argue that Undefined Behavior (or UB for short) is a valuable tool in a language designer’s toolbox, and that it can be used responsibly to convey more of the programmer’s insight about their code to the compiler with the goal of enabling more optimizations.
I will also explain why I spent a significant amount of time adding <em>more</em> UB to Rust.</p>
<!-- MORE -->
<h2 id="a-simple-example">A simple example</h2>
<p>In the best PL tradition, let us consider an artificial example to demonstrate the benefit of UB.
Imagine we want to implement a function that returns the element in the middle of an array.
If we are using Rust, we would probably write something like this:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">mid</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="o">&</span><span class="p">[</span><span class="nb">i32</span><span class="p">])</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="nb">i32</span><span class="o">></span> <span class="p">{</span>
<span class="k">if</span> <span class="n">data</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">None</span><span class="p">;</span> <span class="p">}</span>
<span class="k">return</span> <span class="nf">Some</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">data</span><span class="nf">.len</span><span class="p">()</span><span class="o">/</span><span class="mi">2</span><span class="p">]);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The argument is of type <code class="language-plaintext highlighter-rouge">&[i32]</code>, which is called a “slice” and consists of a pointer to some array and information about how long the array is.
<code class="language-plaintext highlighter-rouge">mid</code> itself returns an integer wrapped in <code class="language-plaintext highlighter-rouge">Option</code> (corresponding to <code class="language-plaintext highlighter-rouge">Maybe</code> in Haskell) to properly signal the case where the array is empty.
In the non-empty case, it computes the index in the middle of <code class="language-plaintext highlighter-rouge">data</code>, and returns that element.</p>
<p>Now imagine this function is called in a tight loop in the benchmark for our next paper, so performance <em>really</em> matters.
Is there any performance improvement we can hope to achieve in this function?
It might seem like <code class="language-plaintext highlighter-rouge">mid</code> already does the absolute minimum amount of work required for the task, but there is some hidden cost in the array access <code class="language-plaintext highlighter-rouge">data[_]</code>:
the compiler has to insert a bounds-check here to ensure that we do not access data beyond the size of the array that <code class="language-plaintext highlighter-rouge">data</code> points to.
But as the programmer we know that bounds-check to be entirely unnecessary, since <code class="language-plaintext highlighter-rouge">data.len()/2</code> will always be smaller than <code class="language-plaintext highlighter-rouge">data.len()</code>!
Wouldn’t it be great if there was a way to tell the compiler about this, such that we can be sure no bounds check happens?</p>
<p>Here is one way to accomplish that in Rust:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">mid</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="o">&</span><span class="p">[</span><span class="nb">i32</span><span class="p">])</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="nb">i32</span><span class="o">></span> <span class="p">{</span>
<span class="k">if</span> <span class="n">data</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">None</span><span class="p">;</span> <span class="p">}</span>
<span class="k">match</span> <span class="n">data</span><span class="nf">.get</span><span class="p">(</span><span class="n">data</span><span class="nf">.len</span><span class="p">()</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">Some</span><span class="p">(</span><span class="o">&</span><span class="n">x</span><span class="p">)</span> <span class="k">=></span> <span class="k">return</span> <span class="nf">Some</span><span class="p">(</span><span class="n">x</span><span class="p">),</span>
<span class="nb">None</span> <span class="k">=></span> <span class="k">unsafe</span> <span class="p">{</span> <span class="nf">unreachable_unchecked</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>We are now using the <code class="language-plaintext highlighter-rouge">get</code> operation to access the array, which returns an <code class="language-plaintext highlighter-rouge">Option</code> that is <code class="language-plaintext highlighter-rouge">None</code> for out-of-bounds accesses.
And in case we get <code class="language-plaintext highlighter-rouge">None</code>, we call a special function <code class="language-plaintext highlighter-rouge">unreachable_unchecked</code> which makes a <em>binding promise to the compiler</em> that this piece of code is unreachable.
The keyword <code class="language-plaintext highlighter-rouge">unsafe</code> here indicates that what we are doing is not covered by the type safety guarantees of the language: the compiler will not actually check that the promise we made holds true, it will just trust us on that.
(The word “unchecked” is a Rust idiom: this is the “unchecked” version of <code class="language-plaintext highlighter-rouge">unreachable</code>. The checked version inserts a run-time check that safely aborts the program should this code ever be reached; to be more precise, it triggers a Rust panic.)</p>
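<p>For comparison, here is a minimal sketch of the checked counterpart, using the standard <code class="language-plaintext highlighter-rouge">unreachable!</code> macro. The name <code class="language-plaintext highlighter-rouge">mid_checked</code> is only chosen for illustration. If the assumption about the index were ever wrong, this version would panic rather than exhibit Undefined Behavior, at the price of keeping the run-time check.</p>

```rust
/// Checked variant of `mid` (hypothetical name): no promise is made to the
/// compiler, so a wrong assumption turns into a panic instead of UB.
fn mid_checked(data: &[i32]) -> Option<i32> {
    if data.is_empty() {
        return None;
    }
    match data.get(data.len() / 2) {
        Some(&x) => Some(x),
        // Safe code: reaching this arm triggers a Rust panic.
        None => unreachable!("len / 2 is in bounds for a non-empty slice"),
    }
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">mid_checked(&[1, 2, 3])</code> yields <code class="language-plaintext highlighter-rouge">Some(2)</code>, and an empty slice yields <code class="language-plaintext highlighter-rouge">None</code>.</p>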
<p>After some inlining, the relevant part of this code looks as follows:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">let</span> <span class="n">idx</span> <span class="o">=</span> <span class="n">data</span><span class="nf">.len</span><span class="p">()</span><span class="o">/</span><span class="mi">2</span><span class="p">;</span>
<span class="k">if</span> <span class="n">idx</span> <span class="o"><</span> <span class="n">data</span><span class="nf">.len</span><span class="p">()</span> <span class="p">{</span> <span class="c1">// Automatically inserted bounds-check.</span>
<span class="o">...</span> <span class="c1">// Access the array at `idx`.</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nf">unreachable_unchecked</span><span class="p">()</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Since we told the compiler that the <code class="language-plaintext highlighter-rouge">else</code> branch is unreachable, it is easy to optimize away the conditional, so we end up with just a direct access to element <code class="language-plaintext highlighter-rouge">idx</code> in the array.
Problem solved!
(In fact, Rust provides <code class="language-plaintext highlighter-rouge">get_unchecked</code> as an alternative to <code class="language-plaintext highlighter-rouge">get</code> where the caller has to promise that the index is in-bounds, so a Rust programmer would just write <code class="language-plaintext highlighter-rouge">data.get_unchecked(data.len()/2)</code> to implement <code class="language-plaintext highlighter-rouge">mid</code> efficiently.)</p>
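<p>Spelled out in full, that shorter version might look roughly like the following sketch. The function name <code class="language-plaintext highlighter-rouge">mid_unchecked</code> is an illustrative choice; <code class="language-plaintext highlighter-rouge">get_unchecked</code> is the standard slice method, which must be called inside an <code class="language-plaintext highlighter-rouge">unsafe</code> block and returns a reference.</p>

```rust
/// Sketch of `mid` built on `get_unchecked` (hypothetical name).
fn mid_unchecked(data: &[i32]) -> Option<i32> {
    if data.is_empty() {
        return None;
    }
    let idx = data.len() / 2;
    // SAFETY: `data` is non-empty, so `idx = len / 2` is strictly less than
    // `len`. This is the promise we make to the compiler; it is not checked.
    Some(unsafe { *data.get_unchecked(idx) })
}
```

<p>The emptiness check has to stay: for an empty slice, index <code class="language-plaintext highlighter-rouge">0</code> would be out of bounds and the promise would be broken.</p>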
<p>I expect some readers will not be happy with the way I achieved the desired optimization in the initial example, and argue that the compiler should be smart enough to do this automatically.
I will get back to this point later; for now, just note that the latest stable version of Rust at the time of writing does <a href="https://rust.godbolt.org/z/G34Ezzb9c">not perform this optimization</a> (as indicated by the call to <code class="language-plaintext highlighter-rouge">panic_bounds_check</code>).</p>
<h2 id="where-is-the-undefined-behavior">Where is the Undefined Behavior?</h2>
<p>Hang on, you might say at this point, wasn’t this blog post supposed to be about Undefined Behavior?
That term did not even appear in the discussion of the example!
Indeed, I was a bit sneaky and used different terminology that I think better captures a constructive way to think about Undefined Behavior.
In the typical terminology, I would have said that calling the special function <code class="language-plaintext highlighter-rouge">unreachable_unchecked</code> causes immediate Undefined Behavior.
Following the definition in the latest C standard (also shared by C++), the standard “imposes no requirements” on programs that exhibit Undefined Behavior.
The compiler can hence basically replace the <code class="language-plaintext highlighter-rouge">else</code> branch by whatever code it wants, famously including “to make <a href="http://www.catb.org/jargon/html/N/nasal-demons.html">demons fly out of your nose</a>”, but also including just executing the <code class="language-plaintext highlighter-rouge">then</code> branch instead.</p>
<p>This line of reasoning leads to the same result, but it paints an unnecessarily antagonistic picture of compiler writers. It makes it sound like compilers use complicated analyses to detect Undefined Behavior. Once they find UB, they have an excuse to emit broken code and hide behind the standard should anyone complain.
This is not what actually happens.
As we have seen in our example, the compiler really has no idea if this code has Undefined Behavior or not – all it does is perform optimizations that are correct <em>under the extra assumption</em> that there is no Undefined Behavior.</p>
<h2 id="ub-is-a-double-edged-sword">UB is a double-edged sword</h2>
<p>Another reaction you might have is that <code class="language-plaintext highlighter-rouge">unreachable_unchecked</code> is not a “typical” example of UB.
Most people probably associate that term with C or C++, which do not even have <code class="language-plaintext highlighter-rouge">unreachable_unchecked</code> (though many compilers provide an intrinsic with the same effect, e.g., <code class="language-plaintext highlighter-rouge">__builtin_unreachable</code> in GCC).
So it may seem like I picked a strange example.
Shouldn’t I be talking about how, say, signed integer overflow is UB?</p>
<p>This is the right time to admit that I am <em>not</em> going to defend all UB in C/C++.
I think UB as a concept is a great idea, and <code class="language-plaintext highlighter-rouge">unreachable_unchecked</code> is the “most pure” form of UB that shows how it can be used by the programmer to convey extra information to the compiler – but I also think that C and C++ are massively overusing UB.
Of course, it is easy to say this with the benefit of hindsight; the first C compilers were extremely simple and today’s use of UB for optimizations only emerged over time.
It took a while for the implications of the modern interpretation of UB in the standard to become clear – and C and C++, being very successful languages, have massive existing codebases which makes it super hard to revise any prior decision.
This post is about defending and promoting UB as a concept, not UB in C/C++.</p>
<p>Speaking of signed integer overflow, I think this is actually a good example of how <em>not</em> to use UB.
An innocent-looking <code class="language-plaintext highlighter-rouge">+</code> turns into a promise of the programmer that this addition will never overflow, but the programmer probably will not carefully do a mental no-overflow proof for every single addition in their program.
Instead, <code class="language-plaintext highlighter-rouge">+</code> could perform overflow checks or well-defined wrap-around, and the language could provide an <code class="language-plaintext highlighter-rouge">unchecked_add</code> function where overflows are UB.
This lets the programmer opt in to providing extra no-overflow promises, to be used in situations where it is really beneficial for performance that the compiler can make this assumption (such as <a href="https://youtu.be/yG1OZ69H_-o?t=2357">this example</a>).
Basically, I am considering this a language (and library) design problem: UB is a sharp knife; when used well it gets the job done better, but it can also hurt a lot when used without enough care.</p>
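<p>To make this design concrete, here is a small sketch in Rust. The methods <code class="language-plaintext highlighter-rouge">checked_add</code> and <code class="language-plaintext highlighter-rouge">wrapping_add</code> are the standard well-defined alternatives; the function <code class="language-plaintext highlighter-rouge">add_promising_no_overflow</code> is a hypothetical helper, built from <code class="language-plaintext highlighter-rouge">unreachable_unchecked</code>, that shows what an opt-in “overflow is UB” addition could look like.</p>

```rust
use std::hint::unreachable_unchecked;

/// Hypothetical opt-in addition: the caller promises that `a + b` fits in `i32`.
///
/// # Safety
/// Calling this with operands whose sum overflows is Undefined Behavior.
unsafe fn add_promising_no_overflow(a: i32, b: i32) -> i32 {
    match a.checked_add(b) {
        Some(sum) => sum,
        // SAFETY: the caller promised that the addition does not overflow,
        // so this arm is never reached.
        None => unsafe { unreachable_unchecked() },
    }
}

/// The well-defined defaults: an explicit overflow check and wrap-around.
fn add_variants(a: i32, b: i32) -> (Option<i32>, i32) {
    (a.checked_add(b), a.wrapping_add(b))
}
```

<p>With this split, an overflowing <code class="language-plaintext highlighter-rouge">add_variants(i32::MAX, 1)</code> is perfectly well-defined, while the no-overflow promise only appears where the programmer explicitly writes <code class="language-plaintext highlighter-rouge">unsafe</code>.</p>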
<p>Language and library design is not everything that can be used to tame UB, however.
Good tooling can also make a big difference: if programmers can easily run their programs in “UB-checking mode”, they can write tests to at least ensure the absence of UB for certain inputs.
(Shameless plug: I am working on <a href="https://github.com/rust-lang/miri/">Miri</a>, a tool that provides exactly this for Rust.)
Library authors can run their test suites with such a tool, and the tool can also be used exploratively to learn about what exactly is and is not UB in the first place.
I think this is absolutely crucial, and language designers should design UB in a way that makes UB-checking tools more feasible.
For the examples of UB we have seen so far (<code class="language-plaintext highlighter-rouge">unreachable_unchecked</code>, <code class="language-plaintext highlighter-rouge">get_unchecked</code>, and <code class="language-plaintext highlighter-rouge">unchecked_add</code>), this is obviously trivial.</p>
<h2 id="how-far-can-we-push-ub">How far can we push UB?</h2>
<p>That said, not all UB is that simple to teach and test.
Even Rust, with its benefit of learning from several decades of experience with UB in C and C++, has UB that is a lot more subtle than this. The most glaring example of this is probably UB related to incorrect aliasing of mutable references.
(Other, less extreme examples would be UB due to using uninitialized memory, or UB due to data races.)</p>
<p>The Rust type system ensures that mutable references never alias any other reference that is currently being used in the program, i.e., they never point to the same memory as any other reference.
This is a juicy guarantee for compiler writers, because while reordering memory accesses is often beneficial, it can be very hard to figure out if the transformation is even allowed – if two accesses alias, then their original order must be preserved.</p>
<p>However, <code class="language-plaintext highlighter-rouge">unsafe</code> code in Rust could easily create aliasing mutable references.
So what can we do?
We make the programmer promise that they do not do this!
This is a lot like saying “the programmer promises that <code class="language-plaintext highlighter-rouge">unreachable_unchecked</code> is never called”, so we can put on our UB lens and say that it is Undefined Behavior to have aliasing mutable references.</p>
<p>The devil is of course in the details of defining what exactly this means.
<a href="https://plv.mpi-sws.org/rustbelt/stacked-borrows/">Stacked Borrows</a> (part of <a href="https://www.ralfj.de/research/thesis.html">my PhD thesis</a> and also described in a series of blog posts: <a href="https://www.ralfj.de/blog/2018/11/16/stacked-borrows-implementation.html">v1.0</a>, <a href="https://www.ralfj.de/blog/2019/04/30/stacked-borrows-2.html">v2.0</a>, <a href="https://www.ralfj.de/blog/2019/05/21/stacked-borrows-2.1.html">v2.1</a>) goes into all that detail by giving an operational semantics that exactly defines the promises programmers have to make.
And that semantics is non-trivial!
According to Stacked Borrows, the following code has UB:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">x</span> <span class="o">=</span> <span class="o">&</span><span class="k">mut</span> <span class="mi">42</span><span class="p">;</span> <span class="c1">// Safely create a reference.</span>
<span class="k">let</span> <span class="n">xptr</span> <span class="o">=</span> <span class="n">x</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">i32</span><span class="p">;</span> <span class="c1">// Turn that reference into a raw (unchecked) pointer.</span>
<span class="k">let</span> <span class="n">x1</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&</span><span class="k">mut</span> <span class="o">*</span><span class="n">xptr</span> <span class="p">};</span> <span class="c1">// Turn the pointer back into a reference...</span>
<span class="k">let</span> <span class="n">x2</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&</span><span class="k">mut</span> <span class="o">*</span><span class="n">xptr</span> <span class="p">};</span> <span class="c1">// ...twice, so uniqueness is violated.</span>
<span class="o">*</span><span class="n">x1</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// Undefined Behavior!</span>
</code></pre></div></div>
<p>The reason this code has UB is that creating <code class="language-plaintext highlighter-rouge">x2</code> makes a promise that this is the unique reference created from <code class="language-plaintext highlighter-rouge">xptr</code>, so the previously created <code class="language-plaintext highlighter-rouge">x1</code> is invalidated when <code class="language-plaintext highlighter-rouge">x2</code> gets created.
This means future uses of <code class="language-plaintext highlighter-rouge">x1</code> are Undefined Behavior.</p>
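<p>For contrast, here is a sketch of a variant that, as far as the rules described above go, should not violate uniqueness: the second reference is only created once the first one is no longer used. The function name is hypothetical, and the claim that this is accepted is an expectation based on the description of Stacked Borrows, not a guarantee.</p>

```rust
/// Two reborrows from the same raw pointer, used strictly one after the other.
fn sequential_reborrows() -> i32 {
    let mut value = 42;
    let xptr = &mut value as *mut i32; // Raw pointer derived from a unique reference.
    {
        let x1 = unsafe { &mut *xptr }; // First reborrow...
        *x1 = 0; // ...used while no other reference derived from `xptr` exists.
    }
    let x2 = unsafe { &mut *xptr }; // Created only after the last use of `x1`.
    *x2 += 1;
    value // `x1` and `x2` are never used again.
}
```

<p>The difference from the example above is purely one of ordering: no reference is used after a later one has been created from the same pointer.</p>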
<p>So this raises the question: can we really expect every author of <code class="language-plaintext highlighter-rouge">unsafe</code> Rust code to internalize Stacked Borrows to the extent that they can faithfully promise to the Rust compiler that their code will comply with this bespoke set of rules?
Is it a good idea to interpret <code class="language-plaintext highlighter-rouge">&mut expr</code> as a promise that all aliasing was carefully checked and this reference is definitely unique?
As with other UB, we can help programmers by providing tools. Miri contains an implementation of Stacked Borrows, which serves two purposes: it helps us evaluate whether actual Rust code is compatible (or can reasonably be made compatible) with Stacked Borrows, and it helps Rust programmers by giving them a way to at least test for aliasing violations and to interactively play with the semantics to gain a better understanding.
I think that puts us in a pretty good spot overall, but some people still argue that Stacked Borrows goes too far and Rust will end up in a situation similar to the one C and C++ find themselves in – where too few programmers actually know how to write UB-free code, and a significant amount of the code people rely on exhibits UB.</p>
<p>Stacked Borrows is not part of the Rust spec, and is not the final word for aliasing-related UB in Rust.
So there is still the chance that future revisions of this model can be made to better align with programmer intuition.
The above code might get accepted because <code class="language-plaintext highlighter-rouge">x2</code> is not actually being used to access memory.
Or maybe <code class="language-plaintext highlighter-rouge">&mut expr</code> should only make such promises when used outside an <code class="language-plaintext highlighter-rouge">unsafe</code> block – but then, should adding <code class="language-plaintext highlighter-rouge">unsafe</code> really change the semantics of the program?
As usual, language design is a game of trade-offs.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I have presented Undefined Behavior as a tool that enables the programmer to write code that the compiler cannot check for correctness, and argued that – used responsibly – it is a useful component in a language designer’s toolbox.</p>
<p>As I alluded to earlier, the “obvious” alternative would be to make the compiler smarter.
However, real programs are typically a lot more complicated than my simple example (which already outsmarts Rust’s LLVM backend), and the reasoning required to justify an optimization can become arbitrarily complicated.
Language designers should acknowledge that optimizers have their limitations and give programmers the tools they need to help the optimizer.
Indeed, I think the fact that Rust combines a clever type checker with the idea of using <code class="language-plaintext highlighter-rouge">unsafe</code> code for the cases where the type checker is not clever enough is crucial for its success: <code class="language-plaintext highlighter-rouge">unsafe</code> is not a bug; it is a feature without which Rust would not be able to make systems programming safer in practice.
It is also worth mentioning that many languages that we all know and love provide comparable “trusted” operations or annotations, e.g., <code class="language-plaintext highlighter-rouge">Obj.magic</code> in OCaml or the rewrite rules in GHC.
Rust only differs in how prevalent unsafe code is in the ecosystem (and in emphasizing the importance of <a href="https://blog.sigplan.org/2019/10/17/what-type-soundness-theorem-do-you-really-want-to-prove/">encapsulating such code within safe APIs</a>).</p>
<p>In closing, I would like to propose that “Undefined Behavior” might need a rebranding.
The term focuses on the negative case, when really all we ever care about as programmers or compiler authors is that programs do <em>not</em> have Undefined Behavior.
Can we get rid of this double negation?
Maybe we should talk about “ensuring Well-Defined Behavior” instead of “avoiding Undefined Behavior”.</p>
<p>To sum up: most of the time, ensuring Well-Defined Behavior is the responsibility of the type system, but as language designers we should not rule out the idea of sharing that responsibility with the programmer.</p>
<p><em>Thanks to Anish Athalye and Adrian Sampson for feedback on earlier drafts of this post.</em></p>
A podcast about GhostCell2021-06-10T00:00:00+02:00https://www.ralfj.de/blog/2021/06/10/ghostcell-podcast.html
<p>I recently got asked to appear as a guest on the podcast <a href="https://anchor.fm/building-with-rust">Building with Rust</a> to talk about our recent work on <a href="http://plv.mpi-sws.org/rustbelt/ghostcell/">GhostCell</a>.
I had never been a guest on a podcast before, so this was very exciting and of course I said yes. :)
That episode has been released now, so you can listen to an hour of me talking about GhostCell and about PL research more generally:</p>
<blockquote>
<p><a href="https://anchor.fm/building-with-rust/episodes/Building-with-Rust-Ralf-Jung-on-GhostCell-and-Working-as-a-PL-Researcher-e12auje">Ralf Jung on GhostCell and Working as a PL Researcher</a></p>
</blockquote>
<p>Have fun, and I am sorry for talking so fast. ;)</p>
Safe Systems Programming in Rust2021-03-23T00:00:00+01:00https://www.ralfj.de/blog/2021/03/23/safe-systems-programming-in-rust.html
<p>It has been a long time coming; now our Communications of the ACM article <a href="https://cacm.acm.org/magazines/2021/4/251364-safe-systems-programming-in-rust/fulltext">Safe Systems Programming in Rust</a> has finally been published.
A <a href="https://cacm.acm.org/magazines/2021/4/251364-safe-systems-programming-in-rust/pdf">pdf version</a> is also available.
We explain at a high level what makes Rust so innovative and interesting, and how we are studying Rust formally in the <a href="https://plv.mpi-sws.org/rustbelt/">RustBelt project</a>.
The ACM even produced a <a href="https://vimeo.com/514402648">short video</a> which includes Derek and me explaining the main points of the article.
Have fun. :)</p>