In my previous blog post on pointer provenance, I have shown that not thinking carefully about pointers can lead to a compiler that is internally inconsistent: programs that are intended to be well-behaved get miscompiled by a sequence of optimizations, each of which seems intuitively correct in isolation. We thus have to remove or at least restrict at least one of these optimizations. In this post I will continue that trend with another example, and then I will lay down my general thoughts on how this relates to the recent Strict Provenance proposal, what it could mean for Rust more generally, and compare with C’s PNVI-ae-udi. We will end on a very hopeful note about what this could all mean for Rust’s memory model. There’s a lot of information packed into this post, so better find a comfortable reading position. :)
Some time ago, I wrote a blog post about how there’s more to a pointer than meets the eye. One key point I was trying to make is that
just because two pointers point to the same address, does not mean they are equal in the sense that they can be used interchangeably.
This “extra information” that distinguishes different pointers to the same address is typically called provenance. This post is another attempt to convince you that provenance is “real”, by telling a cautionary tale of what can go wrong when provenance is not considered sufficiently carefully in an optimizing compiler. The post is self-contained; I am not assuming that you have read the first one. There is also a larger message here about how we could prevent such issues from coming up in the future by spending more effort on the specification of compiler IRs.
This post is about uninitialized memory, but also about the semantics of highly optimized “low-level” languages in general. I will try to convince you that reasoning by “what the hardware does” is inherently flawed when talking about languages such as Rust, C or C++. These are not low-level languages. I have made this point before in the context of pointers; this time it is going to be about uninitialized memory.
This summer, I am again working on Rust full-time, and again I will work (amongst other things) on a “memory model” for Rust/MIR. However, before I can talk about the ideas I have for this year, I have to finally take the time and dispel the myth that “pointers are simple: they are just integers”. Both parts of this statement are false, at least in languages with unsafe features like Rust or C: Pointers are neither simple nor (just) integers.
I also want to define a piece of the memory model that has to be fixed before we can even talk about some of the more complex parts: Just what is the data that is stored in memory? It is organized in bytes, the minimal addressable unit and the smallest piece that can be accessed (at least on most platforms), but what are the possible values of a byte? Again, it turns out “it’s just an 8-bit integer” does not actually work as the answer.
I hope that by the end of this post, you will agree with me on both of these statements. :)