Ralf's Ramblings: Programming

Aug 14, 2024 • Programming, Rust

What is a place expression?

One of the more subtle aspects of the Rust language is the fact that there are actually two kinds of expressions: value expressions and place expressions. Most of the time, programmers do not have to think much about that distinction, as Rust will helpfully insert automatic conversions when one kind of expression is encountered but the other was expected. However, when it comes to unsafe code, a proper understanding of this dichotomy of expressions can be required. Consider the following example:

// As a "packed" struct, this type has alignment 1.
#[repr(packed)]
struct MyStruct {
  field: i32
}

let x = MyStruct { field: 42 };
let ptr = &raw const x.field;
// This line is fine.
let ptr_copy = &raw const *ptr;
// But this line has UB!
// `ptr` is a pointer to `i32` and thus requires 4-byte alignment on
// memory accesses, but `x` is just 1-aligned.
let val = *ptr;

Here I am using the unstable but soon-to-be-stabilized “raw borrow” operator, &raw const. You may know it in its stable form as a macro, ptr::addr_of!, but the & syntax makes the interplay of places and values more explicit, so we will use it here.

The last line has Undefined Behavior (UB) because ptr points to a field of a packed struct, which is not sufficiently aligned. But how can it be the case that evaluating *ptr is UB, but evaluating &raw const *ptr is fine? Evaluating an expression should proceed by first evaluating the sub-expressions and then doing something with the result. However, *ptr is a sub-expression of &raw const *ptr, and we just said that *ptr is UB, so shouldn’t &raw const *ptr also be UB? That is the topic of this post.
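To make the contrast concrete, here is a minimal sketch of how the field could be read without UB. It uses the stable ptr::addr_of! macro in place of &raw const; the helper read_field and the extra _pad field are assumptions added for illustration and are not part of the original example.

```rust
use std::ptr;

// As in the example above, a packed struct has alignment 1.
// The leading `u8` is an addition for this sketch: it places `field`
// at offset 1 within the struct.
#[repr(packed)]
struct MyStruct {
    _pad: u8,
    field: i32,
}

/// Hypothetical helper: reads `field` without an aligned `i32` load.
fn read_field(x: &MyStruct) -> i32 {
    // `x.field` is a place expression. `addr_of!` turns that place into
    // a raw pointer without loading from it and without creating a reference.
    let ptr = ptr::addr_of!(x.field);
    // `*ptr` is used as a place here as well, so nothing is loaded yet.
    let ptr_copy = unsafe { ptr::addr_of!(*ptr) };
    // SAFETY: `ptr_copy` points into `x`, which is live and initialized,
    // and `read_unaligned` has no alignment requirement.
    unsafe { ptr_copy.read_unaligned() }
}

fn main() {
    let x = MyStruct { _pad: 0, field: 42 };
    println!("{}", read_field(&x));
}
```

The only load in this sketch is the explicit read_unaligned call, which is what separates it from the plain *ptr in the example above.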

Apr 11, 2022 • Programming, Research, Rust

Pointers Are Complicated III, or: Pointer-integer casts exposed

In my previous blog post on pointer provenance, I showed that not thinking carefully about pointers can lead to a compiler that is internally inconsistent: programs that are intended to be well-behaved get miscompiled by a sequence of optimizations, each of which seems intuitively correct in isolation. We thus have to remove, or at least restrict, one or more of these optimizations. In this post I will continue that trend with another example, and then I will lay out my general thoughts on how this relates to the recent Strict Provenance proposal, what it could mean for Rust more generally, and how it compares with C’s PNVI-ae-udi. We will end on a very hopeful note about what this could all mean for Rust’s memory model. There’s a lot of information packed into this post, so you had better find a comfortable reading position. :)
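For readers who have not seen one, this is the kind of pointer-integer round trip the title refers to. It is a minimal sketch using only plain `as` casts; the function name round_trip is an assumption made for illustration. In the terminology of the Strict Provenance proposal, the cast to usize "exposes" the pointer's provenance, which is what allows the cast back to yield a usable pointer.

```rust
/// Hypothetical helper: sends a pointer through an integer and back.
fn round_trip(value: &i32) -> i32 {
    let ptr = value as *const i32;
    // Pointer-to-integer cast: yields the address and, in Strict
    // Provenance terms, exposes the provenance of `ptr`.
    let addr = ptr as usize;
    // Integer-to-pointer cast: builds a pointer from a bare address.
    let ptr2 = addr as *const i32;
    // SAFETY: `addr` was obtained from a valid pointer to `value`,
    // and `value` is still live and initialized.
    unsafe { *ptr2 }
}

fn main() {
    let x = 7;
    println!("{}", round_trip(&x));
}
```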

Dec 14, 2020 • Programming, Research, Rust

Pointers Are Complicated II, or: We need better language specs

Some time ago, I wrote a blog post about how there’s more to a pointer than meets the eye. One key point I was trying to make is that

just because two pointers point to the same address, does not mean they are equal in the sense that they can be used interchangeably.

This “extra information” that distinguishes different pointers to the same address is typically called provenance. This post is another attempt to convince you that provenance is “real”, by telling a cautionary tale of what can go wrong when provenance is not considered sufficiently carefully in an optimizing compiler. The post is self-contained; I am not assuming that you have read the first one. There is also a larger message here about how we could prevent such issues from coming up in the future by spending more effort on the specification of compiler IRs.
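A minimal sketch of the situation described here: a one-past-the-end pointer into one array may or may not share its address with the start of another array, and even when it does, only one of the two pointers may be used to read the second array. The function name demo is an assumption, and whether the two addresses coincide depends on how the compiler lays out the locals, so the sketch does not rely on it.

```rust
/// Hypothetical demo: returns whether the two addresses coincide,
/// and the first element of `y` read through a pointer derived from `y`.
fn demo() -> (bool, u8) {
    let x = [1u8, 2];
    let y = [3u8, 4];
    // One-past-the-end pointer of `x`; computing it is fine,
    // dereferencing it is not.
    let end_of_x = x.as_ptr().wrapping_add(x.len());
    let start_of_y = y.as_ptr();
    // Compare the addresses only. This may be true or false.
    let same_address = end_of_x as usize == start_of_y as usize;
    // SAFETY: `start_of_y` is derived from `y`, which is live.
    let first = unsafe { *start_of_y };
    // Reading through `end_of_x` instead would be UB even if
    // `same_address` is true: its provenance is `x`, not `y`.
    (same_address, first)
}

fn main() {
    let (same_address, first) = demo();
    println!("same address: {}, first element of y: {}", same_address, first);
}
```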

Jul 14, 2019 • Programming, Research, Rust

"What The Hardware Does" is not What Your Program Does: Uninitialized Memory

This post is about uninitialized memory, but also about the semantics of highly optimized “low-level” languages in general. I will try to convince you that reasoning by “what the hardware does” is inherently flawed when talking about languages such as Rust, C or C++. These are not low-level languages. I have made this point before in the context of pointers; this time it is going to be about uninitialized memory.

The trigger for this post is the deprecation of mem::uninitialized() with Rust 1.36, but the post is just as relevant for C/C++ as it is for Rust.¹

¹ This deprecation has been in the works for more than two years, and it has been almost a year since I took over pushing for this. I am very happy that we are finally there!
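For context, the standard-library replacement for mem::uninitialized() is the MaybeUninit type. The following is a minimal sketch of how it is typically used; the function name make_value is an assumption made for illustration.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: creates an uninitialized slot, fills it,
/// and only then treats it as an `i32`.
fn make_value() -> i32 {
    // The slot starts out uninitialized, and its type says so.
    let mut slot = MaybeUninit::<i32>::uninit();
    // Writing initializes the slot without reading the old contents.
    slot.write(7);
    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

fn main() {
    println!("{}", make_value());
}
```

The point of the wrapper type is that the value only becomes an i32 at assume_init, after it has been written.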

Jul 24, 2018 • Internship, Programming, Rust

Pointers Are Complicated, or: What's in a Byte?

This summer, I am again working on Rust full-time, and again I will work (amongst other things) on a “memory model” for Rust/MIR. However, before I can talk about the ideas I have for this year, I have to finally take the time to dispel the myth that “pointers are simple: they are just integers”. Both parts of this statement are false, at least in languages with unsafe features like Rust or C: Pointers are neither simple nor (just) integers.

I also want to define a piece of the memory model that has to be pinned down before we can even talk about some of the more complex parts: Just what is the data that is stored in memory? It is organized in bytes, the minimal addressable unit and the smallest piece that can be accessed (at least on most platforms), but what are the possible values of a byte? Again, it turns out “it’s just an 8-bit integer” does not actually work as the answer.

I hope that by the end of this post, you will agree with me on both of these statements. :)
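One way to picture a byte that is more than an 8-bit integer is an enum with extra cases. This is a hypothetical sketch only: the names AllocId, Pointer, Byte and as_integer are illustrative assumptions, not a definition taken from this summary or from any Rust API.

```rust
/// Hypothetical identifier for an allocation (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct AllocId(u32);

/// Hypothetical abstract pointer: which allocation, and where in it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pointer {
    alloc: AllocId,
    offset: usize,
}

/// Hypothetical model of what a single byte in memory can hold.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Byte {
    /// The byte has not been initialized.
    Uninit,
    /// The byte is a plain 8-bit integer.
    Bits(u8),
    /// The byte is the n-th byte of a pointer.
    PtrFragment(Pointer, u8),
}

/// Interprets a byte as an integer only when it actually is one.
fn as_integer(b: Byte) -> Option<u8> {
    match b {
        Byte::Bits(v) => Some(v),
        Byte::Uninit | Byte::PtrFragment(..) => None,
    }
}

fn main() {
    let p = Pointer { alloc: AllocId(1), offset: 4 };
    println!("pointer into allocation {} at offset {}", p.alloc.0, p.offset);
    println!("{:?}", as_integer(Byte::Bits(42)));
    println!("{:?}", as_integer(Byte::PtrFragment(p, 0)));
}
```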