Jul 2, 2022 • Rust

The last two years in Miri

It has been almost two years since my last Miri status update. A lot has happened in the meantime that I would like to tell you all about! If you are using Miri, you might also be seeing new errors in code that previously worked fine; read on for more details on that.

For the uninitiated, Miri is an interpreter that runs your Rust code and checks if it triggers any Undefined Behavior (UB for short). You can think of it as a very thorough (and very slow) version of valgrind/ASan/TSan/UBSan: Miri will detect when your program uses uninitialized memory incorrectly, performs out-of-bounds memory accesses or pointer arithmetic, causes a data race, violates key language invariants, does not ensure proper pointer alignment, or causes incorrect aliasing. As such, it is most helpful when writing unsafe code, as it aids in ensuring that you follow all the rules required for unsafe code to be correct and safe. Miri also detects memory leaks, i.e., it informs you at the end of program execution if there is any memory that was not deallocated properly.

Moreover, Miri is able to run code for other targets: for example, you might be developing code on x86_64, a 64-bit little-endian architecture. When you do low-level bit manipulation, it is easy to introduce bugs that only show up on 32-bit systems or big-endian architectures. You can run Miri with --target i686-unknown-linux-gnu and --target mips64-unknown-linux-gnuabi64 to test your code in those situations – and this will work even if your host OS is macOS or Windows!
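As a small illustration of the kind of bug this catches, here is a sketch built around a hypothetical length-prefix parser (the names and the little-endian wire format are assumptions for this example). The buggy variant decodes bytes in the host's native order, so its tests pass on x86_64 but would fail when run with cargo miri test --target mips64-unknown-linux-gnuabi64, which interprets the code with big-endian byte order.

```rust
/// Reads a 32-bit length prefix from the start of `buf`.
/// Assumption: the (hypothetical) wire format stores it little-endian.
fn read_len_portable(buf: &[u8]) -> u32 {
    let bytes = [buf[0], buf[1], buf[2], buf[3]];
    // Correct on every target: the byte order is stated explicitly.
    u32::from_le_bytes(bytes)
}

/// Buggy variant: `from_ne_bytes` uses the *host* byte order, so it only
/// agrees with `read_len_portable` on little-endian targets such as x86_64.
fn read_len_native(buf: &[u8]) -> u32 {
    let bytes = [buf[0], buf[1], buf[2], buf[3]];
    u32::from_ne_bytes(bytes)
}
```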

That said, it’s not all roses and rainbows. Since Miri only knows how to interpret Rust code, it will get stuck when you call into C code. Miri can execute a small set of well-known C functions (e.g. to access environment variables or open files), but it is still easy to run into an “unsupported operation” error due to missing C library implementations. In many cases you should still be able to write tests covering the parts of your code that do not need to, for example, directly access the network; but I also hope that Miri will keep growing its support for key platform APIs.
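One way to keep most of a test suite running under Miri is to separate pure logic from I/O and skip only the I/O-heavy tests. The sketch below assumes a hypothetical HTTP status-line parser; cfg_attr(miri, ignore) marks just the networking test as ignored when running under Miri, since real network access may hit an unsupported operation there.

```rust
/// Pure parsing logic with no system calls, so Miri can run it.
/// `parse_status_code` is a hypothetical helper for illustration.
fn parse_status_code(status_line: &str) -> Option<u16> {
    // "HTTP/1.1 200 OK" -> Some(200)
    let mut parts = status_line.split_whitespace();
    let _version = parts.next()?;
    parts.next()?.parse().ok()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_status_line() {
        // Runs fine under `cargo miri test`.
        assert_eq!(parse_status_code("HTTP/1.1 404 Not Found"), Some(404));
    }

    #[test]
    // Ignored only under Miri; a native `cargo test` still runs it.
    #[cfg_attr(miri, ignore)]
    fn talks_to_local_server() {
        // Hypothetical test against a local server; the result is discarded here.
        let _ = std::net::TcpStream::connect("127.0.0.1:8080");
    }
}
```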

Miri progress

So, what progress has Miri made in the last two years?

Concurrency

The story of concurrency in Miri continues to surprise me: I had not even planned for Miri to support concurrency, but people just keep showing up and implementing one part of it after another, so now we have pretty good support for finding concurrency bugs!

In that spirit, @JCTyblaidd implemented a data race detector. So if your code does not use appropriate atomic operations to make sure all accesses are suitably synchronized, Miri will now detect that problem and report Undefined Behavior. Here’s a demo. (Click that link and then select “Tools - Miri” to see this in action.) Our data race error reports could be improved a lot (in particular they only show one of the two conflicting accesses involved in a data race), but they are still useful and have already found several data races in the wild.
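For readers without access to the demo, here is a sketch of properly synchronized code (the function name is hypothetical). Every access to the shared counter goes through an atomic operation. The racy variant is only described in a comment rather than executed, since running a data race natively is itself Undefined Behavior.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Increments a shared counter from several threads.
/// If the counter were instead a plain `usize` mutated through a raw pointer
/// or a `static mut`, the unsynchronized increments would race, which is the
/// kind of bug the data race detector reports.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Atomic read-modify-write: no data race.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        // `join` synchronizes with the end of each thread.
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```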

@thomcc changed our compare_exchange_weak implementation so that it fails spuriously (at random) with 80% probability. (The exact rate is adjustable via -Zmiri-compare-exchange-weak-failure-rate=<x>.) Here’s a demo. This is super useful to find issues where code uses compare_exchange_weak but cannot handle spurious failures, since such failures are very unlikely to occur in the wild and thus easily go unnoticed in regular testing.
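A correct use of compare_exchange_weak treats every failure as "retry", since a failure does not necessarily mean another thread changed the value. The sketch below (the function name is hypothetical) retries with the freshly observed value; a buggy version would, for example, unwrap the result or give up after one attempt, and that is exactly what a high spurious-failure rate exposes.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Atomically doubles `x` and returns the previous value.
fn double(x: &AtomicU32) -> u32 {
    let mut current = x.load(Ordering::Relaxed);
    loop {
        let new = current.wrapping_mul(2);
        match x.compare_exchange_weak(current, new, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(prev) => return prev,
            // Failure may be spurious or real; either way, retry with the
            // value that was actually observed.
            Err(actual) => current = actual,
        }
    }
}
```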

@henryboisdequin added support for the atomic fetch_min and fetch_max operations, completing our support of the Atomic* types.
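For reference, fetch_max atomically stores the maximum of the current value and its argument and returns the previous value (fetch_min is analogous). A minimal sketch, with a hypothetical function name, that tracks the largest sample seen by several scoped threads:

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::thread;

/// Returns the largest value across all chunks (or `i64::MIN` if all are empty).
fn max_across_threads(chunks: &[&[i64]]) -> i64 {
    let max = AtomicI64::new(i64::MIN);
    thread::scope(|s| {
        for chunk in chunks {
            let max = &max;
            s.spawn(move || {
                for &v in chunk.iter() {
                    // Atomically: *max = max(*max, v).
                    max.fetch_max(v, Ordering::Relaxed);
                }
            });
        }
    });
    // All scoped threads have been joined at this point.
    max.load(Ordering::Relaxed)
}
```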

And finally, @cbeuw showed up and added “weak memory emulation”. This means that when you do an atomic load, you might not observe the latest value written to that location; instead, a previous value can be returned. Here’s a demo. This happens on real hardware, so having this supported in Miri helps to find more potential bugs. The caveat is that Miri still cannot produce all the behaviors that the actual program might exhibit. Also, the C++20 revision of the C++ memory model disallowed some possible behaviors that were previously allowed, but Miri might produce those behaviors – there is currently no known algorithm that would prevent that. This should be very rare though.
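The classic case where stale reads matter is message passing between threads. In the sketch below (function name hypothetical), the Release store on the flag pairs with the Acquire load, so a reader that sees the flag set is guaranteed to also see the data. If both flag operations were Relaxed, the reader could see the flag set but still read the old data value, and with weak memory emulation Miri may produce such an execution.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

/// Sends the value 42 from one thread to another through `data` + `ready`.
fn message_passing() -> u32 {
    let data = AtomicU32::new(0);
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            data.store(42, Ordering::Relaxed);
            // Release: publishes the store to `data` above.
            ready.store(true, Ordering::Release);
        });
        let reader = s.spawn(|| {
            // Acquire: once `true` is observed, the write of 42 is visible.
            while !ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            data.load(Ordering::Relaxed)
        });
        reader.join().unwrap()
    })
}
```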

I then just put the icing on the cake by fixing some long-standing issues in our scheduler, so that it no longer gets stuck in spin loops. Miri now has a chance to preempt the running thread at the end of each basic block; the preemption probability is 1% but you can adjust it (using -Zmiri-preemption-rate=<x>).
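Spin loops are the typical code that needs preemption: a thread spinning while waiting for a lock makes no progress unless the lock holder eventually gets to run. Here is a minimal spinlock sketch for illustration only (the type and function names are hypothetical, and real code should prefer std::sync::Mutex). It also does not release the lock if the closure panics.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: all access to `value` is serialized by `locked`.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    fn new(value: T) -> Self {
        SpinLock { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Busy-wait until we flip `locked` from false to true.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            hint::spin_loop();
        }
        // SAFETY: we hold the lock, so no other thread accesses `value`.
        let result = f(unsafe { &mut *self.value.get() });
        self.locked.store(false, Ordering::Release);
        result
    }
}

/// Increments a lock-protected counter from several threads.
fn contended_sum(threads: u64, per_thread: u64) -> u64 {
    let lock = SpinLock::new(0u64);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    lock.with_lock(|v| *v += 1);
                }
            });
        }
    });
    lock.with_lock(|v| *v)
}
```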

All of this made our concurrency support sufficiently solid that it no longer shows any warning about being “experimental”. For example, it has already found a data race in the standard library. I can barely express how happy and proud I am that I had to do basically none of this work. :)

One warning though: several of the improvements mentioned above rely on making random choices. So, it is now more likely than before that Miri will work fine one day, and then show an error after some seemingly inconsequential change to the program the next day. I will get back to these problems later.

Pointer provenance and Stacked Borrows

One of the most subtle aspects of Miri is Stacked Borrows. The aliasing model is already quite complicated, and actually debugging what happens when Miri finds an aliasing violation in your code can be pretty tricky. However, @saethlin made this a lot easier! The error messages now show a lot more detail and point to several relevant locations in the code: not only where the bad access happened, but also where the pointer tag used for that access was created, and where that tag was invalidated. I am very impressed by how good some of these errors are, just check this out.

Another big thing that happened recently is the entire “Strict Provenance” story. I am super excited by these developments, because they offer the chance to fix some long-standing open problems in Miri: the issues with “untagged” raw pointers in Stacked Borrows, and Miri not properly supporting integer-to-pointer casts.

After a lot of work by @carbotaniuman and myself, the situation now is as follows:

This is overall much better than previously – there is nothing funky going on with raw pointers any more, and we should never incorrectly report UB any more even when integer-to-pointer casts are used. :-)
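To give a flavor of the Strict Provenance style, here is a sketch of pointer tagging (storing a flag in the low bit of an aligned pointer) using the addr and map_addr methods on raw pointers. This assumes Rust 1.84 or later, where these methods are stable; the tag and untag helpers are hypothetical. Unlike a round-trip through usize, map_addr keeps the pointer's provenance, so the untagged pointer can still be dereferenced.

```rust
/// Stores `flag` in the low bit of a pointer to a 2-byte-aligned `u16`.
fn tag(ptr: *const u16, flag: bool) -> *const u16 {
    debug_assert!((ptr.addr() & 1) == 0, "pointer must be 2-byte aligned");
    // Only the address changes; provenance is preserved.
    ptr.map_addr(|a| a | flag as usize)
}

/// Clears the low bit again and returns the original pointer plus the flag.
fn untag(ptr: *const u16) -> (*const u16, bool) {
    (ptr.map_addr(|a| a & !1), (ptr.addr() & 1) == 1)
}
```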

Other areas

Concurrency and pointer aliasing are the two big changes, but there is also a long tail of smaller changes that together make Miri a heck of a lot more useful than it used to be:

We also improved our platform API and intrinsic support:

Bugfixes and cleanup

And of course there were tons of bugfixes. I want to particularly call out @hyd-dev who fixed a lot of issues in our cargo miri frontend. @dtolnay did a lot of code cleanup, making Miri pass clippy’s critical eyes and ensuring all our tests are properly formatted. And last but not least, @oli-obk completely re-wrote our test suite so that we can finally actually test the full output of Miri.

I have probably forgotten to mention something interesting as well. See here for the full list of amazing people who contributed to Miri since my last update. I cannot thank all of you enough! <3

Help, Miri suddenly says my code is broken

Several of the changes mentioned above, in particular with regards to concurrency and Stacked Borrows, mean that Miri is now able to detect more problems than before. On the one hand, that’s of course great, but on the other hand, it can mean that when you re-run Miri on some code that seemed fine, it might suddenly complain! And because of all the non-determinism, it might also be the case that Miri sometimes complains and sometimes doesn’t (or that it works fine locally but complains on CI). What can you do when that happens?

If Miri shows a new Stacked Borrows error, then that is probably caused by raw pointers now being properly tagged. The new Stacked Borrows messages should make it easier than before to diagnose these problems, but in the end this still remains a case-by-case issue. For example, a program that creates a pointer to the first element of an array with &x[0] as *const i32 and then uses it to read the second element will print:

error: Undefined Behavior: attempting a read access using <3255> at alloc1770[0x4], but that tag does not exist in the borrow stack for this location
 --> src/main.rs:4:25
  |
4 |     let _val = unsafe { *ptr.add(1) }; // ...and use it to access the *second* element.
  |                         ^^^^^^^^^^^
  |                         |
  |                         attempting a read access using <3255> at alloc1770[0x4], but that tag does not exist in the borrow stack for this location
  |                         this error occurs as part of an access at alloc1770[0x4..0x8]
  |
  = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
  = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <3255> was created by a retag at offsets [0x0..0x4]
 --> src/main.rs:3:15
  |
3 |     let ptr = &x[0] as *const i32; // We create a pointer to the *first* element...
  |               ^^^^^
  = note: backtrace:
  = note: inside `main` at src/main.rs:4:25

In this case, the clue is in the offsets: note that the tag was created for offsets [0x0..0x4] (as usual in Rust, this excludes 0x4), and the access was at alloc1770[0x4]. The pointer was thus used outside the offset range for which its tag (<3255>) is valid. The fix is to use x.as_ptr() rather than &x[0] as *const i32 to get a pointer that is valid for the entire array.
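A sketch of the fixed code, reconstructed from the two source lines visible in the error message (the function wrapper and the array contents are assumptions). Because x.as_ptr() is derived from the whole array, its tag covers both elements, so offsetting it by one element stays within the range for which the tag is valid.

```rust
/// Reads the second element of a two-element array through a raw pointer.
fn read_second(x: &[i32; 2]) -> i32 {
    // Valid for the entire array, unlike `&x[0] as *const i32`, whose tag
    // only covers offsets [0x0..0x4].
    let ptr = x.as_ptr();
    // SAFETY: index 1 is in bounds and `ptr` is valid for both elements.
    unsafe { *ptr.add(1) }
}
```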

If the error only shows up sometimes, then it probably has something to do with concurrency. Miri is not truly random, but uses a pseudo-random number generator to make all concurrency-related choices (such as when to schedule another thread). This means you can explore different possible executions by passing different seeds to Miri’s pseudo-random number generator. The following little shell snippet runs Miri with many different seeds (0 to 255, written in hexadecimal), which is great for locally reproducing a failure that you saw on CI but are having trouble reproducing:

for SEED in $({ echo obase=16; seq 0 255; } | bc); do
  echo "Trying seed: $SEED"
  MIRIFLAGS=-Zmiri-seed=$SEED cargo miri test || { echo "Failing seed: $SEED"; break; };
done

It is important that you use exactly the same MIRIFLAGS as CI to ensure the failure can even happen! It is also a good idea to use a filter with cargo miri test FILTER to ensure only the test you care about is being run.

Once you have confirmed that this is indeed a non-deterministic test failure, you can narrow it down further by reducing Miri’s non-determinism:

Passing all of these flags will make Miri’s concurrency entirely deterministic. That can be useful to avoid non-deterministic test failures, but note that this will also mask many real-world bugs. Those test failures are often real, even if they can be hard to track down!

If you are still having trouble, feel free to come visit us in our Zulip stream, which is the official communication channel for Miri.

By the way, if you are still disabling some tests on Miri because Miri used to not support panics/concurrency, it’s time to give those tests another try. :) So this is a good opportunity to go over your cfg(miri) and similar attributes and re-evaluate if they are still needed.
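For example, a check like the following sketch (the panics helper is hypothetical) relies on unwinding via std::panic::catch_unwind. Since Miri supports panics, a test built on it is a candidate for dropping an old #[cfg_attr(miri, ignore)].

```rust
use std::panic;

/// Returns `true` if `f` panics (and unwinds), `false` otherwise.
fn panics<F: FnOnce() + panic::UnwindSafe>(f: F) -> bool {
    panic::catch_unwind(f).is_err()
}
```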

Using Miri

If this post made you curious and you want to give Miri a try, here’s how to do that. Assuming you have a crate with some unsafe code, and you already have a test suite (you are testing your unsafe code, right?), you can just install Miri (rustup +nightly component add miri) and then run cargo +nightly miri test to execute all tests in Miri. Note that this requires the nightly toolchain as Miri is still an experimental tool.

Miri is very slow, so it is likely that some tests will take way too long to be feasible. You can adjust iteration counts in Miri without affecting non-Miri testing as follows:

let limit = if cfg!(miri) { 10 } else { 10_000 };
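A slightly fuller sketch of the same idea, with a hypothetical stress-test body: cfg!(miri) evaluates to true only when the code is being run by Miri, so native test runs keep the large iteration count.

```rust
/// Small under Miri (which is slow), large for native test runs.
fn stress_iterations() -> usize {
    if cfg!(miri) { 10 } else { 10_000 }
}

/// Hypothetical stress test body: pushes `n` items, then pops them in reverse.
fn push_pop_roundtrip(n: usize) -> bool {
    let mut v = Vec::new();
    for i in 0..n {
        v.push(i);
    }
    (0..n).rev().all(|i| v.pop() == Some(i))
}
```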

If your test suite needs to access OS facilities such as timers or the file system, set MIRIFLAGS=-Zmiri-disable-isolation to enable those. (Miri will tell you when that is necessary.) If your test suite runs into an unsupported operation, please report an issue. However, note that we can only really support sufficiently “generic” operations, like accessing file systems and network sockets. Implementing something like Py_IsInitialized would mean putting a Python interpreter into Miri; that is not going to happen. ;)

If you want to add Miri to your CI to ensure your test suite keeps working in Miri, please consult our README. That document is also a great starting point for any other questions you might have.

Miri is also integrated into the Rust Playground: you can select Miri in the “Tools” menu to check the code for Undefined Behavior.

If Miri complains about your code and you do not understand why, we are happy to help! The best place to ask for support is our Zulip stream. Questions are much easier to answer if you manage to reproduce the problem in a small self-contained bit of example code (ideally on the playground), but feel free to ask even if you do not know how to reduce the problem.

Helping Miri

If you want to help improve Miri, that’s awesome! The issue tracker is a good place to start; the list of issues is short enough that you can just browse through it rather quickly to see if anything piques your interest. The ones that are particularly suited for getting started are marked with a green label, but note that even “E-easy” issues can require some amount of Rust experience: Miri is not a good codebase for your first steps in Rust. Another good starting point is to try to implement the missing bit of functionality that keeps your test suite from working. If you need any mentoring, just get in touch. :)

That’s it for now. I am totally blown away by how many people (and companies!) are already using and even contributing to Miri. This endeavor of re-shaping the way we approach correctness of unsafe code has been way more successful than my wildest dreams. I hope Miri can also help you to ensure correctness of your unsafe code, and I am excited for what the next year(s) of Miri development will bring. :D

Posted on Ralf's Ramblings on Jul 2, 2022.
Comments? Drop me a mail or leave a note on reddit!