Jul 2, 2022 • Rust

The last two years in Miri

It has been almost two years since my last Miri status update. A lot has happened in the meantime that I would like to tell you all about! If you are using Miri, you might also be seeing new errors in code that previously worked fine; read on for more details on that.

For the uninitiated, Miri is an interpreter that runs your Rust code and checks if it triggers any Undefined Behavior (UB for short). You can think of it as a very thorough (and very slow) version of valgrind/ASan/TSan/UBSan: Miri will detect when your program uses uninitialized memory incorrectly, performs out-of-bounds memory accesses or pointer arithmetic, causes a data race, violates key language invariants, does not ensure proper pointer alignment, or causes incorrect aliasing. As such, it is most helpful when writing unsafe code, as it aids in ensuring that you follow all the rules required for unsafe code to be correct and safe. Miri also detects memory leaks, i.e., it informs you at the end of program execution if there is any memory that was not deallocated properly.

Moreover, Miri is able to run code for other targets: for example, you might be developing code on x86_64, a 64-bit little-endian architecture. When you do low-level bit manipulation, it is easy to introduce bugs that only show up on 32-bit systems or big-endian architectures. You can run Miri with --target i686-unknown-linux-gnu and --target mips64-unknown-linux-gnuabi64 to test your code in those situations – and this will work even if your host OS is macOS or Windows!
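As a minimal sketch of the kind of bug that only shows up on another target, consider decoding a length prefix. The function names and the little-endian wire format are assumptions made up for this illustration; nothing here is UB, the point is only that the same test gives a different answer on a big-endian target.

```rust
/// Decodes a 4-byte length prefix of a (hypothetical) little-endian wire format.
/// `from_ne_bytes` uses the byte order of the target, so this is only correct
/// on little-endian targets such as x86_64.
fn read_len_host_order(buf: [u8; 4]) -> u32 {
    u32::from_ne_bytes(buf)
}

/// Portable variant: the byte order of the format is stated explicitly.
fn read_len_portable(buf: [u8; 4]) -> u32 {
    u32::from_le_bytes(buf)
}

fn main() {
    let buf = [0x01, 0x00, 0x00, 0x00];
    // Holds on every target.
    assert_eq!(read_len_portable(buf), 1);
    // Yields 1 on little-endian targets, but a different value on a
    // big-endian target such as mips64-unknown-linux-gnuabi64.
    println!("host-order result: {}", read_len_host_order(buf));
}
```

A test that asserts `read_len_host_order(buf) == 1` would pass on the host and fail when run with the big-endian `--target` mentioned above.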

That said, it’s not all roses and rainbows. Since Miri just knows how to interpret Rust code, it will get stuck when you call into C code. Miri knows how to execute a certain small set of well-known C functions (e.g. to access environment variables or open files), but it is still easy to run into an “unsupported operation” error due to missing C library implementations. In many cases you should be able to still write tests that cover the remaining code that does not need to, for example, directly access the network; but I also hope that Miri will keep growing its support for key platform APIs.

Miri progress

So, what progress has Miri made in the last two years?

Concurrency

The story of concurrency in Miri continues to surprise me: I had not even planned for Miri to support concurrency, but people just keep showing up and implementing one part of it after the other, so now we have pretty good support for finding concurrency bugs!

In that spirit, @JCTyblaidd implemented a data race detector. So if your code does not use appropriate atomic operations to make sure all accesses are suitably synchronized, Miri will now detect that problem and report Undefined Behavior. Here’s a demo. (Click that link and then select “Tools - Miri” to see this in action.) Our data race error reports could be improved a lot (in particular they only show one of the two conflicting accesses involved in a data race), but they are still useful and have already found several data races in the wild.
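To illustrate what "suitably synchronized" means here, the following sketch contrasts an unsynchronized counter with an atomic one. It is not the linked demo; `RacyCounter`, `racy_count` and `atomic_count` are hypothetical names chosen for this example. The racy function is deliberately not called from `main`, since executing it is Undefined Behavior; calling it under Miri is the kind of thing the data race detector is meant to flag.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// A counter without any synchronization.
struct RacyCounter(UnsafeCell<u32>);

// This impl is the bug: it promises thread-safety that the type does not provide.
unsafe impl Sync for RacyCounter {}

impl RacyCounter {
    fn bump(&self) {
        // Unsynchronized read-modify-write through a shared reference.
        unsafe { *self.0.get() += 1 }
    }
}

/// Two threads write to the same location without synchronization: a data race.
#[allow(dead_code)]
fn racy_count() -> u32 {
    let counter = RacyCounter(UnsafeCell::new(0));
    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| counter.bump());
        }
    });
    counter.0.into_inner()
}

/// Same structure, but every access is an atomic operation, so there is no race.
fn atomic_count() -> u32 {
    let counter = AtomicU32::new(0);
    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                counter.fetch_add(1, Ordering::Relaxed);
            });
        }
    });
    counter.into_inner()
}

fn main() {
    assert_eq!(atomic_count(), 2);
}
```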

@thomcc changed our compare_exchange_weak implementation so that it randomly just fails with 80% probability. (The exact rate is adjustable via -Zmiri-compare-exchange-weak-failure-rate=<x>.) Here’s a demo. This is super useful to find issues where code uses compare_exchange_weak but cannot handle spurious failures, since those are very unlikely to occur in the wild.
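For reference, a sketch of the usual way to tolerate spurious failures: call compare_exchange_weak in a retry loop and treat every `Err` as "try again with the observed value". The function `atomic_double` is a hypothetical example, not code from the linked demo.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Atomically doubles the stored value and returns the previous value.
fn atomic_double(value: &AtomicU32) -> u32 {
    let mut current = value.load(Ordering::Relaxed);
    loop {
        let new = current.wrapping_mul(2);
        match value.compare_exchange_weak(current, new, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(previous) => return previous,
            // Either another thread changed the value, or the failure was
            // spurious. Both cases are handled the same way: retry with the
            // value that was actually observed.
            Err(observed) => current = observed,
        }
    }
}

fn main() {
    let value = AtomicU32::new(21);
    assert_eq!(atomic_double(&value), 21);
    assert_eq!(value.load(Ordering::Relaxed), 42);
}
```

Code that calls compare_exchange_weak once and interprets `Err` as "someone else got there first" is the pattern that the high failure rate is likely to expose.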

@henryboisdequin added support for the atomic fetch_min and fetch_max operations, completing our support of the Atomic* types.

And finally, @cbeuw showed up and added “weak memory emulation”. This means that when you do an atomic load, you might not observe the latest value written to that location; instead, a previous value can be returned. Here’s a demo. This happens on real hardware, so having this supported in Miri helps to find more potential bugs. The caveat is that Miri still cannot produce all the behaviors that the actual program might exhibit. Also, the C++20 revision of the C++ memory model disallowed some possible behaviors that were previously allowed, but Miri might produce those behaviors – there is currently no known algorithm that would prevent that. This should be very rare though.
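A small message-passing sketch shows where observing "a previous value" matters. This is an illustration with assumed names (`message_passing`, `data`, `ready`), not the linked demo. With the Release/Acquire pair below, the reader is guaranteed to see 42; if both accesses to `ready` were Relaxed, the final load of `data` would be allowed to return the stale 0, and weak memory emulation makes it possible to actually observe such an outcome.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;

/// Passes a value from a writer thread to the calling thread via a flag.
fn message_passing() -> u32 {
    let data = AtomicU32::new(0);
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            data.store(42, Ordering::Relaxed);
            // Release: the store to `data` becomes visible to any thread
            // that Acquire-loads `true` from `ready`.
            ready.store(true, Ordering::Release);
        });
        // Acquire pairs with the Release store above.
        while !ready.load(Ordering::Acquire) {
            // Yield so that the writer gets a chance to run.
            thread::yield_now();
        }
        data.load(Ordering::Relaxed)
    })
}

fn main() {
    assert_eq!(message_passing(), 42);
}
```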

I then just put the icing on the cake by fixing some long-standing issues in our scheduler, so that it no longer gets stuck in spin loops. Miri now has a chance to preempt the running thread at the end of each basic block; the preemption probability is 1% but you can adjust it (using -Zmiri-preemption-rate=<x>).

All of this made our concurrency support sufficiently solid that it no longer shows any warning about being “experimental”. For example, it has already found a data race in the standard library. I can barely express how happy and proud I am that I had to do basically none of this work. :)

One warning though: several of the improvements mentioned above rely on making random choices. So, it is now more likely than before that Miri will work fine one day, and then show an error after some seemingly inconsequential change to the program the next day. I will get back to these problems later.

Pointer provenance and Stacked Borrows

One of the most subtle aspects of Miri is Stacked Borrows. The aliasing model is already quite complicated, and actually debugging what happens when Miri finds an aliasing violation in your code can be pretty tricky. However, @saethlin made this a lot easier! The error messages now show a lot more detail and point to several relevant locations in the code: not only where the bad access happened, but also where the pointer tag used for that access was created, and where that tag was invalidated. I am very impressed by how good some of these errors are, just check this out.

Another big thing that happened recently is the entire “Strict Provenance” story. I am super excited by these developments, because they offer the chance to fix some long-standing open problems in Miri: the issues with “untagged” raw pointers in Stacked Borrows, and Miri not properly supporting integer-to-pointer casts.

After a lot of work by @carbotaniuman and myself, the situation now is as follows:

Miri always properly tags raw pointers. So there are no longer any counter-intuitive behaviors caused by Miri “mixing up” two raw pointers that point to the same address, but were computed in a different way. (We had a -Zmiri-tag-raw-pointers flag for a while that also achieves this; that flag is now on-by-default.)
If you do not use any integer-to-pointer casts, then you can stop reading here! You can pass -Zmiri-strict-provenance to Miri to ensure that this is indeed the case.
If you are using integer-to-pointer casts, then Miri will warn about that. You now have two options.
- The ideal solution is to avoid using integer-to-pointer casts, and to follow Strict Provenance instead. The pointer library docs explain in more detail what exactly that means. Note that the APIs described there are still unstable, but a polyfill is available for stable Rust. Also see Gankra’s blog post and my own blog post for some more background on this subject.
- If the casts are in code you do not control, or if you cannot currently avoid integer-to-pointer casts, you can pass -Zmiri-permissive-provenance to Miri to silence the warning. Know that this means that Miri might miss some bugs in your code: integer-to-pointer casts make it impossible to precisely track which pointer came from where, so Miri will conservatively accept some code that actually should be rejected.

This is overall much better than previously – there is nothing funky going on with raw pointers any more, and we should never incorrectly report UB any more even when integer-to-pointer casts are used. :-)
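To make "avoid integer-to-pointer casts" a bit more concrete, here is a sketch of one common case: rounding a pointer up to an alignment boundary. The helper `align_up` is a hypothetical name, and the approach (do the arithmetic on the address, then offset the original pointer) is only one possible way to do this on stable Rust; the Strict Provenance APIs described in the pointer library docs are the more direct route where they are available.

```rust
/// Rounds `ptr` up to the next multiple of `align` (which must be a power of
/// two) without ever casting an integer back to a pointer.
fn align_up(ptr: *const u8, align: usize) -> *const u8 {
    assert!(align.is_power_of_two());
    // Pointer-to-integer cast: only the address is needed for the arithmetic.
    let addr = ptr as usize;
    let aligned = addr.wrapping_add(align - 1) & !(align - 1);
    // Instead of `aligned as *const u8` (an integer-to-pointer cast), offset
    // the original pointer, so the result is derived from `ptr`.
    ptr.wrapping_add(aligned.wrapping_sub(addr))
}

fn main() {
    let buf = [7u8; 16];
    let base = buf.as_ptr();
    // Start one byte into the buffer, then align to 4 bytes.
    let p = align_up(base.wrapping_add(1), 4);
    assert_eq!(p as usize % 4, 0);
    // The result is at most 4 bytes past `base`, so it is still inside `buf`.
    assert!(p as usize - base as usize <= 4);
    // SAFETY: `p` is derived from `base` and points inside `buf`.
    assert_eq!(unsafe { *p }, 7);
}
```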

Other areas

Concurrency and pointer aliasing are the two big changes, but there is also a long tail of smaller changes that together make Miri a heck of a lot more useful than it used to be:

@teryror made Miri support doctests, so now cargo miri test will also check your doctests for UB!
@Smittyvb fixed our fast-math intrinsics to properly report UB when they are used on non-finite values.
@hyd-dev added “symbol resolution” support to Miri, so if one part of your Rust code defines a function with a given link_name, and another piece of Rust code imports that function via an extern block, Miri now knows how to find the right function implementation.
@atsmtat added a -Zmiri-isolation-error=<action> flag so when a function call is rejected due to isolation, evaluation can continue by reporting an error code to the interpreted program.
@landaire added -Zmiri-panic-on-unsupported, which makes Miri raise a panic rather than stopping evaluation when an unsupported system function is encountered. This can be useful to keep going with the next test in a test suite. However, it also raises panics where usually that would be impossible, which can lead to surprising behavior.
@DrMeepster added support for running programs that use the #[start] attribute, and @oli-obk made that work even for targets without libstd. (You need to set MIRI_NO_STD=1 to make the latter work.)
@DrMeepster also implemented support for the #[global_allocator] attribute.
@camelid made Miri optionally detect UB due to uninitialized integers, which has since become the default.
@saethlin made our errors more readable by pruning irrelevant details from the backtraces.
I have implemented support for calling methods on types like Pin<Box<dyn Trait>>.
@oli-obk fixed our handling of types like MaybeUninit<u64>, where previously we did not properly support only some of the bytes being initialized.

We also improved our platform API and intrinsic support:

Thanks to @m-ou-se Miri now supports the Linux futex APIs used by the Rust standard library. This was crucial for std’s park() and unpark(), but meanwhile is also used for many other synchronization primitives.
On the file system side, @Aaron1011 implemented readlink, which makes std::fs::read_link work on Linux and macOS.
@asquared31415 made the three-argument form of open work.
@tavianator implemented readdir64 so we can still list directories on Linux (the Rust standard library was changed to use that function rather than readdir64_r).
@Aaron1011 has also improved the rendering of panic backtraces inside the interpreter.
@frewsxcv implemented the missing bits to make the aarch64-apple-darwin target work in Miri.
I implemented the intrinsics required by std::simd, so portable-simd code should work with Miri. It will not be very fast, though…
@V0ldek made our Windows GetSystemInfo shim work in more situations.
@saethlin added support for *_COARSE clocks on Linux.
@InfRandomness has started on getting Miri to work on FreeBSD targets (but this support is still incomplete).

Bugfixes and cleanup

And of course there were tons of bugfixes. I want to particularly call out @hyd-dev who fixed a lot of issues in our cargo miri frontend. @dtolnay did a lot of code cleanup, making Miri pass clippy’s critical eye and ensuring all our tests are properly formatted. And last but not least, @oli-obk completely re-wrote our test suite so that we can finally actually test the full output of Miri.

I have probably forgotten to mention something interesting as well. See here for the full list of amazing people who contributed to Miri since my last update. I cannot thank all of you enough! <3

Help, Miri suddenly says my code is broken

Several of the changes mentioned above, in particular with regards to concurrency and Stacked Borrows, mean that Miri is now able to detect more problems than before. On the one hand, that’s of course great, but on the other hand, it can mean that when you re-test Miri on some code that seemed fine, it might suddenly complain! And because of all the non-determinism, it might also be the case that Miri sometimes complains, and sometimes doesn’t (or that it works fine locally but complains on CI). What can you do when that happens?

If Miri shows a new Stacked Borrows error, then that is probably caused by raw pointers now being properly tagged. The new Stacked Borrows messages should make it easier than before to diagnose these problems, but in the end this still remains a case-by-case issue. For example, this program will print:

error: Undefined Behavior: attempting a read access using <3255> at alloc1770[0x4], but that tag does not exist in the borrow stack for this location
 --> src/main.rs:4:25
  |
4 |     let _val = unsafe { *ptr.add(1) }; // ...and use it to access the *second* element.
  |                         ^^^^^^^^^^^
  |                         |
  |                         attempting a read access using <3255> at alloc1770[0x4], but that tag does not exist in the borrow stack for this location
  |                         this error occurs as part of an access at alloc1770[0x4..0x8]
  |
  = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
  = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <3255> was created by a retag at offsets [0x0..0x4]
 --> src/main.rs:3:15
  |
3 |     let ptr = &x[0] as *const i32; // We create a pointer to the *first* element...
  |               ^^^^^
  = note: backtrace:
  = note: inside `main` at src/main.rs:4:25

In this case, the clue is in the offsets: note that the tag was created for offsets [0x0..0x4] (as usual in Rust, this excludes 0x4), and the access was at alloc1770[0x4]. The pointer was thus used outside the offset range for which its tag (<3255>) is valid. The fix is to use x.as_ptr() rather than &x[0] as *const i32 to get a pointer that is valid for the entire array.
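A sketch of the fixed version might look as follows. The function name `second_element` and the array contents are assumptions for this example; only the two pointer expressions are taken from the text above.

```rust
/// Reads the second element of the array through a raw pointer.
fn second_element(x: &[i32; 3]) -> i32 {
    // `as_ptr` gives a pointer that is valid for the entire array, whereas
    // `&x[0] as *const i32` would only be valid for the first element.
    let ptr = x.as_ptr();
    // SAFETY: index 1 is in bounds of a 3-element array.
    unsafe { *ptr.add(1) }
}

fn main() {
    let x = [10, 20, 30];
    assert_eq!(second_element(&x), 20);
}
```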

If the error only shows up sometimes, then it probably has something to do with concurrency. Miri is not truly random, but uses a pseudo-random number generator to make all concurrency-related choices (such as when to schedule another thread). This means you can explore various different possible choices by passing different seeds for Miri to use for its pseudo-random number generator. The following little shell snippet will run Miri with many different seeds, which is great to be able to locally reproduce a failure that you saw on CI, but that you are having trouble reproducing:

for SEED in $({ echo obase=16; seq 0 255; } | bc); do
  echo "Trying seed: $SEED"
  MIRIFLAGS=-Zmiri-seed=$SEED cargo miri test || { echo "Failing seed: $SEED"; break; };
done

It is important that you use exactly the same MIRIFLAGS as CI to ensure the failure can even happen! It is also a good idea to use a filter with cargo miri test FILTER to ensure only the test you care about is being run.

Once you have confirmed that this is indeed a non-deterministic test failure, you can narrow it down further by reducing Miri’s non-determinism:

You can pass -Zmiri-preemption-rate=0 to make the scheduler non-preemptive (only schedule to other threads when a thread explicitly yields). This can lead to infinite loops if there are spin-loops that do not yield, but if it makes the problem go away, then the problem needs some very particular scheduling decisions to surface, which might help you track down its source.
You can also pass -Zmiri-disable-weak-memory-emulation which has the effect of making atomic loads always return the latest value stored in that location. If that makes the problem go away, then the issue is likely caused by insufficient synchronization somewhere. It might be a missing fence, or a Relaxed access that should be Release/Acquire.
Finally, -Zmiri-compare-exchange-weak-failure-rate=0 makes compare_exchange_weak behave exactly like compare_exchange. If that makes the problem go away, then some code using compare_exchange_weak is not properly handling spurious failures.

Passing all of these flags will make Miri’s concurrency entirely deterministic. That can be useful to avoid non-deterministic test failures, but note that this will also mask many real-world bugs. Those test failures are often real, even if they can be hard to track down!

If you are still having trouble, feel free to come visit us in our Zulip stream, which is the official communication channel for Miri.

By the way, if you are still disabling some tests on Miri because Miri used to not support panics/concurrency, it’s time to give those tests another try. :) So this is a good opportunity to go over your cfg(miri) and similar attributes and re-evaluate if they are still needed.

Using Miri

If this post made you curious and you want to give Miri a try, here’s how to do that. Assuming you have a crate with some unsafe code, and you already have a test suite (you are testing your unsafe code, right?), you can just install Miri (rustup +nightly component add miri) and then run cargo +nightly miri test to execute all tests in Miri. Note that this requires the nightly toolchain as Miri is still an experimental tool.

Miri is very slow, so it is likely that some tests will take way too long to be feasible. You can adjust iteration counts in Miri without affecting non-Miri testing as follows:

let limit = if cfg!(miri) { 10 } else { 10_000 };
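In context, that line might be used roughly like this. The helpers `iteration_limit` and `fill` are hypothetical stand-ins for a real stress test; only the `cfg!(miri)` expression comes from the line above.

```rust
/// Picks a small iteration count under Miri and a large one otherwise.
fn iteration_limit() -> usize {
    if cfg!(miri) { 10 } else { 10_000 }
}

/// Stand-in for the code under test: pushes `limit` values into a vector.
fn fill(limit: usize) -> Vec<usize> {
    let mut values = Vec::new();
    for i in 0..limit {
        values.push(i * 2);
    }
    values
}

fn main() {
    let limit = iteration_limit();
    let values = fill(limit);
    // The assertions are written in terms of `limit`, so they hold for both
    // the Miri and the non-Miri iteration count.
    assert_eq!(values.len(), limit);
    assert_eq!(values.last().copied(), Some((limit - 1) * 2));
}
```

Writing the checks in terms of `limit` rather than a hard-coded number keeps the same test meaningful in both configurations.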

If your test suite needs to access OS facilities such as timers or the file system, set MIRIFLAGS=-Zmiri-disable-isolation to enable those. (Miri will tell you when that is necessary.) If your test suite runs into an unsupported operation, please report an issue. However, note that we can only really support sufficiently “generic” operations – like accessing file systems and network sockets. To implement things like Py_IsInitialized would mean putting a Python interpreter into Miri; that is not going to happen. ;)

If you want to add Miri to your CI to ensure your test suite keeps working in Miri, please consult our README. That document is also a great starting point for any other questions you might have.

Miri is also integrated into the Rust Playground: you can select Miri in the “Tools” menu to check the code for Undefined Behavior.

If Miri complains about your code and you do not understand why, we are happy to help! The best place to ask for support is our Zulip stream. Questions are much easier to answer if you manage to reproduce the problem in a small self-contained bit of example code (ideally on the playground), but feel free to ask even if you do not know how to reduce the problem.

Helping Miri

If you want to help improve Miri, that’s awesome! The issue tracker is a good place to start; the list of issues is short enough that you can just browse through it rather quickly to see if anything piques your interest. The ones that are particularly suited for getting started are marked with a green label, but notice that even “E-easy” issues can require some amount of Rust experience – Miri is not a good codebase for your first steps in Rust. Another good starting point is to try to implement the missing bit of functionality that keeps your test suite from working. If you need any mentoring, just get in touch. :)

That’s it for now. I am totally blown away by how many people (and companies!) are already using and even contributing to Miri. This endeavor of re-shaping the way we approach correctness of unsafe code has been way more successful than my wildest dreams. I hope Miri can also help you to ensure correctness of your unsafe code, and I am excited for what the next year(s) of Miri development will bring. :D

Posted on Ralf's Ramblings on Jul 2, 2022.
Comments? Drop me a mail or leave a note on reddit!