Apr 10, 2018 • Research, RustEditsPermalink

Safe Intrusive Collections with Pinning

In my last post, I talked about the new “pinned references” which guarantee that the data at the memory it points to will not, ever, be moved elsewhere. I explained how they enable giving a safe API to code that could previously only be exposed with unsafe, and how one could go about proving such a thing. This post is about another application of pinned references—another API whose safety relies on the pinning guarantees: Intrusive collections. It turns out that pinned references can almost be used for this, but not quite. However, this can be fixed by extending the guarantees provided by pinned references, as suggested by @cramertj.

The first part is going to explain how intrusive collections could benefit from Pin, were we to accept that additional guarantee. This part assumes some familiarity with the Pin API, but not with the formal model I introduced previously.

In the second part, I am going to briefly sketch a formal perspective on intrusive collections and the extended pinning guarantees. This builds on the formal notation I introduced in my last post.

Finally, I will discuss a variant of our “running example” intrusive collection that combines shared references and pinning. It turns out this variant is actually incompatible with the formal model from the last post, but the model can be extended to fix this. However, this extended model will actually call for a change to the Pin API (or rather, for a revert to an earlier version).

(Btw, I’m sorry for this blog post being even longer than the previous. I guess I am just enjoying that there is no page limit (like there is in papers) when writing blog posts, so I can just ramble as much as I like.)

1 Safe Intrusive Collections

Intrusive collections typically are linked data structures that embed the links in the data they contain. In the words of the intrusive-collections crate:

The main difference between an intrusive collection and a normal one is that while normal collections allocate memory behind your back to keep track of a set of values, intrusive collections never allocate memory themselves and instead keep track of a set of objects. Such collections are called intrusive because they requires explicit support in objects to allow them to be inserted into a collection.

To make this work safely, we better make sure that the “objects” (allocated and managed by the user of the collection) are not moved around or deallocated while they are part of the collection. The intrusive-collections crate realizes this by taking ownership of the objects. The crate also offers a more advanced API that works with borrowed data, but in this case the objects must outlive the entire collection. This means we cannot add short-lived objects, e.g. stack-allocated objects, to a collection that was created further up the call chain. How could we go about lifting that restriction?

1.1 An Example Collection

To get more concrete, consider the following example (originally proposed by @cramertj). This is not really an intrusive collections, but it has all the same properties and problems: The collection involves pointers to elements allocated by the user, and hence risks dangling pointers if they get moved or deallocated. (If I were to make this a proper intrusive linked list, the code would be twice as long and I would most definitely get the re-linking wrong. I will leave writing linked lists to others.)

#![feature(pin, arbitrary_self_types, optin_builtin_traits)]

use std::cell::{Cell, RefCell};
use std::mem::Pin;
use std::boxed::PinBox;
use std::marker::Unpin;

struct Collection<T> {
    objects: RefCell<Vec<*const Entry<T>>>,
}
impl<T> !Unpin for Collection<T> {}

struct Entry<T> {
    x: T,
    // set to Some if we are part of some collection
    collection: Cell<Option<*const Collection<T>>>,
}
impl<T> !Unpin for Entry<T> {}

impl<T> Collection<T> {
    fn new() -> Self {
        Collection { objects: RefCell::new(Vec::new()) }
    }

    // Add the entry to the collection
    fn insert(mut self: Pin<Self>, entry: Pin<Entry<T>>) {
        if entry.collection.get().is_some() {
            panic!("Can't insert the same object into multiple collections");
        }
        // Pointer from collection to entry
        let this : &mut Self = unsafe { Pin::get_mut(&mut self) };
        this.objects.borrow_mut().push(&*entry as *const _);
        // Pointer from entry to collection
        entry.collection.set(Some(this as *const _));
    }
    
    // Show all entries of the collection
    fn print_all(self: Pin<Self>)
    where T: ::std::fmt::Debug
    {
        print!("[");
        for entry in self.objects.borrow().iter() {
            let entry : &Entry<T> = unsafe { &**entry };
            print!(" {:?},", entry.x);
        }
        println!(" ]");
    }
}

impl<T> Drop for Collection<T> {
    fn drop(&mut self) {
        // Go through the entries to remove pointers to collection
        for entry in self.objects.borrow().iter() {
            let entry : &Entry<T> = unsafe { &**entry };
            entry.collection.set(None);
        }
    }
}

impl<T> Entry<T> {
    fn new(x: T) -> Self {
        Entry { x, collection: Cell::new(None) }
    }
}

impl<T> Drop for Entry<T> {
    fn drop(&mut self) {
        // Go through collection to remove this entry
        if let Some(collection) = self.collection.get() {
            let collection : &Collection<T> = unsafe { &*collection };
            collection.objects.borrow_mut().retain(|ptr| *ptr != self as *const _);
        }
    }
}

fn main() {
    let mut collection = PinBox::new(Collection::new());
    let mut entry = PinBox::new(Entry::new(42));
    collection.as_pin().insert(entry.as_pin());
    collection.as_pin().print_all(); // Prints "[ 42, ]"
    drop(entry); // Dropping the entry removes it
    collection.as_pin().print_all(); // Prints "[ ]"
}

This is lots of code. The high-level picture is as follows: The collection is represented by a Vec of raw pointers pointing to the entries. Every entry has a “parent” pointer self.collection back to the main collection. Inserting an entry into a collection involves adding it to the Vec and setting the collection pointer of the entry. When an entry is dropped, if it is in a collection, it gets removed from the Vec. When the collection is dropped, all the entries have their collection pointer reset to None.

The example code in main demonstrates how the collection could be used: First we add an entry containing 42 to the collection. This is possible despite there being no guarantee that the entry will outlive the collection. print_all shows that it is there. Then we drop the entry while the collection still exists, and we can see it has vanished from the collection as well.

Notice that using Pin in the insert method above is crucial: If the collection of the entry were to move around, their respective pointers would get stale! This is fundamentally the same problem as SelfReferential in my previous post, and Pin helps. Thanks to this guarantee, and unlike in the intrusive-collections crate, insert can be called with entries that do not outlive the collection. With an API for stack pinning, we could even have put the entry in main on the stack.

1.2 Pinning Is Not Enough

However, it turns out pinning as described so far is not enough to guarantee safety. Imagine we added the following method to PinBox:

fn box_deallocate<T>(x: Box<T>) {
  let content = *x; // move out of the box
  mem::forget(content); // do not call drop
  // Implicitly deallocate the box
}

impl<T> PinBox<T> {
  // Deallocate the box without calling drop
  fn deallocate(self) {
    box_deallocate(self.inner)
  }
}

box_deallocate is obviously safe on Box (we do not use any unsafe code). Moreover, from what I said so far, it is also safe to add to PinBox: While, technically, the data does get moved inside box_deallocate, that’s just a NOP. We essentially perform a memcpy (the move in box_deallocate) followed by a mem::forget, so nothing really happens aside from calling free on the Box itself.

Why is this a problem? Well, we can now add an entry to a collection and then deallocate the entry without calling drop. This leads to a dangling pointer inside the collection, which is obviously bad:

fn main() {
    let mut collection = PinBox::new(Collection::new());
    let mut entry = PinBox::new(Entry::new(42));
    collection.as_pin().insert(entry.as_pin()); // Register entry in collection
    entry.deallocate(); // Now the collection still points to the entry
    collection.as_pin().print_all(); // And now everything explodes
}

Now, PinBox::deallocate is pretty artificial, but in fact the API for stack pinning that is drafted in the RFC, together with ManuallyDrop, make it possible to obtain a Pin<T> to a stack-allocated T and subsequently pop the stack frame without calling drop on the T. That has the same effect as PinBox::deallocate: It renders our collection interface unsound. The concern about dropping pinned data is real.

1.3 Requiring Drop To Be Run

@cramertj has suggested to fix this situation by adding a new guarantee to Pin: If the memory a Pin<T> points to is ever deallocated or otherwise invalidated, drop has been called on it. This just changes the way we have to think about pinned data, but does not affect the API at all.

Combining that with the guarantee about moving, we can summarize Pin as follows:

Given a Pin<'a, T>, the data it points to will remain at the given location until it is dropped. In particular, if it is never dropped, it will remain at the given location forever.

Under our new rules, PinBox::deallocate is bogus, and our collection works again.

And not just our artificial little example collection would benefit from this guarantee. A proper intrusive collection would have the same problem: When an entry in an intrusive doubly-linked list gets deallocated, we better make sure that we patch the pointers from the neighboring elements or else they will soon be dangling. Moreover, this has applications not just for container data structures but also for GC integration: When a stack-allocated root has been registered with the GC, we have to make sure it gets deregistered before the stack frame pops. In fact, the trick of using ManuallyDrop to cause havoc has originally been discovered in the context of GC integration. If Pin obtains the guarantee described above, a GC API can just take something along the lines of Pin<GcRoot<T>>, relying on the fact that GcRoot::drop will get called before this memory gets deallocated.

1.4 Some Common Objections

I hope I convinced you that it is desirable to guarantee that drop is run on pinned references. Before we come to the formal part of this post, let me go over some possible objections to this extended guarantee. In fact, I have been opposed to it initially myself, and went to the Berlin All-Hands arguing against it, but @cramertj convinced me otherwise. So I will just reiterate my own objections and how I feel about them now.

The first objection is: Haven’t we had this discussion already? Leakpocalypse happened back in 2015, and ever since then everyone knows that in Rust, you can leak memory, period. However, even if we accept this proposal leaking pinned data is still allowed! The guarantee says that if memory gets deallocated or otherwise invalidated, we will call drop. So, calling mem::forget on a PinBox is fine. And indeed, that’s fine for our collection as well—the entry will just stay in the collection, but the pointers to it will also remain valid. So, this proposal does not backpedal on memory leak guarantees in Rust. Both mem::forget and ManaullyDrop remain sound. What would not be sound is adding a way to obtain a Pin reference into a ManuallyDrop. In the formal part of this post, we will see that we can express the new guarantee without resorting to “linear reasoning”, which is reasoning that forbids resources to not get used. (Linear reasoning is typically used to prove properties like memory-leak-freedom.)

Okay, so maybe this is much weaker than leak-freedom and we have some good reasons to want such a limited drop-guarantee, but why should this be coupled together with Pin? Pinning and calling drop are entirely orthogonal concerns! Well, this is certainly true for general leak-freedom. However, the guarantee we are after here is that drop will be called if this memory ever gets deallocated. So, the guarantee is tied to a particular spot in memory—a concept that only makes sense if data is pinned to begin with! While Pin without the drop-guarantee makes sense (and is useful, e.g., for async IO), the drop-guarantee really only makes sense for pinned data. Given that async IO is not bothered by this additional guarantee (it doesn’t want to do anything that would violate the guarantee), it seems preferable to have just one notion of pinned data as opposed to two (one where drop will be called, and one where it may not be called). In fact, as we will see in the formal part, the way I have set up formal reasoning about pinning last time, we would have to do extra work to not get this guarantee!

The only remaining downside is that the more ergonomic stack pinning API proposed in the RFC becomes unsound, and we have to use a less ergonomic closure-based API instead. For now, that seems worth it; if one day we decide that pinning ought to be more ergonomic, we could have a built-in type of &pin references with ergonomic stack pinning enforced by the compiler.

2 The Formal Perspective

In this second part of the post, we are going to try and apply the formal methodology from the previous post to the intrusive collection example above. I am going to assume that you have read that post.

2.1 The Intrusive Collection Invariant

Remember that, in my model, every type comes with three separate invariants, each matching a “mode” or “typestate” the type can be in: T.own, T.shr, and T.pin. The key point we are going to exploit here is that T.pin is a predicate on pointers, and it is the job of the type T to say how the memory behind that pointer is managed. In contrast, T.own is a predicate on lists of bytes, and whoever holds an instance of a type will typically assume something along the lines of: I own some memory containing some bytes, and those bytes form a valid T. Formally:

exists |bytes|, ptr.points_to_owned(bytes) && T.own(bytes)

Given such an assumption, we are free to just forget the T.own part, grab ownership of the memory, and use it for our purposes. We could even deallocate it. In fact, this is exactly what happens in the box_deallocate function I wrote above—and as we have seen, it is a disaster for intrusive collections.

With T.pin, we have more freedom to choose our invariant. T.pin does not have to say that it owns the memory the pointer points to—and for Entry, this is exactly what we are going to do.

But let us start with Collection: The idea is that a pinned Collection is going to own the memory of all the entries it contains. All these raw pointers in the Vec point to memory we own and containing an Entry<T>, i.e., a pair of a (pinned) T and a raw pointer back to the collection.1 Actually saying this fully formally would require us to talk about the RefCell and the Vec in there, which we do not want to do, so the following is a very rough sketch:

Collection<T>.pin(ptr) := exists |entries: List<Pointer>|
  ptr.points_to_owned(RefCell::new(Vec::from(entries))) &&
  entries.all(|entry| T.pin(entry.offset_to_field(x)) &&
    entry.offset_to_field(collection).points_to_owned(Cell::new(Some(ptr)))
  )

Notice how we iterate over the elements of the list and make sure that we own whatever memory is to own there. (I love how all really exists for iterators so I can express quantification over lists without any hassle. :)

This invariant already justifies print_all: All the entries we are going to see there are in the list, so we have their T.pin at our disposal and are free to call Debug::fmt.

So, what will Entry say in its pinned invariant? Essentially, there is a case distinction: If collection is None, we own the memory, otherwise all we know is that we are inside that collection.

Entry<T>.pin(ptr) := exists |collection: Option<Pointer>|
  match collection {
    None => T.pin(ptr.offset_to_field(x)) &&
      ptr.offset_to_field(collection).points_to_owned(Cell::new(None)),
    Some(collection) => entry_owned_in_collection(ptr, collection)
  }

The entry_owned_in_collection here does not only assert that this entry is a part of that collection, it also asserts ownership of that membership, which means that nobody else can remove the entry from the collection. This is our first example of “fictional ownership”: Ownership not of part of the memory, but of some made-up resource that has no resemblance in the actual machine. Fictional ownership is an extremely powerful concept—it forms the very basis of the mathematical framework we use in RustBelt. However, actually doing this formally is well beyond the scope of this post.2

This justifies Entry::drop: If we enter the if let, we know we are in the Some branch, so we will for sure find ourselves in that collection. To remove ourselves, we have to give up ownership of entry_owned_in_collection, but in exchange we gain the points_to_owned and the T.pin for the entry which the collection now needs to longer.

This is the inverse of what happens in Collection::insert: After we made sure that the entry is in the None branch, we can take its ownership and put it into the collection. In exchange, we obtain an entry_owned_in_collection, which can put back into the entry. This means, when we return, the entry is fully valid in the Some branch. That’s crucial, because it’s just a reference and hence borrowed—we have to restore all the invariants before we return!

2.2 Requiring Drop To Be Run

So, how is the new drop guarantee we have been talking about reflected in the formal model? First of all, notice that—due to the nature of T.pin taking a pointer, as discussed above—we can already not justify our fictional PinBox::deallocate. The PinBox owns a T.pin(ptr), which it is free to throw away, but it has no way to just gain access to the underlying memory covered by the pointer and deallocate it! So, the question is not actually how we can enforce the guarantee, the question is how we can justify the correctness of dropping a PinBox.

Now, I should make clear that we are entering uncharted territory here. RustBelt does not model drop. So, I am going to take some guesses here, but these are purely based on my intuition and not at all backed by formal proofs. My intuition has been wrong before.

With the disclaimer out of the way, let us look at drop. First of all, drop is not a normal function. If we could just call it, we could write code like

let mut v = vec![0];
Vec::drop(&mut v);
v.push(1); // BOOM!

Vec::drop leaves the vector in a bad state, and we must not call any normal method on it again afterwards. So, a correctness proof for drop is clearly going to be different from a correctness proof for a normal method that takes an &mut. Before drop, we can assume that all the invariants of the type are satisfied; after drop, all we know is that the fields of the vector are themselves ready for drop. In the case of Vec, none of the fields implements Drop, so let us focus on that special case:

Definition 1: drop, special case where no field implements Drop. The drop function must satisfy the following specification (or contract): T::drop(ptr) is safe to call whenever T.pin(ptr) holds, and when drop returns, we own the memory ptr points to.

We write these specifications as “Hoare triples” that consist of a precondition between double curly braces, followed by the code that is executed, followed by the postcondition in double curly braces:

{{ T.pin(ptr) }}
  T::drop(ptr)
{{ exists |bytes| ptr.points_to_owned(bytes) && bytes.len() == mem::size_of<T>() }}

(The usual notation just uses single curly braces, but with me using Rust syntax here that looks too much like a block.)

This definition is deliberately narrow; drop is complicated and as I mentioned I don’t feel ready to fully specify it. But we can already see the interaction between the pinned typestate and dropping: If we move to the pinned typestate, we can call drop to re-gain ownership of the memory backing the T. This is what happens when a PinBox<t> gets deallocated: It first calls T::drop on the contents (which it can do, because it as T pinned and hence the precondition is satisfied), obtaining ownership of the memory as expressed in the postcondition above, and then uses that ownership to deallocate the memory.

Let us look at Entry<T> as an example drop implementation, and see how we can justify that it satisfies the specification in definition 1. We proceed by case distinction on Entry<T>.pin:

  1. We are in the None branch. Then we already own the memory for the collection field, and calling T::drop on x is guaranteed to give us the ownership of the first field as well. Overall, we thus own all the memory backing the entry, so we can satisfy the postcondition.
  2. We are in the Some branch. As already discussed, we then give up ownership of entry_owned_in_collection to remove ourselves from the collection, obtaining ownership of our own memory. Then we proceed as above.

Noticeably, we did not make any kind of reasoning along the lines of “all memory will be deallocated eventually”. There is no talking about leaks. We are just concerned with safety here: When memory is pinned, it is only safe to deallocate after drop has been called.

One important observation is that if we did not remove the entry from the collection, we would be unable to satisfy the postcondition! This matches the fact that our entire collection would be unsound if we removed the Entry::drop. In other words, if we do not implement Drop, this actually incurs a proof obligation! We have to show that not doing anything can turn the precondition T.pin(ptr) into the postcondition exists |bytes| ptr.points_to_owned(bytes) && bytes.len() == mem::size_of<T>(). This is the part that would go wrong if we were to remove Entry::drop. It seems rather funny that not implementing a trait incurs a proof obligation, but there’s also nothing fundamentally wrong with that idea.

Coming back to the general case, it seems quite curious to tie drop to pinning like that. What about dropping things that are not pinned? The answer is that it is always legal to start pinning something that we fully own. Formally speaking, we can always use axiom (b) from my last post to briefly pin it, and then call drop. In other words, we can derive the following Hoare triple:

{ exists |bytes| ptr.points_to_owned(bytes) && T.own(bytes) }
  T::drop(ptr)
{ exists |bytes| ptr.points_to_owned(bytes) && bytes.len() == mem::size_of<T>() }

Using axiom (b), the precondition of this triple implies the precondition of the one in definition 1. That’s good enough to show that this second specification for drop has to hold if the first holds. So, if I use an arbitrary type while being oblivious to pinning, I can keep using the drop specification above. This matches the locality of pinning that I described last time: Code that does not do pinning, does not have to care.

To complete the locality story, we have to think not only about using some arbitrary T, but also about defining a T without caring about pinning. As discussed last time, we will then pick T.pin to just piggy-back on T.own:

T.pin(ptr) := exists |bytes| ptr.points_to_owned(bytes) && T.own(bytes)

For such T, the two different specifications for drop that I showed above are actually equivalent. Such types can prove their drop correct with respect to the T.own-based specification and automatically satisfy the pinning guarantees. In particular, if they do not implement Drop, this specification holds trivially.3 The funny proof obligation incurred by not implementing Drop only arises for types that care about pinning.

3 What About Shared References?

Finally, we come to the part where my current model is not good enough. If you paid really, really close attention to my example above, you may have been irritated by the type of insert:

    fn insert(mut self: Pin<Self>, entry: Pin<Entry<T>>)

@cramertj originally proposed to use a slightly different type instead:

    fn insert(mut self: Pin<Self>, entry: &Pin<Entry<T>>)

Notice the shared reference on the entry argument. Since all the mutation we perform there happens inside a Cell, why shouldn’t we provide this function for shared pinned data?4

That’s a good question! It seems perfectly fine to change insert to take &Pin<Entry<T>>. I can’t come up with any counter-example. However, the formal model also cannot justify this variant of insert: Our model as defind previously defines Pin<'a, T>.shr in terms of T.shr. That made it easy to justify Pin::deref. However, as a consequence, Pin<'a, T>.shr and (&'a T).shr are literally the same invariant, and hence &Pin<T> and &&T are the same type. We could literally write functions transmuting between the two, and we could justify them in my model. Clearly, insert taking entry: &&T would be unsound as nothing stops the entry from being moved later:

fn main() {
    let mut collection = PinBox::new(Collection::new());
    let entry = {
      let mut entry = Box::new(Entry::new(42));
      collection.as_pin().insert(&&*entry); // Register boxed entry in collection
      *entry // Move entry out of the Box, and drop the box
    };
    // We keep the entry so it does not get dropped.
    // The collection still points to the now deallocated box!
    collection.as_pin().print_all(); // Fireworks happen
}

This shows that the version of insert that takes a shared reference cannot be sound in the model.

Notice that this is a consequence of a choice I made when building the model, namely the choice to define Pin<'a, T>.shr in terms of T.shr. This does not show that &Pin<T> and &&T have to be the same type given the public API and the contract they provide. Different choices in the model could lead to a different situation. The problem is, how else could we define Pin<'a, T>.shr, if we do not want to use T.shr? What is the invariant of a shared reference to a pinned reference?

I do have an idea for how to answer this question: Introduce a fourth “mode” or typestate, the “shared pinned” state, with an accompanying invariant. However, I previously argued strongly against introducing such a fourth state, on the grounds that three typestates is already complicated enough. In fact, an earlier version of the Pin API used to have two kinds of pinned references (shared and mutable) reflecting the two distinct “shared pinned” and “(owned) pinned” typestates. The shared variant got subsequently removed, and I feel somewhat guilty about that now because I strongly pushed for it. It seems, after all, that the fourth typestate arises anyway.

We have to make a decision here, before stabilizing Pin: Do we want insert with a shared reference to be legal? Or do want to declare that &Pin<T> and &&T are interchangeable types? One may argue that adding an entry to a collection logically changes that entry, so really this should require an exclusive pinned reference. However, given that the changed collection API seems entirely safe to use, I have to admit that’s a somewhat weak argument. If the discussion so far is any indication, people want to write such APIs and will probably not be happy if we tell them that’s illegal because there is a way to convert between &Pin<T> and &&T. After all, that conversion seems to have no tangible benefit other than satisfying my model.

For these reasons, I now think that removing the shared pinned reference may have been a mistake: It turns out that, if we want to allow an intrusive collection like the above that works with &Pin, we need the fourth typestate anyway. At that point we might as well have a type reflecting it, avoiding the confusion and the double indirection that comes with &Pin. I hope we do not end up in a situation where insert with a shared reference is legal, but we do not have a type for shared pinned references. That just seems like the worst of both worlds.

However, now we already have a version of the futures crate using the revised Pin, so I don’t know if changing it again is realistic. :/ (Update: Seems like there may be breaking changes in future future versions anyway, so maybe the ship has not yet sailed after all. /Update) I feel bad about that. Can we still fix this before everything gets stabilized? Others have argued for a shared pinned reference after it got removed from the API, and I argued against them as I did not understand how it could be useful. Thanks for not being convinced by me!

However, there is one strange aspect to this “shared pinned” typestate: Due to Pin::deref, we can turn a “shared pinned” reference into a shared reference. We can go from &Pin<T> to &&T. In other words, the shared pinned typestate invariant must imply the invariant for the (general, unpinned) shared typestate. The reason for Pin::deref to exist is mostly a rustc implementation detail related to using Pin<Self> as the type of self, but this detail has some significant consequences: Even with a separate shared pinned typestate, we can still not define RefCell in a way that a pinned RefCell pins its contents. In particular, we cannot have a method like fn get_pin(self: Pin<RefCell<T>>) -> Pin<T>. Removing Pin::deref (or restricting it to types that implement Unpin) would solve this issue. I spelled out the details in the RFC issue. So, if we want to declare that shared pinning is a typestate in its own right—which overall seems desirable—do we want it to be restricted like this due to an implementation detail of arbitrary self types?

Update: @Diggsey points out that we can still have a PinRefCell with a method like fn get_pin(self: Pin<PinRefCell<T>>) -> Pin<T>, as long as the PinRefCell never gives out mutable references. So it turns out that combining interior mutability and pinning should work fine. Later, @glaebhoerl suggested we can even combine RefCell and PinRefCell into one type if we dynamically track the pinning state. /Update

4 Conclusion

Leaving aside the last part about shared pinning, I am really excited how all of this fits together so nicely. Back when I was learning Rust, I remember thinking how intrusive collections with entries on the stack (as they are commonly used, for example, in the Linux kernel) unfortunately cannot be given a safe interface in Rust. I am happy to learn that I was wrong! I am impressed by the creativity that went into coming up with these APIs, and looking forward to analyzing more of them in the future.

The situation around shared pinning is still open, and it seems we need to have a discussion about what model we ultimately want to adopt—which code we ultimately want to be legal—and whether we want to change the Pin types before stabilization.

Anyway, as usual, please let me know what you think!

Update: Years later, I finally realized that there still is a major problem with intrusive collections – it is basically impossible to have a by-reference iterator! With the collection not actually owning the memory the elements are stored in, it is basically impossible to guarantee that during the iteration, elements do not get removed from the collection, which would invalidate the reference. However, for many use-cases (like an intrusive queue) it is sufficient to just support by-value operations such as “enqueue” and “dequeue” over such a collection, and those can be given an entirely safe interface. If iteration is required, then more advanced techniques to control sharing seem to be needed. /Update

Footnotes

  1. The fact that the inner T are pinned here is an arbitrary choice. We could also make them fully owned. This is a choice every collection has to make. Of course, if they decide to go for pinning, they have to uphold all the pinning guarantees, including calling destructors—which, for example, VecDeque does not

  2. For example, for fictional ownership to actually work, we would have to extend Collection<T>.pin to be set up as “the other end” of this ownership relation. With fictional ownership, there is always some invariant somewhere that ties it to “less fictional” ownership. Think of how a check is just fictional money that is “tied to” real money by the bank, and how “real” money used to be tied to some actual value, namely gold, by the central reserve. Similarly, Collection<T> owns the “real thing”, the memory, and then ties it to the fictional entry_owned_in_collection

  3. This uses the fact that T.own(bytes) implies bytes.len() == mem::size_of<T>(). This is another axiom that has to hold for all types, one that did not come up in previous discussion. 

  4. In my version, the collection itself is also using RefCell, which I believe @cramertj just forgot to do—the write access from Entry::drop would be unsound otherwise. However, it seems more natural to ask the collection itself to be mutable for insertion, not exposing the fact that we use interior mutability. 

Posted on Ralf's Ramblings on Apr 10, 2018.
Comments? Drop me a mail or leave a note in the forum!