part16.rs

#

Rust-101, Part 16: Unsafe Rust, Drop

use std::ptr;
use std::mem;
use std::marker::PhantomData;
#

As we saw, the rules Rust imposes to ensure memory safety can get us pretty far. A large amount of programming patterns can be written within safe Rust, and, more importantly, library code like iterators or threads can make use of the type system to ensure some level of correctness beyond basic memory safety.

However, there will still be programs that one cannot write in accordance with the borrow checker. And there will be cases where it may be possible to satisfy the compiler, but only at the cost of some run-time overhead, as we saw with RefCell - overhead which may not be acceptable. In such a situation, it is possible to use unsafe Rust: That's a part of the language that is known to open the gate to invalid pointer access and all other sorts of memory safety. Of course, unsafe also means "Here Be Dragons": You are on your own now.
The goal in these cases is to confine unsafety to the local module. Types like Rc and Vec are implemented using unsafe Rust, but using them as we did is (believed to be) perfectly safe.

Unsafe Code

As an example, let us write a doubly-linked list. Clearly, such a data-structure involves aliasing and mutation: Every node in the list is pointed to by its left and right neighbor, but still we will want to modify the nodes. We could now try some clever combination of Rc and RefCell, but this would end up being quite annoying - and it would incur some overhead. For a low-level data-structure like a doubly-linked list, it makes sense to implement an efficient version once, that is unsafe internally, but that can be used without any risk by safe client code.

#

As usually, we start by defining the types. Everything is parameterized by the type T of the data stored in the list. A node of the list consists of the data, and two node pointers for the predecessor and successor.

struct Node<T> {
    next: NodePtr<T>,
    prev: NodePtr<T>,
    data: T,
}
#

A node pointer is a mutable raw pointer to a node. Raw pointers (*mut T and *const T) are the Rust equivalent of pointers in C. Unlike references, they do not come with any guarantees: Raw pointers can be null, or they can point to garbage. They don't have a lifetime, either.

type NodePtr<T> = *mut Node<T>;
#

The linked list itself stores pointers to the first and the last node. In addition, we tell Rust that this type will own data of type T. The type PhantomData<T> does not actually store anything in memory - it has size zero. However, logically, Rust will consider a T to be present. In this case, Rust knows that data of type T may be dropped whenever a LinkedList<T> is dropped. Dropping has a lot of subtle checks to it, making sure that things can't go wrong. For this to work, Rust needs to know which types could potentially be dropped. In safe Rust, this can all be inferred automatically, but here, we just have a *mut Node<T>, and we need to tell Rust that we actually own such data and will drop it. (For more of the glory details, see this RFC.)

pub struct LinkedList<T> {
    first: NodePtr<T>,
    last:  NodePtr<T>,
    _marker: PhantomData<T>,
}
#

Before we get to the actual linked-list methods, we write two short helper functions converting between mutable raw pointers, and boxed data. Both employ mem::transmute, which can convert anything to anything, by just re-interpreting the bytes. Clearly, that's an unsafe operation and must only be used with great care - or even better, not at all. Seriously. If at all possible, you should never use transmute.
We are making the assumption here that a Box and a raw pointer have the same representation in memory. In the future, Rust will provide such operations in the standard library, but the exact API is still being fleshed out.

#

We declare raw_into_box to be an unsafe function, telling Rust that calling this function is not generally safe. This grants us the unsafe powers for the body of the function: We can dereference raw pointers, and - most importantly - we can call unsafe functions. (The other unsafe powers won't be relevant here. Go read The Rustonomicon if you want to learn all about this, but be warned - That Way Lies Madness.)
Here, the caller will have to ensure that r is a valid pointer, and that nobody else has a pointer to this data.

unsafe fn raw_into_box<T>(r: *mut T) -> Box<T> {
    mem::transmute(r)
}
#

The case is slightly different for box_into_raw: Converting a Box to a raw pointer is always safe. It just drops some information. Hence we keep the function itself safe, and use an unsafe block within the function. This is an (unchecked) promise to the Rust compiler, saying that a safe invocation of box_into_raw cannot go wrong. We also have the unsafe powers in the unsafe block.

fn box_into_raw<T>(b: Box<T>) -> *mut T {
    unsafe { mem::transmute(b) }
}

impl<T> LinkedList<T> {
#

A new linked list just contains null pointers. PhantomData is how we construct any PhantomData<T>.

    pub fn new() -> Self {
        LinkedList { first: ptr::null_mut(), last: ptr::null_mut(), _marker: PhantomData }
    }
#

This function adds a new node to the end of the list.

    pub fn push_back(&mut self, t: T) {
#

Create the new node, and make it a raw pointer. Calling box_into_raw gives up ownership of the box, which is crucial: We don't want the memory that it points to to be deallocated!

        let new = Box::new( Node { data: t, next: ptr::null_mut(), prev: self.last } );
        let new = box_into_raw(new);
#

Update other pointers to this node.

        if self.last.is_null() {
            debug_assert!(self.first.is_null());
#

The list is currently empty, so we have to update the head pointer.

            self.first = new;
        } else {
            debug_assert!(!self.first.is_null());
#

We have to update the next pointer of the tail node. Since Rust does not know that a raw pointer actually points to anything, dereferencing such a pointer is an unsafe operation. So this unsafe block promises that the pointer will actually be valid.

            unsafe { (*self.last).next = new; }
        }
#

Make this the last node.

        self.last = new;
    }
#

Exercise 16.1: Add some more operations to LinkedList: pop_back, push_front and pop_front. Add testcases for push_back and all of your functions. The pop functions should take &mut self and return Option<T>.

#

Next, we are going to provide an iterator. This function just creates an instance of IterMut, the iterator type which does the actual work.

    pub fn iter_mut(&mut self) -> IterMut<T> {
        IterMut { next: self.first, _marker: PhantomData  }
    }
}
#

What does the iterator need to store? Strictly speaking, all it needs is the pointer to the next node that it is going to visit. However, how do we make sure that this pointer remains valid? We have to get this right ourselves, as we left the safe realms of borrowing and ownership. Remember that the key ingredient for iterator safety was to tie the lifetime of the iterator to the lifetime of the reference used for iter_mut. We will thus give IterMut two parameters: A type parameter specifying the type of the data, and a lifetime parameter specifying for how long the list was borrowed to the iterator.

#

For Rust to accept the type, we have to add two more annotations. First of all, we have to ensure that the data in the list lives at least as long as the iterator: If you drop the T: 'a, Rust will tell you to add it back. And secondly, Rust will complain if 'a is not actually used in the struct. It doesn't know what it is supposed to do with that lifetime. So we use PhantomData again to tell it that in terms of ownership, this type actually (uniquely) borrows a linked list. This has no operational effect, but it means that Rust can deduce the intent we had when adding that seemingly useless lifetime parameter.

pub struct IterMut<'a, T> where T: 'a {
    next: NodePtr<T>,
    _marker: PhantomData<&'a mut LinkedList<T>>,
}
#

When implementing Iterator for IterMut, the fact that we have the lifetime 'a around immediately pays of: We would not even be able to write down the type Item without that lifetime.

impl<'a, T> Iterator for IterMut<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<Self::Item> {
#

The actual iteration is straight-forward: Once we reached a null pointer, we are done.

        if self.next.is_null() {
            None
        } else {
#

Otherwise, we can convert the next pointer to a reference, get a reference to the data and update the iterator.

            let next = unsafe { &mut *self.next };
            let ret = &mut next.data;
            self.next = next.next;
            Some(ret)
        }
    }
}
#

In next above, we made crucial use of the assumption that self.next is either null or a valid pointer. This only works because if someone tries to delete elements from a list during iteration, we know that the borrow checker will catch them: If they call next, the lifetime 'a we artificially added to the iterator has to still be active, which means the mutable reference passed to iter_mut is still active, which means nobody can delete anything from the list. In other words, we make use of the expressive type system of Rust, decorating our own unsafe implementation with just enough information so that Rust can check uses of the linked- list. If the type system were weaker, we could not write a linked-list like the above with a safe interface!

#

Exercise 16.2: Add a method iter and a type Iter providing iteration for shared references. Add testcases for both kinds of iterators.

#

Drop

The linked list we wrote is already working quite nicely, but there is one problem: When the list is dropped, nobody bothers to deallocate the remaining nodes. Even worse, if T itself has a destructor that needs to clean up, it is not called for the element remaining in the list. We need to take care of that ourselves.

#

In Rust, adding a destructor for a type is done by implementing the Drop trait. This is a very special trait. It can only be implemented for nominal types, i.e., you cannot implement Drop for &mut T. You also cannot restrict the type and lifetime parameters further than the type does - the Drop implementation has to apply to all instances of LinkedList.

impl<T> Drop for LinkedList<T> {
#

The destructor itself is a method which takes self in mutably borrowed form. It cannot own self, because then the destructor of self would be called at the end of the function, resulting in endless recursion.

    fn drop(&mut self) {
        let mut cur_ptr = self.first;
        while !cur_ptr.is_null() {
#

In the destructor, we just iterate over the entire list, successively obtaining ownership (Box) of every node. When the box is dropped, it will call the destructor on data if necessary, and subsequently free the node on the heap. We call drop explicitly here just for documentation purposes.

            let cur = unsafe { raw_into_box(cur_ptr) };
            cur_ptr = cur.next;
            drop(cur);
        }
    }
}
#

The End

Congratulations! You completed Rust-101. This was the last part of the course. I hope you enjoyed it. If you have feedback or want to contribute yourself, please head to the Rust-101 website fur further information. The entire course is open-source (under CC-BY-SA 4.0).

If you want to do more, the examples you saw in this course provide lots of playground for coming up with your own little extensions here and there. The index contains some more links to additional resources you may find useful. With that, there's only one thing left to say: Happy Rust Hacking!

#

index | previous | raw source | next