Self-referential structs are infamously painful to implement, and, besides something like an arena allocator, are impossible without messy or unsafe code. Awkward data structure juggling to satisfy the borrow checker, redundant runtime checks to delay borrow checking, lots of unwraps and pinky promises, or outright unsafe code that can be a pain to validate - it's usually just much better to rethink your data structures.

However, sometimes you have to store both owned and dependent data at once. The borrow checker can reason within the scope of a single function (and, thanks to traits like Send and Sync, even across asynchronous program flow to some extent), but it falls short when borrowed data needs to persist across function call boundaries. One common place where this becomes apparent are Iterator implementations.

In this article, I'll walk you through one through one such iterator implementation. I'll show where I ran into issues when implementing this myself, explain why we're fighting the borrow checker and what it's protecting us from, and then circle back to the final implementation, where we safely construct and use a simple self-referential type.

Writing a Database Iterator Wrapper

Let's look at rusqlite, an SQLite library, and how it handles prepared statements and queries. You start with a Connection object, where you call .prepare(&self, ...) to get a Statement<'_>, and then you can call .query(&mut self, ...) to get Rows<'_> and iterate over them.

// Every call is fallible, hence the `?`

// Our DB connection
let conn: Connection = Connection::open(path)?;
// A prepared statement with placeholders
let stmt: Statement<'_> = conn.prepare("SELECT name FROM players WHERE session_id = ?")?;
// The result of an executed statement - an iterator over rows
let rows: Rows<'_> = stmt.query(("123abc",))?;

for r in rows {
	// Here we can finally iterate over individual rows
	dbg!(r);
}

Quick note: all code here is just for demonstration. I tested most of it, but it's possible I forgot something somewhere.

Rows borrows from Statement, which borrows from Connection. While we're using Rows, we need to hold the Statement instance somewhere. While the Statement is being used, we need to keep our Connection object alive. This isn't a problem within the scope of a single function, since the borrow checker simply restricts the use of Connection and Statement until we're done using the Rows object. Now, say we wanted to implement a wrapper for our query, which would let us iterate over rows with an Iterator...

pub struct PlayerNameIter {
	// TODO
}

impl PlayerNameIter {
	pub fn new(conn: &Connection, session_id: &str) -> PlayerNameIter {
		todo!()
	}
}

impl Iterator for PlayerNameIter {
	type Item = String;

	fn next(&mut self) -> Option<Self::Item> {
		todo!()
	}
}

Here's a standard iterator implementation with a bunch of placeholders. We want to give it access to a database connection, tell it what session ID to use in the query, and it should yield items as we ask for them. Since we don't want to retrieve all the rows at once, we need an Iterator, which only gets the next item when (and if) we ask for it.

Lifetimes and borrows

Rather than iterating over rows in a for loop (as we did in the earlier example), we need to store the Rows object in the iterator, and only ask for the next row whenever next() is called. We need to ensure that the originating Statement remains alive while Rows is in use, ditto for Connection and Statement. Since we're passing in the Connection by reference, we can solve the second part with a lifetime on the struct - after all, it makes sense that our iterator over the database can only live as long as the database connection.

pub struct PlayerNameIter<'a> {
	// TODO
}

impl<'a> PlayerNameIter<'a> {
	pub fn new(conn: &'a Connection, session_id: &str) -> PlayerNameIter<'a> {
		todo!()
	}
}

However, the other part isn't as easy...

pub struct PlayerNameIter<'a> {
	// We need to hold both Rows and their originating Statement
	// Both live as long as the struct,
	//   which can only live as long as the Connection
	stmt: Statement<'a>,
	rows: Rows<'a>,
}

impl<'a> PlayerNameIter<'a> {
	pub fn new(conn: &'a Connection, session_id: &str) -> rusqlite::Result<PlayerNameIter<'a>> {
		// Note: the Result is needed since each call is fallible
		
		// Statement borrows from Connection
		let stmt = connection.prepare("SELECT name FROM players WHERE session_id = ?")?;
		
		// Rows borrows from Statement
		let rows = stmt.query((session_id,))?;
		
		Ok(PlayerNameIter { // [E0515]: cannot return value referencing local variable `stmt`
			stmt, // [E0505]: cannot move out of `stmt` because it is borrowed
			rows,
		})
	}
}

If we only borrowed from Connection and stored that value, that would have been fine. The caller only gives us a reference to Connection, and so they hold onto the Connection for us. Our struct's lifetime is what prevents the caller from moving, dropping, or mutating the Connection while the iterator is alive. If we only had to store Statement, this would have worked fine. However, we need to store both Statement and Rows, where the latter also borrows from the former, creating a chain of borrows: Connection > Statement > Rows.

Pointers and Moves

Don't curse out the borrow checker just yet, let's look at a little example to demonstrate what makes pointers/references (and self-referential data) dangerous. The borrow checker might not be as advanced with its reasoning as we'd like it to be here, but it is protecting us from something very important.

Look at this struct with a field and a reference, where we want x_ref to point to the struct's own x.

#[derive(Debug)]
struct NumWithRef<'a> {
	x: usize,
	x_ref: &'a usize,
}

Now, this type is self-referential, and we can even construct it without unsafe.

fn main() {
    let tmp = 0usize; // We need something temporary for x_ref
    let mut r = NumWithRef { x: 4, x_ref: &tmp };
    r.x_ref = &r.x; // Now, `r` borrows from itself!
    
    println!("{r:?}"); // NumWithRef { x: 4, x_ref: 4 }
}

However, we run into issues the moment we try to move the object...

let mut v = vec![];
// push() takes items by value - we're moving `r` into it, and losing ownership
v.push(r); // [E0505]: cannot move out of `r` because it is borrowed

Annoying! Because r is borrowing from itself, we cannot move it, since it is still borrowed (by itself). A self-referential value can exist, but becomes impossible to move (or even use &mut self methods on!)

But it makes total sense when you look at what's happening in memory. NumWithRef consists of two parts: a usize number (x), and a pointer (x_ref). x is a simple value in and of itself, and moving it around does not change its meaning. However, x_ref points to a specific place in memory, and expects a valid value to be there (if it weren't, we might be pointing to deallocated memory, memory that doesn't belong to us, or completely garbage data!) Moving a pointer is also fine, but moving the thing it points to is the main problem here. In this instance, r points to itself, so by moving it to a different place in memory, r.x ends up at a different address, but r.x_ref still points to the old location (in this case, the old location is on the stack, while the new location is somewhere on the heap).

ℹ️ Move constructors
C++ has move constructors (and move assigns), which give you a chance to run code to fix up self-referential pointers whenever your data type is moved around. However, this can also lead to even more confusing code, and it's easy to forget something or do it incorrectly. More powerful, but a deadly footgun.

Rust describes this behavior with lifetimes. If we have an array in memory, and get a slice to its data, we borrow from the array object. We may use the borrowing slice to access the underlying data, but as soon as we want to move the array (that is, invalidate the data, possibly due to deallocating memory), our slice becomes unusable. In lifetime terms, because the slice can only live as long as the array, when we give away ownership of the array (by moving or destroying it), the slice is no longer alive. This has additional behavior when it comes to mutability (where a mutable borrow invalidates all other borrows), but what matters for our iterator implementation is: we need to be extra careful around moving data that we're borrowing from, which can make self-referential data types painfully tricky.

Tangent: Pinning

Well, if the problem is that the data we're pointing to cannot move, why don't we just prevent it from moving? Good point! That's what pinning is about, and the standard library implements the Pin<T> wrapper type for it. I won't go into much detail here, but, in brief, the wrapper type ensures that its value won't be moved out of (or that the type is always safe to move around - that is, it implements Unpin). You'll commonly see the pin! macro, which pins a value locally (often on the stack), or Box::pin (or one of its variants), which pins a value on the heap.

Pinning is very important for Futures, for instance, since they're effectively a code block broken up into a state machine. This way, Rust can implement async without requiring heap allocations, which is critical for environments like microcontrollers.

Also, regarding the earlier example - while we couldn't move r, we could still at least pass &r around instead. Trying to heap-allocate the struct still wouldn't fix the issue, since the Box would once again borrow from itself.

ℹ️ More traits
Similar to Send/Sync, there's talk about introducing more marker traits, such as Move, Destruct, and Forget. They would annotate types that are safe to move, that have a destructor, and that are safe to "forget" (i.e. leak or otherwise skip running their destructor).
You can already find Destruct and Freeze in the standard library as experimental marker types.

self_cell

Okay, back to our iterator implementation. Now that we've looked at why references and moves are tricky, let's take a look at how we could implement this correctly. As the header suggests, there's a useful little crate called self_cell that solves this exact problem.

ℹ️ Ouroboros
For more advanced use cases, there's ouroboros. It's capable of defining far more complex self-referential types, but it also carries the heft of a proc macro, among other things. self_cell is fairly straightforward: it lets us define a type that pairs up an owner and a dependent value. More than enough for our use case.

Let's start with the self-referential part. It's not a good idea to expose such types as public API, so this will be a private struct that is only responsible for storing our data.

self_cell!(
    struct PlayerNameIterCell<'a> {
        owner: MutBorrow<Statement<'a>>,
        #[covariant]
        dependent: Rows,
    }
);

self_cell provides a simple declarative macro, where we can specify an owner type - what we're borrowing from, and what we need to keep alive - and a dependent type - the main thing that we'll be using afterwards. We need our Rows during iteration, so we're asking PlayerNameIterCell to safely hold Statement for us somewhere, and ensure it doesn't move. Since Rows borrows from Statement mutably, we also use the self_cell::MutBorrow<T> wrapper type, which ensures that we can only access owner during initialization (and exactly once).

self_cell types have a two-step initialization process. First, the cell moves the owner value onto the heap, where it will remain for the lifetime of the cell. Then, it gives us access to the owner value - now safely pinned on the heap - through a closure, where we may construct our dependent value. Once our closure returns, the dependent value also ends up on the heap (in a pre-allocated spot next to the owner). With both our values on the heap, we don't have to worry about moves invalidating any references. The rest of the cell's API ensures that we can only ever access our owner and dependent values in a safe manner (where MutBorrow also disallows any access to owner after initialization).

Basically, this is a Box with late-initialization for a dependent value (and a safe API).

Why not just use a pinned Box?
I alluded to this solution earlier - in short, more borrow checker issues. Pin<T> is great for implementing methods on Pin<&mut Self> (see Future::poll), but if we try to borrow from a Pin<Box<T>>, the borrow checker will still think that we're borrowing from a value that we then want to move/mutate. There might be more to it, but I'm not yet well-versed in the arcane arts of pinning.
Anyway, that's why we use safe_cell here! It essentially does this via some unsafe magic while guaranteeing that the implementation is correct.
pub struct PlayerNameIter<'a>(PlayerNameIterCell<'a>)

impl<'a> PlayerNameIter<'a> {
	pub fn new(conn: &'a Connection, session_id: &str) -> rusqlite::Result<PlayerNameIter<'a>> {
		// Note: the Result is needed since each call is fallible
		
		// Statement borrows from Connection
		let stmt = connection.prepare("SELECT name FROM players WHERE session_id = ?")?;
		
		// Instead of constructing the Rows object here,
		//   we move it into the cell first, and call query()
		//   within the context of its builder closure
        let cell = PlayerNameIterCell::try_new(MutBorrow::new(stmt), |stmt| {
	        // If this returns an error, it is propagated outwards
            stmt.borrow_mut().query((session_id,))
        })?;
        // cell is safe to move, since owner data is pinned on the heap
        Ok(PlayerNameIter(cell))
	}
}

Success! This compiles! Now we just need the Iterator implementation...

impl<'a> Iterator for PlayerNameIter<'a> {
    type Item = String;

    fn next(&mut self) -> Option<Self::Item> {
        self.0.with_dependent_mut(|_, rows| {
	        // Silence any DB errors
            let Ok(row) = rows.next() else {
	            return None;
            };
            // row is now an Option, so we just take
            //   the first element as a String
            row?.get_unwrap(0)
        })
    }
}

And we're done! The self_cell library is thoroughly tested to be safe, so we managed to bypass the borrow checker without exposing ourselves to the dangers of unsafe.