An example we can build upon

In this example we will create our own stack and make our CPU return out of it's current execution context and over to the stack we just created. We will build on these concepts in the following chapters (we will not build upon the code though).

Setting up our project

First let's start a new project in a folder named "green_threads". Run:

cargo init

We need to use the Rust Nightly since we will use some features that are not stabilized yet:

rustup override set nightly

In our we start by setting a feature flag that lets us use the llvm_asm!macro:

Let's set a small stack size here, only 48 bytes so we can print the stack and look at it before we switch contexts:
const SSIZE: isize = 48;

There seems to be an issue in OSX using such a small stack. The minimum for this code to run is a stack size of 624 bytes . The code works on Rust Playground as written here if you want to follow this exact example (however you'll need to wait ~30 seconds for it to time out due to our loop in the end).

Then let's add a struct that represents our CPU state. We only focus on the register that stores the "stack pointer" for now since that is all we need:
#[derive(Debug, Default)]
struct ThreadContext {
rsp: u64,

In later examples we will use all the registers marked as "callee saved" in the specification document i linked to. These are the registers described in the x86-64 ABI that we'll need to save our context, but right now we only need one register to make the CPU jump over to our stack.

Note that this needs to be #[repr(C)] because we access the data the way we do in our assembly. Rust doesn't have a stable ABI so there is no way for us to be sure that this will be represented in memory with rsp as the first 8 bytes. C has a stable ABI and that's exactly what this attribute tells the compiler to use. Granted, our struct only has one field right now but we will add more later.
fn hello() -> ! {
loop {}

For this very simple example we will define a function that just prints out a message and then loops forever:

Next up is our inline assembly where we switch over to our own stack.

unsafe fn gt_switch(new: *const ThreadContext) {
mov 0x00($0), %rsp
: "r"(new)
: "alignstack" // it will work without this now, will need it later

We use a trick here. We write the address of the function we want to run on our new stack. Then we pass the address of the first byte where we stored this address to the rsp register (the address we set to new.rsp will point to an address located on our own stack that leads to the function above). Got it?

The ret keyword transfers program control to the return address located on top of the stack. Since we pushed our address to the %rsp register, the CPU will think that is the return address of the function it's currently running so when we pass the ret instruction it returns directly into our own stack.

The first thing the CPU does is read the address of our function and runs it.

Quick introduction to Rusts inline assembly macro

If you haven't used inline assembly before this might look foreign, but we'll use an extended version of this later to switch contexts so I'll explain what we're doing line by line:

unsafe is a keyword that indicates that Rust cannot enforce the safety guarantees in the function we write. Since we are manipulating the CPU directly this is most definitely unsafe

gt_switch(new: *const ThreadContext)

Here we take a pointer to an instance of our ThreadContext from which we will only read one field.


This is the llvm_asm!macro in the Rust standard library. It will check our syntax and provide an error message if it encounters something that doesn't look like valid AT&T (by default) Assembly syntax.

The first thing the macro takes as input is the assembly template:

mov 0x00($0), %rsp

This is a simple instruction that moves the value stored at 0x00 offset (that means no offset at all in hex) from the memory location at $0 to the rsp register. Since the rsp register stores a pointer to the next value on the stack, we effectively push the address we provide it on top of the current stack overwriting whats already there.

You will not see $0 used like this in normal assembly code. This is part of the assembly template and is a placeholder for the first parameter. The parameters are numbered from 0, 1, 2… starting with the output parameters and then moving on to the input parameters. We only have one input parameter here which corresponds to $0.

If you encounter $ in assembly it most likely means an immediate value (an integer constant) but that depends (yeah, the $ can mean different things between dialects and between x86 assembly and x86-64 assembly).


The ret keyword instructs the CPU to pop a memory location off the top of the stack and then makes an unconditional jump to that location. In effect we have hijacked our CPU and made it return to our stack.


Inline ASM is a bit different from plain ASM. There are four additional parameters we pass in after the assembly template. This is the first one called output and is where we pass in our output parameters which are parameters we want to use as return values in our Rust function.

: "r"(new)

The second is our input parameter. The "r" literal is what is called a constraint when writing inline assembly. You can use these constraints to effectively direct where the compiler can decide to put your input (in one of the registers as a value or use it as a "memory" location for example). "r" simply means that this is put in a general purpose register that the compiler chooses. Constraints in inline assembly is a pretty big subject themselves, fortunately we have pretty simple needs.

clobber list

The next option is the clobber list where you specify what registers the compiler should not touch and let it know we want to manage these in our assembly code. If we pop any values of the stack we need to specify what registers here and let the compiler know so it know it can't use these registers freely. We don't need that here since we return to a brand new stack.

: "alignstack"

The last one is our options. These are unique for Rust and there are three options we can set: alignstack, volatile and intel. I'll refer you to the documentation to read about them since they're explained there. Worth noting is that we need to specify the "alignstack" for the code to work on Windows.

Running our example

fn main() {
let mut ctx = ThreadContext::default();
let mut stack = vec![0_u8; SSIZE as usize];
unsafe {
let stack_bottom = stack.as_mut_ptr().offset(SSIZE);
let sb_aligned = (stack_bottom as usize & !15) as *mut u8;
std::ptr::write(sb_aligned.offset(-16) as *mut u64, hello as u64);
ctx.rsp = sb_aligned.offset(-16) as u64;
gt_switch(&mut ctx);

So this is actually designing our new stack. hello is a pointer already (a function pointer) so we can cast it directly as an u64 since all pointers on 64 bits systems will be, well, 64 bit, and then we write this pointer to our new stack.

We'll talk more about the stack in the next chapter but one thing we need to know already now is that the stack grows downwards. If our 48 byte stack starts at index 0, and ends on index 47, index 32 will be the first index of a 16 byte offset from the start/base of our stack.

Make note that we write the pointer to an the offset of 16 bytes from the base of our stack (remember what I wrote about 16 byte alignment?).

What does the linelet sb_aligned = (stack_bottom as usize &! 15) as *mut u8;do?

When we ask for memory like we do when creating a Vec<u8>, there is no guarantee that the memory we get is 16-byte aligned when we get it. This line of code essentially rounds our memory address down to the nearest 16-byte aligned address. If it's already 16-byte aligned it does nothing.

We cast it as a pointer to an u64 instead of a pointer to a u8. We want to write to position 32, 33, 34, 35, 36, 37, 38, 39 which is the 8 byte space we need to store our u64. If we don't do this cast we try to write an u64 only to position 32 which is not what we want.

We set the rsp (Stack Pointer) to the memory address of index 32 in our stack, we don't pass the value of the u64 stored at that location but an address to the first byte.

When we cargo run this code we get:

Finished dev [unoptimized + debuginfo] target(s) in 0.58s
Running `target\debug\green_thread_start.exe`

OK, so what happened? We didn't call the function hello at any point but it still was run. What happened is that we actually made the CPU jump over to our own stack and execute code there. We have taken the first step towards implementing a context switch.

In the next chapters we will talk about the stack a bit before we implement our green threads, it will be easier now that we have covered so much of the basics.

You can have a look at the full code here if you want to run it.