Green threads, userland threads, goroutines or fibers: they have many names, but for simplicity's sake I'll refer to them all as green threads from now on.
In this article I want to explore how they work by implementing a very simple example where we create our own green threads in 200 lines of Rust code. We'll explain everything along the way, so our main focus is to understand green threads and learn how they work by using a simple but working example.
We are peeking down the rabbit hole in this article so if that sounds scary, this article probably isn't for you. Just go back and live happily ever after.
If you are the curious kind and want to understand how things work, then read on. Maybe you've heard of Go and its goroutines, or the equivalents in Ruby or Julia, and you know how to use them but want to know how they work. If so, this article is for you.
In addition, this should be interesting if:
You're new to Rust and want to learn more about its features.
You have followed the discussions in the Rust community about async/await, the Pin-API and why we need generators. In this case I try to put all the pieces together in this article.
You want to learn the basics of inline assembly in Rust.
You're just curious.
Well, join me as we try to figure out everything we need to understand them.
You don't have to be a Rust programmer to understand this article, but I highly recommend reading up on some of the basic syntax first. If you want to follow along or clone the repo and play around with the code, you should probably install Rust and learn the basics.
All the code I provide here is in a single file and has no dependencies, which means that you can easily start your own project and follow along if you want to (I suggest you do). You can even run most of the code in the Rust playground. Just remember to use the nightly version of the compiler.
Currently I have an issue with the llvm_asm! macro: the code doesn't compile in release mode. It seems to be related to the "=m" constraint I use in the inline assembly.
Edit 2019-06-21: I've decided to work around this and change the inline assembly to compile and run on release builds.
I've tested the code on OSX, Linux and Windows.
I'm not trying to make a perfect implementation here. I'm cutting corners to get down to the essence and fit it into what was originally intended to be an article but expanded into a small book. This is not the best way of displaying Rust's greatest strength, its safety guarantees, but it does show an interesting use of Rust, and the code is mostly pretty clean and easy to follow.
However, if you spot places where I can make the code safer without making it significantly more complex, I welcome you to create an issue in the repo or even better, a pull request.
2020-08-04: Thanks to ziyi-yan, who identified an issue with the guard function not being 16-byte aligned, the example is now correct and can be used as a basis for more advanced experimentation. See the relevant issue in the example repo for more information.
2020-05-20: Changed to the llvm_asm! macro since we use the syntax used by LLVM in this book. The new Rust syntax for inline assembly has now been merged into the nightly compiler and uses the asm! macro, which would have caused problems for the examples in this book.
2019-06-18: New chapter implementing proper Windows support.
2019-06-21: Rather substantial change and cleanup. An issue was reported where Valgrind had trouble with the code and crashed. This is now fixed and there are currently no unsolved issues. In addition, the code now runs on both debug and release builds without any issues on all platforms. Thanks to everyone for reporting issues they found.
2019-06-26: The Supporting Windows appendix treated the XMM fields as 64 bits, but they are 128 bits, which was an oversight on my part. Correcting this added some interesting material to that chapter but unfortunately also some complexity. However, it's now corrected and explained.
2019-12-22: Added one line of code to make sure the memory we get from the allocator is 16-byte aligned. Refactored to use the "high" memory address as the basis for offsets when writing to the stack, since this made alignment easier. Thanks to @Veetaha for addressing this issue.
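To make the alignment idea in the last entry concrete, here is a minimal sketch. It is not the code from the book: the helper name align_down_16 and the stack size are assumptions made up for this illustration. It only shows why aligning the "high" address once and then counting offsets downward from it keeps every 16-byte slot aligned.

```rust
/// Round `addr` down to the nearest multiple of 16.
/// (Hypothetical helper, not part of the book's example code.)
fn align_down_16(addr: usize) -> usize {
    addr & !15
}

fn main() {
    // Hypothetical stack size, chosen only for this illustration.
    const STACK_SIZE: usize = 1024;
    let stack = vec![0u8; STACK_SIZE];

    // The "high" end of the allocation: one byte past the last element.
    let top = stack.as_ptr() as usize + STACK_SIZE;

    // Align the high address once, then express each slot as an offset below it.
    let aligned_top = align_down_16(top);
    let first_slot = aligned_top - 16;

    // Both addresses are multiples of 16 and still inside the allocation.
    assert_eq!(aligned_top % 16, 0);
    assert_eq!(first_slot % 16, 0);
    assert!(aligned_top <= top && top - aligned_top < 16);
    assert!(first_slot >= stack.as_ptr() as usize);

    println!("top is {} byte(s) above the aligned top", top - aligned_top);
}
```

Rounding down rather than up matters here: the result can never point past the end of the buffer, at the cost of wasting at most 15 bytes at the top.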