Background Information
This is the most technical part of this book, but if we truly want to understand what's going on, we have to go through it. I promise to keep it as fast and to the point as possible, and we'll move on to the code soon enough.
So here we go! First of all, we are going to interfere with and control the CPU directly. This isn't very portable, since there are many kinds of CPUs out there. The main ideas are the same, but some of the implementation details will differ.
We will cover one of the more commonly used architectures: x86-64.
In this architecture the CPU features a set of 16 general-purpose registers: rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp and r8 through r15.
If you're interested, you can find the rest of the specification here:
Of interest to us right now are the registers marked as "callee saved". These are the registers a called function must preserve for its caller, and together with the stack they keep track of our context: where to continue execution, the base pointer, the stack pointer and so on. We'll look at this in more detail later.
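As a concrete picture of what "context" means here: under the System V x86-64 ABI (psABI), the callee-saved general-purpose registers are rbx, rbp, rsp and r12 through r15. The sketch below is a hypothetical struct, CalleeSavedRegisters, that could hold a snapshot of them; the name and the field order are my own choices for illustration, not something the ABI prescribes. The instruction pointer isn't in the list because, at a function call boundary, the address to resume at is the return address stored on the stack that rsp points into.

```rust
/// Hypothetical container for the general-purpose registers that the
/// System V x86-64 ABI marks as callee saved. `#[repr(C)]` fixes the field
/// order so hand-written assembly could rely on known offsets.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy)]
struct CalleeSavedRegisters {
    rsp: u64, // stack pointer
    rbp: u64, // base (frame) pointer
    rbx: u64,
    r12: u64,
    r13: u64,
    r14: u64,
    r15: u64,
}

fn main() {
    // A made-up snapshot; every other register is left at zero.
    let ctx = CalleeSavedRegisters { rsp: 0x7fff_0000, ..Default::default() };
    // Seven 64-bit registers: 7 * 8 = 56 bytes, with no padding needed.
    println!(
        "{} bytes, rsp = {:#x}, full snapshot: {:?}",
        std::mem::size_of::<CalleeSavedRegisters>(),
        ctx.rsp,
        ctx
    );
}
```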
If we want to control the CPU directly, we need some minimal code written in Assembly. Fortunately, we only need to know a few very basic assembly instructions for our mission, such as how to move values to and from registers:
mov rax, rsp # copy the value in rsp into rax
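Here is a minimal sketch of the same kind of move done from Rust, using the asm! macro from std::arch. It assumes an x86-64 target, and the function names are hypothetical. Instead of naming the destination register ourselves, we let the compiler pick one with out(reg) and refer to it with a {} placeholder. The second function moves a value into rax and back out again, and declares rax as overwritten with out("rax") _ so the compiler doesn't keep anything else there.

```rust
use std::arch::asm;

#[cfg(not(target_arch = "x86_64"))]
compile_error!("this example uses x86-64 registers and only builds on x86-64");

/// Copies the current stack pointer into a register chosen by the compiler,
/// like `mov rax, rsp` but without hard-coding rax.
fn read_rsp() -> u64 {
    let sp: u64;
    unsafe {
        // Intel dialect: `mov destination, source`.
        asm!("mov {}, rsp", out(reg) sp);
    }
    sp
}

/// Moves `value` into rax and then copies it back out of rax.
fn round_trip_through_rax(value: u64) -> u64 {
    let result: u64;
    unsafe {
        asm!(
            "mov rax, {input}",  // some register -> rax
            "mov {output}, rax", // rax -> some other register
            input = in(reg) value,
            output = out(reg) result,
            out("rax") _, // tell the compiler we overwrite rax
        );
    }
    result
}

fn main() {
    println!("rsp = {:#x}", read_rsp());
    println!("round trip: {}", round_trip_through_rax(42));
}
```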
Windows has a slightly different convention. On Windows, the registers XMM6 to XMM15 are also callee-saved and must be saved and restored if our functions use them. Our code runs fine on Windows even though we only follow the psABI convention in this example.
There is one more subtle difference as well, which you can read about in Appendix: Supporting Windows, where we go through everything. You can follow along anyway, since everything will work on Windows, but it will not be a strictly correct implementation.

A super quick introduction to Assembly

First and foremost, Assembly language is not very portable. Each CPU architecture can have its own set of instructions, although some are common on most desktop computers today.
There are two popular dialects: the AT&T dialect and the Intel dialect.
The Intel dialect is the default when writing inline assembly in Rust, but we can ask for the AT&T dialect instead (by passing options(att_syntax)) if we prefer it. Rust has its own take on inline assembly that, at first glance, will look foreign to anyone used to inline assembly in C. It's well thought through, though, and I'll spend some time explaining it as we go through the code, so that both readers with experience in C-style inline assembly and readers with no experience at all can follow along.
We will use the Intel dialect in our examples.
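To make the difference between the two dialects concrete, here is a small sketch (x86-64 only; the function names are hypothetical) that computes a - b with the sub instruction in each. In the Intel dialect the destination operand comes first, while in the AT&T dialect it comes last. The options(att_syntax) flag switches Rust to AT&T, and Rust then writes the registers it substitutes with a leading %.

```rust
use std::arch::asm;

#[cfg(not(target_arch = "x86_64"))]
compile_error!("this example only builds on x86-64");

/// Intel dialect (Rust's default): `sub dst, src` means dst = dst - src.
fn sub_intel(a: u64, b: u64) -> u64 {
    let mut a = a;
    unsafe {
        asm!("sub {a}, {b}", a = inout(reg) a, b = in(reg) b);
    }
    a
}

/// AT&T dialect: operands are reversed, `sub src, dst` means dst = dst - src.
fn sub_att(a: u64, b: u64) -> u64 {
    let mut a = a;
    unsafe {
        asm!("sub {b}, {a}", a = inout(reg) a, b = in(reg) b, options(att_syntax));
    }
    a
}

fn main() {
    // Both compute 10 - 3; prints "7 7"
    println!("{} {}", sub_intel(10, 3), sub_att(10, 3));
}
```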
The x86 family has a strong backwards-compatibility guarantee. That's why the same register can be addressed in several different ways. Let's use the rax register from the example above to explain:
rax # 64 bit register (8 bytes)
eax # 32 low bits of the "rax" register
ax # 16 low bits of the "rax" register
ah # 8 high bits of the "ax" part of the "rax" register
al # 8 low bits of the "ax" part of the "rax" register
As you can see, this is basically the history of CPUs evolving in front of us. Since most CPUs today are 64-bit, we will use the 64-bit registers in our code.
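The following sketch (x86-64 only; the function name is hypothetical) loads a known value into rax and reads back its eax, ax, ah and al views. The {e:e} and {x:x} template modifiers ask Rust for the 32-bit and 16-bit names of the registers it picks. ah and al are copied into the explicit registers cl and dl, because ah can't appear in the same instruction as the newer byte registers (such as sil or r8b) that the compiler might otherwise choose.

```rust
use std::arch::asm;

#[cfg(not(target_arch = "x86_64"))]
compile_error!("this example only builds on x86-64");

/// Puts `value` in rax and returns the (eax, ax, ah, al) views of it.
fn sub_registers(value: u64) -> (u32, u16, u8, u8) {
    let eax: u32;
    let ax: u16;
    let ah: u8;
    let al: u8;
    unsafe {
        asm!(
            "mov {e:e}, eax", // low 32 bits of rax
            "mov {x:x}, ax",  // low 16 bits of rax
            "mov cl, ah",     // high 8 bits of ax
            "mov dl, al",     // low 8 bits of ax
            e = out(reg) eax,
            x = out(reg) ax,
            in("rax") value,
            out("cl") ah,
            out("dl") al,
        );
    }
    (eax, ax, ah, al)
}

fn main() {
    let (eax, ax, ah, al) = sub_registers(0x1122_3344_5566_7788);
    // prints "eax = 0x55667788, ax = 0x7788, ah = 0x77, al = 0x88"
    println!("eax = {:#x}, ax = {:#x}, ah = {:#x}, al = {:#x}", eax, ax, ah, al);
}
```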
The word size in assembly also has historical roots. It stems from the time when CPUs had 16-bit data buses, so a word is 16 bits. This matters because in the AT&T dialect you will see many instructions suffixed with "q" (quad-word, 64 bits) or "l" (long-word, 32 bits). So a movq is a move of 4 * 16 bits = 64 bits, and a movl is a move of 2 * 16 bits = 32 bits.
A plain mov infers its size from the registers you use. This works in both the AT&T and the Intel dialect used in inline assembly, and it's the form we will use in our code.
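A sketch of the size suffixes in action (x86-64 only; the function names are hypothetical): movq copies all 64 bits, while movl copies only the low 32. Writing a 32-bit register on x86-64 clears the upper half of the full 64-bit register, so the movl result is the input with its top 32 bits zeroed. The last function is a plain Intel-dialect mov, which takes its 64-bit size from the register names alone.

```rust
use std::arch::asm;

#[cfg(not(target_arch = "x86_64"))]
compile_error!("this example only builds on x86-64");

/// AT&T `movq`: quad-word move, 4 * 16 = 64 bits.
fn movq(src: u64) -> u64 {
    let dst: u64;
    unsafe {
        asm!("movq {src}, {dst}", src = in(reg) src, dst = out(reg) dst, options(att_syntax));
    }
    dst
}

/// AT&T `movl`: long-word move, 2 * 16 = 32 bits, between the 32-bit
/// register names (`:e` modifier). The upper 32 bits of dst end up zero.
fn movl(src: u64) -> u64 {
    let dst: u64;
    unsafe {
        asm!("movl {src:e}, {dst:e}", src = in(reg) src, dst = out(reg) dst, options(att_syntax));
    }
    dst
}

/// Intel-dialect plain `mov` between 64-bit registers: no suffix needed.
fn plain_mov(src: u64) -> u64 {
    let dst: u64;
    unsafe {
        asm!("mov {dst}, {src}", src = in(reg) src, dst = out(reg) dst);
    }
    dst
}

fn main() {
    let v = 0x1122_3344_5566_7788;
    println!("movq: {:#x}", movq(v));
    println!("movl: {:#x}", movl(v));
    println!("mov:  {:#x}", plain_mov(v));
}
```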
We will go through a bit more of the syntax of inline assembly in the next chapter.
One more thing to note is that the stack alignment on x86-64 is 16 bytes: when a call instruction is executed, the stack pointer must be a multiple of 16. Just remember this for later.
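Being 16-byte aligned simply means an address is a multiple of 16. Below is a minimal sketch in plain Rust (the helper name align_down is hypothetical) of rounding an address down to such a boundary, which is the kind of adjustment needed if we ever pick a stack top by hand. Because 16 is a power of two, clearing the four lowest bits is enough.

```rust
/// Rounds `addr` down to the nearest multiple of `align`, which must be a
/// power of two. `!(align - 1)` is a mask with the low bits cleared.
fn align_down(addr: usize, align: usize) -> usize {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    addr & !(align - 1)
}

fn main() {
    // Hypothetical example: a buffer we might use as a stack. On x86-64 the
    // stack grows towards lower addresses, so we start at the end of the
    // buffer and round down, which keeps us inside the buffer.
    let buffer = vec![0u8; 1024];
    let end = buffer.as_ptr() as usize + buffer.len();
    let top = align_down(end, 16);
    assert_eq!(top % 16, 0);
    assert!(top <= end && end - top < 16);
    println!("end = {:#x}, aligned top = {:#x}", end, top);
}
```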