The Riscvonomicon (pronounced "risk five o-nomicon") provides a reference for using the RISC-V instruction set and the Rust programming language together. It contains information about:
- Utilizing Rust and RISC-V specific crates, tools and environments
- Building, testing, fuzzing and formally verifying code for RISC-V written in Rust
- Utilizing RISC-V instructions, intrinsics and extensions in Rust
This book assumes familiarity with both the RISC-V instruction set and the Rust programming language, and is made to extend the Embedonomicon, Embedded Rust Book and Rustonomicon. Some chapters within this book provide information which is generally useful for RISC-V or Rust.
This project is not an official Rust project, nor is it an official RISC-V project.
Getting Started
To get started with using RISC-V and Rust together, we will need a couple of things.
- Cross-compilation linker
- ISA emulation environment
- Rust toolchain for RISC-V
Cross-compilation linker
It is generally advised to install the GNU RISC-V Toolchain for the corresponding target. The installation differs from target to target. Generally, it is advised to install both the RV32 and RV64 versions. These can be found here.
Below is a table for different platforms and which commands can be executed to get the RISC-V GNU Toolchain.
Platform | Command | Link |
---|---|---|
ArchLinux | paru -S riscv-gnu-toolchain-bin | Link |
Ubuntu Linux | Install from source | |
macOS | brew tap riscv-software-src/riscv && brew install riscv-tools | Link |
ISA emulation environment
- QEMU
- Spike
Rust toolchain for RISC-V
Rust supports several variants of the RISC-V instruction set. All the targets can be listed with the following command.
rustup target list | grep '^riscv'
The target determines the register sizes and which instructions to use. For example, the riscv32imac-unknown-none-elf target includes the m (Multiplication and Division), a (Atomics) and c (Compressed) instructions, whereas the riscv32i-unknown-none-elf target only includes the base instructions.
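As a small illustration of how the chosen target shows up inside a program, the sketch below queries the register width and the three extensions named above at compile time. The function names `xlen_bits` and `enabled_extensions` are hypothetical and only serve this example; the sketch also assumes that pointers are register-sized, which holds for the standard RISC-V targets.

```rust
/// Width of a general-purpose register (XLEN) in bits.
/// Assumption: `usize` is register-sized on the selected target.
pub fn xlen_bits() -> u32 {
    usize::BITS
}

/// Reports, for the target being compiled, whether the `m`, `a` and `c`
/// extensions are enabled. On a non-RISC-V host all three are `false`.
pub fn enabled_extensions() -> [(&'static str, bool); 3] {
    [
        ("m", cfg!(target_feature = "m")),
        ("a", cfg!(target_feature = "a")),
        ("c", cfg!(target_feature = "c")),
    ]
}
```

Compiled for riscv32imac-unknown-none-elf, `xlen_bits()` would be expected to return 32 and all three flags to be `true`; compiled for a target without those extensions, the corresponding flags are `false`.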
To install the toolchain for a specific target, you can run the following command.
# Replace 'riscv64gc-unknown-linux-gnu' with the desired target
rustup target add riscv64gc-unknown-linux-gnu
Bare Metal
Getting started with Bare Metal.
# 32-bit target
# Alternatives: riscv32imac, riscv32imc
rustup target add riscv32i-unknown-none-elf
# 64-bit target
# Alternatives: riscv64imac
rustup target add riscv64gc-unknown-none-elf
A very minimal example of a binary looks as follows:
#![no_std]
#![no_main]

#[panic_handler]
fn panic_handler(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[no_mangle]
extern "C" fn _start() -> ! {
    // NOTE:
    // The `.data` and `.bss` sections are not initialized.
    // You might use the `r0` crate for this. Take a look at the section
    // on this crate.
    loop {}
}
A safer way that handles all initialization for you is to use the riscv-rt crate.
cargo add riscv-rt panic-halt
#![no_std]
#![no_main]

extern crate panic_halt;

use riscv_rt::entry;

#[entry]
fn main() -> ! {
    loop {}
}
Cargo and RISC-V
Building
Running
RISC-V 32-bit
To run 32-bit RISC-V code, it is possible to use the QEMU emulator. How to test and run this code depends on its contents. For libraries that contain arithmetic instructions, it is possible to use ralte32 or the riscv32gc-unknown-linux-gnu nightly target.
RISC-V 64-bit
Fuzz
Formally Verify
riscv-rt crate
The riscv-rt crate provides a "Minimal runtime / startup for RISC-V CPU's". It makes sure to:
- Set up the .data and .bss sections correctly
- Set up traps / interrupts in the correct place
- Allocate a stack per hardware thread (hart)
#![no_std]
#![no_main]

extern crate panic_halt;

use riscv_rt::entry;

#[entry]
fn main() -> ! {
    loop {}
}
Reset to main on riscv-rt
This section describes all the code that gets executed from a CPU reset to the main function.
The link.x file defines the regions of the resulting [ELF] file. What is important is the order of items in the .text section of the binary.
.section .init, "ax"
.global _start
_start:
The .section .init directive ensures that the following code is put in the .init section, which link.x puts at the start of the .text region. Then, we define a _start symbol and label this section to begin here. The _start symbol is generally assumed to be the entry point for [ELF] binaries.
// Only for rv32
lui ra, %hi(_abs_start) // ra <- addr_of(_abs_start) & 0xFFFF_F000
jr %lo(_abs_start)(ra)
// Only for rv64
.option push
.option norelax // to prevent an unsupported R_RISCV_ALIGN relocation from being generated
1:
auipc ra, %pcrel_hi(1f)
ld ra, %pcrel_lo(1b)(ra)
jr ra
.align 3
1:
.dword _abs_start
.option pop
_abs_start:
This code seems quite crazy and uses some strange assembly syntax. Let us first dive into the 32-bit version as it is the easier one.
It consists of two instructions. The first instruction loads the upper 20 bits of the _abs_start symbol, which is defined at the end, into the ra (return address) register. The second instruction jumps to the address in the ra register offset by the lower 12 bits of the same symbol.
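The arithmetic behind the `%hi` / `%lo` pair can be sketched in Rust. This is a model under stated assumptions, not code from riscv-rt: the helper names `hi20`, `lo12` and `jump_target` are hypothetical, and the model ignores that `jr` clears the lowest bit of the target address. Because the lower 12 bits are sign-extended when they are added, `%hi` rounds the upper part up whenever bit 11 of the address is set, so the upper part is not always a plain mask of the address.

```rust
/// Upper 20 bits as produced by `%hi(symbol)`.
/// Adding 0x800 compensates for the sign extension of the low part.
fn hi20(addr: u32) -> u32 {
    addr.wrapping_add(0x800) >> 12
}

/// Lower 12 bits as produced by `%lo(symbol)`, sign-extended like a
/// 12-bit instruction immediate.
fn lo12(addr: u32) -> i32 {
    ((addr << 20) as i32) >> 20
}

/// Address reached by `lui ra, %hi(sym)` followed by `jr %lo(sym)(ra)`.
fn jump_target(addr: u32) -> u32 {
    (hi20(addr) << 12).wrapping_add(lo12(addr) as u32)
}
```

For an address such as 0x8000_0ABC, the upper part becomes 0x80001 and the lower part becomes -0x544, which together give back the original address.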
_abs_start:
.option norelax
.cfi_startproc
.cfi_undefined ra
#[cfg(feature = "s-mode")]
{
csrw sie, 0
csrw sip, 0
}
#[cfg(not(feature = "s-mode"))]
{
csrw mie, 0
csrw mip, 0
}
li x1, 0
li x2, 0
li x3, 0
li x4, 0
li x5, 0
li x6, 0
li x7, 0
li x8, 0
li x9, 0
// a0..a2 (x10..x12) skipped
li x13, 0
li x14, 0
li x15, 0
li x16, 0
li x17, 0
li x18, 0
li x19, 0
li x20, 0
li x21, 0
li x22, 0
li x23, 0
li x24, 0
li x25, 0
li x26, 0
li x27, 0
li x28, 0
li x29, 0
li x30, 0
li x31, 0
.option push
.option norelax
la gp, __global_pointer$
.option pop",
#[cfg(all(not(feature = "single-hart"), feature = "s-mode"))]
"mv t2, a0 // the hartid is passed as parameter by SMODE",
#[cfg(all(not(feature = "single-hart"), not(feature = "s-mode")))]
"csrr t2, mhartid",
#[cfg(not(feature = "single-hart"))]
"lui t0, %hi(_max_hart_id)
add t0, t0, %lo(_max_hart_id)
bgtu t2, t0, abort",
"// Allocate stacks
la sp, _stack_start
lui t0, %hi(_hart_stack_size)
add t0, t0, %lo(_hart_stack_size)",
#[cfg(all(not(feature = "single-hart"), riscvm))]
"mul t0, t2, t0",
#[cfg(all(not(feature = "single-hart"), not(riscvm)))]
"beqz t2, 2f // Jump if single-hart
mv t1, t2
mv t3, t0
1:
add t0, t0, t3
addi t1, t1, -1
bnez t1, 1b
2: ",
"sub sp, sp, t0
// Set frame pointer
add s0, sp, zero
jal zero, _start_rust
.cfi_endproc",
https://twilco.github.io/riscv-from-scratch/2019/04/27/riscv-from-scratch-2.html
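The stack allocation at the end of the listing above can be modelled in a few lines of Rust. This is a sketch of the path that uses the `mul` instruction only, and the function name `hart_stack_pointer` and its parameters are hypothetical stand-ins for the linker symbols `_stack_start`, `_hart_stack_size` and `_max_hart_id`. Unlike the assembly, the model uses checked arithmetic instead of wrapping.

```rust
/// Models how each hart derives its initial stack pointer.
/// Returns `None` where the assembly would branch to `abort`.
fn hart_stack_pointer(
    stack_start: usize,
    hart_stack_size: usize,
    max_hart_id: usize,
    hart_id: usize,
) -> Option<usize> {
    // `bgtu t2, t0, abort`: harts above the configured maximum give up.
    if hart_id > max_hart_id {
        return None;
    }
    // `mul t0, t2, t0`: offset of this hart's stack below the stack start.
    let offset = hart_id.checked_mul(hart_stack_size)?;
    // `sub sp, sp, t0`: stacks grow downwards from `_stack_start`.
    stack_start.checked_sub(offset)
}
```

With a stack start of 0x8001_0000 and a per-hart stack size of 0x800, hart 0 starts at 0x8001_0000 and hart 1 starts at 0x8000_F800.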
Vector (V) Extension
Memory Operations
Loads:
- Unit Strided (VLE)
- Constant-Strided (VLSE)
- Indexed (VLUXEI, VLOXEI)
Stores:
- Unit Strided (VSE)
- Constant-Strided (VSSE)
- Indexed (VSUXEI, VSOXEI)
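The three addressing modes can be pictured with a scalar model. The sketch below is not an intrinsic API: the function names are hypothetical, memory is modelled as a byte slice, and elements are assumed to be 8 bits wide so that element offsets and byte offsets coincide. It also only models non-negative strides, whereas the hardware treats the stride as a signed byte offset.

```rust
/// Unit-stride load: `vl` consecutive elements starting at `base`.
fn load_unit_stride(mem: &[u8], base: usize, vl: usize) -> Vec<u8> {
    (0..vl).map(|i| mem[base + i]).collect()
}

/// Constant-stride load: element `i` comes from `base + i * stride`.
fn load_strided(mem: &[u8], base: usize, stride: usize, vl: usize) -> Vec<u8> {
    (0..vl).map(|i| mem[base + i * stride]).collect()
}

/// Indexed load (gather): element `i` comes from `base + offsets[i]`.
fn load_indexed(mem: &[u8], base: usize, offsets: &[usize]) -> Vec<u8> {
    offsets.iter().map(|&off| mem[base + off]).collect()
}
```

The stores mirror these loads: the same address computation is used, but the element is written to memory instead of read from it.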
Array Element-Wise Addition
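// NOTE: `setvli`, `vle8_vv`, `vadd_vv`, `vse8_vv`, `Element` and `LMul` stand
// for the vector intrinsics and are not defined in this snippet.
fn array_addition(mut x: &[u8], mut y: &[u8], mut result: &mut [u8]) {
    assert!(x.len() == y.len());
    assert!(x.len() == result.len());

    // Number of elements that still have to be processed
    let mut remaining = x.len();

    while remaining > 0 {
        // The hardware grants a vector length of at most `remaining`
        let vl = setvli(remaining, Element::E8, LMul::M1);

        // Split off the chunk that is processed in this iteration
        let (cx, rest_x) = x.split_at(vl);
        let (cy, rest_y) = y.split_at(vl);
        let (cresult, rest_result) = core::mem::take(&mut result).split_at_mut(vl);
        (x, y, result) = (rest_x, rest_y, rest_result);

        let vx = vle8_vv(cx, vl);
        let vy = vle8_vv(cy, vl);
        let vresult = vadd_vv(vx, vy, vl);
        vse8_vv(vresult, cresult);

        remaining -= vl;
    }
}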
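To try the loop structure without vector hardware, the following scalar model may help. It is only a sketch: `VLMAX` is a hypothetical hardware limit, `setvl` models the simplest permitted behaviour of granting `min(requested, VLMAX)` elements, and the inner `for` loop stands in for one vector load, add and store.

```rust
/// Hypothetical maximum number of 8-bit elements per vector register group.
const VLMAX: usize = 16;

/// Models the length-setting instruction: grant at most `VLMAX` elements.
fn setvl(requested: usize) -> usize {
    requested.min(VLMAX)
}

/// Element-wise addition, processed in chunks of the granted vector length.
fn array_addition_model(x: &[u8], y: &[u8], result: &mut [u8]) {
    assert_eq!(x.len(), y.len());
    assert_eq!(x.len(), result.len());

    let mut done = 0;
    while done < x.len() {
        let vl = setvl(x.len() - done);
        // One "vector" operation: `vl` independent wrapping additions.
        for i in done..done + vl {
            result[i] = x[i].wrapping_add(y[i]);
        }
        done += vl;
    }
}
```

The last iteration naturally handles a tail that is shorter than `VLMAX`, which is why no separate scalar clean-up loop is needed.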
Overview of RISC-V Scalar Cryptography (Zk) Extension
The RISC-V Scalar Cryptography extension is a small extension that helps embedded and application processors to reduce the code size, energy consumption and execution time of cryptographic code. The extension consists of five parts.
- Bit-manipulation instructions for cryptography (Zbkx, Zbkc and Zbkb).
- Zks, which defines the instructions relating to the ShangMi Suite. This includes the SM3 hash function and the SM4 block cipher.
- Zkn, which defines instructions for the NIST Suite cryptographic primitives, including the AES block cipher and the SHA-2 hash function.
- Zkr, which defines a CSR for a hardware entropy source. This can be used as a secure source of randomness.
- Zkt, which is a specification for constant time execution of specific instructions.
This chapter talks about these parts and how they can be used.
Sources
Instruction | 32-bit | 64-bit | Description |
---|---|---|---|
ror | x | x | Rotate Right by register value |
rol | x | x | Rotate Left by register value |
rori | x | x | Rotate Right by immediate value |
rorw | | x | Rotate Word Right by register value |
rolw | | x | Rotate Word Left by register value |
roriw | | x | Rotate Word Right by immediate value |
andn | x | x | Bitwise And & Negate |
orn | x | x | Bitwise Or & Negate |
xnor | x | x | Exclusive-Not-Or |
pack | x | x | Pack register from two register low-halves |
packh | x | x | Pack register halfword from two register low-bytes |
packw | | x | Pack register word from two register low-halfwords |
brev8 | x | x | Reverse bits within bytes |
rev8 | x | x | Reverse bytes within register |
zip | x | | Zip upper and lower register halves into odd and even bits |
unzip | x | | Unzip odd and even bits into upper and lower register halves |
clmul | x | x | Carry-less multiply (low half) |
clmulh | x | x | Carry-less multiply (high half) |
xperm8 | x | x | Byte-wise lookup of indices into a vector |
xperm4 | x | x | Nibble-wise lookup of indices into a vector |
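Several of the instructions in the table have direct counterparts in plain Rust. The sketch below gives 32-bit software models for a few of them. These are illustrative functions named after the instructions, not the hardware intrinsics; on a target with the extension enabled the compiler may or may not select the corresponding instruction for such code.

```rust
/// andn: bitwise AND with the second operand inverted.
fn andn(rs1: u32, rs2: u32) -> u32 { rs1 & !rs2 }

/// orn: bitwise OR with the second operand inverted.
fn orn(rs1: u32, rs2: u32) -> u32 { rs1 | !rs2 }

/// xnor: inverted exclusive OR.
fn xnor(rs1: u32, rs2: u32) -> u32 { !(rs1 ^ rs2) }

/// ror: rotate right by the low 5 bits of `rs2`.
fn ror(rs1: u32, rs2: u32) -> u32 { rs1.rotate_right(rs2 & 31) }

/// rev8: reverse the order of the bytes in the register.
fn rev8(rs: u32) -> u32 { rs.swap_bytes() }

/// brev8: reverse the bits inside each byte, keeping the byte order.
fn brev8(rs: u32) -> u32 {
    u32::from_ne_bytes(rs.to_ne_bytes().map(u8::reverse_bits))
}

/// pack: low half of `rs1` in the low half, low half of `rs2` in the high half.
fn pack(rs1: u32, rs2: u32) -> u32 { (rs2 << 16) | (rs1 & 0xFFFF) }

/// clmul: low 32 bits of the carry-less (XOR-based) product.
fn clmul(rs1: u32, rs2: u32) -> u32 {
    let mut out = 0;
    for i in 0..32 {
        if (rs2 >> i) & 1 != 0 {
            out ^= rs1 << i;
        }
    }
    out
}
```

For example, `rev8(0x1122_3344)` yields `0x4433_2211`, while `brev8` leaves the bytes in place and only mirrors the bits inside each of them.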
NIST Suite: Encryption & Decryption (Zkned)
The Zkned set contains instructions for the AES block cipher. The extension defines 4 instructions for riscv32 and 7 instructions for riscv64. These instructions can be used to implement AES-128, AES-192 and AES-256. The table below lists all the instructions that are defined by the Zkned extension.
32-bit | 64-bit | Usage |
---|---|---|
aes32dsi | aes64ds | Decryption Final Round |
aes32dsmi | aes64dsm | Decryption Middle Round |
aes32esi | aes64es | Encryption Final Round |
aes32esmi | aes64esm | Encryption Middle Round |
| aes64ks1i | Key Schedule |
| aes64ks2 | Key Schedule |
| aes64im | Decryption Key Schedule |
This section contains usage examples for the 32-bit instructions and for the 64-bit instructions. These implementations can also be found in the GitHub repository with the examples for the entire Zk extension.
32-bit AES
This section explains how to use the aes32esmi, aes32esi, aes32dsi and aes32dsmi instructions in the Zkne and Zknd extensions to simplify and speed up the implementation of the Advanced Encryption Standard. The instructions can be used to implement AES-128, AES-192 and AES-256. A talk at the RISC-V summit [1] claims a speed-up of ~4x and a code size reduction of 0.3x [1].
⚠️ WARNING ⚠️
It is especially difficult to implement cryptography correctly and securely. If you can use an existing implementation that has been battle-tested, you probably should. Still, this page exists to show how you would go about using this extension.
Encryption
The aes32esmi instruction helps with implementing the middle rounds of AES. It performs a byte substitution, mixing of columns and adding the roundkey. The aes32esi instruction is used for the last round of AES and performs a byte substitution and adding the roundkey. A Rust equivalent implementation of the instructions would look like:
static SBOX: [u8; 256] = [
    // ...
];

// Multiplication by 2 in the AES Galois Field
fn xt2(x: u8) -> u8 {
    (x << 1) ^ if x & 0x80 != 0 { 0x1B } else { 0x00 }
}

// Galois Field Multiplication for y in 0..16
fn gfmul(x: u8, y: u8) -> u8 {
    let mut out = 0;
    let mut mask = x;

    for i in 0..4 {
        if y & (1 << i) != 0 {
            out ^= mask;
        }
        // `mask` holds x * 2^i, so it is doubled every iteration
        mask = xt2(mask);
    }

    out
}

fn aes32esmi(rs1: u32, rs2: u32, bs: u8) -> u32 {
    let shift_amount = u32::from(bs) * 8;

    // Substitution
    let sub_input = (rs2 >> shift_amount) & 0xFF;
    let sub_output = SBOX[sub_input as usize];

    // Mix Columns
    let mixed = u32::from_be_bytes([
        gfmul(sub_output, 0x3),
        sub_output,
        sub_output,
        gfmul(sub_output, 0x2),
    ]);

    // Add Roundkey
    rs1 ^ mixed.rotate_left(shift_amount)
}

fn aes32esi(rs1: u32, rs2: u32, bs: u8) -> u32 {
    let shift_amount = u32::from(bs) * 8;

    // Substitution
    let sub_input = (rs2 >> shift_amount) & 0xFF;
    let sub_output = SBOX[sub_input as usize] as u32;

    // Add Roundkey
    rs1 ^ (sub_output << shift_amount)
}
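The Galois Field helpers are easy to get subtly wrong, so it can be worth checking them in isolation. The sketch below is a standalone version of `xt2` and `gfmul`, written so that it can be compared against the worked multiplication of 0x57 in FIPS-197. The `mix_column_byte` helper is a hypothetical name for the column-mixing step that the middle round instruction applies to a single substituted byte.

```rust
/// Multiplication by 2 in GF(2^8) with the AES reduction polynomial.
fn xt2(x: u8) -> u8 {
    (x << 1) ^ if x & 0x80 != 0 { 0x1B } else { 0x00 }
}

/// Galois Field multiplication of `x` by a 4-bit value `y`.
fn gfmul(x: u8, y: u8) -> u8 {
    let mut out = 0;
    let mut mask = x; // holds x * 2^i in iteration i
    for i in 0..4 {
        if y & (1 << i) != 0 {
            out ^= mask;
        }
        mask = xt2(mask);
    }
    out
}

/// Column contribution of one substituted byte: (3*so, so, so, 2*so),
/// with the most significant byte listed first.
fn mix_column_byte(so: u8) -> u32 {
    u32::from_be_bytes([gfmul(so, 0x3), so, so, gfmul(so, 0x2)])
}
```

Doubling 0x57 three times gives 0xAE, 0x47 and 0x8E, which matches the chain of values in the FIPS-197 example.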
Middle Round implementation
This can be used to implement a middle encryption round, where rk is an array of the roundkeys and block is the input state. Note how the following code example manually handles the shifting of rows.
// Block and RoundKey contain little-endian encoded rows
let RoundKey(mut a0, mut a1, mut a2, mut a3) = rk[i];

a0 = aes32esmi(a0, block.0, 0);
a0 = aes32esmi(a0, block.1, 1);
a0 = aes32esmi(a0, block.2, 2);
a0 = aes32esmi(a0, block.3, 3);

a1 = aes32esmi(a1, block.1, 0);
a1 = aes32esmi(a1, block.2, 1);
a1 = aes32esmi(a1, block.3, 2);
a1 = aes32esmi(a1, block.0, 3);

a2 = aes32esmi(a2, block.2, 0);
a2 = aes32esmi(a2, block.3, 1);
a2 = aes32esmi(a2, block.0, 2);
a2 = aes32esmi(a2, block.1, 3);

a3 = aes32esmi(a3, block.3, 0);
a3 = aes32esmi(a3, block.0, 1);
a3 = aes32esmi(a3, block.1, 2);
a3 = aes32esmi(a3, block.2, 3);

block = Block(a0, a1, a2, a3);
Final Round implementation
The final round is implemented similarly to the Middle Round implementation. Here, the aes32esmi instruction is replaced by the aes32esi instruction.
// Block and RoundKey contain little-endian encoded rows
let RoundKey(mut a0, mut a1, mut a2, mut a3) = rk[i];

a0 = aes32esi(a0, block.0, 0);
a0 = aes32esi(a0, block.1, 1);
a0 = aes32esi(a0, block.2, 2);
a0 = aes32esi(a0, block.3, 3);

a1 = aes32esi(a1, block.1, 0);
a1 = aes32esi(a1, block.2, 1);
a1 = aes32esi(a1, block.3, 2);
a1 = aes32esi(a1, block.0, 3);

a2 = aes32esi(a2, block.2, 0);
a2 = aes32esi(a2, block.3, 1);
a2 = aes32esi(a2, block.0, 2);
a2 = aes32esi(a2, block.1, 3);

a3 = aes32esi(a3, block.3, 0);
a3 = aes32esi(a3, block.0, 1);
a3 = aes32esi(a3, block.1, 2);
a3 = aes32esi(a3, block.2, 3);

block = Block(a0, a1, a2, a3);
Decryption
Key Schedule implementation
To implement the key schedule, we can also use the aes32esi instruction. This prevents the need for a substitution table in software. The implementation differs slightly between AES-128, AES-192 and AES-256 and therefore all three implementations are given separately.
pub struct AES128Key(u32, u32, u32, u32);
pub struct AES192Key(u32, u32, u32, u32, u32, u32);
pub struct AES256Key(u32, u32, u32, u32, u32, u32, u32, u32);

// `repr(C)` fixes the field layout, which the transmutes below rely on
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RoundKey(u32, u32, u32, u32);

// AES round constants
const RCON: [u8; 10] = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36];

fn aes128_key_schedule(ck: AES128Key) -> [RoundKey; 11] {
    let mut rk = [0u32; 11 * 4];

    let AES128Key(mut t0, mut t1, mut t2, mut t3) = ck;

    let mut i = 0;
    loop {
        rk[(i << 2) + 0] = t0;
        rk[(i << 2) + 1] = t1;
        rk[(i << 2) + 2] = t2;
        rk[(i << 2) + 3] = t3;

        if i == 10 {
            break;
        }

        t0 ^= u32::from(RCON[i]);

        let tr = t3.rotate_right(8);

        t0 = aes32esi(t0, tr, 0);
        t0 = aes32esi(t0, tr, 1);
        t0 = aes32esi(t0, tr, 2);
        t0 = aes32esi(t0, tr, 3);

        t1 ^= t0;
        t2 ^= t1;
        t3 ^= t2;

        i += 1;
    }

    // SAFETY: We know that rk has 11 * 4 times a u32. So it has space for 11 RoundKeys
    unsafe { core::mem::transmute(rk) }
}

fn aes192_key_schedule(ck: AES192Key) -> [RoundKey; 13] {
    let mut rk = [0u32; 13 * 4];

    let AES192Key(mut t0, mut t1, mut t2, mut t3, mut t4, mut t5) = ck;

    let mut i = 0;
    loop {
        rk[i * 6 + 0] = t0;
        rk[i * 6 + 1] = t1;
        rk[i * 6 + 2] = t2;
        rk[i * 6 + 3] = t3;

        if i == 8 {
            break;
        }

        rk[i * 6 + 4] = t4;
        rk[i * 6 + 5] = t5;

        t0 ^= u32::from(RCON[i]);

        let tr = t5.rotate_right(8);

        t0 = aes32esi(t0, tr, 0);
        t0 = aes32esi(t0, tr, 1);
        t0 = aes32esi(t0, tr, 2);
        t0 = aes32esi(t0, tr, 3);

        t1 ^= t0;
        t2 ^= t1;
        t3 ^= t2;
        t4 ^= t3;
        t5 ^= t4;

        i += 1;
    }

    // SAFETY: We know that rk has 13 * 4 times a u32. So it has space for 13 RoundKeys
    unsafe { core::mem::transmute(rk) }
}

fn aes256_key_schedule(ck: AES256Key) -> [RoundKey; 15] {
    let mut rk = [0u32; 15 * 4];

    let AES256Key(mut t0, mut t1, mut t2, mut t3, mut t4, mut t5, mut t6, mut t7) = ck;

    let mut i = 0;
    loop {
        rk[i * 8 + 0] = t0;
        rk[i * 8 + 1] = t1;
        rk[i * 8 + 2] = t2;
        rk[i * 8 + 3] = t3;

        if i == 7 {
            break;
        }

        rk[i * 8 + 4] = t4;
        rk[i * 8 + 5] = t5;
        rk[i * 8 + 6] = t6;
        rk[i * 8 + 7] = t7;

        t0 ^= u32::from(RCON[i]);

        let tr = t7.rotate_right(8);

        t0 = aes32esi(t0, tr, 0);
        t0 = aes32esi(t0, tr, 1);
        t0 = aes32esi(t0, tr, 2);
        t0 = aes32esi(t0, tr, 3);

        t1 ^= t0;
        t2 ^= t1;
        t3 ^= t2;

        t4 = aes32esi(t4, t3, 0);
        t4 = aes32esi(t4, t3, 1);
        t4 = aes32esi(t4, t3, 2);
        t4 = aes32esi(t4, t3, 3);

        t5 ^= t4;
        t6 ^= t5;
        t7 ^= t6;

        i += 1;
    }

    // SAFETY: We know that rk has 15 * 4 times a u32. So it has space for 15 RoundKeys
    unsafe { core::mem::transmute(rk) }
}

fn aes_decrypt_key_schedule<const KEYS: usize>(rk: &mut [RoundKey; KEYS]) {
    // Applies the inverse column mixing to one word of a round key
    fn subkey(mut x: u32) -> u32 {
        let mut y;

        unsafe {
            y = aes32esi(0, x, 0);
            y = aes32esi(y, x, 1);
            y = aes32esi(y, x, 2);
            y = aes32esi(y, x, 3);

            x = aes32dsmi(0, y, 0);
            x = aes32dsmi(x, y, 1);
            x = aes32dsmi(x, y, 2);
            x = aes32dsmi(x, y, 3);
        }

        x
    }

    // The first and the last round key are left untouched
    for k in &mut rk[1..KEYS - 1] {
        unsafe {
            k.0 = subkey(k.0);
            k.1 = subkey(k.1);
            k.2 = subkey(k.2);
            k.3 = subkey(k.3);
        }
    }
}

fn aes128_decrypt_key_schedule(rk: &mut [RoundKey; 11]) {
    aes_decrypt_key_schedule::<11>(rk)
}

fn aes192_decrypt_key_schedule(rk: &mut [RoundKey; 13]) {
    aes_decrypt_key_schedule::<13>(rk)
}

fn aes256_decrypt_key_schedule(rk: &mut [RoundKey; 15]) {
    aes_decrypt_key_schedule::<15>(rk)
}
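One step of the AES-128 key schedule can be checked on a development machine with a software model. The sketch below is an assumption-laden stand-in rather than the implementation above: instead of a lookup table it computes each S-box entry from its mathematical definition (field inverse followed by the affine map), and the names `gf_mul`, `sbox` and `aes128_next_round_key` are hypothetical. It is far too slow for real use, and like any table-free software S-box it makes no constant-time claims.

```rust
/// Multiplication in GF(2^8) modulo the AES polynomial.
fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut out = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            out ^= a;
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1B;
        }
        b >>= 1;
    }
    out
}

/// One AES S-box entry: x^254 is the field inverse (0 maps to 0),
/// followed by the affine transformation.
fn sbox(x: u8) -> u8 {
    let mut inv = 1u8;
    for _ in 0..254 {
        inv = gf_mul(inv, x);
    }
    inv ^ inv.rotate_left(1) ^ inv.rotate_left(2) ^ inv.rotate_left(3)
        ^ inv.rotate_left(4) ^ 0x63
}

/// Software model of `aes32esi` using the computed S-box.
fn aes32esi(rs1: u32, rs2: u32, bs: u8) -> u32 {
    let shift = u32::from(bs) * 8;
    let so = u32::from(sbox((rs2 >> shift) as u8));
    rs1 ^ (so << shift)
}

/// Derives the next AES-128 round key from the previous one.
/// Words are little-endian encoded, as in the key schedule above.
fn aes128_next_round_key(k: [u32; 4], rcon: u8) -> [u32; 4] {
    let tr = k[3].rotate_right(8);
    let mut t0 = k[0] ^ u32::from(rcon);
    for bs in 0..4 {
        t0 = aes32esi(t0, tr, bs);
    }
    let t1 = k[1] ^ t0;
    let t2 = k[2] ^ t1;
    let t3 = k[3] ^ t2;
    [t0, t1, t2, t3]
}
```

Feeding in the example cipher key from FIPS-197 Appendix A.1 with the first round constant 0x01 should reproduce the second round key listed there, which makes this a convenient regression test for a hardware-backed implementation.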
64-bit AES
32-bit | 64-bit |
---|---|
sha256sig0 | sha256sig0 |
sha256sig1 | sha256sig1 |
sha256sum0 | sha256sum0 |
sha256sum1 | sha256sum1 |
sha512sig0h | sha512sig0 |
sha512sig0l | sha512sig1 |
sha512sig1h | sha512sum0 |
sha512sig1l | sha512sum1 |
sha512sum0r | |
sha512sum1r |
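The four SHA-256 instructions in the table above each compute one of the fixed rotate-and-shift functions of the hash. As a rough software model, named after the instructions but written in plain Rust rather than with intrinsics, they can be sketched as:

```rust
/// sha256sig0: the sigma0 function of the SHA-256 message schedule.
fn sha256sig0(x: u32) -> u32 {
    x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3)
}

/// sha256sig1: the sigma1 function of the SHA-256 message schedule.
fn sha256sig1(x: u32) -> u32 {
    x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10)
}

/// sha256sum0: the Sigma0 function of the SHA-256 compression function.
fn sha256sum0(x: u32) -> u32 {
    x.rotate_right(2) ^ x.rotate_right(13) ^ x.rotate_right(22)
}

/// sha256sum1: the Sigma1 function of the SHA-256 compression function.
fn sha256sum1(x: u32) -> u32 {
    x.rotate_right(6) ^ x.rotate_right(11) ^ x.rotate_right(25)
}
```

Each of these otherwise takes several rotate, shift and XOR instructions, which is where the saving from the dedicated instructions comes from.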
Instruction |
---|
sm4ed |
sm4ks |