Ferris holding a RISC-V logo

The Riscvonomicon (pronounced "risk five o-nomicon") provides a reference for using the RISC-V instruction set and the Rust programming language together. It contains information about:

Utilizing Rust and RISC-V specific crates, tools and environments
Building, testing, fuzzing and formally verifying code for RISC-V written in Rust
Utilizing RISC-V instructions, intrinsics and extensions in Rust

This book assumes familiarity with both the RISC-V instruction set and the Rust programming language, and is meant to extend the Embedonomicon, the Embedded Rust Book and the Rustonomicon. Some chapters within this book provide information which is generally useful for RISC-V or Rust.

This project is not an official Rust project, nor is it an official RISC-V project.

Getting Started

To get started with using RISC-V and Rust together, we will need a couple of things.

Cross-compilation linker
ISA emulation environment
Rust toolchain for RISC-V

Cross-compilation linker

It is generally advised to install the RISC-V GNU Toolchain for the corresponding target. The required toolchain differs from target to target, so it is usually best to install both the RV32 and RV64 versions. These can be found here.

Below is a table for different platforms and which commands can be executed to get the RISC-V GNU Toolchain.

Platform	Command	Link
ArchLinux	`paru -S riscv-gnu-toolchain-bin`	Link
Ubuntu Linux	Install from source
macOS	`brew tap riscv-software-src/riscv && brew install riscv-tools`	Link

ISA emulation environment

QEMU
Spike

Rust toolchain for RISC-V

Rust supports several variants of the RISC-V instruction set. All the targets can be listed with the following command.

rustup target list | grep '^riscv'

The target determines the register sizes and which instructions to use. For example, the riscv32imac-unknown-none-elf target includes the m (Multiplication), a (Atomics) and c (Compressed) instructions, whereas the riscv32i-unknown-none-elf target only includes the base instructions.
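To make the naming scheme concrete, the following sketch splits the architecture part of a target name into the register width and the extension letters. The function name `decode_arch` is made up for this illustration and is not part of any toolchain. It only handles the single-letter extensions that appear in the targets mentioned here, and it expands `g` to the letters `imafd`.

```rust
/// Hypothetical helper: decodes the first component of a target name such as
/// `riscv32imac-unknown-none-elf` into (register width, extension letters).
fn decode_arch(target: &str) -> Option<(u32, Vec<char>)> {
    // The architecture is everything before the first '-'
    let arch = target.split('-').next()?;
    let rest = arch.strip_prefix("riscv")?;

    let (xlen, letters) = if let Some(l) = rest.strip_prefix("32") {
        (32, l)
    } else if let Some(l) = rest.strip_prefix("64") {
        (64, l)
    } else {
        return None;
    };

    let mut extensions = Vec::new();
    for c in letters.chars() {
        match c {
            // `g` is shorthand for the general-purpose set
            'g' => extensions.extend(['i', 'm', 'a', 'f', 'd']),
            c if c.is_ascii_lowercase() => extensions.push(c),
            // Anything else is outside the scope of this sketch
            _ => return None,
        }
    }

    Some((xlen, extensions))
}
```

For `riscv64gc-unknown-linux-gnu` this yields a width of 64 and the letters `i`, `m`, `a`, `f`, `d` and `c`.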

To install the toolchain for a specific target, you can run the following command.

# Replace 'riscv64gc-unknown-linux-gnu' with the desired target
rustup target add riscv64gc-unknown-linux-gnu

Bare Metal

Getting started with Bare Metal.

# 32-bit target
# Alternatives: riscv32imac, riscv32imc
rustup target add riscv32i-unknown-none-elf

# 64-bit target
# Alternatives: riscv64imac
rustup target add riscv64gc-unknown-none-elf

A minimal example of a bare-metal binary looks as follows:

#![no_std]
#![no_main]

// Required in `no_std` binaries: called on panic, here it simply spins forever.
#[panic_handler]
fn panic_handler(_info: &core::panic::PanicInfo) -> ! {
	loop {}
}

// Entry point; `no_mangle` keeps the symbol name `_start` for the linker.
#[no_mangle]
extern "C" fn _start() -> ! {
	// NOTE:
	// The `.data` and `.bss` sections are not initialized.
	// You might use the `r0` crate for this. Take a look at the section
	// on this crate.

	loop {}
}

A safer way that handles all initialization for you is to use the riscv-rt crate.

cargo add riscv-rt panic-halt

#![no_std]
#![no_main]

extern crate panic_halt;

use riscv_rt::entry;

#[entry]
fn main() -> ! {
	loop {}
}

Cargo and RISC-V

Building

Running

RISC-V 32-bit

To run 32-bit RISC-V code, it is possible to use the QEMU emulator. How to test and run this code depends on its contents. For libraries that contain arithmetic instructions, it is possible to use ralte32 or the riscv32gc-unknown-linux-gnu nightly target.

RISC-V 64-bit

Fuzz

Formally Verify

`riscv-rt` crate

Repository | Documentation

The riscv-rt crate provides a "Minimal runtime / startup for RISC-V CPU's". It takes care of:

Setting up the .data and .bss sections correctly
Setting up traps / interrupts in the correct place
Allocating a stack per hardware thread (hart)

#![no_std]
#![no_main]

extern crate panic_halt;

use riscv_rt::entry;

#[entry]
fn main() -> ! {
	loop {}
}

Reset to `main` on `riscv-rt`

This section describes all the code that gets executed from a CPU reset to the main function.

The link.x file defines the regions of the resulting ELF file. What is important here is the order of items in the .text section of the binary.

.section .init, "ax"
.global _start
_start:

The .section .init directive ensures that the following code is put in the .init section, which link.x places at the start of the .text region. Then, we define a global _start symbol that labels the beginning of this code. The _start symbol is generally assumed to be the entry point for ELF binaries.

// Only for rv32
lui ra, %hi(_abs_start) // ra <- addr_of(_abs_start) & 0xFFFF_F000
jr %lo(_abs_start)(ra)
	
// Only for rv64
.option push
.option norelax // to prevent an unsupported R_RISCV_ALIGN relocation from being generated
1:
	auipc ra, %pcrel_hi(1f)
	ld ra, %pcrel_lo(1b)(ra)
	jr ra
.align  3
1:
.dword _abs_start
.option pop
_abs_start:

This code seems quite crazy and uses some strange assembly syntax. Let us first dive into the 32-bit version as it is the easier one.

It consists of two instructions. The first instruction loads the upper 20 bits of the address of the _abs_start symbol, which is defined at the end, into the ra (return address) register. The second instruction jumps to the address in the ra register offset by the lower 12 bits of the same symbol.
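The split into an upper and a lower part can be modelled in plain Rust. The sketch below is an illustration only and the function names are made up. It assumes the usual behaviour of the `%hi` and `%lo` relocations: the lower 12 bits are sign-extended by the hardware, so the upper part is rounded up by `0x800` to compensate. This also means the upper part is not always a plain mask with `0xFFFF_F000`.

```rust
/// Value placed in the 20-bit immediate of `lui` by `%hi(addr)`.
/// Adding 0x800 compensates for the sign extension of the low part.
fn hi20(addr: u32) -> u32 {
    addr.wrapping_add(0x800) >> 12
}

/// Value of `%lo(addr)`: the low 12 bits, sign-extended like a 12-bit immediate.
fn lo12(addr: u32) -> i32 {
    ((addr << 20) as i32) >> 20
}

/// Models `lui ra, %hi(addr)` followed by `jr %lo(addr)(ra)`.
fn jump_target(addr: u32) -> u32 {
    // lui: place the upper part in bits 31..12, clear the rest
    let ra = hi20(addr) << 12;
    // jr (jalr): add the sign-extended offset and clear the lowest bit
    ra.wrapping_add(lo12(addr) as u32) & !1
}
```

For an address such as `0x8000_0804`, bit 11 is set, so `lo12` is negative and `hi20` is one higher than the plain upper 20 bits. The sum still lands on the original address.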

_abs_start:
    .option norelax
    .cfi_startproc
    .cfi_undefined ra
    #[cfg(feature = "s-mode")]
    {
	    csrw sie, 0
	    csrw sip, 0
	}
	
    #[cfg(not(feature = "s-mode"))]
    {
	    csrw mie, 0
	    csrw mip, 0
	}
    
    li  x1, 0
    li  x2, 0
    li  x3, 0
    li  x4, 0
    li  x5, 0
    li  x6, 0
    li  x7, 0
    li  x8, 0
    li  x9, 0
    // a0..a2 (x10..x12) skipped
    li  x13, 0
    li  x14, 0
    li  x15, 0
    li  x16, 0
    li  x17, 0
    li  x18, 0
    li  x19, 0
    li  x20, 0
    li  x21, 0
    li  x22, 0
    li  x23, 0
    li  x24, 0
    li  x25, 0
    li  x26, 0
    li  x27, 0
    li  x28, 0
    li  x29, 0
    li  x30, 0
    li  x31, 0

    .option push
    .option norelax
    la gp, __global_pointer$
    .option pop

    #[cfg(all(not(feature = "single-hart"), feature = "s-mode"))]
    {
        mv t2, a0 // the hartid is passed as parameter by SMODE
    }

    #[cfg(all(not(feature = "single-hart"), not(feature = "s-mode")))]
    {
        csrr t2, mhartid
    }

    #[cfg(not(feature = "single-hart"))]
    {
        lui t0, %hi(_max_hart_id)
        add t0, t0, %lo(_max_hart_id)
        bgtu t2, t0, abort
    }

    // Allocate stacks
    la sp, _stack_start
    lui t0, %hi(_hart_stack_size)
    add t0, t0, %lo(_hart_stack_size)

    #[cfg(all(not(feature = "single-hart"), riscvm))]
    {
        mul t0, t2, t0
    }

    #[cfg(all(not(feature = "single-hart"), not(riscvm)))]
    {
        beqz t2, 2f  // Jump if single-hart
        mv t1, t2
        mv t3, t0
    1:
        add t0, t0, t3
        addi t1, t1, -1
        bnez t1, 1b
    2:
    }

    sub sp, sp, t0

    // Set frame pointer
    add s0, sp, zero

    jal zero, _start_rust

    .cfi_endproc
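The stack allocation at the end of this listing can be summarised in a few lines of Rust. This is a simplified model rather than the crate's code: it only covers a multi-hart build that takes the `mul` path (targets with the M extension) and does not model the repeated-addition fallback. The function name and the plain integer parameters are made up for the illustration; in the real startup code the values come from the linker symbols `_stack_start`, `_hart_stack_size` and `_max_hart_id`.

```rust
/// Hypothetical model of the initial stack pointer of a hart.
/// Returns `None` where the assembly would branch to `abort`.
fn initial_stack_pointer(
    stack_start: u64,
    hart_stack_size: u64,
    max_hart_id: u64,
    hartid: u64,
) -> Option<u64> {
    // bgtu t2, t0, abort
    if hartid > max_hart_id {
        return None;
    }

    // mul t0, t2, t0
    let offset = hartid * hart_stack_size;

    // sub sp, sp, t0
    Some(stack_start - offset)
}
```

Each hart therefore gets its own slice of the stack region, growing downwards from `_stack_start`.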

https://twilco.github.io/riscv-from-scratch/2019/04/27/riscv-from-scratch-2.html

Vector (V) Extension

Memory Operations

Loads:

Unit Strided (VLE)
Constant-Strided (VLSE)
Indexed (VLUXEI, VLOXEI)

Stores:

Unit Strided (VSE)
Constant-Strided (VSSE)
Indexed (VSUXEI, VSOXEI)
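The three addressing modes differ only in how the address of element `i` is computed. The sketch below models the loads for 8-bit elements on a byte slice. It is a software illustration with made-up function names, not an intrinsic API, and it only handles non-negative strides. The stores work the same way with the data flowing in the other direction. The ordered and unordered indexed variants produce the same values and differ in the order in which memory is accessed.

```rust
/// Unit-stride load: element i comes from `base + i`.
fn load_unit_stride(mem: &[u8], base: usize, vl: usize) -> Vec<u8> {
    (0..vl).map(|i| mem[base + i]).collect()
}

/// Strided load: element i comes from `base + i * stride` (stride in bytes).
fn load_strided(mem: &[u8], base: usize, stride: usize, vl: usize) -> Vec<u8> {
    (0..vl).map(|i| mem[base + i * stride]).collect()
}

/// Indexed load: element i comes from `base + offsets[i]` (byte offsets).
fn load_indexed(mem: &[u8], base: usize, offsets: &[usize]) -> Vec<u8> {
    offsets.iter().map(|&offset| mem[base + offset]).collect()
}
```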

Array Element-Wise Addition

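// NOTE: `setvli`, `vle8_vv`, `vadd_vv` and `vse8_vv` are placeholders for
// wrappers around the corresponding vector instructions.
fn array_addition(mut x: &[u8], mut y: &[u8], mut result: &mut [u8]) {
    assert!(x.len() == y.len());
    assert!(x.len() == result.len());

    // Application vector length: number of elements left to process
    let mut avl = x.len();

    while avl > 0 {
        // Number of elements the hardware processes in this iteration
        let vl = setvli(avl, Element::E8, LMul::M1);

        // Split off the chunk for this iteration
        let (cx, rest_x) = x.split_at(vl);
        let (cy, rest_y) = y.split_at(vl);
        let (cresult, rest_result) = core::mem::take(&mut result).split_at_mut(vl);

        let vx = vle8_vv(cx, vl);
        let vy = vle8_vv(cy, vl);

        let vresult = vadd_vv(vx, vy, vl);

        vse8_vv(vresult, cresult);

        // Continue with the remaining elements
        x = rest_x;
        y = rest_y;
        result = rest_result;
        avl -= vl;
    }
}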
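The loop structure of this example (often called strip-mining) can be tried out without vector hardware. The following sketch replaces the vector instructions with a software model. `VLMAX` and `setvl_model` are assumptions made for this illustration: the model simply grants `min(avl, VLMAX)` elements per iteration, which is one of the choices the hardware is allowed to make.

```rust
/// Assumed maximum vector length: 128-bit registers, 8-bit elements, LMUL = 1.
const VLMAX: usize = 16;

/// Software stand-in for `vsetvli`: returns how many elements are processed
/// in this iteration for a given application vector length.
fn setvl_model(avl: usize) -> usize {
    avl.min(VLMAX)
}

/// Element-wise wrapping addition, processed in chunks of at most VLMAX.
/// Returns the number of loop iterations that were needed.
fn array_addition_model(x: &[u8], y: &[u8], result: &mut [u8]) -> usize {
    assert_eq!(x.len(), y.len());
    assert_eq!(x.len(), result.len());

    let mut done = 0;
    let mut iterations = 0;

    while done < x.len() {
        let vl = setvl_model(x.len() - done);

        // Stand-in for the load, add and store on `vl` elements
        for i in done..done + vl {
            result[i] = x[i].wrapping_add(y[i]);
        }

        done += vl;
        iterations += 1;
    }

    iterations
}
```

With 40 input elements the model needs three iterations that handle 16, 16 and 8 elements.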

Overview of RISC-V Scalar Cryptography (Zk) Extension

The RISC-V Scalar Cryptography extension is a small extension that helps embedded and application processors reduce the code size, energy consumption and execution time of cryptographic code. The extension consists of five parts.

Bit-manipulation instructions for cryptography (Zbkx, Zbkc and Zbkb).
Zks which defines the instructions relating to the ShangMi Suite. This includes the SM3 hash function and the SM4 block cipher.
Zkn defines instructions for the NIST Suite cryptographic primitives including AES block cipher and the SHA-2 hash function.
Zkr defines a CSR for a hardware entropy source. This can be used as a secure source of randomness.
Zkt specifies data-independent (constant-time) execution latency for a specific set of instructions.

This chapter talks about these parts and how they can be used.
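As a small taste of the Zkr part, the value read from the entropy source CSR (`seed`) carries a 2-bit status in its top bits and, when the status is ES16, 16 bits of entropy in its low bits. The sketch below only decodes such a value in software. The type and function names are made up for this illustration, and reading the CSR itself is not shown.

```rust
/// Hypothetical decoded form of a value read from the `seed` CSR.
#[derive(Debug, PartialEq, Eq)]
enum SeedStatus {
    /// Built-in self test is running, no entropy available yet
    Bist,
    /// Not enough entropy gathered yet, try again
    Wait,
    /// 16 bits of entropy are available
    Es16(u16),
    /// Unrecoverable failure of the entropy source
    Dead,
}

/// Decodes the status field (bits 31:30) and the entropy field (bits 15:0).
fn decode_seed(raw: u32) -> SeedStatus {
    match raw >> 30 {
        0b00 => SeedStatus::Bist,
        0b01 => SeedStatus::Wait,
        0b10 => SeedStatus::Es16((raw & 0xFFFF) as u16),
        _ => SeedStatus::Dead,
    }
}
```

A caller would typically poll until it sees `Es16` and feed the collected bits into a cryptographic conditioning step rather than use them directly.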

Sources

Instruction	32-bit	64-bit	Description
`ror`	x	x	Rotate Right by register value
`rol`	x	x	Rotate Left by register value
`rori`	x	x	Rotate Right by immediate value
`rorw`		x	Rotate Word Right by register value
`rolw`		x	Rotate Word Left by register value
`roriw`		x	Rotate Word Right by immediate value
`andn`	x	x	Bitwise And & Negate
`orn`	x	x	Bitwise Or & Negate
`xnor`	x	x	Exclusive-Not-Or
`pack`	x	x	Pack register from two register low-halves
`packh`	x	x	Pack register halfword from two register low-bytes
`packw`		x	Pack register word from two register low-halfwords
`brev8`	x	x	Reverse bits within bytes
`rev8`	x	x	Reverse bytes within register
`zip`	x		Zip upper and lower register halves into odd and even bits
`unzip`	x		Unzip odd and even bits into upper and lower register halves
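`clmul`	x	x	Carry-less multiply (low half of the product)
`clmulh`	x	x	Carry-less multiply (high half of the product)
`xperm8`	x	x	Crossbar permutation on bytes
`xperm4`	x	x	Crossbar permutation on nibbles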

NIST Suite: Encryption & Decryption (Zkned)

The Zkned set contains instructions for the AES block cipher. The extension defines 4 instructions for riscv32 and 7 instructions for riscv64. These instructions can be used to implement AES-128, AES-192 and AES-256. The table below lists all the instructions that are defined by the Zkned extension.

32-bit	64-bit	Usage
`aes32dsi`	`aes64ds`	Decryption Final Round
`aes32dsmi`	`aes64dsm`	Decryption Middle Round
`aes32esi`	`aes64es`	Encryption Final Round
`aes32esmi`	`aes64esm`	Encryption Middle Round
	`aes64ks1i`	Key Schedule
	`aes64ks2`	Key Schedule
	`aes64im`	Decryption Key Schedule

This section contains usage examples for the 32-bit instructions and for the 64-bit instructions. These implementations can also be found in the GitHub repository with the examples for the entire Zk extension.

32-bit AES

This section explains how to use the aes32esmi, aes32esi, aes32dsi and aes32dsmi instructions from the Zkne and Zknd extensions to simplify and speed up the implementation of the Advanced Encryption Standard. The instructions can be used to implement AES-128, AES-192 and AES-256. A talk at the RISC-V summit¹ claims a speed-up of ~4x and a code size reduction of 0.3x.

⚠️ WARNING ⚠️

It is especially difficult to implement cryptography correctly and securely. If you can use an existing implementation that has been battle tested, you probably should. Still, this page exists to show how you would go about using this extension.

Encryption

The aes32esmi instruction helps with implementing the middle rounds of AES. It performs a byte substitution, the mixing of columns and the addition of the round key. The aes32esi instruction is used for the last round of AES and performs a byte substitution and the addition of the round key. An equivalent Rust implementation of the instructions would look like:

static SBOX: [u8; 256] = [
    // ...
];

// Multiplication by 2 in GF(2^8) with the AES reduction polynomial
fn xt2(x: u8) -> u8 {
    (x << 1) ^ if x & 0x80 != 0 { 0x1B } else { 0x00 }
}

// Galois Field Multiplication for y in 0..16
fn gfmul(x: u8, y: u8) -> u8 {
    let mut out = 0;
    let mut mask = x;

    for i in 0..4 {
        if y & (1 << i) != 0 {
            out ^= mask;
        }

        // Advance to x * 2^(i + 1)
        mask = xt2(mask);
    }

    out
}

fn aes32esmi(rs1: u32, rs2: u32, bs: u8) -> u32 {
    let shift_amount = u32::from(bs) * 8;

    // Substitution
    let sub_input = (rs2 >> shift_amount) & 0xFF;
    let sub_output = SBOX[sub_input as usize];

    // Mix Columns
    let mixed = u32::from_be_bytes([
        gfmul(sub_output, 0x3),
        sub_output,
        sub_output,
        gfmul(sub_output, 0x2),
    ]);

    // Add Roundkey
    rs1 ^ mixed.rotate_left(shift_amount)
}

fn aes32esi(rs1: u32, rs2: u32, bs: u8) -> u32 {
    let shift_amount = u32::from(bs) * 8;

    // Substitution
    let sub_input = (rs2 >> shift_amount) & 0xFF;
    let sub_output = SBOX[sub_input as usize] as u32;

    // Add Roundkey
    rs1 ^ (sub_output << shift_amount)
}

Middle Round implementation

This can be used to implement a middle encryption round, where rk is an array of the round keys and block is the input state. Note how the following code example manually handles the shifting of rows.

#![allow(unused)]
fn main() {
// Block and RoundKey contain little-endian encoded rows
let RoundKey(mut a0, mut a1, mut a2, mut a3) = rk[i];

a0 = aes32esmi(a0, block.0, 0);
a0 = aes32esmi(a0, block.1, 1);
a0 = aes32esmi(a0, block.2, 2);
a0 = aes32esmi(a0, block.3, 3);

a1 = aes32esmi(a1, block.1, 0);
a1 = aes32esmi(a1, block.2, 1);
a1 = aes32esmi(a1, block.3, 2);
a1 = aes32esmi(a1, block.0, 3);

a2 = aes32esmi(a2, block.2, 0);
a2 = aes32esmi(a2, block.3, 1);
a2 = aes32esmi(a2, block.0, 2);
a2 = aes32esmi(a2, block.1, 3);

a3 = aes32esmi(a3, block.3, 0);
a3 = aes32esmi(a3, block.0, 1);
a3 = aes32esmi(a3, block.1, 2);
a3 = aes32esmi(a3, block.2, 3);

block = Block(a0, a1, a2, a3);
}
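The middle round can be checked in software against the worked example in FIPS-197 (Appendix B, round 1). The sketch below is self-contained and makes a few assumptions that differ from the listing above: the S-box is computed at start-up instead of being written out, the emulated instruction takes the S-box as an extra parameter, and the state and round key are plain `[u32; 4]` arrays of little-endian words. All function names are chosen for this illustration.

```rust
/// Builds the AES S-box (field inverse followed by the affine map),
/// so the example does not need a 256-entry literal.
fn aes_sbox() -> [u8; 256] {
    let mut sbox = [0u8; 256];
    let (mut p, mut q) = (1u8, 1u8);
    loop {
        // p walks through all non-zero field elements (multiply by 3)
        p = p ^ (p << 1) ^ if p & 0x80 != 0 { 0x1B } else { 0 };
        // q stays the multiplicative inverse of p (divide by 3)
        q ^= q << 1;
        q ^= q << 2;
        q ^= q << 4;
        if q & 0x80 != 0 {
            q ^= 0x09;
        }
        // Affine transformation
        let x = q ^ q.rotate_left(1) ^ q.rotate_left(2) ^ q.rotate_left(3) ^ q.rotate_left(4);
        sbox[p as usize] = x ^ 0x63;
        if p == 1 {
            break;
        }
    }
    sbox[0] = 0x63; // zero has no inverse
    sbox
}

/// Multiplication by 2 in GF(2^8).
fn xt2(x: u8) -> u8 {
    (x << 1) ^ if x & 0x80 != 0 { 0x1B } else { 0x00 }
}

/// Multiplication in GF(2^8) for y < 16.
fn gfmul(x: u8, y: u8) -> u8 {
    let mut out = 0;
    let mut mask = x;
    for i in 0..4 {
        if y & (1 << i) != 0 {
            out ^= mask;
        }
        mask = xt2(mask);
    }
    out
}

/// Software model of `aes32esmi` with an explicit S-box parameter.
fn aes32esmi(sbox: &[u8; 256], rs1: u32, rs2: u32, bs: u32) -> u32 {
    let shamt = 8 * (bs & 3);
    let so = sbox[((rs2 >> shamt) & 0xFF) as usize];
    let mixed = u32::from_be_bytes([gfmul(so, 3), so, so, gfmul(so, 2)]);
    rs1 ^ mixed.rotate_left(shamt)
}

/// One middle round: output column `col` consumes byte `bs` of
/// input column `(col + bs) % 4`, which implements ShiftRows.
fn middle_round(sbox: &[u8; 256], block: [u32; 4], rk: [u32; 4]) -> [u32; 4] {
    let mut out = rk;
    for col in 0..4 {
        for bs in 0..4 {
            out[col] = aes32esmi(sbox, out[col], block[(col + bs) % 4], bs as u32);
        }
    }
    out
}
```

Feeding in the state at the start of round 1 (`19 3d e3 be ...`) together with the first round key (`a0 fa fe 17 ...`) reproduces the state at the start of round 2 (`a4 9c 7f f2 ...`) from the standard.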

Final Round implementation

The final round is implemented in the same way as the middle round, except that the aes32esmi instruction is replaced by the aes32esi instruction.

#![allow(unused)]
fn main() {
// Block and RoundKey contain little-endian encoded rows
let RoundKey(mut a0, mut a1, mut a2, mut a3) = rk[i];

a0 = aes32esi(a0, block.0, 0);
a0 = aes32esi(a0, block.1, 1);
a0 = aes32esi(a0, block.2, 2);
a0 = aes32esi(a0, block.3, 3);

a1 = aes32esi(a1, block.1, 0);
a1 = aes32esi(a1, block.2, 1);
a1 = aes32esi(a1, block.3, 2);
a1 = aes32esi(a1, block.0, 3);

a2 = aes32esi(a2, block.2, 0);
a2 = aes32esi(a2, block.3, 1);
a2 = aes32esi(a2, block.0, 2);
a2 = aes32esi(a2, block.1, 3);

a3 = aes32esi(a3, block.3, 0);
a3 = aes32esi(a3, block.0, 1);
a3 = aes32esi(a3, block.1, 2);
a3 = aes32esi(a3, block.2, 3);

block = Block(a0, a1, a2, a3);
}

Decryption


Key Schedule implementation

To implement the key schedule, we can also use the aes32esi instruction. This removes the need for a substitution table in software. The implementation differs slightly between AES-128, AES-192 and AES-256 and therefore all three implementations are given separately.

#![allow(unused)]
fn main() {
pub struct AES128Key(u32, u32, u32, u32);
pub struct AES192Key(u32, u32, u32, u32, u32, u32);
pub struct AES256Key(u32, u32, u32, u32, u32, u32, u32, u32);

// `repr(C)` fixes the field layout, which the `transmute` calls below rely on
#[repr(C)]
pub struct RoundKey(u32, u32, u32, u32);

// Round constants of the AES key schedule
const RCON: [u8; 10] = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36];

fn aes128_key_schedule(ck: AES128Key) -> [RoundKey; 11] {
    let mut rk = [0u32; 11 * 4];

    let AES128Key(
        mut t0, mut t1,
        mut t2, mut t3,
    ) = ck;

    let mut i = 0;
    loop {
        rk[(i << 2) + 0] = t0;
        rk[(i << 2) + 1] = t1;
        rk[(i << 2) + 2] = t2;
        rk[(i << 2) + 3] = t3;

        if i == 10 {
            break;
        }

        t0 ^= u32::from(RCON[i]);
        let tr = t3.rotate_right(8);

        t0 = aes32esi(t0, tr, 0);
        t0 = aes32esi(t0, tr, 1);
        t0 = aes32esi(t0, tr, 2);
        t0 = aes32esi(t0, tr, 3);

        t1 ^= t0;
        t2 ^= t1;
        t3 ^= t2;

        i += 1;
    }

    // SAFETY: We know that rk has 11 * 4 times a u32. So it has space for 11 RoundKeys
    unsafe { core::mem::transmute(rk) }
}

fn aes192_key_schedule(ck: AES192Key) -> [RoundKey; 13] {
    let mut rk = [0u32; 13 * 4];

    let AES192Key(
        mut t0, mut t1,
        mut t2, mut t3,
        mut t4, mut t5,
    ) = ck;

    let mut i = 0;
    loop {
        rk[i * 6 + 0] = t0;
        rk[i * 6 + 1] = t1;
        rk[i * 6 + 2] = t2;
        rk[i * 6 + 3] = t3;

        if i == 8 {
            break;
        }

        rk[i * 6 + 4] = t4;
        rk[i * 6 + 5] = t5;

        t0 ^= u32::from(RCON[i]);
        let tr = t5.rotate_right(8);

        t0 = aes32esi(t0, tr, 0);
        t0 = aes32esi(t0, tr, 1);
        t0 = aes32esi(t0, tr, 2);
        t0 = aes32esi(t0, tr, 3);

        t1 ^= t0;
        t2 ^= t1;
        t3 ^= t2;
        t4 ^= t3;
        t5 ^= t4;

        i += 1;
    }

    // SAFETY: We know that rk has 13 * 4 times a u32. So it has space for 13 RoundKeys
    unsafe { core::mem::transmute(rk) }
}

fn aes256_key_schedule(ck: AES256Key) -> [RoundKey; 15] {
    let mut rk = [0u32; 15 * 4];

    let AES256Key(
        mut t0, mut t1,
        mut t2, mut t3,
        mut t4, mut t5,
        mut t6, mut t7,
    ) = ck;

    let mut i = 0;
    loop {
        rk[i * 8 + 0] = t0;
        rk[i * 8 + 1] = t1;
        rk[i * 8 + 2] = t2;
        rk[i * 8 + 3] = t3;

        if i == 7 {
            break;
        }

        rk[i * 8 + 4] = t4;
        rk[i * 8 + 5] = t5;
        rk[i * 8 + 6] = t6;
        rk[i * 8 + 7] = t7;

        t0 ^= u32::from(RCON[i]);
        let tr = t7.rotate_right(8);

        t0 = aes32esi(t0, tr, 0);
        t0 = aes32esi(t0, tr, 1);
        t0 = aes32esi(t0, tr, 2);
        t0 = aes32esi(t0, tr, 3);

        t1 ^= t0;
        t2 ^= t1;
        t3 ^= t2;

        t4 = aes32esi(t4, t3, 0);
        t4 = aes32esi(t4, t3, 1);
        t4 = aes32esi(t4, t3, 2);
        t4 = aes32esi(t4, t3, 3);

        t5 ^= t4;
        t6 ^= t5;
        t7 ^= t6;

        i += 1;
    }

    // SAFETY: We know that rk has 15 * 4 times a u32. So it has space for 15 RoundKeys
    unsafe { core::mem::transmute(rk) }
}

fn aes_decrypt_key_schedule<const KEYS: usize>(rk: &mut [RoundKey; KEYS]) {
    fn subkey(mut x: u32) -> u32 {
        let mut y;

        unsafe {
            y = aes32esi(0, x, 0);
            y = aes32esi(y, x, 1);
            y = aes32esi(y, x, 2);
            y = aes32esi(y, x, 3);

            x = aes32dsmi(0, y, 0);
            x = aes32dsmi(x, y, 1);
            x = aes32dsmi(x, y, 2);
            x = aes32dsmi(x, y, 3);
        }

        x
    }

    for k in &mut rk[1..KEYS - 1] {
        unsafe {
            k.0 = subkey(k.0);
            k.1 = subkey(k.1);
            k.2 = subkey(k.2);
            k.3 = subkey(k.3);
        }
    }
}

fn aes128_decrypt_key_schedule(rk: &mut [RoundKey; 11]) {
    aes_decrypt_key_schedule::<11>(rk)
}

fn aes192_decrypt_key_schedule(rk: &mut [RoundKey; 13]) {
    aes_decrypt_key_schedule::<13>(rk)
}

fn aes256_decrypt_key_schedule(rk: &mut [RoundKey; 15]) {
    aes_decrypt_key_schedule::<15>(rk)
}
}
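The AES-128 schedule can be checked against the key expansion example in FIPS-197 (Appendix A.1). The sketch below is a self-contained variant written for this purpose and makes several assumptions: the S-box is computed at start-up, the emulated `aes32esi` takes it as an extra parameter, and round keys are returned as nested arrays of little-endian words, which avoids the `transmute`. The function names are chosen for this illustration.

```rust
/// Round constants of the AES key schedule.
const RCON: [u8; 10] = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36];

/// Builds the AES S-box (field inverse followed by the affine map).
fn aes_sbox() -> [u8; 256] {
    let mut sbox = [0u8; 256];
    let (mut p, mut q) = (1u8, 1u8);
    loop {
        // p walks through all non-zero field elements (multiply by 3)
        p = p ^ (p << 1) ^ if p & 0x80 != 0 { 0x1B } else { 0 };
        // q stays the multiplicative inverse of p (divide by 3)
        q ^= q << 1;
        q ^= q << 2;
        q ^= q << 4;
        if q & 0x80 != 0 {
            q ^= 0x09;
        }
        // Affine transformation
        let x = q ^ q.rotate_left(1) ^ q.rotate_left(2) ^ q.rotate_left(3) ^ q.rotate_left(4);
        sbox[p as usize] = x ^ 0x63;
        if p == 1 {
            break;
        }
    }
    sbox[0] = 0x63; // zero has no inverse
    sbox
}

/// Software model of `aes32esi` with an explicit S-box parameter.
fn aes32esi(sbox: &[u8; 256], rs1: u32, rs2: u32, bs: u32) -> u32 {
    let shamt = 8 * (bs & 3);
    let so = u32::from(sbox[((rs2 >> shamt) & 0xFF) as usize]);
    rs1 ^ (so << shamt)
}

/// Expands a 128-bit key (four little-endian words) into 11 round keys.
fn aes128_round_keys(key: [u32; 4]) -> [[u32; 4]; 11] {
    let sbox = aes_sbox();
    let mut t = key;
    let mut rk = [[0u32; 4]; 11];

    for i in 0..11 {
        rk[i] = t;
        if i == 10 {
            break;
        }

        // Round constant, then SubWord(RotWord(t[3])) folded into t[0]
        t[0] ^= u32::from(RCON[i]);
        let tr = t[3].rotate_right(8);
        for bs in 0..4 {
            t[0] = aes32esi(&sbox, t[0], tr, bs);
        }

        t[1] ^= t[0];
        t[2] ^= t[1];
        t[3] ^= t[2];
    }

    rk
}
```

For the key `2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c` the first derived word is `a0 fa fe 17` and the last round key starts with `d0 14 f9 a8`, matching the standard.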

https://www.youtube.com/watch?v=-HVRjbxWF-I

64-bit AES

32-bit	64-bit
`sha256sig0`	`sha256sig0`
`sha256sig1`	`sha256sig1`
`sha256sum0`	`sha256sum0`
`sha256sum1`	`sha256sum1`
`sha512sig0h`	`sha512sig0`
`sha512sig0l`	`sha512sig1`
`sha512sig1h`	`sha512sum0`
`sha512sig1l`	`sha512sum1`
`sha512sum0r`
`sha512sum1r`

Instruction
`sm4ed`
`sm4ks`


Entropy Source (Zkr)

Data Independent Execution Latency (Zkt)