Attempts to debug a crashing Rust program with GDB

While I’ve used gdb for debugging C programs, I’ve never attempted to do it for Go or Rust, mainly because I was worried about how complex it might get, even when full debug symbols are available. However, earlier this year, I had to do so because I was experiencing a mysterious segfault that was not explaining anything to me about what was wrong with my program. In particular, I noticed that only in debug mode, my program was crashing at the SSL handshake step, which meant that my websocket connection was crashing before it could even fully establish a connection. This made no sense to me, because if the SSL library had an issue with handshakes, a lot more programs would have been unable to use libssl at all.

To complicate things, this program I had to debug had FFI dependencies with: 1) C libraries. So it turns out that we use rustls with native crypto instead of pure rust crypto, which meant our crypto libraries are provided by underlying system C libraries. 2) A custom crypto library written in Go - this was used for signing certain web3 transactions that required a very specific signature type, and referenced in via the usual C-FFI. This was particularly problematic, and I’ll elaborate on it later.

At the time of writing of this blogpost, I have upgraded my system through several ubuntu versions and so I ended up having to do several other experiments to try to reproduce the error, since the crash always seems to happen at different locations.

Anyway, let’s begin!

To get started, the first thing I realised was that I had no idea how to retrieve a coredump on a modern ubuntu system, nor did I have any experience debugging from one (I used to run the program directly in gdb itself and try to trigger the crash and get the backtrace). So it turns out that there are two things you need to do.

Enable core dumps: ulimit -c unlimited , and then run your program till the crash happens.
On systemd based systems, coredumps are typically logged to /var/log/systemd/coredump/. To make things easier, I installed coredumpctl off apt. This way, the steps to debug a core dump are as follows:

# 1. List the available core dumps
coredumpctl list
# 2. Get the pid of the program you want to debug, and then run this
coredumpctl debug <pid>

Within gdb, the first thing you want to do is try to find out more about the signal that was raised:

(gdb) p $_siginfo
$1 = siginfo {si_signo: 11, si_errno: 0, si_code: 128, _sifields: {_pad: [
      0 <repeats 28 times>], _kill: {si_pid: 0, si_uid: 0}, _timer: {si_tid: 0, si_overrun: 0, si_sigval: sigval_t {sival_int: 0, sival_ptr: 0x0}}, _rt: {si_pid: 0, si_uid: 0, si_sigval: sigval_t {sival_int: 0, sival_ptr: 0x0}}, _sigchld: {si_pid: 0, si_uid: 0, si_status: 0, si_utime: 0, si_stime: 0}, _sigfault: {si_addr: 0x0}, _sigpoll: {si_band: 0, si_fd: 0}, _sigsys: {_call_addr: 0x0, _syscall: 0, _arch: 0}}}

si_signo of 11 refers to SIGSEGV (segfault) while si_code tells you more about what raised the error. In the context of this debug session:

1: SEGV_MAPPER : Address not mapped
2: SEGV_ACCERR : Address mapped but no permission (an example could be an address falling on a stack guard page)
128: SI_KERNEL : Usually generated by a kernel, or mangled by a signal handler.

In this case, the struct we got was entirely filled with zeros, and the si_code was 128, which suggests that our signal got mangled.

After some digging around, it turns out that Go actually installs its own signal handlers, causing our signals to be intercepted and re-raised as something else, making it difficult for us to figure out the true cause of the fault. The reason Go does this, apparently, is to catch when the stack size needs to be increased in goroutines, and use it to grow the stack, since goroutines have dynamic stack size. The discussion of this is out of scope of this post, but this medium article gives a very detailed but easy explanation on Go’s goroutines and stack growth and why Go is so highly performant in practice.

To remedy this, I added a piece of code near the start of my main function block to restore details to siginfo:

#[cfg(all(debug_assertions, feature = "debug-crash"))]
unsafe {
    libc::signal(libc::SIGSEGV, libc::SIG_DFL);
}

Now, after re-executing everything and opening the new core dump again, we get this:

(gdb) p $_siginfo
$1 = siginfo {si_signo: 11, si_errno: 0, si_code: 1, _sifields: {_pad: [-1621964664, 31975, 
      0 <repeats 26 times>], _kill: {si_pid: -1621964664, si_uid: 31975}, _timer: {si_tid: -1621964664, si_overrun: 31975, si_sigval: sigval_t {sival_int: 0, sival_ptr: 0x0}}, _rt: {si_pid: -1621964664, si_uid: 31975, si_sigval: sigval_t {sival_int: 0, sival_ptr: 0x0}}, _sigchld: {si_pid: -1621964664, si_uid: 31975, si_status: 0, si_utime: 0, si_stime: 0}, _sigfault: {si_addr: 0x7ce79f52c888}, _sigpoll: {si_band: 137334252292232, si_fd: 0}, _sigsys: {_call_addr: 0x7ce79f52c888, _syscall: 0, _arch: 0}}}
      
p/x $rsp
$2 = 0x7ce79f52c888

Now, we see the si_code of 1 (address not mapped) and and the segfault happens at 0x7ce79f52c888. Interestingly, I attempted to print the current RSP (the current stack pointer address), and it turns out that the value is the same.

I decided to see all the address mappings:

(gdb) info proc mappings
Start Addr         End Addr           Size               Offset             File
...
...

There’s a lot of libraries loaded, so I only printed the header, but I realised that my stack address did not actually reside within any file’s start to end addresses. For a quick idea on how your application is typically laid out in memory, it is something like this:

+-----------------------+ (Highest Memory Address)
|         STACK         | -> Grows DOWNWARD (towards lower addresses)
|                       |    Stores local variables, function calls, return addresses
|           |           |
|           V           |
|-----------------------|
|       / / / / /       | -> Unused address space
|       / / / / /       |
|-----------------------|
|           ^           |
|           |           |
|         HEAP          | -> Grows UPWARD (towards higher addresses)
|                       |    Stores dynamically allocated data (e.g., malloc/new)
+-----------------------+
|         BSS           | -> Uninitialized global & static variables
+-----------------------+
|         Data          | -> Initialized global & static variables
+-----------------------+
|     Text / Code       | -> Compiled program instructions
+-----------------------+ (Lowest Memory Address)

Anyway, based on everything so far, I figured I was getting a stack overflow. I did a backtrace, and noticed:

(gdb) bt
#0  0x000064cc6118c016 in tungstenite::handshake::machine::HandshakeMachine<tokio_tungstenite::compat::AllowStd<tokio_tungstenite::stream::MaybeTlsStream<tokio::net::tcp::stream::TcpStream>>>::single_round<tokio_tungstenite::compat::AllowStd<tokio_tungstenite::stream::MaybeTlsStream<tokio::net::tcp::stream::TcpStream>>, http::response::Response<core::option::Option<alloc::vec::Vec<u8, alloc::alloc::Global>>>> (self=...)
    at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tungstenite-0.20.1/src/handshake/machine.rs:41
#1  0x000064cc611a3ce0 in tungstenite::handshake::MidHandshake<tungstenite::handshake::client::ClientHandshake<tokio_tungstenite::compat::AllowStd<tokio_tungstenite::stream::MaybeTlsStream<tokio::net::tcp::stream::TcpStream>>>>::handshake<tungstenite::handshake::client::ClientHandshake<tokio_tungstenite::compat::AllowStd<tokio_tungstenite::stream::MaybeTlsStream<tokio::net::tcp::stream::TcpStream>>>> (self=...)
    at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tungstenite-0.20.1/src/handshake/mod.rs:41
#2  0x000064cc6118dfd2 in tokio_tungstenite::client_async_with_config::{async_fn#0}::{closure#0}<http::request::Request<()>, tokio_tungstenite::stream::MaybeTlsStream<tokio::net::tcp::stream::TcpStream>> (allow_std=...)
    at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-tungstenite-0.20.1/src/lib.rs:107
#3  0x000064cc6119c0ad in tokio_tungstenite::handshake::{impl#1}::poll<tungstenite::handshake::client::ClientHandshake<tokio_tungstenite::compat::AllowStd<tokio_tungstenite::stream::MaybeTlsStream<tokio::net::tcp::stream::TcpStream>>>, tokio_tungstenite::client_async_with_config::{async_fn#0}::{closure_env#0}<http::request::Request<()>, tokio_tungstenite::stream::MaybeTlsStream<tokio::net::tcp::stream::TcpStream>>, tokio_tungstenite::stream::MaybeTlsStream<tokio::net::tcp::stream::TcpStream>> (self=..., ctx=0x7ce79f6f6e40)
    at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-tungstenite-0.20.1/src/handshake.rs:147
#4  0x000064cc6119885e in tokio_tungstenite::handshake::handshake::{async_fn#0}<tungstenite::handshake::client::ClientHandshake<tokio_tungstenite::compat::AllowStd<tokio_tungstenite::stream::MaybeTlsStream<tokio::net::tcp::stream::TcpStream>>>, tokio_tungstenite::client_async_with_config::{async_fn#0}::{closure_env#0}<http::request::Request<()>, tokio_tungstenite::stream::MaybeTlsStream<tokio::net::tcp::stream::TcpStream>>, tokio_tungstenite::stream::MaybeTlsStream<tokio::net::tcp::stream::TcpStream>> ()
    at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-tungstenite-0.20.1/src/handshake.rs:84

...
...

#80 0x000064cc61e84a8f in core::ops::function::FnOnce::call_once<std::thread::lifecycle::spawn_unchecked::{closure_env#1}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>, ()> ()
    at /rustc/2972b5e59f1c5529b6ba770437812fd83ab4ebd4/library/core/src/ops/function.rs:250
#81 0x000064cc61f58e5f in alloc::boxed::{impl#31}::call_once<(), (dyn core::ops::function::FnOnce<(), Output=()> + core::marker::Send), alloc::alloc::Global> ()
    at library/alloc/src/boxed.rs:2240
#82 std::sys::thread::unix::{impl#2}::new::thread_start ()
    at library/std/src/sys/thread/unix.rs:118
#83 0x00007ce80aea40da in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:454
#84 0x00007ce80af377ac in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The backtrace had 84 functions, and I think some of the functions had pretty huge stack frames in debug mode.

The high level flow of what was actually happening when my program tried to handle my websocket request:

axum handler -> websocket_server code -> subscription_manager -> tokio-tungstenite -> tokio-rustls -> tls handshake (crash)

I set a smaller stack size of 1MB and 1.5MB just to see what could happen, and it turns out that it can also crash as early as thread creation in tokio. In certain cases, it might hit a stack guard page and give me a SEGV_ACCERR error.

Anyway, to solve the error on debug mode, I added a gated piece of code to my main function, where essentially, what I did was to increase the default stack size on tokio threads from 2MB to 8MB only on debug:

let mut builder = tokio::runtime::Builder::new_multi_thread();
builder.enable_all();

#[cfg(debug_assertions)]
builder.thread_stack_size(8*1024*1024); // 8MB

builder.build()?.block_on(async_main())

What about `Release` mode?

So, why didn’t this error occur on release mode?

The reason is that in release mode, the function assemblies are optimized, which means the usual calling conventions for setting up a function frame (also known generically as the Application Binary Interface, ABI for short) may not be followed. On top of that, functions may be inlined, which means you get a giant chunk of assembly that looks like one huge function. This means that the entire stack size usually ends up much smaller than in debug mode, which makes it much less likely to hit stack size limits.

What about Release mode?

What about `Release` mode?