Attempts at Debugging Memory Fragmentation

For the first time in my life, I was modifying a piece of open source Rust code, and it gave me a fair bit of trouble with what initially looked like a memory leak.

My first stab at the problem was to understand, at a high level, what the code was doing, and to see whether any logic might cause memory to be kept instead of freed. This program does a lot of memory allocations, frees, and modifications, theoretically up to 10,000 such operations per second. Eventually, I landed on 3 things that I felt I could change immediately.

There were certain object types that could be removed yet were not explicitly removed. I added the logic to make sure that happened.
There were lots of useless allocations, even for objects that did not change at all. I added logic to ensure that new objects were only created when there were additions or changes.
Somehow, I was spawning lots of tokio channels to transmit information between the threads. I reduced all of these to the bare minimum as well.
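
To illustrate the second item above, here is a minimal sketch of "only allocate when something changed". It is not the code from the project: the function name apply_additions and the u64 element type are made up for illustration. It uses std::borrow::Cow so that the unchanged case hands back a borrow of the existing data instead of building a new Vec.

```rust
use std::borrow::Cow;

/// Returns the existing items untouched when there is nothing to add,
/// and allocates a new vector only when `additions` is non-empty.
/// (Hypothetical helper; names and types are assumptions.)
fn apply_additions<'a>(current: &'a [u64], additions: &[u64]) -> Cow<'a, [u64]> {
    if additions.is_empty() {
        // Nothing changed: borrow the old data, no allocation happens here.
        return Cow::Borrowed(current);
    }
    // Something changed: build the new vector with one exact-size allocation.
    let mut updated = Vec::with_capacity(current.len() + additions.len());
    updated.extend_from_slice(current);
    updated.extend_from_slice(additions);
    Cow::Owned(updated)
}
```

Callers can treat the result as a slice in both cases, so the "did anything change?" check lives in one place.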

However, memory usage kept growing. Clueless as I was about how to debug such issues, I googled whether Rust had a memory profiler and how to use one. I came across a rather nice post on how to profile memory in Rust, although it is a bit outdated. Following it required me to switch my memory allocator from the glibc allocator to jemalloc, and so I did (although not without first reading more about their differences).

How to Profile Memory Usage in Rust Programs

Add Dependency

Added this to Cargo.toml

[dependencies]
jemallocator = { version = "0.5.4", features = ["profiling"] }

Configure Jemallocator

Configure at the entry point of your app, typically main.rs

extern crate jemallocator;

#[global_allocator]
static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;

Then, build with CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release, so that the release binary keeps the debug symbols jeprof needs to show function names. I put the allocator behind a feature flag so that I could quickly switch allocators if I needed to.
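
A minimal sketch of what such a feature flag could look like. The feature name jemalloc and the helper allocator_name are assumptions for illustration, not necessarily what the project used. The Cargo.toml side is shown as a comment; it makes the dependency optional and ties it to the feature.

```rust
// Assumed Cargo.toml (the feature name `jemalloc` is made up for this sketch):
//
// [dependencies]
// jemallocator = { version = "0.5.4", features = ["profiling"], optional = true }
//
// [features]
// jemalloc = ["dep:jemallocator"]

// Only compiled when building with `--features jemalloc`;
// without the feature, Rust's default system allocator is used.
#[cfg(feature = "jemalloc")]
#[global_allocator]
static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;

/// Reports which allocator this build was compiled with (hypothetical helper).
fn allocator_name() -> &'static str {
    if cfg!(feature = "jemalloc") {
        "jemalloc"
    } else {
        "system"
    }
}
```

With this layout, cargo build --release --features jemalloc would produce the profiling build, and a plain cargo build --release would go back to the default allocator.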

Configure docker container

I was running the app in a docker container, so I configured these env vars:

export MALLOC_CONF=prof:true,lg_prof_interval:30,lg_prof_sample:21,prof_prefix:/tmp/jeprof
export _RJEM_MALLOC_CONF=${MALLOC_CONF}
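
The lg_ options are base-2 logarithms of a byte count, so lg_prof_interval:30 asks for a profile dump roughly every 2^30 bytes (1 GiB) of allocation activity, and lg_prof_sample:21 samples on average once per 2^21 bytes (2 MiB) allocated. The small sketch below just decodes those values from the option string; the helper names are made up for illustration and are not part of jemalloc or jemallocator.

```rust
/// Parses a `key:value,key:value` jemalloc option string into pairs.
/// (Hypothetical helper for illustration only.)
fn parse_malloc_conf(conf: &str) -> Vec<(&str, &str)> {
    conf.split(',')
        .filter_map(|opt| opt.split_once(':'))
        .collect()
}

/// Looks up an `lg_*` option and converts its base-2 logarithm into bytes.
/// Returns None if the key is missing or its value is not a small integer.
fn lg_option_bytes(conf: &str, key: &str) -> Option<u64> {
    parse_malloc_conf(conf)
        .into_iter()
        .find(|(k, _)| *k == key)
        .and_then(|(_, v)| v.parse::<u32>().ok())
        .and_then(|lg| 1u64.checked_shl(lg))
}
```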

To make my debugging easier, I also temporarily installed these in my container:

apt install libjemalloc-dev graphviz

Dump profile periodically

I wrote a script to essentially run this command in my docker container every 30 mins for around 2 days:

jeprof --text ./my_program heap.prof
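
A rough sketch of such a periodic report, written in Rust for consistency with the rest of the post rather than as the actual script. It assumes the dumps land in a directory as files starting with the configured prefix and ending in .heap, and that jeprof is on the PATH. All function names here are made up for illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;
use std::thread;
use std::time::Duration;

/// Finds the most recently modified file in `dir` whose name starts with
/// `prefix` and ends with `.heap` (assumed naming of the dump files).
fn latest_heap_profile(dir: &Path, prefix: &str) -> io::Result<Option<PathBuf>> {
    let mut candidates = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        if name.starts_with(prefix) && name.ends_with(".heap") {
            candidates.push((entry.metadata()?.modified()?, entry.path()));
        }
    }
    // Tuples compare by modification time first, then by path.
    Ok(candidates.into_iter().max().map(|(_, path)| path))
}

/// Runs `jeprof --text <binary> <profile>` and returns its standard output.
fn jeprof_text(binary: &Path, profile: &Path) -> io::Result<String> {
    let output = Command::new("jeprof")
        .arg("--text")
        .arg(binary)
        .arg(profile)
        .output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

/// Prints a text report for the newest profile, then sleeps for `interval`,
/// repeating `rounds` times.
fn report_periodically(
    binary: &Path,
    dir: &Path,
    prefix: &str,
    interval: Duration,
    rounds: u32,
) -> io::Result<()> {
    for _ in 0..rounds {
        if let Some(profile) = latest_heap_profile(dir, prefix)? {
            println!("{}", jeprof_text(binary, &profile)?);
        }
        thread::sleep(interval);
    }
    Ok(())
}
```

For a 30 minute interval over 2 days, the call would be something like report_periodically(Path::new("./my_program"), Path::new("/tmp"), "jeprof", Duration::from_secs(30 * 60), 96).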

Results of the Profiling

So, after collecting snapshots of data for 2 days straight and analysing it, I came to the conclusion that:

My Docker container’s RSS was growing
But my heap size was apparently constant, except when it suddenly and unexpectedly ballooned and the process got OOM-killed.

So I deduced that I might have a memory fragmentation problem, along with some sort of unexpected memory leak.

How to deal with the fragmentation?

Change the allocator

This is what I had already done earlier. I had also read that jemalloc might be slightly faster than the glibc allocator, which would be great if true, since this piece of code would be used in a very latency-sensitive system. When I profiled afterwards, I did note that it was a little faster than the original, although not by much.

Pre-allocate pools for objects to reuse memory blocks

So, there was a portion of the code which used a huge VecDeque, which the documentation describes as follows: “Since VecDeque is a ring buffer, its elements are not necessarily contiguous in memory.” I tried to reduce fragmentation by pre-allocating 1000 slots upfront to be reused, although unfortunately I do not have any clue how helpful this was.
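
A minimal sketch of the pre-allocation idea, assuming a bounded window of items. The Window type, its limit field, and the u64 element type are made up for illustration. VecDeque::with_capacity reserves the buffer once, and as long as the length never goes past that capacity, pushes and pops reuse the same allocation instead of growing it.

```rust
use std::collections::VecDeque;

/// A bounded window that reuses one up-front allocation.
/// (Hypothetical type; the real code's element type will differ.)
struct Window {
    items: VecDeque<u64>,
    limit: usize,
}

impl Window {
    fn new(limit: usize) -> Self {
        Window {
            // Reserve room for `limit` elements before any are pushed.
            items: VecDeque::with_capacity(limit),
            limit,
        }
    }

    /// Adds an item, evicting the oldest one first when the window is full,
    /// so the length never exceeds the capacity reserved in `new`.
    fn push(&mut self, item: u64) {
        if self.items.len() == self.limit {
            self.items.pop_front();
        }
        self.items.push_back(item);
    }
}
```

Comparing items.capacity() before and after a long run of pushes is a cheap way to check that the buffer is never reallocated.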

Reduce dynamic memory allocations

I couldn’t change much of the code to use stack variables, since a lot of the values depend on dynamic input, so I had to focus on reducing memory allocations (which was what I did right at the top of this post).

If all else fails: restart the server

So, I did notice that there’s an edge case where the object tracker somehow goes out of sync through no fault of its own (most likely due to network issues). In such cases, memory tends to balloon rather quickly and the process eventually crashes due to OOM. At some point, I came up with a different strategy: keep the entire system alive and let this particular component fail and restart without any consequences, since there’s no easy way to solve this problem.
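
One way to picture the "let it fail and restart" strategy is a tiny supervisor loop. This is only a sketch under assumptions: the component is taken to run as its own child process, and supervise and run_component are hypothetical names, not the setup actually used here. A process killed by the OOM killer reports a non-success exit status, which is what triggers the restart.

```rust
use std::io;
use std::process::Command;

/// Calls `run_once` until it reports success, restarting at most
/// `max_restarts` times. Returns how many restarts were needed,
/// or None if the restart budget ran out.
fn supervise<F>(max_restarts: u32, mut run_once: F) -> io::Result<Option<u32>>
where
    F: FnMut() -> io::Result<bool>,
{
    for restarts in 0..=max_restarts {
        if run_once()? {
            return Ok(Some(restarts));
        }
    }
    Ok(None)
}

/// Runs the component as a child process and waits for it to exit.
/// Returns true only if it exited cleanly; a crash or kill returns false.
fn run_component(path: &str) -> io::Result<bool> {
    Ok(Command::new(path).status()?.success())
}
```

A call such as supervise(10, || run_component("./my_program")) would then keep relaunching the component while the rest of the system stays up. In a Docker setup, a container restart policy can play the same role.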

Conclusions

I do not know how much better I have become at memory management, but here are some takeaways for me:

I did manage to reduce how quickly memory got consumed: it went from eating over 30GB of memory in 3 hours to something like 2GB over 2 days, so that is a small win.
Planning types in Rust or any low-level language is quite important. When possible, it is best to use a stack-allocated variable, and only use the heap if I need to. This typically affects anything that would become an Arc, Box, and whatnot. Preallocate memory ahead of time if you think it would be useful.
I learnt the basics of how to use a memory profiler with Rust.