Designing a robust memory management system for C++ software
This is the first part of a series of posts which I will be putting up here over the next several weeks or so (provided I can come up with the material for an actual series).
I will be discussing the topic of memory management and how to go about setting up a C++ program for debugging and tracking your memory usage. There are of course as many strategies for this as there are programmers in the world, so please don't take this to be the final word on memory management! It simply reflects my experience and my current set of best practices.
Getting Started
The first thing we want to do is define exactly what our system will do, and define a few critical concepts.
First of all, what exactly is memory management? A large number of things fall into this category, but I will be dealing specifically with a few key points. (As always, feedback is welcome - if you notice some major feature or behaviour which I've omitted, please let me know!)
- We need a way to track how much memory overall a program is using
- We need a way to track specifically how much memory various modules are using
- We need to detect double-frees so we can debug them
- We need to do our best to detect leaks and orphaned memory so we can fix those issues
- We need to keep overhead (both speed and memory overhead) minimal
- We need to support aligned memory allocators
- We would like some nifty statistics and graphs and such too
I'll be covering all this (and possibly more) in the remainder of the series. For now, we'll start with the basics.
The General Strategy
What exactly do we need to do in order to track where our memory usage is going? I use a four-pronged attack to solve this question:
- Track all "general" allocations by overloading global new and delete
- Track specific allocations by overloading new and delete for particular classes
- Allow third party libraries to register their memory usage via a simple interface
- Provide STL allocators that track the memory they allocate
Believe it or not, this covers the vast majority of allocations. (Since we are using C++, I will assume - and pray - that you don't use malloc or the like. If you do, be sure to pass those allocations through a wrapper, e.g. my_malloc, which takes care of registering the memory with the memory tracker system via the interface defined in the third point above.)
The remainder is stuff like HeapAlloc, VirtualAlloc, and so on (depending on your platform you may encounter others, e.g. on the Xbox). Those allocations will need to be registered manually via point #3. However, since they can be found easily using a global find/replace, this isn't too big of a headache.
Designing the Memory Tracker
Before we get too bogged down in the details of actually capturing memory allocations, we need to figure out exactly how to keep track of all the memory usage in the first place.
Normally, we could be really lazy and do something simple, like this:
```cpp
struct AllocationRecord
{
    void* Address;
    size_t NumBytes;
};

std::vector<AllocationRecord> Allocations;
```
However, this will invoke the global ::new operator when the vector does its allocations, which means the memory tracker will try to record itself - and die in a fiery infinite loop that eventually obliterates your stack and crashes the program.
To get around this, we use the Windows feature that a process can have multiple heaps. By using HeapCreate to create a secondary heap, we can set up a special sandbox for keeping all of the memory metadata in.
So now we have some space, but what data structure should we use?
There are two basic concerns we need to balance out here: speed of allocation, and speed of deallocation.
Option One: Unsorted Vector
The simplest approach is to just blob all the memory into an unsorted vector, and insert/remove items as they are allocated/freed, respectively. However, this quickly leads to terrible performance, because the vector will get resized and recopied many times.
A better approach is to flag a slot in the vector as "freed". When a new piece of memory needs to be added, we first look through the vector for a free slot, and place the memory record there. If no free slot is found, then we expand the vector by a factor of 1.5 or 2, give or take, depending on how the memory usage patterns look. (This is one of many numbers that can take a lot of tweaking to get just right.)
Unfortunately, this option performs very poorly because it requires a large number of O(n) lookups and copies, where n is the number of allocations your program has made. So let's try something different.
Option Two: Sorted Vector
If we sort the vector by allocation address, we can look up an item in O(log n) using a binary search. This makes freeing a block of memory very fast. However, we pay dearly for it in the allocation routine, which has to shift records around to keep the vector sorted. Not the best option available.
Option Three: Bucket-Sorted Vectors
The final approach, which I have personally settled on, is to use a set of buckets, each with a small vector associated with it. We look at the address of each allocated block of memory, do some bit shuffling, and use the resulting number to decide which bucket the address belongs to.
Once we have a bucket determined, we can simply add the memory record to the end of the vector, resizing it as needed. We can keep the vector size (and resize increment) fairly small.
To make things even better, instead of just a vector, we can use a linked list of vectors, each with a fixed (small) size. When one vector fills up, we just allocate another one and point to it from the end of the last vector. This allows us to keep a theoretically unlimited number of allocation records without requiring huge amounts of overhead up-front.
Typical overhead in the places I've used this technique varies from 1 to 25 MB, depending on the number and pattern of allocations being made. In general it is very efficient in both space and time - lookups are dramatically sped up by the bucket system, and traversing the linked list of vectors is fairly fast. Overall, introducing this system into our current project cost less than 1 FPS.
Keeping Track of Call Stacks
When you identify a memory leak or other problematic allocation, it is extremely useful to be able to see a call stack describing where and how the allocation occurred. So along with our allocation size and address, let's also store a short (maybe 5-10 entry) call stack capture. This has a minimal cost since we can simply store return addresses from the stack frames (more technical detail on that in a later article).
Memory Types
One final thing we want to consider is dividing up the memory usage into types. In our current project we have 26 different types, ranging from a "general" catch-all type to highly specific things like XML parsing, strings, textures, meshes, and so on.
This can help save a lot of time diagnosing memory problems, since you always know what part of the code is using the memory.
A Peek Ahead
I'm deliberately avoiding implementation details in this first part of the series, specifically to keep things simple and short. A good programmer should be able to look at the described system and put it together on their own in a few hours. For those of you interested in a step-by-step walkthrough of the implementation, stay tuned!
For now though, here's a juicy little preview, summing up the information we want to track:
```cpp
struct MemTrackRecord
{
    void* Address;
    unsigned LastAccessCallStack[MEMTRACK_CALLSTACK_DEPTH];
    int Size;
    MemType MemoryType;
};

struct MemTrackBucketPage
{
    MemTrackBucketPage* NextPage;
    unsigned NumberOfEntries;
    MemTrackRecord* RecordArray;
};
```
Until next time!