
Managing Your Memory - Part I

Published October 16, 2008
Designing a robust memory management system for C++ software

This is the first part of a series of posts which I will be putting up here over the next several weeks or so (provided I can come up with the material for an actual series).

I will be discussing the topic of memory management and how to go about setting up a C++ program for debugging and tracking your memory usage. There are of course as many strategies for this as there are programmers in the world, so please don't take this to be the final word on memory management! It simply reflects my experience and my current set of best practices.


Getting Started
The first thing we want to do is define exactly what our system will do, and define a few critical concepts.

First of all, what exactly is memory management? A large number of things fall into this category, but I will be dealing specifically with a handful of key points. (As always, feedback is welcome - if you notice some major feature or behaviour which I've omitted, please let me know!)

  • We need a way to track how much memory overall a program is using

  • We need a way to track specifically how much memory various modules are using

  • We need to detect duplicate freeings of memory so we can debug them

  • We need to do our best to detect leaks and orphaned memory so we can fix those issues

  • We need to keep overhead (both speed and memory overhead) minimal

  • We need to support aligned memory allocators

  • We would like some nifty statistics and graphs and such too


I'll be covering all this (and possibly more) in the remainder of the series. For now, we'll start with the basics.


The General Strategy
What exactly do we need to do in order to track where our memory usage is going? I use a four-pronged attack:

  1. Track all "general" allocations by overloading global new and delete (see the sketch just after this list)

  2. Track specific allocations by overloading new and delete for particular classes

  3. Allow third party libraries to register their memory usage via a simple interface

  4. Provide STL allocators that track the memory they allocate
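
To make the first prong concrete, here is a minimal sketch of what the global overloads might look like. This assumes a hypothetical MemTracker interface - RecordAllocation and RecordDeallocation are my names, not something from this series - and the real recording logic is the subject of the posts to come:

#include <cstdlib>
#include <new>

namespace MemTracker
{
    // Stub entry points for illustration only. The real versions record
    // into the tracker data structure described below - and they must
    // never allocate through global new themselves, or you get the
    // infinite loop discussed later in this post.
    void RecordAllocation(void*, std::size_t) { }
    void RecordDeallocation(void*) { }
}

void* operator new(std::size_t bytes)
{
    void* p = std::malloc(bytes);
    if (!p)
        throw std::bad_alloc();
    MemTracker::RecordAllocation(p, bytes);   // register every allocation
    return p;
}

void operator delete(void* p) throw()
{
    if (!p)
        return;
    MemTracker::RecordDeallocation(p);        // unregister before freeing
    std::free(p);
}

The same pattern extends to operator new[] and operator delete[], and to the per-class overloads of point 2.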


Believe it or not, this covers the vast majority of allocations. (Since we are using C++, I will assume (read: pray) that you don't use malloc or something. However, if you do, be sure to pass those allocations through a wrapper, e.g. my_malloc, which takes care of registering the memory with the memory tracker system via the interface defined in the third point above.)
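
A hedged sketch of such a wrapper pair, reusing the hypothetical MemTracker interface from above:

#include <cstdlib>

// Wrappers (names mine) that route C-style allocations through the same
// tracker interface used by the new/delete overloads.
void* my_malloc(std::size_t bytes)
{
    void* p = std::malloc(bytes);
    if (p)
        MemTracker::RecordAllocation(p, bytes);
    return p;
}

void my_free(void* p)
{
    if (p)
        MemTracker::RecordDeallocation(p);
    std::free(p);
}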

The remainder is stuff like HeapAlloc, VirtualAlloc, and so on (depending on your platform you may get others, e.g. on the Xbox). Those allocations will need to be registered manually via point #3. However, since they can be found easily using a global find/replace, this isn't too big of a headache.


Designing the Memory Tracker
Before we get too bogged down in the details of actually capturing memory allocations, we need to figure out exactly how to keep track of all the memory usage in the first place.

Normally, we could be really lazy and do something simple, like this:
#include <vector>

struct AllocationRecord
{
    void* Address;
    size_t NumBytes;
};

std::vector<AllocationRecord> Allocations;


However, this will invoke the global ::new operator when the vector does its allocations, which means the memory tracker will try to record itself - and die in a fiery infinite loop that eventually obliterates your stack and crashes the program.

To get around this, we take advantage of a handy Windows feature: a process can have multiple heaps. By using HeapCreate to create a secondary heap, we can set up a special sandbox for keeping all of the memory-tracking metadata in.
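
Here's a minimal sketch of that sandbox, with TrackerAlloc and TrackerFree as illustrative names of my own:

#include <windows.h>

// All tracker metadata comes from this private heap, so the tracker's own
// bookkeeping never passes through the global new/delete it is watching.
static HANDLE g_TrackerHeap = NULL;

void* TrackerAlloc(size_t bytes)
{
    if (!g_TrackerHeap)
        g_TrackerHeap = ::HeapCreate(0, 0, 0);   // growable heap, default sizes
    return ::HeapAlloc(g_TrackerHeap, 0, bytes);
}

void TrackerFree(void* p)
{
    ::HeapFree(g_TrackerHeap, 0, p);
}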

So now we have some space, but what data structure should we use?

There are two basic concerns we need to balance out here: speed of allocation, and speed of deallocation.


Option One: Unsorted Vector
The simplest approach is to just blob all the memory into an unsorted vector, and insert/remove items as they are allocated/freed, respectively. However, this quickly leads to terrible performance, because the vector will get resized and recopied many times.

A better approach is to flag a slot in the vector as "freed". When a new piece of memory needs to be added, we first look through the vector for a free slot, and place the memory record there. If no free slot is found, then we expand the vector by a factor of 1.5 or 2, give or take, depending on how the memory usage patterns look. (This is one of many numbers that can take a lot of tweaking to get just right.)
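
In code, the slot-reuse scheme might look like this sketch, which uses the AllocationRecord struct from above and a null Address to mark a freed slot (that convention is mine, not necessarily the article's):

#include <cstddef>
#include <vector>

void AddRecordUnsorted(std::vector<AllocationRecord>& records,
                       const AllocationRecord& record)
{
    for (std::size_t i = 0; i < records.size(); ++i)
    {
        if (records[i].Address == 0)   // found a freed slot - reuse it
        {
            records[i] = record;
            return;
        }
    }
    records.push_back(record);         // no free slot: the vector grows
}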

Unfortunately, this option performs very poorly because it requires a large number of O(n) lookups and copies, where n is the number of allocations your program has made. So let's try something different.


Option Two: Sorted Vector
If we sort the vector by allocation address, we can then look up an item in O(log2 n), using a binary search. This makes it very fast to free an item of memory. However, we pay a dear cost for this in the memory allocation routine, which has to ensure that the vector remains sorted. Not the best option available.
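
For reference, the O(log2 n) lookup is just a binary search over the address-sorted vector, e.g. with std::lower_bound (the names here are illustrative):

#include <algorithm>
#include <vector>

bool RecordAddressLess(const AllocationRecord& a, const AllocationRecord& b)
{
    return a.Address < b.Address;
}

AllocationRecord* FindRecord(std::vector<AllocationRecord>& records,
                             void* address)
{
    AllocationRecord key = { address, 0 };
    std::vector<AllocationRecord>::iterator it =
        std::lower_bound(records.begin(), records.end(), key, RecordAddressLess);
    return (it != records.end() && it->Address == address) ? &*it : 0;
}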


Option Three: Bucket-Sorted Vectors
The final approach, which I have personally settled on, is to use a set of buckets, each with a small vector associated with it. We look at the address of each allocated block of memory, do some bit shuffling, and use the resulting number to decide which bucket the address belongs to.
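
The exact bit shuffling isn't spelled out here; one plausible version, assuming a power-of-two bucket count, looks like this:

#include <cstddef>

// Shift off the low bits, which barely vary thanks to allocator alignment,
// then mask down to the bucket count (which must be a power of two here).
const std::size_t BUCKET_COUNT = 1024;

std::size_t BucketForAddress(const void* address)
{
    std::size_t bits = reinterpret_cast<std::size_t>(address);
    return (bits >> 4) & (BUCKET_COUNT - 1);
}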

Once we have a bucket determined, we can simply add the memory record to the end of the vector, resizing it as needed. We can keep the vector size (and resize increment) fairly small.

To make things even better, instead of just a vector, we can use a linked list of vectors, each with a fixed (small) size. When one vector fills up, we just allocate another one and point to it from the end of the last vector. This allows us to keep a theoretically unlimited number of allocation records without requiring huge amounts of overhead up-front.

Typical overhead in the places I've used this technique varies from 1 to 25 MB, depending on the number and pattern of allocations being made. In general, it is very efficient, both in terms of space and time - lookups are dramatically sped up by the bucket system, and traversing the linked list of vectors is fairly fast. Overall, introducing this system into our current project code cost less than 1 FPS.


Keeping Track of Call Stacks
When you identify a memory leak or other problematic allocation, it is extremely useful to be able to see a call stack describing where and how the allocation occurred. So along with our allocation size and address, let's also store a short (maybe 5-10 entry) call stack capture. This has a minimal cost since we can simply store return addresses from the stack frames (more technical detail on that in a later article).
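
The capture mechanics are deferred to that later article; for the impatient, one cheap Windows option is CaptureStackBackTrace, sketched here under my own naming:

#include <windows.h>

const unsigned MEMTRACK_CALLSTACK_DEPTH = 8;   // "maybe 5-10 entries"

// Capture up to MEMTRACK_CALLSTACK_DEPTH return addresses, skipping this
// function's own frame; unused slots are null-padded.
void CaptureAllocationStack(void* stack[])
{
    USHORT captured =
        ::CaptureStackBackTrace(1, MEMTRACK_CALLSTACK_DEPTH, stack, NULL);
    for (unsigned i = captured; i < MEMTRACK_CALLSTACK_DEPTH; ++i)
        stack[i] = 0;
}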


Memory Types
One final thing we want to consider is dividing up the memory usage into types. In our current project we have 26 different types, ranging from a "general" catch-all to highly specific things like XML parsing, strings, textures, meshes, and so on.

This can help save a lot of time diagnosing memory problems, since you always know what part of the code is using the memory.
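
An illustrative MemType enum; the entries below are merely examples drawn from the text above, not the project's actual 26 types:

enum MemType
{
    MEMTYPE_GENERAL,    // catch-all
    MEMTYPE_XML,
    MEMTYPE_STRINGS,
    MEMTYPE_TEXTURES,
    MEMTYPE_MESHES
    // ... one entry per subsystem you want to bill memory to
};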


A Peek Ahead
I'm deliberately avoiding implementation details in this first part of the series, specifically to keep things simple and short. A good programmer should be able to look at the described system and put it together by himself in a few hours. For those of you interested in a step-by-step walkthrough of the implementation, stay tuned!

For now though, here's a juicy little preview, summing up the information we want to track:

struct MemTrackRecord
{
    void* Address;
    unsigned LastAccessCallStack[MEMTRACK_CALLSTACK_DEPTH];
    int Size;
    MemType MemoryType;
};

struct MemTrackBucketPage
{
    MemTrackBucketPage* NextPage;
    unsigned NumberOfEntries;
    MemTrackRecord* RecordArray;
};
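
And, as a small taste of how those bucket pages chain together, a hedged sketch of appending a record - PAGE_CAPACITY and both function names are my own inventions, building on the TrackerAlloc sketch from earlier:

const unsigned PAGE_CAPACITY = 64;   // illustrative fixed page size

// Carve a fresh, empty page (and its record array) out of the tracker's
// private heap.
MemTrackBucketPage* AllocPage()
{
    MemTrackBucketPage* page = static_cast<MemTrackBucketPage*>(
        TrackerAlloc(sizeof(MemTrackBucketPage)));
    page->NextPage = 0;
    page->NumberOfEntries = 0;
    page->RecordArray = static_cast<MemTrackRecord*>(
        TrackerAlloc(sizeof(MemTrackRecord) * PAGE_CAPACITY));
    return page;
}

// Append a record to a bucket's page chain, growing the chain on demand.
void AddRecord(MemTrackBucketPage* page, const MemTrackRecord& record)
{
    while (page->NextPage)               // walk to the last page
        page = page->NextPage;

    if (page->NumberOfEntries == PAGE_CAPACITY)
    {
        page->NextPage = AllocPage();    // hang a new page off the end
        page = page->NextPage;
    }

    page->RecordArray[page->NumberOfEntries++] = record;
}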



Until next time!

Comments

a_insomniac
This is going to be pretty awesome. Looking forward to the series.
October 16, 2008 09:06 PM
android_808
Is there any way that you can avoid the use of HeapCreate? Some macros or something that cause only the memory manager to use the "normal" global new?

I'm looking at implementing a memory manager for a cross-platform application and I like the ideas discussed so far but would be unable to make use of the manager as it uses a Win32 only function.
October 18, 2008 05:48 AM
ApochPiQ
It's not a brilliant solution, but you can simply use malloc inside the memory tracker as a replacement for a separate heap. That should be about as portable as it gets [smile]
October 18, 2008 10:57 AM
VBStrider
How are you getting the return addresses? Inline assembly or the StackWalk64() function? Also, how would you use the return addresses in debugging? Does it output the filename/line number of the return address?
October 18, 2008 12:13 PM
ApochPiQ
Good questions - and all of them will be answered in upcoming portions of the article [smile]


For the generally curious, this is the rough outline I currently have planned:
  1. Introduction and Rationale

  2. Code for capturing memory allocations

  3. Code for tracking the data that is captured

  4. Tying it all together, plus some miscellaneous tricks and tips



Most of the good stuff will be detailed in parts 2 and 3.
October 18, 2008 03:46 PM
rick_appleton
Interesting start, and I'm looking forward to the rest.

Unless I'm mistaken, isn't such a linked list of vectors basically a std::deque? Any reason to avoid using that and implementing your own linked list of vectors?
October 18, 2008 04:15 PM
ApochPiQ
A deque tends to allow gaps and holes in ways that my implementation does not - i.e. memory is never used sparsely, but always as densely as possible. More on this will become clear when I describe the implementation in more detail, but for now, suffice it to say that my custom version of the data structure is fine-tuned for the specific access patterns needed by the memory tracker. Specifically, a std::deque doesn't provide the ability to directly access its blobs by an index; my implementation allows faster lookups by exploiting the bucket-sorted architecture.

Normally a deque would be a worthwhile implementation of the structure, but in our specific case it's overkill and we can get away with a much simpler implementation. Plus, it saves us having to write a special STL allocator that doesn't cause the memory tracker to go into an infinite loop [wink]
October 18, 2008 11:53 PM
rick_appleton
Quote: Original post by ApochPiQ
A deque tends to allow gaps and holes in ways that my implementation does not - i.e. memory is never used sparsely, but always as densely as possible. More on this will become clear when I describe the implementation in more detail, but for now, suffice it to say that my custom version of the data structure is fine-tuned for the specific access patterns needed by the memory tracker. Specifically, a std::deque doesn't provide the ability to directly access its blobs by an index; my implementation allows faster lookups by exploiting the bucket-sorted architecture.

Normally a deque would be a worthwhile implementation of the structure, but in our specific case it's overkill and we can get away with a much simpler implementation. Plus, it saves us having to write a special STL allocator that doesn't cause the memory tracker to go into an infinite loop [wink]


Fair enough.
October 19, 2008 04:06 AM
MickeWi
I'm really looking forward to the rest of the series. Great topic!
October 19, 2008 05:08 AM
RedDrake
How do you specify these "modules" to allocators?
Last time I tried to be "smart" about this, I overloaded the global new operator with char*, int parameters and added a global macro to my Prerequisites.hpp (a project-scope precompiled header that always gets included first) so that all of my library code uses new(__FILE__, __LINE__) by default. All of the external libraries (the STL, for example) are included before the macro definition, and if a source file needs some externals it undefs the macro before including them and defines it again afterwards (since this was a rare case, it wasn't a big problem). I also overloaded the plain global new to log other allocations (file name "", line 0), and added STL allocators that indicate the file and line where they were defined (also through a macro). This made tracking leaks really simple, and you could go back to the standard allocators by simply removing the macros (or defining the STL one to be std::allocator).
Anyway, that's how I remember doing it (it was a long time ago, though).
October 19, 2008 11:40 AM
rick_appleton
Quote: Original post by rick_appleton
Quote: Original post by ApochPiQ
A deque tends to allow gaps and holes in ways that my implementation does not - i.e. memory is never used sparsely, but always as densely as possible. More on this will become clear when I describe the implementation in more detail, but for now, suffice it to say that my custom version of the data structure is fine-tuned for the specific access patterns needed by the memory tracker. Specifically, a std::deque doesn't provide the ability to directly access its blobs by an index; my implementation allows faster lookups by exploiting the bucket-sorted architecture.

Normally a deque would be a worthwhile implementation of the structure, but in our specific case it's overkill and we can get away with a much simpler implementation. Plus, it saves us having to write a special STL allocator that doesn't cause the memory tracker to go into an infinite loop [wink]


Fair enough.


Coming back to this, I'm not sure I actually do follow :D

The comment about the STL allocator I can understand, but surely you've already done that work for the std::vector allocator, and you can simply use that same allocator for the std::deque.

Also, you mention that std::deque allows holes in it. Are you sure about that? I've never investigated, but surely it also fills gaps before allocating more memory.

Lastly, from your post I understood that this linked list of vectors is one bucket, so I'm unsure on why/how you access those vectors by index. I understand you could/would index the individual buckets by index, but since this is within a bucket wouldn't you need to search through the entire contents anyway?

Looking forward to your reply.
October 20, 2008 02:06 AM
android_808
Any news on the next post?
December 31, 2008 03:22 AM