Allocations, Revisited

posted in Excursions into the Unknown

Published June 15, 2009

Some of you may remember my entry on the custom allocator we wrote to try to take advantage of stack allocation for small temporary arrays. Yes, it was Premature Optimization, and as no good deed goes unpunished, we botched the job and caused all sorts of problems. We ended up scrapping the whole thing and just using std::vector for our temporary array needs.

A few weeks ago, we got an issue filed on our tracker that claimed that a particular shader method in D3D10, which was called many times per frame, was spending 80% of its time diddling with std::vector. I was skeptical to be sure, but a few quick tests proved that at least a good portion of the method's time was in fact being eaten up there. I figured it was high time I resurrected our old stack allocation code, but this time I had the benefit of hindsight to help guide me along.

Our old attempt had basically involved using a custom allocator for std::vector, which seemed good on the surface until we realized that memory allocated from the stack inside an allocator wasn't going to be of much use outside of it, since it disappears as soon as the allocator's function returns. To that end, I began thinking of ways that we could reliably allocate memory on the stack of the calling method, but still wrap it all up nicely so that the user didn't have to worry about cleanup or any other nonsense. I hit upon the idea of using a macro that would discreetly allocate a chunk from the stack and then forward the call on to the actual stack_array constructor. Since the macro is a simple text replacement, the actual stack allocation call happens in the calling function, which is exactly where it needs to happen.
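To make the "macro expands in the caller" point concrete, here is a minimal sketch. It is not the SlimDX code: STACK_BUFFER and sum_of_squares are hypothetical names, and plain alloca (or _alloca on MSVC) stands in for _malloca so the example builds outside Visual C++. It also assumes the element type is trivially constructible, since no constructors are run on the raw memory.

```cpp
#include <cassert>
#include <cstddef>
#if defined(_MSC_VER)
#include <malloc.h>
#define SKETCH_ALLOCA _alloca
#else
#include <alloca.h>
#define SKETCH_ALLOCA alloca
#endif

// Hypothetical macro: because it is pasted in textually, the alloca call
// executes in whatever function uses the macro, not in some helper.
#define STACK_BUFFER(type, count) \
    static_cast<type*>(SKETCH_ALLOCA(sizeof(type) * (count)))

// Sums the first n squares using scratch space from this function's own frame.
int sum_of_squares(std::size_t n)
{
    assert(n > 0);
    int* scratch = STACK_BUFFER(int, n);   // stays valid until sum_of_squares returns
    for (std::size_t i = 0; i < n; ++i)
        scratch[i] = static_cast<int>((i + 1) * (i + 1));

    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += scratch[i];
    return total;                          // scratch is reclaimed automatically here
}
```

Had the alloca call lived inside an ordinary helper function instead, the buffer would already be gone by the time the helper returned its pointer, which is the trap the allocator-based attempt fell into.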

You can see the entire contents of the stack_array class here:

#define stackalloc(type, length) stack_array<type>::from_stack_ptr(reinterpret_cast<type*>(_malloca(sizeof(type) * length)), length)

template<typename T>
struct stack_array_ref
{
	explicit stack_array_ref(T *right, size_t length, bool on_stack)
		:	ptr(right),
			len(length),
			on_stack(on_stack)
	{
	}

	T *ptr;
	size_t len;
	bool on_stack;
};

template<typename T>
class stack_array
{
private:
	T* ptr;
	size_t len;
	bool on_stack;

	explicit stack_array(T* memory, size_t length) throw()
		:	len(length),
			ptr(memory),
			on_stack(true)
	{
	}

public:
	explicit stack_array(size_t length = 0) throw()
		:	len(length),
			ptr(new T[length]),
			on_stack(false)
	{
	}

	stack_array(stack_array& right) throw()
		:	ptr(right.ptr),
			len(right.len),
			on_stack(right.on_stack)
	{
		right.ptr = NULL;
		right.len = 0;
		right.on_stack = false;
	}

	stack_array(stack_array_ref<T> right) throw()
	{
		ptr = right.ptr;
		len = right.len;
		on_stack = right.on_stack;
		right.ptr = NULL;
	}

	~stack_array()
	{
		if (on_stack)
			_freea(ptr);
		else
			delete[] ptr;
	}

	static stack_array<T> from_stack_ptr(T* memory, size_t length)
	{
		return stack_array<T>(memory, length);
	}

	operator stack_array_ref<T>() throw()
	{
		stack_array_ref<T> ans(ptr, len, on_stack);
		ptr = NULL;
		len = 0;
		on_stack = false;
		return ans;
	}

	stack_array& operator = (stack_array& right) throw()
	{
		if (right.ptr != ptr)
		{
			if (on_stack)
				_freea(ptr);
			else
				delete[] ptr;
		}

		ptr = right.ptr;
		len = right.len;
		on_stack = right.on_stack;
		right.ptr = NULL;
		right.len = 0;
		right.on_stack = false;
		return *this;
	}

	stack_array& operator = (stack_array_ref<T> right) throw()
	{
		if (right.ptr != ptr)
		{
			if (on_stack)
				_freea(ptr);
			else
				delete[] ptr;
		}

		ptr = right.ptr;
		len = right.len;
		on_stack = right.on_stack;
		return *this;
	}

	const T* get() const
	{
		return ptr;
	}

	T* get() throw()
	{
		return ptr;
	}

	size_t size() const throw()
	{
		return len;
	}

	T& operator [] (size_t index)
	{
		return ptr[index];
	}

	const T& operator [] (size_t index) const
	{
		return ptr[index];
	}
};

It's a very lightweight template class that really only exists to hold temporary values while we marshal between .NET and DirectX. Notice the stackalloc macro, which is where the magic happens. If the user fails to use this macro to set up the array, it will go ahead and use a standard new/delete, which means we don't get unspeakable errors from a simple typo. Here's an example of using it:

stack_array<D3DPRESENT_PARAMETERS> d3dpp = stackalloc( D3DPRESENT_PARAMETERS, presentParameters->Length );
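The "falls back to new/delete if you skip the macro" behavior comes down to a flag that records where the memory came from. The sketch below is a deliberately simplified, hypothetical cousin of stack_array (scratch_array and the two sum_with_* functions are made-up names). It makes two substitutions worth naming: it uses C++11 move semantics instead of the auto_ptr-style stack_array_ref transfer, and it uses plain alloca instead of _malloca, so the stack path needs no explicit free. It assumes trivially constructible element types.

```cpp
#include <cassert>
#include <cstddef>
#if defined(_MSC_VER)
#include <malloc.h>
#define SKETCH_ALLOCA _alloca
#else
#include <alloca.h>
#define SKETCH_ALLOCA alloca
#endif

// Hypothetical simplified owner that remembers whether its memory is stack or heap.
template <typename T>
class scratch_array
{
    T* ptr_;
    std::size_t len_;
    bool on_stack_;

    // Private: wraps memory the caller already carved out of its own stack frame.
    scratch_array(T* memory, std::size_t length) noexcept
        : ptr_(memory), len_(length), on_stack_(true) {}

public:
    // Public fallback: an ordinary heap allocation, safe to use anywhere.
    explicit scratch_array(std::size_t length)
        : ptr_(new T[length]), len_(length), on_stack_(false) {}

    // Moving transfers ownership and leaves the source empty.
    scratch_array(scratch_array&& other) noexcept
        : ptr_(other.ptr_), len_(other.len_), on_stack_(other.on_stack_)
    {
        other.ptr_ = nullptr;
        other.len_ = 0;
        other.on_stack_ = false;
    }

    scratch_array(const scratch_array&) = delete;
    scratch_array& operator=(const scratch_array&) = delete;

    ~scratch_array()
    {
        // alloca memory vanishes with the owning frame; only heap memory is freed.
        if (!on_stack_)
            delete[] ptr_;
    }

    static scratch_array from_stack_ptr(T* memory, std::size_t length) noexcept
    {
        return scratch_array(memory, length);
    }

    bool on_stack() const noexcept { return on_stack_; }
    std::size_t size() const noexcept { return len_; }
    T& operator[](std::size_t index) { assert(index < len_); return ptr_[index]; }
};

// Fills the array with 0..size-1 and returns the sum.
inline int fill_and_sum(scratch_array<int>& values)
{
    int total = 0;
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        values[i] = static_cast<int>(i);
        total += values[i];
    }
    return total;
}

int sum_with_stack_storage(std::size_t n)
{
    // The alloca call sits directly in this function, so the memory lives in this frame.
    int* memory = static_cast<int*>(SKETCH_ALLOCA(sizeof(int) * n));
    scratch_array<int> values = scratch_array<int>::from_stack_ptr(memory, n);
    return fill_and_sum(values);
}

int sum_with_heap_storage(std::size_t n)
{
    scratch_array<int> values(n);   // no stack memory supplied, so new[]/delete[] is used
    return fill_and_sum(values);
}
```

One caveat the flag does not solve: a stack-backed array must never be moved or returned out of the function whose frame holds the memory, because the pointer dangles the moment that function returns.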

I'm pretty happy with the way it turned out. Benchmarks place stack_array at around 3x faster than std::vector, and even slightly faster than raw memory allocation, so we've definitely done some good work there. I'm not sure why std::vector is so slow in this case; I've turned off every security and debugging feature I can think of. Maybe there's some quirk when it comes to using it in C++/CLI.
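For anyone wanting to run a comparison of their own, here is one way such a micro-benchmark might be set up. This is a sketch under assumptions, not the harness behind the numbers above: the function names and the timing struct are hypothetical, plain alloca stands in for _malloca, and std::chrono is used for timing. Each "call" does its scratch work in a separate function so the stack memory is released between calls, mirroring a method invoked many times per frame.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>
#if defined(_MSC_VER)
#include <malloc.h>
#define SKETCH_ALLOCA _alloca
#else
#include <alloca.h>
#define SKETCH_ALLOCA alloca
#endif

// One simulated call using a heap-backed std::vector as scratch space.
long long fill_with_vector(std::size_t n, int seed)
{
    assert(n > 0);
    std::vector<int> scratch(n);
    for (std::size_t i = 0; i < n; ++i)
        scratch[i] = seed + static_cast<int>(i);
    return scratch[n - 1];
}

// The same work using scratch space from this function's stack frame.
long long fill_with_alloca(std::size_t n, int seed)
{
    assert(n > 0);
    int* scratch = static_cast<int*>(SKETCH_ALLOCA(sizeof(int) * n));
    for (std::size_t i = 0; i < n; ++i)
        scratch[i] = seed + static_cast<int>(i);
    return scratch[n - 1];
}

// Hypothetical result record: a checksum to keep the work observable, plus elapsed time.
struct timing
{
    long long checksum;
    long long microseconds;
};

// Invokes fn(n, call_index) `calls` times and measures the total wall-clock time.
template <typename Fn>
timing time_calls(Fn fn, int calls, std::size_t n)
{
    const auto start = std::chrono::steady_clock::now();
    long long checksum = 0;
    for (int c = 0; c < calls; ++c)
        checksum += fn(n, c);
    const auto stop = std::chrono::steady_clock::now();

    timing result;
    result.checksum = checksum;
    result.microseconds = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count());
    return result;
}
```

Calling time_calls(fill_with_vector, calls, n) and time_calls(fill_with_alloca, calls, n) with the same arguments should give matching checksums, and the two microseconds fields can then be compared. The ratio will depend heavily on the compiler, optimization flags, and array size, so treat any single run with suspicion.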

Mike.Popoloski

Author