A few weeks ago, we got an issue filed on our tracker claiming that a particular shader method in D3D10, which was called many times per frame, was spending 80% of its time diddling with std::vector. I was skeptical, to be sure, but a few quick tests proved that at least a good portion of the method's time was in fact being eaten up there. I figured it was high time I resurrected our old stack allocation code, but this time I had the benefit of hindsight to help guide me along.
Our old attempt had basically involved using a custom allocator for std::vector, which seemed good on the surface until we realized that memory allocated from the stack inside an allocator wasn't going to be of much use outside of it, since that memory is released as soon as the allocating function returns. To that end, I began thinking of ways that we could reliably allocate memory on the stack of the calling method, but still wrap it all up nicely so that the user didn't have to worry about cleanup or any other nonsense. I hit upon the idea of using a macro that would discreetly allocate a chunk from the stack and then forward the call on to the actual stack_array constructor. Since the macro is a simple text replacement, the actual stack allocation call happens in the calling function, which is exactly where it needs to happen.
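To make the "text replacement" point concrete, here is a minimal sketch of a macro whose stack allocation lands in the calling function's frame. It is not the SlimDX code: STACK_INTS, SKETCH_ALLOCA and sum_one_to are hypothetical names made up for illustration, and it assumes plain alloca (or _alloca on MSVC) rather than _malloca so that it builds outside Visual C++.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER)
#include <malloc.h>
#define SKETCH_ALLOCA(bytes) _alloca(bytes)
#else
#include <alloca.h>
#define SKETCH_ALLOCA(bytes) alloca(bytes)
#endif

// Hypothetical macro (not part of SlimDX). Because the preprocessor pastes
// this text into the caller, the alloca call runs in the caller's own frame.
#define STACK_INTS(count) \
    static_cast<int*>(SKETCH_ALLOCA(sizeof(int) * (count)))

// Fills a stack buffer with 1..count and returns the total.
int sum_one_to(std::size_t count)
{
    // The buffer stays valid until sum_one_to returns.
    int* values = STACK_INTS(count);

    for (std::size_t i = 0; i < count; ++i)
        values[i] = static_cast<int>(i + 1);

    int total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}
```

Had the allocation been wrapped in an ordinary helper function instead of a macro, the buffer would have been gone by the time the helper returned, which is the same trap the custom allocator fell into.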
You can see the entire contents of the stack_array class here:
#define stackalloc(type, length) stack_array<type>::from_stack_ptr(reinterpret_cast<type*>(_malloca(sizeof(type) * length)), length)

template<typename T>
struct stack_array_ref
{
    explicit stack_array_ref(T *right, size_t length, bool on_stack) : ptr(right), len(length), on_stack(on_stack) { }

    T *ptr;
    size_t len;
    bool on_stack;
};

template<typename T>
class stack_array
{
private:
    T* ptr;
    size_t len;
    bool on_stack;

    explicit stack_array(T* memory, size_t length) throw() : len(length), ptr(memory), on_stack(true) { }

public:
    explicit stack_array(size_t length = 0) throw() : len(length), ptr(new T[length]), on_stack(false) { }

    stack_array(stack_array& right) throw() : ptr(right.ptr), len(right.len), on_stack(right.on_stack)
    {
        right.ptr = NULL;
        right.len = 0;
        right.on_stack = false;
    }

    stack_array(stack_array_ref<T> right) throw()
    {
        ptr = right.ptr;
        len = right.len;
        on_stack = right.on_stack;
        right.ptr = NULL;
    }

    ~stack_array()
    {
        if (on_stack)
            _freea(ptr);
        else
            delete[] ptr;
    }

    static stack_array from_stack_ptr(T* memory, size_t length)
    {
        return stack_array(memory, length);
    }

    operator stack_array_ref<T>() throw()
    {
        stack_array_ref<T> ans(ptr, len, on_stack);
        ptr = NULL;
        len = 0;
        on_stack = false;
        return ans;
    }

    stack_array& operator = (stack_array& right) throw()
    {
        if (right.ptr != ptr)
        {
            if (on_stack)
                _freea(ptr);
            else
                delete[] ptr;
        }

        ptr = right.ptr;
        len = right.len;
        on_stack = right.on_stack;
        right.ptr = NULL;
        right.len = 0;
        right.on_stack = false;
        return *this;
    }

    stack_array& operator = (stack_array_ref<T> right) throw()
    {
        if (right.ptr != ptr)
        {
            if (on_stack)
                _freea(ptr);
            else
                delete[] ptr;
        }

        ptr = right.ptr;
        len = right.len;
        on_stack = right.on_stack;
        return *this;
    }

    const T* get() const { return ptr; }
    T* get() throw() { return ptr; }
    size_t size() const throw() { return len; }

    T& operator [] (size_t index) { return ptr[index]; }
    const T& operator [] (size_t index) const { return ptr[index]; }
};
It's a very lightweight template class that really only exists to hold temporary values while we marshal between .NET and DirectX. Notice the stackalloc macro, which is where the magic happens. If the user forgets the macro and constructs the array directly, the class falls back to a standard new/delete, which means we don't get unspeakable errors from a simple typo. Here's an example of using it:
stack_array<D3DPRESENT_PARAMETERS> d3dpp = stackalloc( D3DPRESENT_PARAMETERS, presentParameters->Length );
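The stack-or-heap bookkeeping can be sketched in a stripped-down, portable form. This is an illustration rather than the SlimDX class: scratch_array, SKETCH_ALLOCA, fill_and_sum, sum_on_stack and sum_on_heap are hypothetical names, it assumes alloca (or _alloca on MSVC) instead of _malloca, so the stack path needs no explicit free, and it swaps the stack_array_ref ownership trick for a C++11 move constructor.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER)
#include <malloc.h>
#define SKETCH_ALLOCA(bytes) _alloca(bytes)
#else
#include <alloca.h>
#define SKETCH_ALLOCA(bytes) alloca(bytes)
#endif

// Hypothetical, simplified analogue of stack_array for plain-data types.
template <typename T>
class scratch_array
{
public:
    // Heap fallback: used when the caller did not hand over stack memory.
    explicit scratch_array(std::size_t length)
        : ptr_(new T[length]), len_(length), on_stack_(false) { }

    // Adopts memory the caller already took from its own stack frame.
    static scratch_array from_stack_ptr(T* memory, std::size_t length)
    {
        return scratch_array(memory, length);
    }

    // Ownership transfer; the source is left empty and heap-flagged.
    scratch_array(scratch_array&& other) noexcept
        : ptr_(other.ptr_), len_(other.len_), on_stack_(other.on_stack_)
    {
        other.ptr_ = nullptr;
        other.len_ = 0;
        other.on_stack_ = false;
    }

    scratch_array(const scratch_array&) = delete;
    scratch_array& operator=(const scratch_array&) = delete;

    ~scratch_array()
    {
        // alloca memory disappears with the caller's frame; only heap
        // memory needs an explicit delete[].
        if (!on_stack_)
            delete[] ptr_;
    }

    std::size_t size() const { return len_; }
    bool on_stack() const { return on_stack_; }
    T& operator[](std::size_t index) { return ptr_[index]; }

private:
    scratch_array(T* memory, std::size_t length)
        : ptr_(memory), len_(length), on_stack_(true) { }

    T* ptr_;
    std::size_t len_;
    bool on_stack_;
};

// Fills the array with 1..size and returns the total.
int fill_and_sum(scratch_array<int>& values)
{
    int total = 0;
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        values[i] = static_cast<int>(i + 1);
        total += values[i];
    }
    return total;
}

int sum_on_stack(std::size_t n)
{
    // The allocation happens here, in the frame that uses the array.
    int* raw = static_cast<int*>(SKETCH_ALLOCA(sizeof(int) * n));
    scratch_array<int> values = scratch_array<int>::from_stack_ptr(raw, n);
    assert(values.on_stack());
    return fill_and_sum(values);
}

int sum_on_heap(std::size_t n)
{
    scratch_array<int> values(n);   // no stack memory supplied: new/delete path
    assert(!values.on_stack());
    return fill_and_sum(values);
}
```

Both functions produce the same result; the only difference is where the storage comes from and whether the destructor has anything to free.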
I'm pretty happy with the way it turned out. Benchmarks place stack_array at around 3x faster than std::vector, and even slightly faster than raw memory allocation, so we've definitely done some good work there. I'm not sure why std::vector is so slow in this case; I've turned off every security and debugging feature I can think of, so maybe there's some quirk when it comes to using it in C++/CLI.
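For anyone who wants to run a similar comparison, a timing harness might look roughly like the sketch below. It is not the benchmark used for the numbers above: the function names, buffer size and iteration count are all assumptions, and it uses alloca (or _alloca on MSVC) in place of stack_array so it builds anywhere. No timings are shown because they depend heavily on the compiler and its settings.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

#if defined(_MSC_VER)
#include <malloc.h>
#define SKETCH_ALLOCA(bytes) _alloca(bytes)
#else
#include <alloca.h>
#define SKETCH_ALLOCA(bytes) alloca(bytes)
#endif

// Temporary buffer backed by std::vector (heap allocation on every call).
long long work_with_vector(std::size_t n)
{
    std::vector<int> values(n);
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        values[i] = static_cast<int>(i);
        total += values[i];
    }
    return total;
}

// Same work, but the temporary buffer comes from this function's stack frame.
long long work_on_stack(std::size_t n)
{
    int* values = static_cast<int*>(SKETCH_ALLOCA(sizeof(int) * n));
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        values[i] = static_cast<int>(i);
        total += values[i];
    }
    return total;
}

// Calls fn(n) repeatedly and returns the elapsed time in milliseconds.
// The checksum keeps the work observable and lets the two variants be
// compared for correctness.
template <typename Fn>
double time_calls(Fn fn, std::size_t n, int iterations, long long& checksum)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        checksum += fn(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_calls(work_with_vector, 64, 1000, checksum) and then the same with work_on_stack gives two durations to compare; the checksums should match if both variants did the same work.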