As I already learned from previous issues with DirectXMath, vectors (XMVECTOR, and XMMATRIX built from them) have to be 16-byte aligned, or access violation exceptions will be thrown. I recently ran into this again when I changed the camera class.
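The following is a minimal sketch of keeping vector members of a class 16-byte aligned, assuming a hypothetical `Camera` struct. On SSE builds DirectXMath defines XMVECTOR as `__m128`, so plain `__m128` members stand in for XMVECTOR members here. The 32-bit MSVC heap only guarantees 8-byte alignment, which is a common source of these crashes. The C++17 over-aligned `new` (or storing XMFLOAT3/XMFLOAT4X4 members and loading them into XMVECTOR locals) avoids the problem.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <xmmintrin.h>  // __m128, _mm_setzero_ps

// Hypothetical camera class: __m128 members stand in for XMVECTOR members.
struct alignas(16) Camera {
    __m128 position;
    __m128 target;
    float fov;
};

static_assert(alignof(Camera) == 16, "Camera must be 16-byte aligned");

inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Heap-allocates a Camera. Since C++17, `new` on an over-aligned type calls the
// aligned operator new, so the object starts on a 16-byte boundary.
inline std::unique_ptr<Camera> make_camera() {
    std::unique_ptr<Camera> cam(new Camera());
    assert(is_aligned16(cam.get()));
    cam->position = _mm_setzero_ps();
    cam->target = _mm_setzero_ps();
    cam->fov = 60.0f;
    return cam;
}
```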
In my performance test I tried a simple function that returns the minimum of two floats.
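inline float _b_min(float a, float b)
{
    return (a < b ? a : b);
}

// MSVC inline assembly is only available for 32-bit x86 targets.
inline float _a_min(float a, float b)
{
    float result = 0.0f;
    __asm
    {
        movss xmm0, a       // xmm0 = a
        minss xmm0, b       // xmm0 = (a < b) ? a : b
        movss result, xmm0  // store the minimum in result
    }
    // A bare cmp only sets flags; the smaller value still has to be selected
    // (here by minss) before it is stored, otherwise result is always a.
    return result;
}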
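Comparing floats with an integer `cmp` on their raw bit patterns is what `cmp eax, ebx` does after loading floats into general-purpose registers. That ordering is only right when at least one value is non-negative. The sketch below uses the hypothetical helpers `float_bits` and `bitwise_min` to show where it breaks.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets a float's IEEE-754 bit pattern as a signed 32-bit integer.
// std::memcpy avoids the undefined behaviour of pointer type-punning.
inline std::int32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "expects 32-bit float");
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i;
}

// Hypothetical helper: picks the "smaller" float by comparing raw bits as
// signed integers, the way `cmp eax, ebx` would after `mov eax, a`.
inline float bitwise_min(float a, float b) {
    assert(a == a && b == b);  // NaN inputs are not meaningful here
    return float_bits(a) < float_bits(b) ? a : b;
}
```

`bitwise_min(1.0f, 2.0f)` and `bitwise_min(-1.0f, 3.0f)` give the expected result. `bitwise_min(-1.0f, -2.0f)` returns -1.0f, because for two negative floats a larger magnitude gives a larger sign-magnitude bit pattern.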
The function _a_min was a bit faster than _b_min, at around 900 microseconds.
_b_min took over 1 millisecond, as measured with the high-resolution timer.
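Timings like these are only comparable if every variant runs the same number of calls on the same inputs and its results are actually used. Below is a minimal harness sketch, assuming a hypothetical `time_min_calls` helper and `std::chrono::steady_clock` (monotonic) as the timer instead of the original high-resolution timer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Times one call of `fn` per input pair and returns the elapsed time in
// microseconds. The results are summed into `sink` so that the calls' return
// values are used rather than simply discarded.
template <class MinFn>
double time_min_calls(MinFn fn, const std::vector<float>& xs,
                      const std::vector<float>& ys, float& sink) {
    assert(xs.size() == ys.size());
    const auto start = std::chrono::steady_clock::now();
    float acc = 0.0f;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        acc += fn(xs[i], ys[i]);
    }
    const auto stop = std::chrono::steady_clock::now();
    sink = acc;
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Passing _b_min, _a_min and _sse2_min through the same harness with identical vectors keeps the comparison fair. With only a few calls, timer resolution and call overhead dominate the result.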
The SSE2 min function took around 0.003 milliseconds.
#include <xmmintrin.h> // SSE intrinsics: __m128, _mm_set_ps1, _mm_min_ps, _mm_cvtss_f32

inline float _sse2_min(float a, float b)
{
    __m128 _a = _mm_set_ps1(a);     //-- broadcast a into all four lanes of _a
    __m128 _b = _mm_set_ps1(b);     //-- broadcast b into all four lanes of _b
    __m128 c = _mm_min_ps(_a, _b);  //-- c holds the lane-wise minimum of _a and _b
    //-- return lane 0 by value; _mm_store_ps would need four writable floats at a
    //-- 16-byte-aligned address, which a single-float _aligned_malloc does not provide
    return _mm_cvtss_f32(c);
}
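When all four lanes are wanted, `_mm_store_ps` needs a destination of four floats on a 16-byte boundary. A stack array declared `alignas(16)` satisfies that without any heap allocation. On MSVC, a heap buffer would be `_aligned_malloc(4 * sizeof(float), 16)` released with `_aligned_free`. The sketch below uses a hypothetical helper name, `min_ps_store`.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // __m128, _mm_min_ps, _mm_store_ps

// Writes all four lanes of min(a, b) to `out`. _mm_store_ps requires `out`
// to point at 16 writable bytes (four floats) on a 16-byte boundary.
inline void min_ps_store(__m128 a, __m128 b, float* out) {
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
    _mm_store_ps(out, _mm_min_ps(a, b));
}
```

Usage: `alignas(16) float out[4]; min_ps_store(va, vb, out);` leaves nothing to free afterwards.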
My question, for whoever reads this: why are the SSE2 and bit-manipulation versions a bit slower than the ASM version, or does it not matter? Is SSE2 perhaps better suited to larger amounts of data than to finding the minimum of just two floats?
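One plausible angle on the last question: SSE helps most when each instruction processes several elements at once. For a single pair of floats, the broadcast and extract steps wrap extra work around one comparison. An optimizing compiler targeting SSE can often turn the plain ternary into a single `minss` anyway. The sketch below shows the batch case, assuming two equal-length float arrays and a hypothetical function name, `min_arrays`.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // _mm_loadu_ps, _mm_min_ps, _mm_storeu_ps

// Element-wise minimum: out[i] = min(a[i], b[i]) for i in [0, n).
// Four floats are handled per _mm_min_ps; the unaligned load/store intrinsics
// accept any float pointer, and a scalar loop finishes the last 0-3 elements.
inline void min_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_min_ps(va, vb));
    }
    for (; i < n; ++i) {
        out[i] = a[i] < b[i] ? a[i] : b[i];
    }
}
```

Timing this against a scalar loop over a few thousand elements gives a fairer picture of what SSE buys than timing a two-float call.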