frob said:
JoeJ said:
I am sure 32 bits are enough for my data, so why use 64? Could this affect performance on some (or any) platform?

You are quite likely paying the cost of conversions that you didn't intend. It is faster to keep them all the same type and do the math than it is to do the conversions.
Sometimes those conversions you pay for happen inside the CPU core itself. Modern processors have moved to 64-bit processing internally, so many of the 32-bit, 16-bit, and 8-bit operations are generally slower. For many operations they do the work as though it were a 64-bit value, sometimes having to extend the value out, do the math, and then chop it back down. Benchmark tools like PassMark PerformanceTest have found notable differences on many CPUs, which they have occasionally discussed in their forums. For example, 64-bit integer multiplies may be 6x faster than the 32-bit integer operations, or 64-bit division might be 4x faster than the same 32-bit integer division. Exact numbers depend on the chips and the operations involved.
None of this is new. The performance difference was noted about 15 years ago, when Intel and AMD moved to 64-bit CPU cores. They continue to focus first on 64-bit (and, with modern extensions, also 128-bit, 256-bit, and even 512-bit) optimizations and leave the old, smaller cases behind as historical footnotes. The engineers might have left in faster hardware designs specific to the 32-bit or 16-bit operations, but quite often they don't bother and simply do the work at 64-bit (or higher) precision.
If your data is 32-bit, use a 32-bit type like u32 or i32 or int32_t or whatever you want in your system. If it is 64-bit, do likewise, using the actual size of the data. If you're working with memory sizes, then size_t is the proper data type, not int or int64_t or int32_t or other variations. Don't intentionally do conversions you don't need.
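To make the quoted type advice concrete, here is a small sketch of my own (the function and names are mine, not frob's): the element type matches the data and the index type matches the container's size type, so no conversions sneak in.

#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical example: the data is 32-bit, so the sums stay 32-bit,
// and the loop index is size_t to match .size() - no widening or
// narrowing conversions are needed anywhere.
std::uint32_t sum_pixels(const std::vector<std::uint32_t>& pixels)
{
    std::uint32_t sum = 0;                           // data is 32-bit, keep it 32-bit
    for (std::size_t i = 0; i < pixels.size(); ++i)  // size_t matches the container's size type
        sum += pixels[i];
    return sum;
}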
I am very sceptical of this claim that 64-bit ops could be (that much) faster than 32-bit ops, so I tried to find the source and I think this is where it comes from: https://forums.passmark.com/performancetest/3278-amd-llano-cpumark-artificially-inflated-scores?postcount=7#post18109
And it just seems to be the usual mix-up of 64-bit architecture and 64-bit data. When they talk about 32-bit and 64-bit in that thread it's about the architecture, without specifying what data types are used - just “Integer Math” and “Find Prime Numbers”. A bit more reading ( https://forums.passmark.com/performancetest/3383-64bit-vs-32bit-benchmarks-integer-maths-pt8?t=3348 ) shows that both tests mixed 32-bit and 64-bit integers, and the performance numbers were dominated by integer division and square roots respectively. So it's not that 32-bit ops are slow(er) on modern x64 CPUs, it's that 64-bit integer operations are slow when you have to emulate them with 32-bit x86 instructions.
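A tiny example of where that emulation shows up (my own snippet, not code from the PassMark tests):

#include <cstdint>

// Plain 64-by-64 unsigned division.
// On an x86-64 build this is a single DIV instruction.
// On a 32-bit x86 build there is no 64-by-64 divide instruction, so the
// compiler emits a call to a runtime helper (e.g. __udivdi3 with GCC/Clang,
// _aulldiv with MSVC) that emulates the division in software - that
// emulation, not "slow 32-bit math", is the expensive part.
std::uint64_t div64(std::uint64_t a, std::uint64_t b)
{
    return a / b;
}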
And then coincidentally the specific chips in that original benchmark also had a hardware bug, with a workaround that completely ruined their integer division performance: https://forums.passmark.com/performancetest/3705-amd-llano-a-series-benchmark-and-cpu-bug?t=3656
And if you want to look up exact performance numbers for individual instructions on specific processor architectures, Agner Fog has you covered: https://www.agner.org/optimize/instruction_tables.pdf