Re: 4 hashes parallel on SSE2 CPUs for 0.3.6

This patch calculates four hashes at once on one core using vector instructions. A test program is included that validates the new hash function against the old one, so it should be correct.

The patch is against 0.3.6. It improves khash/s by roughly 115%.

That’s amazing…

So are you saying you use 128-bit registers to process four 32-bit values at once with SIMD? I’ve wondered about that for a long time, but I didn’t think it would be possible, because I assumed an addition would carry into the neighbouring value.
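
On the carry question: SSE2's packed 32-bit add (the `_mm_add_epi32` intrinsic) treats the register as four independent lanes, so an overflow in one lane wraps within that lane and does not spill into the next. The sketch below is only a plain-Python model of that behaviour, not the patch's code. The helper names `pack`, `unpack`, `add_lanes` and `add_wide` are made up for illustration. It contrasts a lane-wise add with an ordinary 128-bit add, where the carry does cross into the neighbour.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit lane


def pack(lanes):
    """Pack four 32-bit values into one 128-bit integer, lane 0 in the low bits."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & MASK32) << (32 * i)
    return value


def unpack(value):
    """Split a 128-bit integer back into its four 32-bit lanes."""
    return [(value >> (32 * i)) & MASK32 for i in range(4)]


def add_lanes(a, b):
    """Model of a packed 32-bit add: each lane wraps mod 2**32 on its own."""
    return pack([(x + y) & MASK32 for x, y in zip(unpack(a), unpack(b))])


def add_wide(a, b):
    """Ordinary 128-bit add: a carry out of one lane lands in the next."""
    return (a + b) & ((1 << 128) - 1)


a = pack([0xFFFFFFFF, 1, 2, 3])  # lane 0 is about to overflow
b = pack([1, 1, 1, 1])

print("lane-wise:", unpack(add_lanes(a, b)))
print("128-bit:  ", unpack(add_wide(a, b)))
```

In the lane-wise result only lane 0 wraps around to zero. In the 128-bit result the overflow from lane 0 shows up as an extra 1 in lane 1, which is the corruption the question is worried about.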
