Re: tcatm’s 4-way SSE2 for Linux 32/64-bit is in 0.3.10

I just reviewed the source code, as I had a few ideas for optimizing it further, and I noticed that the 4-way code is partly broken:

from main.cpp:

Code:
                for (int j = 0; j < NPAR; j++) 
                {    
                    if (thash[7][j] == 0)
                    {                        
                        for (int i = 0; i < sizeof(hash)/4; i++) 
                          ((unsigned int*)&hash)[i] = thash[i][j];
                        pblock->nNonce = ByteReverse(tmp.block.nNonce + j);
                    }    
                }

The code only keeps one hash out of each batch of 32 (the last one with thash[7] == 0), even when more than one of them might be a correct one.
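To make the point concrete, here is a minimal standalone sketch of the selection behaviour. It is not the code from main.cpp: the `Lanes` type and both function names are made up for illustration, and NPAR = 32 is assumed from the "32 hashes" figure.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Assumed batch size, taken from the "32 hashes" figure in the post.
constexpr int NPAR = 32;

// Hypothetical stand-in for the 4-way output, indexed as thash[word][lane].
using Lanes = std::array<std::array<std::uint32_t, NPAR>, 8>;

// Mirrors the selection in the quoted loop: each match overwrites the
// previous one, so the highest matching lane is the one that is reported.
// Returns -1 when no lane has thash[7] == 0.
int last_matching_lane(const Lanes& thash) {
    int chosen = -1;
    for (int j = 0; j < NPAR; j++) {
        if (thash[7][j] == 0) {
            chosen = j;
        }
    }
    return chosen;
}

// Collects every lane with thash[7] == 0, to show what the loop drops.
std::vector<int> all_matching_lanes(const Lanes& thash) {
    std::vector<int> lanes;
    for (int j = 0; j < NPAR; j++) {
        if (thash[7][j] == 0) {
            lanes.push_back(j);
        }
    }
    return lanes;
}
```

If lanes 3 and 17 both have thash[7] == 0, last_matching_lane reports 17 only, while all_matching_lanes returns both.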

Something like this should fix it, but it won’t be safe at higher difficulties. Also, I’m not sure whether the byte order should be reversed or not. Could someone review this?

Code:
                unsigned int min_hash = ~0U;  // all bits set, so any candidate can replace it
                for (int j = 0; j < NPAR; j++) 
                {    
                    if (thash[7][j] == 0)
                    {    
                        // keep the candidate whose next word is smallest
                        if (thash[6][j] <= min_hash) {
                          min_hash = thash[6][j];
                          for (int i = 0; i < sizeof(hash)/4; i++) 
                            ((unsigned int*)&hash)[i] = thash[i][j];
                          pblock->nNonce = ByteReverse(tmp.block.nNonce + j);
                        }    
                    }    
                }

The simplification is intentional. There will only be more than one thash[7] == 0 in the same batch in about one out of 134,217,728 cases. It only makes it about 0.0000007% slower.
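Those two figures can be checked with a quick calculation. This is a rough sketch, assuming NPAR is 32 and that each hash independently has a 1 in 2^32 chance of thash[7] == 0, so a second match in the same batch turns up in roughly NPAR / 2^32 of the cases. The function names are made up for illustration.

```cpp
#include <cstdint>

// Assumed batch size, taken from the "32 hashes" figure earlier in the thread.
constexpr int NPAR = 32;

// Roughly how many cases it takes before a second thash[7] == 0 shows up
// in the same batch: 2^32 / NPAR.
constexpr std::uint64_t cases_per_double_match() {
    return (std::uint64_t(1) << 32) / NPAR;
}

// The same rate expressed as a percentage of cases.
constexpr double double_match_percent() {
    return 100.0 / static_cast<double>(cases_per_double_match());
}
```

With NPAR = 32 this gives 134,217,728 cases and a little over 0.0000007 percent, which matches the numbers quoted above.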


https://bitcointalk.org/index.php?topic=820.msg11503#msg11503