This is a very simple implementation taken from rhash (rhash.sourceforge.net). Every article about crc32 describes the code below, it is not something outstanding from rhash.
unsigned get_crc32(unsigned crcinit, const char *p, int len) {
register unsigned crc;
const char *e = p + len;
for(crc=crcinit^0xFFFFFFFF; p<e; p++)
crc = crcTable[(crc^ *p) & 0xFF] ^ (crc >> 8);
return( crc^0xFFFFFFFF );
}
And this is an x86 asm code, produced by gcc 4.1.2 from the 'for' loop:
.L11:
movsbl (%ecx,%esi),%eax
incl %ecx
xorl %edx, %eax
andl $255, %eax
shrl $8, %edx
xorl crcTable(,%eax,4), %edx
cmpl %ebx, %ecx
jne .L11
One of my friends found a bit faster implementation written in inline asm and used it for his hash checker ArXSum. I will not give here his code, because my optimization of rhash code is even faster. ArXSum just gave me an idea to use 8-bit registers to get rid of the andl instruction.
First, I enforced gcc to use 8-bit register. I hoped it would be enough, but it won't.
unsigned get_crc32(unsigned crcinit, const char *p, int len) {
register unsigned crc;
unsigned char m;
const char *e = p + len;
m = 0;
for(crc=crcinit^0xFFFFFFFF; p<e; p++) {
m = (crc^ *p);
crc = crcTable[m] ^ (crc >> 8);
}
return( crc^0xFFFFFFFF );
}
The code produced is
.L11:
movzbl (%ecx,%esi), %eax
incl %ecx
xorb %dl, %al
movzbl %al, %eax
shrl $8, %edx
xorl crcTable(,%eax,4), %edx
cmpl %ebx, %ecx
jne .L11
Do you see that utterly useless second movzbl? My final optimization was just to remove it and to add "xorl %eax,%eax" before the loop (that would be "m = 0" which had been lost by gcc). Newest version of gcc also produces the same code.
I still want to carefully look into zlib one day and to compare their high-level optimization with mine. I will eventually post about it.
No comments:
Post a Comment