サヴルスのブログ: crc32

Long time passed since my last play with crc32. I wanted to learn and benchmark zlib's implementation as well (it's quite complex in comparison to basic ones), but it seems it will stuck in my todo list for ages. So, I decided to write what I know right now.

This is a very simple implementation taken from rhash (rhash.sourceforge.net). Every article about crc32 describes the code below, it is not something outstanding from rhash.

unsigned get_crc32(unsigned crcinit, const char *p, int len) {
  register unsigned crc;
  const char *e = p + len;
    
  for(crc=crcinit^0xFFFFFFFF; p<e; p++) 
    crc = crcTable[(crc^ *p) & 0xFF] ^ (crc >> 8);
  return( crc^0xFFFFFFFF );
}

And this is an x86 asm code, produced by gcc 4.1.2 from the 'for' loop:

.L11:
        movsbl  (%ecx,%esi),%eax
        incl    %ecx
        xorl    %edx, %eax
        andl    $255, %eax
        shrl    $8, %edx
        xorl    crcTable(,%eax,4), %edx
        cmpl    %ebx, %ecx
        jne     .L11

One of my friends found a bit faster implementation written in inline asm and used it for his hash checker ArXSum. I will not give here his code, because my optimization of rhash code is even faster. ArXSum just gave me an idea to use 8-bit registers to get rid of the andl instruction.
First, I enforced gcc to use 8-bit register. I hoped it would be enough, but it won't.

unsigned get_crc32(unsigned crcinit, const char *p, int len) {
  register unsigned crc;
  unsigned char m;
  const char *e = p + len;
  m = 0;
    
  for(crc=crcinit^0xFFFFFFFF; p<e; p++) { 
    m = (crc^ *p);
    crc = crcTable[m] ^ (crc >> 8);
  }
  return( crc^0xFFFFFFFF );
}

The code produced is

.L11:
        movzbl  (%ecx,%esi), %eax
        incl    %ecx
        xorb    %dl, %al
        movzbl  %al, %eax
        shrl    $8, %edx
        xorl    crcTable(,%eax,4), %edx
        cmpl    %ebx, %ecx
        jne     .L11

Do you see that utterly useless second movzbl? My final optimization was just to remove it and to add "xorl %eax,%eax" before the loop (that would be "m = 0" which had been lost by gcc). Newest version of gcc also produces the same code.

I still want to carefully look into zlib one day and to compare their high-level optimization with mine. I will eventually post about it.

サヴルスのブログ

Saturday, November 15, 2008

crc32

No comments:

Labels

Blog Archive

Followers

About Me