Being network administrator at MSU dorm two years ago, I made a public ssh server. Users were presented in mysql database on another server, so I used pam_mysql and libnss_mysql, which were already existed, for my public ssh server. I also wanted to set disk quota automatically for each user, but linux setquota(8) doesn't allow you to edit quota for non-existing user. Nor did work any pam_setquotas I found. So I wrote a one myself.
Of course, I edited limits.conf. But it didn't save from stupid cpu-intensive "while(1);" programs some nasty users had left running. I decided to kill every user's process, if he/she is no longer logged on the system, and wrote pam_kill for that.
Today, to share my old code, I created two Google Code projects: pam-setquota and pam-kill. You can access them via my Google Code profile.
Tuesday, November 18, 2008
Saturday, November 15, 2008
crc32
Long time passed since my last play with crc32. I wanted to learn and benchmark zlib's implementation as well (it's quite complex in comparison to basic ones), but it seems it will stuck in my todo list for ages. So, I decided to write what I know right now.
This is a very simple implementation taken from rhash (rhash.sourceforge.net). Every article about crc32 describes the code below, it is not something outstanding from rhash.
And this is an x86 asm code, produced by gcc 4.1.2 from the 'for' loop:
One of my friends found a bit faster implementation written in inline asm and used it for his hash checker ArXSum. I will not give here his code, because my optimization of rhash code is even faster. ArXSum just gave me an idea to use 8-bit registers to get rid of the andl instruction.
First, I enforced gcc to use 8-bit register. I hoped it would be enough, but it won't.
The code produced is
Do you see that utterly useless second movzbl? My final optimization was just to remove it and to add "xorl %eax,%eax" before the loop (that would be "m = 0" which had been lost by gcc). Newest version of gcc also produces the same code.
I still want to carefully look into zlib one day and to compare their high-level optimization with mine. I will eventually post about it.
This is a very simple implementation taken from rhash (rhash.sourceforge.net). Every article about crc32 describes the code below, it is not something outstanding from rhash.
unsigned get_crc32(unsigned crcinit, const char *p, int len) {
register unsigned crc;
const char *e = p + len;
for(crc=crcinit^0xFFFFFFFF; p<e; p++)
crc = crcTable[(crc^ *p) & 0xFF] ^ (crc >> 8);
return( crc^0xFFFFFFFF );
}
And this is an x86 asm code, produced by gcc 4.1.2 from the 'for' loop:
.L11:
movsbl (%ecx,%esi),%eax
incl %ecx
xorl %edx, %eax
andl $255, %eax
shrl $8, %edx
xorl crcTable(,%eax,4), %edx
cmpl %ebx, %ecx
jne .L11
One of my friends found a bit faster implementation written in inline asm and used it for his hash checker ArXSum. I will not give here his code, because my optimization of rhash code is even faster. ArXSum just gave me an idea to use 8-bit registers to get rid of the andl instruction.
First, I enforced gcc to use 8-bit register. I hoped it would be enough, but it won't.
unsigned get_crc32(unsigned crcinit, const char *p, int len) {
register unsigned crc;
unsigned char m;
const char *e = p + len;
m = 0;
for(crc=crcinit^0xFFFFFFFF; p<e; p++) {
m = (crc^ *p);
crc = crcTable[m] ^ (crc >> 8);
}
return( crc^0xFFFFFFFF );
}
The code produced is
.L11:
movzbl (%ecx,%esi), %eax
incl %ecx
xorb %dl, %al
movzbl %al, %eax
shrl $8, %edx
xorl crcTable(,%eax,4), %edx
cmpl %ebx, %ecx
jne .L11
Do you see that utterly useless second movzbl? My final optimization was just to remove it and to add "xorl %eax,%eax" before the loop (that would be "m = 0" which had been lost by gcc). Newest version of gcc also produces the same code.
I still want to carefully look into zlib one day and to compare their high-level optimization with mine. I will eventually post about it.
Subscribe to:
Posts (Atom)