Tuesday, November 18, 2008

pam_setquota, pam_kill

Two years ago, as the network administrator at an MSU dorm, I set up a public ssh server. User accounts were stored in a MySQL database on another server, so I used the existing pam_mysql and libnss_mysql modules for it. I also wanted to set a disk quota automatically for each user, but Linux setquota(8) doesn't let you edit the quota of a non-existent user, and none of the pam_setquota modules I found worked either. So I wrote one myself.
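
For illustration only (this is not the pam_setquota code): on Linux the kernel keys disk quotas by numeric uid, so a program can set limits through quotactl(2) without looking up a user name first. The sketch below assumes glibc's or musl's <sys/quota.h>, the generic Linux quota interface (block limits in 1 KiB units), and a hypothetical helper named set_user_quota. It has to run as root against a block device whose filesystem has user quotas enabled.

```c
#define _DEFAULT_SOURCE   /* expose caddr_t/u_int64_t used by <sys/quota.h> */
#include <string.h>
#include <sys/types.h>
#include <sys/quota.h>

/* Hypothetical helper: set block and inode limits for a numeric uid.
 * Block limits are in 1 KiB quota blocks; 0 means "no limit".
 * Returns 0 on success, -1 with errno set on failure. */
static int set_user_quota(const char *blockdev, uid_t uid,
                          unsigned long long soft_kb,
                          unsigned long long hard_kb,
                          unsigned long long soft_inodes,
                          unsigned long long hard_inodes)
{
    struct dqblk q;

    memset(&q, 0, sizeof q);
    q.dqb_bsoftlimit = soft_kb;
    q.dqb_bhardlimit = hard_kb;
    q.dqb_isoftlimit = soft_inodes;
    q.dqb_ihardlimit = hard_inodes;
    q.dqb_valid = QIF_LIMITS;   /* only the limits are being set, not usage */

    /* The kernel takes the id as a plain number: no passwd lookup here. */
    return quotactl(QCMD(Q_SETQUOTA, USRQUOTA), blockdev, (int)uid,
                    (void *)&q);
}
```

A PAM session hook could call something like set_user_quota("/dev/sdXN", uid, 500000, 550000, 0, 0) with the uid it got from NSS; the device and numbers here are placeholders.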

Of course, I edited limits.conf, but that didn't protect the server from stupid CPU-hogging "while(1);" programs that some nasty users left running. So I decided to kill all of a user's processes once he or she is no longer logged in to the system, and wrote pam_kill for that.
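
The core of such a module is finding every process that belongs to a given user. A minimal sketch, assuming Linux /proc and a hypothetical helper named signal_user_processes (this is not the actual pam_kill code, and it leaves out the "is this user still logged in?" check), reads the real uid from each /proc/<pid>/status and signals the matching processes:

```c
#define _DEFAULT_SOURCE
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the real uid from /proc/<pid>/status.
 * Returns 0 on success, -1 if the process vanished or the file is unreadable. */
static int proc_real_uid(const char *pid_str, uid_t *uid)
{
    char path[64], line[256];
    unsigned long ruid;
    int found = -1;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%s/status", pid_str);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        /* The line looks like "Uid:\t<real>\t<effective>\t<saved>\t<fs>" */
        if (sscanf(line, "Uid: %lu", &ruid) == 1) {
            *uid = (uid_t)ruid;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Hypothetical helper: send sig to every process whose real uid is uid.
 * Returns the number of processes signalled, or -1 if /proc is unavailable. */
static int signal_user_processes(uid_t uid, int sig)
{
    DIR *d = opendir("/proc");
    struct dirent *ent;
    int count = 0;

    if (!d)
        return -1;
    while ((ent = readdir(d)) != NULL) {
        uid_t owner;

        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;                       /* not a process directory */
        if (proc_real_uid(ent->d_name, &owner) != 0 || owner != uid)
            continue;
        if (kill((pid_t)strtol(ent->d_name, NULL, 10), sig) == 0)
            count++;
    }
    closedir(d);
    return count;
}
```

Passing signal 0 sends nothing and only checks that each matching process could be signalled, which makes a safe dry run; a session-close hook would pass SIGTERM or SIGKILL instead.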

Today, to share my old code, I created two Google Code projects: pam-setquota and pam-kill. You can access them via my Google Code profile.

Saturday, November 15, 2008

crc32

A long time has passed since I last played with crc32. I also wanted to study and benchmark zlib's implementation (it's quite complex compared to the basic ones), but it looks like that will stay stuck on my to-do list for ages. So I decided to write up what I know right now.

This is a very simple implementation taken from rhash (rhash.sourceforge.net). There is nothing rhash-specific about it: every article about crc32 describes this code.
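extern unsigned crcTable[256];   /* precomputed CRC-32 lookup table, defined elsewhere */

unsigned get_crc32(unsigned crcinit, const char *p, int len) {
    register unsigned crc;
    const char *e = p + len;

    /* invert on entry and exit; one table lookup per input byte */
    for(crc=crcinit^0xFFFFFFFF; p<e; p++)
        crc = crcTable[(crc^ *p) & 0xFF] ^ (crc >> 8);
    return( crc^0xFFFFFFFF );
}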
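
To actually run the function above, crcTable has to be filled in. A sketch, assuming the standard reflected CRC-32 polynomial 0xEDB88320 (the one zlib and PNG use) and fixed-width types in place of plain unsigned; crc32_init_table is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0xEDB88320u   /* reflected IEEE 802.3 polynomial */

static uint32_t crcTable[256];

/* Entry i is the CRC of the single byte i, computed bit by bit,
 * least significant bit first. */
static void crc32_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (CRC32_POLY ^ (c >> 1)) : (c >> 1);
        crcTable[i] = c;
    }
}

/* Same loop as the rhash-style code, with fixed-width types. */
static uint32_t get_crc32(uint32_t crcinit, const char *p, size_t len)
{
    uint32_t crc = crcinit ^ 0xFFFFFFFFu;
    const char *e = p + len;

    for (; p < e; p++)
        crc = crcTable[(crc ^ (unsigned char)*p) & 0xFF] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

After crc32_init_table(), get_crc32(0, "123456789", 9) returns the well-known check value 0xCBF43926. Because crcinit is XORed with 0xFFFFFFFF on entry, passing the CRC of the previous chunk continues the computation: get_crc32(get_crc32(0, "1234", 4), "56789", 5) gives the same result.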

And this is the x86 assembly that gcc 4.1.2 produces for the 'for' loop:
.L11:
    movsbl (%ecx,%esi),%eax
    incl %ecx
    xorl %edx, %eax
    andl $255, %eax
    shrl $8, %edx
    xorl crcTable(,%eax,4), %edx
    cmpl %ebx, %ecx
    jne .L11

One of my friends found a slightly faster implementation written in inline asm and used it in his hash checker, ArXSum. I won't reproduce his code here, because my optimization of the rhash code is even faster. ArXSum just gave me the idea of using 8-bit registers to get rid of the andl instruction.
First, I forced gcc to use an 8-bit register by indexing the table with an unsigned char. I hoped that would be enough, but it wasn't.
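unsigned get_crc32(unsigned crcinit, const char *p, int len) {
    register unsigned crc;
    unsigned char m;   /* 8-bit table index replaces the "& 0xFF" */
    const char *e = p + len;
    m = 0;

    for(crc=crcinit^0xFFFFFFFF; p<e; p++) {
        m = (crc^ *p);                    /* conversion to unsigned char keeps the low 8 bits */
        crc = crcTable[m] ^ (crc >> 8);   /* same crcTable as in the first version */
    }
    return( crc^0xFFFFFFFF );
}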

The code produced is:
.L11:
    movzbl (%ecx,%esi), %eax
    incl %ecx
    xorb %dl, %al
    movzbl %al, %eax
    shrl $8, %edx
    xorl crcTable(,%eax,4), %edx
    cmpl %ebx, %ecx
    jne .L11

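Do you see that utterly useless second movzbl? The first movzbl already zero-extends the loaded byte into %eax, and xorb only modifies %al, so the upper 24 bits of %eax are still zero when it is used as the table index. My final optimization was simply to remove the second movzbl and add "xorl %eax,%eax" before the loop (that corresponds to the "m = 0" that gcc had dropped). The newest version of gcc produces the same code as well.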
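
The post does not show the final inline asm, so the following is only a sketch of the same trick, not the author's code. It assumes GCC or Clang extended asm on x86 or x86-64 (the "q" byte-addressable register constraint and the %b byte-register modifier), falls back to plain C on other targets, and uses the hypothetical names get_crc32_asm and crc32_init_table. Since the compiler only sees a 32-bit value leaving the asm statement, it has no reason to zero-extend the low byte again before the table lookup.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crcTable[256];

/* Standard reflected CRC-32 table (polynomial 0xEDB88320). */
static void crc32_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crcTable[i] = c;
    }
}

static uint32_t get_crc32_asm(uint32_t crcinit, const unsigned char *p, size_t len)
{
    uint32_t crc = crcinit ^ 0xFFFFFFFFu;
    const unsigned char *e = p + len;

    for (; p < e; p++) {
        uint32_t idx;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
        /* movzbl loads the byte and clears bits 8..31 of idx;
         * xorb then changes only the low byte, so idx is already < 256
         * and no "& 0xFF" or second movzbl is needed. The early-clobber
         * "=&" keeps idx out of the registers holding crc and p. */
        __asm__("movzbl %[byte], %[idx]\n\t"
                "xorb %b[crc], %b[idx]"
                : [idx] "=&q"(idx)
                : [byte] "m"(*p), [crc] "q"(crc)
                : "cc");
#else
        /* Portable fallback for non-x86 targets. */
        idx = (uint8_t)(crc ^ *p);
#endif
        crc = crcTable[idx] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

With the table initialized, get_crc32_asm(0, (const unsigned char *)"123456789", 9) should agree with the plain C loop on the standard check value 0xCBF43926.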

I still want to look carefully into zlib one day and compare its high-level optimizations with mine. I will post about it eventually.