Saturday, August 8, 2009

LANG vs LC_MESSAGES in man(1) error messages

First, I'd like to say that I'm pretty happy with English as my system's language. But since I sometimes have to use Russian, I've set LANG=ru_RU.UTF-8. Unfortunately, man(1) error messages are encoded in KOI8-R, so they look like garbage in my locale. I don't particularly want them in Russian anyway, so I've set LC_MESSAGES=C. The man page locale(1P) says:

LC_MESSAGES
Determine the locale that should be used to affect the format
and contents of diagnostic messages written to standard error.
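
For reference, here is a minimal sketch of how POSIX picks the locale for messages from the environment: LC_ALL wins if it is set and non-empty, then LC_MESSAGES, then LANG. The function names effective_messages_locale and messages_locale_is_c are made up for this illustration and are not part of man(1) or libc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the locale name that governs messages,
 * following the POSIX precedence LC_ALL > LC_MESSAGES > LANG.
 * An empty value is treated the same as an unset variable. */
static const char *
effective_messages_locale(void) {
    static const char *const vars[] = { "LC_ALL", "LC_MESSAGES", "LANG" };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v && *v)
            return v;
    }
    return "C";   /* assumption: fall back to "C" when nothing is set */
}

/* Hypothetical helper: non-zero if messages should be untranslated. */
static int
messages_locale_is_c(void) {
    const char *l = effective_messages_locale();
    return strcmp(l, "C") == 0 || strcmp(l, "POSIX") == 0;
}
```

With LANG=ru_RU.UTF-8 and LC_MESSAGES=C (and LC_ALL unset), messages_locale_is_c() returns non-zero, so I'd expect plain English diagnostics.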

But it doesn't work for man(1). After a little investigation I learned that man(1) uses catgets(3) to get an error string. The catalog descriptor for catgets(3) comes from catopen(3), which (when $LC_MESSAGES=C) dutifully tries to open /usr/share/locale/C and some other directories for the C locale, and finally falls back to /usr/share/locale/ru. Only if man(1) gets an empty string from catgets(3) does it look the message up in its built-in table, which is what I need.
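
The lookup order described above can be sketched roughly like this. It is a simplified illustration, not the actual man(1) source: the table contents and the name lookup_message are invented, and the check for suspect catalog strings is left out.

```c
#include <nl_types.h>   /* nl_catd, catgets */

/* Hypothetical built-in table, indexed from 1 like man's msg[] array. */
static const char *const builtin_msg[] = {
    "",                          /* index 0 is unused */
    "sketch: first message\n",
    "sketch: second message\n",
};
#define MAXMSG 2

/* Ask the message catalog first; use the built-in table only when the
 * catalog is unavailable or returns an empty string. */
static const char *
lookup_message(nl_catd cat, int n) {
    const char *s = "";

    if (cat != (nl_catd) -1)
        s = catgets(cat, 1, n, "");
    if (*s == '\0' && 0 < n && n <= MAXMSG)
        s = builtin_msg[n];
    return s;
}
```

The point is that the built-in English text is reached only when the catalog lookup comes back empty. Once a Russian catalog has been opened, catgets(3) returns its string and the table is never consulted.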
For now I can't say which part of this complicated situation is buggy, so I've put a hack into man(1) that takes the error string straight from the built-in table if the locale is C or POSIX.
Here is my patch.

diff -rNu a/man-1.6f/src/gripes.c b/man-1.6f/src/gripes.c
--- a/man-1.6f/src/gripes.c 2006-11-21 22:53:44.000000000 +0300
+++ b/man-1.6f/src/gripes.c 2009-08-11 10:36:08.000000000 +0400
@@ -99,15 +99,22 @@
 static char *
 getmsg (int n) {
     char *s = "";
-
-    catinit ();
-    if (catfd != (nl_catd) -1) {
-        s = catgets(catfd, 1, n, "");
-        if (*s && is_suspect(s))
-            s = "";
-    }
-    if (*s == 0 && 0 < n && n <= MAXMSG)
-        s = msg[n];
+    char *lm;
+
+    lm = getenv("LC_MESSAGES");
+    if (lm && (!strcmp(lm, "C") || !strcmp(lm, "POSIX"))) {
+        if (0 < n && n <= MAXMSG)
+            s = msg[n];
+    } else {
+        catinit ();
+        if (catfd != (nl_catd) -1) {
+            s = catgets(catfd, 1, n, "");
+            if (*s && is_suspect(s))
+                s = "";
+        }
+        if (*s == 0 && 0 < n && n <= MAXMSG)
+            s = msg[n];
+    }
     if (*s == 0) {
         fprintf(stderr,
             "man: internal error - cannot find message %d\n", n);

2 comments:

amonakov said...

> But since I have to use russian sometimes I've set LANG=ru_RU.UTF-8

Why's that? With UTF-8 you can use Russian regardless of language/country, can't you? That's why I always set LANG=en_US.utf8.

mospehraict said...

Yep, I also use en_US.UTF-8, and still use Russian and Japanese w/o any problems.

In fact, setting the locale to something other than en_US always brings a lot of trouble :)