Re: [2nd Oops] 2.4.17, looks ext3 related

Andrew Morton (akpm@zip.com.au)
Sun, 06 Jan 2002 16:58:38 -0800


Stefan Frank wrote:
>
> Some more oopses occured this evening after a strange behaviour:
>
> A few hours ago i noticed that the system load was stable around 1, but
> according to top no process was responsible. I just ignored it but
> then vim got stuck when quitting it inside a xterm. What followed were 3
> oopses of different processes (see ksymoops decoding below).
>
> Somewhere between the 2nd and the 3rd oops (i'm NOT exactly shure when) the
> system load went up to around 7. Again no, process seemed responsible.

The system load goes up by one for each process which is stuck on
a semaphore. You have lots of processes stuck on a semaphore because
a process took an oops with a semaphore held, and killed itself
without releasing the semaphore.

> Jan 6 19:58:46 asterix kernel: printing eip:
> Jan 6 19:58:46 asterix kernel: c012ea98
> Jan 6 19:58:46 asterix kernel: Oops: 0000
> Jan 6 19:58:46 asterix kernel: CPU: 0
> Jan 6 19:58:46 asterix kernel: EIP: 0010:[get_hash_table+104/144]

We appear to not have the faulting address. And because klogd attempted
to interpret the oops, we don't have the instruction decode which would
allow us to work out the faulting address.

Please. edit /etc/rc.d/init.f/syslog (or equivalent) and ensure that
klogd is launched with the `-x' option:

daemon klogd -x $KLOGD_OPTIONS

Once this is done, your logs will be ksymoops-friendly.

So. Why did it oops? Don't know. The buffercache hashtable has
become corrupted. Could be bad memory, could be some part of the
kernel (any part) scribbled on memory. And the latter is one of
the very hardest bugs to find.

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/