Re: What's left over.

Michael Shuey (shuey@purdue.edu)
Thu, 31 Oct 2002 14:42:12 -0500

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Gabor MICSKO: "kernel.org is down?"
Previous message: Castor Fu: "Re: [lkcd-devel] Re: What's left over."
In reply to: Alan Cox: "Re: What's left over."
Next in thread: Pavel Machek: "Re: What's left over."

On Thu, Oct 31, 2002 at 07:04:31PM +0000, Alan Cox wrote:
> On Thu, 2002-10-31 at 17:13, Michael Shuey wrote:
> > I'm a user, and I request that LKCD get merged into the kernel. :-)
> > Do you feel like donating a 700-port console server? Right, so it's LKCD
> > for me then.
>
> Wouldn't you rather they neatly tftp'd dumps to a nominated central
> server which noticed the arrival, did the initial processing with a perl
> script and mailed you a summary ?

Generally speaking, no.

A tftp server doesn't provide enough security (specifically authentication).
It would need to be accessible from clusters in multiple buildings and on
multiple networks (some of which must be public).

I've seen more network adapter issues than drive controller issues. In
particular, some vendors (Compaq, listen up) can't implement an eepro100 to
save their asses, especially on older hardware.

from being reliably delivered.

Right now we use the presence of a local dump to indicate that a machine
should not join the PBS pool (and begin to run more jobs) on a reboot. I'd
rather not have the nodes check a central server to see if it's okay to run
jobs. And no, I don't want machines to stay down after a crash - many nodes
are in distant corners of campus and it's cold outside. :-) If I can fix the
problem through software I'd prefer that the problematic host be up, rather
than having to walk over to it just to hit reset and load a new kernel.

That said, it would be really nice if LKCD would log dumps to both the swap
device and to a remote server. That way if the machine crashed because of
disk failure I'd still have an uncorrupted dump image (and could then notice
all the little errors coming back out of the swap device). A tool to
automatically analyze a dump and email back summaries would be much more
useful, though. If someone were to write such a widget, that'd be swell. :-)

Right now I'm less concerned with getting dumps to exactly the right place
and a bit more concerned with getting dumps in the main kernel at all.

-- 
Mike Shuey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Next message: Gabor MICSKO: "kernel.org is down?"
Previous message: Castor Fu: "Re: [lkcd-devel] Re: What's left over."
In reply to: Alan Cox: "Re: What's left over."
Next in thread: Pavel Machek: "Re: What's left over."