Generally speaking, no.
A TFTP server doesn't provide enough security (specifically, authentication).
It would need to be accessible from clusters in multiple buildings and on
multiple networks (some of which must be public).
I've seen more network adapter issues than drive controller issues. In
particular, some vendors (Compaq, listen up) can't implement an eepro100 to
save their asses, especially on older hardware.
from being reliably delivered.
Right now we use the presence of a local dump to indicate that a machine
should not join the PBS pool (and begin to run more jobs) on a reboot. I'd
rather not have the nodes check a central server to see if it's okay to run
jobs. And no, I don't want machines to stay down after a crash - many nodes
are in distant corners of campus and it's cold outside. :-) If I can fix the
problem through software I'd prefer that the problematic host be up, rather
than having to walk over to it just to hit reset and load a new kernel.
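For illustration, the boot-time check described above could look roughly like the sketch below. This is not our actual script. The directory path, the function names, and the rule that any entry in the directory counts as a dump are all assumptions made for the example. How the node is then kept out of the PBS pool is left to the init script that calls it.

```python
import os
import sys

# Assumption: crash dumps are saved under this directory. Adjust to match
# wherever your dump tooling actually writes them.
DUMP_DIR = "/var/log/dump"


def has_local_dump(dump_dir=DUMP_DIR):
    """Return True if dump_dir exists and contains at least one entry."""
    try:
        with os.scandir(dump_dir) as entries:
            return any(True for _ in entries)
    except (FileNotFoundError, NotADirectoryError):
        # No dump directory means no dump was saved.
        return False


def should_join_pool(dump_dir=DUMP_DIR):
    """A node joins the batch pool only when no local dump is present."""
    return not has_local_dump(dump_dir)


if __name__ == "__main__":
    # Exit status 0: safe to start the batch daemon.
    # Exit status 1: a dump is present, so stay up but out of the pool.
    sys.exit(0 if should_join_pool() else 1)
```

An init script could run this before starting the batch daemon and skip the start when the exit status is non-zero. The node stays up and reachable either way, and the decision needs no central server.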
That said, it would be really nice if LKCD would log dumps to both the swap
device and to a remote server. That way if the machine crashed because of
disk failure I'd still have an uncorrupted dump image (and could then notice
all the little errors coming back out of the swap device). A tool to
automatically analyze a dump and email back summaries would be much more
useful, though. If someone were to write such a widget, that'd be swell. :-)
Right now I'm less concerned with getting dumps to exactly the right place
and a bit more concerned with getting dumps in the main kernel at all.
-- Mike Shuey