Re: What's left over.

Linus Torvalds (torvalds@transmeta.com)
Fri, 1 Nov 2002 11:18:35 -0800 (PST)


On Fri, 1 Nov 2002, Joel Becker wrote:
>
> I always liked the AIX dumper choices. You could either dump to
> the swap area (and startup detects the dump and moves it to the
> filesystem before swapon) or provide a dedicated dump partition. The
> latter was prefered.
> Either of these methods merely require the dumper to correctly
> write to one disk partition. This is about as simple as you are going
> to get in disk dumping.

Ehh.. That was on closed hardware that was largely designed with and for
the OS.

Alan isn't worried about the "which sector do I write" kind of thing.
That's the trivial part. Alan is worried about the fact that once you know
which sector to write, actually _doing_ so is a really hard thing. You
have bounce buffers, you have exceedingly complex drivers that work
differently in PIO and DMA modes and are more likely than not the _cause_
of a number of problems etc.

And you have a situation where interrupts are not likely to work well
(because you crashed with various locks held), so the regular driver
simply isn't likely to work all that well.

And you have a situation where there are hundreds of different kinds of
device drivers for the disk.

In other words, the AIX situation isn't even _remotely_ comparable. A
large portion of the complexity in the PC stability space is in device
drivers. It's the thing I worry most about for 2.6.x stabilization, by
_far_.

And if you get these things wrong, you're quite likely to stomp on your
disk. Hard. You may be tryign to write the swap partition, but if the
driver gets confused, you just overwrote all your important data. At which
point it doesn't matter if your filesystem is journaling or not, since you
just potentially overwrote it.

In other words: it's a huge risk to play with the disk when the system is
already known to be unstable. The disk drivers tend to be one of the main
issues even when everything else is _stable_, for chrissake!

To add insult to injury, you will not be able to actually _test_ any of
the real error paths in real life. Sure, you will be able to test forced
dumps on _your_ hardware, but while that is fine in the AIX model ("we
control the hardware, and charge the user five times what it is worth"),
again that doesn't mean _squat_ in the PC hardware space.

See?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/