Re: 2.4.21 IDE and IEEE1394+SBP2 regressions, orinoco_pci progress

Zygo Blaxell (uixjjji1@umail.furryterror.org)
Wed, 09 Jul 2003 10:31:53 -0400


On Wed, 09 Jul 2003 05:34:34 -0400, Benjamin Herrenschmidt wrote:

> Whatever happens, you shouldn't let the machine suspend while ongoing
> disk IOs are in progress. A lot of bad things could result from that.

This is unavoidable without some really draconian measures that would be
highly visible to user-space (killing or stopping processes, unmounting
filesystems...). There will always be a risk that something in user-space
wants to read or write data at the exact moment of suspend; the apmd
process itself comes to mind.
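
To make the race concrete, here is a toy sketch (user-space C, entirely
hypothetical; not taken from apmd or any real event script) of the best a
pre-suspend /etc/apm/event.d helper can do: flush dirty data and hope.
Nothing stops another process from issuing a fresh write between the
sync() and the moment the firmware actually cuts power.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Flush dirty buffers.  sync() only covers writes issued
	 * before this call; it promises nothing about writes that
	 * start afterwards. */
	sync();

	/* <-- race window: any process, including apmd itself, can
	 *     begin a new read or write right here, after the flush
	 *     but before the suspend is actually triggered. */

	printf("flushed; exiting so apmd can proceed with suspend\n");
	return 0;
}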

Bad things haven't happened to me in the 1100+ successful suspend/resume
cycles on this laptop prior to the installation of 2.4.21. On 2.4.18-20 I
have typical laptop uptimes of 60+ days, including about 80 suspend/resume
cycles. At conferences I may do 50 suspend/resume cycles within a
single week, and then continue running without a reboot for two months
afterward. Most of those suspends occur under some kind of I/O load,
often quite a heavy one. It is very rare that I can choose the moment
when I have to suspend the machine: generally I have to go somewhere
in a hurry, and the laptop runs most of the /etc/apm/event.d scripts
from inside its carrying case. Up to and including 2.4.20, I could
simply expect this to work.

Every month I do a full md5sum-based check of the 800,000-or-so files on
the 48G internal disk and review the files that have changed, so that I
can be absolutely _certain_ bad things are not happening. (The check is
designed to catch malicious software rather than hardware problems, but
it detects both.)

By contrast, 2.4.21 crashes on most resumes. That's just broken.

> Actually, the proper fix is to implement some working suspend/resume
> handlers in the IDE layer like we did in 2.5, though the problem here is
> that 2.4 lacks proper infrastructure for doing that in a properly
> ordered way.

Um, yes. That's all very nice, but kernels from 2.4.10 through 2.4.20
were capable of handling this sort of thing all by themselves. They did
so by cleaning up after the fact rather than by preventing the mess from
appearing in the first place, but whatever the mechanism was, it did
work in every kernel prior to 2.4.21.
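
For what it's worth, the "properly ordered way" matters because suspend
has to walk the device tree child-first (quiesce the disk before the
controller behind it) while resume has to walk it parent-first. A toy
user-space illustration of just that ordering constraint follows; none
of these names are real kernel symbols, and this is not the 2.5
driver-model API, only the shape of it.

#include <stdio.h>

/* Toy model of ordered suspend/resume.  Nothing here is a real
 * kernel symbol; it only shows the ordering constraint: suspend
 * children before parents, resume parents before children. */

struct dev {
	const char *name;
};

#define MAX_DEVS 8
static struct dev *devs[MAX_DEVS];	/* registered parent-first */
static int ndevs;

static void register_dev(struct dev *d)
{
	devs[ndevs++] = d;
}

static void suspend_all(void)
{
	int i;

	/* child-first: quiesce the disk before its controller */
	for (i = ndevs - 1; i >= 0; i--)
		printf("suspend %s\n", devs[i]->name);
}

static void resume_all(void)
{
	int i;

	/* parent-first: the controller must be awake before the disk */
	for (i = 0; i < ndevs; i++)
		printf("resume  %s\n", devs[i]->name);
}

int main(void)
{
	struct dev bridge = { "pci-bridge" };
	struct dev ctrl = { "ide-controller" };
	struct dev disk = { "hda" };

	register_dev(&bridge);
	register_dev(&ctrl);
	register_dev(&disk);

	suspend_all();
	resume_all();
	return 0;
}

2.4 has no global device tree to walk, which is presumably why the
pre-2.4.21 behaviour amounted to recovering after the fact rather than
being told about the suspend cleanly beforehand.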

If that robustness is due to a bug, then that bug should be reproduced
exactly until a proper solution is ready to replace it.