Re: Question about SysRq

Boris Pisarcik (boris@acheron.sk)
Wed, 4 Apr 2001 02:39:11 +0200


Unfortunately,
> the only one that responds is sysrq-b, which boots the box without
> syncing or unmounting the disks. Not only does that piss me off but it's
> led to some fs corruption as well (which pisses me off even more). sysrq-b
> is the *only* combination I can get working when this happens.

I looked a bit at the source of sysrq handling. I've found there is
rather big difference between sysrq+b and other killers handling.
Sysrq+b is just called with pretty straitforward fashion - stops other
processors on SMP and reboots directly (hardware impulse or triple fault)
or through the bios - so it just calls for the corruptions.

On the other side sysrq+s marks a single variable, which will be tested
next cycle in the bdflush kernel threads' main loop, and adds bdflush to
scheduler runqueue list. So it gets possibility to check for emergency
sync onle when gets next scheduled. Does it ?

Can you anyhow find something in your logs/console/serial console messages
like 13.13.2000 kernel : Sysrq: Emergency Sync (this should be present - is
written within keyboard handler, not after shedule) and what's next logs ?
We could determine, if the bdflush thread got scheduled and called emergency
syncing routine indeed.

As you wrote no of your processes does respond - probably telnet will
not help. You may try to write experimental programme, that only log
say current time every n seconds, and see, if it just stopped to
log messages after lockup-time. If not - it doesn't get scheduled.
If continues - there's problem with syncing. Again - try, as far
as i understand, log kernel messages to serial console or alike, because
the messages should not get written to logfiles - syslogd can't be woken up
eg.

Quick help against those corruptions, which comes on my mind, is use
the reiserfs. I have no real experiences with that and its reliability,
also as aj followed some of messages in this list about resierfs - it has
some problems too - but in definition it shoudn't get corrupted by not-
syncing reboot. But i see this not much helpfull ,cause if you really
would depend on big reliability, you wouldn't intall 2.3.x-pre kernel :)

There go also occasionally discussions about watchdogs - it may be
helpfull - but none of the two really solve the problem.

LW: today a got ugly lockup with dosemu and experimental execution of
virtual pool ;). Neither Sysrq+b functioned. But that's probably another
story. Root or privilege suid processes (X server among them) need really
just a 1-bit error to corrupt near what they like.

The least fsck sessions and nice day B.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/