When you need this: If you have a system running a kernel with a buggy
raid1 driver and you can't easily switch to a non-buggy kernel. Vanilla
2.4.9 is buggy. Red hat 2.4.9 is not. I do not know which other kernels
are/are not buggy. If you get hit with the bug, the symptom will be that
one or more programs is "frozen", can't be killed, and never does
anything. If you check the "wchan" parameter of "top" or "ps", it is
stuck in "raid1_alloc_r1bh". This will happen very rarely - only if a
process is doing a disk write at the exact moment that the system is
temporarily out of physical memory. If your system is under heavy load
it will happen much more often.
After you apply the bug, once per minute this module will wake up, look
for stuck raid devices, and "unstick" them if necessary. So at most your
processes will freeze for 1 minute. You can also make it check more
often if you want.
Not sure if anybody but me will need it, but just in case - here it is!
#define __KERNEL__ 1
#define MODULE 1
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kdev_t.h>
#include <linux/raid/md.h>
#include <linux/raid/md_k.h>
#include <linux/raid/raid1.h>
static struct timer_list timer;
#define CHECK_INTERVAL (60 * HZ) /* Once per minute. */
#define NUM_RAIDS 3 /* # of raid1 partitions. Make sure this is correct! */
static void doFix(unsigned long x) {
int i;
raid1_conf_t *conf;
for (i = 0; i < NUM_RAIDS; ++i) {
conf = mddev_to_conf(kdev_to_mddev(MKDEV(9, i)));
if (conf->freer1_blocked == 1) {
printk(KERN_WARNING "fixRaid: MD system %d is stuck. Fixing.\n", i);
conf->freer1_blocked = 0;
wake_up(&conf->wait_buffer);
}
}
timer.expires = jiffies + CHECK_INTERVAL;
add_timer(&timer);
}
int fixRaid_init(void) {
init_timer(&timer);
timer.expires = jiffies + CHECK_INTERVAL;
timer.data = 0;
timer.function = &doFix;
add_timer(&timer);
return(0);
}
void fixRaid_cleanup(void) {
del_timer(&timer);
}
module_init(fixRaid_init);
module_exit(fixRaid_cleanup);
--Bill Shubert (wms@igoweb.org) <mailto:wms@igoweb.org> http://www.igoweb.org/~wms/ <http://igoweb.org/%7Ewms/>
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/