Re: ext3-2.4-0.9.0

Neil Brown (neilb@cse.unsw.edu.au)
Sun, 8 Jul 2001 08:15:17 +1000 (EST)


On Saturday July 7, andrewm@uow.edu.au wrote:
> An update of the ext3 journalling filesystem for 2.4 kernels
> is available at
>
> http://www.uow.edu.au/~andrewm/linux/ext3/
>
> Patches are against 2.4.6-ac1 and 2.4.6.

I thought it was time to try out ext3 between nfsd and raid5, so I
built 2.4.6 plus this patch, and an ext3 filesystem on a largish
raid5 volume, exported it (with the "sync" flag), mounted it from
another machine with NFSv2, and ran "dbench 4".

This produces a live-lock (I think that is the right term).
Throughput would drop to zero (determined by watching the counts in
/proc/nfs/rpc/nfsd), but could be coaxed along by generating other
filesystem activity.

I tried nfs over ext3 on a plain ide disc and it worked fine.
I tried dbench directly on ext3/raid5 and it worked fine.
I tried dbench/nfs/ext2/raid5 and it worked fine.

So I think it is some interaction between ext3fs and raid5 triggered
by the high rate of "fsync" calls made by nfsd. Naturally I blame
ext3 because I know more about raid5 and nfsd :-)

One particular aspect of raid5 that *could* be related is that it is
very reluctant to schedule write requests. It tries to hang on to them
as long as possible in the hope of getting more write requests in the
same stripe. My guess as to what is happening is that a write
request is submitted and then waited-for without an intervening
run_task_queue(&tq_disk).
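
To make that guess concrete, here is a toy user-space model in C. It is
only a sketch of the suspected pattern, not kernel code and not what
raid5 or ext3 actually do. The names toy_queue, toy_submit_write,
toy_unplug and toy_wait are made up for illustration; toy_unplug merely
stands in for run_task_queue(&tq_disk), and the bounded polling loop
stands in for a waiter that would otherwise sleep forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, none exist in the kernel. */
struct toy_queue {
    int pending;    /* writes held back, hoping for more in the same stripe */
    int completed;  /* writes that have actually reached the "disk" */
};

/* Queue a write but do not start it: the device stays "plugged". */
static void toy_submit_write(struct toy_queue *q)
{
    q->pending++;
}

/* Stand-in for run_task_queue(&tq_disk): push out everything held back. */
static void toy_unplug(struct toy_queue *q)
{
    q->completed += q->pending;
    q->pending = 0;
}

/*
 * Wait until at least want_completed writes have finished.
 * Polls for at most max_ticks iterations so the model terminates;
 * returns true if the writes completed, false if the waiter stalled.
 */
static bool toy_wait(struct toy_queue *q, int want_completed,
                     bool unplug_first, int max_ticks)
{
    assert(max_ticks > 0);

    if (unplug_first)
        toy_unplug(q);

    for (int tick = 0; tick < max_ticks; tick++) {
        if (q->completed >= want_completed)
            return true;
        /* Nothing else in this model ever unplugs the queue. */
    }
    return false;
}
```

In this model, toy_submit_write(&q) followed by
toy_wait(&q, 1, false, 1000) returns false, because the write is never
pushed out. The same wait with unplug_first set to true returns true.
That would also fit the stall being coaxed along by other filesystem
activity, if that activity happens to run the disk task queue.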

When the system is livelocked, all I can tell at the moment (I am at
home and the console is at work so I cannot use alt-sysrq) is that
kjournald is waiting in wait_on_buffer and an nfsd thread is waiting on
the journal.

I will try to explore it more deeply next time I am at work, but if
there are any suggestions as to what it might be, or how I might more
easily find out what is going on, I am all ears.

NeilBrown