Re: Disk hardware caching, performance, and journalling

Andrew Morton (akpm@zip.com.au)
Sat, 24 Nov 2001 11:08:17 -0800


Steve Bergman wrote:
>
> Note that block writes are over 3 times faster with caching on.

With a large linear write, Linux typically feeds requests into the
disk like this:

write 248 sectors
write 248 sectors
...
write 248 sectors
write 8 sectors
write 248 sectors
...

Now, 248+8 sectors is 128 kbytes (at 512 bytes per sector). A track is, say, 300 kbytes.

With writebehind the disk can write that entire track in pretty
much a single spin. But if we're waiting on the result of each
request we'll lose revolutions. In synchronous mode it's going
to take three or four spins to write a track.
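
The arithmetic above can be checked with a rough sketch. It is a toy
model, not a description of any real drive: it assumes 512-byte sectors
and assumes that each synchronous request costs one full revolution,
because by the time the next request arrives the head has already passed
its starting sector. The names `request_kbytes` and `sync_spins` are
made up for this illustration.

```python
from itertools import cycle

SECTOR_BYTES = 512                            # assumption: not stated in the mail
TRACK_SECTORS = 300 * 1024 // SECTOR_BYTES    # "say, 300 kbytes" -> 600 sectors


def request_kbytes(sectors):
    """Size of a request in kbytes, given its length in sectors."""
    return sectors * SECTOR_BYTES / 1024


def sync_spins(pattern, track_sectors=TRACK_SECTORS):
    """Revolutions needed to cover one track in synchronous mode.

    Toy model: every request in the repeating pattern costs one
    revolution, so the answer is simply the number of requests
    needed to cover the track.
    """
    covered = 0
    spins = 0
    for sectors in cycle(pattern):
        if covered >= track_sectors:
            return spins
        covered += sectors
        spins += 1


print(request_kbytes(248 + 8))          # 128.0
print(sync_spins([248]))                # 3
print(sync_spins([248, 8, 248, 248]))   # 4
```

Under those assumptions a track takes three revolutions when it is
covered by 248-sector requests alone, and four when one of the short
8-sector requests falls inside it. With writebehind the same model
would charge roughly one revolution for the whole track.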

> So what are the implications here for journalling? Do I have to turn
> off caching and suffer a huge performance hit?

In theory, yes. In my opinion, no. For ext3, at least. Caching
isn't bad per se. It's reordering which can break the journalling
constraints. But given that the journal is, we hope, a strictly
ascending and (we really hope) contiguous chunk of blocks, it's
quite unlikely that the disk will decide to write them in an
unexpected order. This is especially true if the journal was
created when the disk was relatively unfragmented.

And if the disk _does_ write them in the wrong order, it has
to be specifically the journal commit block which was written
prior to some data blocks. And you need to lose power (not
just crash) prior to the data blocks hitting disk. It's a
very small time window containing an improbable occurrence.
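
The failure condition described above can be written down as a small
predicate. This is only a sketch of the reasoning, not ext3's recovery
code: the function name, the block numbers and the idea of passing in
the set of blocks that reached the platter before power was lost are all
assumptions made for illustration.

```python
def reordering_breaks_recovery(logged_blocks, commit_block, persisted):
    """True if power loss left the journal in the dangerous state.

    logged_blocks: journal blocks written ahead of the commit block
    commit_block:  the block that marks the transaction as committed
    persisted:     blocks that actually reached the platter before
                   power was lost (anything else was only in the
                   disk's write cache)
    """
    persisted = set(persisted)
    if commit_block not in persisted:
        # No commit block on disk: recovery treats the transaction
        # as incomplete and ignores it, so nothing is corrupted.
        return False
    # Commit block is on disk. Recovery will replay the transaction,
    # which is only safe if every block it covers is also on disk.
    return any(block not in persisted for block in logged_blocks)


logged = [100, 101, 102]
commit = 103

# In-order write, power lost before the commit block: safe.
print(reordering_breaks_recovery(logged, commit, {100, 101}))        # False
# Everything made it to disk: safe.
print(reordering_breaks_recovery(logged, commit, {100, 101, 102, 103}))  # False
# Disk wrote the commit block ahead of 101 and 102, then lost power.
print(reordering_breaks_recovery(logged, commit, {100, 103}))        # True
```

Only the last case is a problem, and it needs both the reordering and
the power loss to land inside the same window.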

Now that's all just vigorous handwaving, and may be wrong,
and yes, we really need a way of propagating barriers down
to the request queue. But I've not seen a whisker of a report
which indicates that write reordering has caused on-recovery
corruption.
