I don't know if it is a scalability issue, frankly. It will be for
buffered writes, but for writes which wait on IO, the mechanics of
the media probably make the benefits small. Conceivably there are
some additional merging opportunities, but it's thin.
We can do the rwsem thing, and that would be good. But there may
be filesystems which are relying on i_sem to provide protection
against concurrent invokations of get_block(create=1), inside i_size.
> [ The "nonextending writes" case is somewhat interesting, a write probably
> needs to actually take the semaphore for writing, and then downgrading
> it to reading after it has checked that it doesn't end up extending the
> file.
>
> What makes this even more interesting is that depending on the semaphore
> implementation you can actually split up the "take write lock" into
> "prepare to take write lock" and "turn it into a read lock" or "confirm
> write lock", where the "prepare to take write lock" allows existing
> readers but not new write-lockers, so that if you downgrade to a read
> lock you never had to synchronize with anybody else who was already
> reading. ]
>
> I'd much rather do this _right_ than have some ugly blockdev-only hack,
> since the problem certainly would happen with files too. A lot of people
> want to do databases on a filesystem, just because it is so much easier to
> administer.
OK. It'd be nice to get some benchmarks first (say, between O_DIRECT-to-blockdev
and the raw driver) to see if it's worth bothering with.
-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/