This should probably be done unconditionally or not at all.
We've worked very hard on making block devices more "normal" in 2.5.x, and
I don't want to start diverging again.
If this is really a scalability issue, I would suggest that people who
care look into just getting rid of "i_sem", and replacing it with a
read-write semaphore that explicitly protects only "i_size". Then you make
reads and non-extending writes take that semaphore for reading, and
extending writes and truncates taking it for writing.
[ The "nonextending writes" case is somewhat interesting, a write probably
needs to actually take the semaphore for writing, and then downgrading
it to reading after it has checked that it doesn't end up extending the
file.
What makes this even more interesting is that depending on the semaphore
implementation you can actually split up the "take write lock" into
"prepare to take write lock" and "turn it into a read lock" or "confirm
write lock", where the "prepare to take write lock" allows existing
readers but not new write-lockers, so that if you downgrade to a read
lock you never had to synchronize with anybody else who was already
reading. ]
I'd much rather do this _right_ than have some ugly blockdev-only hack,
since the problem certainly would happen with files too. A lot of people
want to do databases on a filesystem, just because it is so much easier to
administer.
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/