OK. The main gain here comes from avoiding the high context-switch rate
which lock_super() can cause on big machines.
> - if (!ext2_clear_bit(bit + i, bitmap_bh->b_data))
> + if (!test_and_clear_bit(bit + i, (void *) bitmap_bh->b_data))
Nope.
This is an on-disk bitmap. ext2_clear_bit() is endian-neutral - see the
ppc/ppc64/mips/etc implementations. The code you have here will not work on
big-endian architectures.
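To see why, here is a small userspace model (not kernel code) of where bit nr lands in memory. The ext2 on-disk bitmap is little-endian and byte-addressed, so bit nr is bit nr % 8 of byte nr / 8. The native bitops work on unsigned longs, so on a big-endian machine they touch a different byte. The helper names and the long widths used are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Byte that the little-endian on-disk ext2 bitmap uses for bit nr. */
static size_t ext2_bitmap_byte(size_t nr)
{
	return nr / 8;
}

/*
 * Byte that a native long-based bitop (set_bit() and friends) touches
 * for bit nr on a big-endian machine whose longs are word_bytes wide.
 * Bit nr lives in long nr / (8 * word_bytes); within that long the
 * least significant byte is stored at the highest address.
 */
static size_t native_be_bitop_byte(size_t nr, size_t word_bytes)
{
	size_t word_bits, word, bit_in_word;

	assert(word_bytes > 0);
	word_bits = 8 * word_bytes;
	word = nr / word_bits;
	bit_in_word = nr % word_bits;
	return word * word_bytes + (word_bytes - 1 - bit_in_word / 8);
}
```

With 8-byte longs, bit 0 is in byte 0 on disk but the native bitop touches byte 7, so test_and_clear_bit() would clear the bit belonging to a different block. On little-endian machines the two layouts coincide, which is why a native bitop is fine there.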
We either need to create per-architecture atomic implementations of
ext2_foo_bit(), or use the existing ones under spinlock.
You could do:
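int bzzz_set_bit(struct ext2_bg_info *bgi, void *addr, int bit)
{
#ifdef __BIG_ENDIAN
	int ret;

	/* ext2_set_bit() is endian-neutral but not atomic: serialise it */
	spin_lock(&bgi->s_alloc_lock);
	ret = ext2_set_bit(bit, addr);
	spin_unlock(&bgi->s_alloc_lock);
	return ret;
#else
	/* on little-endian the native bit order matches the on-disk one */
	return test_and_set_bit(bit, addr);
#endif
}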
I think that will work...
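A userspace model of the locked (big-endian) branch, assuming C11 atomics: s_alloc_lock is modelled as a test-and-set spinlock on an atomic_int, and le_test_and_set_bit() is a hypothetical byte-wise, non-atomic stand-in for ext2_set_bit(). The lock is what makes the read-modify-write safe against a concurrent allocator in the same group.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct ext2_bg_info: only the lock here. */
struct bg_info {
	atomic_int alloc_lock;	/* 0 = unlocked, 1 = locked */
};

static void bg_lock(struct bg_info *bgi)
{
	/* Spin until we are the one who swapped 0 -> 1. */
	while (atomic_exchange_explicit(&bgi->alloc_lock, 1,
					memory_order_acquire))
		;
}

static void bg_unlock(struct bg_info *bgi)
{
	atomic_store_explicit(&bgi->alloc_lock, 0, memory_order_release);
}

/*
 * Endian-neutral but non-atomic test-and-set, like ext2_set_bit():
 * bit nr is bit nr % 8 of byte nr / 8. Returns the old bit value.
 */
static int le_test_and_set_bit(int nr, void *addr)
{
	unsigned char *p;
	unsigned char mask;
	int old;

	assert(nr >= 0);
	p = (unsigned char *)addr + nr / 8;
	mask = (unsigned char)(1u << (nr % 8));
	old = (*p & mask) != 0;
	*p |= mask;
	return old;
}

/* Model of the __BIG_ENDIAN branch of bzzz_set_bit(). */
static int bg_set_bit(struct bg_info *bgi, void *addr, int bit)
{
	int ret;

	bg_lock(bgi);
	ret = le_test_and_set_bit(bit, addr);
	bg_unlock(bgi);
	return ret;
}
```

On little-endian machines this whole function collapses to a single atomic native bitop, as in the #else branch.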
> @@ -45,6 +45,7 @@
> u32 s_next_generation;
> unsigned long s_dir_count;
> u8 *s_debts;
> + spinlock_t s_alloc_lock;
> };
You can do better than this. A spinlock per blockgroup will scale better,
and is pretty easy.
See that s_debts thing? That points to an array of bytes, one per
blockgroup. Turn it into:
struct ext2_bg_info {
	u8 s_debt;
	spinlock_t s_alloc_lock;
};
And the locking can become per-blockgroup.
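A sketch of the per-blockgroup layout in userspace C11, with hypothetical helper names: the single u8 *s_debts array becomes an array of struct ext2_bg_info, one per group, each carrying its own debt byte and lock, so two allocators only contend when they hit the same group. The lock is again modelled as an atomic_int rather than a kernel spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace model of the proposed per-group info. */
struct ext2_bg_info {
	uint8_t s_debt;
	atomic_int s_alloc_lock;	/* 0 = unlocked, 1 = locked */
};

/* Hypothetical mount-time setup replacing the allocation of s_debts. */
static struct ext2_bg_info *alloc_bg_info(unsigned long groups_count)
{
	struct ext2_bg_info *bgi = calloc(groups_count, sizeof(*bgi));
	unsigned long i;

	if (!bgi)
		return NULL;
	for (i = 0; i < groups_count; i++) {
		bgi[i].s_debt = 0;
		atomic_init(&bgi[i].s_alloc_lock, 0);
	}
	return bgi;
}

/* Take only the lock of the group we are allocating from. */
static void lock_group(struct ext2_bg_info *bgi, unsigned long group)
{
	while (atomic_exchange_explicit(&bgi[group].s_alloc_lock, 1,
					memory_order_acquire))
		;
}

static void unlock_group(struct ext2_bg_info *bgi, unsigned long group)
{
	/* Releasing a lock we do not hold would be a bug. */
	assert(atomic_load(&bgi[group].s_alloc_lock) == 1);
	atomic_store_explicit(&bgi[group].s_alloc_lock, 0,
			      memory_order_release);
}
```

In the kernel the equivalent would be allocating the array at mount time and calling spin_lock_init() on each element's s_alloc_lock.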
The problem with this is the fs-wide s_free_blocks_count thing. It needs
global locking. But do we need it?
If you look, you'll see that it's not really used for much. When we report
the free block count to userspace we can just locklessly zoom across all the
blockgroups, adding them up. You'll have to do the same in
find_group_orlov(), which is a bit sucky, but that's only used by mkdir.
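Summing the per-group counters without any lock could look like this (userspace C11 model; the bg_free_blocks array and the function name are assumptions). Relaxed loads make each individual read tear-free, but the total is only a snapshot: a concurrent allocation may or may not be counted, which is fine for reporting to userspace.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Add up per-group free block counts with no global lock. Each load is
 * atomic, but groups can change while we walk, so the result is only
 * approximately current.
 */
static unsigned long count_free_blocks(atomic_ulong *bg_free_blocks,
				       size_t groups_count)
{
	unsigned long total = 0;
	size_t i;

	assert(bg_free_blocks != NULL || groups_count == 0);
	for (i = 0; i < groups_count; i++)
		total += atomic_load_explicit(&bg_free_blocks[i],
					      memory_order_relaxed);
	return total;
}
```

find_group_orlov() would do the same walk, paying O(number of groups) per mkdir instead of reading one global counter.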
The only thing left which needs the global free blocks counter is the
"reserved blocks for root" thing, which doesn't work very well anyway. A way
to fix that would be to add a "reserved for root" field to ext2_bg_info, and
to precalculate these at mount time.
So the mount code would walk across the blockgroups, reserving blocks in each
one until it has reserved the required number of blocks. This way the
for-root reservation becomes per-blockgroup. It should only be dipped into if
all blockgroups are otherwise full.
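One possible reading of that scheme, as a userspace sketch with hypothetical names: the mount-time walk fills each group's reservation greedily in group order (spreading it evenly across groups would be an equally valid interpretation), and group selection only lets root dip into reservations after every group is full apart from its reserved blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-group counters: free blocks and blocks held for root. */
struct bg_counts {
	unsigned long free_blocks;
	unsigned long reserved_for_root;
};

/*
 * Mount-time walk: reserve blocks group by group until the requested
 * total is reached. Returns how many blocks could actually be reserved.
 */
static unsigned long reserve_for_root(struct bg_counts *groups, size_t n,
				      unsigned long wanted)
{
	unsigned long left = wanted;
	size_t g;

	for (g = 0; g < n; g++)
		groups[g].reserved_for_root = 0;
	for (g = 0; g < n && left > 0; g++) {
		unsigned long take = groups[g].free_blocks < left ?
				     groups[g].free_blocks : left;

		groups[g].reserved_for_root = take;
		left -= take;
	}
	return wanted - left;
}

/*
 * Pick a group to allocate from. Everyone first looks for a group with
 * free blocks beyond its root reservation; only root may then dip into
 * the reservations, and only once every group is otherwise full.
 * Returns the group index, or -1 if nothing is available.
 */
static long pick_group(const struct bg_counts *groups, size_t n, int is_root)
{
	size_t g;

	for (g = 0; g < n; g++)
		if (groups[g].free_blocks > groups[g].reserved_for_root)
			return (long)g;
	if (!is_root)
		return -1;
	for (g = 0; g < n; g++)
		if (groups[g].free_blocks > 0)
			return (long)g;
	return -1;
}
```

Because each group only checks its own free and reserved counts, none of this needs the fs-wide s_free_blocks_count.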
Or something like that ;)