Re: Ext3 meta-data performance

Andreas Dilger (adilger@clusterfs.com)
Fri, 30 May 2003 11:44:44 -0600


On May 30, 2003 09:21 -0700, Andrew Morton wrote:
> So the workload is a `cp -Rl' of a 500,000 file tree?
>
> Vast amounts of metadata. ext2 will fly through that. Poor old ext3 has
> to write everything twice, and has to keep on doing seek-intensive
> checkpointing when the journal fills.
>
> When we get Andreas's "dont bother reading empty inode blocks" speedup
> going it will help both filesystems quite a bit.

Yes, that code is specifically a win for creating lots of files at once
and also reading large chunks of itable at once, so it will help on both
sides. The difficulty is that it checks the inode bitmap to decide if
it should read/zero the inode table block, but with the advent of Alex's
no-lock inode allocation this is racy.

I'm thinking that what needs to be done is to lock the inode table buffer
head if we think all of the bits for that block are empty (before setting
a bit there). Then, we re-check the bitmap before making the read/zero
decision if the itable block is not up-to-date, and zero the buffer and
mark it up-to-date if we again find the corresponding bits are zero, and
mark one bit in-use for our current inode allocation. Other threads that
are trying to allocate in that region will wait on the buffer head when
they find it not up-to-date, and wake after it has been set up appropriately.

Cheers, Andreas

--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/