Re: [reiserfs-dev] Re: Ext2 directory index: ALS paper and benchmarks

Jeff Garzik (jgarzik@mandrakesoft.com)
Sat, 08 Dec 2001 12:54:27 -0500


Daniel Phillips wrote:
> Linus wrote:
> > So "ext2_write_inode()" would basically become somehting like
> >
> > struct ext2_inode *raw_inode = inode->u.ext2_i.i_raw_inode;
> > struct buffer_head *bh = inode->u.ext2_i.i_raw_bh;
> >
> > /* Update the stuff we've brought into the generic part of the inode */
> > raw_inode->i_size = cpu_to_le32(inode->i_size);
> > ...
> > mark_buffer_dirty(bh);
> >
> > with part of the data already in the right place (ie the current
> > "inode->u.ext2_i.i_data[block]" wouldn't exist, it would just exist as
> > "raw_inode->i_block[block]" directly in the buffer block.

note we do this for the superblock already, and it's pretty useful

> I'd then be able to write a trivial program that would eat inode+blocksize
> worth of cache for each cached inode, by opening one file on each itable
> block.

you already have X overhead per inode cached... yes this would increase
X but since there is typically more than one inode per block there is
also sharing as well. So inode+blocksize is not true.

> I'd also regret losing the genericity that comes from the read_inode (unpack)
> and update_inode (repack) abstraction.

so what is write_inode... re-repack? :)

> Right now, I don't see any fields in
> _info that aren't directly copied, but I expect there soon will be.

i_data[] is copied, and that would be nice to directly access in
inode->u.ext2_i.i_bh...

Also in my ibu fs (you can look at it now in gkernel cvs) it uses a
fixed inode size of 512 bytes, with file or extent data packed into that
512 bytes after the fixed header ends. Having the bh right there would
be nice. [note there shouldn't be aliasing problems related to that in
ibu's case, because when data-in-inode is implemented readpage and
writepage handle that case anyway]

> An alternative approach: suppose we were to map the itable blocks with
> smaller-than-blocksize granularity. We could then fall back to smaller
> transfers under cache pressure, eliminating much thrashing.

in ibu fs the entire inode table[1] is accessing via the page cache.
ext2 could do this too. If ext2's per-block-group inode table has
padding at the end page calculations get a bit more annoying but it's
still doable.

Jeff

[1] ibu's inode table is a normal, potentially-fragmented file. thus it
is possibly broken up in chunks spread across the disk like ext2's block
groups.

-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/