The difference is, there's only one superblock per mount. There are
bazillions of inodes.
> > I'd then be able to write a trivial program that would eat inode+blocksize
> > worth of cache for each cached inode, by opening one file on each itable
> > block.
>
> you already have X overhead per inode cached... yes, this would increase
> X, but since there is typically more than one inode per block, there is
> also sharing. So inode+blocksize is not true.
You skipped over my example too fast.
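To spell it out, here's roughly the kind of trivial program I mean (a
sketch, assuming 4K blocks and 128-byte on-disk inodes, i.e. 32 inodes per
itable block, and assuming ext2 hands out consecutive inode numbers to
files created in sequence; shrink NFILES or raise RLIMIT_NOFILE to taste):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define STRIDE 32     /* 4096-byte block / 128-byte inode (assumption) */
    #define NFILES 16384

    int main(void)
    {
        char name[64];
        int i;

        /* create a long run of files; on ext2 these tend to get
           consecutive inode numbers within a block group */
        for (i = 0; i < NFILES; i++) {
            snprintf(name, sizeof(name), "f%06d", i);
            close(open(name, O_CREAT | O_WRONLY, 0644));
        }

        /* hold open one file per itable block: each cached inode
           now pins a whole itable block, i.e. inode+blocksize each */
        for (i = 0; i < NFILES; i += STRIDE) {
            snprintf(name, sizeof(name), "f%06d", i);
            open(name, O_RDONLY);
        }

        pause();   /* keep the fds, and thus the inodes, alive */
        return 0;
    }

With only one inode in cache per itable block, there is no sharing at all,
and the per-inode cost really is inode+blocksize.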
> > I'd also regret losing the genericity that comes from the read_inode
> > (unpack) and update_inode (repack) abstraction.
>
> so what is write_inode... re-repack? :)
It's a trivial shell for update_inode:
http://innominate.org/~graichen/projects/lxr/source/fs/ext2/inode.c?v=v2.4#L1146
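From memory, the 2.4 code at that line is just this (see the link for the
exact source):

    void ext2_write_inode (struct inode * inode, int wait)
    {
            /* the actual repacking happens in ext2_update_inode;
               write_inode is only the VFS entry point */
            ext2_update_inode (inode, wait);
    }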
> > Right now, I don't see any fields in
> > _info that aren't directly copied, but I expect there soon will be.
>
> i_data[] is copied, and that would be nice to directly access in
> inode->u.ext2_i.i_bh...
Yes, that's the major one: at 60 bytes it's more than the other _info fields
put together. However, almost half of each itable block's data is going to be
redundant, and the proposal as I understand it is to lock the block in cache
while the inode is in cache. This makes things worse, not better - it reduces
the total number of inodes that can be cached. And that's the best case,
when *all* the inodes on an itable block are in cache. Take a look at the
inode distribution in your directories and see if you think that's likely.
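(Where the 60 bytes come from: ext2 keeps fifteen 32-bit block pointers per
inode, on disk and in core:

    /* 12 direct + 1 indirect + 1 double + 1 triple = 15 pointers */
    __u32 i_data[15];   /* in inode->u.ext2_i: 15 * 4 bytes = 60 bytes */

so i_data alone outweighs the rest of ext2_inode_info.)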
> > An alternative approach: suppose we were to map the itable blocks with
> > smaller-than-blocksize granularity. We could then fall back to smaller
> > transfers under cache pressure, eliminating much thrashing.
>
> in ibu fs the entire inode table[1] is accessed via the page cache.
> ext2 could do this too. If ext2's per-block-group inode table has
> padding at the end, the page calculations get a bit more annoying, but
> it's still doable.
That's roughly what I had in mind, for starters.
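For concreteness, the index math under a no-padding assumption - a sketch,
not the actual ibu or ext2 code; per-block-group padding would add a
per-group correction:

    /* hypothetical: locate on-disk inode `ino` in a page-cache-backed
       itable mapping, assuming the table is contiguous (no padding) */
    unsigned long itable_page(unsigned long ino, unsigned inode_size)
    {
            return ((ino - 1) * inode_size) / PAGE_SIZE;  /* inodes number from 1 */
    }

    unsigned itable_offset(unsigned long ino, unsigned inode_size)
    {
            return ((ino - 1) * inode_size) % PAGE_SIZE;
    }

Sub-page reads under cache pressure then just mean filling part of a page
instead of the whole thing.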
It's worth keeping in mind that tweaking the icache efficiency in this case
is really just curing a symptom - the underlying problem is a mismatch
between readdir order and inode order.
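(Easy to check: print a directory in readdir order next to each entry's
inode number - the less monotonic the first column, the more scattered the
itable reads.

    #include <dirent.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
            DIR *d = opendir(argc > 1 ? argv[1] : ".");
            struct dirent *de;

            if (!d)
                    return 1;
            /* entries come back in directory (readdir) order; d_ino
               shows the inode order - compare the two columns */
            while ((de = readdir(d)))
                    printf("%lu\t%s\n", (unsigned long) de->d_ino, de->d_name);
            closedir(d);
            return 0;
    }
)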
--
Daniel