And on a 10K RPM SCSI disk I'm running 35s per 10,000 directories, which is way,
way slower than it ought to be. There are two analysis tools we're hurting
for badly here:
- We need to see the physical allocation maps for directories, preferably
in a running kernel. I think the best way to do this is a little
map-dumper hooked into ext3/dir.c and exported through /proc.
- We need block-access traces in a nicer form than printks (or nobody
will ever use them). IOW, we need LTT or something very much like
it.
> Depending on the size of the journal vs. how many block/inode bitmaps and
> directory blocks are dirtied, you will likely wrap the journal before you
> return to the first block group, so you might write 20kB * 32000 for the
> directory creates instead of 8kB for the file creates. You also have a
> lot of seeking to each block group to write out the directory data, instead
> of nearly sequential IO for the inode create case.
Yes, I think that's exactly what's happening. Some questions remain, such
as why it doesn't happen to akpm. Another: why does it happen to the
directory creates, where the only thing being accessed randomly is the
directory itself? The inode table is supposedly being allocated/dirtied
sequentially.
Regards,
Daniel