Re: BIG files & file systems

Andrew Morton (akpm@zip.com.au)
Thu, 01 Aug 2002 13:33:44 -0700


"Peter J. Braam" wrote:
>
> Hi,
>
> I've just been told that some "limitations" of the following kind will
> remain:
> page index = unsigned long
> ino_t = unsigned long
>
> Lustre has definitely been asked to support much larger files than
> 16TB. Also file systems with a trillion files have been requested by
> one of our supporters (you don't want to know who, besides I've no
> idea how many bits go in a trillion, but it's more than 32).
>
> I understand why people don't want to sprinkle the kernel with u64's,
> and arguably we can wait a year or two and use 64 bit architectures,
> so I'm probably not going to kick up a fuss about it.
>
> However, I thought I'd let you know that there are organizations that
> _really_ want to have such big files and file systems and get quite
> dismayed about "small integers". And we will fail to deliver on a
> requirement to write a 50TB file because of this.

I don't know about the ino_t thing, but as far as the pagecache
indices goes it's simply a matter of

- s/unsigned long/pgoff_t/ in a zillion places
- modify the radix tree code a bit
- implement CONFIG_LL_PAGECACHE_INDEX
- make it all work
- convince Linus

Linus's objections are threefold: it expands struct page, 64 bit
arith is slow and gcc tends to get it wrong. And I would add "most
developers won't test 64-bit pgoff_t, and it'll get broken regularly".

The expansion of struct page and the performance impact is just a
cost which you'll have to balance against the benefits. For a few
people, 32-bit pagecache index is a showstopper and they'll accept that
tradeoff.

Sprinkling `pgoff_t' everywhere is, IMO, not a bad thing - it aids code
readability because it tells you what the variable is used for.

As for broken gcc, well, the proponents of 64-bit pgoff_t would have
to work to identify the correct gcc version and generally get gcc
doing the right thing.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/