Re: [Lse-tech] Re: 10.31 second kernel compile

Linus Torvalds (torvalds@transmeta.com)
Sun, 17 Mar 2002 17:31:10 -0800 (PST)


On Sun, 17 Mar 2002, Davide Libenzi wrote:
>
> What's the reason that would make more convenient for us, upon receiving a
> request to map a NNN MB file, to map it using 4Kb pages instead of 4MB ones ?

Ehh.. Let me count the ways:
- reliably allocation of 4MB of contiguous data
- graceful fallback when you need to start paging
- sane coherency with somebody who mapped the same file/segment in a much
smaller chunk

Guyes, 4MB pages are always going to be a special case. There's no sane
way to make them automatic, for the simple reason that they are USELESS
for "normal" work, and they have tons of problems that are quite
fundamental and just aren't going away and cannot be worked around.

The only sane way to use 4MB segments is:

- the application does a special system call (or special flag to mmap)
saying that it wants a big page and doesn't care about coherency with
anybody else that didn't set the flag (and realize that that probably
includes things like read/write)

- the machine has sufficiently enough memory that the user can be allowed
to _lock_ the area down, so that you don't have to worry about
swapping out that thing in 4M pieces. (This of course implies that
per-user memory counters have to work too, or we have to limit it by
default with a rlimit or something to zero).

In short, very much a special case.

(There are two reasons you don't want to handle paging on 4M chunks: (a)
they may be wonderful for IO throughput, but they are horrible for latency
for other people and (b) you now have basically just a few bits of usage
information for 4M worth of memory, as opposed to a finer granularity view
of which parts are actually _used_).

Once you can count on having memory sizes in the hundreds of Gigs, and
disk throughput speeds in the hundreds of megs a second, and ther are
enough of these machines to _matter_ (and reliably 64-bit address spaces
so that virtual fragmentation doesn't matter), we might make 4MB the
regular mapping entity.

That's probably at least a decade away.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/