The problem with caches is that if they are not coherent (and TLBs
generally aren't), you need to invalidate them by hand. And if they live
in main memory, that invalidation can be expensive.
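To sketch the point (the names and sizes below are made up, nothing like
any real implementation): a translation cache that lives in main memory
has to be walked entry by entry to invalidate it, and every entry you
touch is potentially a data-cache miss of its own.

struct soft_tlb_entry {
	unsigned long vaddr;
	unsigned long pte;
};

#define SOFT_TLB_SIZE	16384	/* "as large as you want it to be" */

static struct soft_tlb_entry soft_tlb[SOFT_TLB_SIZE];

/* Invalidation is O(size) memory traffic just to throw the entries
 * away, where an on-die TLB can do a single "invalidate all". */
static void soft_tlb_flush_all(void)
{
	int i;

	for (i = 0; i < SOFT_TLB_SIZE; i++)
		soft_tlb[i].pte = 0;
}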
Which brings us back to the whole reason for the discussion: this is not a
theoretical argument. Look at the POWER4 numbers, and _shudder_ at the
expense of cache invalidation.
NOTE! The goodness of a cache is not in its size, but in how quickly you
can fill it and what the hit rate is. I'd be very surprised if you get
noticeably higher hit rates from "as large as you want it to be" than from
"a few thousand entries that trivially fit on the die".
And I will guarantee that the on-die ones are faster to fill, and much
faster to invalidate (on-die it is fairly easy to do content
addressability if you limit the addressing to just a few ways - off-chip
memory can't do that).
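To put rough numbers on that (the cycle counts below are completely made
up, purely to illustrate the trade-off): the average cost per translation
is what matters, and the fill cost multiplies the miss rate.

/* Back-of-the-envelope only, with invented cycle counts. */
static double avg_translation_cost(double hit_rate,
				   double hit_cycles,
				   double fill_cycles)
{
	return hit_rate * hit_cycles + (1.0 - hit_rate) * fill_cycles;
}

/*
 * A 98% hit rate with a cheap 30-cycle fill beats a 99.5% hit rate
 * with an expensive 500-cycle fill:
 *
 *	avg_translation_cost(0.980, 1, 30)  ~ 1.6 cycles
 *	avg_translation_cost(0.995, 1, 500) ~ 3.5 cycles
 */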
> Linus> - ability to fill multiple entries in one go to offset the
> Linus> cost of taking the trap.
>
> The software fill can definitely do that. I think it's one area where
> some interesting experimentation could happen.
If you can do it and you don't do it already, you're just throwing away
cycles. And if that is what you were comparing against the "superior
hardware fill", the comparison really wasn't very fair.
Note that by "multiple entry support" I don't mean just a loop that adds
noticeable overhead for each entry - I mean something which can fairly
efficiently load contiguous entries pretty much in "one go". A TLB fill
routine can't afford to spend time setting up tag registers etc.
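Something along these lines, on an imaginary architecture with an
auto-incrementing TLB index register (all the names below are
hypothetical - this is a sketch of the idea, not any real interface):

#define FILL_BLOCK	8		/* entries loaded per trap */
#define PAGE_SIZE	4096UL

typedef unsigned long pte_t;

extern void write_tlb_tag_base(unsigned long vaddr);	/* set the tag once */
extern void write_tlb_entry_next(pte_t pte);		/* index auto-increments */

/* One trap, one tag setup, then back-to-back entry writes: the
 * per-entry cost is just the PTE load and the write itself. */
void tlb_refill_block(unsigned long fault_vaddr, const pte_t *ptep)
{
	int i;

	write_tlb_tag_base(fault_vaddr & ~(FILL_BLOCK * PAGE_SIZE - 1));
	for (i = 0; i < FILL_BLOCK; i++)
		write_tlb_entry_next(ptep[i]);
}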
Linus