The problem with caches is that if they are not coherent (and TLBs
generally aren't), you need to invalidate them by hand. And if they live
in main memory, that invalidation can be expensive.
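To sketch the point (the names and sizes below are made up, nothing like
any real implementation): a translation cache that lives in main memory
has to be walked entry by entry to invalidate it, and every entry you
touch is potentially a data-cache miss of its own.

struct soft_tlb_entry {
	unsigned long vaddr;
	unsigned long pte;
};

#define SOFT_TLB_SIZE	16384	/* "as large as you want it to be" */

static struct soft_tlb_entry soft_tlb[SOFT_TLB_SIZE];

/* Invalidation is O(size) memory traffic just to throw the entries
 * away, where an on-die TLB can do a single "invalidate all". */
static void soft_tlb_flush_all(void)
{
	int i;

	for (i = 0; i < SOFT_TLB_SIZE; i++)
		soft_tlb[i].pte = 0;
}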
Which brings us back to the whole reason for the discussion: this is not a
theoretical argument. Look at the POWER4 numbers, and _shudder_ at the
expense of cache invalidation.
NOTE! The goodness of a cache is not in its size, but in how quickly you
can fill it and what the hit rate is. I'd be very surprised if you get
noticeably higher hit rates from "as large as you want it to be" than from
"a few thousand entries that trivially fit on the die".
And I will guarantee that the on-die ones are faster to fill, and much
faster to invalidate (on-die it is fairly easy to do content
addressability if you limit the addressing to just a few ways - off-chip
memory can't do that).
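To put rough numbers on that (the cycle counts below are completely made
up, purely to illustrate the trade-off): the average cost per translation
is what matters, and the fill cost multiplies the miss rate.

/* Back-of-the-envelope only, with invented cycle counts. */
static double avg_translation_cost(double hit_rate,
				   double hit_cycles,
				   double fill_cycles)
{
	return hit_rate * hit_cycles + (1.0 - hit_rate) * fill_cycles;
}

/*
 * A 98% hit rate with a cheap 30-cycle fill beats a 99.5% hit rate
 * with an expensive 500-cycle fill:
 *
 *	avg_translation_cost(0.980, 1, 30)  ~ 1.6 cycles
 *	avg_translation_cost(0.995, 1, 500) ~ 3.5 cycles
 */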
> Linus> - ability to fill multiple entries in one go to offset the
> Linus> cost of taking the trap.
>
> The software fill can definitely do that. I think it's one area where
> some interesting experimentation could happen.
If you can do it and you don't do it already, you're just throwing away
cycles. And if that is what you were comparing against the "superior
hardware fill", the comparison really wasn't very fair.
Note that by "multiple entry support" I don't mean just a loop that adds
noticeable overhead for each entry - I mean something which can fairly
efficiently load contiguous entries pretty much in "one go". A TLB fill
routine can't afford to spend time setting up tag registers etc.
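Something along these lines, on an imaginary architecture with an
auto-incrementing TLB index register (all the names below are
hypothetical - this is a sketch of the idea, not any real interface):

#define FILL_BLOCK	8		/* entries loaded per trap */
#define PAGE_SIZE	4096UL

typedef unsigned long pte_t;

extern void write_tlb_tag_base(unsigned long vaddr);	/* set the tag once */
extern void write_tlb_entry_next(pte_t pte);		/* index auto-increments */

/* One trap, one tag setup, then back-to-back entry writes: the
 * per-entry cost is just the PTE load and the write itself. */
void tlb_refill_block(unsigned long fault_vaddr, const pte_t *ptep)
{
	int i;

	write_tlb_tag_base(fault_vaddr & ~(FILL_BLOCK * PAGE_SIZE - 1));
	for (i = 0; i < FILL_BLOCK; i++)
		write_tlb_entry_next(ptep[i]);
}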
Linus