Re: about kmap_high function

Paul Mackerras (paulus@samba.org)
Tue, 3 Jul 2001 22:47:20 +1000 (EST)


Stephen C. Tweedie writes:

> kmap_high is intended to be called routinely for access to highmem
> pages. It is coded to be as fast as possible as a result. TLB
> flushes are expensive, especially on SMP, so kmap_high tries hard to
> avoid unnecessary flushes.

The code assumes that flushing a single TLB entry is expensive on SMP,
while flushing the whole TLB is relatively cheap - certainly cheaper
than flushing several individual entries. And that assumption is of
course true on i386.

On PPC it is a bit different. Flushing a single TLB entry is
relatively cheap - the hardware broadcasts the TLB invalidation on the
bus (in most implementations) so there are no cross-calls required. But
flushing the whole TLB is expensive because we (strictly speaking)
have to flush the whole of the MMU hash table as well.

The MMU gets its PTEs from a hash table (which can be very large) and
we use the hash table as a kind of level-2 cache of PTEs, which means
that the flush_tlb_* routines have to flush entries from the MMU hash
table as well. The hash table can store PTEs from many contexts, so
it can have a lot of PTEs in it at any given time. So flushing the
whole TLB would imply going through every single entry in the hash
table and clearing it. In fact, currently we cheat - flush_tlb_all
actually only flushes the kernel portion of the address space, which
is all that is required in the three places where flush_tlb_all is
called at the moment.

This is not a criticism, rather a request that we expand the
interfaces so that the architecture-specific code can make the
decisions about when and how to flush TLB entries.

For example, I would like to get rid of flush_tlb_all and define a
flush_tlb_kernel_range instead. In all the places where flush_tlb_all
is currently used, we do actually know the range of addresses which
are affected, and having that information would let us do things a lot
more efficiently on PPC. On other platforms we could define
flush_tlb_kernel_range to just flush the whole TLB, or whatever.

Note that there is already a flush_tlb_range which could be used, but
some architectures assume that it is only used on user addresses.

Regards,
Paul.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/