Ingo Molnar wrote:
>
> - if the faulting context is a non-owner (ie. the fork()-ed child), then
> the normal COW path is taken - new page allocated and installed.
>
> - if the faulting context is the owner, then the pte chain is walked, and
> the new page is installed into every 'other' pte. This needs a
> cross-CPU single-page TLB flush though. The TLB flush could be
> optimized if we had a way to get to the mapping MM's of the individual
> pte chain entries - is this possible?
Actually, we don't have to do it this way. My preferred solution would be
to make the pinning data structure be a special one with a callback (which
also means that you do _not_ have to re-use the LRU list), and what we do
is that when we're getting called back the futex code just updates to the
new physical page instead.
So the data structures would look something like this:
struct page_change_struct {
	unsigned long address;
	struct mm_struct *vm;
	struct list_head list;
	void (*callback)(struct page_change_struct *data, struct page *new);
};
struct list_head page_change_struct_hash[HASHSIZE];
and then when we pin a page, we do
/* This is part of the
struct page_change_struct pinned_data;
pinned_data.address = virtual_address;
pinned_data.vm = current_mm;
pinned_data.callback = futex_cow_callback;
insert_pin_page(page, &pinned_data);
.. this does a hash on address, inserts it into the
page_change_struct_hash table, and is done..
unpinning does:
remove_pin_page(page, &pinned_data);
.. this just does a "list_del(&pinned_data.list);" ..
and COW does:
.. hash the COW address, look up the page_change_struct_hash,
search if the page/vm tuple exists in the index, and if it
does, call the callback()..
and then the "callback" function just updates the page information in the
futex block directly - as if it was looked up anew.
This has the advantage that it works without any cross-CPU tlb stuff, and
that other users (not just futexes) can also register themselves for
getting callbacks if somebody COW's a page they had.
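
For illustration, here is a rough userspace sketch of the scheme described above. It is not kernel code: the struct page, struct mm_struct and struct list_head definitions are stand-ins, PAGE_SHIFT is assumed to be 12, and the names cow_notify(), pin_bucket() and struct futex_sketch are hypothetical, made up for this example. It has no locking at all, and matches on the (vm, page-aligned address) pair, which is one possible reading of the lookup step.

```c
#include <assert.h>
#include <stddef.h>

#define HASHSIZE   64
#define PAGE_SHIFT 12	/* assumption: 4K pages */

/* Userspace stand-ins for the kernel types; purely illustrative. */
struct page { int id; };
struct mm_struct { int id; };
struct list_head { struct list_head *next, *prev; };

struct page_change_struct {
	unsigned long address;
	struct mm_struct *vm;
	struct list_head list;
	void (*callback)(struct page_change_struct *data, struct page *new_page);
};

static struct list_head page_change_struct_hash[HASHSIZE];

/* Return the hash bucket for an address, initializing the head lazily. */
static struct list_head *pin_bucket(unsigned long address)
{
	struct list_head *head =
		&page_change_struct_hash[(address >> PAGE_SHIFT) % HASHSIZE];

	if (!head->next)
		head->next = head->prev = head;
	return head;
}

/* Pin: hash on the address and link the entry into its bucket. */
void insert_pin_page(struct page *page, struct page_change_struct *pin)
{
	struct list_head *head = pin_bucket(pin->address);

	(void)page;	/* this sketch keys on address/vm only */
	assert(pin->callback != NULL);
	pin->list.next = head->next;
	pin->list.prev = head;
	head->next->prev = &pin->list;
	head->next = &pin->list;
}

/* Unpin: the equivalent of list_del() on the embedded list node. */
void remove_pin_page(struct page *page, struct page_change_struct *pin)
{
	(void)page;
	pin->list.prev->next = pin->list.next;
	pin->list.next->prev = pin->list.prev;
	pin->list.next = pin->list.prev = &pin->list;
}

/*
 * Hypothetical COW hook: look up the bucket for the faulting address and
 * call back every pin on the same page in the same mm.  Returns the
 * number of callbacks that were run.
 */
int cow_notify(struct mm_struct *vm, unsigned long address,
	       struct page *new_page)
{
	struct list_head *head = pin_bucket(address), *pos, *next;
	int hits = 0;

	for (pos = head->next; pos != head; pos = next) {
		struct page_change_struct *pin = (struct page_change_struct *)
			((char *)pos - offsetof(struct page_change_struct, list));

		next = pos->next;	/* saved first: the callback may unpin itself */
		if (pin->vm == vm &&
		    (pin->address >> PAGE_SHIFT) == (address >> PAGE_SHIFT)) {
			pin->callback(pin, new_page);
			hits++;
		}
	}
	return hits;
}

/* Hypothetical futex-side user: caches the physical page behind a pin. */
struct futex_sketch {
	struct page_change_struct pin;
	struct page *page;
};

/* On COW, just switch the cached page, as if it was looked up anew. */
void futex_cow_callback(struct page_change_struct *data, struct page *new_page)
{
	struct futex_sketch *f = (struct futex_sketch *)
		((char *)data - offsetof(struct futex_sketch, pin));

	f->page = new_page;
}
```

A caller would embed the pin in its own structure, fill in address, vm and callback, and call insert_pin_page(); the COW path would then call something like cow_notify() with the faulting mm, the address and the freshly allocated page. A real version would need the per-hash locking around both the list manipulation and the walk.
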
We could extend it to work for unmapping etc too if we wanted (ie anybody
who caches a virtual->physical translation for a specific page can always
ask for an "invalidate this particular page mapping" event).
I really like this approach.
[ Of course I do, since I thought it up. All my ideas are absolutely
brilliant, until somebody points out why they can't work. The locking
might be interesting, but the most obvious locking seems to be to have
some per-hash thing. ]
Linus