It's a hack primarily because you're mixing linear with nonlinear
mappings, and incidentally that also breaks truncate: in the current
state truncate is malfunctioning. To make truncate work in the current
state you would need to check page->index for every page mapped by the
pagetables of every vma linked in the objrmap. I don't think anybody
wants to slow down truncate like that (think of partial truncates on
huge vmas).
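A toy model may make the cost difference concrete. This is a simplified
userspace sketch, not kernel code: struct toy_vma, truncate_linear() and
truncate_nonlinear() are hypothetical names, and one array slot stands
in for one pte. In a linear vma, file page `index` can only be mapped at
slot `index - vm_pgoff`, so truncate can compute the range to zap; once
pages are remapped nonlinearly that arithmetic no longer holds, and the
page index in every slot has to be read and compared.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPTES 16

/* Hypothetical, heavily simplified vma: pte_index[slot] holds the file
 * page index mapped at that virtual page, or -1 if nothing is mapped. */
struct toy_vma {
    unsigned long vm_pgoff;      /* file page index of the first slot */
    long pte_index[TOY_NPTES];
};

/* Linear layout: slot i maps file page vm_pgoff + i. */
static void toy_init_linear(struct toy_vma *vma, unsigned long pgoff)
{
    vma->vm_pgoff = pgoff;
    for (size_t i = 0; i < TOY_NPTES; i++)
        vma->pte_index[i] = (long)(pgoff + i);
}

/* Nonlinear layout (as if after remapping): same pages, reverse order. */
static void toy_init_reversed(struct toy_vma *vma, unsigned long pgoff)
{
    vma->vm_pgoff = pgoff;
    for (size_t i = 0; i < TOY_NPTES; i++)
        vma->pte_index[i] = (long)(pgoff + TOY_NPTES - 1 - i);
}

/* Truncate to 'from' pages assuming a linear layout: the first slot
 * that can hold an index >= from is computed, not searched for.
 * Returns the number of slots examined. */
static size_t truncate_linear(struct toy_vma *vma, unsigned long from)
{
    size_t first = from > vma->vm_pgoff ? from - vma->vm_pgoff : 0;
    size_t visited = 0;
    for (size_t slot = first; slot < TOY_NPTES; slot++, visited++)
        vma->pte_index[slot] = -1;
    return visited;
}

/* Truncate with no layout assumption: every slot's page index must be
 * read and compared. Returns the number of slots examined. */
static size_t truncate_nonlinear(struct toy_vma *vma, unsigned long from)
{
    size_t visited = 0;
    for (size_t slot = 0; slot < TOY_NPTES; slot++, visited++)
        if (vma->pte_index[slot] >= 0 &&
            (unsigned long)vma->pte_index[slot] >= from)
            vma->pte_index[slot] = -1;
    return visited;
}
```

Truncating the last two pages (from = 14) of the linear layout examines
2 slots, while the nonlinear scan examines all 16. Running
truncate_linear() on the reversed layout clears slots 14 and 15 (file
pages 1 and 0) and leaves pages 15 and 14 mapped beyond the new end of
file, which is the kind of breakage described above.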
Fixing it so that truncate still runs at the current speed (when you
don't use sys_remap_file_pages) means changing the API to be sane, and
at the very least to stop mixing linear with nonlinear vmas.
And I find it very unclean anyway that you can mangle a linear vma so
that it is partly linear and partly nonlinear. Nonlinear vmas are
special; if they were not special, we would not break anything by
allowing nonlinear behaviour inside a linear vma.
At the very least you need an mmap(VM_NONLINEAR) to allocate the
nonlinear virtual space, with sys_remap_file_pages working only inside
that space.
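Under such an interface the check the syscall has to make is simple.
The sketch below is a toy userspace model: TOY_VM_NONLINEAR, struct
toy_vma and toy_remap_check() are invented names, and 4 KiB pages are
assumed. A remap request is accepted only if it is page aligned and lies
entirely inside a single vma that was created nonlinear; a linear vma is
never converted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_PAGE_SIZE    4096UL  /* assumed base page size */
#define TOY_VM_NONLINEAR 0x1UL   /* hypothetical flag set at mmap() time */

struct toy_vma {
    unsigned long vm_start;
    unsigned long vm_end;        /* exclusive */
    unsigned long vm_flags;
};

/* Hypothetical check: 0 if [addr, addr + size) may be remapped,
 * -EINVAL otherwise. */
static int toy_remap_check(const struct toy_vma *vma,
                           unsigned long addr, unsigned long size)
{
    if (size == 0 || addr % TOY_PAGE_SIZE || size % TOY_PAGE_SIZE)
        return -EINVAL;          /* not page aligned */
    if (addr + size < addr)
        return -EINVAL;          /* range wraps around */
    if (addr < vma->vm_start || addr + size > vma->vm_end)
        return -EINVAL;          /* not fully inside this vma */
    if (!(vma->vm_flags & TOY_VM_NONLINEAR))
        return -EINVAL;          /* linear vmas stay linear */
    return 0;
}
```

Rejecting linear vmas outright keeps truncate purely arithmetic for
every vma that was not explicitly created nonlinear.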
This was one of the first points I raised before sys_remap_file_pages
can be considered a sane API that stays in the kernel. The other points
are actually lower priority.
As for the other points, I still think the whole purpose of
sys_remap_file_pages is to bypass the VM entirely, so it should have the
least possible hardware cost associated with it. It is meant only to
mangle pagetables from userspace. And sys_remap_file_pages has nothing
to do with rmap or objrmap, btw (that is a general issue, not specific
to this). But since the whole purpose of sys_remap_file_pages is to
bypass the VM entirely and be as fast as possible, we should also turn
off paging to let people get the biggest advantage out of it, and allow
passing a file descriptor to sys_remap_file_pages too, so that you can
map multiple files in the same vma. I think allowing multiple files
makes perfect sense, and the lack of this important additional feature
concerns me.
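The multiple-files idea can be sketched as a nonlinear vma whose
per-page entries record a (file, offset) pair instead of only an offset.
This is a toy model with made-up names: the `file` argument of the
hypothetical toy_remap() stands in for the file descriptor that the
existing remap_file_pages(addr, size, prot, pgoff, flags) does not take.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NSLOTS 8

/* One entry per virtual page of the single nonlinear vma. */
struct toy_slot {
    int file;               /* hypothetical file id, -1 = empty */
    unsigned long pgoff;    /* page offset within that file */
};

struct toy_nonlinear_vma {
    struct toy_slot slot[TOY_NSLOTS];
};

static void toy_vma_init(struct toy_nonlinear_vma *vma)
{
    for (size_t i = 0; i < TOY_NSLOTS; i++) {
        vma->slot[i].file = -1;
        vma->slot[i].pgoff = 0;
    }
}

/* Hypothetical remap taking a file argument: point 'npages' virtual
 * pages starting at page 'first' to consecutive pages of 'file'
 * starting at 'pgoff'. Returns 0 or -EINVAL. */
static int toy_remap(struct toy_nonlinear_vma *vma, size_t first,
                     size_t npages, int file, unsigned long pgoff)
{
    if (file < 0 || npages == 0 || first >= TOY_NSLOTS ||
        npages > TOY_NSLOTS - first)
        return -EINVAL;
    for (size_t i = 0; i < npages; i++) {
        vma->slot[first + i].file = file;
        vma->slot[first + i].pgoff = pgoff + i;
    }
    return 0;
}
```

For example, toy_remap(&v, 0, 2, 3, 10) followed by
toy_remap(&v, 2, 2, 5, 0) leaves pages 10-11 of file 3 and pages 0-1 of
file 5 side by side in one vma, with no extra vma per file.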
sys_remap_file_pages should also try to use largepages to map the
pagecache, as far as alignment and the largepage pool allow. That makes
perfect sense.
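The alignment condition reduces to a bit of arithmetic. This sketch
assumes 2 MiB large pages (the common x86 size); toy_hpages_usable() is
a hypothetical helper and pool_free stands for the number of free pages
in the largepage pool. A large page can back part of the mapping only if
the virtual address and the file offset have the same offset within a
large page; whatever lies before the first large-page boundary, or after
the last one, has to use ordinary small pages.

```c
#include <assert.h>

#define TOY_HPAGE_SIZE (2UL << 20)   /* assumed 2 MiB large page */

/* Number of large pages usable when mapping 'size' bytes of a file,
 * starting at byte offset 'off', at virtual address 'addr'. */
static unsigned long toy_hpages_usable(unsigned long addr, unsigned long off,
                                       unsigned long size,
                                       unsigned long pool_free)
{
    unsigned long mask = TOY_HPAGE_SIZE - 1;

    /* Virtual and file offsets must be congruent modulo the large page. */
    if ((addr & mask) != (off & mask))
        return 0;
    /* Small-page head up to the first large-page boundary. */
    unsigned long head = (TOY_HPAGE_SIZE - (addr & mask)) & mask;
    if (head >= size)
        return 0;
    unsigned long n = (size - head) / TOY_HPAGE_SIZE;
    /* Never take more than the pool can supply. */
    return n < pool_free ? n : pool_free;
}
```

An 8 MiB mapping at a 2 MiB aligned address and file offset gets 4
large pages; shifting both by 4 KiB still gets 3, but shifting only the
address gets none.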
As for bochs, it will have no problem enabling a system-wide sysctl
before running; that's much cleaner than loading two kernel modules.
Overall, trying to make nonlinear a generic API usable by default looks
wrong to me: sys_remap_file_pages has to be a VM bypass or it has to go.
If you want it to stay as a potentially default generic API, then drop
the vma entirely: have mmap(), mprotect() and mlock() generate no vma
overhead, and instead create nonlinear mappings inside a single vma
covering the whole address space. If you can do everything generically
with sys_remap_file_pages (as you seem to aim for), then do it with the
current API instead of creating a new nonstandard one. It's a matter of
functionality inside the kernel: if you can do everything without a
vma, then drop the vma from mmap, that's all. sys_remap_file_pages is
equivalent to an mmap(MAP_FIXED) anyway.
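The equivalence is visible from userspace. The sketch below (Linux with
glibc or musl, _GNU_SOURCE assumed) shows file page 3 at the start of a
four-page MAP_SHARED mapping twice: once with remap_file_pages() and
once by replacing the first page with an mmap(MAP_FIXED) of the same
file at offset 3 * page size. make_test_file() and
first_byte_after_remap() are helper names made up for this example.
remap_file_pages() has been deprecated since Linux 3.16, where the
kernel emulates it by creating separate vmas, and some sandboxes do not
provide it at all, so the helper reports that case instead of failing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 4

/* Create an unlinked temporary file whose page i is filled with 'A' + i. */
static int make_test_file(long psz)
{
    assert(psz > 0);
    char path[] = "/tmp/remapXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    char *buf = malloc((size_t)psz);
    if (!buf) {
        close(fd);
        return -1;
    }
    for (int i = 0; i < NPAGES; i++) {
        memset(buf, 'A' + i, (size_t)psz);
        if (write(fd, buf, (size_t)psz) != psz) {
            free(buf);
            close(fd);
            return -1;
        }
    }
    free(buf);
    return fd;
}

/* Make file page 3 appear at the first page of a 4-page shared mapping.
 * Returns the byte seen there, 0 if remap_file_pages() is unavailable,
 * or -1 on any other error. */
static int first_byte_after_remap(int fd, long psz, int use_remap)
{
    size_t len = (size_t)NPAGES * (size_t)psz;
    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    int result;
    if (use_remap) {
        /* Nonlinear way: rewire the first page in place (prot must be 0). */
        if (remap_file_pages(base, (size_t)psz, 0, 3, 0) == 0)
            result = (unsigned char)base[0];
        else
            result = (errno == ENOSYS || errno == EPERM) ? 0 : -1;
    } else {
        /* Linear way: overlay a new one-page mapping of file page 3. */
        void *p = mmap(base, (size_t)psz, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_FIXED, fd, (off_t)3 * psz);
        result = (p == MAP_FAILED) ? -1 : (unsigned char)base[0];
    }
    munmap(base, len);
    return result;
}
```

Where both calls succeed they leave 'D' (the byte written to file page
3) at base[0]; the difference is only in how the kernel tracks the
result.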
I'm not against making mmap faster or whatever, but sys_remap_file_pages
makes sense to me only as a VM bypass: something that, by bypassing the
VM, will always be faster than a regular mmap. If you don't bypass the
VM, you should make mmap run as fast as sys_remap_file_pages instead,
IMHO.
Andrea