Well, #3 falls under "ptrace()" as far as I'm concerned. I don't really
want to expose things through /proc (or /dev, which is even _worse_).
We used to have things that could be done with /proc/<pid>/mem, and it was
a total security disaster. It was removed in the 2.3.x series because of
that.
As to #1, that certainly shouldn't be a problem at all. We already do it
temporarily internally inside the kernel for execve() setup and for things
like lazy TLB switching for kernel threads, and there's nothing keeping us
from having multiple "struct mm_struct" per process. The only issue is
what the interfaces should be to create one (/dev/mm is right _out_), and
how to switch them around sanely.
Having a

	int fd = create_mm();

system call is certainly not wrong per se (but thinking that it should be
done using a special file is wrong - we don't have /dev/pipe either). And
creating that system call is trivial - but only worth it if there are good
sane interfaces to switch mm's around and do interesting things with them.
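
For comparison, this is roughly what the /dev/pipe remark refers to: pipe()
hands back plain file descriptors for a new kernel object, and no special
file is involved. A minimal user-space sketch; the helper name
pipe_roundtrip() is made up for illustration.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Push msg through a freshly created pipe and read it back.
 * Returns the number of bytes read, or -1 on error.
 * pipe_roundtrip() is an illustrative helper, not an existing API.
 */
static ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
	int fds[2];
	ssize_t n;

	assert(msg != NULL && out != NULL);

	/* The kernel object is created by a system call; no path is opened. */
	if (pipe(fds) < 0)
		return -1;

	n = write(fds[1], msg, strlen(msg));
	if (n >= 0)
		n = read(fds[0], out, outlen);

	close(fds[0]);
	close(fds[1]);
	return n;
}
```

A create_mm() call would presumably follow the same pattern: the handle
comes from the system call itself and is closed like any other fd.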
Done right, it should be possible to have "posix_spawn()" etc done using
something like that, ie

	/* Create new VM */
	int fd = create_mm();

	/* populate the dang thing.. */
	mmap_mm(fd, .. );

	/* start it up */
	clone_with_mm(fd, ...);

and the internal implementation should be perfectly trivial, since the
kernel already largely works this way internally anyway (yeah, it is
likely to need some re-organization of clone() to handle pre-created VM's
etc, but that's nothing really fundamental).
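
For reference, this is the kind of user-visible call that could sit on top
of such an interface. The sketch below uses only the existing posix_spawnp()
API and says nothing about how it is implemented underneath;
spawn_and_wait() is a made-up helper name.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Spawn argv[0] (looked up via PATH), wait for it, and return its exit
 * status, or -1 if it could not be started or did not exit normally.
 * spawn_and_wait() is an illustrative helper, not an existing API.
 */
static int spawn_and_wait(char *const argv[])
{
	pid_t pid;
	int status;

	assert(argv != NULL && argv[0] != NULL);

	/* No file actions or spawn attributes: inherit everything. */
	if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
		return -1;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The caller never sees a half-built child here, which is the part a
create_mm()/clone_with_mm() pair would make explicit.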
> Beats me. My first suggestion was to add another file descriptor argument
> to mmap et al which would represent the address space to be modified. Alan
> didn't like that idea too much.
I do believe that fd's are a natural way to handle it, since it needs
_some_ kind of handle, and the only generic handle the kernel has is a
file descriptor. We could create a new kind of handle, but that would
likely just add more complexity.
HOWEVER, the part I worry about is creating tons of new system calls that
just duplicate existing ones by adding a "fd" argument. That part I really
don't much like. Because if this were to really be a generic feature, it
really wants pretty much _all_ system calls supported, ie things like

	fd = open(<mm,ptr>, flags, ...);
	retval = read(<mm,ptr>..

to allow the user to not just mmap but generally "take the guise of" any
other mm for the duration of the system call.
Which really means that I _think_ the right approach would be to literally
have an "indirect-system-call-using-this-mm" system call, which does
something like
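
	asmlinkage long sys_mm_indirect(int fd, struct syscall_descriptor_block *user_args)
	{
		struct mm_struct *mm, *old_mm;
		struct syscall_descriptor_block args;
		long retval;

		/* Copy the descriptor block in from user space */
		if (copy_from_user(&args, user_args, sizeof(args)))
			return -EFAULT;

		/* Look up the mm behind the fd (assuming NULL means a bad fd) */
		mm = get_fd_mm(fd);
		if (!mm)
			return -EBADF;

		/* Run the requested system call in the guise of that mm */
		old_mm = current->mm;
		current->mm = mm;
		switch_mm(mm);
		retval = arch_do_syscall(&args);

		/* Switch back and drop the reference */
		current->mm = old_mm;
		switch_mm(old_mm);
		put_mm(mm);
		return retval;
	}
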
which allows _any_ system call to be made for that mm.
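
To make the "descriptor block" idea a bit more concrete, here is a rough
user-space analogue of the dispatch step only. The struct layout (a call
number plus six register-sized arguments) is an assumption, since the post
does not define it, and do_syscall_block() is a made-up name. It goes
through the existing syscall(2) wrapper on Linux and does no mm switching
at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical layout; the original post leaves this struct undefined. */
struct syscall_descriptor_block {
	long nr;       /* system call number, e.g. SYS_getpid */
	long args[6];  /* up to six register-sized arguments */
};

/*
 * User-space stand-in for the arch_do_syscall() step: unpack the block
 * and issue the call via syscall(2). Returns whatever syscall() returns
 * (-1 with errno set on failure).
 */
static long do_syscall_block(const struct syscall_descriptor_block *b)
{
	assert(b != NULL);
	return syscall(b->nr, b->args[0], b->args[1], b->args[2],
		       b->args[3], b->args[4], b->args[5]);
}
```

Filling in .nr = SYS_getpid and leaving the arguments zero should give the
same answer as calling getpid() directly; the in-kernel version would do
the same dispatch, just bracketed by the mm switch.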
Linus