Re: PATCH Multithreaded core dump support for the 2.5.14 (and 15) kernel.

Mark Gross (mgross@unix-os.sc.intel.com)
Thu, 16 May 2002 14:08:10 -0400

On Thursday 16 May 2002 01:36 pm, Daniel Jacobowitz wrote:
> On Thu, May 16, 2002 at 07:27:59PM +0200, Andi Kleen wrote:
> > On Thu, May 16, 2002 at 10:13:40AM -0400, Mark Gross wrote:
> > > Also, does anyone know WHY the mmap_sem is needed in the elf_core_dump
> > > code, and is this need still valid if I've suspended all the other
> > > processes that could even touch that mm? I.e. can I fix this by
> > > removing the down_write / up_write in elf_core_dump?
> >
> > The mmap_sem is needed to access current->mm (especially the vma list)
> > safely. Otherwise someone else sharing the mm_struct could modify it.
> > If you make sure all others sharing the mm_struct are killed first
> > (including no way for them to start new clones in between) then
> > the only loophole left would be remote access using /proc/pid/mem or
> > ptrace. If you handle that too then it is probably safe to drop it.
> > Unfortunately I don't see a way to handle these remote users without at
> > least taking it temporarily.
> >
> > Of course there are other semaphores involved in dumping too (e.g. the
> > VFS ->write code may take the i_sem or other private ones). I guess they
> > won't be a big problem if you first kill and then dump later.
>
> Except unfortunately we don't kill; the other threads are resumed
> afterwards for cleanup. They're just suspended.

Yes, they start back up after the dump.

With the other processes paused, it certainly seems that taking
current->mm->mmap_sem could be unnecessary for core dumps. I'm not so sure
that protecting the core file data from ptrace or /proc/pid/mem is important
in the case of core dumping.

I just don't want the kernel to lock up dumping the multithreaded core file.

I'm still not sure we have a problem yet (wishful thinking, I suppose).
Also, in my testing of the patch I posted, I've seen zero lockups from a
semaphore being held by one of the temporarily paused processes.

To restate: the only way I see that my design gets into trouble is when a
semaphore is HELD, not getting waited on, by one of the processes that gets
put onto the phantom runqueue, AND that semaphore is needed in the processing
of elf_core_dump(...).

For this to happen, that semaphore would have to be held across calls to
schedule(). The ONLY place I've seen that in the kernel is set_cpus_allowed +
migration_thread.
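
The failure mode described above can be sketched in userspace. This is an
analogy under stated assumptions, not kernel code and not the patch itself: a
pthread mutex stands in for the kernel semaphore, a condition-variable wait
stands in for being parked on the phantom runqueue, and every name here
(struct sim, worker, dumper_can_take_sem) is hypothetical. The "dumper" uses
a trylock so the sketch can report the problem instead of hanging the way a
blocking acquire would.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* All names are illustrative; nothing here exists in the kernel. */
struct sim {
    pthread_mutex_t sem;     /* stands in for a semaphore the dump path needs */
    pthread_mutex_t lock;    /* protects the two flags below */
    pthread_cond_t  cond;
    bool worker_holds_sem;   /* scenario selector, set before the thread starts */
    bool suspended;          /* worker has reached its "paused" point */
    bool resume;             /* dump is finished, worker may continue */
};

static void *worker(void *arg)
{
    struct sim *s = arg;

    if (s->worker_holds_sem)
        pthread_mutex_lock(&s->sem);        /* HELD, not merely waited on */

    pthread_mutex_lock(&s->lock);
    s->suspended = true;
    pthread_cond_broadcast(&s->cond);
    while (!s->resume)                      /* parked until the dump ends */
        pthread_cond_wait(&s->cond, &s->lock);
    pthread_mutex_unlock(&s->lock);

    if (s->worker_holds_sem)
        pthread_mutex_unlock(&s->sem);      /* released only after resuming */
    return NULL;
}

/* Returns true if the dumper could acquire sem while the worker is paused. */
static bool dumper_can_take_sem(bool worker_holds_sem)
{
    struct sim s = { .worker_holds_sem = worker_holds_sem };
    pthread_t t;
    bool ok;
    int rc;

    pthread_mutex_init(&s.sem, NULL);
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);
    rc = pthread_create(&t, NULL, worker, &s);
    assert(rc == 0);

    /* Wait until the worker is "suspended". */
    pthread_mutex_lock(&s.lock);
    while (!s.suspended)
        pthread_cond_wait(&s.cond, &s.lock);
    pthread_mutex_unlock(&s.lock);

    /* A blocking lock here would never return if the worker holds sem. */
    ok = (pthread_mutex_trylock(&s.sem) == 0);
    if (ok)
        pthread_mutex_unlock(&s.sem);

    /* The threads start back up after the dump. */
    pthread_mutex_lock(&s.lock);
    s.resume = true;
    pthread_cond_broadcast(&s.cond);
    pthread_mutex_unlock(&s.lock);
    pthread_join(t, NULL);

    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    pthread_mutex_destroy(&s.sem);
    return ok;
}
```

Calling dumper_can_take_sem(false) returns true, since a paused thread that
holds nothing is harmless. Calling dumper_can_take_sem(true) returns false,
which is the one case where a blocking acquire in the dump path would wait on
a thread that cannot run until the dump completes.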

Can someone point me at other critical sections that have non-deterministic
lifetimes as a function of when the process holding the semaphore gets
scheduled onto a CPU? That type of code seems very risky to me. This is the
only type of code that could get my design into trouble.

--mgross
