Re: [SMP lock BUG?] Re: Feedback on preemptible kernel patch

Arjan Filius (iafilius@xs4all.nl)
Sun, 9 Sep 2001 00:18:46 +0200 (CEST)


Hello Roger,

On Sat, 8 Sep 2001, Roger Larsson wrote:

> Hi,
>
> This is interesting. [Assumes UP Athlon - correct]

UP Athlon, and compiled as UP (as always).
I haven't tested my system with an SMP kernel for a long while.

> Note that all BUGs out in highmem.h:95 (kmap_atomic)
> and that test is only on if you have enabled HIGHMEM_DEBUG
It seems to be on indeed.

> [my analyze is done with a 2.4.10-pre2 kernel, but I checked with
> later patches and I do not think they fix it either...]
>
> The preemptive kernel puts more SMP stress on the kernel than
> running with multiple CPUs.
>
> So this might be a potential bug in the kernel proper, running with
> a SMP computer.

>
> If I understand the bug correctly, a process gets a page fault.
> Starts to map in the page. But before the final part it checks -
> and the page is already there!!! Correct?

ehh.. Should compiling SMP on UP (just for test) trigger this?

Greatings,

>
> On Saturday den 8 September 2001 19:33, Arjan Filius wrote:
> > Hello Robert,
> >
> >
> > I tried 2.4.10-pre4 with patch-rml-2.4.10-pre4-preempt-kernel-1.
> > But it seems to hit highmem (see below) (i do have 1.5GB ram)
> > 2.4.10-pre4 plain runs just fine.
> >
> > With the kernel option mem=850M the patched kernel boots an seems to run
> > fine. However i didn't do any stress testing yet, but i still notice
> > hickups while playing mp3 files at -10 nice level with mpg123 on a 1.1GHz
> > Athlon, and removing for example a _large_ file (reiser-on-lvm).
> >
> > My syslog output with highmem:
> >
> > Sep 8 18:10:16 sjoerd kernel: kernel BUG at
> > /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95! Sep 8 18:10:16 sjoerd
> > kernel: invalid operand: 0000
> > Sep 8 18:10:16 sjoerd kernel: CPU: 0
> > Sep 8 18:10:16 sjoerd kernel: EIP: 0010:[do_wp_page+636/1088]
> > [- - -]
> > sjoerd kernel: Call Trace: [handle_mm_fault+141/224]
> > [do_page_fault+375/1136] [do_page_fault+0/1136] [__mmdrop+58/64]
> > [do_exit+595/640] Sep 8 18:10:16 sjoerd kernel: [error_code+52/64]
>
> Lets look at this example. You need to add some inline functions...
>
> handle_mm_fault
> takes the mm->page_table_lock [this should prevent reschedules]
> allocs pmd
> allocs pte
> handle_pte_fault(...)
> handle_pte_fault [inline, most likely path]
> pte is present
> it is a write access
> but the pte is not writeable - call do_wp_page
> do_wp_page
> plays some games with the lock...
> finally calls copy_cow_page [inline] with the page_table_lock
> UNLOCKED!
> copy_cow_page
> calls clear_user_highpage or copy_user_highpage
> both clear_user_highpage and copy_user_highpage
> calls kmap_atomic
> kmap_atomic
> page is a highmem page
> but during the time this process was unlocked some other
> thread has allocated the page in question... BUG out.
>
> So somewere between the UNLOCK (might be a lot later) and the
> BUG test in kmap_atomic the process running in kernel got preempted.
> (most likely during the page copy since it will take some time)
>
> Another process (thread) started to run - hit the same page fault
> but succeeded in its alloc.
>
> Back to the first process it continues, finally checks - the page
> is there... and BUGS.
>
> Note that this can happen in a pure SMP kernel.
>
> But let the processes (threads) run on two CPUs. And let the
> first get an interrupt/bh after unlock - the other can pass
> and add the page before the first one can continue - same
> result!
>
> /RogerL
>
>

-- 
Arjan Filius
mailto:iafilius@xs4all.nl

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/