Having swap on or off didn't seem to make any difference. I tested this
because I saw about 10 similar messages about bad swap pages (corrupted,
invalid, not found; I can't remember which) while booting pre5 the first
time. Unfortunately they scrolled off the console, and the system Oopsed
and then froze before I could record them.

The 5 oopses I sent happened over about 15 minutes, but I think their
frequency would depend heavily on VM pressure.

Nick
----- Original Message -----
From: "Linus Torvalds" <torvalds@transmeta.com>
Newsgroups: linux.dev.kernel
To: <s3293115@student.anu.edu.au>; "Hugh Dickins" <hugh@veritas.com>;
"Alexander Viro" <viro@math.psu.edu>
Sent: Sunday, September 09, 2001 4:50 AM
Subject: Re: 2.4.10-pre5: Bug in alloc_pages
> [ Background for Al Viro: there's an oops in at least 2.4.10-pre5 that
> triggers in mm/page_alloc.c, line 204 - which implies that we are
> trying to allocate a page off the free list that is already on one of
> the page lists. Which implies rather serious MM corruption ]
>
> In article <000701c13856$14e01fe0$0200a8c0@W2K> you write:
> >Here are a few Oopses which appeared in 2.4.10-pre5 (not in pre4). The first
> >2 appeared during the startup scripts and the next ones appeared over the
> >next 20 minutes or so. I'd be happy to try patches. Please CC me.
>
> Nick, can you do some more debugging for me? The bug is definitely real,
> there's no question about it - I have now seen it myself, but I don't
> seem to be able to reproduce it on my machines. It seems to happen quite
> early for you..
>
> What I'd ask you to do is:
>
> - can you verify that it is repeatable under pre5? Does it happen every
> time, or is it at least easy to trigger?
>
> - can you try to trigger it some more under pre4? In particular, pre5
> doesn't actually have any MM changes _at_all_, which makes me suspect
> that maybe something in pre5 just made it easier to trigger. This is
> also why I'd like to hear whether it is really repeatable in pre5: if
> it's not repeatable in pre5, maybe you ran pre4 for a longish time
> and just didn't happen to hit it..
>
> I know that the above kind of testing is rather nasty and boring
> (especially as you'd end up having to reboot multiple times), but it
> would really help.. Thanks.
>
> If it really happens only under pre5, and never under pre4, then that is
> very interesting indeed. As mentioned, the pre4->pre5 thing doesn't
> actually change any of the VM code itself, so then there's something
> else going on. The only thing I can imagine right now is:
>
> - the initbootdata handling changed a bit. Does the problem go away if
> you copy 'arch/i386/kernel/setup.c' from pre4 into the pre5 tree?
>
> - Al Viro's FS-layer changes somehow trigger this bug, possibly by
> freeing some inode early. I don't have any real reason to suspect the
> FS changes, except for the fact that with no MM changes, the FS is
> the only other thing that has changed and is fairly intimate with MM
> stuff.
>
> Most of the pre4->pre5 changes are in fact things that I know cannot
> matter, simply because I don't even have them compiled into my kernels.
> Things like bluetooth, ARM, sparc, minix, telephony, framebuffer etc.
> This is why it would be so interesting to make sure that it really _is_
> pre5 only, and never happens in pre4..
>
> Hugh, if it turns out to be possible to trigger on pre4 too, I'm still
> going to blame your swap changes. So please give them a double look just
> in case..
>
> Nick, I don't have any real patches for you to test yet (except the
> suggestion to reverse i386/kernel/setup.c if you can't re-create it on
> pre4), but I'd be very grateful for as much information as you can
> possibly gather.. Things like patterns to how the oopses happen etc.
>
> Thanks,
> Linus
>