PROBLEM: memory allocation error with token ring tms380/abyss modules

Brian McEntire (brianm@fsg1.nws.noaa.gov)
Tue, 19 Jun 2001 17:47:10 -0400 (EDT)


[1.] memory allocation error with token ring tms380/abyss modules

[2.] a memory allocation error causes the system to go into an infinite
loop about once every week or two. This most recent time was 8 days, to
the hour from the last crash. Everything on the system stops working and I
need to hit the reset button to reboot the system. At crash time, the
following message scrolls up the screen:

__alloc_pages: 1-order allocation failed

* Actually, I did some searching on the web and found this problem
discussed but not fixed. According to one e-mail, I patched
mm/page_alloc.c and rebuilt the kernel so that I could get the following
_slightly_ more informative message after the crash:

__alloc_pages: 1-order allocation failed from c01290e8

The modification I made to mm/page_alloc.c is:
change the line:

printk(KERN_ERR "__alloc_pages: %lu-order allocation failed.\n", order);

to:

printk(KERN_ERR "__alloc_pages: %lu-order allocation failed from %p\n",
order, __builtin_return_address(0));

Then I'm supposed to be able to look up the hex code from the error
message in the /boot/System.map (it is the correct one for my kernel) and
find out what function is causing the problem.

But, I don't find c01290e8 in my System.map. Two hex addresses close to it
are found there:

c01290d4 T __get_free_pages
c01290f4 T get_zeroed_page

Yup, looks like it has something to do with memory allocation alright ;-)

Everytime this crash occurs, it is the same hex address given in the error
message. I can't cause this error to occur. The system is running as a
basic router and has two Netgear FA310TX ethernet cards in it and one
Madge Smart 16/4 PCI Ringnode Mk2 token ring card in it.

The system logs don't give any indication that the crash is coming or any
information about it, not even the 1-order allocation error is listed.
One exception (maybe):

I get the following messages showing up occasionally in /var/log/messages
on weekdays (when I expect network usage to be higher than on weekends).
But these don't seem to come in any higher frequency leading up to the
crash.

Jun 19 15:10:23 ohdrouter kernel: Cancel tx (C04D0068h).
Jun 19 15:10:23 ohdrouter kernel: Cancel tx (C04D0098h).
Jun 19 15:10:59 ohdrouter kernel: Cancel tx (C04D0068h).
Jun 19 15:10:59 ohdrouter kernel: Cancel tx (C04D0098h).
Jun 19 15:14:14 ohdrouter kernel: Cancel tx (C04D0068h).
Jun 19 15:14:14 ohdrouter kernel: Cancel tx (C04D0098h).
Jun 19 15:25:05 ohdrouter kernel: Cancel tx (C04D00C8h).

[3.] token ring, tms380, abyss, memory allocation failure, 1-order
allocation failed, __alloc_pages, kernel 2.4.5 and previous 2.4 kernels

[4.] Linux version 2.4.5 (root@ohdrouter.nws.noaa.gov) (gcc version
egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #6 Wed May 30 17:43:06
EDT 2001

[5.] N/A ... no Oops

[6.] N/A ... can't force it to happen, just have to wait a week or so

[7.] Red Hat Linux release 6.2 (Zoot)

[7.1] output from sh scripts/ver_linux

Gnu C egcs-2.91.66
Gnu make 3.78.1
binutils 2.9.5.0.22
util-linux 2.10f
mount 2.10r
modutils 2.4.5
e2fsprogs 1.18
pcmcia-cs 3.1.8
PPP 2.3.11
Linux C Library 2.1.3
Dynamic linker (ldd) 2.1.3
Procps 2.0.6
Net-tools 1.54
Console-tools 0.3.3
Sh-utils 2.0
Modules Loaded ipchains abyss tms380tr tulip

[7.2]
[root@ohdrouter linux]# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 3
model name : Pentium II (Klamath)
stepping : 4
cpu MHz : 298.737
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov
mmx
bogomips : 596.37

[7.3]
[root@ohdrouter linux]# cat /proc/modules
ipchains 32320 0 (unused)
abyss 2960 1 (autoclean)
tms380tr 43216 0 (autoclean) [abyss]
tulip 39328 2 (autoclean)

[7.4]
[root@ohdrouter linux]# cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
0cf8-0cff : PCI conf1
4000-403f : Intel Corporation 82371AB PIIX4 ACPI
5000-501f : Intel Corporation 82371AB PIIX4 ACPI
d000-dfff : PCI Bus #01
e000-e01f : Intel Corporation 82371AB PIIX4 USB
e400-e4ff : Lite-On Communications Inc LNE100TX
e400-e4ff : tulip
e800-e8ff : Lite-On Communications Inc LNE100TX (#2)
e800-e8ff : tulip
ec00-ecff : Madge Networks Smart 16/4 PCI Ringnode Mk2
ec00-ec3f : tr0
f000-f00f : Intel Corporation 82371AB PIIX4 IDE

[root@ohdrouter linux]# cat /proc/iomem
00000000-0009ffff : System RAM
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-03ffffff : System RAM
00100000-001e11cf : Kernel code
001e11d0-00232c2b : Kernel data
e0000000-e3ffffff : Intel Corporation 440LX/EX - 82443LX/EX Host bridge
e4000000-e5ffffff : PCI Bus #01
e4000000-e4ffffff : NVidia / SGS Thomson (Joint Venture) Riva128
e6000000-e6ffffff : PCI Bus #01
e6000000-e6ffffff : NVidia / SGS Thomson (Joint Venture) Riva128
ea000000-ea0000ff : Lite-On Communications Inc LNE100TX
ea000000-ea0000ff : tulip
ea001000-ea0010ff : Madge Networks Smart 16/4 PCI Ringnode Mk2
ea002000-ea0020ff : Lite-On Communications Inc LNE100TX (#2)
ea002000-ea0020ff : tulip
ffff0000-ffffffff : reserved

[7.5]
[root@ohdrouter linux]# cat lspci -vvv
00:00.0 Host bridge: Intel Corporation 440LX/EX - 82443LX/EX Host bridge
(rev 03)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ >SERR- <PERR-
Latency: 64 set
Region 0: Memory at e0000000 (32-bit, prefetchable) [size=64M]
Capabilities: [a0] AGP version 1.0
Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>

00:01.0 PCI bridge: Intel Corporation 440LX/EX - 82443LX/EX AGP bridge
(rev 03) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64 set
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: e4000000-e5ffffff
Prefetchable memory behind bridge: e6000000-e6ffffff
BridgeCtl: Parity+ SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-

00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0 set

00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
(prog-if 80 [Master])
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64 set
Region 4: I/O ports at f000 [size=16]

00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
(prog-if 00 [UHCI])
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64 set
Interrupt: pin D routed to IRQ 10
Region 4: I/O ports at e000 [size=32]

00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 01)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin ? routed to IRQ 9

00:09.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 20)
Subsystem: Netgear FA310TX
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64 set
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at e400 [size=256]
Region 1: Memory at ea000000 (32-bit, non-prefetchable) [size=256]
Expansion ROM at e7000000 [disabled] [size=256K]

00:0a.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 20)
Subsystem: Netgear FA310TX
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64 set
Interrupt: pin A routed to IRQ 9
Region 0: I/O ports at e800 [size=256]
Region 1: Memory at ea002000 (32-bit, non-prefetchable) [size=256]
Expansion ROM at e8000000 [disabled] [size=256K]

00:0b.0 Token ring network controller: Madge Networks Smart 16/4 PCI
Ringnode Mk2
Subsystem: Madge Networks Smart 16/4 PCI Ringnode Mk2
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64 set, cache line size 08
Interrupt: pin A routed to IRQ 12
Region 0: I/O ports at ec00 [size=256]
Region 1: Memory at ea001000 (32-bit, non-prefetchable) [size=256]
Expansion ROM at e9000000 [disabled] [size=1M]

01:00.0 VGA compatible controller: NVidia / SGS Thomson (Joint Venture)
Riva128 (rev 21) (prog-if 00 [VGA])
Subsystem: STB Systems Inc STB Velocity 128
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 3 min, 1 max, 64 set
Interrupt: pin A routed to IRQ 11
Region 0: Memory at e4000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at e6000000 (32-bit, prefetchable) [size=16M]
Expansion ROM at e5000000 [disabled] [size=4M]
Capabilities: [44] AGP version 1.0
Status: RQ=4 SBA- 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>

[7.6] N/A ... no SCSI on this system

[7.7] Its pretty standard PC hardware with the exception of the token ring
card which I know aren't nearly as widely used as ethernet cards.

[root@ohdrouter linux]# free
total used free shared buffers cached
Mem: 62604 60068 2536 0 6120 47944
-/+ buffers/cache: 6004 56600
Swap: 136512 7280 129232

[7.8] No ideas on patches or work arounds other than the mm/page_alloc.c
patch mentioned above.

* * *

Please help! =)

We're only using this "router" in testing at this point but would like to
roll it into our department's network. I won't do that as long as it is
crashing once a week.

If you need more information, I'll be happy to provide it let me know if I
can help.

Thanks,
Brian

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/