See the following jpg of top as an example of the system usage. It
doesn't seem to be any one program.
http://www.wencor.com/slow2.4.20.jpg
When we start to have users log off the server (we have 300 telnet users
that login) the system usually bounces right back to normal. We have
had to reboot once or twice to get it fully working again (lpd went into
limbo and wouldn't come back). After the server bounces back to normal,
we can run the rest of the day without any trouble and under full heavy
load. I have never seen it happen at any other time of day and it
doesn't happen every day.
With some tips from James Cleverdon (IBM), I turned on some kernel
debugging and got the following from readprofile when the server was
having problems (truncated to the first 22 lines):
16480 total 0.0138
6383 .text.lock.swap 110.0517
4689 .text.lock.vmscan 28.2470
4486 shrink_cache 4.6729
168 rw_swap_page_base 0.6176
124 prune_icache 0.5167
81 statm_pgd_range 0.1534
51 .text.lock.inode 0.0966
38 system_call 0.6786
31 .text.lock.tty_io 0.0951
31 .text.lock.locks 0.1435
18 .text.lock.sched 0.0373
16 _stext 0.2000
15 fput 0.0586
11 .text.lock.read_write 0.0924
9 strnicmp 0.0703
9 do_wp_page 0.0110
9 do_page_fault 0.0066
9 .text.lock.namei 0.0073
9 .text.lock.fcntl 0.0714
8 sys_read 0.0294
Here is a snapshot when the server is fine, no problems (truncated):
1715833 total 1.4317
1677712 default_idle 26214.2500
4355 system_call 77.7679
2654 file_read_actor 11.0583
2159 bounce_end_io_read 5.8668
1752 put_filp 18.2500
1664 do_page_fault 1.2137
1294 fget 20.2188
1246 do_wp_page 1.5270
1233 fput 4.8164
1138 posix_lock_file 0.7903
1120 kmem_cache_alloc 3.6842
1098 do_softirq 4.9018
1042 statm_pgd_range 1.9735
882 kfree 6.1250
732 __loop_delay 15.2500
673 flush_tlb_mm 6.0089
610 fcntl_setlk64 1.3616
554 __kill_fasync 4.9464
498 zap_page_range 0.4716
414 do_generic_file_read 0.3696
409 __free_pages 8.5208
401 sys_semop 0.3530
I have to admit that most of this doesn't make a lot of sense to me and
I don't know what the .text.lock.* processes are doing. Any ideas?
Anything I can try?
Chris Wood
Wencor West, Inc.
-----------------------------------
System Info From Here Down:
IBM x440 - Dual Xeon 1.4ghz MP, with Hyperthreading turned on
6 gig RAM
2 internal 36gig drives mirrored
1 additional intel e1000 network card
2 IBM fibre adapters (QLA2300s) connected to a FastT700 SAN
RedHat Advanced Server 2.1
2.4.20 kernel built using the RH 2.4.9e8summit .config file as template
These things are listed below (hopefully this isn't overkill):
x440:/proc$ cat modules (see results below)
x440:/proc$ cat scsi/scsi (see results below)
x440:/proc$ cat cpuinfo (see results below)
x440:/proc$ cat ioports (see results below)
x440:/proc$ cat iomem (see results below)
x440:/proc$ cat modules
autofs 11876 0 (autoclean) (unused)
e1000 59280 1
bcm5700 95076 1
ipchains 50728 28
usb-uhci 26724 0 (unused)
usbcore 76448 1 [usb-uhci]
ext3 69888 7
jbd 51808 7 [ext3]
qla2300 236608 2
ips 45184 6
aic7xxx 133376 0
sd_mod 13020 16
scsi_mod 121304 4 [qla2300 ips aic7xxx sd_mod]
x440:/proc$ cat scsi/scsi
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: SERVERAID Rev: 1.00
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 15 Lun: 00
Vendor: IBM Model: SERVERAID Rev: 1.00
Type: Processor ANSI SCSI revision: 02
Host: scsi2 Channel: 01 Id: 09 Lun: 00
Vendor: IBM Model: GNHv1 S2 Rev: 0
Type: Processor ANSI SCSI revision: 02
Host: scsi3 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: 1742 Rev: 0520
Type: Direct-Access ANSI SCSI revision: 03
x440:/proc$ cat cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Xeon(TM) CPU 1.40GHz
stepping : 1
cpu MHz : 1397.190
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 2785.28
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Xeon(TM) CPU 1.40GHz
stepping : 1
cpu MHz : 1397.190
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 2791.83
processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Xeon(TM) CPU 1.40GHz
stepping : 1
cpu MHz : 1397.190
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 2791.83
processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Xeon(TM) CPU 1.40GHz
stepping : 1
cpu MHz : 1397.190
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 2791.83
x440:/proc$ cat ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
03c0-03df : vga+
03f6-03f6 : ide0
0440-044f : VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
0700-070f : VIA Technologies, Inc. VT82C586B PIPC Bus Master IDE
0700-0707 : ide0
0708-070f : ide1
0cf8-0cff : PCI conf1
1800-187f : PCI device 1014:010f (IBM)
1880-189f : VIA Technologies, Inc. USB
1880-189f : usb-uhci
18a0-18bf : VIA Technologies, Inc. USB (#2)
18a0-18bf : usb-uhci
2000-20ff : Adaptec AIC-7899P U160/m
2100-21ff : Adaptec AIC-7899P U160/m (#2)
2800-28ff : QLogic Corp. QLA2300 64-bit FC-AL Adapter
2800-28fe : qla2300
4000-40ff : QLogic Corp. QLA2300 64-bit FC-AL Adapter (#2)
4000-40fe : qla2300
7000-701f : Intel Corp. 82544EI Gigabit Ethernet Controller
7000-701f : e1000
x440:/proc$ cat iomem
00000000-0009c7ff : System RAM
0009c800-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000c8000-000cffff : Extension ROM
000d0000-000d01ff : Extension ROM
000f0000-000fffff : System ROM
00100000-dffb707f : System RAM
00100000-0022b615 : Kernel code
0022b616-002a525f : Kernel data
dffb7080-dffbf7ff : ACPI Tables
dffbf800-dfffffff : reserved
e0000000-e7ffffff : S3 Inc. Savage 4
e8400000-e8401fff : IBM Netfinity ServeRAID controller
e8400000-e8401fff : ips
f0c20000-f0c3ffff : Intel Corp. 82544EI Gigabit Ethernet Controller
f0c20000-f0c3ffff : e1000
f0c40000-f0c5ffff : Intel Corp. 82544EI Gigabit Ethernet Controller
f0c40000-f0c5ffff : e1000
f1000000-f11fffff : PCI device 1014:010f (IBM)
f1200000-f127ffff : S3 Inc. Savage 4
f1600000-f160ffff : Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet
f1600000-f160ffff : bcm5700
f1610000-f1610fff : Adaptec AIC-7899P U160/m
f1610000-f1610fff : aic7xxx
f1611000-f1611fff : Adaptec AIC-7899P U160/m (#2)
f1611000-f1611fff : aic7xxx
f1820000-f1820fff : QLogic Corp. QLA2300 64-bit FC-AL Adapter
f1920000-f1920fff : QLogic Corp. QLA2300 64-bit FC-AL Adapter (#2)
fec00000-ffffffff : reserved
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/