I was searching Google for reports on the problem I'm seeing with our NFS server/clients and
found this thread. It looks somewhat similar (at least the resulting EIO is the same).
Parts of the old message:
>From: Mike Black (mblack@csihq.com)
>Date: Sep 05 2001
>I've been getting random NFS EIO errors for a few months but
>now it's repeatable.
>Trying to copy a large file from one 2.4.8 SMP box to another
>is consistently failing (at different offsets each time).
Our setup is like this :
Server:
RedHat 7.2 - kernel 2.4.9-13smp
nfs-utils-0.3.1-13.7.2.1
ext3 filesystem (73GB)
Clients:
ia32 client - RedHat 6.2 - kernel 2.2.19-6.2.7enterprise
mount-2.10r-0.6.x
alpha client - RedHat 6.2 - kernel 2.2.19 (vanilla)
mount-2.10r-5
ia64 client - RedHat 7.1 - kernel 2.4.3-12smp
mount-2.10r-5
I've seen the "Input/Output error" problem only on the Alpha and the IA64 clients, and it
occurs when making a static library (with 'ar'). The message looks like this:
ar: xxxxxx/libmpi.a: Input/output error
The mount points are mounted like this:
ia32 client:
huey:/export/home/mpitest /home/mpitest nfs rw,v3,rsize=8192,wsize=8192,addr=huey 0 0
alpha client:
huey:/export/home/mpitest /home/mpitest nfs rw,v3,rsize=8192,wsize=8192,addr=huey 0 0
ia64 client:
huey:/export/home/mpitest /home/mpitest nfs rw,v3,rsize=8192,wsize=8192,hard,udp,lock,addr=huey 0 0
I don't know why the "hard" and "lock" options don't appear on ia32 and alpha, but this might be
related to the /proc/mounts interface of the running kernel (these clients run 2.2.19, while
the ia64 client runs 2.4). The automount entry looks like this:
/home auto_home rsize=8192,wsize=8192
So according to the nfs man page, the "hard" option should be the default:
hard If an NFS file operation has a major timeout then report "server not
responding" on the console and continue retrying indefinitely. This
is the default.
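To compare what each client actually reports, the mount lines above can be checked mechanically. The following is only a rough sketch in Python: the helper names parse_mount_line and retry_mode are made up for illustration, and it assumes the usual six-field /proc/mounts layout shown above. Note that "unreported" only means the kernel did not print "hard" or "soft"; it does not prove which mode is in effect.

```python
def parse_mount_line(line):
    """Parse one /proc/mounts-style line.

    Returns (mount_point, set_of_options) for NFS entries, or None for
    anything else. Field order assumed: device, mount point, fs type,
    options, dump, pass.
    """
    fields = line.split()
    if len(fields) < 4 or fields[2] != "nfs":
        return None
    return fields[1], set(fields[3].split(","))


def retry_mode(options):
    """Classify the retry behaviour as the kernel reports it."""
    if "soft" in options:
        return "soft"
    if "hard" in options:
        return "hard"
    # Neither keyword printed: the default may apply, but the listing
    # alone cannot confirm it.
    return "unreported"


if __name__ == "__main__":
    # Sample lines copied from the client listings in this message.
    samples = {
        "ia32": "huey:/export/home/mpitest /home/mpitest nfs "
                "rw,v3,rsize=8192,wsize=8192,addr=huey 0 0",
        "ia64": "huey:/export/home/mpitest /home/mpitest nfs "
                "rw,v3,rsize=8192,wsize=8192,hard,udp,lock,addr=huey 0 0",
    }
    for client, line in samples.items():
        parsed = parse_mount_line(line)
        if parsed is not None:
            mount_point, options = parsed
            print(client, mount_point, retry_mode(options))
```

On a live client the same functions could be fed the lines of /proc/mounts instead of the copied samples.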
So what could be the problem here? Is it an NFS server bug, an NFS client bug, or an NFS/ext3 bug? We
used to run RedHat 7.0 on this server with the 2.2.19-enterprise kernel, nfs-utils-0.3.1-7, and
an ext2 filesystem. This problem did not occur back then.
Thanks,
--
Steffen Persvold     | Scalable Linux Systems | Try out the world's best
mailto:sp@scali.no   | http://www.scali.com   | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6     |      - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY     | >300MBytes/s and <4uS latency