Thankyou for using the development kernel and sharing your woes...
>
> Server 1 exports A, B, and C to server 2. Server 2 exports D and E back
> to server 1 and exports F and G to two other clients. Each of these
> (A-G) are distinctly different filesystem paths and not part of each other.
>
> 1. If server 1 is restarted, server 2 will invalidate (make all 'df'
> values '1') F and G. This requires an 'exportfs -vra' or similar on
> server 2 to fix the client 'df' values. The client doesn't need to do
> anything.
This has me completely mystified. If I understand correctly, an event
on server 1 causes a failure to commuinicate between server 2 and some
third party..
I can only imagine that as server 1 boots it does something to server
2. At the very least it sends a mount request for D and E. I'm not
sure how a mount request for D or E would affect F or G.
Are you using an automounter at all?
>
> 2. Repeated nfs system stops and starts (/etc/init.d/nfs restart) will
> eventually cause a kernel panic on server 2 (haven't tested on server
> 1). The number of restarts is variable.
Can you capture the panic and send it to me please?
>
> 3. Mount point F (/home/david) infrequently loops. ls -la /home/david
> will loop forever until all client memory is exhausted and the kernel
> kills it via OOM. ls -la /home/david/somefile or /home/david/somedir/
> works just fine as well as any sub directory under /home/david.
> Restarts of both systems refuse to fix things.
I think this might be a reiserfs problem. Someone else mentioned that
this started happening when they upgrade from an earlier 2.5 kernel.
If you can capture the NFS traffic
tcpdump -s 1500 -w /tmp/afile host $server and host $client
we could have a look at the directory cookies and see what is
happening.
>
> 4. Mounts infrequently get "permission denied" messages on the client
> with a " rpc.mountd: getfh failed: Operation not permitted" message on
> the server. This is fixable by restarting the nfs system on the server.
>
I've seen this, but it was fixed by the time 2.5.59 came out.
If/when it happens again, could you please check if the IP address of
the client in question is in
/proc/net/rpc/auth.unix.ip/content
and if the name found there is in
/proc/fs/nfs/exports
next to the appropriate filesystem.
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/