Re: Current NFS issues (2.5.59)

Neil Brown (neilb@cse.unsw.edu.au)
Wed, 12 Feb 2003 09:32:07 +1100


On Tuesday February 11, david+cert@blue-labs.org wrote:
>
> No, no automount of any sort. Server 1 and server 2 share /home and
> apache virtuals back and forth, shell and web server. So they are
> mounted at boot.
>
> Server 1 is the shell server, 2 is the web server. When the shell
> server is restarted, all the clients that fetch other mounts off the web
> server get '1's for the df information in short order. There is some
> delay, not sure what the delay is for. During that delay,
> /nfsmountpoint access stalls on the clients. Unfortunately my own home
> directory comes off that mountpoint and the wonder coding of Raster
> causes multiple large explosions and instantaneous destruction of your
> graphical session. So I've lost a fair amount of NFS debug notes
> unexpectedly :S
>
> If I'm fast on the draw and run exportfs on server 2 quick enough, I
> manage to save my desktop before that timeout hits.

I think I would need a precise description of everything that is
mounted and exactly where. I don't know what use this would actually
be, but it is very hard to reason about this sort of thing in the
abstract. Maybe there will be something in the details that will ring
a bell.

> >>
> >
> >Can you capture the panic and send it to me please?
> >
>
> I plan to setup a notebook w/ serial console capture.

Thanks.

> >I think this might be a reiserfs problem. Someone else mentioned that
> >this started happening when they upgraded from an earlier 2.5 kernel.
> >If you can capture the NFS traffic
> > tcpdump -s 1500 -w /tmp/afile host $server and host $client
> >we could have a look at the directory cookies and see what is
> >happening.
> >
>
> Is it important to start the tcpdump before the mount is established?
> If I start the tcpdump after I've detected the looping, is that useful?
> There's a lot of NFS traffic :)

Starting the tcpdump once the looping has started would be fine.
However, your description of repeated rings makes it sound very much
like a directory cookie problem.

Could you run this program on the server:
--------------------------
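#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>

int main(void)
{
	DIR *dir;
	struct dirent *de;

	dir = opendir(".");
	if (dir == NULL) {
		perror("opendir");
		return 1;
	}

	/* d_off is the directory cookie; d_ino is the inode number */
	while ((de = readdir(dir)) != NULL)
		printf("%10lu %10lu %s\n",
		       (unsigned long)de->d_off,
		       (unsigned long)de->d_ino,
		       de->d_name);

	closedir(dir);
	return 0;
}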
----------------------------
Run it in the directory that is causing problems. The first column printed
is the cookie. If it ever repeats, you have simple proof that
reiserfs is doing the wrong thing, and you should report it to the
reiserfs team.
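
If the directory is large, spotting a repeat by eye is tedious. As a
rough sketch only (the function names count_repeated_cookies and
scan_dir_cookies are made up for this example, and it assumes d_off
fits in a long), something like the following collects the cookies,
sorts them, and prints any value that occurs more than once:

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for directory cookies (d_off values). */
static int cmp_cookie(const void *a, const void *b)
{
	long x = *(const long *)a, y = *(const long *)b;
	return (x > y) - (x < y);
}

/* Sort the cookies in place; count entries equal to their predecessor. */
size_t count_repeated_cookies(long *cookies, size_t n)
{
	size_t i, repeats = 0;

	assert(cookies != NULL || n == 0);
	if (n < 2)
		return 0;
	qsort(cookies, n, sizeof *cookies, cmp_cookie);
	for (i = 1; i < n; i++)
		if (cookies[i] == cookies[i - 1]) {
			printf("repeated cookie: %ld\n", cookies[i]);
			repeats++;
		}
	return repeats;
}

/* Read every entry of path; return the repeat count, or -1 on error. */
long scan_dir_cookies(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *de;
	long *cookies = NULL, *tmp, repeats;
	size_t n = 0, cap = 0, ncap;

	if (dir == NULL)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		if (n == cap) {		/* grow the array as needed */
			ncap = cap ? cap * 2 : 64;
			tmp = realloc(cookies, ncap * sizeof *tmp);
			if (tmp == NULL) {
				free(cookies);
				closedir(dir);
				return -1;
			}
			cookies = tmp;
			cap = ncap;
		}
		cookies[n++] = (long)de->d_off;
	}
	closedir(dir);
	repeats = (long)count_repeated_cookies(cookies, n);
	free(cookies);
	return repeats;
}
```

Calling scan_dir_cookies(".") from a small main() in the problem
directory should return 0 on a filesystem that hands out unique
cookies; any positive value is the number of repeats found.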

NeilBrown