Kernel 2.4.0.
Brian,
linux-smp@vger is rather dead. You certainly won't get the
attention of the NFS developers there.
Something is calling schedule() with nfs_flushd_lock held. The same
CPU comes back into the NFS code on behalf of a different task,
hits the lock, and it's lights out: with no kernel preemption, the
spinning task never yields the CPU back to the lock holder.
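To make the failure mode concrete, here's an invented fragment showing
the shape of the bug - not real fs/nfs code:

	/*
	 * Illustration only.  Any blocking call under a spinlock has
	 * this problem: the next task to contend on the lock
	 * busy-waits, and since 2.4 has no kernel preemption, the
	 * sleeping lock holder can never get this CPU back.
	 */
	static spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;

	static void buggy(void)
	{
		spin_lock(&demo_lock);
		schedule();	/* or kmalloc(GFP_KERNEL), iput(), ... */
		spin_unlock(&demo_lock);
	}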
This is pretty hard to track down. One approach is
to put
show_stack((unsigned long *)p->thread.esp);
right at the end of kernel/sched.c:show_task(). When the deadlock
happens, type ALT-SYSRQ-T, pray like hell that the debug output
makes it to disk. Reboot, feed the logs into ksymoops, see which
task is sleeping within the NFS code.
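If it helps, the hack goes at the tail of show_task(), roughly like
this (placement from memory; on i386, show_stack() wants an
unsigned long *, hence the cast):

	/* kernel/sched.c, 2.4-era - sketch, line placement approximate */
	static void show_task(struct task_struct * p)
	{
		/* ... existing register and state dump ... */
		show_stack((unsigned long *)p->thread.esp);	/* added line */
	}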
The alternative is to read the code :)
There appear to be two places where the NFS client code can
deadlock:
nfs_reqlist_init()
{
	spin_lock(&nfs_flushd_lock);
	rpc_new_task->
		rpc_allocate->
			kmalloc(GFP_RPC)	(__GFP_WAIT is true)

inode_remove_flushd()
{
	spin_lock(&nfs_flushd_lock);
	iput(inode)->
		nfs_delete_inode->
			delete_inode->
				wait_on_inode
			truncate_inode_pages->
				truncate_list_pages->
					wait_on_page
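Reconstructing from the code the patch below touches, the tail of
inode_remove_flushd() currently looks roughly like this - note the
iput() under the lock:

	void inode_remove_flushd(struct inode *inode)
	{
		/* ... */
		spin_lock(&nfs_flushd_lock);
		/* ... walk the flush list to find the inode via q ... */
		if (*q) {
			*q = inode->u.nfs_i.hash_next;
			NFS_FLAGS(inode) &= ~NFS_INO_FLUSH;
			iput(inode);	/* can sleep in wait_on_inode/wait_on_page */
		}
	out:
		spin_unlock(&nfs_flushd_lock);
	}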
The latter is most likely the problem. Here's a patch - please
test. The inode_remove_flushd() change is correct. I'm not so
sure about the nfs_reqlist_init() change.
--- linux-2.4.0/fs/nfs/flushd.c	Sat Jun 24 15:39:46 2000
+++ linux-akpm/fs/nfs/flushd.c	Wed Jan 10 19:25:44 2001
@@ -55,7 +55,7 @@
 /*
  * Spinlock
  */
-spinlock_t nfs_flushd_lock = SPIN_LOCK_UNLOCKED;
+static spinlock_t nfs_flushd_lock = SPIN_LOCK_UNLOCKED;
 
 /*
  * Local function declarations.
@@ -71,6 +71,7 @@
 	int	status = 0;
 
 	dprintk("NFS: writecache_init\n");
+	task = rpc_new_task(server->client, NULL, RPC_TASK_ASYNC);
 	spin_lock(&nfs_flushd_lock);
 	cache = server->rw_requests;
 
@@ -79,7 +80,6 @@
 
 	/* Create the RPC task */
 	status = -ENOMEM;
-	task = rpc_new_task(server->client, NULL, RPC_TASK_ASYNC);
 	if (!task)
 		goto out_unlock;
 
@@ -195,7 +195,9 @@
 	if (*q) {
 		*q = inode->u.nfs_i.hash_next;
 		NFS_FLAGS(inode) &= ~NFS_INO_FLUSH;
+		spin_unlock(&nfs_flushd_lock);
 		iput(inode);
+		return;
 	}
 out:
 	spin_unlock(&nfs_flushd_lock);
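For the record, why I think this is safe: rpc_new_task() allocates
with __GFP_WAIT, so it has to happen before we take nfs_flushd_lock;
the NULL check can stay where it was. And in inode_remove_flushd(),
dropping the lock before iput() should be fine because the inode has
already been unhashed from the flush list at that point, so nobody
else can find it there.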
-