I'm experimenting with the lock_buffer() in ll_rw_block for one month
(not only for the sync writes like in Andrew's patch, but for everything
including reads), and I believe this is what triggered a deadlocks on my
most stressed machine this night.
I attached below the best debug info I collected so far (I've the full
SYSRQ+T too of course but the below should make it easier to focus on
the right place).
I'm going to backout that patch
(http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.21pre4aa3/00_ll_rw_block-sync-race-1)
until this issue is resolved.
One simple case where it could introduce a deadlock is if a locked
buffer visible to the rest of the fs/vm isn't immediatly inserted in the
I/O queue, if two tasks runs that code at the same time they can
deadlock against each other. Another possibility is that the journal is
running a ll_rw_block on a locked buffer not yet in the I/O queue, and
it pretends not to deadlock for whatever reason. In all cases I believe
this is an issue in ext3 that would better be fixed, but it isn't
necessairly a bug with the 00_ll_rw_block patch backed out. Note: it
maybe something else too in my tree that deadlocked the box, but I
really doubt it could be something else, also see the wait_on_buffer
from ll_rw_block in the first line that makes it quite obvious that the
new behaviour is happening during the deadlock.
any help from the ext3 developers and everybody involved in this thread
(the 2.0, 2.2, 2.4 fsync races) is welcome, thanks!
(of course the vfs down shouldn't matter, the wait_on_buffers are
interesting here, but since they were in D state I didn't cut them out)
mutt D 00000292 0 14627 24890 (NOTLB)
Call Trace: [__wait_on_buffer+86/144] [ext3_free_inode+391/1248] [ll_rw_block+309/464] [ext3_find_entry+828/864] [ext3_lookup+65/208]
[lookup_hash+194/288] [sys_unlink+139/288] [system_call+51/56]
tee D ECD560C0 0 111 11445 110 (NOTLB)
Call Trace: [journal_dirty_metadata+420/608] [__wait_on_buffer+86/144] [__cleanup_transaction+362/368] [journal_cancel_revoke+91/448] [log_do_checkpoint+299/448]
[n_tty_receive_buf+400/1808] [journal_dirty_metadata+420/608] [n_tty_receive_buf+400/1808] [journal_dirty_metadata+420/608] [ext3_do_update_inode+375/992] [__jbd_kmalloc+35/176]
[log_wait_for_space+130/144] [start_this_handle+191/864] [new_handle+75/112] [journal_start+165/208] [ext3_prepare_write+567/576] [write_chan+335/512]
[generic_file_write_nolock+1023/2032] [pipe_wait+125/160] [generic_file_write+77/112] [ext3_file_write+57/224] [sys_write+163/336] [system_call+51/56]
python D 00000180 0 21334 112 21336 21335 (NOTLB)
Call Trace: [start_request+409/544] [__down+105/192] [__down_failed+8/12] [.text.lock.checkpoint+15/173] [start_this_handle+191/864]
[new_handle+75/112] [journal_start+165/208] [ext3_lookup+0/208] [ext3_create+300/320] [vfs_create+153/304] [open_namei+1301/1456]
[filp_open+62/112] [sys_open+83/192] [system_call+51/56]
python D E2EB302C 2388 21335 112 21337 21334 (NOTLB)
Call Trace: [dput+33/320] [__down+105/192] [__down_failed+8/12] [.text.lock.namei+265/988] [filp_open+62/112]
[sys_open+83/192] [system_call+51/56]
procmail D E51AC016 0 21378 1 19102 (NOTLB)
Call Trace: [dput+33/320] [__down+105/192] [__down_failed+8/12] [.text.lock.namei+265/988] [filp_open+62/112]
[sys_open+83/192] [system_call+51/56]
python D C01186FF 0 21470 107 21471 (NOTLB)
Call Trace: [do_page_fault+463/1530] [__down+105/192] [__jbd_kmalloc+35/176] [__down_failed+8/12] [.text.lock.checkpoint+15/173]
[start_this_handle+191/864] [new_handle+75/112] [journal_start+165/208] [do_wp_page+416/1088] [ext3_dirty_inode+336/368] [__mark_inode_dirty+195/208]
[update_inode_times+39/64] [generic_file_write_nolock+630/2032] [generic_file_write+77/112] [ext3_file_write+57/224] [sys_write+163/336] [system_call+51/56]
python D C01186FF 0 21471 107 21470 (NOTLB)
Call Trace: [do_page_fault+463/1530] [__down+105/192] [__jbd_kmalloc+35/176] [__down_failed+8/12] [.text.lock.checkpoint+15/173]
[start_this_handle+191/864] [new_handle+75/112] [journal_start+165/208] [do_wp_page+416/1088] [ext3_dirty_inode+336/368] [__mark_inode_dirty+195/208]
[update_inode_times+39/64] [generic_file_write_nolock+630/2032] [generic_file_write+77/112] [ext3_file_write+57/224] [sys_write+163/336] [system_call+51/56]
python D 000001E0 0 21732 21731 (NOTLB)
Call Trace: [bread+32/160] [__down+105/192] [__jbd_kmalloc+35/176] [__down_failed+8/12] [.text.lock.checkpoint+15/173]
[start_this_handle+191/864] [new_handle+75/112] [journal_start+165/208] [dput+33/320] [ext3_unlink+440/448] [vfs_unlink+233/480]
[sys_unlink+183/288] [system_call+51/56]
procmail D D0E13016 0 21914 21913 (NOTLB)
Call Trace: [dput+33/320] [__down+105/192] [__down_failed+8/12] [.text.lock.namei+265/988] [filp_open+62/112]
[sys_open+83/192] [system_call+51/56]
procmail D E39F3016 0 21999 21998 (NOTLB)
Call Trace: [dput+33/320] [__down+105/192] [__down_failed+8/12] [.text.lock.namei+265/988] [filp_open+62/112]
[sys_open+83/192] [system_call+51/56]
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/