Actually, while refining the patch and integrating it, I audited it
more carefully and the above was wrong: we must ensure nobody is able to
acquire BH_Lock before BH_Launder is cleared. So we must also enforce the
ordering at the CPU level. This is the correct version:

	clear_bit(BH_Wait_IO, &bh->b_state);
	clear_bit(BH_Launder, &bh->b_state);
	/* nobody must acquire BH_Lock again before BH_Launder is clear */
	smp_mb__after_clear_bit();
	clear_bit(BH_Lock, &bh->b_state);
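
For anybody who wants to play with the ordering outside a kernel tree, here
is a rough userspace sketch using C11 atomics. Everything in it is an
assumption made for illustration, not kernel code: the bit numbers, struct
buffer_sketch and the helper names are hypothetical, and the seq_cst fence
only stands in for smp_mb__after_clear_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the real BH_* values live in the kernel headers. */
enum { BH_LOCK = 0, BH_LAUNDER = 1, BH_WAIT_IO = 2 };

/* Hypothetical stand-in for the b_state word of a buffer_head. */
struct buffer_sketch {
	atomic_ulong b_state;
};

/* Atomically clear one bit without implying any barrier. */
static void clear_bit_relaxed(int nr, atomic_ulong *addr)
{
	atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_relaxed);
}

/* Atomically set one bit and return whether it was already set. */
static bool test_and_set_bit_acquire(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or_explicit(addr, mask, memory_order_acquire) & mask) != 0;
}

/* Release the buffer: BH_Lock must be the last bit to go away. */
static void unlock_buffer_sketch(struct buffer_sketch *bh)
{
	/* The caller is expected to hold the lock bit. */
	assert(atomic_load(&bh->b_state) & (1UL << BH_LOCK));

	clear_bit_relaxed(BH_WAIT_IO, &bh->b_state);
	clear_bit_relaxed(BH_LAUNDER, &bh->b_state);
	/* Full fence: stands in for smp_mb__after_clear_bit() in this sketch. */
	atomic_thread_fence(memory_order_seq_cst);
	clear_bit_relaxed(BH_LOCK, &bh->b_state);
}
```

A locker in this sketch would call test_and_set_bit_acquire(BH_LOCK, ...) and
proceed only when it returns false; by then the launder bit is already clear.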

The race would have been nearly impossible to trigger during stress testing:
you'd need BH_Lock + GFP_NOFS + alloc_pages + shrink_cache + the
interesting page near the end of the lru + try_to_free_buffers +
sync_page_buffers + wait_on_buffer, all within a few cycles (irq handlers
could trigger it :). But with the huge userbase I don't exclude that
somebody could have been really hurt by it, and anyway it would still be a
common code bug even if it were not possible to reproduce it with
currently available hardware.

Note that the very same race can happen in mainline too, on
architectures where a clear_bit doesn't imply a strong CPU barrier.
So on paper the same kind of deadlock as yours could happen on mainline as
well, but there it is reduced to an SMP race and it would affect only
alpha, ppc and s390. So I'm glad my deadlock has been useful to fix
another SMP race condition at least :)

Andrea