Re: Copying to loop device hangs up everything

Andrea Arcangeli (andrea@suse.de)
Thu, 20 Dec 2001 12:27:35 +0100


On Wed, Dec 19, 2001 at 11:41:47PM -0800, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> >
> > ...
> > > The thing I don't like about the Andrea+Momchil approach is that it
> > > exposes the risk of flooding the machine with dirty data. A scheme
> >
> > it doesn't, balance_dirty() has to work only at the high level.
> > sync_page_buffers also is no problem, we'll try again later in those
> > GFP_NOIO allocations.
>
> Not so. The loop thread *copies* the data. We must throttle it,
> otherwise the loop thread gobbles all memory and the box dies. This
> is trivial to demonstrate.

The loop thread can generate as many dirty bh as it wants; as I said, it's
the higher layer that takes care of the write throttling, so we don't need
to stop inside the loop thread. While the loop thread writes, the higher
layer will throttle. It's as simple as that.

Also, the write throttling isn't about correctness: once there are too
many dirty bh, the VM will take care of flushing them. So whatever you
claim about balance_dirty() is the last of my worries; I just had reports
that loop write performance is greatly improved by my recent -aa async
flushing changes.

> > furthermore you don't even address the writepage from loop thread on the
> > loop queue.
>
> How can this deadlock? The only path to those buffers is
> via the page, and the page is locked.

loop -> writepage -> get_block -> ll_rw_block -> read metadata and queue
the reads into the loop queue -> wait_on_buffer -> deadlock

In general the loop thread must never call ll_rw_block on unknown
buffers.
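
To spell that chain out, here is a rough sketch of the recursion (the
metadata helper name is made up; ll_rw_block() and wait_on_buffer() are the
real 2.4 interfaces):

/* Sketch only, not the real writepage of any filesystem: the point is
 * the recursion back onto the loop device's own queue. */
static int writepage_over_loop(struct page *page)
{
	struct buffer_head *bh;

	/* get_block() discovers it needs an indirect/metadata block
	 * that is not in memory yet (hypothetical helper): */
	bh = metadata_bh_for(page);

	ll_rw_block(READ, 1, &bh);	/* the READ is queued on the loop device */
	wait_on_buffer(bh);		/* but only the loop thread can service
					   that queue, and it is the one
					   sleeping here: deadlock */
	return 0;
}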

> > The final fix should be in rc2aa1 that I will release in a jiffy. It
> > takes care now of both the VM and balance_dirty().
> >
> > this is the incremental fix against rc1aa1:
> >
>
> No. Your patch removes *all* loop thread throttling, it doesn't even start
> IO (thus removing the throttling which request starving would provide)
> and doesn't even wake up bdflush.
>
> If you set nfract to 70%, nfract_sync to 80% and do a big write, the
> machine falls into a VM coma within 15 seconds. The same happens
> with both my patches :-(
>
> And it's not legitimate to say "don't do that". If we can't survive
> those settings, we don't have a solution. We need to throttle writes
> *more*, not less.

We throttle, always, at every write submitted by the user: check
mark_buffer_dirty, and the pagecache write-side operations that call
balance_dirty() explicitly (blkdev on pagecache included, of course). That
works fine in practice. The next balance_dirty() will throttle what the
loop has written.
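
For reference, the throttling point I mean is roughly this (quoted from
memory; the real 2.4 code is in fs/buffer.c and may differ in detail):

void mark_buffer_dirty(struct buffer_head *bh)
{
	if (!atomic_set_buffer_dirty(bh)) {
		__mark_dirty(bh);	/* put the bh on the dirty list */
		balance_dirty();	/* block the *dirtier* if there is
					   too much dirty data around */
	}
}

Every normal write goes through this (or through an explicit
balance_dirty() on the pagecache side), so it's the process feeding the
loop device that blocks, not the loop thread.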

>
> I'll keep poking at it. If you have any more suggestions/patches,
> please toss them over...

I don't think other changes are needed, and it works fine on my systems:
all my 100mbyte email folders are on top of cryptoloop, so the fact that
I'm writing this means it works. I also tried a cp of /dev/zero onto the
cryptoloop and it seems fine as well.

The other deadlock report I had may be yet another problem; I believe the
loop is just fine both in terms of balance_dirty() and the VM.

If you really want to balance_dirty() at every loop write, we can, by
simply tracking which bh sit on top of the loop device, but I don't think
it's necessary. It could be a combination of PF_NOIO and a BH_ bitflag
that would trigger the balance_dirty(); the same fine-grained level could
be used by the VM as well. But again, I don't think it's needed: the
per-process PF_NOIO flag, which skips all the balance_dirty() calls and
forbids the loop thread from using I/O methods to release memory, should
be enough for a stable system that performs well.
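
As a rough sketch of what I mean by the per-process flag (assumed -aa
semantics; loop_thread_init() is a made-up name, only PF_NOIO and the
current->flags test matter):

/* The loop thread marks itself before entering its main loop, so its
 * allocations behave like GFP_NOIO and it is never throttled on the
 * dirty data it generates itself. */
static void loop_thread_init(void)
{
	current->flags |= PF_NOIO;
}

void balance_dirty(void)
{
	if (current->flags & PF_NOIO)
		return;		/* never block the loop thread here */
	/* ... normal dirty-buffer throttling for everyone else ... */
}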

Andrea