Re: [PATCH] 2.5.8 IDE 36

Peter T. Breuer (ptb@it.uc3m.es)
Fri, 19 Apr 2002 19:17:50 +0200 (MET DST)


(mutters under breath .. let me put this back in the thread it came
from. Sorry. Double post for the records, with the right subject line
this time ...)

> On Tue, 16 Apr 2002, Alan Cox wrote:
> > We need to support partitioning on loopback devices in that case.
>
> No, you just need to do the loopback over nbd - you need something to do
> the byte swapping anyway (ie you can't really use the normal "loop"
> device: I really just meant the more generic "loop the data back"
> approach).
>
> nbd devices already do partitioning, I'm fairly certain.

Well, sort of. They make the right motions in the code, waving
their arms about in the right directions. It's doubtful if what's
there is enough.

> (But no, I've never tested it, obviously).

Neither have the authors.

> Btw, while I'm at it - who out there actually uses the new "enbd"
> (Enhanced NBD)? I have this feeling that that would be the better choice,
> since unlike plain nbd it should be deadlock-free on localhost (ie you
> don't need a remote machine).

No - it will deadlock the same way. The deadlock is not related to the
device driver itself. It goes like this.

kernel runs low on memory
kernel flushes buffers to device drivers under pressure
nbd client daemon sends to the net (potential conflict with tcp buffers,
but not the mechanism that hurts)
server on the _same machine_ receives request over the net
server tries to write request to disk
server process needs buffers to write to
bang (or whatever deadlock sounds like)

The enbd client has a special "-a" flag which is meant to be used in
cases where we trust the transport medium. Localhost is such a case.

With it, the protocol in the client daemon will ack the kernel _before_
it gets an ack from the remote server. That should help relieve the
deadlock. Things then go like this:

kernel runs low on memory
kernel flushes buffers to device drivers under pressure
nbd client daemon sends to the net
* client acks kernel and releases buffers in kernel
server on the _same machine_ receives request over the net
server tries to write request to disk
server process needs buffers to write to
* server gets buffers released by client in kernel

At least, potentially. If I recall right, there's still a deadlock
window in-kernel, but it's small. I don't recall the details. Oh -
well, the request still hangs around in the driver until the client
daemon acks .. maybe the client daemon can't ack without being swapped
in first, and that'd be deadlock.

Well, there's a "-s" flag (for swap devices) that does an mlockall()
that might take care of that. So "-a -s" might do it. Really needs
a "-aa" option (release write request in kernel asap).

However, a real cure would be not to use the network transport at all in
enbd. Enbd can use different transports. Instead of using the stub for
the server in the client daemon code, one could use the "fileserver"
object directly. It'd just be a question of changing the
initialization. Both have the same generic interface.

Incidentally, the newer VM code in 2.4 is a real problem for enbd.
I would ideally like _no_ _buffering_ whatsoever on the clientside.

As it is, it seems almost mandatory to raise the bdflush sync
boundary as high as possible, and to drop the async boundary really
low. One wants to start sending buffers out across the net
asynchronously as early as possible, and one never wants to go
sync. Going sync makes for a deadlock where we need to send
a request across the net in order to free a buffer, but we can't,
because we need some buffers to do that with ...

* Is there any way of turning off the VM for a single device? *

The 2.5 i/o changes, otoh, seem immensely favourable to enbd, since
it's multithreaded and asyncronous in-kernel.

Peter

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/