Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait/notify + callback chains

Christoph Hellwig (hch@caldera.de)
Thu, 1 Feb 2001 19:48:00 +0100


On Thu, Feb 01, 2001 at 06:25:16PM +0000, Alan Cox wrote:
> > array_len, io_count, the presence of wait_queue AND end_io, and the lack of
> > scatter gather in one kiobuf struct (you always need an array), and AFAICS
> > that is what the networking guys dislike.
>
> You need a completion pointer. It's arguable whether you want the wait_queue
> in the default structure or as part of whatever its contained in and handled
> by the completion pointer.

I personally think that Ben's function-pointer-on-wakeup work is the
alternative in this area.

> And I've actually bothered to talk to the networking people and they don't have
> a problem with the completion pointer.

I have never said that they don't like it - but having both the waitqueue and the
completion handler in the kiobuf makes it bigger.
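
To make the size argument concrete, here is a minimal user-space sketch of
the layout under discussion: the core buffer carries only a completion
pointer, and a caller that wants to sleep embeds the buffer in its own
structure. All names (io_buf, io_waiter, io_complete) are hypothetical and
are not taken from the kiobuf patches; the "done" flag merely stands in for
a real wait queue.

```c
#include <assert.h>
#include <stddef.h>

struct io_buf;
typedef void (*io_end_fn)(struct io_buf *buf);

/* Hypothetical core buffer: a completion pointer only, no wait queue. */
struct io_buf {
    io_end_fn end_io;   /* called once when the I/O has finished */
    void *private;      /* owner's context, interpreted by end_io */
};

/* Hypothetical container for callers that want to block on completion. */
struct io_waiter {
    struct io_buf buf;
    int done;           /* stand-in for a wait queue in this sketch */
};

/* Completion handler that "wakes" the waiter that owns the buffer. */
static void waiter_end_io(struct io_buf *buf)
{
    struct io_waiter *w = buf->private;
    w->done = 1;
}

static void io_waiter_init(struct io_waiter *w)
{
    assert(w != NULL);
    w->buf.end_io = waiter_end_io;
    w->buf.private = w;
    w->done = 0;
}

/* What the lower layer would call when the I/O completes. */
static void io_complete(struct io_buf *buf)
{
    if (buf->end_io)
        buf->end_io(buf);
}
```

Only users that actually sleep pay for the waiting state; everyone else
passes around the smaller io_buf.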

> > Now one could say: just let the networkers use their own kind of buffers
> > (and that's exactly what is done in the zerocopy patches), but that again leads
> > to inefficient buffer passing and ungeneric IO handling.
>
> Careful. This is the line of reasoning which also says
>
> Aeroplanes are good for travelling long distances
> Cars are better for getting to my front door
> Therefore everyone should drive a 747 home

Hehe ;)

> It is quite possible that the right thing to do is to do conversions in the
> cases it happens.

Yes, this would be THE alternative to my suggestion.

> That might seem a good reason for having offset/length
> pairs on each block, because streaming from the network to disk you may well
> get a collection of partial pages of data you need to write to disk.
> Unfortunately the reality of DMA support on almost (but not quite) all
> disk controllers is that you don't get that degree of scatter gather.
>
> My I2O controllers and I think the fusion controllers could indeed benefit
> and cope with being given a pile of randomly located 1480 byte chunks of
> data and being asked to put them on disk.

It doesn't really matter that much, because we write to the pagecache
first anyway.

The real thing is that we want to have some common data structure for
describing physical memory used for IO. We could either use special
structures in every subsystem and then copy between them or pass
struct page * and lose meta information. Or we could try to find a
structure that holds enough information to make passing it from one
subsystem to another useful. The cut-down kio design (heavily inspired
by Larry McVoy's splice paper) should allow just that, nothing more and
nothing less. For use in disk-io and networking or v4l there are probably
other primary data structures needed, and that's ok.
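
As a rough illustration only, such a common structure might look like the
user-space sketch below: a list of (page, offset, length) fragments and
nothing else. The names kio_frag, kio, kio_add_frag, kio_total_len and the
KIO_MAX_FRAGS limit are assumptions made for this example, not the actual
proposed layout; struct page is left opaque so the code builds outside the
kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-in for the kernel's struct page. */
struct page;

/* Hypothetical fragment: one piece of physical memory used for I/O. */
struct kio_frag {
    struct page *page;
    unsigned int offset;    /* byte offset into the page */
    unsigned int length;    /* bytes used, starting at offset */
};

#define KIO_MAX_FRAGS 16    /* arbitrary limit for this sketch */

/* Hypothetical cut-down kio: fragments only, no wait queue, no end_io. */
struct kio {
    unsigned int nr_frags;
    struct kio_frag frags[KIO_MAX_FRAGS];
};

/* Append one fragment; returns 0 on success, -1 if the kio is full. */
static int kio_add_frag(struct kio *kio, struct page *page,
                        unsigned int offset, unsigned int length)
{
    assert(kio != NULL);
    if (kio->nr_frags >= KIO_MAX_FRAGS)
        return -1;
    kio->frags[kio->nr_frags].page = page;
    kio->frags[kio->nr_frags].offset = offset;
    kio->frags[kio->nr_frags].length = length;
    kio->nr_frags++;
    return 0;
}

/* Total number of bytes described by all fragments. */
static size_t kio_total_len(const struct kio *kio)
{
    size_t total = 0;
    unsigned int i;

    for (i = 0; i < kio->nr_frags; i++)
        total += kio->frags[i].length;
    return total;
}
```

The per-fragment offset and length are what let partial pages, such as
1480-byte network chunks, be described without copying.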

Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.