Re: LANANA: To Pending Device Number Registrants

Linus Torvalds (torvalds@transmeta.com)
Tue, 15 May 2001 11:04:27 -0700 (PDT)


On Tue, 15 May 2001, James Simmons wrote:
> >
> > And if write() has too much overhead - we'd better fix _that_, because
> > it's much more likely hotspot than ioctl ever will be.
>
> I would use write except we use write to draw into the framebuffer. If I
> write to the framebuffer with that data the only thing that will happen is
> I will get pretty colors on my screen.

Note that this was the same argument that the USB people had, and it was
wrong then. It's wrong now.

The USB people decided on using ioctl's, because the way USB works you
send a packet down a "USB pipe", which is identified by the direction, the
device number and the type (and other details). So what the USB system
does to expose this to user land is very similar to what you propose for
ioctl's: a structured ioctl that has a "data" field.

What Al is saying, and what makes perfect sense is that you generate a
separate fd for each "pipe". It's even more obvious in the case of USB,
because, by golly, the things are actually _called_ "pipes" in the USB
documentation, which should have made people make the immediate
association. Instead of doing

fd = open("unstructured-name" ...);
ioctl(fd, MAGICIOCTL, { structured data });

you do

fd = open("/structured/name", ...);
write(fd, data, size);

or possibly you take a more socket-like approach and do

fd = socket(part-of-the-structure);
bind(fd, more-of-the-structure)
connect(fd, last-part-of-the-structure);

and use write() there (or use "sendto()" etc which allow more dynamic
structure constructs - you don't have to statically bind the fd early at
bind/connect time.

See?

Don't get boxed in by thinking that you only have one fd. Even if you have
only one _device_node_, you can have multiple fd's. In fact, you can, with
the Linux VFS layer, fairly easily do things like

mknod /dev/fd0 c X Y

and then use

fd = open("/dev/fd0/colourspace", O_RDWR);

and your device just implements some trivial "lookup()" functions (you
don't _have_ to be a directory to allow name lookups - although right now
I suspect that you can confuse the VFS layer if you aren't. That's a VFS
layer deficiency, if so. Nobody has tested it, but it should be really
easy to fix if somebody is really interested).

Note that with these kinds of things, you don't need ugly ioctl's. The
code, I bet, would be a LOT more readable. There's nothing fundamentally
impossible with having

> /dev/fd0/eject

cause an eject event on /dev/fd0. It would be fairly easy, I bet, to
expand the current "struct file_operations def_blk_fops" to also include
_dentry_ operations, and then all of this could be done by fs/block_dev.c,
with the actual device drivers not having to know about it.

Same thing with character devices. We should be fairly easily able to make
something like

fd = open("/dev/fd0/colourspace=1", ...)

be fully parsed by the fs/block_dev.c layer: we could add a nice string to
"bd_op->open()", to be passed in to the device driver to do with as it
wishes. That would require _no_ changes from device driver writers except
the addition of a new argument ("const char * arg") and the choice to
possibly using that argument for extra structure..

This, btw, is Al Viro's wet dream. But I have to agree: using name spaces
etc is MUCH preferable to ioctl's, makes code more readable and logical,
and often makes it possible to do things you couldn't sanely do before
(control these things from scripts etc).

And using ASCII names ("eject") instead of numbers (see the "FDEJECT" and
"CDROMEJECT" etc #defines) sure as hell makes for easier maintenance and
avoids the whole issue of maintaining static numbers (all the same things
that make me hate device number maintenance makes me also hate the fact
that we need to maintain this list of ioctl numbers etc). By using
descriptive names, the "maintenance" simple does not exist.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/