Re: and nicer too - Re: [PATCH] epoll more scalable than poll

Jamie Lokier (lk@tantalophile.demon.co.uk)
Wed, 30 Oct 2002 00:26:44 +0000


> 1) "issuing a command to an IDE disk" == "using read/write until EAGAIN"
> 2) "adding yourself on the IDE disk wait queue" == "calling sys_epoll_wait()"

That is quite a good analogy. epoll is like a waitqueue - which is
also like a futex. To use a waitqueue properly you have do these
things in the order shown:

1. Set the task state to stopped.
2. Register yourself on the waitqueue.
3. Check the condition.
4. If condition is not met, schedule.

With epoll it is very similar. To wait for a condition on a file
descriptor, such as readability, you must do these things in the order
shown:

1. Register your interest using epoll_ctl.
2. Check the condition by actually calling read().
3. If the condition is not met (i.e. read() returned EAGAIN),
call epoll_wait (i.e. equivalent to schedule).

With epoll, you can optimise by registering interest just once. In
other words, steps 2 and 3 may be repeated without repeating step 1.

And if you are concerned about starvation -- that is, one of your file
descriptors always has new data so others don't get a chance to be
serviced -- don't be. You don't have to completely read one fd until
you see EGAIN. All that matters is that until you see the EAGAIN,
your user space data structure should have a flag that says the fd is
still readable, so another epoll event is not expected or required for
that fd.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/