Re: [OKS] O(1) scheduler in 2.4

Rob Landley (landley@trommello.org)
Thu, 4 Jul 2002 14:08:42 -0400


On Wednesday 03 July 2002 11:36 pm, Bill Davidsen wrote:

> This is not some neat feature to buy a few percent better this or that,
> this is roughly 50% more users on the server before it falls over, and no
> total bogs when many threads change to hog mode at once.
>
> You will not hear me saying this about preempt, or low-latency, and I bet
> that after I try lock-break this weekend I won't feel that I have to have
> that either. The O(1) scheduler is self defense against badly behaved
> processes, and the reason it should go in mainline is so it won't depend
> on someone finding the time to backport the fun stuff from 2.5 as a patch
> every time.

I've got a similar setup. At work I'm doing a simple ssh-based VPN:
connections to the VPN address range outside the local subnet are intercepted
by port forwarding to a tiny daemon (700 lines of C source, mostly comments)
that shells out to ssh (forwarding stdin and stdout back to the net
connection) to connect to the appropriate remote gateway, where it runs
netcat to complete the connection.
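
Stripped down to its core, the per-connection logic in the daemon amounts to
something like the sketch below (illustrative only, not the actual source:
the names are invented, error handling is omitted, and the real daemon works
out the gateway and destination from its own routing data):

/* Hand the accepted connection (fd "sock") to a child ssh process, with
 * the socket wired up as ssh's stdin/stdout, and run netcat on the remote
 * gateway to complete the connection to the real destination. */
#include <stdio.h>
#include <unistd.h>

static void bounce_via_ssh(int sock, const char *gateway,
                           const char *dest_host, const char *dest_port)
{
    if (fork() == 0) {
        char cmd[256];

        /* ssh carries the tunneled stream on its stdin/stdout */
        dup2(sock, 0);
        dup2(sock, 1);

        /* netcat on the far end connects to the real destination */
        snprintf(cmd, sizeof(cmd), "nc %s %s", dest_host, dest_port);
        execlp("ssh", "ssh", gateway, cmd, (char *)0);
        _exit(1);  /* only reached if exec failed */
    }
    close(sock);  /* parent: the ssh child owns the connection now */
}

The point is just that every intercepted connection gets its own ssh child,
so the kernel's process scheduler is what ends up arbitrating between the
streams.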

So each tcp/ip stream is individually wrapped in its own ssh process, which
exits automatically when the connection closes. No mess, no fuss, and
scalability is based on active connections rather than the number of systems
in the VPN.

Unfortunately, some of these VPN gateways are behind existing firewalls
(Cisco, etc.). If I can get a port forwarded to my VPN gateway from that
firewall, life is good (it's a little more work for the daemon to figure out
where to ssh to, but that's part of the 700 lines). But when I can't get
that, the machine has to dial out to a known public machine (the "star
server") and have its incoming data bounced off of that machine. (Evil, but
only incoming connections to those trapped machines need to use the star
server. Everybody else can still dial direct, and the trapped machines can
still dial out direct.)

The star server tends to be running LOTS of ssh processes: four for each
bounced connection, i.e. one sshd instance for each of the two incoming ssh
connections, plus the netbounce process each of those sshds runs; the two
netbounce processes talk to one another through named pipes. I could get
that down to two processes by modifying sshd to integrate the netbounce
functionality, but it hasn't been a bottleneck. Netbounce doesn't eat much;
sshd is the real CPU hog. And it's not as easy to rewrite netbounce as one
central process with a poll loop as you'd think: sshd wants to run SOMETHING.
So far I'm using standard sshd code, and I'd prefer not to make
special-purpose modifications to it if I can help it.
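
Each netbounce, for what it's worth, doesn't do much more than shovel bytes
between the ssh stream on its stdin/stdout and the named pipes leading to
its partner. Roughly this (a sketch of the idea, not the actual netbounce
source; error handling and argument checking are mostly skipped):

/* sshd runs this, so stdin/stdout are the decrypted ssh stream.  The
 * matching netbounce for the other half of the bounced connection sits on
 * the far side of two named pipes (one per direction); it's started with
 * the fifo arguments swapped. */
#include <fcntl.h>
#include <unistd.h>

static void shovel(int in, int out)
{
    char buf[4096];
    int len;

    while ((len = read(in, buf, sizeof(buf))) > 0)
        if (write(out, buf, len) != len) break;
}

int main(int argc, char *argv[])
{
    if (argc != 3) return 1;

    if (fork() == 0)
        shovel(open(argv[1], O_RDONLY), 1);  /* peer -> our sshd */
    else
        shovel(0, open(argv[2], O_WRONLY));  /* our sshd -> peer */
    return 0;
}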

The bottleneck is that with thirty big data transfers going through sixty
sshd processes (which are real CPU hogs, decrypting incoming data and
encrypting outgoing data), a 700 MHz Athlon goes catatonic. The existing
bulk data-shoveling connections have their data shoveled fine, but new
incoming connections (even short-lived "fetch me 10k of web data off the
remote box" type connections) are Not Happy. The existing scheduler gets
confused by the fact that the sshd sessions DO sometimes block to get/send
their data, and isn't so good at keeping a running average to spot the CPU
hogs versus the sessions that are more interactive or simply short-lived.

That's why I'm playing with the O(1) scheduler. I may need to put rate
limiting in netbounce anyway, but the problem I'm HITTING is that the
existing scheduler is melting down so badly that past a fairly low saturation
level, fresh connection attempts through the star server are timing out.
(This hardware seems like it should be able to handle around 100 simultaneous
connections, and it's currently melting down at around 30.)

Yeah, I'm beating the CPU to death encrypting and decrypting data. Yeah, I
could throw more hardware at the problem (and will). I could take another
stab at redesigning the star server to consolidate all the netbounce
processes into a single poll loop (which would require modifying sshd), but
netbounce isn't the problem: the two sshd processes per connection are. (I
could merge all the connections to and from each box into a single sshd
process per gateway, but that clashes with the way the rest of the VPN works,
which is simple and surprisingly reliable, and there would still be at least
one per box anyway. And what that really MEANS is that I'd be bypassing the
process scheduler and doing my own manual scheduling.)
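
For reference, the consolidated version would boil down to one process
poll()ing every connection and shoveling bytes between paired descriptors,
something like the sketch below (how the pair table gets populated, short
writes, and per-connection teardown are all hand-waved):

/* One central bounce loop: fds[i].fd is assumed to be pairs[i].from, and
 * everything is assumed to already be open. */
#include <poll.h>
#include <unistd.h>

struct pair { int from, to; };

void bounce_all(struct pollfd *fds, struct pair *pairs, int count)
{
    char buf[4096];
    int i, len;

    for (;;) {
        if (poll(fds, count, -1) < 0) return;
        for (i = 0; i < count; i++) {
            if (!(fds[i].revents & POLLIN)) continue;
            len = read(pairs[i].from, buf, sizeof(buf));
            if (len <= 0) return;  /* real code would drop just this pair */
            if (write(pairs[i].to, buf, len) != len) return;
        }
    }
}

And that loop is exactly where the manual scheduling would end up living:
it, not the kernel, would be deciding which connection gets serviced next.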

This is a real-world example of a pure scheduling problem. The star server
has a quarter gigabyte of RAM and isn't going anywhere near swap. The
scheduler has plenty of hints to work with: CPU usage, blocking for I/O, and
freshly spawned processes needing to start at a higher priority than the
entrenched, saturation-level data shovelers.

Hence putting "play with O(1)" on my to-do list...

Rob
