Re: [PATCH] Set TIF_IRET in more places

Jamie Lokier (lk@tantalophile.demon.co.uk)
Tue, 7 Jan 2003 11:19:05 +0000


Zack Weinberg wrote:
> Consider SA_RESTORER - there isn't a guarantee that user space will
> use the same code as the kernel's trampoline. glibc happens to, but
> only because GDB has a hardwired idea of what a signal trampoline
> looks like. Of course, you could simply document that sigreturn() is
> another of the system calls that must be made through int 0x80.

Glibc must use the same code as the kernel's trampoline because of
MD_FALLBACK_FRAME_STATE_FOR() in GCC's exception handling... (or
libgcc.so must change).

It explicitly checks for the opcode sequences 0x58b877000000cd80 and
0xb8ad000000cd80 in order to unwind exception frames around a
handled signal. Ugly, isn't it?

> It occurs to me that the kernel-provided signal trampoline could go in
> the page at 0xffff0000, instead of on the user stack, which would
> eliminate the need for glibc to set SA_RESTORER (it's a pure
> optimization).

Yup.

> Tangentially, I've seen people claim that the trampoline ought to be
> able to avoid entering the kernel, although I'm not convinced (how
> does the signal mask get reset, otherwise?)

Welcome to a wonderful if rather unsightly optimisation:

1. libc installs its own handler function for all non-SIG_DFL signals,
and sigaction() mostly updates a table in userspace.

2. The libc signal handler redirects all signals to the application
through a funky trampoline in libc.

3. A signal mask is maintained in userspace. Also, a pending mask
is maintained in userspace.

4. When a signal is delivered, libc's handler function checks the
userspace signal mask. If the signal should be blocked, and it
is possible to block it, it is marked as pending in _userspace_
pending mask, and the userspace signal mask is propagated to the
kernel to prevent further signals queuing up. Any siginfo_t is
also saved for tha signa. Then libc's handler returns, without
calling the application handler (because that is deferred).

5. When a signal is unblocked from the userspace signal mask, if it
is in the userspace pending mask, it is synthetically delivered
by userspace, which creates a context _as if_ the kernel had
delivered the signal.

By this mechanism, calls to unblock signals from the signal mask can
be done without entering the kernel, because the unblocking can be
done lazily.

Voila! sigreturn() can be written to avoid entering the kernel. Note
that this is possible _now_, with no changes to the kernel. It only
requires changes to libc. I think it would work on all architectures,
not just i386. (It may also be possible to do it without libc help,
in the vsyscall page).

-- Jamie

ps. A similar optimisation allows "spin_lock_irqsave" and
"spin_unlock_irqrestore" to avoid using the cli & sti instructions.
Spin locks already modify the preempt_count, so use a bit of that to
hold the synthetic interrupt-disabled flag, at zero cost... :)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/