That seems amazingly dumb. You'd think a new processor design would
optimize parallel computation over calls, but what do I know?
> Most of these "unconditional branches" are indirect, because rather few
> 64-bit architectures have a full 64-bit branch. That means that in
This is something I don't get: I never understood why 32-bit RISC designers
were so damn obstinate about "every instruction fits in 32 bits"
and refused to have a "call 32-bit immediate given in next word", not
to mention a "load 32-bit immediate given in next word".
Note, the superior x86 instruction set has a 5-byte call immediate.
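For reference, here is a minimal sketch of how that 5-byte x86 near call is laid out: opcode 0xE8 followed by a 32-bit little-endian displacement measured from the end of the instruction. The helper name encode_call_rel32 is made up for illustration and is not taken from any assembler or kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode an x86 near call (opcode 0xE8 + rel32) located at address `at`
 * that transfers control to `target`.
 * Hypothetical helper, for illustration only.
 * Returns 0 on success, -1 if the target is out of 32-bit range.
 */
int encode_call_rel32(uint64_t at, uint64_t target, uint8_t out[5])
{
    assert(out != NULL);

    /* The displacement is relative to the address of the NEXT instruction. */
    int64_t disp = (int64_t)(target - (at + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    uint32_t d = (uint32_t)(int32_t)disp;
    out[0] = 0xE8;                    /* CALL rel32 opcode */
    out[1] = (uint8_t)(d & 0xff);     /* displacement, little-endian */
    out[2] = (uint8_t)((d >> 8) & 0xff);
    out[3] = (uint8_t)((d >> 16) & 0xff);
    out[4] = (uint8_t)((d >> 24) & 0xff);
    return 0;
}
```

So the whole call, operand included, is one instruction in the stream; a fixed-width 32-bit encoding has no room for a 32-bit operand next to the opcode.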
> There are lots of good arguments for function calls: they improve icache
> when done right, but if you have some non-C-semantics assembler sequence
> like "cli" or a spinlock that you use a function call for, that would
> _decrease_ icache effectiveness simply because the call itself is bigger
> than the instruction (and it breaks up the instruction sequence so you
> get padding issues).
I think anywhere you have inner-loop or often-used operations
that are short assembler sequences, inline asm is a win. It's easy to
show, for example, that the Linux x86 inline asm semaphore down macro
is three times as fast as a called version. I wish, however,
that GCC did not use a horrible, overly complex Lisp-like syntax and
that there was a way to inline functions written in .S files.
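To make the comparison concrete, here is a rough sketch of an inlined "down" fast path next to an out-of-line version of the same thing. This is not the kernel's actual macro: struct sketch_sem, sketch_down_fast and sketch_down_called are invented names, the code assumes GCC or Clang, and it only covers the uncontended decrement (no sleeping or wakeup path). On non-x86 targets it falls back to a compiler builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counter type; NOT the kernel's struct semaphore. */
struct sketch_sem {
    volatile int count;
};

/*
 * Inline fast path: atomically decrement the count.
 * Returns true if the count stayed >= 0 (acquired without contention).
 */
static inline bool sketch_down_fast(struct sketch_sem *s)
{
    assert(s != NULL);
#if defined(__i386__) || defined(__x86_64__)
    unsigned char negative;

    /* lock-prefixed decrement, then capture the sign flag */
    __asm__ __volatile__(
        "lock; decl %0\n\t"
        "sets %1"
        : "+m" (s->count), "=q" (negative)
        :
        : "memory", "cc");
    return !negative;
#else
    /* Portable fallback using the GCC/Clang atomic builtin. */
    return __atomic_sub_fetch(&s->count, 1, __ATOMIC_ACQUIRE) >= 0;
#endif
}

/* Same operation forced out of line, so every use pays for a real call. */
__attribute__((noinline))
bool sketch_down_called(struct sketch_sem *s)
{
    return sketch_down_fast(s);
}
```

The inline version is a couple of instructions at the use site; the called version adds the call, the return, and whatever register shuffling the calling convention requires. How much that costs in practice depends on the CPU and the surrounding code, so measure before believing any particular ratio.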
And the feature is way too easy to abuse - the same argument applies here
as in the threads argument.
It's a far better thing to not need a semaphore at all than to rely
on a hand-coded semaphore down to make your poorly synchronized design
sort-of perform.