That document is short for several reasons.
- It was written for particular needs, and is updated rarely.
- Most long-time kernel coders really don't need detailed guidelines,
  although newcomers (like yourself) apparently want something.
Things you need to understand include at least:
- The Linux kernel is one of the most complicated bodies of code that
  the gcc compiler has ever been put to work on, and over the years we
  have unearthed tons of optimizer bugs.  Many of the somewhat
  odd-looking coding details originate from working around those
  compiler bugs.  (A related sketch follows this list.)
- Advanced optimizer-hinting features, like the unlikely() annotation,
  are very new in the compiler, and while in theory they move the
  "unlikely" code out of the fast path, such things have been buggy in
  the past, and we worry about the effects of such bugs.  (A sketch of
  how the hint works follows this list.)
- Why code in a manner where the optimizer needs to work very hard to
  produce fast code, when simpler structures, like those using 'goto',
  can achieve the same result with much less effort?  (See the cleanup
  sketch after this list.)
- Knowing the effects of those optimizations requires profiling, and
  studying the assembly code produced for different target systems.
- Linux is a multi-CPU-architecture kernel; looking at it from an
  "Intel-only, latest Pentium-XIV" viewpoint is not going to cover all
  cases.  Changes that bloat the code make the system that much harder
  to fit into smaller embedded machines.
- Sometimes you need to code differently to achieve higher performance
  on different processors.  See drivers/md/xor.c and the header files
  it includes, especially all the include/asm-*/xor.h files, including
  include/asm-generic/xor.h ...  (Those show you how different
  approaches work on different systems.  What is fast on register-rich
  processors is not necessarily fast on register-poor things, like the
  i386 series.)
- Adding function qualifiers like "static inline" changes the rules
  somewhat.  (Especially for things like "mid-function return"; see
  the sketch after this list.)
- Larger code, whether from manually replicated exit cleanup (which is
  a pain to maintain when some large-scale change is needed) or from
  excessive use of "static inline", puts pressure on the caches and
  makes it more likely that soon-to-be-needed code (or data) is
  dropped from the cache.  In cases where a cache miss costs you tens,
  or hundreds, of processor clock cycles, that is a really important
  question.
- Explaining things like GCC asm() statement syntax is not a job for
  the Linux kernel documentation.  (It varies from one target
  processor to another!)
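
To make a few of those points concrete, here are some sketches; the
helper and function names are made up where noted.  First, on
odd-looking constructs whose whole purpose is to constrain the
optimizer: the classic one is the compiler barrier, which is what the
kernel's barrier() macro boils down to.  A sketch of the idea ('done'
and wait_for_it() are hypothetical):

    #define barrier() __asm__ __volatile__("" : : : "memory")

    static int done;            /* hypothetical flag set elsewhere,
                                   e.g. by an interrupt handler */

    void wait_for_it(void)
    {
            while (!done)       /* without the barrier, gcc may read */
                    barrier();  /* 'done' once and spin forever      */
    }

The empty asm with a "memory" clobber tells gcc that memory may have
changed behind its back, so it must reload 'done' on every iteration
and must not move memory accesses across that point.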
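
On the unlikely() hint: it is built on gcc's __builtin_expect(), which
tells the optimizer which way a branch usually goes, so that the common
case can be laid out as straight-line code.  A minimal sketch (the
kernel's exact macro definitions have varied; frob() is hypothetical):

    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    int frob(char *buf)                 /* hypothetical function */
    {
            if (unlikely(buf == NULL))
                    return -1;          /* rare path, moved aside */
            buf[0] = 1;                 /* common case runs       */
            return 0;                   /* straight through       */
    }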
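
On the 'goto' point: the usual kernel idiom is a single exit path with
staged cleanup labels, instead of replicating the cleanup code at every
error return.  A sketch, with hypothetical helpers:

    int grab_resource_a(void);          /* hypothetical helpers */
    int grab_resource_b(void);
    void release_resource_a(void);

    int setup_thing(void)
    {
            int err;

            err = grab_resource_a();
            if (err)
                    goto out;           /* nothing to undo yet */
            err = grab_resource_b();
            if (err)
                    goto out_release_a; /* undo only what succeeded */
            return 0;                   /* fast path: straight through */

    out_release_a:
            release_resource_a();
    out:
            return err;
    }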
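
On "static inline" and mid-function return: once the body is inlined,
a return in the middle of it is merely a branch inside the caller, and
the whole body is duplicated at every call site.  A sketch (the helper
is hypothetical):

    static inline int clamp_positive(int v)
    {
            if (v < 0)
                    return 0;   /* after inlining: a branch within the
                                   caller, not a 'ret' instruction */
            return v;
    }
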
I repeat:
To understand the effects of various coding details, you need to study
the produced assembly code, and keep in mind that on modern processors
a branch out of the straight-line execution path (after optimizations)
is expensive (in time), and cache misses are even more expensive.
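
For a single source file, asking gcc to emit assembly is the quickest
way to do that study; the file and function names below are only an
example:

    /* Compile with:  gcc -O2 -S -o example.s example.c
     * then read example.s to see how the branches were laid out.
     */
    int example(int x)
    {
            if (x < 0)          /* compare this layout against a     */
                    return -x;  /* version hinted with unlikely() or */
            return x * 2;       /* restructured with goto            */
    }
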
Then you need to benchmark that code under different loads (where
applicable) to find out how the caches affect it -- and when you must
bypass the caches (they are a limited resource, after all) so that
overall system performance improves because your code does not cause
actively used code or data to be evicted from the caches unnecessarily.
/Matti Aarnio