There's an important case you're overlooking that takes us out of the
"sometimes" zone, and that is where you want to load up some piece of
heavily-context-dependent code with assertions, just to have confidence
that the many assumptions actually hold. And once you have those hooks
in the code, it often makes little sense to take them out, because
they'll just have to go back in again the next time some remote part
of the kernel violates one of the assumptions.
What you *want* to do, is just turn them off, not remove them. (Sure,
there are lots that shouldn't survive the alpha version of any code,
but still lots of good ones that should be kept, just stubbed out.)
So this is just a name problem: define MYSUBSYSTEM_BUG_ON which is nil
in production, but equivalent to BUG_ON for alpha or beta code. In the
former case it means the code is going to run a little faster, plus the
system is going to be a little more resilient, as you say.
Otherwise, completely agreed.
> And actually, outside of drivers and filesystems you can often know (or
> control) the number of locks the surrounding code is holding, and then a
> BUG() may not be as lethal. At which point the normal "oops and kill the
> process" action is clearly fine - the machine is still perfectly usable.
Eventually we could try some fancy trick that involves keeping track
of the locks in a systematic way, so that they can be analyzed
automatically by a tool that generates lock-breaking code. Then a
subsystem could use the generated code in its error path. Sure, it's a
lot of work just to add another "9" to reliability, but when was anything
ever easy?
-- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/