if anyone can send me some pointers or hints on how to tackle this bug i
will be very happy.
on an SMP machine i get:
"stuck on TLB IPI wait (CPU#1)"
the driver that i am debugging uses a spin lock, and sometimes we take the
lock for a pretty long time.
this happens during heavy load, which is why i think that the problem
is that in smp_flush_tlb() in ./arch/i386/kernel/smp.c, one of the CPUs gets
all upset that the other CPU is stuck in the lock for too long, and releases
it before it was ment to be released.
things i did that didn't help:
a patch that fixed a similar problem in reiserfs
(http://www.geocrawler.com/mail/msg.php3?msg_id=3962182&list=3455)
the patch for the fast pentium problem, since i have a pentium III.
(http://www.ultraviolet.org/mail-archives/reiserfs.2000/6201.html)
i put a breakpoint when this occurs using kGDB, but i am not able to get
the registers (and stack) of the CPU that is stuck, only the one that
prints the message. so i don't really know where this occurs
in our own code.
does anyone know how i may extract the stack of the second CPU at the
time of this error?
I am using an Intel pentium III with dual CPU.
I am debugging check point's firewall and vpn modules, with kernel-2.2.14
from the redhat RPM, but this also happens with the latest 2.2.19.
it happens quite often (at random), so it's not too hard to recreate it.
thanks a lot.
(please CC me, as i am not subscribed to the list.)
-- Elisheva Alexander Software Developer ================================================================ Email from people at checkpoint.com does not usually represent official policy of Check Point (TM) Software Technologies Ltd. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/