This could be due to disk request elevator latency combined with VM
imbalance: your application has a page evicted because of the competing
write activity, and reading that page back in then takes ages because
the request is stuck behind those same writes.
> Seems only to happen on 2 or more processor boxes.
In which case the above theory is wrong.
> I'm not deep into kernel nor ext3, but how is the journal flushed if
> full?
Nothing special, really - we just pump a stream of data out to disk.
While this is happening, other processes can still attach data to the
journal without getting blocked. Up to a point. Our handling of this
is a bit sudden at present. Some people have reported benefit from
radically decreasing the buffer flush times; Daniel Robbins' article at
http://www-106.ibm.com/developerworks/linux/library/l-fs8/ covers this.
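
To make the "up to a point" part concrete, here is a toy model of a
fixed-size journal being drained in the background while a writer keeps
attaching blocks. It is only a sketch: it is not ext3 or JBD code, and the
function name, the parameters, and the numbers are all made up for
illustration. The one thing it is meant to show is that writers run
unblocked until the journal is completely full, and then stall abruptly.

```python
# Toy model only: this is NOT ext3/JBD code. Every name and number here
# is a hypothetical chosen for illustration.

def simulate(capacity, flush_threshold, write_rate, flush_rate, ticks):
    """Return the list of ticks at which the writer had to stall."""
    used = 0          # blocks currently held in the journal
    flushing = False  # True while a background flush is draining it
    stalled = []
    for t in range(ticks):
        # The background flush streams blocks out to disk; it keeps
        # going until the journal is empty.
        if flushing:
            used = max(0, used - flush_rate)
            if used == 0:
                flushing = False
        # The writer attaches new blocks if they fit, otherwise it stalls
        # for this tick.
        if used + write_rate <= capacity:
            used += write_rate
        else:
            stalled.append(t)
        # Kick off a flush once the journal reaches the threshold.
        if used >= flush_threshold:
            flushing = True
    return stalled


if __name__ == "__main__":
    # Writer faster than the flush: it eventually hits the wall.
    print(simulate(capacity=100, flush_threshold=50,
                   write_rate=10, flush_rate=5, ticks=20))
    # Flush faster than the writer: it never stalls.
    print(simulate(capacity=100, flush_threshold=50,
                   write_rate=5, flush_rate=10, ticks=100))
```

In this model, lowering flush_threshold starts the drain earlier and so
delays the first stall, which is loosely the effect people are after when
they shorten the flush times.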
Yes, improvements are needed in this area. Not only in ext3.
You haven't really defined "freeze", but it's certainly different
from Matti's freeze.