Re: Serverworks OSB4 in impossible state.

Alan Cox (alan@lxorguk.ukuu.org.uk)
04 Jun 2002 01:29:08 +0100


On Mon, 2002-06-03 at 18:40, Steven Timm wrote:
> Serverworks OSB4 in impossible state.
> Disable UDMA or if you are using Seagate then try switching disk types
> on this controller. Please report this event to osb4-bug@ide.cabal.tm
> OSB4: continuing might cause disk corruption.
>
> This is the only one of 60 machines thus configured that has
> had the error thus far.
>
> Two points:
> 1) The E-mail address in that kernel debug message doesn't exist.
> E-mail bounces back from it.

Oops I'll go fix that small detail. It should have been forwarded to me.

> 2) What is causing the hang and are there any hopes to
> fix it in software this time? Last year when I came to the kernel
> list with problems very similar, the consensus was that this
> is actually broken hardware in the OSB4 chipset...but obviously
> it is possible for at least some kernels to run quasi-normally
> on this hardware... what changed between 2.4.9 and 2.4.18 so
> it doesn't anymore?

The code traps out when it sees the I/O complete and it turns out that
the DMA engine flags say the engine is still running. In this state we
kill the box because we know the next I/O will be written 4 bytes skewed
with the last 4 bytes of the previous I/O apparently repeated at the
start.

I took it up with the Serverworks guys at the time, but they were not
able to duplicate the problem and provide advice. Since we could verify
this across an entire rendering farm it was clearly not a weird one off
bug. It also doesn't appear to be a Linux bug (but maybe one day I'll be
proved wrong).

If you drop the drives to MWDMA2 you'll see only slightly lower
performance and solid behaviour

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/