Re: ReiserFS buglet

Jakob Oestergaard (jakob@unthought.net)
Tue, 24 Sep 2002 12:03:38 +0200

On Tue, Sep 24, 2002 at 01:48:16PM +0400, Oleg Drokin wrote:
> Hello!
...
> > Disk errors are common. Software can also flip that bit.
>
> Not only disk errors are common, but also CPU/memory/chipset/wiring errors are.

It's a question of which errors you wish to handle, and which you
simply choose to ignore.

It's a compromise, and I understand that.

For example, BK uses checksums on all its files (AFAIK). This allows
you to at least discover hardware errors. CVS doesn't - but CVS is
still "good enough" for a lot of people.

Some hardware problems cannot be detected, much less recovered from,
without adding some significant cost (run-time performance wise,
complexity wise, or ...).

This problem, however, you can both detect and recover from perfectly,
with little extra effort. Whether you want to do so or not is of course
up to you.
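
As a rough illustration of what "detect and recover" could look like for
a small on-disk record, here is a minimal sketch. It assumes the record
is stored twice, each copy followed by a CRC32 of its payload. The
record layout, the choice of CRC32, the second copy, and the names
seal/unseal/recover are all hypothetical; nothing here describes what
ReiserFS actually does.

```python
import struct
import zlib


def seal(payload: bytes) -> bytes:
    """Append a CRC32 of the payload (4 bytes, little-endian)."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def unseal(record: bytes):
    """Return the payload if its CRC32 matches, otherwise None."""
    if len(record) < 4:
        return None
    payload = record[:-4]
    stored = struct.unpack("<I", record[-4:])[0]
    return payload if zlib.crc32(payload) == stored else None


def recover(primary: bytes, backup: bytes):
    """Prefer the primary copy; fall back to the backup if it fails the check."""
    for copy in (primary, backup):
        payload = unseal(copy)
        if payload is not None:
            return payload
    return None  # both copies damaged: detected, but not recoverable


# Hypothetical record contents, used only to exercise the functions.
good = seal(b"example header bytes")
damaged = bytearray(good)
damaged[3] ^= 0x01  # flip a single bit in the primary copy

assert unseal(bytes(damaged)) is None              # the flip is detected
assert recover(bytes(damaged), good) == b"example header bytes"
```

The cost is one checksum computation per read and write of the record,
plus the space for the second copy. A single flipped bit is always
caught by CRC32; recovery only works while at least one copy is intact.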

I'm not going to push - I just wanted to point it out :)

...
> > I posted to LKML about a month ago with some questions regarding exactly
> > this issue. I had a disk that worked on 128 byte atomic writes - a
> > standard IDE disk.
>
> Hm. This is still much larger than 20 bytes we use.

Assuming your 20 bytes do not span a 128 byte boundary ;)

Perhaps you're safe on current LVM/RAID/partition layers (which may
guarantee a coarser alignment - today).
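
The boundary condition is plain arithmetic and easy to check. The sketch
below assumes a hypothetical atomic write unit of 128 bytes (the figure
observed on the one disk mentioned above) and byte offsets measured from
the start of the device; the function name and the sample offsets are
made up for illustration.

```python
def spans_boundary(offset: int, length: int, unit: int = 128) -> bool:
    """Return True if bytes [offset, offset + length) cross a multiple of `unit`."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_unit = offset // unit
    last_unit = (offset + length - 1) // unit
    return first_unit != last_unit


# A 20-byte record at offset 100 covers bytes 100..119: one unit.
assert spans_boundary(100, 20) is False
# At offset 120 it covers bytes 120..139 and crosses byte 128.
assert spans_boundary(120, 20) is True
```

With 20 bytes and a 128-byte unit, any offset whose remainder modulo 128
is greater than 108 puts the record across a boundary, so the guarantee
depends entirely on how the layers below align it.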

And perhaps there is no disk out there with less than 128 byte atomic
writes. Maybe. Do you know? I certainly don't.

>
> > The conclusion was something like "we know jack about the disk's
> > internal logic" so we need consistency measures instead of relying on
> > anything from the disk.
>
> Actually we submit data to disk in 512 byte chunks (4k blocksize case),
> and disk should not write any data before it receives all of it and
> checks the integrity (hm, this is only true for UDMA, though.)
> Still I do not see why any sane disk would start to write a sector before fully
> receiving new sector's content (thinking of disk drives of course, solid state
> stuff should take its own measures in this direction too).

Please read the original mails about the IDE disk writing.

The date is 5th of August this year, the subject was "Disk (block) write
strangeness".

The conclusion really was that there is no such thing as a 512 byte
sector. Not in the real world. The disk interface may emulate it, but
that doesn't mean that the disk is internally working with a concept
even remotely close to that.

> This is even more insane than ACKing data and putting it in not battery
> backed cache to be lost on power loss (Yes, I know this is a common
> practice now. At least there is a way either to turn such feature off
> or to flush a cache on demand).

I was pretty surprised about all this myself, and I just wanted to bring
the issue to your attention.

The real world just sucks sometimes ;)

>
> Thanks for bringing our attention to such issues, though changing disk format
> is out of the question for reiser3 now, some kind of verifying single instance
> on-disk structures may/will be incorporated in reiser4.

Of course - I look forward to seeing how/if you will deal with the
problem.

Cheers,

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............: