Re: Switching Kernels without Rebooting?

Andreas Dilger (adilger@turbolinux.com)
Wed, 11 Jul 2001 17:44:45 -0600 (MDT)

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Brian Strand: "Re: 2x Oracle slowdown from 2.2.16 to 2.4.4"
Previous message: Richard Purdie: "Re: PROBLEM: <BUG Report: kernel BUG at slab.c:1062! from pppd with speedtouch drivers and pppoatm>"
Next in thread: Ralf Baechle: "Re: Switching Kernels without Rebooting?"
Reply: Ralf Baechle: "Re: Switching Kernels without Rebooting?"

Colin Slater writes:
> Does it come up often? Well, I have a sourceforge project setup and am
> currently only waiting on finalizing how it's going to be done. So we have
> about proved the first possibility wrong, and if you ever hear anything else
> about this in a while, we will have proved the second wrong too. Soo, while
> we are at it, ill say, that if anyone wants to help with it, email me. We
> especialy need people that either have ideas on how to do this or have a
> good knowledge of the kernel, mainly memory, processes, and initilization.

Not to be overly negative, I don't intend this email as an insult, but rather
as a "shed a little light" on the discussion email. I would be _happy_ if
you actually succeed in your project, but your comments come out as follows:

a) we want this "sounds real good" feature
b) we don't know how we will do it, beyond some hand waving ideas
c) we want kernel experts who know what they are doing to help us
d) kernel experts who have replied so far (negatively) don't know what
they are talking about, so please butt out
e) you have "started coding" by setting up a sourceforge project

Note that you are talking about a VERY difficult problem, which is
not available on 99.9% of systems out there. Maybe on a few highly
specialized *nixes which were designed for this (Sequent or such),
and probably have extra hardware support to help along. I'm _pretty_
sure that Solaris and AIX and HP/UX do NOT do this, and don't you think
they would want to if it were easy? It would be easier than under
Linux from the perspective that their kernels change far less often,
and have relatively static interfaces.

The best proposal I've heard so far was to use MOSIX to do live job
migration between machines, and then upgrade the kernel like normal.
In the end, it is the jobs that are running on the kernel, and not
the kernel or the individual machine that are the most important. One
person pointed out that there is a single point of failure in the
MOSIX "stub" machine, which doesn't help you in the end (how do you
update the kernel there?). If you can figure a way to enhance MOSIX
to allow migrating the MOSIX "stub" processes to another machine, you
will have solved your problem in a much easier way, IMHO.

Note also that you need to look at the _specific_ reason why you want to
do live kernel upgrades, besides it "sounds real good". If you have such
tight uptime deadlines that you can't take 5 minutes of downtime to boot
a new kernel, then you are probably using a load balancing cluster anyways
in case of hardware failure, so live kernel updates are not needed here.

Note that all real-world high-availability systems I ever worked on
still allowed for SCHEDULED maintenance downtime, but highly frowned
upon UNSCHEDULED downtime. Even IBM's S/390 99.999% uptime numbers
exclude downtime for SCHEDULED outages, which are simply a fact of life.

Please prove everyone wrong by developing a way to do this, or even
showing a proof-of-concept (i.e. a user-space framework for translating
every kernel data structures from one kernel version to another, that
works across, say, a large fraction of the 2.2 kernel, or maybe from
2.4.0-test until 2.4.current). It doesn't have to be in-kernel (yet).

Cheers, Andreas

-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Next message: Brian Strand: "Re: 2x Oracle slowdown from 2.2.16 to 2.4.4"
Previous message: Richard Purdie: "Re: PROBLEM: <BUG Report: kernel BUG at slab.c:1062! from pppd with speedtouch drivers and pppoatm>"
Next in thread: Ralf Baechle: "Re: Switching Kernels without Rebooting?"
Reply: Ralf Baechle: "Re: Switching Kernels without Rebooting?"