Not to be overly negative, I don't intend this email as an insult, but rather
as a "shed a little light" on the discussion email. I would be _happy_ if
you actually succeed in your project, but your comments come out as follows:
a) we want this "sounds real good" feature
b) we don't know how we will do it, beyond some hand waving ideas
c) we want kernel experts who know what they are doing to help us
d) kernel experts who have replied so far (negatively) don't know what
they are talking about, so please butt out
e) you have "started coding" by setting up a sourceforge project
Note that you are talking about a VERY difficult problem, which is
not available on 99.9% of systems out there. Maybe on a few highly
specialized *nixes which were designed for this (Sequent or such),
and probably have extra hardware support to help along. I'm _pretty_
sure that Solaris and AIX and HP/UX do NOT do this, and don't you think
they would want to if it were easy? It would be easier than under
Linux from the perspective that their kernels change far less often,
and have relatively static interfaces.
The best proposal I've heard so far was to use MOSIX to do live job
migration between machines, and then upgrade the kernel like normal.
In the end, it is the jobs that are running on the kernel, and not
the kernel or the individual machine that are the most important. One
person pointed out that there is a single point of failure in the
MOSIX "stub" machine, which doesn't help you in the end (how do you
update the kernel there?). If you can figure a way to enhance MOSIX
to allow migrating the MOSIX "stub" processes to another machine, you
will have solved your problem in a much easier way, IMHO.
Note also that you need to look at the _specific_ reason why you want to
do live kernel upgrades, besides it "sounds real good". If you have such
tight uptime deadlines that you can't take 5 minutes of downtime to boot
a new kernel, then you are probably using a load balancing cluster anyways
in case of hardware failure, so live kernel updates are not needed here.
Note that all real-world high-availability systems I ever worked on
still allowed for SCHEDULED maintenance downtime, but highly frowned
upon UNSCHEDULED downtime. Even IBM's S/390 99.999% uptime numbers
exclude downtime for SCHEDULED outages, which are simply a fact of life.
Please prove everyone wrong by developing a way to do this, or even
showing a proof-of-concept (i.e. a user-space framework for translating
every kernel data structures from one kernel version to another, that
works across, say, a large fraction of the 2.2 kernel, or maybe from
2.4.0-test until 2.4.current). It doesn't have to be in-kernel (yet).
Cheers, Andreas
-- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/