Not the old bootimg code that I had found, but the mclx crash dump
code based on bootimg includes these modifications.
Something like this runs on all cpu's as part of the crash code,
where machine_restart calls bootimg directly if configured.
+/*
+ * If we are not the panicking thread, we simply halt. Otherwise,
+ * we take care of calling the reboot code.
+ */
+#ifdef CONFIG_SMP
+ if (!boot_cpu) {
+ stop_this_cpu(NULL);
+ /* NOTREACHED */
+ }
+#endif
+
+ machine_restart(NULL)
And the following code added along the init path:
diff -urN linux-2.4.17-vanilla/init/main.c linux-2.4.17-mcore/init/main.c
--- linux-2.4.17-vanilla/init/main.c Fri Dec 21 12:42:04 2001
+++ linux-2.4.17-mcore/init/main.c Fri Jan 11 11:04:52 2002
@@ -580,6 +593,15 @@
kmem_cache_init();
sti();
+#if defined(CONFIG_BOOTIMG) && defined(CONFIG_X86_LOCAL_APIC)
+ /* If we don't make sure the APIC is enabled, AND the LVT0
+ * register is programmed properly, we won't get timer interrupts
+ */
+ setup_local_APIC();
+
+ value = apic_read(APIC_LVT0);
+ apic_write_around(APIC_LVT0, value & ~APIC_LVT_MASKED);
+#endif
calibrate_delay();
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
>
> I have tried 3 different solutions for for Linux-reloading-linux
> (bootimg, two-kernel monte, and kexec), and none of them fully support
> the kinds of enterprise-class systems we (OSDL) care about:
>
> 1. multiprocessor x86 (p3, p4, +xeons) with APICs
> 2. >4GB memory
> 3. CPU hotplug
> 4. device hotplug
> 5. >= 2.5.x kernel
>
> In fact, I have yet to find any variation of linux-loading-linux that
> works at all on the 2-way P4-Xeon under my desk or the 8-way P3-Xeon in
> the lab. The only system I have ever seen Two Kernel Monte work on here
> is a Celeron-based machine in a nearby cube.
>
> Why do we care about this? Rebooting these kinds of sytsems can take
> several minutes, and in my sample of the systems in the lab, ~80% of the
> reboot time is spent slogging through the platform's firmware, ~20% of
> the time is spent between LILO and login:. 80% of several minutes is
> often greater than the allowable annual downtime for some enterprise
> systems.
>
> What about LinxuBIOS? While an attractive solution for many, it is a
> long, uphill battle to add support for chipset after chipset, and
> motherboard after motherboard.
>
> The >4GB of memory problem is an interesting quirk -- if the
> linux-loading-linux implementation assumes that it can perform the final
> copy in 32-bit protected mode *without* paging enabled, it won't
> reliably work on >4GB systems.
Isn't the image copied into kernel pages/buffers within allowable ranges
first (when loading the image) ?
>
> > In general yes. There are some interesting side effects though.
> > Going through the pci bus and shutting off bus masters is a good
> > first approximation of what needs to happen.
> >
>
> The new device model from Pat (mochel@osdl.org) is probably the best way
> to go here; you'll be able to walk the driver tree and reliably turn off
> devices.
Yes, I had discussed this with him sometime back to understand if his
model would support what we need. Conditions are more stringent or
less reliable in a crash scenario, but still this looks like the
right direction.
>
> For the CPU side of things, the CPU hotplug work looks promising as
> well.
>
> Andy
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/