> > Basically, you get two virtual CPU's per die, and each CPU can run two
> > threads at the same time. It slows some stuff down, because it makes for
> > much more cache pressure, but Intel claims up to 30% improvement on some
> > loads that scale well.
> >
> > The 30% is probably a marketing number (ie it might be more like 10% on
> > more normal loads), but you have to give them points for interesting
> > technology <)
>
> Specifically, the 30% comes in two places. Using Intel proprietary
> benchmarks (unreleased, according to the footnotes) they find that a
> typical IA32 instruction mix uses some 35% of system resources in an
> advanced device like the P4 with NetBurst. the rest is idle.
>
> by using the SMT model with two virtual systems - each with complete
> register sets and independent APICs, sharing only the backend exec
> units - they claim you get a 30% improvement in wall-clock time. This
> is supposed to be on their benchmarks *without* recompiling anything. To
> get "additional" improvement, using code to take advantage of the dual
> virtual CPUs nature of the chip and recompiling should give some
> unquantified gain.
All things being equal, this probably will make a NxMHz P4 be as fast as a
NxMHz PIII. But the new complexity it may require in real life may just
turn the gain into just nil.
What a great improvement, indeed! :-)
Gérard.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/