Perhaps it is just cache synchronization.
Say you want to sum all the elements of a vector in parallel, split into
two pieces:
int total = 0;
thread 1:
    for i in first half:
        total += v[i];
thread 2:
    for i in second half:
        total += v[i];
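Written out with POSIX threads, the first version has to protect total with a lock, because two threads doing total += v[i] on the same variable is a data race. The following is a minimal sketch assuming a POSIX system; the names sum_job, sum_half_locked and sum_with_mutex are hypothetical, and the sum is kept in a long to make overflow less likely.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names for illustration: sum_job, sum_half_locked, sum_with_mutex. */
struct sum_job {
    const int *v;
    size_t begin, end;      /* half-open index range [begin, end) */
    long *total;            /* accumulator shared by both threads */
    pthread_mutex_t *lock;  /* protects *total */
};

static void *sum_half_locked(void *arg)
{
    struct sum_job *job = arg;
    for (size_t i = job->begin; i < job->end; i++) {
        /* Lock and unlock around every single addition. */
        pthread_mutex_lock(job->lock);
        *job->total += job->v[i];
        pthread_mutex_unlock(job->lock);
    }
    return NULL;
}

/* Sum v[0..n) with two threads sharing one mutex-protected total.
 * Error checking of the pthread calls is omitted for brevity. */
long sum_with_mutex(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total = 0;
    pthread_mutex_t lock;
    pthread_mutex_init(&lock, NULL);

    struct sum_job jobs[2] = {
        { v, 0,     n / 2, &total, &lock },  /* thread 1: first half  */
        { v, n / 2, n,     &total, &lock },  /* thread 2: second half */
    };
    pthread_t tid[2];
    for (int t = 0; t < 2; t++)
        pthread_create(&tid[t], NULL, sum_half_locked, &jobs[t]);
    for (int t = 0; t < 2; t++)
        pthread_join(tid[t], NULL);

    pthread_mutex_destroy(&lock);
    return total;
}
```

Build with something like cc -pthread and call sum_with_mutex(v, n). The answer is correct, but every element costs a lock/unlock pair.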
And you thought: 'Well, I need a mutex to access total. That will slow
things down, so let's use separate counters':
int bigtotal;
int total[2];
thread 1:
    for i in first half:
        total[0] += v[i];
thread 2:
    for i in second half:
        total[1] += v[i];
after both threads finish:
    bigtotal = total[0] + total[1];
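The separate-counters version can be sketched the same way, again as a hypothetical POSIX-threads example (slot_job, sum_half_into_slot and sum_with_slots are illustrative names). Each thread writes only its own slot, so no lock is needed and the result is correct. The slot is reached through a volatile pointer so that every += really goes to memory, as in the pseudocode; an optimizing compiler could otherwise keep the running sum in a register and hide the effect.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names for illustration: slot_job, sum_half_into_slot, sum_with_slots. */
struct slot_job {
    const int *v;
    size_t begin, end;     /* half-open index range [begin, end) */
    volatile long *slot;   /* this thread's own counter */
};

static void *sum_half_into_slot(void *arg)
{
    struct slot_job *job = arg;
    for (size_t i = job->begin; i < job->end; i++)
        *job->slot += job->v[i];   /* no lock: no other thread writes this slot */
    return NULL;
}

/* Error checking of the pthread calls is omitted for brevity. */
long sum_with_slots(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long total[2] = { 0, 0 };   /* total[0] and total[1] are adjacent in memory */
    struct slot_job jobs[2] = {
        { v, 0,     n / 2, &total[0] },  /* thread 1: first half  */
        { v, n / 2, n,     &total[1] },  /* thread 2: second half */
    };
    pthread_t tid[2];
    for (int t = 0; t < 2; t++)
        pthread_create(&tid[t], NULL, sum_half_into_slot, &jobs[t]);
    for (int t = 0; t < 2; t++)
        pthread_join(tid[t], NULL);

    long bigtotal = total[0] + total[1];
    return bigtotal;
}
```

The code is race-free and needs no mutex, yet both slots live side by side, which is the layout that runs into trouble with the caches.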
The problem? total[0] and total[1] sit next to each other in memory, so
they land in the same cache line. On every write to total[0] or total[1],
even though the two counters are independent, the system has to
synchronize the caches, and the line keeps bouncing between the two CPUs.
Big iron (SGI, Sparc) has special hardware for this, but cheap PC motherboards...
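A common way around this is to give each counter its own cache line. The sketch below assumes 64-byte lines (typical of current x86 CPUs, but hardware-specific) and uses C11 alignas; same_cache_line, padded_counter and sum_with_padding are hypothetical names. same_cache_line simply compares addresses divided by the assumed line size, which makes both the problem and the fix easy to check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. The real size depends on the CPU. */
#define CACHE_LINE 64

/* Nonzero if a and b fall inside the same CACHE_LINE-sized block. */
int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}

/* Aligning the member to CACHE_LINE also rounds sizeof up to CACHE_LINE,
 * so consecutive array elements land in different lines. */
struct padded_counter {
    alignas(CACHE_LINE) long value;
};

struct padded_job {
    const int *v;
    size_t begin, end;     /* half-open index range [begin, end) */
    volatile long *slot;   /* this thread's padded counter */
};

static void *sum_half_padded(void *arg)
{
    struct padded_job *job = arg;
    for (size_t i = job->begin; i < job->end; i++)
        *job->slot += job->v[i];   /* this line is never written by the other thread */
    return NULL;
}

/* Error checking of the pthread calls is omitted for brevity. */
long sum_with_padding(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    struct padded_counter total[2] = { { 0 }, { 0 } };
    struct padded_job jobs[2] = {
        { v, 0,     n / 2, &total[0].value },  /* thread 1: first half  */
        { v, n / 2, n,     &total[1].value },  /* thread 2: second half */
    };
    pthread_t tid[2];
    for (int t = 0; t < 2; t++)
        pthread_create(&tid[t], NULL, sum_half_padded, &jobs[t]);
    for (int t = 0; t < 2; t++)
        pthread_join(tid[t], NULL);
    return total[0].value + total[1].value;
}
```

An even simpler fix is to accumulate into a local variable inside each thread and store into total[i] only once, at the end, so the shared line is written just twice in total.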
--
J.A. Magallon # Let the source be with you...
mailto:jamagallon@able.es
Linux Mandrake release 8.1 (Cooker) for i586
Linux werewolf 2.4.5-ac13 #1 SMP Sun Jun 10 21:42:28 CEST 2001 i686