I also considered this possibility before taking the other approach; I
thought it was inferior because it adds branches and increases the
dcache pressure, so just marking our cacheline dirty and reading the
value in one go, with no additional overhead, looked like a win (the
fewest possible cycles and no branch prediction issues). Of course the
below will allow parallel i_size readers to scale, but again, I think
the fstat benchmark doesn't matter much, and true parallel readers on
the same inode (not only i_size readers) will have to collide on the
pagecache_lock anyway (even in 2.5). So I still think the cmpxchg8b is
a win even though it marks the i_size cacheline dirty, but somebody
should probably benchmark it to verify that the major bottleneck
remains the pagecache_lock.
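
To be concrete, this is roughly the kind of get_64bit I have in mind
for x86 32bit (only a sketch, not necessarily the exact code in my
tree; the constraints may need adjusting):

	static inline unsigned long long get_64bit(unsigned long long *ptr)
	{
		unsigned long long ret;

		/*
		 * cmpxchg8b compares edx:eax with the 64bit memory operand:
		 * on a match it stores ecx:ebx, on a mismatch it loads the
		 * memory value into edx:eax.  By copying ebx->eax and
		 * ecx->edx first, a "successful" compare just rewrites the
		 * same value, so either way the current 64bit value ends up
		 * in edx:eax, read atomically in a single locked instruction
		 * (this is what dirties the cacheline).
		 */
		__asm__ __volatile__(
			"movl %%ebx, %%eax\n\t"
			"movl %%ecx, %%edx\n\t"
			"lock; cmpxchg8b %1"
			: "=&A" (ret)
			: "m" (*ptr)
			: "cc", "memory");
		return ret;
	}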
I actually applied the below, but I enabled it only for the non-x86
32bit archs (like s390, ppc) where I have no idea how to code a
get_64bit in asm. It should definitely be better than a separate
spinlock protecting the i_size.
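
For those archs the shape of the thing is something like this (again
just a sketch of the idea, assuming a hypothetical i_size_seq counter
that the writer bumps before and after updating i_size; the actual
patch I applied may look different):

	static inline loff_t read_i_size(struct inode *inode)
	{
		unsigned seq;
		loff_t size;

		/*
		 * Assumed writer side: i_size_seq++; wmb();
		 * inode->i_size = new_size; wmb(); i_size_seq++;
		 * The reader retries if it raced with a writer -- these
		 * are the extra branches, but the reader never dirties
		 * the i_size cacheline.
		 */
		do {
			seq = inode->i_size_seq;
			rmb();
			size = inode->i_size;
			rmb();
		} while ((seq & 1) || seq != inode->i_size_seq);

		return size;
	}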
comments?
Andrea