Good catch!
As for your alternate approch patch I've a few comments:
1) there maybe an RT tasks running, shrinking ram without reschedules in
between (ignore the current page_alloc that does a bogus schedule before
starting memory balancing), so schedule may never run and the RT task
can run oom, so you should at least set need_resched of the interesting
cpus + send the IPI reschedule before returning from call_rcu to avoid
to be starved
2) the real design issue here if we should pay 8k per-cpu and zero cpu
cost for the fast paths, or if we want to pay with a new branch in
schedule() fast path, I preferred the krcud approch for that reason:
+ if (atomic_read(&rcu_pending))
+ goto rcu_process;
+rcu_process_back:
> The patch I submitted to Andrea had logic to make sure that
> two CPUs don't execute wait_for_rcu() at the same time.
> Somehow it seems to have got lost in Andrea's modifications.
I think the bug was in your original patch too, I'm pretty sure I didn't
broke anything while changing the API a little.
> I will look at that and submit a new patch to Andrea, if necessary.
I prefer to allow all cpus to enter wait_for_rcu at the same time rather
than putting a serializing semaphore around wait_for_rcu (it should
scale pretty well if we don't serialize around wait_for_rcu).
The way I prefer to fix it is just to replace the rcu_sema with a per-cpu
semaphore and have wait_for_rcu running down on such per-cpu semaphore
of the interesting cpu, should be a few liner patch (we have space
free for it in the per-cpu rcu_data cacheline).
> As for wrappers, I am agnostic. However, I think sooner or later
> people will start asking for them, if we go by our past experience.
Maybe I'm missing something but what's the problem in allocating the
struct rcu_head in the data structure? I don't think it's not much more
complicated than the cast magics, and in general I prefer to avoid casts
on larger buffers to get advantage of the C compile time sanity checking ;).
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/