I would like to bring your attention to 2nd oops posted in the first mail for
this thread (http://www.uwsg.iu.edu/hypermail/linux/kernel/0202.1/1193.html).
Initially I thought both might be same because they were both releated bind
hash lists. But I have to began suspect it. The first oops is because of race
caused by tw structures left out in the list.. for which a patch is being
discussed here.
The second one seems to be purely because of a corrupted tb
(tcp_bind_hashbucket).. there is no race there.. tb just got pointers into
arbitary place. Also we saw second oops sometimes with relatively less load.
These are two locations it we have seen (with different bad values for
affending pointers each time it occured):
net/ipv4/tcp_ipv4.c:tcp_v4_get_port():
spin_lock(&head->lock);
for (tb = head->chain; tb; tb = tb->next)
1======> if (tb->port == rover)
/*tb is a bad pointer, not NULL */ goto next;
break;
next:
spin_unlock(&head->lock);
/*
more stuff removed
*/
struct sock *sk2 = tb->owners;
int sk_reuse = sk->reuse;
for( ; sk2 != NULL; sk2 = sk2->bind_next) {
if (sk != sk2 &&
2======> sk->bound_dev_if == sk2->bound_dev_if) {
/* sk2 (tb->owners) is a bad pointer */
I will think more about if the problem that causes 1st oops can cause this as
well.
Thanks,
Raghu.
--- kuznet@ms2.inr.ac.ru wrote:
> Hello!
>
> > the deletion from _both_ the lists.
>
> No, it is only for insertion to established hash table.
>
> References from bind hash are not considered as references.
> Look, if socket will sit only in bind table nobody ever will see it.
> Where is the the reference? :-) It just must _not_ stay in bind hash,
> if no other references remained, that's invariant which we should provide
> now. If we will fail, we are in troubles.
>
> So, its absence in bind hash must be guaranteed to the time of destruction.
> Look at this from another aspect: imagine you increment refcnt when
> adding to binding table. OK. So, what does guarantee that bucket
> will not remain in bind hash forever? And "it will not" is equivalent
> to "refcnt is not useful".
>
> Anyway, I will think on this at night, I am not ready to tell how to
> do this right.
>
>
> > If you want to avoid timewait_kill() getting called twice altogether.
>
> Sorry, I did not understand what do you mean here. It can be called
> twice or three times or more. This is impossible to avoid without adding
> spinlock to timewait bucket.
>
> Alexey
__________________________________________________
Do You Yahoo!?
Yahoo! Greetings - Send FREE e-cards for every occasion!
http://greetings.yahoo.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/