> OK. So there hasn't been much response for this BUG report. So I did a little
> investigating myself.
>
> By use of objdump I get the impression that the Oops shows up when
> ip_dont_fragment is called (expanded). To be more specific: sk seems to be
> NULL.
>
> This is rather weird. ip_queue_xmit2 is also the caller of nf_hook_slow (via
> ip_queue_xmit) at which time skb->sk must be OK. After nf_hook_slow things
> suddenly are wrong.
Well, I have another impression. Look at this code from ip_queue_xmit2:
if (skb_headroom(skb) < dev->hard_header_len && dev->hard_header) {
struct sk_buff *skb2;
skb2 = skb_realloc_headroom(skb, (dev->hard_header_len + 15) &
~15);
kfree_skb(skb);
if (skb2 == NULL)
return -ENOMEM;
if (sk)
skb_set_owner_w(skb2, sk);
skb = skb2;
iph = skb->nh.iph;
}
What you see here is that the original author obviously thought it may be
possible that sk is NULL. Otherwise he would not have put the "if (sk)" in. We
believe him, that he knows what he's doing and read on.
if (skb->len > rt->u.dst.pmtu)
goto fragment;
if (ip_dont_fragment(sk, &rt->u.dst))
iph->frag_off |= __constant_htons(IP_DF);
Shit. You have it.
static inline
int ip_dont_fragment(struct sock *sk, struct dst_entry *dst)
{
return (sk->protinfo.af_inet.pmtudisc == IP_PMTUDISC_DO ||
(sk->protinfo.af_inet.pmtudisc == IP_PMTUDISC_WANT &&
!(dst->mxlock&(1<<RTAX_MTU))));
}
Obviously this breaks in case of sk == NULL.
This leaves you with the following alternatives:
1) He should do
if (sk && ip_dont_fragment(sk, &rt->u.dst))
or
2) ip_dont_fragment is in itself broken, as it is obviously called with
sk==NULL and should handle that.
Anyway, this is a serious bug, because the code is definitely inconsistent.
ip_dont_fragment shows up 6 times in the source (according to my counting). The
safe patch would look like this:
--- ip.h-orig Wed Nov 28 14:55:50 2001
+++ ip.h Wed Nov 28 14:56:25 2001
@@ -181,9 +181,9 @@
static inline
int ip_dont_fragment(struct sock *sk, struct dst_entry *dst)
{
- return (sk->protinfo.af_inet.pmtudisc == IP_PMTUDISC_DO ||
+ return (sk && ( sk->protinfo.af_inet.pmtudisc == IP_PMTUDISC_DO ||
(sk->protinfo.af_inet.pmtudisc == IP_PMTUDISC_WANT &&
- !(dst->mxlock&(1<<RTAX_MTU))));
+ !(dst->mxlock&(1<<RTAX_MTU)))));
}
extern void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst);
Can you check out?
Can any ip-stack guru comment?
Regards,
Stephan
PS: ip_select_ident looks clean in terms of sk==NULL, BTW.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/