<!-- received="Thu Nov  5 01:36:42 1998 EET" -->
<!-- sent="Wed, 4 Nov 1998 22:10:30 GMT" -->
<!-- name="Stephen C. Tweedie" -->
<!-- email="sct@redhat.com" -->
<!-- subject="[Patch] Day-one race in slab.c" -->
<!-- id="199811042210.WAA14378@dax.scot.redhat.com" -->
<!-- inreplyto="" -->
<title>Linux-kernel mailing list archive 1998-44,: [Patch] Day-one race in slab.c</title>
<body bgcolor="#FFFFFF"><font face="Arial,Helvetica">
<h1>[Patch] Day-one race in slab.c</h1>
<b>Stephen C. Tweedie</b> (<a href="mailto:sct@redhat.com"><i>sct@redhat.com</i></a>)<br>
<i>Wed, 4 Nov 1998 22:10:30 GMT</i>
<p>
<ul>
<li> <b>Messages sorted by:</b> <a href="date.html#538">[ date ]</a><a href="index.html#538">[ thread ]</a><a href="subject.html#538">[ subject ]</a><a href="author.html#538">[ author ]</a>
<!-- next="start" -->
<li> <b>Next message:</b> <a href="0539.html">Richard Gooch: "Re: Comments on Microsoft Open Source document"</a>
<li> <b>Previous message:</b> <a href="0537.html">Cameron Heide: "Re: Storage Tek STL's"</a>
<!-- nextthread="start" -->
<!-- reply="end" -->
</ul>
<hr>
<!-- body="start" -->
Hi,<br>
<p>
For a few days I've been chasing what looked like an skbuff bug in all<br>
recent 2.1 kernels.  The symptom was a repeated<br>
<p>
	kmem_free: NULL ptr (objp=c009e928, name=unknown)<br>
<p>
when doing a kmem_cache_reap() on the skbuff_head_cache.  I think I've<br>
finally traced it to a long-standing bug in the slab cache itself.  The<br>
problem only occurs if a slab allocation made from interrupt context<br>
races with cache reaping, which is why it seems to be quite rare.  On a<br>
16MB test box I cannot reproduce the problem, but on 8MB an NFS build<br>
will hit it reliably in 5 to 10 minutes.<br>
<p>
The problem is at the end of kmem_slab_destroy: we free the slab's data<br>
pages before destroying the optional management and index structures<br>
associated with the slab.  Unfortunately, if the slab is one of the<br>
standard small-object slabs which keep the management structure<br>
within the slab page, freeing the pages also frees the slabp object<br>
itself.  When we check slabp-&gt;s_index immediately afterwards to see<br>
if the index needs to be freed, we can pick up new, bogus data if the<br>
page has already been reused.<br>
<p>
I'm not sure whether we can ever get a more serious oops from this<br>
problem, but if we can, it should be quite rare, only hurting us if we<br>
have a cache with separate slab indexes but embedded management<br>
structures.  Any slab with no index will merely result in the above<br>
kmem_free warning, as kmem_cache_free() will be passed a null cache<br>
pointer from cachep-&gt;c_index_cachep.<br>
<p>
Patch for your pleasure:<br>
----------------------------------------------------------------<br>
--- mm/slab.c.~1~	Wed Nov  4 10:31:42 1998<br>
+++ mm/slab.c	Wed Nov  4 20:07:07 1998<br>
@@ -650,9 +658,9 @@<br>
 	}<br>
 <br>
 	slabp-&gt;s_magic = SLAB_MAGIC_DESTROYED;<br>
-	kmem_freepages(cachep, slabp-&gt;s_mem-slabp-&gt;s_offset);<br>
 	if (slabp-&gt;s_index)<br>
 		kmem_cache_free(cachep-&gt;c_index_cachep, slabp-&gt;s_index);<br>
+	kmem_freepages(cachep, slabp-&gt;s_mem-slabp-&gt;s_offset);<br>
 	if (SLAB_OFF_SLAB(cachep-&gt;c_flags))<br>
 		kmem_cache_free(cache_slabp, slabp);<br>
 }<br>
----------------------------------------------------------------<br>
<p>
I've had the NFS build test running for an hour and a half with this<br>
patch applied, and can no longer reproduce the problem.<br>
<p>
Cheers,<br>
 Stephen.<br>
<!-- body="end" -->
<hr>
</font></body>
