Re: Linux 2.4.2 fails to merge mmap areas, 700% slowdown.

Zack Weinberg (zackw@stanford.edu)
Fri, 23 Mar 2001 20:11:22 -0800


Kevin Buhr wrote:
> Jakob Østergaard <jakob@unthought.net> writes:
> >
> > Try compiling something like Qt/KDE/gtk-- which are really heavy on
> > templates (with all the benefits and drawbacks of that).
>
> Okay, I just compiled gtk-- 1.0.3 (with CFLAGS = "-O2 -g") under three
> versions of GCC (Debian 2.95.3, RedHat 2.96, and a CVS pull of the
> "gcc-3_0-branch") on the same Debian machine running kernel 2.4.2.
>
> In all cases, the "cc1plus" processes appeared to max out around 25M
> total size. The "maps" pseudofiles for the 2.95.3 and and 3.0
> compiles never grew past 250 lines, but the "maps" pseudofiles for the
> RedHat 2.96 compile were gigantic, jumping to 3000 or 5000 lines at
> times.
>
> The results speak for themselves:
>
> CVS gcc 3.0: Debian gcc 2.95.3: RedHat gcc 2.96:
>
> real 16m8.423s real 8m2.417s real 12m24.939s
> user 15m23.710s user 7m22.200s user 10m14.420s
> sys 0m48.730s sys 0m41.040s sys 2m13.910s
> maps: <250 lines <250 lines >3000 lines

Let me inject some information about what gcc's doing in each version.

2.95.3 allocates its memory via a bunch of 'obstacks' which,
underneath, get memory from malloc, and therefore brk(2). I'm very
surprised to see it had ~250 vmas; it should be more like 10.

2.96 and later versions use a garbage collecting allocator instead; it
was becoming much too hard to decide which obstack to use when. The
garbage collector allocates memory with mmap(..., MAP_ANON, ...).
This is to avoid interfering with malloc, which is still used in many
places; and to get page-aligned memory without wasting tons of space,
as valloc(3) does.

In Red Hat's 2.96, that allocator gets memory from mmap one page at a
time. If I understand what's going on in the kernel correctly, that
means each page is its own vma. 25 megs of GC arena is roughly 6400
vmas in that regime.

In CVS 3.0-to-be (and trunk), the allocator gets memory in 32-page
chunks instead. So 25 megs of GC arena is only 200 vmas.

However, 25 megs of GC arena is small as these things go. GCC's
memory consumption can _easily_ get up to 200 or 300 megs. The
example I'm familiar with is insn-attrtab.c from GCC's own sources
(it's machine-generated code, with several huge functions). 256 megs
of GC arena, in 32-page chunks, is 2048 vmas. Yes, at this point the
machine is swapping... but if I understand the issue, it's just when
we're swapping that having thousands of vmas causes problems.

In conclusion, I think that GCC's allocator still makes a good case
for merging vmas.

zw
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/