the vma lookup overhead is nothing compared to the walking of the linked
list (not of the tree) to find the first available slot above
TASK_UNMAPPED_BASE. In the vma lookup engine rewrite I only cared about
find_vma, not about optimizing the search of a free vma over
TASK_UNMAPPED_BASE. Such list-loop is really evil.
> What appears to be happening is that the VMA tree has degenerated
> into a monstrous singly linked list. All those little 4k mappings
actually it's not that the tree degenerated into a list. It's that we
need to walk all the vma to check if there is a large enough hole to
place the new dynamic mapping and so walk we use the list, not the tree,
because it needs less mem resources and it's simpler and faster.
You can fix the problem in userspace by using a meaningful 'addr' as
hint to mmap(2), or by using MAP_FIXED from userspace, then the kernel
won't waste time searching the first available mapping over
TASK_UNMAPPED_BASE.
> The reason you don't see it with an anonymous map is, I think, that
> the kernel will merge contiguous anon mappings into a single one,
Yes. But actually merging also file backed vmas would be a goodness
indipendently of the arch_get_unmapped_area loop. It's not possible
right now because the anonymous memory case it's much simpler: no
i_shared_locks etc... but I'd like if in the long run also the file
backed vma will be mergeable. The side effect of the goodness is that
also the loop would run faster of course. But technically to really kill
the evil complexity of the loop (for example if every vma belongs to a
different file so it cannot be merged anyways) we'd need a tree of
"holes" indexed in function of the size of the hole. but it seems a very
dubious optimization... Complexity wise it makes sense, but in practice
the common case should matter more I guess, and of course userspace can
just fix this without any kernel modification by passing an helpful
"addr", to the mmap syscall with very very little effort. Untested
patch follows:
@@ -68,7 +9,7 @@
int main()
{
int i = 0;
- void *p;
+ void *p = NULL;
int t;
int fd;
@@ -79,7 +20,7 @@
}
t = time(NULL);
while (1) {
- p = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+ p = mmap(p, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
if ((int) p == -1) {
perror("mmap");
return;
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/