Re: The disappearing sys_call_table export.

Ben Lau (benlau@linux.org.hk)
Wed, 07 May 2003 10:14:09 +0800


Hi,

I am interested with pt2, how NFS did for their syscall?

Terje Eggestad wrote:
> Unfortunately we live in an insane world.
>
> First of all, in the Changelog where the export was removed for 2.5.41
>
> http://www.kernel.org/pub/linux/kernel/v2.5/ChangeLog-2.5.41
>
> Arjan lists 4 reasons for having the export in the first place, and I'm
> on point 3. Here Arjan pretty much acknowledges that there is a
> legitimate need to have a event/hook system to be informed of a syscall.
> The exact quote is: "Eg the use of the export in this just a bandaid due
> to lack of a proper mechanism".
>
> My argument for *why* there should be a mechanism stops here.
>
>
> Since you're bright inquisitive: The exact problem I'm facing is pretty
> complex:
>
>
> 1. performance is everything.
> 2. We're making a MPI library, and as such we don't have any control
> with the application.
> 3a. The various hardware for cluster interconnect all work with DMA.
> 3b. the performance loss from copying from a receive area to the
> userspace buffer is unacceptable.
> 3c. It's therefore necessary for HW to access user pages.
> 4. In order to to 3, the user pages must be pinned down.
> 5. the way MPI is written, it's not using a special malloc() to allocate
> the send receive buffers. It can't since it would break language binding
> to fortran. Thus ANY writeable user page may be used.
> 6. point 4: pinning is VERY expensive (point 1), so I can't pin the
> buffers every time they're used.
> 7. The only way to cache buffers (to see if they're used before and
> hence pinned) is the user space virtual address. A syscall, thus ioctl
> to a device file is prohibitive expensive under point 1.
> 8a. if the app (glibc in practice, but you never know) use sbrk() with a
> negative arg, and then a positive argument, I can get a a different set
> of user pages with the same address.
> 8b ditto with a set of munmap()/mmap().
> 9. since the number of times. any 'realloc' may happen is << than the
> numbers of times any buffer may be used, it's necessary under point 1 to
> to trace changes to virtual addresses to phys pages, rather than test
> every time an address is being used.
> 10. kernel patches are impractical, I must be able to do this with std
> stock, redhat, AND suse kernels.
>
>
>
>
> On Mon, 2003-05-05 at 10:23, Christoph Hellwig wrote:
>
>>On Mon, May 05, 2003 at 10:19:45AM +0200, Terje Eggestad wrote:
>>
>>>Now that it seem that all are in agreement that the sys_call_table
>>>symbol shall not be exported to modules, are there any work in progress
>>>to allow modules to get an event/notification whenever a specific
>>>syscall is being called?
>>
>>No.
>>
>>
>>>We have a specific need to trace mmap() and sbrk() calls.
>>
>>Well, you get mmap events for your driver and I can't imagine a sane
>>reason for intwercepting sbrk(). Do you have a pointer to the driver
>>source doing such strange things?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/