This situation is this:
We are building an online game system. On some of the systems, there
are simulator processes running that each service a player. There may
be up to 200 or more of these processes running at any given time and
each uses a fairly large amount of memory (as reported by ps). Part of
this is due to the fact that the processes have not been optimized to
make the most efficient use of memory. When the simulator processes
start swapping, then the systems are becoming unstable, performance goes
all to hell and sometimes the systems totally hang. It would be useful
for us to be able to monitor as closely as possible the amount of memory
each processes is using and especially to be notified when these
processes start using significant amounts of swap, so that we can be
prepared to react before the situation gets out of hand. The other
reason why we want to collect this data is so that the developers can
analyze the process when it starts to swap and determine if there
optimizations that they can make that will improve the memory
utilization of these processes so that more processes can be run on the
same box and that swap usage is minimized.
A couple of points that may be useful (or not):
1. These systems are based on RedHat 7.2, but are running a kernel
built from the kernel source tree for Advanced Server 2.1 (as obtained
from the kernel source RPM for the 2.4.9.e-3 kernel). Originally, they
were running on the 2.4.18 kernel from RedHat 7.3, but in our particular
situation, the 2.4.9 Advanced Server kernel was found to have better
performance characteristics.
2. Each of the systems running the simulator processes consist of the
following:
Dual P4 Xeon 2.5 GHz processors (hyperthreading is enabled in the
motherboard BIOS setup)
4 Gigs of RAM
3Ware RAID controller
2 x 40 GB disks in RAID 1 configuration
2 x E1000 NICS
3. The only modifications that were made to the 2.4.9 AS kernel was to
update to the latest version of the E1000 driver from Intel, since the
one in the AS 2.1 kernel source tree didn't work with the systems. The
reason why the kernel was compiled locally was to remove unwanted
options, such as USB and sound support, and to eliminate the need for an
initrd.
If further data is required, I can provide it. Thanks for all who have
responded thus far.
Tom S.
-- +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | Tom Schenk | A positive attitude may not solve all your | | Online Ops | problems, but it will annoy enough people to | | tschenk@ea.com | make it worth the effort. -- Herm Albright | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/