We're still trying to find the right mathematical functions to do
this. Trust me, it is not easy: the mapping of the port matrix and
the network flow through many stacked packet filters and firewalls
generates a rather complex graph (partly a bigraph, LVS-DR for
example) with complex structures (redundancy and parallelisation).
It's not as if we could simply sit down and write a fw-script for our
packet filters by hand; the fw-script is generated by a meta-fw layer
that knows about the surrounding network nodes.
> But yes, we also found that the L2 cache is limiting here
> (ip_conntrack has the same problem)
I think this weekend I will run my tests again, this time also
measuring some CPU performance counters with oprofile, such as
DATA_READ_MISS, CODE_CACHE_MISS and NONCACHEABLE_MEMORY_READS.
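Independent of oprofile, a quick way to sanity-check per-call cycle
costs on IA-32 is to read the TSC directly. A minimal sketch
(illustrative only, not our actual test harness):

#include <stdio.h>

/* Read the IA-32 time-stamp counter (present on PIII and later). */
static inline unsigned long long rdtsc(void)
{
        unsigned int lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
        enum { RUNS = 100000 };
        unsigned long long t0, t1;
        volatile int sink = 0;
        int i;

        t0 = rdtsc();
        for (i = 0; i < RUNS; i++)
                sink += i;      /* replace with the code under test */
        t1 = rdtsc();

        printf("%llu cycles/iteration\n", (t1 - t0) / RUNS);
        return 0;
}

For real measurements you'd want to average over many runs and keep
the box otherwise idle, since the TSC counts wall-clock cycles.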
> At least that is easily fixed. Just increase the LOG_BUF_LEN parameter
> in kernel/printk.c
Tests showed that this only helps in peak situations; I think we
should simply forget about printk().
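For reference, in a 2.4 tree this is a compile-time constant near the
top of kernel/printk.c; if memory serves, the default looks roughly
like this (the exact value may differ per version/config, so check
your tree):

#define LOG_BUF_LEN     (16384)  /* bump to e.g. (131072) for bigger peaks */

It has to stay a power of two, since the code masks buffer indices
with LOG_BUF_LEN - 1.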
> Alternatively don't use slow printk, but nfnetlink to report bad packets
> and print from user space. That should scale much better.
Yes, and there are a few things that my colleague found out during his
tests (actually pretty straightforward things):
1. A big log buffer is only useful for riding out peaks.
2. A big log buffer doesn't help at all when the CPU load is already
   high.
3. The smaller the message, the better (binary logging is thus an
   advantage).
4. Logging via printk() is extremely expensive, because of the
   conversions and whatnot. A rough estimate would be 12500 clock
   cycles per log entry generated by printk(). On a PIII/450 a log
   entry therefore takes about 0.000028s, which leads to the following
   observation: at 36000pps, all of which should be logged, you end up
   with a system at 100% CPU load and 0% idle (see the sketch after
   this list).
5. The kernel should log a binary stream, and so should the daemon
   that fetches the data. If you want to convert the binary data to a
   human-readable format, start a low-priority process or do it
   on demand.
6. Ideally the log daemon should be preemptible so that it gets a
   defined time slice to do its job.
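To double-check the arithmetic in point 4 (the 12500-cycle figure is
our rough estimate, nothing more), a trivial back-of-the-envelope
program:

#include <stdio.h>

int main(void)
{
        const double cycles_per_entry = 12500.0;  /* estimate from point 4 */
        const double cpu_hz = 450e6;              /* PIII/450 */
        const double secs_per_entry = cycles_per_entry / cpu_hz;

        /* ~0.000028s per entry */
        printf("time per printk() entry: %.6fs\n", secs_per_entry);
        /* ~36000pps saturates the CPU */
        printf("pps that saturates the CPU: %.0f\n", 1.0 / secs_per_entry);
        return 0;
}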
Here are some test results from a coworker of mine (Achim Gsell):
Max pkt rate the system can log without losing more than 1% of the messages:
----------------------------------------------------------------------------
kernel: Linux 2.4.19-gentoo-r7 (low-latency scheduling)

daemon: syslog-ng (nice 0), logbufsiz=16k, pkts=10*10000, CPU=PIII/450
packet-len:       64        256        512       1024
           2873pkt/s  3332pkt/s  3124pkt/s  3067pkt/s
             1.4Mb/s    6.6Mb/s   12.2Mb/s   23.9Mb/s

daemon: syslog-ng (nice 0), logbufsiz=16k, pkts=10*10000, CPU=PIVM/1.7
packet-len:       64        256        512       1024
           7808pkt/s  7807pkt/s  7806pkt/s      pkt/s
             3.8Mb/s   15.2Mb/s   30.5Mb/s       Mb/s
----------------------------------------------------------------------------
daemon: cat /proc/kmsg > kernlog, logbufsiz=16k, pkts=10*10000, CPU=PIII/450
packet-len:       64        256        512       1024
           4300pkt/s                         3076pkt/s
             2.1Mb/s                          24.0Mb/s

daemon: ulogd (nlbufsize=4k, qthreshold=1), pkts=10*10000, CPU=PIII/450
packet-len:       64        256        512       1024
           4097pkt/s                         4097pkt/s
             2.0Mb/s                            32Mb/s

daemon: ulogd (nlbufsize=2^17 - 1, qthreshold=1), pkts=10*10000, CPU=PIII/450
packet-len:       64        256        512       1024
           6576pkt/s                         5000pkt/s
             3.2Mb/s                            38Mb/s

daemon: ulogd (nlbufsize=64k, qthreshold=1), pkts=1*10000, CPU=PIII/450
packet-len:       64        256        512       1024
               pkt/s
             4.0Mb/s

daemon: ulogd (nlbufsize=2^17 - 1, qthreshold=50), pkts=10*10000, CPU=PIII/450
packet-len:       64        256        512       1024
           6170pkt/s                         5000pkt/s
             3.0Mb/s                            38Mb/s
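For reference, the ulogd numbers above come from the nfnetlink/ULOG
path. A stripped-down sketch of what such a binary-logging daemon
boils down to (assuming the 2.4-era ipt_ULOG interface and netlink
group 1, with all error handling omitted):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/netfilter_ipv4/ipt_ULOG.h>

int main(void)
{
        static char buf[65536];
        struct sockaddr_nl addr;
        int fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_NFLOG);

        memset(&addr, 0, sizeof(addr));
        addr.nl_family = AF_NETLINK;
        addr.nl_groups = 1;             /* matches --ulog-nlgroup 1 */
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));

        for (;;) {
                int len = recv(fd, buf, sizeof(buf), 0);
                struct nlmsghdr *nlh = (struct nlmsghdr *)buf;

                while (len > 0 && NLMSG_OK(nlh, (unsigned int)len)) {
                        ulog_packet_msg_t *pkt = NLMSG_DATA(nlh);

                        /* stay binary: no string formatting in the hot path */
                        fwrite(pkt, sizeof(*pkt) + pkt->data_len, 1, stdout);
                        nlh = NLMSG_NEXT(nlh, len);
                }
        }
        return 0;
}

Conversion to human-readable text then happens off the hot path, in
line with point 5 above.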
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc