>On Sat, Nov 23, 2002 at 09:29:22AM +1100, Con Kolivas wrote:
>> process_load:
>> Kernel [runs] Time CPU% Loads LCPU% Ratio
>> 2.4.18 [3] 109.5 57 119 44 1.50
>> 2.4.19 [3] 106.5 59 112 43 1.45
>> 2.4.20-rc1 [3] 110.7 58 119 43 1.51
>> 2.4.20-rc1aa1 [3] 110.5 58 117 43 1.51*
>> 2420rc2aa1 [1] 212.5 31 412 69 2.90*
>>
>> This load just copies data between 4 processes repeatedly. Seems to take
>> longer.
>
>Can you go into linux/include/blkdev.h and increase MAX_QUEUE_SECTORS
>to (2 << (20 - 9)) and see if it makes any difference here? If it
>doesn't make a difference it could be the slightly increased readahead,
>but I doubt it's the latter.
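For what it's worth, the tweak I tested (the kernel labelled mqs2 below)
was roughly the following; I'm assuming the rc2aa1 tree currently sizes
the queue at 1 mbyte, so this is a sketch rather than a verbatim diff:

	/* include/linux/blkdev.h -- sketch of the tested tweak, not a
	 * verbatim diff; the old value is my reading of the rc2aa1 tree */
	#define MAX_QUEUE_SECTORS (2 << (20 - 9))	/* was (1 << (20 - 9)), 1 mbyte */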
No significant difference:
Kernel          Time    CPU%  Loads  LCPU%
2420rc2aa1      212.53  31%   412    69%
2420rc2aa1mqs2  227.72  29%   455    71%
>> xtar_load:
>> Kernel [runs] Time CPU% Loads LCPU% Ratio
>> 2.4.18 [3] 150.8 49 2 8 2.06
>> 2.4.19 [1] 132.4 55 2 9 1.81
>> 2.4.20-rc1 [3] 180.7 40 3 8 2.47
>> 2.4.20-rc1aa1 [3] 166.6 44 2 7 2.28*
>> 2420rc2aa1 [1] 217.7 34 4 9 2.97*
>>
>> Takes longer. It's only one run though, so it may not be an accurate average.
>
>This is most probably a too-small waitqueue. Of course increasing the
>waitqueue will also increase latency a bit for the other workloads;
>it's a tradeoff and there's no way around it. Even read-latency has the
>same tradeoff when it chooses the "nth" place, the seventh slot, as the
>spot for the read request when it fails insertion.
>
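For anyone following along, the read-latency behaviour referred to above
is roughly the following. This is an illustrative sketch only, not the
actual read-latency patch; the request plumbing is omitted and the slot
count is just the "seventh slot" mentioned above:

	#include <linux/list.h>

	/* Illustrative only: when a read cannot be merged or placed by the
	 * normal elevator pass, insert it a fixed number of slots from the
	 * head of the queue instead of at the tail, so it does not queue
	 * behind a long run of writes. */
	#define READ_INSERT_SLOT 7

	static void insert_read_near_head(struct list_head *queue_head,
					  struct list_head *new_req)
	{
		struct list_head *pos = queue_head;
		int slot = 0;

		/* walk at most READ_INSERT_SLOT requests in from the head */
		while (pos->next != queue_head && slot < READ_INSERT_SLOT) {
			pos = pos->next;
			slot++;
		}
		list_add(new_req, pos);	/* add right after the slot we stopped at */
	}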
>> io_load:
>> Kernel [runs] Time CPU% Loads LCPU% Ratio
>> 2.4.18 [3] 474.1 15 36 10 6.48
>> 2.4.19 [3] 492.6 14 38 10 6.73
>> 2.4.20-rc1 [2] 1142.2 6 90 10 15.60
>> 2.4.20-rc1aa1 [1] 1132.5 6 90 10 15.47
>> 2420rc2aa1 [1] 164.3 44 10 9 2.24
>>
>> This was where the disk latency hack was expected to have an effect. It
>> sure did.
>
>Yes, I can certainly feel that the machine is much more responsive
>during the write load too. Too bad some benchmarks like dbench decreased
>significantly, but I don't see too many ways around it. At least now
>with those changes the contiguous write case is unaffected; my storage
>test box still reads and writes at over 100mbyte/sec, for example. This
>clearly means that what matters is having 512k dma commands, not a huge
>queue size. Still, with a loaded machine and potential scheduling delays
>it could matter more to have a larger queue; that may be why performance
>decreased for some workloads here too, not only because of a less
>effective elevator. So probably 2Mbyte of queue is a much better idea,
>so at least we have a ring with 4 elements to refill after a completion
>wakeup; I wanted to be strict at first to see the "lowlatency" effect at
>its maximum. We could also consider using a /4 instead of my current /2
>for the batch_sectors initialization.
>
>BTW, at first glance it looks like 2.5 has the same problem in the
>queue sizing too.
>
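Spelling out the sizing arithmetic above: (2 << (20 - 9)) is 4096
sectors of 512 bytes, i.e. 2 mbyte of queue, and dividing by 4 gives
512k batches, so a completion wakeup can refill the queue in four 512k
chunks. Roughly like this (a sketch only; the queue field names are
approximations, not the actual aa patch):

	#define MAX_QUEUE_SECTORS	(2 << (20 - 9))	/* 4096 sectors = 2 mbyte */

	/* somewhere in the request queue setup (field names approximate) */
	q->max_queue_sectors = MAX_QUEUE_SECTORS;
	q->batch_sectors = q->max_queue_sectors / 4;	/* /2 today; /4 gives 512k batches */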
>> read_load:
>> Kernel [runs] Time CPU% Loads LCPU% Ratio
>> 2.4.18 [3] 102.3 70 6 3 1.40
>> 2.4.19 [2] 134.1 54 14 5 1.83
>> 2.4.20-rc1 [3] 173.2 43 20 5 2.37
>> 2.4.20-rc1aa1 [3] 150.6 51 16 5 2.06
>> 2420rc2aa1 [1] 140.5 51 13 4 1.92
>>
>> list_load:
>> Kernel [runs] Time CPU% Loads LCPU% Ratio
>> 2.4.18 [3] 90.2 76 1 17 1.23
>> 2.4.19 [1] 89.8 77 1 20 1.23
>> 2.4.20-rc1 [3] 88.8 77 0 12 1.21
>> 2.4.20-rc1aa1 [1] 88.1 78 1 16 1.20
>> 2420rc2aa1 [1] 99.7 69 1 19 1.36
>>
>> mem_load:
>> Kernel [runs] Time CPU% Loads LCPU% Ratio
>> 2.4.18 [3] 103.3 70 32 3 1.41
>> 2.4.19 [3] 100.0 72 33 3 1.37
>> 2.4.20-rc1 [3] 105.9 69 32 2 1.45
>>
>> Mem load hung the machine. I could not get rc2aa1 through this part of the
>> benchmark no matter how many times I tried to run it. No idea what was
>> going on. Easy to reproduce. Simply run the mem_load out of contest (which
>> runs until it is killed) and the machine will hang.
>
>Sorry, but what is mem_load supposed to do other than loop forever? It
>has been running for two days on my test box (512m of ram, 2G of swap,
>4-way smp) and nothing has happened yet. It's an infinite loop. It
>sounds like you're trapping a signal; wouldn't it be simpler to just
>finish after a number of passes? The machine is perfectly usable and
>responsive during the mem_load, xmms doesn't skip a beat for instance.
>This is probably thanks to the elevator-lowlatency too; I recall xmms
>didn't use to be completely smooth during heavy swapping in previous
>kernels (because the read() of the sound file didn't return in a
>reasonable time, since I'm swapping on the same hd where I store the
>data).
>
>jupiter:~ # uptime
> 4:20pm up 1 day, 14:43, 3 users, load average: 1.38, 1.28, 1.21
>jupiter:~ # vmstat 1
>    procs                     memory    swap           io     system        cpu
>  r  b  w    swpd  free buff cache    si    so    bi    bo   in   cs us sy  id
>  0  1  0  197408  4504  112  1436    21    34    23    34   36   19  0  2  97
>  0  1  0  199984  4768  116  1116 11712  5796 11720  5804  514  851  1  2  97
>  0  1  0  234684  4280  108  1116 14344 12356 14344 12360  617 1034  0  3  96
>  0  1  0  267880  4312  108  1116 10464 11916 10464 11916  539  790  0  3  97
>  1  0  0  268704  5192  108  1116  6220  9336  6220  9336  363  474  0  1  99
>  0  1  0  270764  5312  108  1116 13036 18952 13036 18952  584  958  0  1  99
>  0  1  0  271368  5088  108  1116  8288  5160  8288  5160  386  576  0  1  99
>  0  1  1  269184  4296  108  1116  4352  6420  4352  6416  254  314  0  0 100
>  0  1  0  266528  4604  108  1116  9644  4652  9644  4656  428  658  0  1  99
>
>There is no way I can reproduce any stability problem with mem_load
>here (tested both on a scsi quad xeon and an ide dual athlon). Can you
>provide more details of your problem and/or a SYSRQ+T during the hang?
>Thanks.
The machine stops responding but sysrq works. It won't write anything to
the logs. To get the error I have to run the mem_load portion of
contest, not just mem_load by itself. The purpose of mem_load is to be
just that - a memory load during the contest benchmark - and contest
will kill it when it finishes testing under that load. To reproduce it
yourself, run mem_load and then do a kernel compile with make
-j(4xnum_cpus). If that doesn't do it I'm not sure how else you can see
it. SysRq-T shows too much stuff on screen for me to make any sense of
it, and it scrolls away without me being able to scroll back up.
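
In case it helps, mem_load is just a memory hog that dirties more memory
than the box has RAM so it is forced to swap. Something along the lines
of the rough sketch below, with a finite pass count as you suggest,
would avoid the signal handling entirely (this is only an illustration,
not the actual contest source; the sizes and pass count are made up):

	/*
	 * Rough illustration only, not the actual contest mem_load: allocate
	 * more memory than the machine has RAM and repeatedly dirty it so the
	 * box is forced to swap, then exit after a fixed number of passes
	 * instead of waiting to be killed.
	 */
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	int main(int argc, char **argv)
	{
		size_t ram_mb = argc > 1 ? (size_t)atoi(argv[1]) : 256;
		size_t bytes = ram_mb * 1024 * 1024 / 10 * 11;	/* ~110% of RAM */
		int passes = argc > 2 ? atoi(argv[2]) : 10;
		int i;
		char *buf = malloc(bytes);

		if (!buf) {
			perror("malloc");
			return 1;
		}
		for (i = 0; i < passes; i++)	/* finite: no signal trapping needed */
			memset(buf, i, bytes);
		free(buf);
		return 0;
	}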
Con