> Chuck Ebbert wrote:
>
>> Nick Piggin wrote:
>>
>>> OK right. As far as I can see, the algorithm in the RAID1 code
>>> is used to select the best drive to read from? If that is the
>>> case then I don't think it could make better decisions given
>>> more knowledge.
>>>
>>
>>
>> How about if it just asks the elevator whether or not a given read
>> is a good fit with its current workload? I saw in 2.5 where the balance
>> code is looking at the number of pending requests and if it's zero then
>> it sends it to that device. Somehow I think something better than
>> that could be done, anyway.
>>
> That balance code is probably the IDE or SCSI channel balancing?
> In that case, the driver simply wants to know which device it
> should service next, which is an appropriate fit (is that what
> you were talking about? I don't have source here sorry)
>
>
> We could ask the elevator if a given read is a good fit. It
> would probably help.
>
>>
>>> It seems to me that a better way to layer it would be to have
>>> the complex (ie deadline/AS/CFQ/etc) scheduler handling all
>>> requests into the raid block device, then having a raid
>>> scheduler distributing to the disks, and having the disks
>>> run no scheduler (fifo).
>>>
>>
>>
>> That only works if RAID1 is working at the physical disk level (which
>> it should be AFAIC but people want flexibility to mirror partitions.)
>>
> How so? Basically you want your high level scheduler to run first.
> You want it to act on the stream of requests from the system, not
> on the stream of requests to the device. If you know what I mean.
>
> I might be wrong here. I haven't done any testing, and only a
> little bit of thinking.
>
>>
>>> In practice the current scheme probably works OK, though I
>>> wouldn't know due to lack of resources here :P
>>>
>>
>>
>> I've been playing with the 2.4 read balance code and have some
>> improvements, but real gains need a new approach.
>>
> The problem I see, is the higher level schedulers (deadline for
> example, as opposed to the RAID scheduler) will find it difficult
> to tell if a request will be "good" for them or not. For example
> we have 2 devices, 100 requests in each scheduler queue.
>
> Device A's head is at sector x and next request is at x+100,
> Device B's head is at sector x+10 and next request is at x+200.
>
> RAID wants to know which queue should take a request at sector
> x+1000. What do you do?
>
> The way you would do a good "goodness" function, I guess,
> would be to search through all requests on the device, and return
> the minimum distance from the request you are running the query
> on. Do this for both queues, and insert the request into the
> queue with the smallest delta. I don't see much else doing any
> good.
Well, no, I'm an idiot. You obviously don't have to "search
through all requests", as they are (for AS, deadline, CFQ) in an
rbtree. So that might not be too bad an idea to investigate.

But...

It still means you get the high level scheduling below where
you want it. This means the read/write batches for each queue
will not stay in sync (not sure if this is a bad thing), request
deadlines will mean even the good "goodness" calculation
will not always be good, process fairness could be badly
impacted for some loads, and AS has other problems
(hopefully not too bad).
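
For what it's worth, here is a rough userspace sketch of the "goodness"
query described above. It is not kernel code and has not been tested
against any real scheduler. A sorted Python list with bisect stands in
for the per-queue rbtree, and the names min_distance and pick_queue are
made up for illustration. Sending a request straight to an empty queue
is an assumption, borrowed from the "zero pending requests" behaviour
mentioned earlier in the thread. Head position is ignored.

```python
from bisect import bisect_left


def min_distance(sorted_sectors, sector):
    """Distance from `sector` to the nearest queued sector.

    `sorted_sectors` is an ascending list standing in for the rbtree,
    so the lookup is a binary search rather than a full scan.
    Returns None if the queue is empty.
    """
    if not sorted_sectors:
        return None
    i = bisect_left(sorted_sectors, sector)
    candidates = []
    if i < len(sorted_sectors):
        # Nearest queued request at or above the new sector.
        candidates.append(sorted_sectors[i] - sector)
    if i > 0:
        # Nearest queued request below the new sector.
        candidates.append(sector - sorted_sectors[i - 1])
    return min(candidates)


def pick_queue(queues, sector):
    """Return the index of the queue with the smallest delta to `sector`.

    Assumption: an empty queue wins immediately. Ties go to the
    lowest-numbered queue.
    """
    best_idx = None
    best_delta = None
    for idx, queue in enumerate(queues):
        delta = min_distance(queue, sector)
        if delta is None:
            return idx
        if best_delta is None or delta < best_delta:
            best_idx, best_delta = idx, delta
    return best_idx


# The two-device example from above with x = 0, and only the "next"
# request of each queue known.
queue_a = [100]
queue_b = [200]
chosen = pick_queue([queue_a, queue_b], 1000)
```

With only those two requests known, the deltas are 900 for A and 800
for B, so the request at x+1000 goes to B. Note that this only answers
the distance question. It does nothing about the deadline, batching
and fairness problems listed above.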