Re: [evms-devel] [linux-lvm] [ANNOUNCE] LVM reimplementation ready for beta testing

Kevin Corry (corryk@us.ibm.com)
Fri, 1 Feb 2002 15:59:06 -0600


On Thursday 31 January 2002 06:52, Joe Thornber wrote:
> On Thu, Jan 31, 2002 at 01:52:29PM -0600, Steve Pratt wrote:
> > No, not really. We only put in the kernel the things that make sense to
> > be in the kernel, discovery logic, ioctl support, I/O path. All
> > configuration is handled in user space.
>
> There's still a *lot* of code in there; > 26,000 lines in fact.
> Whereas device-mapper weighs in at ~2,600 lines. This is just because
> you've decided to take a different route from us, you may be proven to
> be correct.

Just so everyone is clear on the amount of code we are talking about, here
are my current counts of the different kernel drivers (based on code for
2.4.17):

EVMS: 17685 (this includes drivers/evms and include/linux/evms)

device-mapper: 2895 (this includes device-mapper/kernel/common and
device-mapper/kernel/ioctl - I've left out device-mapper/kernel/fs since only
one interface can be active at a time)

Current MD and LVM1: 11105 (this includes drivers/md, include/linux/lvm.h,
and include/linux/raid/)

Linux 2.4.17: 2519386 (a clean kernel, without EVMS or the latest LVM1
updates, and not counting asm code)

See http://www.dwheeler.com/sloccount/ for the tool that I used to get these
counts.

So I will agree - device-mapper does provide a nice, general-purpose I/O
remapping service in a small amount of code. Kernel bloat is obviously a big
concern as more functionality is added to Linux, and achieving a desired set
of functionality with less code is generally a good thing.

However, I don't think that the size of EVMS should just be written off as
kernel bloat. (I don't think any of the LVM guys have specifically said this,
but I have heard this comment from others, and I don't think they are looking
at the whole picture). We are talking about a seven-tenths of one percent
increase in the total size of the kernel (17685 lines out of roughly 2.5
million). And if you consider that EVMS implements its own support for LVM1
and MD, which currently account for 11105 lines, then EVMS is really only
adding 6580 lines of code to the kernel. On top of that, EVMS has support for several
disk partitioning formats. (This support does not yet fully duplicate the
advanced partition support in the kernel, so I can't yet give any definite
numeric comparisons.) There is also support for the AIX LVM and the OS/2 LVM,
as well as general bad-block-relocation and snapshotting support. For all of
this extra functionality, I believe the extra code is warranted.

> I would like the two projects to help each other, but not to the point
> where one group of people has to say 'you are completely right, we
> will stop developing our project'. It's unlikely that either of us is
> 100% correct; but I do think device-mapper splits off a nice chunk of
> services that is useful to *all* people who want to do volume
> management. As such I see that as one area where we may eventually
> work together.
>
> Similarly I expect to be providing an *optional* kernel module for LVM
> users who wish to do in kernel discovery of a root LV, so if the EVMS
> team has managed to get a nice generic way of iterating block devices
> etc. into the kernel, we would be able to take advantage of that.
> Are you trying to break out functionality so it benefits other Linux
> projects ? or is EVMS just one monolithic application embedded in the
> kernel ?

I have been thinking about this today and looking over some of the
device-mapper interfaces. I will agree that, in concept, EVMS could be
modified to use device-mapper for I/O remapping. However, as things stand
today, I don't think the transition would be easy.

As I'm trying to envision it, the EVMS runtime would become a "volume
recognition" framework (see tanget below). Every current EVMS plugin would
then probe all available devices and communicate the necessary volume
remapping info to device-mapper through the ioctl interface. (An in-kernel
API set might be nice to avoid the overhead of the ioctl path). A new device
would then be created for every node that every plugin recognizes. This
brings up my first objection. With this approach, there would be an exposed
device for every node in a volume stack, whereas the current EVMS design only
exposes nodes for the final volumes. Even ignoring the dwindling minor-number
issue, which should go away in 2.5, you still take up space in the
buffer cache for every one of these devices, which introduces the possibility
of cache incoherencies.
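
To make the remapping half of that concrete, here is a rough sketch of what a
user-space EVMS plugin pushing a single mapping into device-mapper might look
like. This is only an illustration under my own assumptions: it goes through
the libdevmapper wrapper rather than the raw ioctl interface, and the device
name, offset and length are made up.

/*
 * Hypothetical sketch: load one linear mapping into device-mapper from
 * user space.  The "linear" target and the libdevmapper calls are real;
 * everything else (device name, sizes) is invented for illustration.
 */
#include <stdio.h>
#include <libdevmapper.h>

int create_linear_volume(void)
{
    struct dm_task *dmt;

    dmt = dm_task_create(DM_DEVICE_CREATE);
    if (!dmt)
        return -1;

    /* Name for the new mapped device, e.g. /dev/mapper/evms_lv1. */
    if (!dm_task_set_name(dmt, "evms_lv1"))
        goto bad;

    /* Map sectors 0..409599 of the volume onto /dev/sda1 at offset 384. */
    if (!dm_task_add_target(dmt, 0, 409600, "linear", "/dev/sda1 384"))
        goto bad;

    /* Issue the ioctl(s) that actually create the device. */
    if (!dm_task_run(dmt)) {
        fprintf(stderr, "device-mapper ioctl failed\n");
        goto bad;
    }

    dm_task_destroy(dmt);
    return 0;

bad:
    dm_task_destroy(dmt);
    return -1;
}

Note that every mapping loaded this way becomes another visible mapped device,
which leads straight into the exposed-node concern above.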

Maybe this example will help: Say we have four disks. These four disks have
one partition each, and are striped together with RAID-0 (using MD). This MD
device is then made into an LVM PV and put in a volume group, and an LV is
created from part of the space in that group. Then down the line, you decide
to do some backups, and create another LV to use as a snapshot of the first
LV. (For those who might be wondering, this is a very realistic scenario.)

            Snap_of_LV1
                 |
    LV1         LV2
    _|___________|__
   |  Volume Group  |
    ----------------
            |
           md0
   _________|____________
   |      |      |      |
  sda1   sdb1   sdc1   sdd1
   |      |      |      |
  sda    sdb    sdc    sdd

In this scenario, we would wind up with exposed devices for every item in
this graph (except the volume group). But in reality, we don't want someone
coming along and mucking with md0 or with LV2 or with any of the disk
partitions, because they are all in use by the two volumes at the top.

<tangent>

As we know, EVMS does volume discovery in the kernel. LVM1 does discovery in
user-space, but Joe has hinted at an in-kernel LVM1 discovery module to work
with device-mapper. Back when we started on EVMS, people were basically
shouting "we need in-kernel discovery!", so that's the route we took. This is
why it looks like EVMS has so much code. I'd say 50-75% of each plugin is
devoted to discovery. Of course, today there seem to be people shouting,
"let's move all discovery into user-space!". Well, I suppose that approach is
feasible, but I personally don't agree with it. My belief is that it's the
kernel's job to tell user-space what the hardware looks like, not the other
way around. If we move partition/volume/etc discovery into user-space, at
what point do we move device recognition into user-space? Looking down that
path just seems more and more like a micro-kernel approach, and I'm sure we
don't want to rehash that discussion.

</tangent>

Next, from what I've seen, device-mapper provides static remappings from one
device to another. It seems to be a good approach for setting up things like
disk partitions and LVM linear LVs. There is also striping support in
device-mapper, but I'm assuming it implements a single notion of striping. For
instance, RAID-0 striping in MD is handled differently from striped LVs in LVM,
and I think AIX striping is different again. I'm not sure whether one striping
module could be generic enough to handle all of these cases. But maybe it can. I'll
have to think more about that one.
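
To illustrate why the differences may be smaller than they first appear, here
is a conceptual sketch of the chunked-striping arithmetic that MD RAID-0, LVM
striped LVs and a device-mapper striped target all have to perform in some
form. It is not taken from any of those drivers, and all names are
hypothetical; the real differences are mostly in metadata and in how chunk
size and stripe count are expressed.

#include <stdint.h>

struct stripe_map {
    uint32_t nr_stripes;       /* number of underlying devices */
    uint32_t chunk_sectors;    /* chunk size in 512-byte sectors */
};

/* Map a logical sector to (device index, sector on that device). */
static void stripe_remap(const struct stripe_map *m, uint64_t lsector,
                         uint32_t *dev_idx, uint64_t *dev_sector)
{
    uint64_t chunk  = lsector / m->chunk_sectors;  /* which chunk */
    uint64_t offset = lsector % m->chunk_sectors;  /* offset within chunk */

    *dev_idx    = chunk % m->nr_stripes;           /* round-robin device */
    *dev_sector = (chunk / m->nr_stripes) * m->chunk_sectors + offset;
}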

How about mirroring? Does the linear module in device-mapper allow for 1-to-n
mappings? This would be similar to the way AIX does mirroring, where each LP
of an LV can map to up to three PPs on different PVs.
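
For reference, here is a rough data-structure sketch of the AIX-style mapping
I have in mind, where each LP carries up to three PP copies on different PVs.
The structure and field names are invented for illustration only.

#include <stdint.h>

#define MAX_COPIES 3

struct pp_location {
    uint32_t pv_index;      /* which physical volume */
    uint64_t pp_start;      /* starting sector of the PP on that PV */
    int      in_sync;       /* is this copy currently up to date? */
};

struct lp_mapping {
    uint32_t nr_copies;     /* 1 = unmirrored, 2 or 3 = mirrored */
    struct pp_location copy[MAX_COPIES];
};

A write would have to go to every in-sync copy, and a read could be served
from any of them, which is a step beyond a purely static 1-to-1 remapping.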

How would device-mapper handle a remapping that is dynamic at runtime? For
instance, how would device-mapper handle bad-block-relocation? Actually, it
seems you have dealt with this from one point of view in the snapshotting
code in device-mapper. In order for persistent snapshots (or bad-block) to
work, device-mapper needs a module which knows the metadata format, because
that metadata has to be written at runtime. So another device-mapper "module"
would need to be written to handle bad-block. This implicitly limits the
capabilities of device-mapper, or else ties it directly to the recognition
code. For EVMS and device-mapper to work together, we would have to agree on
metadata formats for these types of modules. Other similar examples that come
to mind are RAID-5 and block-level encryption.
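
As an illustration of the runtime half of the problem, here is a conceptual
sketch of the per-I/O lookup a bad-block-relocation module has to do. The
part this sketch deliberately leaves out - persisting the remap table in an
agreed-upon metadata format - is exactly the part that would need coordination
between EVMS and device-mapper. All names are invented.

#include <stdint.h>
#include <stdlib.h>

struct bbr_entry {
    uint64_t bad_sector;        /* sector known to be bad */
    uint64_t replacement;       /* sector in the reserved relocation area */
};

struct bbr_table {
    struct bbr_entry *entries;  /* kept sorted by bad_sector */
    size_t nr_entries;
};

/* Redirect a sector if it is known to be bad; otherwise pass it through. */
static uint64_t bbr_remap(const struct bbr_table *t, uint64_t sector)
{
    size_t lo = 0, hi = t->nr_entries;

    while (lo < hi) {                       /* binary search */
        size_t mid = lo + (hi - lo) / 2;

        if (t->entries[mid].bad_sector == sector)
            return t->entries[mid].replacement;
        if (t->entries[mid].bad_sector < sector)
            lo = mid + 1;
        else
            hi = mid;
    }
    return sector;
}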

Now, don't get me totally wrong. I'm not saying using device-mapper in EVMS
is impossible. I'm just pointing out some of the issues I currently see with
making such a transition. Perhaps some of these issues can be addressed, from
either side.

Ultimately, I agree with Joe and Alasdair - I think there is room for both
projects. There are plenty of other examples of so-called competing projects
that co-exist just fine - KDE/Gnome, ReiserFS/JFS/XFS/ext3 - hell, there are
even two virtual memory managers to choose from! So if it just turns out that
Linux has a choice of two volume managers, then I don't have any problem with
it. I will say that it is somewhat unfortunate that we couldn't have worked
together more, but it seems to me that timing is what kept it from happening.
EVMS was under development when LVM was getting ready for their 1.0 release,
and now EVMS is trying to get a final release out as the new device-mapper is
coming along. Unfortunately we each have our own deadlines to think about.
Maybe down the line there will be more time to get together and figure out
ways to get these different technologies to work together.

-Kevin
