Storage management with LVM2, mdadm and device mapper
Introduction
Exercise: Preparing block devices for LVM2 use
Exercise: Creating physical volumes and volume groups
Exercise: Creating logical volumes and file systems
Exercise: Creating and checking ext4 file systems (mkfs/fsck)
Exercise: Mounting file systems at bootup time
Exercise: Creating a snapshot volume
Exercise: Increasing logical volumes and file systems
Exercise: Adding a physical volume to a volume group
Exercise: Removing/replacing a physical volume
Exercise: Setting up a RAID1 device using mdadm
Exercise: Setting up Encryption using dm_crypt and LUKS
Introduction
In the early days, Linux could only be installed on fixed hard disk partitions (primary or logical partitions on PCs), which are usually hard to change
after the fact, especially if Linux had to live alongside another operating system (e.g. Microsoft Windows) on the same hard disk drive. Making
changes to the existing partitioning layout usually involved using proprietary tools like Partition Magic or biting the bullet and re-installing
everything from scratch after changing the partition configuration. Also, it was not possible to create file systems that could span across several
physical devices or to provide redundancy (RAID) or encryption.
With the introduction of the Linux device mapper (DM) and LVM2, the logical volume manager for Linux, several years ago, Linux gained very
powerful and much more flexible support for managing storage. DM provides an abstraction layer on top of the actual storage block devices and
serves as the foundation for LVM2, software RAID, encryption and other features.
Linux LVM2 provides features like growing volumes, adding additional block devices, and moving volumes between storage devices; a cluster
volume manager supports working with shared storage devices (e.g. SANs). Block devices are arranged as physical volumes, which can be grouped into
volume groups. Logical volumes are created within the volume groups, and file systems are created on top of the logical volumes, just like on a regular
disk partition. Volume groups and logical volumes can be named individually, which makes it easy to address and organize storage.
[Figure: a possible LVM configuration, with file systems on logical volumes carved out of volume groups that are backed by physical volumes]
In addition to logical volume management with LVM2, the Linux kernel supports software RAID with the MD (multiple devices) driver. MD
organizes disk drives into RAID arrays (providing different RAID levels), including fault management.
This lab session will walk you through basic usage of LVM2, MD RAID and encryption with the dm_crypt device mapper module on the command
line.
To avoid messing up the operating system itself, we created two additional virtual disk drives that will be used for these lab exercises.
These two additional virtual SATA disks should appear as SCSI disk drives /dev/sdb and /dev/sdc in addition to the primary disk drive
containing the operating system (/dev/sda) in the booted guest system.
To verify, check the output of the kernel boot messages:
[kernel log excerpt showing the SCSI targets 2:0:0:0, 3:0:0:0 and 4:0:0:0 being attached as sd disk devices]
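If the messages have already scrolled by, a quick filter on the kernel ring buffer brings them back; a sketch (the exact match pattern is up to you):

[oracle@oraclelinux6 ~]$ dmesg | grep -E 'sd[a-c]'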
You can also use the lsscsi command or read the content of the /proc/scsi/scsi file to list all connected SATA/SCSI devices:
[oracle@oraclelinux6 ~]$ lsscsi
cd/dvd  VBOX  CD-ROM         1.0  /dev/sr0
disk    ATA   VBOX HARDDISK  1.0  /dev/sda
disk    ATA   VBOX HARDDISK  1.0  /dev/sdb
disk    ATA   VBOX HARDDISK  1.0  /dev/sdc
Now that we have verified that we have two additional disk drives for our experiments, let's get going with the LVM2 configuration.
Exercise: Preparing block devices for LVM2 use
Use fdisk to create a single primary partition spanning the whole disk and set its partition type to 8e (Linux LVM):

[oracle@oraclelinux6 ~]$ sudo fdisk /dev/sdb

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-522, default 1): 1
Last cylinder, +cylinders or +size{K,M,G} (1-522, default 522): 522

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)

Command (m for help): p

Disk /dev/sdb: 4294 MB, 4294967296 bytes
255 heads, 63 sectors/track, 522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xcd14f5f9

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         522     4192933+  8e  Linux LVM

Command (m for help): w
The partition table has been altered!
Note
Repeat the procedure above to partition the second disk drive (/dev/sdc) in the same way.
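Exercise: Creating physical volumes and volume groups
The partitions are turned into LVM2 physical volumes (PVs) with the pvcreate command. A minimal sketch of this step, using the two partitions created above:

[oracle@oraclelinux6 ~]$ sudo pvcreate -v /dev/sdb1 /dev/sdc1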
The -v option makes the output more verbose, so you can see what the command is actually doing. You can use pvdisplay to print all known
physical volumes:
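A sketch of the call (its detailed per-PV output is omitted here):

[oracle@oraclelinux6 ~]$ sudo pvdisplay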
In the example above, you will notice that the base operating system is also installed on top of LVM2: the second and third partitions of the first
disk drive (/dev/sda2 and /dev/sda3) belong to the volume group vg_oraclelinux6.
As an alternative to the above, the pvs command displays all available PVs in a more condensed form:
[oracle@oraclelinux6 ~]$ sudo pvs
  PV         VG
  /dev/sda2  vg_oraclelinux6
  /dev/sda3  vg_oraclelinux6
  /dev/sdb1
  /dev/sdc1
We now have two additional physical volumes that we can assign to an existing or a completely new volume group. We will start with using just
one of the two additional physical volumes for the first examples. Later, the second volume will come into play, too.
You can now use the vgcreate command to create a new volume group on the physical volume(s). Space in a volume group is divided into
extents, chunks of space that are allocated at once. The default is 4 MB. The basic syntax is:
vgcreate -v <volume group name> <device>
It's possible to provide more than one physical device here, to create a volume group that spans across multiple physical volumes. Again, the -v
option makes the command's execution a bit more verbose so we can see what's going on. Now let's create a new volume group myvolg on
physical volume /dev/sdb1:
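A sketch of the call:

[oracle@oraclelinux6 ~]$ sudo vgcreate -v myvolg /dev/sdb1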
The command vgdisplay will list all known volume groups in the system. Note how our new volume group is there, too:
[oracle@oraclelinux6 ~]$ sudo vgdisplay
  --- Volume group ---
  VG Name               myvolg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               4.00 GiB
  PE Size               4.00 MiB
  Total PE              1023
  Alloc PE / Size       0 / 0
  Free  PE / Size       1023 / 4.00 GiB
  VG UUID               Tb30rU-AcHP-Cfvq-2cfH-jMa0-NOF1-DuGMvz

  --- Volume group ---
  VG Name               vg_oraclelinux6
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  6
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               9.50 GiB
  PE Size               4.00 MiB
  Total PE              2432
  Alloc PE / Size       2432 / 9.50 GiB
  Free  PE / Size       0 / 0
  VG UUID               0tE3oy-Jylq-PABw-mPQf-Cl9Z-2pqz-zU02su
An alternative short form is using the vgs command, which displays the known volume groups in a more condensed fashion:
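Roughly like this (a sketch, with values matching the vgdisplay output above):

[oracle@oraclelinux6 ~]$ sudo vgs
  VG              #PV #LV #SN Attr   VSize VFree
  myvolg            1   0   0 wz--n- 4.00g 4.00g
  vg_oraclelinux6   2   2   0 wz--n- 9.50g    0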
Using this command you can quickly get an overview of your LVM setup; it is also particularly suitable for use inside shell scripts. Check
the vgs(8) man page for more details.
In our example you can see that the volume group vg_oraclelinux6 consists of two physical volumes (#PV), contains two logical volumes
(#LV) and has no free space left for additional logical volumes (VFree=0). Our newly created volume group myvolg consists of one physical
volume, contains no logical volumes yet and has 4 gigabytes of free space available.
Storage space in LVM2 is divided into so-called extents, the smallest logical units a volume can be made of. By default, vgcreate
chooses a physical extent size of 4 megabytes, but you can change this with the --physicalextentsize option, depending on your
storage requirements.
Exercise: Creating logical volumes and file systems
Logical volumes are created within a volume group using the lvcreate command. The basic syntax is:
lvcreate --size <size> --name <logical volume name> <volume group name>
The --size option defines the size of the logical volume, by allocating the respective amount of logical extents from the free physical extent pool
of that volume group.
This will create a new logical volume in the given volume group. LVM2 automatically creates the appropriate block device nodes (named dm-x,
where "x" is a sequence number) in the /dev subdirectory. Additionally, LVM2 creates named entries for each volume, as
/dev/<volume group>/<logical volume> and /dev/mapper/<volume group>-<logical volume>.
These are symbolic links that point to the dm-x device node.
Let's create a logical volume named myvol inside of the myvolg volume group, with a size of 2 gigabytes:
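A sketch of the call:

[oracle@oraclelinux6 ~]$ sudo lvcreate --size 2G --name myvol myvolg
  Logical volume "myvol" created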
[oracle@oraclelinux6 ~]$ sudo lvdisplay
  --- Logical volume ---
  LV Path                /dev/myvolg/myvol
  LV Name                myvol
  VG Name                myvolg
  LV UUID                igrKHo-IdMv-rECU-b3ju-xQde-FV3U-ffabjx
  LV Write Access        read/write
  LV Creation host, time oraclelinux6.localdomain, 2013-01-09 01:09:43 -0800
  LV Status              available
  # open                 0
  LV Size                2.00 GiB
  Current LE             512
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:2

  --- Logical volume ---
  LV Path                /dev/vg_oraclelinux6/lv_root
  LV Name                lv_root
  VG Name                vg_oraclelinux6
  LV UUID                l4kAq3-ahhE-cw8Y-D0G4-fkml-8W4X-kgEJmd
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              available
  # open                 1
  LV Size                7.53 GiB
  Current LE             1928
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

  --- Logical volume ---
  LV Path                /dev/vg_oraclelinux6/lv_swap
  LV Name                lv_swap
  VG Name                vg_oraclelinux6
  LV UUID                1olLkX-fTZ0-X79l-eDJo-9b6L-pLmp-Pp8dpm
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              available
  # open                 2
  LV Size                1.97 GiB
  Current LE             504
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1
The lvs command prints the same information in condensed form:

[oracle@oraclelinux6 ~]$ sudo lvs
  LV      VG              Attr     LSize Pool Origin Data%  Move Log Cpy%Sync Convert
  myvol   myvolg          -wi-a--- 2.00g
  lv_root vg_oraclelinux6 -wi-ao-- 7.53g
  lv_swap vg_oraclelinux6 -wi-ao-- 1.97g
The logical volume myvol has been created. LVM2 and the device mapper also created the corresponding block device nodes in /dev for us:
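You can inspect them with ls; a sketch:

[oracle@oraclelinux6 ~]$ ls -l /dev/myvolg/ /dev/mapper/myvolg-myvol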
The free space in our volume group has also been reduced and the number of logical volumes has been updated:
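For example (a sketch; note the increased #LV count and the reduced VFree):

[oracle@oraclelinux6 ~]$ sudo vgs myvolg
  VG     #PV #LV #SN Attr   VSize VFree
  myvolg   1   1   0 wz--n- 4.00g 2.00g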
By the way, it's possible to rename existing volume groups or logical volumes, using the vgrename and lvrename commands:
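A sketch of both, using hypothetical new names and renaming back right away:

[oracle@oraclelinux6 ~]$ sudo vgrename myvolg newvolg
[oracle@oraclelinux6 ~]$ sudo lvrename newvolg myvol newvol
[oracle@oraclelinux6 ~]$ sudo lvrename newvolg newvol myvol
[oracle@oraclelinux6 ~]$ sudo vgrename newvolg myvolg

Exercise: Creating and checking ext4 file systems (mkfs/fsck)
To actually store data on the logical volume, create an ext4 file system on it with mkfs; fsck can then be used to check the (still unmounted) file system for consistency. A minimal sketch:

[oracle@oraclelinux6 ~]$ sudo mkfs -t ext4 /dev/myvolg/myvol
[oracle@oraclelinux6 ~]$ sudo fsck -f /dev/myvolg/myvol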
After the file system has been created, you need to mount it somewhere in your directory structure in order to be able to access it. First you need
to create a new empty directory that will act as the mount point, then you mount the file system to this location. We'll be creating a new toplevel
directory named /myvol in the exercise below:
[oracle@oraclelinux6 ~]$ sudo mkdir -v /myvol
mkdir: created directory `/myvol'
[oracle@oraclelinux6 ~]$ sudo mount /dev/myvolg/myvol /myvol
[oracle@oraclelinux6 ~]$ df -h /myvol
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/myvolg-myvol  2.0G   68M  1.9G   4% /myvol
The -h option instructs df to use human readable values for printing the file system size, used and available disk space. Now you can access
the file system and start using it for storing data! Try creating some directories or copying some files into the file system. In the example below, we
use some kernel source files to populate some of the logical volume's disk space.
[oracle@oraclelinux6 ~]$ sudo mkdir -v /myvol/src
mkdir: created directory `/myvol/src'
[oracle@oraclelinux6 ~]$ sudo cp -a /usr/src/kernels/2.6.39-300.17.2.el6uek.x86_64 /myvol/src
[oracle@oraclelinux6 ~]$ df -h /myvol
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/myvolg-myvol  2.0G  160M  1.8G   8% /myvol
[oracle@oraclelinux6 ~]$ ls /myvol/src
2.6.39-300.17.2.el6uek.x86_64
Exercise: Mounting file systems at bootup time
To have the file system mounted automatically at boot, add an entry to /etc/fstab. Each line describes one file system, using six fields:

<device> <mount point> <file system type> <mount options> <dump frequency> <fsck pass>

See the fstab(5) manual page for a more detailed description of these fields. Open /etc/fstab in your preferred text editor as the root user (e.g.
sudo gedit /etc/fstab or sudo vi /etc/fstab) and add a new line for the file system we created to the end of the list:
#
# /etc/fstab
# Created by anaconda on Thu Jan 12 13:21:03 2012
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/vg_oraclelinux6-lv_root /           btrfs   defaults        1 1
UUID=ed6b5002-07d3-4381-9057-47ee31704c78 /boot ext4    defaults        1 2
/dev/mapper/vg_oraclelinux6-lv_swap swap        swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/myvolg/myvol       /myvol                  ext4    defaults        0 0
Now the file system will be mounted automatically on the next reboot of your system. Try it out by rebooting your virtual machine at this point!
Exercise: Creating a snapshot volume
LVM2 can take snapshots of a logical volume: a snapshot preserves the volume's content at the moment it was taken, while the original volume
remains in use. Snapshots are created with lvcreate --snapshot, providing the origin volume and a size for the snapshot's copy-on-write area.
Example:
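A sketch of the call (snapshot name and size assumed):

[oracle@oraclelinux6 ~]$ sudo lvcreate --size 500M --snapshot --name mysnap /dev/myvolg/myvol
  Logical volume "mysnap" created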
We can now go ahead and mount this snapshot like any other volume:
[oracle@oraclelinux6 ~]$ sudo mkdir /mysnap
[oracle@oraclelinux6 ~]$ sudo mount /dev/myvolg/mysnap /mysnap
[oracle@oraclelinux6 ~]$ ls /mysnap/src
2.6.39-300.17.2.el6uek.x86_64
As you can see, the snapshot contains the exact same content as the volume it has been taken from. Removing files from the original volume
does not change the snapshot's content:
[oracle@oraclelinux6 ~]$ sudo rm -rf /myvol/src
[oracle@oraclelinux6 ~]$ ls /mysnap/src
2.6.39-300.17.2.el6uek.x86_64
A snapshot is not just an identical read-only copy of a volume; it can be modified as well:

[oracle@oraclelinux6 ~]$ sudo touch /mysnap/src/somefile
[oracle@oraclelinux6 ~]$ ls -l /mysnap
drwx------. 2 root root 16384 Jan  9 01:14 lost+found
drwxr-xr-x. 3 root root  4096 Jan  9 01:32 src
Note that it's not possible to promote a snapshot volume into becoming a replacement for the original volume. Deleting the underlying volume
automatically erases all related snapshots as well. Also, creating snapshots of snapshots is not supported yet; LVM2 is still evolving.
To remove an LVM snapshot, use the lvremove command, which can also be used to remove regular logical volumes:
To remove all logical volumes from a volume group, just provide the volume group name without listing any particular logical volume.
Example:
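A sketch, unmounting the snapshot first and then removing it (lvremove asks for confirmation):

[oracle@oraclelinux6 ~]$ sudo umount /mysnap
[oracle@oraclelinux6 ~]$ sudo lvremove myvolg/mysnap
Do you really want to remove active logical volume mysnap? [y/n]: y
  Logical volume "mysnap" successfully removed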
Exercise: Increasing logical volumes and file systems
Logical volumes can be grown while they are in use. First, check the current state:

[oracle@oraclelinux6 ~]$ sudo lvs
  LV      VG              Attr     LSize Pool Origin Data%  Move Log Cpy%Sync Convert
  myvol   myvolg          -wi-ao-- 2.00g
  lv_root vg_oraclelinux6 -wi-ao-- 7.53g
  lv_swap vg_oraclelinux6 -wi-ao-- 1.97g
[oracle@oraclelinux6 ~]$ df -h /myvol
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/myvolg-myvol  2.0G  148M  1.8G   7% /myvol
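The volume is grown with lvextend, and the ext4 file system is then grown to match with resize2fs; increasing ext4 works online, while the file system is mounted. A sketch (the 500 MB increment is inferred from the lvs output below):

[oracle@oraclelinux6 ~]$ sudo lvextend --size +500M /dev/myvolg/myvol
[oracle@oraclelinux6 ~]$ sudo resize2fs /dev/mapper/myvolg-myvol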
[oracle@oraclelinux6 ~]$ df -h /myvol
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/myvolg-myvol  2.5G  148M  2.3G   6% /myvol
The new size is also reflected in the lvs output:

[oracle@oraclelinux6 ~]$ sudo lvs
  LV      VG              Attr     LSize Pool Origin Data%  Move Log Cpy%Sync Convert
  myvol   myvolg          -wi-ao-- 2.49g
  lv_root vg_oraclelinux6 -wi-ao-- 7.53g
  lv_swap vg_oraclelinux6 -wi-ao-- 1.97g
Exercise: Adding a physical volume to a volume group
Next, let's grow the volume group itself by adding the second physical volume we prepared earlier. The current state:

[oracle@oraclelinux6 ~]$ sudo lvs
  LV      VG              Attr     LSize Pool Origin Data%  Move Log Cpy%Sync Convert
  myvol   myvolg          -wi-ao-- 2.49g
  lv_root vg_oraclelinux6 -wi-ao-- 7.53g
  lv_swap vg_oraclelinux6 -wi-ao-- 1.97g
[oracle@oraclelinux6 ~]$ sudo pvs
  PV         VG
  /dev/sda2  vg_oraclelinux6
  /dev/sda3  vg_oraclelinux6
  /dev/sdb1  myvolg
  /dev/sdc1
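The second physical volume is added with the vgextend command; a sketch:

[oracle@oraclelinux6 ~]$ sudo vgextend -v myvolg /dev/sdc1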
[oracle@oraclelinux6 ~]$ sudo pvs
  PV         VG
  /dev/sda2  vg_oraclelinux6
  /dev/sda3  vg_oraclelinux6
  /dev/sdb1  myvolg
  /dev/sdc1  myvolg
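With the additional space available, the logical volume and the file system on it can now be grown across both disks; a sketch (the 5 GB increment matches the 7.49g size that lvs reports afterwards):

[oracle@oraclelinux6 ~]$ sudo lvextend --size +5G /dev/myvolg/myvol
[oracle@oraclelinux6 ~]$ sudo resize2fs /dev/mapper/myvolg-myvol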
[oracle@oraclelinux6 ~]$ df -h /myvol
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/myvolg-myvol  7.4G  148M  6.9G   2% /myvol
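Exercise: Removing/replacing a physical volume
Now let's assume the disk behind /dev/sdb1 is starting to fail and needs to be replaced. Allocated physical extents can be evacuated from a physical volume with the pvmove command; the general syntax is:

pvmove <source physical volume> [<destination physical volume>]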
The destination PV can be omitted; in this case LVM2 attempts to move all extents to any other available physical volume related to the affected
volume group.
In our case, we have an existing logical volume in the volume group that spans two physical volumes (by allocating physical extents from both),
so we currently would not be able to move it off the first disk. Fortunately, the file system on that logical volume currently does not require much
disk space, so after shrinking it, it can easily fit on the remaining working physical volume. As a first step, we therefore must reduce the file
system and the logical volume, to free up enough allocated extents:
Doh! While increasing an ext4 file system can be done on the fly, it needs to be unmounted and checked before we can shrink it:
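A sketch of the sequence (unmount, force a check, shrink to the minimum, remount):

[oracle@oraclelinux6 ~]$ sudo umount -v /myvol
[oracle@oraclelinux6 ~]$ sudo e2fsck -f /dev/myvolg/myvol
[oracle@oraclelinux6 ~]$ sudo resize2fs -M /dev/myvolg/myvol
[oracle@oraclelinux6 ~]$ sudo mount /dev/myvolg/myvol /myvol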
[oracle@oraclelinux6 ~]$ df -h /myvol
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/myvolg-myvol  167M  124M   35M  78% /myvol
As you can see, the file system has now been reduced in size significantly. The -M option instructs the resizing tool to shrink the file system to the
absolute minimum.
The same result could also be achieved by using the fsadm utility instead, which performs the checking, unmounting and resizing of a given file
system automatically and supports the ext2/3/4 file systems as well as ReiserFS and XFS (two other popular journaling file systems for Linux).
However, it does not support shrinking a file system to its minimum possible size; you need to provide an absolute size manually.
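A sketch, resizing to an absolute size of 200 MB:

[oracle@oraclelinux6 ~]$ sudo fsadm -v resize /dev/myvolg/myvol 200M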
[oracle@oraclelinux6 ~]$ df -h /myvol
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/myvolg-myvol  194M  123M   69M  64% /myvol
Now let's reduce the size of the logical volume underneath it. We'll choose to be on the safe side and reduce it from 7.5 GB to 200 MB, comfortably
above the file system's current size, so we don't accidentally damage the file system:
[oracle@oraclelinux6 ~]$ sudo lvs
  LV      VG              Attr     LSize Pool Origin Data%  Move Log Cpy%Sync Convert
  myvol   myvolg          -wi-ao-- 7.49g
  lv_root vg_oraclelinux6 -wi-ao-- 7.53g
  lv_swap vg_oraclelinux6 -wi-ao-- 1.97g
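A sketch of the lvreduce call; it asks for confirmation, since shrinking below the file system's size would destroy data:

[oracle@oraclelinux6 ~]$ sudo lvreduce --size 200M /dev/myvolg/myvol
  WARNING: Reducing active and open logical volume to 200.00 MiB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce myvol? [y/n]: y
  Reducing logical volume myvol to 200.00 MiB
  Logical volume myvol successfully resized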
[oracle@oraclelinux6 ~]$ sudo lvs
  LV      VG              Attr     LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  myvol   myvolg          -wi-ao-- 200.00m
  lv_root vg_oraclelinux6 -wi-ao--   7.53g
  lv_swap vg_oraclelinux6 -wi-ao--   1.97g
[oracle@oraclelinux6 ~]$ df -h /myvol
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/myvolg-myvol  194M  123M   69M  64% /myvol
The logical volume has now been reduced in size, so it allocates far fewer extents from the volume group.
Alternatively, lvreduce can take care of reducing the file system on top of it automatically, by invoking fsadm by itself. This combines several of
the steps above into a single call:
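A sketch, shrinking volume and file system in one go (the 152 MB target size is inferred from the lvs output below):

[oracle@oraclelinux6 ~]$ sudo lvreduce --resizefs --size 152M /dev/myvolg/myvol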
[oracle@oraclelinux6 ~]$ sudo lvs
  LV      VG              Attr     LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  myvol   myvolg          -wi-ao-- 152.00m
  lv_root vg_oraclelinux6 -wi-ao--   7.53g
  lv_swap vg_oraclelinux6 -wi-ao--   1.97g
[oracle@oraclelinux6 ~]$ df -h /myvol
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/myvolg-myvol  147M  122M   23M  85% /myvol
Now we can proceed with moving the allocated physical extents from the failing disk:
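A sketch: pvmove evacuates the extents from /dev/sdb1 (with no destination given, any free extents in the volume group are used), and vgreduce then removes the now-unused physical volume from the volume group:

[oracle@oraclelinux6 ~]$ sudo pvmove -v /dev/sdb1
[oracle@oraclelinux6 ~]$ sudo vgreduce -v myvolg /dev/sdb1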
[oracle@oraclelinux6 ~]$ sudo pvs
  PV         VG
  /dev/sda2  vg_oraclelinux6
  /dev/sda3  vg_oraclelinux6
  /dev/sdb1
  /dev/sdc1  myvolg
[oracle@oraclelinux6 ~]$ df -h /myvol
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/myvolg-myvol  147M  122M   23M  85% /myvol
The failing physical volume has now been removed from the volume group and can be replaced. The file system in logical volume myvol is still
available and could even be increased in size to make use of the remaining available space in the volume group.
For the sake of time, we skip the step of actually removing and re-adding the virtual disk drive from the virtual machine; let's just assume you
replaced the failing disk drive with a new one.
Once the replacement disk is in place, you can partition it and use pvcreate as outlined in an earlier exercise to make it available to LVM2
again.
Now you can add the physical volume to the volume group again:
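A sketch:

[oracle@oraclelinux6 ~]$ sudo vgextend myvolg /dev/sdb1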
Before continuing with the next exercise, tear down the LVM test setup: unmount the file system, then remove the logical volume, the volume group
and the physical volumes. Also don't forget to remove the mount point (rmdir /myvol) and to take the corresponding entry out of /etc/fstab!
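A sketch of the teardown:

[oracle@oraclelinux6 ~]$ sudo umount /myvol
[oracle@oraclelinux6 ~]$ sudo lvremove myvolg/myvol
[oracle@oraclelinux6 ~]$ sudo vgremove myvolg
[oracle@oraclelinux6 ~]$ sudo pvremove /dev/sdb1 /dev/sdc1
[oracle@oraclelinux6 ~]$ sudo rmdir /myvol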
Exercise: Setting up a RAID1 device using mdadm
Now that we have two clean disk drives for testing, let's start by creating a mirrored set (RAID1) out of them. This is done using the mdadm utility,
which is used for building, managing and monitoring Linux MD devices. Check the mdadm(8) manual page for a detailed description of its features and
options.
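A sketch of creating the mirror from the two partitions and watching the initial synchronization (the resync numbers shown are illustrative):

[oracle@oraclelinux6 ~]$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm: array /dev/md0 started.
[oracle@oraclelinux6 ~]$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc1[1] sdb1[0]
      4191897 blocks super 1.2 [2/2] [UU]
      [=====>...............]  resync = 28.0% (1175040/4191897) finish=0.2min speed=195840K/sec

unused devices: <none>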
The /proc/mdstat file is a useful resource to quickly check the status of your MD RAID devices. In the example above, MD was busy initializing
the RAID1 device. After this initialization phase, the status should look as follows:
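Roughly:

[oracle@oraclelinux6 ~]$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc1[1] sdb1[0]
      4191897 blocks super 1.2 [2/2] [UU]

unused devices: <none>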
You can use the mdadm tool to get some more detailed information about the currently configured device:
[oracle@oraclelinux6 ~]$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed Jan  9 02:49:09 2013
     Raid Level : raid1
     Array Size : 4191897 (4.00 GiB 4.29 GB)
  Used Dev Size : 4191897 (4.00 GiB 4.29 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed Jan  9 02:49:30 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
Now that we have a block device, we can create a file system on top of it and put some data into it. Note that we could use LVM on top of this
RAID set, too, but we're sticking to a plain file system on top of the RAID for simplicity.
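A sketch of these steps (the /raid mount point matches the cleanup commands used later in this lab):

[oracle@oraclelinux6 ~]$ sudo mkfs -t ext4 /dev/md0
[oracle@oraclelinux6 ~]$ sudo mkdir /raid
[oracle@oraclelinux6 ~]$ sudo mount /dev/md0 /raid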
[oracle@oraclelinux6 ~]$ df -h /raid
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0        4.0G   72M  3.7G   2% /raid
As you can see, the file system has a capacity of 4 GB, which matches the size of a single disk drive. The data is mirrored to the second drive
transparently, in the background.
It's useful to store the RAID configuration information in a configuration file named /etc/mdadm.conf; this helps mdadm assemble existing
arrays at system bootup. You can either copy and adapt the sample configuration file from
/usr/share/doc/mdadm-3.2.1/mdadm.conf-example, or create a very minimalistic one from scratch, using your text editor of choice. Our
example looks as follows:
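A sketch, letting mdadm itself generate the ARRAY line (the name and UUID values will differ on your system):

[oracle@oraclelinux6 ~]$ sudo sh -c 'mdadm --detail --scan > /etc/mdadm.conf'
[oracle@oraclelinux6 ~]$ cat /etc/mdadm.conf
ARRAY /dev/md0 metadata=1.2 name=oraclelinux6.localdomain:0 UUID=<your array UUID>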
See the mdadm.conf(5) manual page for more details about the format of this file. Now that this configuration file exists, it needs to be added to
the initial ramdisk so the RAID array will be properly detected and initialized upon system reboot (see
https://bugzilla.redhat.com/show_bug.cgi?id=606481 for more details on why this is necessary).
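Rebuilding the initial ramdisk for the running kernel picks up the new configuration file; a sketch:

[oracle@oraclelinux6 ~]$ sudo dracut --force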
Now let's copy some files onto the device, so we have some data for testing:

[oracle@oraclelinux6 ~]$ sudo cp -a /usr/src/kernels/2.6.39-300.17.2.el6uek.x86_64 /raid
[oracle@oraclelinux6 ~]$ ls -l /raid
drwxr-xr-x. 22 root root  4096 Jan  8 15:35 2.6.39-300.17.2.el6uek.x86_64
drwx------.  2 root root 16384 Jan  9 02:53 lost+found
[oracle@oraclelinux6 ~]$ df -h /raid
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0        4.0G  213M  3.6G   6% /raid
So far, our file system behaves like any other file system. Let's provoke a complete disk failure of one of the disk drives, so we can observe how
the MD driver handles this situation.
In VirtualBox, you can only make changes to the storage configuration when the VM has been powered off. So we need to shut down the VM first,
either by running the following command on the command line or by selecting System -> Shut Down... -> Shut Down from the virtual machine's
desktop menu:
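For example (a sketch):

[oracle@oraclelinux6 ~]$ sudo shutdown -h now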
Now we can detach one of the virtual disk drives from the system and reboot. Click on the VM's settings icon and select the Storage section.
Now right-click on the Disk2.vdi icon and select Remove attachment.
This will detach the disk drive from this virtual machine, to simulate a total failure of the entire disk drive.
Now let's restart the VM and figure out how MD copes with the missing disk drive. After the system has booted up, log in as the oracle user
again and open a Terminal.
Let's take a look at the status of our RAID device:
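It might look roughly like this, with one member missing:

[oracle@oraclelinux6 ~]$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0]
      4191897 blocks super 1.2 [2/1] [U_]

unused devices: <none>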
The [U_] part indicates that only one of the two devices is active, but you need a trained eye to spot this. It's better to look at the output
of mdadm, which is more explicit about the degraded state of the device:
[oracle@oraclelinux6 ~]$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed Jan  9 02:49:09 2013
     Raid Level : raid1
     Array Size : 4191897 (4.00 GiB 4.29 GB)
  Used Dev Size : 4191897 (4.00 GiB 4.29 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : ...
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       0        0        1      removed
Even though the RAID is degraded, our file system is still available:
[oracle@oraclelinux6 ~]$ ls -l /raid
drwxr-xr-x. 22 root root  4096 Jan  8 15:35 2.6.39-300.17.3.el6uek.x86_64
drwx------.  2 root root 16384 Jan  9 02:53 lost+found
However, it's a good idea to replace the failed disk drive as soon as possible. In our case, we can simply shut down the VM, re-attach the disk
image and reboot. In a live production system, you would likely be able to hot-swap the disk drive on the fly without any downtime. mdadm supports
these kinds of operations as well (disabling and replacing devices on the fly, rebuilding RAID arrays), but this is out of the scope of this lab session.
This exercise only scratched the surface of what MD is capable of.
Before moving on, tear down the RAID setup and wipe the MD metadata from the first partition:

rm -v /etc/mdadm.conf
umount -v /raid
mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sdb1
Exercise: Setting up Encryption using dm_crypt and LUKS
Now we have an empty device that we can use to store the encrypted volume. This is done using the cryptsetup utility.
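A sketch of the two commands the following paragraph refers to:

[oracle@oraclelinux6 ~]$ sudo cryptsetup -y luksFormat /dev/sdb1
[oracle@oraclelinux6 ~]$ sudo cryptsetup luksOpen /dev/sdb1 cryptfs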
The first command initializes the volume and sets an initial key. The -y option asks for the passphrase twice, making sure your password is typed
in correctly. The second command opens the partition and creates the device mapping (in this case /dev/mapper/cryptfs). This is the actual
device that will be used to create a file system on top of it; do not use the real physical device (/dev/sdb1) for this!
As an additional optional safety measure, we could now write zeros to the new encrypted device. This will force the allocation of data blocks.
Because the zeros are encrypted, this will look like random data to the outside world, making it nearly impossible to track down encrypted data
blocks if someone gains access to the hard disk that contains the encrypted file system. We'll skip this step here, as it takes quite some time; for
reference, the command would be:
dd if=/dev/zero of=/dev/mapper/cryptfs
Now that we have initialized our encrypted volume, we need to create a filesystem and mount point:
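A sketch:

[oracle@oraclelinux6 ~]$ sudo mkfs -t ext4 /dev/mapper/cryptfs
[oracle@oraclelinux6 ~]$ sudo mkdir /cryptfs
[oracle@oraclelinux6 ~]$ sudo mount /dev/mapper/cryptfs /cryptfs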
You can now use this file system like any other. The encryption of the data blocks is done in a fully transparent fashion, unnoticed by the file
system or application accessing this data.
As a final step, we need to ensure that the encrypted file system is properly set up and mounted at system bootup time. For this to happen, we
need to create an appropriate entry in the configuration file /etc/crypttab, using our favorite text editor:
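A sketch of such an entry; the crypttab fields are <mapping name> <underlying device> <key file> <options>, and "none" as the key file makes the system prompt for the passphrase at boot:

cryptfs /dev/sdb1 none luks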
Additionally, we need to add the file system to /etc/fstab for the actual mounting to take place, by adding a line as the following one:
/dev/mapper/cryptfs   /cryptfs   ext4   defaults   0 0
If you reboot your system now, you will be prompted to enter your passphrase to continue the boot process:
Password for /dev/sdb1 (luks-a7e...):**********
After entering the correct passphrase, the system continues to boot and the file system will be mounted at the given location:
[oracle@oraclelinux6 ~]$ df -h /cryptfs/
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/cryptfs  4.0G   72M  3.7G   2% /cryptfs
Now any files that you store in /cryptfs will be protected by the strong encryption of dm_crypt. This also means that your passphrase is an
invaluable asset: if you lose it, you won't be able to access your data anymore! However, using LUKS it's actually possible to create multiple
keys to unlock the volume; this can be handy to provide a recovery key, or to allow multiple individuals to access the volume without sharing
the same password. To add a key, use the following command:
[oracle@oraclelinux6 ~]$ sudo cryptsetup luksAddKey /dev/sdb1
Enter any passphrase: <existing passphrase>
Enter new passphrase for key slot: <new passphrase>
Verify passphrase: <new passphrase>
Now you can unlock the volume by either providing the original or the new passphrase.