FOS fails to capture image from md RAID host
-
Ubuntu 18.04.6 LTS
Web Server 10.13.6.7I’m trying to capture an image from a host that boots UEFI from two hard disks in RAID 1. The way this works is that both disks have an EFI partition that is kept in sync by the OS. The computer boots from either of these disks and then mdadm assembles the root partition from the second partition on these two disks.
I followed the handy guide here and determined that FOS sees my root partition as /dev/md126. On FOG’s host management screen I set the Host Kernel Arguments to mdraid=true and Host Primary Disk to /dev/md126. I have tried setting the image type to each of the first three values.
In every case, attempts to capture the image have resulted in an error in reading the disk or partition table. I hope the output from one such attempt will prove helpful in identifying the source of my troubles and a possible solution:
[Tue Jun 14 root@fogclient ~]# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 116.4T 0 disk sdb 8:16 0 139.8G 0 disk |-sdb1 8:17 0 300M 0 part |-sdb2 8:18 0 30G 0 part | `-md126 9:126 0 30G 0 raid1 `-sdb3 8:19 0 2G 0 part `-md127 9:127 0 2G 0 raid1 sdc 8:32 0 139.8G 0 disk |-sdc1 8:33 0 300M 0 part |-sdc2 8:34 0 30G 0 part | `-md126 9:126 0 30G 0 raid1 `-sdc3 8:35 0 2G 0 part `-md127 9:127 0 2G 0 raid1 sdd 8:48 1 14.3G 0 disk `-sdd1 8:49 1 14.3G 0 part
[Tue Jun 14 root@fogclient ~]# mdadm -D /dev/md126 /dev/md126: Version : 1.2 Creation Time : Wed May 25 20:32:17 2022 Raid Level : raid1 Array Size : 31439872 (29.98 GiB 32.19 GB) Used Dev Size : 31439872 (29.98 GiB 32.19 GB) Raid Devices : 2 Total Devices : 2 Persistence : Superblock is persistent Update Time : Tue Jun 14 15:26:24 2022 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Consistency Policy : resync Name : ubuntu-server:0 UUID : 7f76bb36:09c97d07:2528dfc0:ade215db Events : 222 Number Major Minor RaidDevice State 0 8 34 0 active sync /dev/sdc2 1 8 18 1 active sync /dev/sdb2
================================== === ==== ===== ==== === ========= == === == === === ======== ==== == ==== === === ======== ==== == ========= === ==== ==== == ========= === ======== ==== == === === === ======== ==== == ==== === === ========= == === == === === ========== ===== ==== ================================== ===== Free Opensource Ghost ====== ================================== ============ Credits ============= = https://fogproject.org/Credits = ================================== == Released under GPL Version 3 == ================================== Version: 1.5.9 Init Version: 20200906 * Press [Enter] key to continue * Verifying network interface configuration.........Done * Press [Enter] key to continue * Checking Operating System.........................Linux * Checking CPU Cores................................40 * Send method.......................................NFS * Attempting to check in............................ Done * Press [Enter] key to continue * Mounting File System..............................Done * Press [Enter] key to continue * Checking Mounted File System......................Done * Press [Enter] key to continue * Checking img variable is set......................Done * Press [Enter] key to continue * Preparing to send image file to server * Preparing backup location.........................Done * Press [Enter] key to continue * Setting permission on /images/0cc47abc024c........Done * Press [Enter] key to continue * Removing any pre-existing files...................Done * Press [Enter] key to continue * Using Image: veeam_barracuda_2022-06-14 * Looking for Hard Disks............................Failed * Press [Enter] key to continue ############################################################################## # # # An error has been detected! # # # ############################################################################## Init Version: 20200906 Could not find any disks (/bin/fog.upload) Args Passed: Kernel variables and settings: bzImage loglevel=4 initrd=init.xz root=/dev/ram0 rw ramdisk_size=275000 web=http://10.13.6.7/fog/ consoleblank=0 rootfstype=ext4 mdraid=true nvme_core.default_ps_max_latency_us=0 mac=0c:c4:7a:bc:02:4c ftp=10.13.6.7 storage=10.13.6.7:/images/dev/ storageip=10.13.6.7 osid=50 irqpoll hostname=vcc-ldc-vs-11 chkdsk=0 img=veeam_barracuda_2022-06-14 imgType=mpa imgPartitionType=all imgid=2 imgFormat=5 PIGZ_COMP=-6 fdrive=/dev/md126 hostearly=1 pct=5 ignorepg=1 isdebug=yes type=up mdraid=true * Press [Enter] key to continue
Note also that /dev/sda is an empty filesystem and doesn’t need to be imaged. /dev/sdd is a USB stick and also unnecessary to the image. I don’t know if there’s a way to tell FOG to skip these devices.
-
@david-burgess You unfortunately have a configuration that will be difficult to clone with FOG.
You can’t use single disk because you have both md126 and md127 you need to capture. You can’t use mulltiple disks because /dev/sda doesn’t have any partitions and /dev/sdd is a usb drive (which FOG will happily capture). And then add in software raid into the mix and you end up with a mess.I know its late here so my brain isn’t 100% at the moment, but I can’t see a solution for cloning this system and keeping the raid from getting damaged.
I think when I was testing with intel raid I was mirroring the entire disk then creating the partitions on the raid disk. You have two disks with partitions mirrored. That adds to the complexity.
-
I can see how mirroring two disks and then partitioning the RAID device would be simpler, but we have ruled it out as an option because we want to overprovision the SSDs.
We can remove the USB drive and probably the large unpartitioned drive (sda) as well, just leaving us with sdb, sdc, md126 and md127, but it sounds like we may still not have success. I think I will try it anyway just to be thorough.
db
-
@david-burgess said in FOS fails to capture image from md RAID host:
but we have ruled it out as an option because we want to overprovision the SSDs
I don’t understand this because you are still dealing with physical media. But that really doesn’t matter you have a specific use case that I’m unaware of.
If you can discard (remove) sda and sdd then you can use multiple disks non-resizable to capture that system. In either case you must use the md interface, and I’m still not sure FOG can handle the raid correctly. If you mirrored the disks then there is only one md interface. The problem with intel raid is that it no only presents the md interface to the underlying host OS is also exposes the physical disk too. This confuses FOG because it actually see 3 disks in this computer (2 physical and 1 md). If the whole disk was mirrored then you would just set the primary disk in the host configuration to the md interface, then FOG will only look at that interface.
-
If I use the whole device for md-RAID, then partition the md device, this doesn’t prevent the OS from seeing the underlying physical devices. It will still see sdb, sdc and md126. I’m not familiar with Intel RAID on Linux, but it sounds from your description like the result is similar to what I’m seeing. In any case, I don’t know of any way to present the md-RAID device to Linux and hide the member physical devices.
At this point I have removed sda and disabled USB for now until I can get somebody to remove the USB drive from the system. The unfortunate side effect is that I have no keyboard input in FOS with USB disabled (over IPMI, and presumably on the physical console as well). So I will retry the capture job in non-debug mode and see what happens.
db
-
@david-burgess said in FOS fails to capture image from md RAID host:
If I use the whole device for md-RAID, then partition the md device, this doesn’t prevent the OS from seeing the underlying physical devices.
This is actually how the windows driver for the intel rst adapter works too, just in windows the driver “hides” the physical disks from windows so it only sees the raid drive.
In linux its a bit different in the the md drive doesn’t hide the underlying physical disks. For fog imaging we tell it the primary disk is the md drive so it ignores the physical drives, because it only cares about what FOG tells it as the primary drive. If the FOS engine has to guess it will pick sda if present. Since in your case you “could” image sdb and get a good image. The problem is you are cloning a mirrored disk when you restore it to a new computer on sdb your mirror will be broken. So for FOG you can only clone using the md drive, then everything will be in sync and the clone will work.
Now since you referenced my tutorial I assumed you are using the intel raid. You could do similar using 100% software raid in linux. You will have the same limitations with FOG, so the rules apply to the intel raid as with the linux md software raid. I just assumed for this discussion you were using the intel rst adapter to make you raid.
-
@george1421 said in FOS fails to capture image from md RAID host:
since you referenced my tutorial I assumed you are using the intel raid.
I referenced your Intel RAID tutorial because I found another forum post on mdadm where you linked to it and stated that it was similar in principle. I am actually working on capturing an image from a Linux host with md RAID, and deploying the same image to other hosts with identical hardware.
By way of update, I removed the large sda device (a hardware RAID) and the USB stick, so my disk layout was thus (from the running host OS):
# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 139.8G 0 disk ├─sda1 8:1 0 300M 0 part ├─sda2 8:2 0 30G 0 part │ └─md0 9:0 0 30G 0 raid1 / └─sda3 8:3 0 2G 0 part └─md127 9:127 0 2G 0 raid1 sdb 8:16 0 139.8G 0 disk ├─sdb1 8:17 0 300M 0 part /boot/efi ├─sdb2 8:18 0 30G 0 part │ └─md0 9:0 0 30G 0 raid1 / └─sdb3 8:19 0 2G 0 part └─md127 9:127 0 2G 0 raid1
I then configured the following in FOG, with the noted outcomes:
Host Primary Disk: /dev/md126 (equivalent to md0 on live system above)
Image Type: Multiple Partition Image - All Disks (3)
Host Kernel Arguments: mdraid=true
Outcome: fails to find disksHost Primary Disk: /dev/md126 (equivalent to md0 on live system above)
Image Type: Multiple Partition Image - Single Disk (2)
Host Kernel Arguments: mdraid=true
Outcome: fails to read partition tableHost Primary Disk: /dev/sdb
Image Type: Multiple Partition Image - Single Disk (2)
Host Kernel Arguments: [not noted. I forget]
Outcome: fails to read partition tableHost Primary Disk: /dev/sda
Image Type: Multiple Partition Image - All Disks (3)
Host Kernel Arguments: [none]
Outcome: capture succeeds. Deployment on new system succeeds. System boots and partition table on sda looks correct. md127 is lost. Partition table on sdb is not correct:# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 139.8G 0 disk ├─sda1 8:1 0 300M 0 part ├─sda2 8:2 0 30G 0 part │ └─md0 9:0 0 30G 0 raid1 / └─sda3 8:3 0 2G 0 part sdb 8:16 0 139.8G 0 disk ├─sdb1 8:17 0 1G 0 part └─sdb2 8:18 0 138.8G 0 part
I’m not sure if the deployment procedure touched sdb or not. I suspect not. I will try again and watch closer or record it. I suppose I can rebuild my RAIDs from here, but I would of course prefer to have FOG automate that process to the extent possible.
Edit: added FOG settings detail.