Mounting and booting an NFS provided Linux root



  • I have about 8 Intel NUCs that have no hard disks, but I want to boot them off of some NFS shares and have them automatically start some workloads (e.g. Minecraft server).

    I have done something very similar to this with BeagleBones / Raspberry Pis. Those have flash memory where you can store the instructions for mounting and booting an NFS share. To my knowledge, a full-scale x86 computer has no similar place where I can put these instructions. So I have been using FOG to push NFS bootstrapping instructions over PXE. I would say I like this even better because I never have to worry about configuration on the NUCs themselves. Network, power, go!

    So far I have booted the Ubuntu live CD. At this point, I could turn around and modify the rc.local of the live CD to mount another NFS share and start executing the workload from there. There are several things that would not be ideal here. E.g., any packages needed for the workload must be reinstalled on every boot, and eventually I will run out of RAM disk for temporarily installing packages.

    I realized when setting up boot for the live CD that effectively all I provided via FOG was a kernel binary, an initial ramdisk, and instructions to mount the NFS volume with the real OS. I hijacked the kernel/initrd from the usb installer to boot my NFS share instead of looking for the installer on local media.

    kernel tftp://${fog-ip}/xubuntu/18.04.2/vmlinuz
    initrd tftp://${fog-ip}/xubuntu/18.04.2/initrd
    imgargs vmlinuz root=/dev/nfs boot=casper netboot=nfs nfsroot=${fog-ip}:/images/os/xubuntu/18.04.2/ locale=en_US.UTF-8 keyboard-configuration/layoutcode=us toram quiet splash
    boot || goto MENU
    

    Right now, these instructions point at the install CD for Ubuntu and tell it to go into live mode without a prompt.

    However, I don’t really want the live disk. I want to effectively mount the OS over network. I think it would be fairly easy to point this to an already installed NFS root of ubuntu (unique for every NUC). It might be as simple as extracting the squashfs. It is something I need to look into.

    I have a few specific questions that you might be able to help with.

    • Can I bootstrap an NFS share some easier way? I.e., running the NFS mount instructions inside of FOG instead of inside a temporary kernel / initrd?
    • How can I make this the default boot option for these systems in FOG? I have not been able to register these systems because they don’t have a hard disk.
    • I do not know how to uniquely identify each of my NUCs from within the FOG boot select screen. I need something that I can shove into the ‘nfsroot’ path above that is unique for every NUC. Is there some ${mac-address} or similar I can use here?

  • Moderator

    @johnny_verdane

    Defogging: FOG does modify the iPXE binaries by embedding a boot script into iPXE. That script is here: https://github.com/FOGProject/fogproject/blob/master/src/ipxe/src/ipxescript It’s used to chain to default.ipxe on the tftp server. If you look in default.ipxe on the fog server, that is how it gets to boot.php. With the iPXE boot loader you can transfer files over tftp, http(s), and nfs with the way FOG has it compiled. FOG uses the http protocol to load FOS Linux because http is faster than tftp and is routable across subnets. But you don’t need http or apache to boot your own custom Linux; tftp will work just fine. You could use the rom-o-matic site to build your custom iPXE binaries and not even need to set up a build environment for iPXE: https://rom-o-matic.eu/ Just understand that you can use the FOG iPXE binaries too; you just need to add your iPXE code to default.ipxe in the tftpboot directory of your tftp server. It’s just one less thing to mess with.

    Security: One of the advantages of the buildroot approach is that since your OS will run from memory, it’s a bit stateless. If the running environment becomes compromised, the compromise will not be saved when you stop or reboot the system: everything runs out of RAM, so there is no persistence. Plus, since the initrd is packed, you can’t simply drop a file into its boot media. The bad guy would have to unpack the inits, make the modification, and then repack the inits. While it’s not hard to do, it’s not easy either.

    Anyway, it sounds like you have a fun and challenging project. I wish you the best.



  • @george1421
    Lots of good points.

    I have continued playing with the setup and it has several rough edges beyond those I jotted down in the solution guide. No gamebreakers yet, but it is clear that this is not a production worthy platform. It’s a nice experiment though.

    Buildroot:
    I will take your suggestion and make another image based on buildroot. It was an inevitable continuation of this project. There are just too many things that aren’t cross-environment portable with Ubuntu and other commercial OSes. I’ve broken more than a few such assumptions, all of which buildroot was built with in mind. Certainly there is a battle of “wouldn’t it be cool if we could do x” versus “wouldn’t it be more stable if we did y”. So I’ll start on y – buildroot that is – and see how it goes.

    Also, those looking for the Richard Stallman fan club can contact me. Unfortunately, my cat found all of the FOSS party supplies so we won’t be rolling out the GPLv3 streamers anytime soon.

    Defogging:
    I noticed half way through that FOG isn’t needed at all to do what I am doing here. That is, I don’t need a SQL database of all machines and images if none of the machines are registered. The webpage FOG serves is always going to be the same, and I can just toss this up on a plain old http server.

    Is the iPXE menu binary modified at all to be special to FOG? I am looking more closely at that boot.php text, and even the splash background is served over http. Where does the “fog/service/ipxe/boot.php?mac=00:00:00:00:00:00” part of the URL come from? That wasn’t given over DHCP option 66 or 67. Is this your server’s default document?

    Security:
    My objective is each machine needs to authenticate and establish a confidential, integrity protected, replay protected link to its root. Things should be nailed down such that each machine can only boot its image and cannot read/write/snoop anybody else’s. If any machine gets compromised, the worst it can do is break itself, whereby I could immediately restore the image from backup. Each machine might be working on workloads with different security attributes, and I don’t want there to be any risk that a compromised Minecraft server is going to lead to exposing home videos and photos on a transcode server.

    I agree the problem could be firewalled away. If we keep each client in a nice little container where the only thing it can do is speak with its designated IP address and its designated MAC address, then these objectives would be achieved. However, I believe this somewhat undermines the objective. Rather than storing configuration on a disk inside each machine, we have stored configuration in the network infrastructure. This is totally viable, but I will continue thinking about other ways that don’t explode the configuration surface. I want to try to keep as much as possible on the PXE server.


  • Moderator

    @johnny_verdane said in Mounting and booting an NFS provided Linux root:

    First I wanted to say well done. Great job taking FOG and repurposing it in a direction that it wasn’t intended to go. That just goes to show how grand FOSS software is. If it doesn’t fit your needs, take it, tweak it, and make it into something new.

    I have a few comments regarding your setup. Please understand this is not by any means meant to disparage what you have achieved.

    Default boot option. Those experienced with FOG will know that it’s not possible to register a system with no hard disk. Therefore, it is not possible to nudge your clients towards this particular boot option we’ve set up.

    What I would do here is make your netboot menu item the default value for fog. There is no reason at all to register your target systems with FOG, just change the default menu for FOG to be the menu entry to netboot your OS. If you are not using FOG for system imaging just remove all of the options that are not important in the iPXE menu. Now that you have a proof of concept working, you truly don’t need FOG at all. You could build a new system greenfield and install tftp and NFS v4. You have what FOG has constructed as a model. Just grab the iPXE files from the fog /tftpboot directory, create your secure nfs shares and copy the target systems over. I’ll give you a hint, if you want to see how fog builds the iPXE menu point your browser to http://<fog_server_ip>/fog/service/ipxe/boot.php?mac=00:00:00:00:00:00 That will paint the FOG iPXE menu. You could take that menu as a building block to reverse engineer what you need for your greenfield server.
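
    If it helps, here is a rough sketch of saving that generated menu to a file so you can study it offline. It assumes curl is installed on whatever machine you run it from. The function names and the example address are placeholders of mine; only the boot.php path comes from the URL above.

    ```shell
    # Build the boot.php URL that paints the FOG iPXE menu.
    # $1 = FOG server IP or hostname, $2 = MAC address (optional).
    fog_menu_url() {
        server="$1"
        mac="${2:-00:00:00:00:00:00}"
        printf 'http://%s/fog/service/ipxe/boot.php?mac=%s\n' "$server" "$mac"
    }

    # Download the generated menu text into a file for inspection.
    # $1 = server, $2 = output file, $3 = MAC address (optional).
    save_fog_menu() {
        curl --fail --silent --show-error --output "$2" "$(fog_menu_url "$1" "$3")"
    }
    ```

    For example, save_fog_menu 192.168.0.10 fog-menu.ipxe would write the menu text to fog-menu.ipxe, which you could then trim down for a greenfield server.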

    Security. It’s basically non-existent and I don’t even know where to start. Any keys or passwords we try to push to the clients will be visible on the http server FOG uses to push boot instructions. No-go. We have a whole bunch of spoofable information we could use, but I wager things like MAC address and IP filters are only going to delay an attack. It’s not until NFSv4 that proper authentication was really added, and everything shown here uses NFSv3. I really, really want to have a private key stored on each client, burned into the TPM or some other hardware security mechanism where it is never ever getting out. Can iPXE even touch these resources? If I cannot figure out how to use a proper public-private key pair, at least I want to derive the passwords on the clients where a lucky network snoop is required to get them.

    When you talk about security, who/what are you trying to protect? By creating a greenfield netboot server and only installing the required services along with properly crafted netboot server firewall rules you can lock down your netboot server pretty tightly. I agree that the fog server is pretty much an open system, but you have to look at what FOG is trying to protect (the captured images). That is why the /images share is read only at the share level.

    Package installer freak out. Snap is completely hosed. Do not use it. Apt expects that it needs to update the initramfs after installing every package. That is okay…I think. Do I need to watch for the initramfs being updated and clone that over to tftp, or am I correct in assuming that initramfs from tftp is not used after nfsroot is established as real root? Apt also expects to update grub to point to the new initramfs. Grub does not exist anymore so that throws a bunch of ugly errors. To be clear: packages still install correctly and you can use them after reboots.

    This is the life of using a commercial OS. I don’t have an absolute answer about the initramfs, but I’m suspecting that it contains just enough brains to then overlay the real OS on top of it.

    No Swap. One of the givens was this system has no hard disk. Therefore there is nowhere to store swap. Unless you mount swap over network, which seems like a really bad idea. There is margin for tuning the kernel memory limits, but I think it should be mostly okay???

    Many embedded systems will not use swap at all since the embedded OS is typically very small. (more on this in a bit)

    De-duplication. My linux-fu is pretty good, but not complete. I wager many of the directories I copied in the clone never change in the lifetime of the OS. Or, any changes could be easily shared among all of the identically imaged systems without problems. Ideas from somebody who knows more about this?

    Really, what is the difference between system 1 and system 2 from an OS standpoint? Why can’t they share the same code base with only the /etc directory different? In most OSes the /etc directory is where the system gets its identity.
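
    One way to check how much two cloned roots really differ before trying to share anything is to list the files that are byte-for-byte identical in both. This is only a sketch: the function name is mine, it looks at regular files only, and it says nothing about ownership or permissions.

    ```shell
    # Print paths (relative to the root) of regular files that are
    # byte-for-byte identical in two nfsroot copies.
    # $1 = first root directory, $2 = second root directory.
    list_identical_files() {
        root_a="$1"
        root_b="$2"
        (cd "$root_a" && find . -xdev -type f) | while IFS= read -r rel; do
            # cmp -s is silent and succeeds only when contents match
            if [ -f "$root_b/$rel" ] && cmp -s "$root_a/$rel" "$root_b/$rel"; then
                printf '%s\n' "$rel"
            fi
        done
    }
    ```

    Piping the output through wc -l gives a quick count. Anything that shows up outside of /etc is a candidate for sharing between machines.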

    But that takes me back to our discussion of buildroot. Why use a commercial full OS for your workload? I don’t have an idea what a minecraft server is (well I have an idea), but I assume that is a character mode service that is config’d by text files. I would challenge you: do you really need an XWindows interface for this workload? Look at FOG’s FOS Linux. The kernel is 8MB and the initrd file is about 200MB. How is your workload much different than FOS Linux? FOS boots in less than 20 seconds. Everything runs from memory and its state isn’t persistent. Now it could have persistent elements if it used NFS to connect back to its storage server. Let’s think about elastic workloads. Consider if you had an ESXi VM host server. How many netboot workloads could you create in 2 minutes if you needed a bunch of worker nodes, if each worker node booted in 20 seconds and only consumed 512MB of RAM? Then you could kill them off if the nodes were not needed. I might even challenge you to use those Intel NUCs (disregarding any stats on the NUCs) as a VM host cluster and then spin up your workloads as virtual clients. If you wanted to stick with open source, use KVM. From a security standpoint you would only include in your buildroot system the services you needed for that specific workload. If it’s only a minecraft service then don’t include NFS, FTP, apache, XWindows and such that you would have with a commercial distribution.

    Understand I’m not saying using a commercial OS is a bad thing, but to get the most out of your solution, do you really need to use it?



  • I got it working! Eureka!

    Thank you for the tips. I found the iPXE documentation and the ${net0/mac} came in handy!

    I have used buildroot before, on those BeagleBones / Raspberry Pi systems. It does have a learning curve and I am certain you can use it for what I want to do.

    But…I figured out how to do exactly what I wanted to do!!! I will share because I can tell you guys are eager to give this a shot yourselves.

    Step 0: Get PXE to boot a simple NFS volume first
    I strongly recommend you follow George’s very helpful guides (Thanks George!!!): https://forums.fogproject.org/topic/10944/using-fog-to-pxe-boot-into-your-favorite-installer-images/17
    I followed the instructions for booting the Ubuntu Live CD, which coincidentally you will need twice if you follow in my footsteps. If you cannot get this working, stop and get this working! There is no point in making things more complicated if you cannot boot prepackaged ISOs.

    Step 1: Create a template machine
    So first things first, you need an installed copy of Ubuntu or whatever OS/distro you are using. We will template this machine. I spun up a Virtual Machine for this purpose and PXE booted into the Ubuntu Live CD. When installing, use a single partition. The size is irrelevant, so just make it big enough to finish installation. Boot into your installed OS, and get it all set up with whatever you want to snapshot. It is easy to add stuff later, so I wouldn’t lament over this.

    Step 2: Set up your NFS share
    We need an NFS share to store the image of this system on. It needs to be a read/write NFS share set up somewhere on your network accessible from your client machines. The FOG box is okay; not sure I would use it long term. I set up my nfsroots in /images/dev/nfsroots. I have not yet tinkered with permissions, but you will want to do this at some point because otherwise people can screw with your images over network (as root no less)! The following line works in my /etc/exports file. It is untouched from the FOG installation.

    /images/dev *(rw,async,no_wdelay,no_subtree_check,no_root_squash,insecure,fsid=1)
    

    Finally, scribble down the mac address of the system you will be imaging from. Make a directory in nfsroots like so: /images/dev/nfsroots/8a:4b:84:c8:e2:b9. This is where we will store the image of this specific system.
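
    A small sketch of that last step, in case you are doing it for several machines. The directory name has to match what the client reports as its MAC, and I am assuming that is lowercase hex with colons as in the example above. Both function names are made up for illustration.

    ```shell
    # Normalize a MAC address to lowercase, colon-separated form.
    # Accepts colons or dashes as separators in the input.
    normalize_mac() {
        printf '%s\n' "$1" | tr 'ABCDEF-' 'abcdef:'
    }

    # Create the per-machine root directory and print its path.
    # $1 = nfsroots base directory, $2 = MAC address of the client.
    make_nfsroot_dir() {
        dir="$1/$(normalize_mac "$2")"
        mkdir -p "$dir" && printf '%s\n' "$dir"
    }
    ```

    For example, make_nfsroot_dir /images/dev/nfsroots 8A-4B-84-C8-E2-B9 creates and prints /images/dev/nfsroots/8a:4b:84:c8:e2:b9.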

    Step 3: Clone your template machine
    Boot your image system into a live disk. You need something that allows you to drop into terminal, install some basic packages, and run some basic commands. The Ubuntu live CD is perfect. Run these commands:

    sudo apt-get install nfs-common # The live CD does not come with the NFS packages preinstalled
    sudo mkdir /mnt/sda1
    sudo mkdir /mnt/nfs
    sudo mount /dev/sda1 /mnt/sda1 # If you booted into the live disk, your hard disk / boot partition probably isn't mounted. So mount it.
    sudo mount 192.168.0.10:/images/dev/nfsroots /mnt/nfs # Put the IP of your storage server and your nfsroot path here
    sudo cp -a /mnt/sda1/. /mnt/nfs/8a:4b:84:c8:e2:b9/ # The -a is super important. Archive mode. We must preserve the ownership and the SUID bits on file permissions. Run it as root so ownership survives; the trailing /. also copies hidden files
    

    Once you have done this, you can shutdown the system. Remove the (virtual) hard disk. I would keep it around until you finish in case you need to repeat any steps.

    Step 4. Copy the kernel and initial ramdisk from your image
    We cannot retrieve these over NFS because these are what set up the NFS connection. If you followed George’s instructions for booting the Ubuntu Live CD, you took the vmlinuz (kernel) and initrd (initial ramdisk) from the Live CD and copied them to /tftpboot on your fog box. DO NOT USE THESE for booting the full OS. It seems like these are hardwired to crack open the squashfs (the compressed read-only filesystem) on the live CD. Instead, get the vmlinuz and initrd.img from the root you just copied. Put them in the tftp folder.

    sudo cp /images/dev/nfsroots/8a:4b:84:c8:e2:b9/vmlinuz /tftpboot/xubuntu/18.04.2/
    sudo cp /images/dev/nfsroots/8a:4b:84:c8:e2:b9/initrd.img /tftpboot/xubuntu/18.04.2/
    

    Step 5. Set up a new boot profile in FOG
    Go to FOG Configuration -> New iPXE Menu

    Copy this information in:
    Menu Item: NFS Boot by MAC (or whatever you want)
    Description: NFS Boot from /images/dev/nfsroots/${net0/mac} (or whatever you want). Note that ${net0/mac} is parsed and will actually display as your system’s MAC address.
    Parameters:

    kernel tftp://${fog-ip}/xubuntu/18.04.2/vmlinuz # Customize to point to wherever the vmlinuz is you copied in step 4
    initrd tftp://${fog-ip}/xubuntu/18.04.2/initrd.img # ditto
    imgargs vmlinuz root=/dev/nfs vga=normal nfsroot=${fog-ip}:/images/dev/nfsroots/${net0/mac}/ rw locale=en_US.UTF-8 persistent console=tty0 console=ttyS0,115200
    boot || goto MENU
    

    Menu Show with: All Hosts
    Leave everything else blank.

    imgargs is the super secret sauce to making this all work. These are the kernel parameters that are going to get passed into that kernel you just supplied over tftp. Basically, these instructions tell the kernel where root is: root is coming from NFS, the path to the NFS share is whatever nfsroot says, and root is read/write. We want vga turned on. We omitted quiet and splash; we want to see all the juicy boot progress details flying across the screen. I’ve added the console strings for testing purposes. These will dump the boot progress out on a serial port, which is extremely helpful if you run into problems because every problem is going to be a full blown kernel panic. You won’t be able to scroll up on your vga terminal to see what the error was! Since I am using the KVM hypervisor, I can add a virtual serial port and connect to it via the appropriate command issued at the hypervisor’s terminal. If you encounter any issues, serial debug is a good place to start.

    Step 6. Housekeeping
    We have some rough edges that need to be cleaned up, otherwise boot is going to be somewhat turbulent.

    Open /images/dev/nfsroots/8a:4b:84:c8:e2:b9/etc/fstab from your NFS host machine. This file contains the list of mount points to set up at boot. Remove everything. We are not in Kansas anymore, Dorothy! /dev/sda doesn’t exist anymore, and I seriously doubt you want to have a network mounted swap. If your boot hangs for about 90 seconds, it is probably because you forgot to clear this file. You will only see that boot has hung to wait for a mount if you have a serial terminal set up; it doesn’t show on vga boot progress.

    Open /images/dev/nfsroots/8a:4b:84:c8:e2:b9/etc/hostname from your NFS host machine. This is the hostname of your client. Change this if your template might ever be booted in the future (you know it will, so change it!). Change this every time you copy this image for another machine.
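
    Since both edits have to be repeated for every copy of the image, here is a minimal sketch that does them in one go from the NFS host. The function name and the backup file name fstab.orig are my own choices, and it needs to run as root because the files in the clone are root-owned.

    ```shell
    # Prepare a cloned root for NFS boot.
    # $1 = path to the cloned root, $2 = new hostname for this client.
    prepare_nfsroot() {
        root="$1"
        new_hostname="$2"
        # Keep a copy of the original fstab, then replace it with a stub
        cp -p "$root/etc/fstab" "$root/etc/fstab.orig" &&
        printf '# fstab intentionally left empty: root comes from NFS\n' > "$root/etc/fstab" &&
        # Give this copy of the image its own hostname
        printf '%s\n' "$new_hostname" > "$root/etc/hostname"
    }
    ```

    For example, prepare_nfsroot /images/dev/nfsroots/8a:4b:84:c8:e2:b9 nuc01. Ubuntu installs usually list the old hostname in /etc/hosts as well; this sketch leaves that file alone.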

    Step 7. Boot
    Power on your machine, making sure it goes to PXE. Interrupt it to select the boot option we just created. Cross your fingers!
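
    Once you get a login prompt, a quick sanity check is to confirm that / really is an NFS mount and not something left over from the live CD. A sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type); the function name is hypothetical.

    ```shell
    # Print the filesystem type of the root mount.
    # $1 = mounts file to read (defaults to /proc/mounts).
    root_fstype() {
        mounts_file="${1:-/proc/mounts}"
        # The last entry for "/" is the one currently in effect
        awk '$2 == "/" { fstype = $3 } END { print fstype }' "$mounts_file"
    }
    ```

    Running root_fstype on the client should print nfs (or nfs4, depending on the protocol version in use) if the boot went the way we wanted.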

    TODO There are some rough edges around this setup. If you have ideas on how to clean this up, please chime in.

    • Default boot option. Those experienced with FOG will know that it’s not possible to register a system with no hard disk. Therefore, it is not possible to nudge your clients towards this particular boot option we’ve set up. You can go around it backwards and configure every registered machine to do something else, and all unregistered machines will try to boot NFS by MAC address. But, that is backwards. There’s got to be an obvious fix to this.
    • Security. It’s basically non-existent and I don’t even know where to start. Any keys or passwords we try to push to the clients will be visible on the http server FOG uses to push boot instructions. No-go. We have a whole bunch of spoofable information we could use, but I wager things like MAC address and IP filters are only going to delay an attack. It’s not until NFSv4 that proper authentication was really added, and everything shown here uses NFSv3. I really, really want to have a private key stored on each client, burned into the TPM or some other hardware security mechanism where it is never ever getting out. Can iPXE even touch these resources? If I cannot figure out how to use a proper public-private key pair, at least I want to derive the passwords on the clients where a lucky network snoop is required to get them. Ideas?
    • Package installer freak out. Snap is completely hosed. Do not use it. Apt expects that it needs to update the initramfs after installing every package. That is okay…I think. Do I need to watch for the initramfs being updated and clone that over to tftp, or am I correct in assuming that initramfs from tftp is not used after nfsroot is established as real root? Apt also expects to update grub to point to the new initramfs. Grub does not exist anymore so that throws a bunch of ugly errors. To be clear: packages still install correctly and you can use them after reboots.
    • No Swap. One of the givens was this system has no hard disk. Therefore there is nowhere to store swap. Unless you mount swap over network, which seems like a really bad idea. There is margin for tuning the kernel memory limits, but I think it should be mostly okay???
    • De-duplication. My linux-fu is pretty good, but not complete. I wager many of the directories I copied in the clone never change in the lifetime of the OS. Or, any changes could be easily shared among all of the identically imaged systems without problems. Ideas from somebody who knows more about this?

  • Developer

    @johnny_verdane Well you have an interesting question there and I am tempted to give it a try myself and play with it. Though I don’t have the time to, and therefore won’t go too much into the details myself.

    I suppose you can use Ubuntu for this, but personally I would start this adventure by asking which exact workloads you will be running. Is the software available as an official Ubuntu package or not? On the other hand, try to figure out if other environments like Buildroot might have all the tools / software you need. While Buildroot is definitely less resource intensive compared to Ubuntu, it also has less choice of software.

    Though I have not done it with Buildroot it should be possible. Find a first hint on this here: http://lists.busybox.net/pipermail/buildroot/2010-March/032773.html

    If you give us more details on the software you want to run in that environment we might be able to give more hints.


  • Moderator

    Well this is a pretty complex subject. I see you have the system booting via FOG iPXE. In a way you do want the live boot running, but in your case you want to create a custom environment. The vmlinuz and initrd files are the base/core OS. (note this part I’m guessing) the core OS has just enough stuff in it to get the system up on the network and to attach to the NFS share. At this point it probably transfers the complete OS to the target computer. That OS is typically contained in a squash fs file. IMO you need to target that squash fs file to add in your custom applications. (I don’t know how, only giving you a direction to look in).

    You can pass boot time stuff to the target computer through kernel parameters. In your example you could add a kernel parameter mac=${net0/mac} This is using an iPXE variable to set the kernel parameter. It may be just as easy to pick up some boot time information via bash in a rc.local script within your environment. Since you are in the iPXE menu construct you can not use any of the FOG run time variables like ${hostname}, because that environment is not running at this time.
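
    To show what the rc.local side of that could look like, here is a minimal sketch that pulls a key=value parameter back out of the kernel command line. It assumes you added mac=${net0/mac} to the imgargs line as described; the function name is mine.

    ```shell
    # Print the value of a key=value kernel parameter.
    # $1 = parameter name, $2 = command line text (defaults to /proc/cmdline).
    kernel_param() {
        key="$1"
        cmdline="${2:-$(cat /proc/cmdline)}"
        # One word per line, then keep the first match with the key stripped
        printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n "s/^${key}=//p" | head -n 1
    }
    ```

    In rc.local you could then write something like mac=$(kernel_param mac) and use that value to pick which workload to start. It prints nothing if the parameter is not present.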

    A totally different approach you might want to look into here is using buildroot to build your own custom OS (akin to what FOG calls FOS Linux). This would then create a bootable bzImage (vmlinuz) and initrd specifically for your workload. Once you are done you would just boot bzImage and initrd directly from FOG and not need any network shares because everything would be contained within initrd. The system would boot very fast since only the required bits would be in the OS. If your workload didn’t need XWindows then don’t include that in your buildroot OS. I can tell you that buildroot is VERY intimidating when you first start out. But you can really do some cool things with it once you understand the process. If you look at the FOG Project’s FOS files on github, you can reverse engineer how the FOG developers created FOS Linux and then incorporate your workload into that framework.

