George,
Thanks for the ideas. I am going to try to build my own customized LiveCD version and see if I can get it to run. I will report back (it will most likely take me a while!).
Thanks for the insight.
Colin
Sorry I was away on vacation. Just now getting back to this.
All nodes are currently on Intel 64-bit processors. We use mostly AMD GPUs for compute, with some NVIDIA for testing. 105 nodes are currently deployed (roughly 100 kW).
We are “trying” to design them to run various compute-level code: mining, blockchain test code, AI, whatever; if it runs on Ubuntu it should run [I totally get that there is a myriad of ways this can fail]. I have tested a ton of different packages with varying success on my current configuration.
Essentially, I want someone to give me an image; I then push it to the cluster and it runs. So far it has worked great with FOG (minus my SSD failures).
I use Ubuntu because it is what I am most familiar with and it seems to have the best driver-level support all around. The use case is mostly Ubuntu anyway.
I have been running them in GUI mode. If something fails, I have a KVM I can connect to and see what happened. That isn't to say I couldn't run character mode, but I don't know why I would.
My goal is to scale to 1,000 nodes and then lease out the capacity; think a kind of bare-metal AWS, but way more ghetto.
That being said, a LUN may be the way to go, but I don't see why I need it. I should be able to do all this in RAM, but then again, I am just the engineer trying to figure out how to cool all this stuff down (hence the engineering part). I also don't want my network getting bogged down with iSCSI traffic; I am only running 1 GbE in a pretty limited spanning-tree config. I want to be like a refinery for processing data: maybe Monday I do some compute, Tuesday AI, Wednesday blockchain, all for way cheaper than anybody else (at least that is my goal, lol).
@george1421 said in How to Add Boot to MEMDISK Option - Syntax Question:
character based computing nodes
They will only run Ubuntu as a node.
I started using FOG “to create a master image and deploy it to the hard drive of many computers.” But now I want to transition to “a diskless (in reference to the target nodes) netboot system that loads and executes everything out of RAM (actually the hard drive on the target computer is not needed at all).”
So whatever I need to do to make that transition, I want to do, even if it means I shouldn't be using FOG. I just know FOG the best, hence why I am trying to use it.
Ultimate goal is to be diskless.
Thanks.
@george1421 said in How to Add Boot to MEMDISK Option - Syntax Question:
Thanks George! That is super helpful. I am trying to stay away from iSCSI (I don't know much about it either); if I am not mistaken, each node would need its own iSCSI instance to run? I was worried that multiple clients connected to the same iSCSI target would have issues.
So in your netbooting example:
kernel tftp://${fog-ip}/os/ubuntu/Desk17.10/vmlinuz.efi
initrd tftp://${fog-ip}/os/ubuntu/Desk17.10/initrd.lz
“This right here is telling… vmlinuz.efi is… what you might think of in Windows as the operating system or kernel. initrd.lz, think of it as a virtual hard drive. To get Linux to boot you need a kernel (OS) and a hard drive (initrd). From there… that’s called the boot loader; that tiny OS has enough brains to reach back out to the NFS server (FOG) to get the rest of the operating system and load it into memory. That is netbooting.”
I guess what I was confused about is how the calls differ. Right now, if I PXE boot my node, it completely wipes the drive and loads the OS over top, versus trying to boot to memory. How does it know to store the image on sda (currently) versus storing it in RAM (in your example)? My assumption was that the base call (as you listed it) would AUTOMATICALLY overwrite sda (or whatever drive was specified in the host primary disk parameter) and move on. But maybe that is not the case?
Also I apologize if this question is stupid. I am a mechanical engineer by training and have had to teach myself all this stuff.
Thanks George. I have read that tutorial. What I was seeing though was this (Ubuntu 17 section):
“In the fog WebGUI go to FOG Configuration->iPXE New Menu Entry
Set the following fields
Menu Item: os.Ubuntu.Desktop.17.10
Description: Ubuntu Desktop 17.10
Parameters:
kernel tftp://${fog-ip}/os/ubuntu/Desk17.10/vmlinuz.efi
initrd tftp://${fog-ip}/os/ubuntu/Desk17.10/initrd.lz
imgargs vmlinuz.efi root=/dev/nfs boot=casper netboot=nfs nfsroot=${fog-ip}:/images/os/ubuntu/Desk17.10/ locale=en_US.UTF-8 keyboard-configuration/layoutcode=us quiet splash ip=dhcp rw
boot || goto MENU
Menu Show with: All Hosts”
None of that code specifies a memdisk or a ramdisk. I assumed I needed to chain “memdisk iso raw” to that tutorial. As it stands, I assume it would load to sda.
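One thing I am wondering, and this is just a guess on my part (untested), is whether casper's toram boot parameter would get me the RAM behavior with that same entry, i.e. copy the live filesystem into RAM instead of running off the NFS mount. Something like this, with the same paths as the tutorial:
kernel tftp://${fog-ip}/os/ubuntu/Desk17.10/vmlinuz.efi
initrd tftp://${fog-ip}/os/ubuntu/Desk17.10/initrd.lz
# same as the tutorial entry above, with toram added
imgargs vmlinuz.efi root=/dev/nfs boot=casper netboot=nfs nfsroot=${fog-ip}:/images/os/ubuntu/Desk17.10/ toram locale=en_US.UTF-8 keyboard-configuration/layoutcode=us quiet splash ip=dhcp rw
boot || goto MENU
I am assuming toram behaves the same over NFS as it does from a USB stick, so take that with a grain of salt.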
Agreed, it is diskless boot, and I have pored over the iPXE forums. I have seen various results. Most guys are running a very rudimentary PXE solution that seems dated. I also don't want an iSCSI target.
Mostly we use Ubuntu. That is where I would like to start.
I appreciate any help you can give.
All,
We have a large compute cluster that uses FOG to load various Linux distros onto the cluster to do tasks. We have recently started failing SSDs at a pretty high rate. I realized that our distros are small (4 GB) and we could just load these into a ramdisk/memdisk and operate that way, as we don't need storage (compute cluster only).
Are the appropriate lines to add to the iPXE menu the following:
kernel memdisk
initrd “path to file”
boot
goto start
or is it:
initrd “path to file”
chain memdisk iso raw ||
or do I need to add something more complex, like what is listed here?
https://forums.fogproject.org/topic/2845/ipxe-advanced-menu-or-memdisk-problem/8
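For reference, my best guess at a complete menu entry combining the above (untested; it assumes the memdisk binary from syslinux has been copied to the FOG tftp root, and the ISO path is made up) would be:
# memdisk location and ISO path are placeholders
kernel tftp://${fog-ip}/memdisk iso raw
initrd tftp://${fog-ip}/os/mydistro/mydistro.iso
boot || goto MENU
From what I have read, memdisk only works for legacy BIOS boots, so I am not sure it would cover any UEFI nodes.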
My thought is that we would only use the SSDs for larger distros and use memdisk for the smaller ones. My hope is that I could change a couple of lines to tell iPXE where to store the image when it loads. Longer term, my goal is to get away from SSDs entirely and operate 100% out of RAM.
Also, is there an option to add memdisk to the task or image, so that when I push to the cluster it will essentially boot to a memdisk instead of sda, rather than trying to trick the menu?
Thanks for your help.
Colin
Got it to run by upping my PHP memory limit to 1280M. Testing now. Currently only getting 200 Mb/s transfer speed on eno1.
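For anyone else hitting this, the change I made was roughly the following, in /etc/php/7.1/fpm/php.ini (the path is my assumption based on the FPM log below; yours may differ):
memory_limit = 1280M
followed by restarting the php7.1-fpm service.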
I am having the same issues as described above. I made the required changes and am now getting an HTTP 500 error. Any recommendations on how to diagnose? Do I need to change the permissions on the PHP file? Apparently I am running out of memory? I have plenty…
Apache Error Log:
[Mon Jul 30 13:50:38.882162 2018] [proxy_fcgi:error] [pid 1231] [client 192.168.1.3:44270] AH01071: Got error 'PHP message: PHP Fatal error: Allowed memory size of 2097152 bytes exhausted (tried to allocate 8192 bytes) in /var/www/html/fog/lib/fog/fogpage.class.php on line 239\nPHP message: PHP Fatal error: Allowed memory size of 2097152 bytes exhausted (tried to allocate 114688 bytes) in Unknown on line 0\n'
Here are the last lines from the FPM log after I reloaded and rebooted.
[30-Jul-2018 13:42:49] NOTICE: reloading: execvp("/usr/sbin/php-fpm7.1", {"/usr/sbin/php-fpm7.1", "--nodaemonize", "--fpm-config", "/etc/php/7.1/fpm/php-fpm.conf"})
[30-Jul-2018 13:42:49] NOTICE: using inherited socket fd=8, "127.0.0.1:9000"
[30-Jul-2018 13:42:49] NOTICE: using inherited socket fd=8, "127.0.0.1:9000"
[30-Jul-2018 13:42:49] NOTICE: fpm is running, pid 9096
[30-Jul-2018 13:42:49] NOTICE: ready to handle connections
[30-Jul-2018 13:42:49] NOTICE: systemd monitor interval set to 10000ms
[30-Jul-2018 13:43:27] NOTICE: Terminating …
[30-Jul-2018 13:43:27] NOTICE: exiting, bye-bye!
[30-Jul-2018 13:43:27] NOTICE: fpm is running, pid 9648
[30-Jul-2018 13:43:27] NOTICE: ready to handle connections
[30-Jul-2018 13:43:27] NOTICE: systemd monitor interval set to 10000ms
[30-Jul-2018 13:43:31] NOTICE: Terminating …
[30-Jul-2018 13:43:31] NOTICE: exiting, bye-bye!
[30-Jul-2018 13:44:41] NOTICE: fpm is running, pid 1027
[30-Jul-2018 13:44:41] NOTICE: ready to handle connections
[30-Jul-2018 13:44:41] NOTICE: systemd monitor interval set to 10000ms
@george1421 If there is an easier way to do that, then I am all ears. I like the FOG interface, though; it makes things easy. In the original method you posted, does that require a hard drive to be present, or is it building the OS into init.gz in RAM?
@george1421 They are headless; however, they do have the ability to run interactively, but I only use that for debugging. We keep having issues with our cron: I have them restart every 12 hours and they keep hanging, which is not ideal at all. We are still building out the datacenter now, so we haven't circled back around to this. So far I am still booting using the old method. It is Ubuntu, but I can run whatever Linux distro makes it easiest. I am also looking into MOSIX.
I was confused about this part, but realized that I was used to the old way of thinking, i.e. booting into my hard disk. You are correct, I need the default loop to pull down a fresh copy every time. I appreciate the help.
“The last bit of magic we need to do is setup a new FOG iPXE boot menu entry for this OS.
In the fog WebGUI go to FOG Configuration->iPXE New Menu Entry
Set the following fields
Menu Item: os.Ubuntu1604
Description: Ubuntu 16.04.03
Parameters:
kernel tftp://${fog-ip}/os/ubuntu/16.04/vmlinuz
initrd tftp://${fog-ip}/os/ubuntu/16.04/initrd.gz
imgargs vmlinuz initrd=initrd.gz root=/dev/nfs netboot=nfs nfsroot=${fog-ip}:/images/os/ubuntu/16.04/ locale=en_US.UTF-8 ip=dhcp rw
boot || goto MENU
Menu Show with: All Hosts
That’s it, just pxe boot your target system and pick Ubuntu 16.04.03 from the FOG iPXE boot menu.”
@george1421 That is awesome. The clusters actually don't have any non-volatile storage needs. Essentially all of that is handled via the scripts from the main server, so I just need a fix without the NFS storage, I guess. They just call for data, get it, and send it back. All NVM processing is done somewhere else, and the code to handle it is in the programs I have installed on the OS.
I guess what I need to figure out is how to create live boot media of my current setup. In your write-up, can I also deploy via groups? My plan was to more or less push updates via the group function, not the individual menus. I will spend some time reading the diskless setup tutorials y'all posted as well. Because I will no longer have an sda in the traditional sense (USB 3.0), I just want to make sure I understand where the filesystem will live.
@sebastian-roth I think this will work! I need to read through the whole write-up and report back, but essentially the idea is what I am looking to accomplish. Just a quick question: does the NFS server have to have enough storage for (client file system) × (number of clients)?
Essentially Ubuntu with some mods. I can run whatever, though. I believe most of my drivers are compatible with most Linux distros.
What I would like to do is put a RAM disk on the host, not the server. The server side is fine. I run a cluster setup and use USB 3.0 drives that have been failing me (not to mention they are slow). So I would like to port everything over to a RAM disk and just add another stick of RAM to all my hosts. Ideally, FOG (kind of like how Clonezilla has a “to RAM” option) would create a RAM disk on the host and then image to that RAM disk, on the host. Obviously each host would need to do this every time it boots. I didn't see a RAM disk write-up anywhere, so I didn't know if it was possible.
Is there a walkthrough anywhere on using a RAM disk with FOG? I couldn't find a good write-up and didn't know if anyone had a workflow that they liked.
I updated to RC 10 and it fixed the issue.
What I said was “It is like…”; it seemed “like” it couldn't tell the different MAC addresses apart.