Master, Storage nodes, and clients for gigantic network. Trying to design - need sanity check?


  • Okay…so I have a huge deployment of FOG I’m about to roll out. 14,000 endpoints.

    I’m trying to design based on bandwidth, and the tools available within FOG, to be able to differentiate and make this as smooth as possible
    (i.e., currently no machines are actually in FOG, save for the newest ones…all of these will probably be freshly added, and will need to be identified, grouped, and pointed at the most efficient storage node possible)
    I’ve never dabbled with subnet groups before. I HAVE dabbled with locations before. Ideally I’d love to try to hash out some stuff in designing this.
    For reasons I can’t go into, I can’t give many details. But suffice it to say that a good number of machines have no working OS; booting to FOG to register and deploy an OS is easy enough - but grouping and managing all of them after the fact may prove difficult and time consuming.

    Also, as there are 70+ sites, bandwidth is sometimes a constraint. Getting physical hardware to a location, however, is not necessarily an issue. The analysis ends up being whether it makes sense (on a 5Mbit connection where only 2-5 machines need imaging, there’s no sense in deploying a storage node).

    Does anyone have experience designing such a system, and have any pointers/caveats/tips?
    For reference, I have already built a “Master” node in HQ, and a few storage nodes there as well. The hardware there is actually insane - I’ve hit 8GB/min to 90 machines on unicast from one server.
    I have also tested that the firewall has allowed imaging to happen over WAN/VPN at a remote site, using storage and the master node at a remote location. It too, got 8GB/min.

  • Moderator

    @p4cm4n There is a master imaging script that gets called when FOS Linux starts up; it’s called simply fog: https://github.com/FOGProject/fos/blob/fda59eca648af1a38ed57c94f65558221e77534f/Buildroot/board/FOG/FOS/rootfs_overlay/bin/fog#L1

    At the beginning of that script on line 4 I added a long time ago the code for usb booting, in that the target computer would call back to the FOG server to get the kernel parameters. Normally iPXE would load the kernel parameters when fog imaging was requested. But with USB booting into FOS Linux there are no kernel parameters per se. So this code was added to the beginning of the master bash script.

    if [[ $boottype == usb && ! -z $web ]]; then
        # Identify this host to the FOG server by its (lowercased) SMBIOS UUID and MAC
        sysuuid=$(dmidecode -s system-uuid)
        sysuuid=${sysuuid,,}
        mac=$(getMACAddresses)
        # Ask the FOG server for this host's kernel parameters...
        curl -Lks -o /tmp/hinfo.txt --data "sysuuid=${sysuuid}&mac=$mac" "${web}service/hostinfo.php" -A ''
        # ...and source the response to load them into shell variables
        [[ -f /tmp/hinfo.txt ]] && . /tmp/hinfo.txt
    fi
    

    So specifically the curl calls back to the FOG server to run a php file that writes the kernel parameters to /tmp/hinfo.txt; then we source that hinfo.txt file to load the kernel parameters into bash variables. That was the trick to “rebooting” without actually restarting FOS Linux or the computer. Then after the variables were loaded I think I called fog.download directly.

    If you look down that fog script a bit you will see the variables $mode and $type. After you load the kernel parameters you may need to unset $mode and set $type to “down” then call fog.download. That is what the script is checking for to know what fog module to call on each boot.

    The trick is to call fog.download and not let fog.man.reg exit, because when it does, control returns to the main fog script and the target system will reboot.


  • @george1421 yeah the error was something from funcs.sh, no OS ID passed, call determineOS.


    @george1421 I called the fog.download script and got ‘no OS defined’, which must have been the kernel reload you mentioned.
    How did you reload the kernel params like that? I don’t see any code anywhere that issues a reboot command unless it’s in functions.sh, which seems to be the common dump.

  • Moderator

    @p4cm4n said in Master, Storage nodes, and clients for gigantic network. Trying to design - need sanity check?:

    how DID you end up having the machine image without rebooting?

    I looked for the code I posted in the forums a few years ago but could not locate it. At the end of fog.man.reg, after the inventory step, I checked to see if the user answered yes to image now; if they did, I reloaded the kernel parameters and then (I think) called the fog.download script directly from fog.man.reg.


  • @george1421
    Sweet. This will be my project tomorrow.

    Today I ended up automating the server node installation - pretty fun actually. Learned a bunch about the inner workings. I’d never written a bash script before, or modified fog.man.reg the way that I have. The man.reg now only asks
    Hostname:
    LocID:
    ImgID:
    GroupID:
    Then deploys.
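    Those four prompts can be mocked up in plain bash; a sketch only (the real fog.man.reg does far more, and the function and variable names here are invented):

```bash
#!/usr/bin/env bash
# Sketch of a trimmed-down, four-prompt registration flow (illustrative only;
# not the actual fog.man.reg code - function and variable names are invented).
collect_reg() {
    local host locid imgid groupid
    read -rp "Hostname: " host
    read -rp "LocID: " locid
    read -rp "ImgID: " imgid
    read -rp "GroupID: " groupid
    # The real script would hand these values to the FOG server before deploying;
    # here we just emit them as one CSV record.
    printf '%s,%s,%s,%s\n' "$host" "$locid" "$imgid" "$groupid"
}

# Answers can also be fed on stdin, which is handy for testing:
printf 'PC-0001\n3\n7\n2\n' | collect_reg
```
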

    This poses the question though, I still don’t understand the workflow enough…how DID you end up having the machine image without rebooting?


    @p4cm4n Is your org able to help FOG with resources, be they time or financial? FOG is running critically low on these things, and support is the one thing that comes to mind when you talk about 70 nodes and 14,000 clients.

  • Moderator

    @p4cm4n Here is the procedure for upgrading the data base design from MyISAM to INNODB engine.
    https://forums.fogproject.org/topic/16099/configure-fog-database-to-use-innodb-engine

  • Moderator

    @p4cm4n said in Master, Storage nodes, and clients for gigantic network. Trying to design - need sanity check?:

    I watched the log as this one happened

    That’s definitely what I’d suggest in any case. Watch the log and know what it does instead of just guessing what might happen. 😉

    Going to go build 70 storage nodes today and tomorrow.

    Whoooohaa!


  • @sebastian-roth @george1421

    Client time has been adjusted but its being adjusted on-the-fly. Imaging workflow is mostly completed on new laptops, however NOW will be pre-existing computers. Pre-Existing will expand that range from the 4000 as it stands now, to an additional 10k. I’m glad you guys mention that though, as I will need to definitely change that over ASAP before I roll out the FogClient deployment (via PDQ actually, george…however in this engagement we’ve noticed PDQ only scales sooooo much :))

    It turns out that after setting the images to replicate with the functionality you guys mention (Group > Group Image Replication), it worked. I might have been a little impatient when testing it before - I guess my impatience at the time makes sense in hindsight, but it is what it is. I’ll have to monitor the Master Nodes in their respective groups - but I set one overnight to replicate and it worked as expected…prior to that it did not (seemingly? perhaps firewall issues though…I watched the log as this one happened)

    In most situations I wont be making storage group slaves, luckily. However the infrastructure is there to “make” the images at the HQ site. Massive ESXi farm, lots of existing snapshots, and already open firewall to everywhere.

    I now understand what you’re mentioning about the database design. Any guides you have to work on this and migrate it over?

    I’m getting to a point in this project where I have to offload nearly everything as I’m leaving the country soon. But all the preparation I can do in advance, the better. Going to go build 70 storage nodes today and tomorrow 🙂

  • Moderator

    @p4cm4n said in Master, Storage nodes, and clients for gigantic network. Trying to design - need sanity check?:

    I’m going to answer some of these a slightly different way but still in line with what Sebastian posted. There were a few things I had to look at in the code because I wanted to make sure the way I thought it worked was the way it was actually coded.

    Question 1 relates to only the HQ setup - If I use “DEPLOY IMAGE” without registering a host, is it supposed to only talk to the FOGServer, and not load balance in any way? In this functionality, I’ve got about 90 laptops imaging from the main FOGserver. It’s working incredibly fast, but just making sure this is by design. I guess the FOGServer doesn’t ‘know’ the clients, so it can’t load balance on something it doesn’t know? I’d expect the slots are taken regardless, but ?

    I had to look into the code. Each storage node (master or slave) has slots. The FOG imaging load balancer is not a true “utilization” load balancer. There are storage nodes in a storage group; all nodes in that group have the same images as well as snapins deployed. When a system requests an image, FOG looks at where the image is located (which storage group), then identifies the storage nodes (master and slaves) in that storage group. Each node is checked 1) to be turned on, and 2) to not have reached the max clients (slots) it can service. If this is the first time through the loop, that node is marked as the winning node. Then it loops to the next storage node: it tests that it is online and has fewer than max clients; if yes, this (new) storage node is checked to see if its current client count is less than the winner’s client count, and if so it becomes the new winner. It continues looping through the storage nodes in the storage group. So according to the code, the storage node with the least number of active deployments in the storage group should get the next deployment job. ref: https://github.com/FOGProject/fogproject/blob/171d63724131c396029992730660497d48410842/packages/web/lib/fog/storagegroup.class.php#L259
    That is how FOG imaging deployment does load balancing. I had the impression that it used overflow deployments, filling up storage node 1 to max clients and then overflowing to storage node 2.
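    Condensed into a runnable sketch, the selection loop described above looks roughly like this (node names and counts are made up; the authoritative logic is in the storagegroup.class.php file linked above):

```bash
#!/usr/bin/env bash
# Sketch of FOG's "fewest active clients wins" storage node selection.
# Each entry: name:online:maxClients:activeClients (all values invented).
nodes=("hq-node1:1:10:9" "hq-node2:1:10:4" "hq-node3:0:10:0")

winner="" winner_count=0
for entry in "${nodes[@]}"; do
    IFS=: read -r name online max active <<< "$entry"
    [[ $online -eq 1 ]] || continue    # check 1: node must be turned on
    (( active < max ))  || continue    # check 2: node must have a free slot
    # First eligible node wins by default; a later node takes over only if
    # it has fewer active deployments than the current winner.
    if [[ -z $winner ]] || (( active < winner_count )); then
        winner=$name
        winner_count=$active
    fi
done
echo "$winner"    # hq-node2: online, under max, fewest active clients
```
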

    Question 2 relates to the entire setup. If I create an image at HQ, but lets say I want to replicate that image to Site 1…how do I do that? As it is now, I’m just changing the storage group of Site 1’s storage node (disabling master) and putting it in the group of HQ so it replicates with the ones in HQ. Then moving it back to its own…but in this case, I will need to do this with all of everywhere…and as more sites come about, that may be a lottttt of images.

    FOG is really not set up for selective deployments. It’s generally an all or nothing image deployment.

    What I’d like to do is have that “FOGServer” master node. Capture an Image (Lets Say ImageAdobe) at HQ on this Master Node, then selectively replicate it to Site 2, for example. No need for Site 1 to get it. Likewise, Site 1 gets ImageChrome. No need for Site2 to get it. Is this something that can be done? If there is no ability for this to be done - if I just go into the storage node via SSH and delete the folder for the image I don’t need at a location, is this best practice? Will it cause any issues, if Site2 thought it had Image34, but the folder no longer existed?

    When the replicator runs it will see that you deleted the file and then just recopy it over from the master node. I realize your image names were just examples, but I wonder if you are doing something (maybe not wrong, but) a bit more complex than needed. I’ve worked on deployments where we had exactly 3 images (2 uefi and 1 bios) for 5400 computers, with 14 different models in that 5400 workstation population. We did use PDQ Deploy for certain software deployments post imaging, but 90% of the software was already baked into the golden image.

    So far, I’ve killed the crap out of the master fogserver based on 256MB PHP-FCGI limits (I run debian) but found a few threads @george1421 was a part of that explained how to resolve those issues. A little learning curve due to you guys using CentOS but all good. I’ve got 4000 Clients in there, which does take a while to load up the webpage, but since most of our work now is

    This is where we need to do some tuning from default. FOG is really geared towards SMB deployments where you might have 500 or fewer computers with FOG clients (my opinion). One of the first things we need to do is make sure your mysql database is using the InnoDB engine for the tables and not MyISAM. The quick answer is that MyISAM uses table-level locking on updates and InnoDB uses row-level locking. This becomes important when you have 800 clients checking in per minute: you end up with quite a bit of resource contention on your sql server database, and your sql server CPU utilization will jump way up. Once you convert the tables over to InnoDB, load drops back to normal.
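    The conversion itself can be scripted. A sketch, dry run only (the database name fog is an assumption, and the forum procedure linked elsewhere in this thread is the authoritative version):

```bash
#!/usr/bin/env bash
# Sketch: build ALTER statements for every MyISAM table in the fog schema.
# Dry run - review the generated command before piping anything into mysql.
db="fog"
query="SELECT CONCAT('ALTER TABLE ', table_name, ' ENGINE=InnoDB;')
       FROM information_schema.tables
       WHERE table_schema='${db}' AND engine='MyISAM';"
echo "mysql -N -e \"${query}\" | mysql ${db}"
```
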

    With as many client computers as you have, change your checkin time from 5 minutes to 10 or 15 minutes. It will slow down a few things on initial deployment, but you will release the back pressure on the FOG server. One design I thought about was a 2 web server and 1 database server layout, where you have one web server for deployment and system management and a second web server to service fog client requests. Both web servers would be hitting the same mysql database. I’m not sure if that would be any more performant than a single large fog server.

    Another thing we need to look into is the sql server database design. I think we can make some repetitive queries a bit more responsive by creating indexes on frequently queried values. I can tell you no research has been put into optimizing the database structure by using database indexes. The main issue is finding large installations that have the time to help test different configurations. Usually once the larger installs get it working, focus shifts from fog to something else, so we never get to close the loop on potential changes.

  • Moderator

    @p4cm4n said in Master, Storage nodes, and clients for gigantic network. Trying to design - need sanity check?:

    I think so far the adjustments you guys have mentioned will probably be made - but some may not be necessary. I don’t think enough of a requirement will be for the web portion to be ‘fast’ as we wrote a powershell GUI for the specific functionality that we needed with the fogAPI.

    If you use the fog-client software on all your 14,000 machines this will probably flood your webserver/PHP-FPM, and the fogAPI will stop working just as well.

    I’m still not really understanding some things about the master/storage/storage groups (with the added feature of ‘locations’)
    Something I’m running into is that at specific places, I will need a new variation of an image perhaps. So lets say this is the case
    HQ > FOGServer. Database host, Image host, PXE host.

    HQ has 3 additional storage nodes.
    HQ Has the ‘Image Master VM’ as I will call it.
    Site 1> FOG Storage Node (Master Node of this Storage Group)
    Site 2> FOG Storage Node (Master Node of this Storage Group)

    Question 1 relates to only the HQ setup - If I use “DEPLOY IMAGE” without registering a host, is it supposed to only talk to the FOGServer, and not load balance in any way? In this functionality, I’ve got about 90 laptops imaging from the main FOGserver. It’s working incredibly fast, but just making sure this is by design. I guess the FOGServer doesn’t ‘know’ the clients, so it can’t load balance on something it doesn’t know? I’d expect the slots are taken regardless, but ?

    The FOG server doesn’t know the clients in this case but it does/needs to know the image that you want to deploy. Images are associated with one or more storage groups and therefore nodes depending on your setup (code reference).

    Question 2 relates to the entire setup. If I create an image at HQ, but lets say I want to replicate that image to Site 1…how do I do that? As it is now, I’m just changing the storage group of Site 1’s storage node (disabling master) and putting it in the group of HQ so it replicates with the ones in HQ. Then moving it back to its own…but in this case, I will need to do this with all of everywhere…and as more sites come about, that may be a lottttt of images.
    What I’d like to do is have that “FOGServer” master node. Capture an Image (Lets Say ImageAdobe) at HQ on this Master Node, then selectively replicate it to Site 2, for example. No need for Site 1 to get it. Likewise, Site 1 gets ImageChrome. No need for Site2 to get it. Is this something that can be done? If there is no ability for this to be done - if I just go into the storage node via SSH and delete the folder for the image I don’t need at a location, is this best practice? Will it cause any issues, if Site2 thought it had Image34, but the folder no longer existed?

    Probably those five rules from the wiki will help you answer your question:

    1. An image has one storage group as its primary group, but can be associated to many storage groups.
    2. The image will always capture to the primary group’s master storage node.
    3. Replication looks for images that belong to multiple groups - and replicates from the primary master to the other associated group’s master nodes.
    4. Replication then replicates images from each group’s masters to other ‘regular’ storage nodes in the master’s group.
    5. A storage node can belong to multiple storage groups - you just need a storage node entry for each. For example, a non-master in one group can be a master in another group.

    So far, I’ve killed the crap out of the master fogserver based on 256MB PHP-FCGI limits (I run debian) but found a few threads @george1421 was a part of that explained how to resolve those issues. A little learning curve due to you guys using CentOS but all good. I’ve got 4000 Clients in there, which does take a while to load up the webpage, but since most of our work now is
    image > approve pending hosts, add pending to group > rinse/repeat
    we’ve done that whole workflow in Powershell GUI. Only I’ve had to go into the webserver

    Well, if this is working for you, go ahead. I still suggest you increase the fog-client checkin time: FOG web UI -> FOG Configuration -> FOG Settings -> FOG Client -> CLIENT CHECKIN TIME…


  • I think so far the adjustments you guys have mentioned will probably be made - but some may not be necessary. I don’t think enough of a requirement will be for the web portion to be ‘fast’ as we wrote a powershell GUI for the specific functionality that we needed with the fogAPI.

    I’m still not really understanding some things about the master/storage/storage groups (with the added feature of ‘locations’)

    Something I’m running into is that at specific places, I will need a new variation of an image perhaps. So lets say this is the case

    HQ > FOGServer. Database host, Image host, PXE host.

    • HQ has 3 additional storage nodes.
    • HQ Has the ‘Image Master VM’ as I will call it.
      Site 1> FOG Storage Node (Master Node of this Storage Group)
      Site 2> FOG Storage Node (Master Node of this Storage Group)

    Question 1 relates to only the HQ setup - If I use “DEPLOY IMAGE” without registering a host, is it supposed to only talk to the FOGServer, and not load balance in any way? In this functionality, I’ve got about 90 laptops imaging from the main FOGserver. It’s working incredibly fast, but just making sure this is by design. I guess the FOGServer doesn’t ‘know’ the clients, so it can’t load balance on something it doesn’t know? I’d expect the slots are taken regardless, but ?

    Question 2 relates to the entire setup. If I create an image at HQ, but lets say I want to replicate that image to Site 1…how do I do that? As it is now, I’m just changing the storage group of Site 1’s storage node (disabling master) and putting it in the group of HQ so it replicates with the ones in HQ. Then moving it back to its own…but in this case, I will need to do this with all of everywhere…and as more sites come about, that may be a lottttt of images.

    What I’d like to do is have that “FOGServer” master node. Capture an Image (Lets Say ImageAdobe) at HQ on this Master Node, then selectively replicate it to Site 2, for example. No need for Site 1 to get it. Likewise, Site 1 gets ImageChrome. No need for Site2 to get it. Is this something that can be done? If there is no ability for this to be done - if I just go into the storage node via SSH and delete the folder for the image I don’t need at a location, is this best practice? Will it cause any issues, if Site2 thought it had Image34, but the folder no longer existed?

    So far, I’ve killed the crap out of the master fogserver based on 256MB PHP-FCGI limits (I run debian) but found a few threads @george1421 was a part of that explained how to resolve those issues. A little learning curve due to you guys using CentOS but all good. I’ve got 4000 Clients in there, which does take a while to load up the webpage, but since most of our work now is
    image > approve pending hosts, add pending to group > rinse/repeat
    we’ve done that whole workflow in Powershell GUI. Only I’ve had to go into the webserver 😛

  • Moderator

    @p4cm4n This is a huge project you are going to tackle. Wohooo.

    Besides the things George already pointed out, I might add a little. FOG was not made for such huge installs to start with. Not saying you can’t use it, but I think it needs thorough consideration, testing and adjustments as you go.

    One major thing I see is the fog-client software. Even 500-1000 clients can bomb out a decent server. Besides the load issue (which can be solved), you need to know that the current fog-client pins to one single FOG server via certificates. So George’s great advice on using DNS names for failover will only work if you clone all the certificates manually. I was hoping to remove this restriction as it causes way more problems than it adds security, especially when HTTPS is enabled on the FOG server. But we don’t seem to have enough people working on this project, and so it’s still on a long list of todos.

    As well, having dozens of nodes can make the FOG web UI slow. We tried to fix this issue some time ago but I am not exactly sure it is fully fixed yet.

    I would definitely try to import host information directly to the database if I were you. While it might take some time to investigate and maybe come up with some scripts it will save you a lot of time and hours of sleep for such a huge amount of hosts.

    Definitely look into moving your database to InnoDB right from the start!

  • Moderator

    @p4cm4n OK, let’s rewind this to a simpler approach

    DR site: We may be able to leverage mysql replication to replicate the database from the root FOG server to the DR FOG server in an active/passive configuration. That gets the database to the remote site. If you make a storage group on the root FOG server and list the DR FOG server as a storage node, the root FOG server will replicate the image files to the DR FOG server. With something like keepalived this could be completely automatic.

    The next bit is that you will need to do some DNS magic. When you set up your storage nodes, don’t point them at the IP address of the root fog node, but at a DNS name that resolves to the root node. That way, when/if the root node goes offline, you update the DNS entries and after some time the storage nodes will find the DR fog server. You will need to do the same thing when you install the FOG clients: they need to point at a DNS name and not an IP address. If the DR system were at the same site as the root server, then for external access (storage node, fog client, web admin) you could use a floating VIP/CARP address that stays with whatever node is master at the time.
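    As a sketch of that DNS piece (all names invented), everything points at one alias record that you repoint during a failover:

```
; zone-file sketch - hypothetical names, short TTL so failover is quick
fogserver    300  IN  CNAME  fog-hq.example.com.   ; normal operation
; on failover, repoint the alias and wait out the TTL:
; fogserver  300  IN  CNAME  fog-dr.example.com.
```

    Keeping the TTL short (minutes, not hours) shrinks the window where storage nodes and clients still resolve the dead server.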

    Subnet group: This is where the FOG location plugin comes in. You create your locations, you assign your storage nodes to a location and then finally assign your computers to a location. That way when a computer PXE boots, it reaches out to the root node and asks for a local storage node. If workstation is not associated with a location, then it will image from the root FOG server node.

    This Highly Available root fog server is something that I find interesting and something I think I would look into.

    Just thinking big picture here: if you have 14,000 hosts with the fog client installed all hitting the FOG server, you will have some severe bottlenecks, because the FOG server will be so busy servicing client check-ins that the web UI will be very slow. We might (for performance reasons) want to split the fog database off from the FOG server and then create a second fog server just for client check-ins. This is more of an idea than a working model at the moment.


  • @george1421
    So here is my dilemma in building this out going forward - and it’s more the naming schemes behind the fog nodes.

    I will need the FOG database of clients probably on (1) machine. It is the HQ Master Node as of yet. This is designated by site (HQ), with storage group (HQ). There are (3) additional storage nodes here.

    I have a remote site that will potentially serve a ‘DR’ function. I’m unsure if it’s possible to have a second copy of the entire database on that server as well - but the host ‘fogserver’ is at HQ.

    of the 70 or so other sites, at least 50 of them will have FOG storage nodes that connect back to HQ’s fogserver.
    -?? I’m going to be playing with subnet groups today, but the idea is going to be to try to automate which site everything talks to, for location purposes.
    I have at least 50Mbit to every site, but in some places 2g-10g. I was tempted to use those 10g places as ‘failover’ if their local spot becomes too saturated. Any ideas if this will work, or how to do this?

    -?? Any ideas if something like this will work, and i’m not sure if you’ve used some of these concepts or more for one-offs - but i’ll be testing some things soon. if you have ideas of testing things as well, i’m all ears. feel free to reach out via PM for more ‘details’ if you need them. some things i cannot discuss.

  • Moderator

    @p4cm4n Round 3

    1. C2: Since you mentioned that you already use the deploy image menu. If you do have all of the clients registered in FOG you can restrict the deploy image menu to only display the defined image for that target computer instead of all defined images on the fog server.
      A: This actually doesn’t end up being feasible because they aren’t registered until after the second bootup in Windows (and a tech hits “register pending” in a powershell GUI we built for mass imaging via FOGAPI.)
      C3: If you knew the computer name and mac address one might be able to preload FOG with values. 14K is one heck of a deployment. I might try to inject all of the settings right into the mysql database to preregister. I seem to recall a utility like nbtscan that would spit out the computer name and mac address of every active computer on a subnet. There is also, on the host definition page, an import and export CSV of the hosts table in FOG. I might leverage that utility to import blocks of computers. I’m not suggesting this is the right or only way to go about it, I just want to give you options.
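    A sketch of that scan-to-CSV idea (the colon-separated format below is an assumption about nbtscan -s output - verify against your version before trusting it):

```bash
#!/usr/bin/env bash
# Sketch: turn separator-delimited scan output (as from `nbtscan -s:`) into
# "hostname,MAC" rows suitable for building a FOG host-import CSV.
# Sample data stands in for a live scan.
scan_output='10.1.2.21:PC-0021:<server>:<unknown>:00-11-22-33-44-55
10.1.2.22:PC-0022:<server>:<unknown>:00-11-22-33-44-66'

rows=$(while IFS=: read -r ip name _flags _user mac; do
    # Recolonize the MAC the way FOG expects (00:11:22:...)
    printf '%s,%s\n' "$name" "${mac//-/:}"
done <<< "$scan_output")
printf '%s\n' "$rows"
```
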

    2. C1: This question was geared towards seeing if a mobile deployment server would work for the small sites: basically a FOG server installed on a laptop, NUC, or Raspberry Pi that could be shuttled between the sites. Thinking a laptop with a normal version of FOG installed, running dnsmasq and a FOG community dynamic dhcp script could do the trick. Just have the office admin connect it into your network and power it on. The only bit you would need is to be able to find it on your network, but you could do that with a linux startup script - just have it send you an email saying “hello, I’m here, and here is my IP address”, or something like that.
      A: Good point, but I don’t know this will be feasible. Most of these places will probably have reasonable bandwidth but there are parts to them that I can’t mention on this forum that would give a bit too much away as to the details.
      C2: Not suggesting this is the right thing for your org. Just be aware that you can air drop a mobile FOG server into a site and deploy images without a big impact on the remote site.
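    For the dnsmasq piece, a minimal proxy-DHCP sketch (subnet, boot filenames, and file path are placeholders; the FOG community tutorials carry the maintained version):

```
# /etc/dnsmasq.d/fog.conf - proxy-DHCP sketch for a drop-in FOG laptop
port=0                               # disable DNS; DHCP/TFTP only
enable-tftp
tftp-root=/tftpboot
dhcp-range=192.168.50.0,proxy        # proxy mode: the site DHCP keeps handing out leases
dhcp-boot=undionly.kpxe
pxe-service=X86PC,"Boot to FOG (BIOS)",undionly.kpxe
pxe-service=X86-64_EFI,"Boot to FOG (UEFI)",ipxe.efi
```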

    3. C2: Through the use of the linux dmidecode command, a FOG post install script can read values in SMBIOS. On my campus the windows host name can be calculated from the site LUN, chassis type, and dell serial number; these can all be pulled from smbios. We can (but don’t currently) pull the dell asset field (a blank field in the firmware where companies can put their asset number) with dmidecode.
      A: Yeah, they are asset tagged by the shipping/internal department of the organization that handles procurement. There is no field in ‘software’ that contains this value. I was hoping there would be, -OR- that we’d get a lucky batch of sequential numbers. But it starts with ‘ORG-12345678’ with the numerical portion being somewhat sequential but spread out over a large geographical area. Barcode/One-Off manual entry works well enough for now.
      C3: Then do you get a printout of asset tag vs something unique on the computer? Or is it just an asset tag sticker that you need to manually read off? Can the tagging organization put a barcode tag on the device that represents the name of the computer? (Just thinking about the intake workflow here.)
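    The name-calculation idea above can be sketched like this (the SITE-last7ofserial convention is invented purely for illustration; a real postinstall script would pull the serial with dmidecode -s system-serial-number, which needs root):

```bash
#!/usr/bin/env bash
# Sketch: derive a Windows hostname from a site code plus the tail of the
# SMBIOS serial number. The naming convention here is invented.
compute_name() {
    local site=$1 serial=$2
    # A real FOG postinstall script would obtain $serial via:
    #   serial=$(dmidecode -s system-serial-number)
    printf '%s-%s\n' "$site" "${serial: -7}"    # keep the last 7 characters
}

compute_name HQ ABC1234567
```
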

    The last bit I can think of is connecting computers to AD and placing them in the right OU. In my case, again, I have a post install script calculate the computer name and destination OU based on the location where the image is being deployed. I take these values and update the unattend.xml script on the target computer (i.e. leave some bread crumbs behind) so that when winsetup runs it finds the bread crumbs and does what it should. You kind of get an idea how I’m doing it from this tutorial: https://forums.fogproject.org/post/69725 Again, I’m only giving you things that could be done to smooth out deployment. I said it before: 14,000 computers is a heck of a lot. If you can shave off a few minutes per system, you can get back a few weeks of your life by the end of the project.


  • @george1421 said in Master, Storage nodes, and clients for gigantic network. Trying to design - need sanity check?:

    @p4cm4n said in Master, Storage nodes, and clients for gigantic network. Trying to design - need sanity check?:

    OK since we have had some interactions before (sorry I participate in too many threads to remember anymore), I’ll jump right into the more advanced stuff.

    –You’re good man. We’ve actually spoken a few times over the years, but thats just because you’re as functionally involved with support as much as you are. Kudos to you for that 😉

    1. Do you need to manage these target computers with FOG once they are deployed? Like deploying snapins or such?
      A: Yes. Specifically AD joining and maintenance. This will also probably replace/begin use as an imaging solution.
      C1: This was actually a leading question to see if the “Load and Go” (using the ipxe deploy image menu) approach would be best here. If all you need is imaging, LaG is the simplest method of deployment - provided everything can be calculated and you don’t need to manage with fog after deployment. In this case you don’t register the computer with FOG, and the FOG server will forget about the computer after deployment. A system builder would use this approach.
      C2: Since you mentioned that you already use the deploy image menu. If you do have all of the clients registered in FOG you can restrict the deploy image menu to only display the defined image for that target computer instead of all defined images on the fog server.

    –This actually doesn’t end up being feasible because they aren’t registered until after the second bootup in Windows (and a tech hits “register pending” in a powershell GUI we built for mass imaging via FOGAPI.)

    2. Do you need to be able to remotely deploy an image to the computers or will this imaging need to be done under the control of an IT Tech?
      A: Mostly under the control of a tech. But the workflow so far has been to deploy an image. This may change if I’m able to en-masse register the hosts.
      C1: This question was focused on whether you need to boot through PXE automatically or can just have the IT Tech press the F12 key during booting to select PXE boot. If you needed to reimage an entire classroom of computers without visiting each one, I would say leave the default boot through the iPXE menu configured. On my campus we only allow reimaging when the IT Tech is sitting in front of the computer to manually select PXE boot. This approach avoids accidentally reimaging the Director’s computer by picking the wrong computer at the wrong time if everything was automatic.

    –Helpful, as funny enough I deployed the DHCP scope for fogserver/undionly|ipxe to the whole range on a DHCP server that handles a few sites, and quite a few machines popped up at the deploy image menu. But there is currently no salvageable data on those machines, so it’s useless to be ‘safe’…if you catch my drift. The ones we actually care about don’t do this, because of Secure Boot and because Win10 takes over the boot priority for quick booting.

    3. What target OS will you deploy?
      A: Win10, 21H2 Ent/Pro
      C1: You will surely want to have FOG 1.5.9.110 or later installed. You will need to switch to the dev branch and reinstall the FOG software to avoid an issue with 20H2 and later, where M$ changed the disk structure a bit, which causes FOG 1.5.9 to not be able to expand the disk properly. That has been fixed in 1.5.9.110 and later. FOG 1.5.10 will be out later this spring with all of the fixes included.

    –All good there. Dev branch as of 1/28/2022 was installed and that client deployed on all machines thus far. It’s on the golden image I created as well.

    4. At the sites with 10 or fewer computers, assuming you have enough bandwidth, will you send an IT tech to the site to image the computers or will you deputize an existing site person to image the computers?
      A: Probably an existing person, and I think I have mastered the image workflow for this purpose. It can’t be dumbed down any further. However, I have remote control of DHCP at least.
      C1: This question was geared towards seeing if a mobile deployment server would work for the small sites. This is basically a FOG server installed on a laptop, NUC, or Raspberry Pi that could be shuttled between the sites. Thinking a laptop with a normal version of FOG installed, running dnsmasq and a FOG community dynamic DHCP script, could do the trick. Just have the office admin connect it to your network and power it on. The only bit you would need is to be able to find it on your network, but you could do that with a Linux startup script: just have it send you an email saying “hello, I’m here and here is my IP address,” or something like that.

    –Good point, but I don’t know this will be feasible. Most of these places will probably have reasonable bandwidth but there are parts to them that I can’t mention on this forum that would give a bit too much away as to the details.
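    For the shuttle-box idea, the dnsmasq side is usually a proxyDHCP setup so the portable FOG server never fights the site’s real DHCP server. A rough sketch (the subnet, filenames, and UEFI arch match are placeholders to adapt to your environment):

    ```conf
    # Hypothetical /etc/dnsmasq.d/fog.conf on a portable FOG box
    port=0                                   # disable DNS; we only want DHCP/TFTP
    log-dhcp
    enable-tftp
    tftp-root=/tftpboot
    # proxyDHCP: offer PXE boot info alongside the site's existing DHCP server
    dhcp-range=192.168.1.0,proxy             # placeholder subnet
    # UEFI x64 clients identify as arch 00007; hand them ipxe.efi
    dhcp-vendorclass=set:efi64,PXEClient:Arch:00007
    dhcp-boot=tag:efi64,ipxe.efi
    # legacy BIOS clients get undionly.kpxe via the PXE menu
    pxe-service=tag:!efi64,x86PC,"Boot to FOG",undionly.kpxe
    ```

    Because it only proxies, the box can be plugged into any site subnet without touching the local DHCP scopes.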

    5. What is your computer naming convention? Can it be calculated?
      A: Naming convention cannot be automatically populated. I wish - however these machines are asset tagged, non-sequentially.
      C1: I might have mentioned here that bar-coded tags with a reader will help in this situation. It’s easier to “zap” a barcode than to key in a host name and get it right every time.
      C2: Through the use of the Linux dmidecode command, a FOG post install script can read values from SMBIOS. On my campus the Windows host name is calculable from the site LUN, chassis type, and Dell serial number. These can all be pulled from SMBIOS. We can (but don’t currently) pull the Dell asset field (a blank field in the firmware where companies can put their asset number) with dmidecode.

    –Yeah, they are asset tagged by the shipping/internal department of the organization that handles procurement. There is no field in ‘software’ that contains this value. I was hoping there would be, -OR- that we’d get a lucky batch of sequential numbers. But it starts with ‘ORG-12345678’, with the numerical portion being somewhat sequential but spread out over a large geographical area. Barcode/one-off manual entry works well enough for now.
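    For anyone whose hardware *does* carry usable SMBIOS values, the calculated-hostname idea can be sketched like this. The naming scheme (site code + chassis letter + serial) is invented for illustration; in a real post-install script the inputs would come from `dmidecode`:

    ```shell
    #!/bin/sh
    # Hedged sketch of calculating a Windows hostname from SMBIOS values
    # inside a FOG post-install script. Scheme here is hypothetical.

    # make_hostname SITE CHASSIS SERIAL -> calculated name on stdout
    make_hostname() {
      site="$1"; chassis="$2"; serial="$3"
      case "$chassis" in
        Laptop|Notebook|Portable) kind=L ;;   # chassis-type strings from dmidecode
        *)                        kind=D ;;
      esac
      # Windows NetBIOS names max out at 15 characters, so trim to fit
      printf '%s-%s%s' "$site" "$kind" "$serial" | cut -c1-15
    }

    # On real hardware (needs root to read SMBIOS):
    #   make_hostname "$SITE" "$(dmidecode -s chassis-type)" \
    #                 "$(dmidecode -s system-serial-number)"
    ```

    Keeping the logic in a pure function makes it easy to dry-run on a workstation before baking it into the deployment flow.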

    Since you created a custom FOG manual registration script, there is a hack I did one time at the end of that script. The script asks if you want to deploy the image to the target computer; if you answer yes, it will reboot and then deploy the image. I found this extra reboot (since I require the IT Tech to sit in front of the computer and press the F12 button to get it to PXE boot) unnecessary. I’d have to look at the script, but I was able to have it go right into imaging without a reboot if the IT Tech answers yes to deploy the image. That would save about 60 seconds on a registration followed by an image deployment.

    –This is a great idea, and it had been requested by my techs a bunch before I ended up moving the workflow from full reg > deploy image. The problem was more the manual data entry portion of full reg; while I could have limited it to only request the hostname, for example, I still had to use something like persistent groups for snapin deployment since these were new clients…but ultimately I just scripted the auto-install of the ‘base’ applications. After that, another PoSH GUI we created lets an ‘inventory’ or ‘deployment’ person who hands a new laptop to an end user just scan the device asset tag barcode, select a checkbox, and hit start, which auto-deploys the snapins for that laptop. It’s relatively scalable for 100 a day (not quite keeping up with the 500 new laptops we’re imaging a day as of yet, but…)
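    The “skip the extra reboot” hack could be wired up roughly like this at the end of a custom registration script. Treat this as pseudologic, not a tested patch: `decide_next_step` is a made-up helper, and while `/bin/fog.download` matches FOS naming conventions, verify the exact script name against your FOS build before relying on it:

    ```shell
    #!/bin/sh
    # Hedged sketch: branch straight into imaging when the tech answers yes,
    # instead of rebooting first and PXE-booting back into FOS.

    # Pure decision helper so the branch logic is testable on its own.
    decide_next_step() {
      case "$1" in
        [Yy]*) echo deploy ;;   # server already queued the task during registration
        *)     echo reboot ;;   # fall back to the normal reboot-and-PXE flow
      esac
    }

    # At the end of the registration script it would be used roughly like:
    #   printf 'Deploy image to this host now? (y/N) '
    #   read -r ans
    #   if [ "$(decide_next_step "$ans")" = deploy ]; then
    #     exec /bin/fog.download   # go straight into imaging, no reboot
    #   else
    #     reboot
    #   fi
    ```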



  • @george1421
    And actually, big thanks. I ended up using deploy image per the last response in another thread you had suggested.
    I ended up doing this with Win10 for my laptops (the ‘proof of concept’ for this whole project):

    -boot to fog>deploy image (default)
    -edited the deploy image menu option to include the username and password, so the screen defaults to the image selection screen and we choose which image
    -the image we use has drivers included for the new Intel NICs. It then prompts you to enter the hostname in a CMD shell window; this is input with a barcode scanner. After this, software installs as the image finishes deploying, then sets the FOG client service to delayed-auto and reboots the machine.
    -the client is now sitting in pending hosts, where we then clear the QC portion of the hostname, add it to a group, join it to AD appropriately, customize snapins, and shut down the machine.
    –because of your suggestion, this significantly reduced technician error with snapins, locations, hostnames, etc.
