Best posts made by george1421

george1421

@Tom-Elliott We’ve also seen some strange interaction between green ethernet settings on some switches and these new (at the time) I219-V network adapters. But I really don’t think that fits here, since I can assume they use this switch model across their campus.

george1421

@Tauric On the question about editing the pcap: I’ve seen some people mask info in the pcap for privacy reasons, but that just adds confusion, like the unprintable characters. I thought the unprintable characters were the result of hand editing the pcap file.

The advantage of going the dnsmasq route on the fog server is that if the fog server isn’t running, you have nothing issuing pxe boot info. If you go the dnsmasq route, remove the pxe boot information from your router so it doesn’t confuse things when the fog server is offline.

george1421

@Tauric ok I see a whole lot of issues here. Let me ask you did you mask out any data in the pcap?

In the ethernet header (bootp protocol) the boot-file field is blank (should be ipxe.efi). The next server points to 192.168.0.254, not 0.33. Looking at the dhcp part, dhcp option 66 (should be the boot server IP) is an unprintable character. DHCP option 67 is ipxe.efi but it’s not terminated with an end of string character 0x00; it ends the string with 0xFF. For background, both bootp and dhcp options need to be set because it’s up to the pxe rom writer to pick whether to boot using bootp (older protocol) or dhcp. The issue here is your dhcp server giving your target computers bad info.
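If you want to check options 66 and 67 yourself rather than eyeball the hex, here is a rough Python sketch that walks the code/length/value layout of a DHCP options field. The sample bytes are made up for illustration (192.168.0.33 is only assumed to be the FOG server from the addresses above), and the function names are my own, not part of any tool.

```python
def parse_dhcp_options(data):
    """Walk a DHCP options field (the bytes after the magic cookie).

    Each option is: 1 byte code, 1 byte length, then `length` value bytes.
    Code 0 is padding (no length byte) and code 255 marks the end.
    """
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == 0:          # pad byte, skip it
            i += 1
            continue
        if code == 255:        # end marker, stop here
            break
        if i + 1 >= len(data):
            raise ValueError("option %d has no length byte" % code)
        length = data[i + 1]
        value = data[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("option %d is truncated" % code)
        options[code] = value
        i += 2 + length
    return options


def is_printable(value):
    """True if every byte is plain printable ASCII."""
    return len(value) > 0 and all(32 <= b < 127 for b in value)


# Hypothetical example data, not taken from the pcap in this thread.
sample = (bytes([66, 12]) + b"192.168.0.33" +
          bytes([67, 8]) + b"ipxe.efi" +
          bytes([255]))
opts = parse_dhcp_options(sample)
print(opts[66], opts[67], is_printable(opts[66]))
```

An option 66 that fails `is_printable` is the kind of thing I was seeing in the capture.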

Since you are using a SOHO router, we see that they don’t exactly play nice with pxe booting.

My recommendation is if you can’t fix your dhcp server easily then forgo using it and install dnsmasq on your fog server. It will take about 10 minutes and it also supports dynamic pxe booting (bios/uefi). DNSMASQ in this configuration will not issue IP addresses, only pxe boot info. https://forums.fogproject.org/topic/12796/installing-dnsmasq-on-your-fog-server?_=1698421239631

george1421

@Tauric On the windows box, make sure you disable the firewall since tftp uses 2 network ports like ftp does, if you are trying to make a comparative test.

Since you seem confident with tcpdump, let’s follow this tutorial to get a pcap from the FOG server. This will show us the dhcp process as well as the tftp process at the end. It should give us a good picture of what is going on. Capture the pcap and upload it to a file share site and I will take a look at it. https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue?_=1698421239623 I’ll need the complete pcap since the screen shots don’t show the complete details and there are too many exceptions to list.

george1421

@Tauric said in TFTP Timeout:

(and took less than 10 minutes lol)

Well I was taking into account for slow speeds between keyboard and chair…
Glad you have it worked out. DNSMASQ should work flawlessly in your environment.

george1421

@frobishant32 There are a couple of things going on here.

Your dnsmasq configuration is only set up for bios based computers. Look at this tutorial to see how to configure dnsmasq for proxy dhcp. Understand this is not what you need, but look at the section with the pxe-service entries for the uefi settings: https://forums.fogproject.org/topic/12796/installing-dnsmasq-on-your-fog-server?_=1699482367667

The second issue you have is that when iPXE boots it once again does a dhcp query to find the IP address of what it assumes is the fog server. So whatever dhcp has for options 66 and 67 will be used to find the fog server. This next part is a little complicated but let me explain. When iPXE boots it runs an internal script that the fog developers embedded in the FOG version of iPXE. The script is pretty much here: https://github.com/FOGProject/fogproject/blob/master/src/ipxe/src/ipxescript

#!ipxe
isset ${net0/mac} && ifopen net0 && dhcp net0 || goto dhcpnet1
echo Received DHCP answer on interface net0 && goto proxycheck

:dhcpnet1
isset ${net1/mac} && ifopen net1 && dhcp net1 || goto dhcpnet2
echo Received DHCP answer on interface net1 && goto proxycheck

:dhcpnet2
isset ${net2/mac} && ifopen net2 && dhcp net2 || goto dhcpall
echo Received DHCP answer on interface net2 && goto proxycheck

:dhcpall
dhcp && goto proxycheck || goto dhcperror

:dhcperror
prompt --key s --timeout 10000 DHCP failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot

:proxycheck
isset ${proxydhcp/next-server} && set next-server ${proxydhcp/next-server} || goto nextservercheck

:nextservercheck
isset ${next-server} && goto netboot || goto setserv

:setserv
echo -n Please enter tftp server: && read next-server && goto netboot || goto setserv

:chainloadfailed
prompt --key s --timeout 10000 Chainloading failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot

:netboot
chain tftp://${next-server}/default.ipxe || goto chainloadfailed

As I said, this script looks at what the dhcp settings are and then uses them to chain load default.ipxe.

So you will need to adjust this script and rebuild ipxe if you want to change the behavior of ipxe as it boots from fog. Maybe something like this edit

#!ipxe
isset ${net0/mac} && ifopen net0 && dhcp net0 || goto dhcpnet1
echo Received DHCP answer on interface net0 && goto proxycheck

:dhcpnet1
isset ${net1/mac} && ifopen net1 && dhcp net1 || goto dhcpnet2
echo Received DHCP answer on interface net1 && goto proxycheck

:dhcpnet2
isset ${net2/mac} && ifopen net2 && dhcp net2 || goto dhcpall
echo Received DHCP answer on interface net2 && goto proxycheck

:dhcpall
dhcp && goto proxycheck || goto dhcperror

:dhcperror
prompt --key s --timeout 10000 DHCP failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot

:proxycheck
isset ${proxydhcp/next-server} && set next-server ${proxydhcp/next-server} || goto nextservercheck

:nextservercheck
isset ${next-server} && goto netboot || goto setserv

:setserv
echo -n Please enter tftp server: && read next-server && goto netboot || goto setserv

:chainloadfailed
prompt --key s --timeout 10000 Chainloading failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot

:netboot
chain tftp://192.168.21.82/default.ipxe || goto chainloadfailed

That chain update will then ignore what dhcp is telling ipxe and it will always load from the 21.82 address.
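If it helps to see the decision logic outside of iPXE syntax, here is a rough Python sketch of how the chain URL gets picked in the stock script versus the edited one. This is only an illustration: the function and parameter names are mine, and the prompt loop is simplified (it re-asks on an empty answer, where the real script only loops when the read fails).

```python
def resolve_chain_url(dhcp_next_server=None, proxydhcp_next_server=None,
                      hardcoded_server=None, prompt=input):
    """Return the URL iPXE would chain to for default.ipxe.

    hardcoded_server models the edited script: dhcp answers are ignored.
    Otherwise a proxy dhcp next-server overrides the normal dhcp one,
    and if neither is set the user is asked for a tftp server.
    """
    if hardcoded_server:
        return "tftp://%s/default.ipxe" % hardcoded_server

    next_server = dhcp_next_server
    if proxydhcp_next_server:              # the :proxycheck step
        next_server = proxydhcp_next_server
    while not next_server:                 # the :setserv step (simplified)
        next_server = prompt("Please enter tftp server: ").strip()
    return "tftp://%s/default.ipxe" % next_server


# Stock behaviour: follows whatever dhcp handed out (example address).
print(resolve_chain_url(dhcp_next_server="10.0.0.5"))
# Edited behaviour: always the fixed address, whatever dhcp says.
print(resolve_chain_url(dhcp_next_server="10.0.0.5",
                        hardcoded_server="192.168.21.82"))
```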

Here is a tutorial on rebuilding ipxe. https://forums.fogproject.org/topic/15826/updating-compiling-the-latest-version-of-ipxe

I’m pretty sure you can get to what you need with the above info. I would try the dnsmasq settings first before going down the ipxe edit route.

george1421

@oz-agoston I agree with Gabriel check your dhcp server(s) (if you have more than one) to ensure dhcp options 66 and 67 are being sent to the target computer. PXE booting will simply not work if these values are not set on your campus dhcp server.

The next question is whether you have a screening router between the subnets (client and fog server) that might be blocking tftp.

If you could supply the actual error message created by the target computer during a pxe boot that would help us identify where to look.

george1421

@abdel The error message indicates that no pxe boot information is making it to the target computer. Please try to pxe boot with a physical computer. For this test you don’t need to do anything other than get to the fog ipxe menu. Make sure when you pxe boot a physical computer you know if it’s in bios or uefi mode, because the value entered in dhcp option 67 is specific to the pxe booting computer. For bios computers you need undionly.kpxe in option 67 and for uefi computers you will need ipxe.efi in dhcp option 67.
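To make the bios/uefi split concrete, here is a small Python sketch that picks the option 67 value from the vendor class string a pxe client sends (the "PXEClient:Arch:..." text you see in a pcap). The architecture numbers used are my assumption of the commonly seen ones (0 for bios, 7 and 9 for 64-bit uefi); the function names are illustrative only.

```python
# Assumed mapping of client architecture number to boot file.
BOOT_FILES = {
    0: "undionly.kpxe",   # bios
    7: "ipxe.efi",        # uefi
    9: "ipxe.efi",        # uefi (x86-64)
}


def arch_from_vendor_class(vendor_class):
    """Pull the arch number out of e.g. 'PXEClient:Arch:00007:UNDI:003016'."""
    parts = vendor_class.split(":")
    if len(parts) < 3 or parts[0] != "PXEClient" or parts[1] != "Arch":
        raise ValueError("not a PXE vendor class: %r" % vendor_class)
    return int(parts[2])


def boot_file_for(vendor_class):
    """Return the boot file name to hand out in dhcp option 67."""
    arch = arch_from_vendor_class(vendor_class)
    if arch not in BOOT_FILES:
        raise KeyError("no boot file configured for arch %d" % arch)
    return BOOT_FILES[arch]


print(boot_file_for("PXEClient:Arch:00000:UNDI:002001"))
print(boot_file_for("PXEClient:Arch:00007:UNDI:003016"))
```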

george1421

@marsface said in Time for a New FOG Tutorial:

I recently got a FOG environment set up, but it was a huge pain. The documentation is all over the place, and often very outdated, but we got it done.

How much exactly did you pay for the rights to install and setup FOG?

Probably not the way I would go about asking for help from a forum-driven support system. There is always room for those folks who want to update or create the perfect FOG documentation.

My question is now, is there a new tutorial or method for using FOG with autopilot that will inject drivers without having to create a new golden image for each type of device, or need to include drivers in the golden image. I have the CAB files set up in the FOG server, so why do I need to put the drivers on the C drive as well?

I’m the person who wrote the tutorial on driver injection using FOG, and it was written before autopilot. And truth be told, autopilot is a windows “thing” not a FOG thing. FOG only moves data blocks from here to there. I can tell you at the time I was deploying a single golden image to 14 different hardware platforms by injecting the drivers as I had laid out in the tutorial. The issue you have is that FOG is linux based and not MS Windows based (as with SCCM/WDS/MDT), so there are only a limited number of things FOG can do during deployment (move data blocks from here to there). FOG can’t step in and run windows applications. Once FOG imaging is done, it’s up to the target OS to complete the setup.

george1421

@GorkaAP OK, your explanation of the problem is very clear.

I have a couple of ideas here:

An additional building, located 4 kilometers away, shares the same network subnet.

I have seen remote locations connected via a WAN have issues with loading the iPXE boot loaders via tftp. In this case the computers would error out because they could not download the NBP boot file. The issue was related to the tftp block size being larger than the MTU packet size on the WAN. If you are directly connected to the remote building with fiber this is probably not your issue.
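For a feel of the numbers, here is a quick Python sketch of the arithmetic: how big a tftp block can be before a data packet no longer fits in one frame of a given MTU. It assumes plain IPv4 with no IP options (20 byte header), an 8 byte UDP header and the 4 byte tftp data header; the 1400 MTU is just a made-up WAN example.

```python
IP_HEADER = 20        # IPv4 header without options
UDP_HEADER = 8
TFTP_DATA_HEADER = 4  # 2 byte opcode + 2 byte block number


def max_tftp_blksize(mtu):
    """Largest tftp block size that fits in one unfragmented packet."""
    return mtu - IP_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def needs_fragmentation(blksize, mtu):
    """True if a data packet with this block size exceeds the MTU."""
    return blksize > max_tftp_blksize(mtu)


print(max_tftp_blksize(1500))            # standard ethernet
print(needs_fragmentation(1468, 1400))   # same block size on a smaller WAN MTU
```

If the block size that works on the LAN needs fragmentation on the WAN path and something along the way drops fragments, the download stalls.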

Having both locations on one subnet makes things a bit harder since dhcp works off broadcast domains and your local and remote locations have the same broadcast domain since they are on the same subnet.

The FOG booting process is such
PXE Rom (target computer) boots and queries dhcp to find dhcp options 66 and 67
PXE Rom downloads the bootloader pointed to by options 66 and 67
The iPXE boot loader boots and again queries dhcp for dhcp option 66 to locate the FOG server.
The iPXE boot loader will then chain load default.ipxe from the fog server over tftp.
default.ipxe will chain load boot.php.

If you are on the same subnet between the sites and it works at the main campus but not at the remote campus, then note that the boot.php step is the first time ipxe chain loads over http instead of tftp. From the remote campus can you get to the fog server’s web ui?

It might help to debug if you can snap a clear picture of the error message on the target computer as you get the chain load error.

One additional thing I can think of: if you have more than 1 dhcp server within this broadcast domain (such as a primary/slave), make sure both have the proper dhcp option settings. I have seen two dhcp servers, one with the settings configured and the other without, cause random issues. The client would use whichever dhcp server responded first (one having the proper boot settings and the other without).

Bonus additional thing: You are using dnsmasq to provide pxe boot information. Could there be something filtering out the DHCP Discover packet from the client at the remote site? I can see how DNSMASQ would work at the main campus but not at the remote campus if the DISCOVER packet is getting lost on its way to the DNSMASQ service. You can test this on the dnsmasq server by using tcpdump and monitoring for port 67 or port 68. Now power on a computer at the remote building. Do you see the DISCOVER packet arrive at the dnsmasq server? The DISCOVER packet starts the process in DNSMASQ to send the pxe boot information to the target computer.

Bonus++ thing. If your link speed to the remote location is less than 1GbE you can install a fog storage node at that location and deploy your computers using the storage node. (this assumes you solve the pxe booting issue). You will install the location plugin into fog then assign computers to the remote location as well as the storage node. It will still boot using the main campus dhcp and tftp server, but actual image deployment will happen via the storage node not the WAN link.

george1421

@GorkaAP I hate to begin with this, but that referenced document deals with a 10 year out of date version of FOG.

Let’s see if we can work out a solution using a current release. Please state the problem you are trying to resolve.

george1421

@alexpolytech94 The shrinking of the disk when using single disk resizable is a bit of black magic. Sometimes, because of the actual data size or the location of the data on the disk, it’s not possible to shrink the volume down enough to make it fit on a disk that is 1/2 the size of the source disk. I won’t go into too much detail, but if you have a partition that is fixed in size and can’t shrink, with a partition just before it on the disk that can shrink, fog will shrink the one that can be shrunk but leave the one that can’t be shrunk as it was. If you were to deploy that image to a computer with a disk half the size, that non shrunk partition would technically be beyond the last sector on the 1/2 size disk.

To put it another way, always build your mother image on the smallest disk possible, because expanding an image to a larger target disk works far more often than shrinking it to fit on a smaller disk.
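Here is a toy Python sketch of that reasoning. It is not how FOG actually does its resize math; it only models the one rule described above, that a fixed partition stays where it was on the source disk, so its end has to land inside the target disk. The partition names and sector numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    start: int        # first sector on the source disk
    size: int         # length in sectors
    shrinkable: bool  # False means it stays exactly where it is


def fits_on_target(partitions, target_sectors):
    """True if every fixed partition still ends inside the target disk."""
    return all(p.start + p.size <= target_sectors
               for p in partitions if not p.shrinkable)


# Invented layout: a big shrinkable partition followed by a fixed one.
layout = [
    Partition("system", start=0, size=900, shrinkable=True),
    Partition("recovery", start=900, size=100, shrinkable=False),
]

print(fits_on_target(layout, 500))    # half size disk
print(fits_on_target(layout, 2000))   # larger disk
```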

When I was building golden images I would build them on a VM with a 50GB hard drive (smaller than anything I would deploy it to) and then let FOG expand the disk to match the target disk size. That always worked.

george1421

@atlas You need to have internet access to install FOG. I have seen some people install 2 network adapters in the fog server, one on the business network and one on the isolated network. The nic on the business network is for management and (install time only) internet access. This keeps the isolated network isolated.

FWIW that wiki page you referenced is 10 years old and does not apply to a current version of FOG.

FWIW: You can manually download the inits and kernels from here: https://github.com/FOGProject/fos/releases/

george1421

@atlas When it comes to opensource, the only wrong answer is one that doesn’t work. Well done!

Another hackish way would be, instead of changing the programming, to enter a fake but valid entry in the /etc/hosts table to point the dns entry to your internal server. This way you can use fog native code when the next version comes out. But again, if it worked for you it was the right answer.

george1421

You are going to have to draw a picture with IP addresses of how this infrastructure is connected. Use fake public addresses, but real internal addresses.

I can tell you that the way FOG is designed, with a master node, storage nodes, and FOG clients, the storage nodes and fog clients are expected to be able to reach the master node 100% of the time to remain operational. So if you have a fully routable site to site VPN then everything will work as designed. If you have an intermittent connection then things won’t work quite as well. The storage node needs to be able to contact the master node because the database only exists on the master node. So this link needs to be up 100% of the time. PXE booting is local, then jumps to the master node to load boot.php.

george1421

While I can’t comment on the FOG code, a lot of systems will launch a process and then keep track of that process via a handle until it stops. In the destructor for the instance they will kill off the task, based on the handle that was created when the process was launched, if the application instance dies before the launched process. I think the intent of the replicator was to have only one instance of the lftp process running at one time, so it wouldn’t be too difficult to keep track of the process handle (as opposed to several hundred processes).
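To show the general pattern I mean (and not how the FOG replicator is actually written), here is a small Python sketch that keeps one process handle, refuses to start a second copy while the first is running, and kills the child when the owner goes away. The class name is made up, and a sleeping python process stands in for lftp.

```python
import subprocess
import sys


class SingleProcessRunner:
    """Owns at most one child process and cleans it up on exit."""

    def __init__(self):
        self._proc = None

    def start(self, args):
        # If the tracked process is still alive, hand back the same handle.
        if self._proc is not None and self._proc.poll() is None:
            return self._proc
        self._proc = subprocess.Popen(args)
        return self._proc

    def stop(self):
        # Kill the child only if we have a handle and it is still running.
        if self._proc is not None and self._proc.poll() is None:
            self._proc.terminate()
            self._proc.wait()
        self._proc = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.stop()


def sleeper_command(seconds=30):
    """Stand-in for a long running transfer such as lftp."""
    return [sys.executable, "-c", "import time; time.sleep(%d)" % seconds]
```

Used as `with SingleProcessRunner() as runner: runner.start(sleeper_command())`, the child is terminated when the block exits, even on an error.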

With the current design you normally wouldn’t have to start and stop the replicator multiple times, so having multiple instances of the lftp process running should never happen. I’m not seeing the value in putting energy into fixing a one off issue.

george1421

My preference would be to not do something out of band if possible. It does appear that creating a fake image with its path set to /images/drivers is choking the FOG replicator because of the sub folders, and no replication is happening because of that error, so I’m going to back out that change.

I haven’t dug into the fog replicator code yet, but I’m wondering if rsync wouldn’t be a better method to replicate the images from the master node to the other storage nodes. Rsync would give us a few more advanced options like data compression and only syncing files that were changed than just a normal file copy.

george1421

It’s trunk build 5040.

Looking at the drivers folder. I have a combination of files and sub folders. Depending on how smart the replicator is it may not handle or traverse the sub folders.

The structure of the drivers folder is such.
/images/drivers/OptiPlex7010.zip
/images/drivers/OptiPlex7010/audio
/images/drivers/OptiPlex7010/audio/<many files and sub folders>
/images/drivers/OptiPlex7010/video/<many files and sub folders>
<…>

I suspect that the replicator was only designed to copy the image folder and one level of files below.
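To illustrate the difference I suspect (this is a guess at the behavior, not the replicator’s real code), here is a short Python sketch comparing a one-level listing with a full recursive walk of a folder laid out like the drivers tree above. The file names in the demo are made up.

```python
import os
import tempfile


def list_one_level(root):
    """Files directly inside root only; sub folders are ignored."""
    return sorted(name for name in os.listdir(root)
                  if os.path.isfile(os.path.join(root, name)))


def list_recursive(root):
    """Every file under root, as paths relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout modelled on the drivers folder structure.
        os.makedirs(os.path.join(root, "OptiPlex7010", "audio"))
        open(os.path.join(root, "OptiPlex7010.zip"), "w").close()
        open(os.path.join(root, "OptiPlex7010", "audio", "driver.inf"), "w").close()
        print(list_one_level(root))
        print(list_recursive(root))
```

A copy routine built on the first function would only ever move the zip file, which would match what I’m seeing.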

george1421

Rebooting the storage node appears to have started the replication of /images/drivers, but so far only the first file has replicated.

Looking at /opt/fog/logs/fogreplicator.log on the master node I see this error.

[10-22-15 8:19:52 pm] * shvstorage - SubProcess -> mirror: Fatal error: 500 OOPS: priv_sock_get_cmd
[10-22-15 8:21:08 pm] * shvstorage - SubProcess -> Mirroring directory `drivers'
[10-22-15 8:21:08 pm] * shvstorage - SubProcess -> Making directory `drivers'
[10-22-15 8:21:08 pm] * shvstorage - SubProcess -> Transferring file `drivers/DN2820FYK.zip'

The zip file is the only thing in /images/drivers on the storage node.

george1421

OK that sounds like a plan. I’ll set that up right away.

Do you know what the replication cycle interval is or where to find the setting? Under “normal” production once a day is sufficient, but I can see during development that we might need to shorten it to just a few hours.