Best posts made by Sebastian Roth

Sebastian Roth

@michaeloberg Let’s switch to chat -> speech bubble in the top right corner of the forum.

Sebastian Roth

@JLE That packet dump looks quite interesting. It shows something I have not come across in a long time. I am trying to write down what I see in the PCAP to hopefully make sense of it, as I don’t see what’s wrong yet. Maybe George can add to that as well.

- First the PXE ROM of the NIC sends a DHCP Discover and does not get an answer. So 8 seconds later it sends another Discover (same information, just with a new DHCP transaction ID). The second DHCP Discover is answered with a DHCP Offer very promptly (a delay of only milliseconds). The transaction IDs of the second Discover and the following Offer match, so the answer is definitely not a delayed one to the first Discover. Question: Why is the first DHCP Discover not answered? (This happens again later on.)
- As far as I can see the DHCP Offer looks good (next-server and filename set properly).
- Now the client is quiet for 16 seconds before sending a DHCP Request packet to complete the DHCP communication. This Request packet is promptly answered by a proper looking DHCP ACK. So the client is finally happy, I suppose.
- Then I suspect the TFTP transfer happens, which was not captured. See the next bullet point.
- Another 12 seconds after the first DHCP DORA (Discover, Offer, Request, ACK) finished I see a new DHCP Discover from that client. This time option 175 is set, which is a clear sign of iPXE sending the packet. And the same thing happens again: no answer from the DHCP server for 8 seconds, then the client (now iPXE instead of the PXE ROM) sends another DHCP Discover which is promptly answered with a fine DHCP Offer.
- After the Offer the client sends a third DHCP Discover and then a DHCP Request just a second later. I think this is where things start to go really wrong. I suppose iPXE is very confused about the DHCP server only answering the second DHCP Discover (matching transaction ID). I haven’t checked the iPXE code yet, but I guess this is something unexpected that is causing the issue in your case.
- What follows is a series of DHCP Request packets from the client which are all answered by DHCP NAK (non ACK!) packets. So the DHCP server declines to give the offered IP information to the client. The result is the “No configuration methods succeeded” message in iPXE.
- In the packet dump I see the same thing happening again a minute later. But one thing is different this time: the very first DHCP Discover sent by the client’s PXE ROM is answered within one second. For the DHCP Discover sent by iPXE, though, I again see the exact same behavior as described above.
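
To make the "which Discover got answered" check less error-prone than eyeballing the PCAP, here is a minimal Python sketch of the matching by transaction ID. It assumes the packets have already been extracted from the capture into simple records; the DhcpPacket class, the field names and the sample values are all made up for illustration and are not taken from the actual dump.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DhcpPacket:
    time: float    # seconds since the start of the capture (hypothetical values)
    msg_type: str  # "DISCOVER", "OFFER", "REQUEST", "ACK" or "NAK"
    xid: int       # DHCP transaction ID


def unanswered_discovers(packets: List[DhcpPacket]) -> List[DhcpPacket]:
    """Return every Discover whose transaction ID never shows up in an Offer."""
    offered_xids = {p.xid for p in packets if p.msg_type == "OFFER"}
    return [
        p for p in packets
        if p.msg_type == "DISCOVER" and p.xid not in offered_xids
    ]


# Made-up sample that mimics the pattern described above: the first Discover
# is ignored, the retry 8 seconds later (new xid) is answered right away.
sample = [
    DhcpPacket(0.000, "DISCOVER", 0x1001),
    DhcpPacket(8.000, "DISCOVER", 0x1002),
    DhcpPacket(8.002, "OFFER", 0x1002),
    DhcpPacket(24.000, "REQUEST", 0x1002),
    DhcpPacket(24.003, "ACK", 0x1002),
]

for pkt in unanswered_discovers(sample):
    print(f"Discover at {pkt.time:.3f}s with xid {pkt.xid:#x} was never answered")
```

Matching on the transaction ID rather than on timing is what rules out a delayed answer to the first Discover.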

I guess this can be fixed in iPXE but I doubt this is the right place to do so. There is something wrong within your network. Do those first DHCP Discover packets get lost somewhere along the way? Why is the second one answered so promptly then?

Ok, I’ll leave this to you for now. We all need to think about it and I am sure someone will come up with a proper explanation of what’s going wrong here.

Sebastian Roth

@plegrand Testing internet connection is now done mainly with curl.

Sebastian Roth

@george1421 @Goll420 I have worked on the multicast tasking/finishing code just before the release of FOG 1.5.9 because there were issues when you schedule a test not via the group but by creating a named multicast session. I thought I had this all tested and working correctly but I might be wrong. I will need to get a test setup ready and see if I can replicate the issue as described.

Sebastian Roth

@wmw509 The refind.efi file is an older version (0.9.4). Tom added this a while ago as this seemed to be kind of a stable version. I have to say that I am not sure this one really is version 0.9.4 or perhaps a different one. You’d use that one by renaming the file. FOG serves refind_x64.efi to 64 bit architecture and refind_ia32.efi for 32 bit.

Unfortunately there does not seem to be a general solution to this. People report that version xxx of rEFInd doesn’t work with their hardware while others have no problems and vice versa. So trying to provide binaries that suit every case is probably not possible. You can download older versions and try out which one works best for you.

Sebastian Roth

@Matthieu-Jacquart Well that’s really interesting. It is known that Windows 2004 adds another recovery partition to the end of the disk. I have this on my list of things to work on in the next weeks.

Though I cannot imagine how removing this partition can cause the issues you described here. It seems really over the top for a Windows installation to boot fine a couple of times and then fail with such errors just because the recovery partition is missing. But well, I am a Linux guy and can only scratch my head, unable to understand the way MS is making up its world.

Sebastian Roth

@george1421 said in Issues with inventory and uploading image:

That isn’t the response from the file command I expected.

Some older Linux systems do not provide the great output using file that we see with up to date systems.

As the OP reported the issue is solved, we don’t need to worry.

Sebastian Roth

@Thobela As we can see on cateee.net the network card 8086:0D4C is supported since kernel 5.5 and so it should be working as soon as you have the kernel binaries updated as suggested.

Sebastian Roth

@eseelke First off you want to use --autoaccept so it will use defaults wherever it can.

To make it install storage node instead of master node and provide database information you will need to use “pre set” variables:

installtype=S snmysqlhost=x.x.x.x snmysqlpass='pAsSwOrD$!' ./installfog.sh --autoaccept

Sebastian Roth

Got a response from Chris about this. His tests were fine too and as we don’t seem to have anyone else willing to test I am “closing” this topic as solved now.

There has not been a response from the partclone devs either. I just updated the issue and hope that we get this merged upstream as well.

Sebastian Roth

@george1421 Ahhh, missed that part. But nonetheless I think it doesn’t hurt to have it added even now so we don’t forget as soon as we update the kernel to 5.9 and later.

Sebastian Roth

Ok, let’s call it quits. I just pushed a commit to the fos repo to add this patch to our build of partclone and add the ignore_crc parameter back to the scripts.

Marking as solved.

Sebastian Roth

@JJ-Fullmer We provide the PrinterManagerHelper with every release of the fog-client, e.g. https://github.com/FOGProject/fog-client/releases/tag/0.12.0

I know it’s not perfect and sometimes doesn’t provide all the information correctly. Though I don’t have real printers in my test setup and can’t work on this part much.

Sebastian Roth

@rogalskij said in "Pending/additional" mac addresses combining with other hosts:

For these machines, the majority we DON’T register beforehand.

Ok, that explains part of what you see. The host is being deployed without an object in the database. It deploys, then early host rename is asked to rename it but skips that part because it does not have the information for unregistered hosts. So you end up with all MACs being registered to the single host that is registered and has the name of your initial master image (not the image name but the hostname in Windows).

So I am wondering what your process looks like from start to end. Where is the step where you actually give each of the deployed machines its own hostname? Either hosts are registered in the FOG DB and named through this or you need to use other ways like OOBE to do the naming.

Sebastian Roth

Pinging @JJ-Fullmer on this.

Sebastian Roth

@rogalskij While I can understand that your techs want to get it imaged as fast as possible, I would still suggest they take one small hurdle at the beginning and save a lot of time later on.

See if you can tell them to PXE boot and choose “Perform Full Host Registration and Inventory” instead of direct deploy. This will ask for the hostname to be used within FOG (and also set in Windows!) as well as the image ID, and can even set all the information for automatic AD joining. At the end of registration you can tell it to schedule a deploy so it does one reboot and deploys without any further interaction.

If you set things up correctly your techs would take maybe 2-3 minutes for the full registration but after that they can walk away and it does the deploy, rename, join domain (fog-client) without any further action required.

Besides that you won’t have additional MACs flooding that single host…

Sebastian Roth

Ok, after lots of digging in the code and in @JJ-Fullmer’s database we seem to have found out what happened. It turned out that some of his PowerShell code messed things up by deleting the primary MAC address of some hosts, and the fog-client did its share by resetting some of the hosts’ information and making them pending.

The PowerShell code is fixed already and I will work on adding some checks to hopefully not mess as much up with the fog-client as we did here.

Initially we thought the snapin hash problems were at fault but the evidence collected points to this being a coincidence.

Sebastian Roth

@benc Please check in FOG Configuration -> FOG Settings -> General Settings -> CAPTURERESIZEPCT. From the help text:

This setting defines the amount of padding applied to a partition before attempting resize the ntfs volume and capturing it.

Default is adding 5 % and you can increase that as much as you like (up to 99 % which would not make sense).
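
As a rough worked example of what that percentage means in numbers, here is a small Python sketch. It assumes the padding is simply added on top of the minimum size the filesystem can be shrunk to; I have not copied this from the FOS scripts, so take the formula and the function name padded_size as an illustration, not as the exact calculation FOG does.

```python
def padded_size(min_size_bytes: int, resize_pct: int = 5) -> int:
    """Minimum filesystem size plus resize_pct percent of padding.

    Assumption: padding = min_size * pct / 100, added to the minimum size.
    """
    if not 0 <= resize_pct <= 99:
        raise ValueError("resize_pct is expected to be between 0 and 99")
    # Integer arithmetic avoids floating point rounding on large byte counts.
    return min_size_bytes * (100 + resize_pct) // 100


GIB = 1024 ** 3
for pct in (5, 10, 25):
    size = padded_size(40 * GIB, pct)
    print(f"{pct:2d} % padding on 40 GiB -> {size / GIB:.1f} GiB")
```

With the default of 5 % a partition that could shrink to 40 GiB would be captured at roughly 42 GiB under this assumption.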

Sebastian Roth

@Jacques-Olivier Just found this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=913740

It talks about an issue where it would not properly use the option debian-installer/allow_unauthenticated_ssl=true when the request is being redirected from HTTP to HTTPS.

So try changing your iPXE parameters to “… debian-installer/allow_unauthenticated_ssl=true url=https://${fog-ip}/preseed/preseed.cfg splash ip=dhcp rw” and see if that works.

Sebastian Roth

@devrick When you delete an image from the list/search view by marking the checkbox and clicking the “Delete” button, this will only remove the image definition from the database but won’t delete the files from disk. This is intentional because we want to prevent people from marking several images by accident and deleting those forever.

If you want to have the image files deleted through the web UI you need to open the image settings for one particular image, go to the “Delete” tab at the top, then check “Delete files” and hit the “Delete” button.

If you want to delete image files manually now that you have removed those from the database/web UI, you can run the Linux rm command to delete specific folders in /images. Make sure you know how the rm command works by trying it on some test files/directories before you blindly go ahead.
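
If you prefer to practice on test directories first, a dry run helps. Below is a minimal Python sketch of that idea (it is not part of FOG, and remove_image_dir as well as the directory and file names are made up). It only accepts a direct subfolder of the images root, lists what would go away, and deletes only when dry_run is switched off. The demo runs inside a temporary directory, not in the real /images.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def remove_image_dir(images_root: Path, image_name: str, dry_run: bool = True) -> List[Path]:
    """List (and optionally delete) one image folder below images_root."""
    root = images_root.resolve()
    target = (root / image_name).resolve()
    # Refuse anything that is not a direct subdirectory of the images root,
    # e.g. "..", nested paths or a name that does not exist.
    if target.parent != root or not target.is_dir():
        raise ValueError(f"refusing to touch {target}")
    doomed = sorted(target.rglob("*")) + [target]
    if not dry_run:
        shutil.rmtree(target)
    return doomed


# Demo on throwaway test data only.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    (images / "oldimage").mkdir(parents=True)
    (images / "oldimage" / "d1p1.img").write_bytes(b"test")

    for path in remove_image_dir(images, "oldimage"):  # dry run, nothing deleted
        print("would delete:", path.name)

    remove_image_dir(images, "oldimage", dry_run=False)
    print("still there?", (images / "oldimage").exists())
```

The parent check is the important part: it keeps a typo from pointing the delete at the whole images folder or at something outside of it.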