Moderators

Private

Posts

  • RE: PXE partial success, no tftp

    @thezman007 said in PXE partial success, no tftp:

    My current setup seems to allow our PXE boot to partially work, but ultimately fails. It appears that our proxyDHCP via dnsmasq is working and our main DHCP server is handing out IPs while our fog server is directing devices to itself for PXE services, but the overall process fails once tftp should be serving the .efi file. We’ve tried using a different computer when attempting to PXE to try and eliminate model specific quirks. I’ve also tried changing the file dnsmasq should serve (snponly.efi or ipxe.efi) with no change. tftp via localhost works as expected, tftp over LAN fails. There are NO tftp requests seen from tcpdump during PXE boot, but I can’t provide that data until my tech returns on-site next week.

    This is the most important section.

    What I want you to do is run tcpdump on the fog server with the pcap filter port 67 or port 68 or port 4011 or port 69

    That will capture dhcp, proxy-dhcp and tftp.
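    As a minimal sketch of the capture described above, the snippet below only assembles the tcpdump command line from the four ports named in this post. The interface name eth0 and the output file pxe-boot.pcap are placeholders I made up, so substitute your own; the -i and -w flags are standard tcpdump options.

```python
import shlex

# Ports named in the post: DHCP server/client (67, 68), proxyDHCP (4011), TFTP (69).
PXE_PORTS = (67, 68, 4011, 69)


def build_capture_filter(ports=PXE_PORTS):
    """Return a pcap filter such as 'port 67 or port 68 or ...'."""
    return " or ".join(f"port {p}" for p in ports)


def build_tcpdump_command(interface="eth0", outfile="pxe-boot.pcap"):
    """Return the tcpdump argument list.

    `interface` and `outfile` are placeholder values (assumptions),
    not something taken from the original post.
    """
    return ["tcpdump", "-i", interface, "-w", outfile, build_capture_filter()]


if __name__ == "__main__":
    # Print the command so it can be pasted into a root shell on the FOG server.
    print(shlex.join(build_tcpdump_command()))
```

    Run the printed command as root, PXE boot the client, then stop the capture with Ctrl+C and open the file in wireshark.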

    ref: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue?_=1769224516191

    Review the pcap with wireshark. You should see the DORA process if the fog server is on the same subnet as the pxe booting client.

    Discover
    Offer
    Request
    Ack/Nack

    What will be important to watch is that the client is getting two OFFER packets. One will be from your main dhcp server and the second one from dnsmasq. If you are not seeing the one from the dnsmasq server then that is the start of the problem. If you do see two and one is from your dnsmasq server then go to the next part.

    Now that you have verified that dnsmasq is seeing the DISCOVER packet and responding with an OFFER packet, after DORA you should see the client call back to dnsmasq on port 4011. In that transaction the client will be told the boot server and boot file. Verify these are correct.

    And finally the client should reach out to the FOG server over tftp to first request the file size then request the file. So there will be two tftp communications, then the file should download.
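    The walk-through above is really an ordered checklist, which can be sketched as below. The stage keys and the first_missing_stage helper are hypothetical names for illustration only; you would fill in the observed set by hand after reading the pcap in wireshark.

```python
# Stages in the order the post describes them. The keys are illustrative names.
STAGES = [
    ("dhcp_offer_main", "OFFER from the main DHCP server"),
    ("dhcp_offer_dnsmasq", "OFFER from dnsmasq (proxyDHCP)"),
    ("proxy_4011", "client request to dnsmasq on port 4011"),
    ("tftp_size", "TFTP request for the file size"),
    ("tftp_file", "TFTP request for the file itself"),
]


def first_missing_stage(observed):
    """Return the description of the earliest stage not in `observed`, or None."""
    for key, description in STAGES:
        if key not in observed:
            return description
    return None


if __name__ == "__main__":
    # Example: both OFFERs and the port 4011 exchange were seen, but no TFTP traffic.
    seen = {"dhcp_offer_main", "dhcp_offer_dnsmasq", "proxy_4011"}
    print("First missing stage:", first_missing_stage(seen))
```

    The first missing stage is where to focus: everything before it worked, and nothing after it can work until it does.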

    posted in FOG Problems
  • RE: PXE issues

    @Jamaal This problem is solvable but it may take some effort on your part.

    Let's start with the basics.

    For the DHCP IP zone where your pxe booting clients live, you need to set dhcp option 66 to the IP address of your fog server, and dhcp option 67 needs to be snponly.efi or snp.efi. With those settings configured on a MS Windows based dhcp server a pxe booting client should boot. Make sure your dhcp server is responding to both bootp and dhcp requests. It's been a while since I messed with Windows, but on the dhcp server there should be a setting of dhcp, bootp, or both. Select both.

    Now let's talk about WDS for a second. A WDS server can use dhcp options 66 and 67 as above, but it can also run a proxy dhcp service that tells the client to ignore the dhcp options and come talk to it for boot information after it gets an IP address from the dhcp server. This may be called a netboot service or something like that on your WDS server. It's not part of the main WDS service. If this service is still enabled it will override any settings you make in dhcp for pxe booting.

    So how do you figure out what’s wrong?

    The easiest, and also the most complicated, step is to identify what is flying across your network during the pxe booting process. You can do this with wireshark on a witness computer (a computer that is not part of the pxe booting process). This witness computer can be either a MS Windows or a linux computer; the key is to have wireshark loaded. When you start a capture, use a capture filter of port 67 or port 68 or port 4011. That will limit what wireshark sees to only the dhcp and proxy-dhcp packets. Make sure the witness computer is connected to the same subnet as the pxe booting computer.

    Start the packet capture and then attempt to pxe boot the target computer. Keep capturing until the pxe booting computer either reaches the fog iPXE menu or errors out. Then stop the capture.

    In the top section you should see the DORA (discover, offer, request, and finally ack/nack) process. The process goes as follows:
    Client -> Discover
    Server -> Offer
    Client -> Request
    Server -> Ack/Nack

    In this process you are most interested in the one or more OFFER packets. In a normal network you should only see one OFFER packet. When WDS is involved you will see one OFFER packet from your main dhcp server and a second OFFER packet from your WDS server. If you are seeing the OFFER from your WDS server then you don’t have the proxy-dhcp service disabled, and that is causing your issue. If you are seeing two OFFER packets from two different dhcp servers, such as a primary / secondary setup, make sure both dhcp servers are configured to boot from the FOG server.
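    The OFFER reasoning above can be written down as a small decision helper. This is only a sketch: interpret_offers is a hypothetical function, the source addresses would be copied by hand from the wireshark capture, and the 192.0.2.x addresses in the example are documentation placeholders, not real servers.

```python
def interpret_offers(offer_sources, wds_ip=None):
    """Describe what the senders of the OFFER packets suggest.

    offer_sources: source IPs of the OFFER packets seen in the capture.
    wds_ip: address of the WDS server, if there is one on the network.
    """
    senders = sorted(set(offer_sources))
    if not senders:
        return "no OFFER seen"
    if wds_ip is not None and wds_ip in senders:
        return "WDS is still answering: disable its proxy-dhcp service"
    if len(senders) == 1:
        return "single OFFER: inspect its boot fields"
    return "multiple DHCP servers: configure each one to boot from FOG"


if __name__ == "__main__":
    # Placeholder addresses: .10 is the main dhcp server, .20 is the WDS server.
    print(interpret_offers(["192.0.2.10", "192.0.2.20"], wds_ip="192.0.2.20"))
```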

    Now what do you do if you only have one OFFER packet and it's still not working? This is where you need to select the OFFER packet and look at the data in the parameters box. There will be the bootp fields next-server and boot-file; these need to be set to the fog server IP and snp.efi. Then in the dhcp options section, options 66 and 67 need to be set correctly. If either section is not set correctly you will get random machines not booting while others are.
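    To make the "both sections must agree" point concrete, here is a rough sketch that compares the four values read from one OFFER packet against what is expected. The dictionary keys and the check_offer helper are names invented for this example, and the values are typed in by hand from wireshark; 192.0.2.5 stands in for the fog server IP.

```python
def check_offer(offer, fog_ip, boot_file):
    """Return the names of the fields in `offer` that do not match.

    `offer` holds values copied by hand from the OFFER packet. The key names
    (next_server, boot_file, option_66, option_67) are illustrative only.
    """
    expected = {
        "next_server": fog_ip,     # bootp next-server field
        "boot_file": boot_file,    # bootp boot-file field
        "option_66": fog_ip,       # dhcp option 66
        "option_67": boot_file,    # dhcp option 67
    }
    return [name for name, want in expected.items() if offer.get(name) != want]


if __name__ == "__main__":
    # Example: the bootp fields are right but dhcp option 67 was never set.
    offer = {
        "next_server": "192.0.2.5",
        "boot_file": "snp.efi",
        "option_66": "192.0.2.5",
    }
    print("Fields to fix:", check_offer(offer, "192.0.2.5", "snp.efi"))
```

    An empty list means both the bootp fields and the dhcp options point at the fog server; anything else names the setting to fix on the dhcp server.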

    If you can’t figure it out, save the packet capture file (be sure you only captured the dhcp process), upload the file to a file share site, and post the link here, and one of us will take a look to see what’s wrong. But I think from what I covered here you should be able to figure out what the pxe booting client is being told to do incorrectly.

    posted in FOG Problems
  • RE: could not verify mount point, check if .mntcheck exists /bin/fog.download

    @alperi The bit of detail you are missing is what kernel parameters were sent to the fog client. From what you posted it appears that the FOG server has all of the bits in the right spots.

    The kernel parameters that are passed to bzImage during boot up list where the FOS engine can find the deployment server. I would verify the IP addresses are correct. If everything appears correct with the parameters, we can debug this a bit more by scheduling a debug deploy and then manually interacting with the FOS engine from the target PC’s console.

    posted in FOG Problems
  • RE: The DDP package file was not found or could not be read

    @djgalloway Just to add a bit of detail here. All of the work you did was on the iPXE side, which is great work by the way. The kernel driver I updated comes into play after you select a FOG iPXE menu item, which is when bzImage is loaded and run. It relies on kernel parameters that are provided by iPXE to find the root file system. This is technically what you fixed by ensuring that default.ipxe/boot.php from the fog server was being called. At the end of the day, I’m glad you got that working because your setup is definitely an edge case that works well in your environment.

    posted in Hardware Compatibility
  • RE: FOG boot issue after BIOS update on HP ZBook Fury 16 G11 – iPXE autoexec.ipxe not found

    @CanadienITGuy Just for your and anyone’s fyi, the autoexec.ipxe... Not Found message is not an error. It’s more of an info message than a warning or error.
    I have actually tested adding an autoexec.ipxe to remove that message, but even an empty file, or a file that is just a symlink to or a copy/paste of our normal ipxe/boot menu files, causes things to break in the process.
    The autoexec.ipxe is meant for adding customization to the ipxe process without needing to re-build the ipxe binary. But my testing within the fog workflow showed that it’s best to just let that message exist; seeing the file reported as not found means the process will not be altered from your expected FOG iPXE workflow.

    posted in FOG Problems
  • RE: The DDP package file was not found or could not be read

    @djgalloway Well, you have a pretty complex environment when you are chain loading other boot loaders. What I would do on a temporary debugging basis is update your dhcp server settings to use the fog server’s IP for dhcp option 66 and ipxe.efi for dhcp option 67. This will use a fog only solution. Something else that might cause the issue is that the linux kernel that is booting is not the kernel shipped with FOG. Or the opposite, the init.xz image is not a FOG image. The fog developers compress the init.xz image with an uncommon (in regards to linux file systems) compressor; if the booting linux kernel doesn’t have that image decompressor built in, you will get that error about an unknown block (because it's still compressed).

    It does look from your screen shot that both a bzImage and an init.xz are being copied over to the target computer. Another, but less likely, issue could be that you are not using the custom ipxe.efi boot loader that was shipped with FOG.

    posted in Hardware Compatibility
  • RE: The DDP package file was not found or could not be read

    As I was connecting the firmware to the kernel I thought to myself, it would be great to see the log files to see what firmware the nic needed. When I looked at the nic’s firmware directory there was only one file, so I just compiled that file into the kernel and moved on. While it was compiling I looked at the error again and came to the conclusion that the DDP package error is a bit of a red herring. While it's an error that you will eventually have to deal with, it didn’t cause this specific error.

    The issue is the kernel panic: unable to mount the root fs. Initializing the nic comes after the root fs (init.xz) is mounted. Please verify that both bzImage (which is the kernel that is booting) and init.xz are being transferred via ipxe to the target computer. The kernel bzImage is having an issue mounting init.xz (root fs).

    Now with that said, here is a current kernel with the intel nic firmware included: https://drive.google.com/file/d/1rISBOUuqAlnV_cq9HZgsFdjfygpB4JLT/view?usp=drive_link

    posted in Hardware Compatibility
  • RE: Huge database entries number

    @siarkowski I believe I have found the cause of this.
    A while back, right after the version you reverted to, we added an improved queueing system which is working perfectly in 1.6. However, when we ported it backwards into 1.5.10.x I made a simple syntax error (mixing up $task->id and $task->get('id')). I have fixed this in the latest dev-branch.

    This should also greatly improve the experience of the imaging task queue (see also https://github.com/FOGProject/fogproject/issues/736 and https://github.com/FOGProject/fogproject/issues/691). I thought I also wrote a post somewhere in the forum walking through the updated process that fixed some longstanding date math issues, but I can’t find that now.

    Point being, if you would be so kind as to update to the latest dev-branch version and see if it fixes the issue, that would be very helpful.

    posted in FOG Problems
  • RE: Deploy Tasks Not Continuing After First Batch

    @eliaspereira This should be fully fixed in the stable release of 1.5.10.x coming on the 15th of this month and in the dev-branch as of now.
    I thought it was already fixed back in September, and it has been working in 1.6 since then, but we just got a report of a related issue here https://forums.fogproject.org/topic/18081 which I believe I just fixed.

    posted in FOG Problems
  • RE: Queue problems when deploying

    @tian @DBailey635 @eliaspereira Apologies for missing this post. This was fixed in August-ish of last year, see also:
    https://github.com/FOGProject/fogproject/issues/736

    I found this while searching for a post I wrote about it, as I’m pushing another fix for a related bug just found in 1.5.10.x.

    If you update to the latest dev-branch (or what will be stable on the 15th of this month) or give the working-1.6 branch aka 1.6-beta a try, you’ll find the queuing problems fixed.

    posted in FOG Problems