PXEboot issues



  • Hi,

    Recently we’ve experienced issues with PXE boot, to the point where I had to disable the boot option in the DHCP servers completely. That’s hardly convenient, since we need to reimage fairly often.

    In my test VM it looks like this when no task is set (the VHD is blank):
    Screenshot 2015-05-21 13.49.12.png

    Both of these files open fine in a standard web browser. I presume something is off because of the 127.0.1.1 address… but:
    As soon as I set a task for this VM, it scrolls through the FOG pre-download text and then sits at a blinking cursor in the bottom-left corner. Nothing else happens.

    The floor clients just loop endlessly, though they do query the TFTP server as well:

    May 21 15:20:35 mgmt1 in.tftpd[5004]: RRQ from 192.168.1.201 filename undionly.kpxe
    May 21 15:20:35 mgmt1 in.tftpd[5004]: tftp: client does not accept options
    May 21 15:20:35 mgmt1 in.tftpd[5005]: RRQ from 192.168.1.201 filename undionly.kpxe
    May 21 15:22:08 mgmt1 in.tftpd[5007]: RRQ from 192.168.1.201 filename undionly.kpxe
    May 21 15:22:08 mgmt1 in.tftpd[5007]: tftp: client does not accept options
    May 21 15:22:08 mgmt1 in.tftpd[5008]: RRQ from 192.168.1.201 filename undionly.kpxe
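    To see how often each client re-requests its boot file, here is a minimal sketch that tallies the RRQ lines per client and filename. It assumes the in.tftpd log format shown above; the function name count_rrq is hypothetical, and /var/log/syslog as the log location is an assumption (the Debian rsyslog default).

```shell
# count_rrq: tally TFTP read requests (RRQ) per client IP and filename.
# Reads syslog-style in.tftpd lines on stdin, e.g.
#   ... in.tftpd[5004]: RRQ from 192.168.1.201 filename undionly.kpxe
# and prints "<ip> <filename> <count>", sorted.
# Example (assumed log path): grep in.tftpd /var/log/syslog | count_rrq
count_rrq() {
  awk '/RRQ from/ {
         ip = ""; file = ""
         # Pick the word after "from" (client IP) and after "filename".
         for (i = 1; i < NF; i++) {
           if ($i == "from") ip = $(i + 1)
           if ($i == "filename") file = $(i + 1)
         }
         if (ip != "" && file != "") n[ip " " file]++
       }
       END { for (k in n) print k, n[k] }' | sort
}
```

    A count for undionly.kpxe that keeps climbing without any follow-up request for default.ipxe matches the endless loop described in this post.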
    

    Neither the hardware nor the iPXE config has changed. mgmt1 (192.168.1.28) is the FOG and TFTP server; .200 and .201 are static leases for these clients, and each of those leases has the TFTP server and boot file set in DHCP.

    I tried rebuilding iPXE from source but I’m getting:

    [AR] bin/blib.a
    ar: creating bin/blib.a
     [HOSTCC] util/zbin
    util/zbin.c:7:18: fatal error: lzma.h: No such file or directory
    #include <lzma.h>
                     ^
    compilation terminated.
    Makefile.housekeeping:1294: recipe for target 'util/zbin' failed
    make: *** [util/zbin] Error 1
    

    Searched for every one of these packages in aptitude, but to no avail.
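    When a build dies on a missing header like this, it helps to pull the header names out of the log and ask which package ships them. A minimal sketch, assuming GCC’s "fatal error: X: No such file or directory" wording shown above; missing_headers is a hypothetical helper, and the example pipeline assumes apt-file is installed and its index has been refreshed with apt-file update. On Debian, lzma.h comes from liblzma-dev (xz-devel is the Fedora/RHEL name), which is the package that turned out to be missing in this thread.

```shell
# missing_headers: read a build log on stdin and print each header that GCC
# reported as missing ("fatal error: <name>: No such file or directory").
# Example, run from the iPXE src directory:
#   make bin/undionly.kpxe 2>&1 | missing_headers | xargs -n 1 apt-file search
missing_headers() {
  sed -n 's/.*fatal error: \([^:]*\): No such file or directory.*/\1/p' | sort -u
}
```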

    Regards



  • The boot.php issues have been solved. The problems lie elsewhere.

    vmxnet3 -> ipxe.pxe -> 100%, imaging, rebooting to Windows without getting stuck
    82574L -> undionly.kpxe.INTEL -> 100%, imaging, rebooting to Windows without getting stuck
    I217LM -> undionly.kpxe.INTEL -> 100%, imaging, rebooting to Windows without getting stuck
    Realtek -> undionly.kpxe.INTEL -> 90%, imaging, rebooting to Windows without getting stuck; doesn’t seem to run all that smoothly, but that may be down to Realtek being crappy.
    The error listed is ipxe.org/040ee119, which points to a missing DHCP configuration. It just seems to need three tries; also, DHCP isn’t all that fast even before PXE is loaded.
    11262255_834812966611489_1190040555_n.jpg
    ^ and somehow it started imaging after a few seconds anyway. Shrug.

    I think the undionly.kpxe file is somewhat broken. I’ll set up another Ubuntu box, install FOG on it and copy everything over to the production system.


  • Developer

    Judging from boot.php showing 127.0.1.1 in the browser, you likely need to change the FOG_WEB_HOST value to the correct IP under FOG Configuration > FOG Settings > Web Server.
    You may also have the wrong IP set in the /tftpboot/default.ipxe file.
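    A quick way to check both is to list which hosts the iPXE scripts actually point at. A minimal sketch; ipxe_url_hosts is a hypothetical helper, and 192.168.1.28 is the FOG server address from the first post. Every host it prints should be the FOG server’s IP; seeing 127.0.1.1 means FOG_WEB_HOST (or default.ipxe) still points at the wrong address.

```shell
# ipxe_url_hosts: print the distinct hosts referenced by http:// URLs
# in an iPXE script (e.g. /tftpboot/default.ipxe or saved boot.php output).
# Example:
#   curl -s http://192.168.1.28/fog/service/ipxe/boot.php > /tmp/boot.ipxe
#   ipxe_url_hosts /tmp/boot.ipxe
#   ipxe_url_hosts /tftpboot/default.ipxe
ipxe_url_hosts() {
  # Cut each URL at the first "/" or whitespace after the scheme.
  grep -o 'http://[^/[:space:]]*' "$1" | sed 's|^http://||' | sort -u
}
```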


  • Senior Developer

    @marbus90 said:

    I’m using ipxe.pxe for that as well, without any modifications. The other clients didn’t require any changes, no matter the NIC type. I’ve had Atheros, Intel and Realtek NICs in the clients to be imaged, and giving each its own static lease with a different boot file is not really feasible, as some of them are only temporary.

    The e1000e in VMware translates to an Intel 82574L; beyond that we’ve got a few clients with the I217LM, and the majority are Realtek RTL8111E based.

    It does look like some per-file tftpd timeout, which interestingly only affects the e1000e vNIC type. Maybe vmxnet3 is just faster, who knows. In one test on e1000e with ipxe.pxe, it loaded bzImage completely and got as far as 77% of init.xz.

    Oh, the stories I could tell about VM NICs and iPXE.

    I’ve found that undionly.kkpxe serves all of my test systems well. However, I have YET to find a PXE file that works well with the vmxnet3 drivers. I personally use intel.pxe, as I set all my VM NICs to e1000, and I have no issues booting my VMs at all.



  • I’m using ipxe.pxe for that as well, without any modifications. The other clients didn’t require any changes, no matter the NIC type. I’ve had Atheros, Intel and Realtek NICs in the clients to be imaged, and giving each its own static lease with a different boot file is not really feasible, as some of them are only temporary.

    The e1000e in VMware translates to an Intel 82574L; beyond that we’ve got a few clients with the I217LM, and the majority are Realtek RTL8111E based.

    It does look like some per-file tftpd timeout, which interestingly only affects the e1000e vNIC type. Maybe vmxnet3 is just faster, who knows. In one test on e1000e with ipxe.pxe, it loaded bzImage completely and got as far as 77% of init.xz.


  • Senior Developer

    @marbus90 what boot file are you using? If you’re using Intel.pxe.



  • I actually got it working on 2 test VMs with the vmxnet3 NIC type. It netboots, it images, and it reboots to Windows without getting stuck in PXE voodoo etc. I just pointed the clients to ipxe.pxe instead of undionly.kpxe.

    However, when I switch to the e1000e vNIC type, it gets stuck at the following screen whenever a task is set:
    Screenshot 2015-05-22 13.57.20.png
    The percentage can be anything; I’ve seen everything from 7% to 85% already. No task = no trouble again.



  • Switching to Debian 7 means a reinstall, which on its own often solves everything. I’m not too inclined to add another VM to manage, especially one with different software versions.

    Adding the static lease’s IP for the FQDN to the hosts file didn’t help either, so the problem no longer seems to be the 127.0.1.1 IP.

    Using the iPXE command line, it seems it doesn’t get the root-path: http://ipxe.org/err/2d03e1
    Screenshot 2015-05-22 07.33.08.png

    Hah. Now they’re stuck at:
    Screenshot 2015-05-22 08.59.31.png

    iPXE recommends using the route command to check the IP and gateway… voilà:
    Screenshot 2015-05-22 09.04.29.png

    Now… why does it show the gateway as inaccessible? My workstation VM is on that same host and has fine connectivity to everything. FOG, including storage, is also on the very same subnet with no firewalls in between. I’ll just try vMotioning the FOG VM over to the test host…


  • Moderator

    I have full confidence that if you simply used Debian 7, everything would just work and you wouldn’t have any more of these problems.

    FOG does not officially support Debian 8 yet, but the senior developer might get with you to make that happen if you’re willing and he has time.

    Those are your two paths. I apologize for not being of more assistance, but this is new territory to me. I’ve never seen your exact problem before, and with you using Debian 8, it could be any number of things.


  • Moderator

    Can you take a look at this?

    https://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_hostname_resolution

    It recommends that, if you have a static IP, you use that IP instead of 127.0.1.1.

    In your original post, the exact moment the system uses 127.0.1.1 is when you start having problems in Little China…
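    To see what the server’s own hostname currently maps to in /etc/hosts, a minimal sketch; hosts_addr is a hypothetical helper. Per the Debian reference linked above, a machine with a permanent IP should list that IP (here 192.168.1.28 for mgmt1) instead of 127.0.1.1.

```shell
# hosts_addr: print every address a hosts-format file maps to NAME.
# Usage: hosts_addr NAME [FILE]   (FILE defaults to /etc/hosts)
# Example: hosts_addr "$(hostname)"
# Getting 127.0.1.1 back means the hostname still resolves to the
# Debian installer's 127.0.1.1 entry rather than the static IP.
hosts_addr() {
  # Drop comments, then print field 1 of each line whose later fields
  # (canonical name or aliases) include NAME.
  sed 's/#.*//' "${2:-/etc/hosts}" |
    awk -v n="$1" '{ for (i = 2; i <= NF; i++) if ($i == n) { print $1; break } }'
}
```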



  • Soooo… iPXE built successfully (I had missed liblzma-dev), I copied undionly.kpxe to /tftpboot/ (after backing up the old one), and now I’m stuck here:

    fogtest1 with vmxnet3, registered and download task created:
    Screenshot 2015-05-22 04.58.26.png
    It sits in a loop there: as soon as the Ctrl+B prompt disappears, it starts over with the messages at the top of the screen.

    fogtest2 with vmxnet3, registered and download task created:
    Screenshot 2015-05-22 04.55.06.png

    Both VMDKs are clean, with no OS. Interesting that the older VM
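    The rebuild-and-replace step from this post can be scripted so the original boot file is never lost and the copy is verified. A minimal sketch; replace_bootfile is a hypothetical helper and the .orig backup suffix is an arbitrary choice. bin/undionly.kpxe is where iPXE’s make target writes the file inside the src directory.

```shell
# replace_bootfile: install a freshly built boot file into the TFTP root,
# keeping the first existing copy as <name>.orig, then verify the copy.
# Usage: replace_bootfile NEW_FILE TFTP_DIR
#   e.g. (from ipxe/src, after: make bin/undionly.kpxe)
#        replace_bootfile bin/undionly.kpxe /tftpboot
replace_bootfile() {
  new=$1
  dir=$2
  name=$(basename "$new")
  # Back up only once, so repeated runs never overwrite the original file.
  if [ -e "$dir/$name" ] && [ ! -e "$dir/$name.orig" ]; then
    cp -p "$dir/$name" "$dir/$name.orig" || return 1
  fi
  # Copy, then compare byte for byte; the exit status reports success.
  cp "$new" "$dir/$name" && cmp -s "$new" "$dir/$name"
}
```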



  • We’ve started on the beta already. I was more accustomed to systemd anyway.

    Grepping for 127.0.1.1 only turned up /etc/hosts, nothing in the FOG directories. It does show up, though, when I call boot.php in a browser. service/ipxe/boot.php:

    #!ipxe
    cpuid --ext 29 && set arch x86_64 || set arch i386
    colour --rgb 0xff6600 2
    cpair --foreground 7 --background 2 2
    console --picture http://127.0.1.1/fog/service/ipxe/bg.png --left 100 --right 80
    prompt --key 0x06 --timeout 3000 Booting... (Press CTRL + F to access the menu) && goto menuAccess || chain -ar http://127.0.1.1/fog/service/ipxe/grub.exe --config-file="rootnoverify (hd0);chainloader +1"
    :menuAccess
    login
    params
    param mac0 ${net0/mac}
    param arch ${arch}
    param username ${username}
    param password ${password}
    param menuaccess 1
    param debug 0
    isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
    isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
    :bootme
    chain -ar http://127.0.1.1/fog/service/ipxe/boot.php##params
    

    For reference, the file as it exists in the server directory:

    <?php
    header("Content-type: text/plain");
    require_once('../../commons/base.inc.php');
    if ($_REQUEST['mac0'] && !$_REQUEST['mac1'] && !$_REQUEST['mac2'])
            $_REQUEST['mac'] = $_REQUEST['mac0'];
    else if ($_REQUEST['mac0'] && $_REQUEST['mac1'] && !$_REQUEST['mac2'])
            $_REQUEST['mac'] = $_REQUEST['mac0'].'|'.$_REQUEST['mac1'];
    else if ($_REQUEST['mac0'] && !$_REQUEST['mac1'] && $_REQUEST['mac2'])
            $_REQUEST['mac'] = $_REQUEST['mac0'].'|'.$_REQUEST['mac2'];
    else if ($_REQUEST['mac0'] && $_REQUEST['mac1'] && $_REQUEST['mac2'])
            $_REQUEST['mac'] = $_REQUEST['mac0'].'|'.$_REQUEST['mac1'].'|'.$_REQUEST['mac2'];
    $MACs = HostManager::parseMacList($_REQUEST['mac']);
    $Host = $FOGCore->getClass('HostManager')->getHostByMacAddresses($MACs);
    new BootMenu($Host);
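    The if/else chain above just joins whichever MAC parameters were sent, in order, with a pipe: mac0 is required, and mac1/mac2 are appended only when present (the iPXE script only sends them when net1/net2 exist). A shell sketch of the same rule, for illustration only; join_macs is a hypothetical name and not part of FOG.

```shell
# join_macs: mirror how boot.php builds its "mac" value from mac0..mac2.
# mac0 is required; mac1 and mac2 are appended with "|" only when non-empty.
join_macs() {
  [ -n "$1" ] || return 1    # boot.php sets no "mac" value without mac0
  macs=$1
  if [ -n "$2" ]; then macs="$macs|$2"; fi
  if [ -n "$3" ]; then macs="$macs|$3"; fi
  printf '%s\n' "$macs"
}
```

    The joined string is what HostManager::parseMacList receives, so a host is found only if one of those MACs is registered in FOG.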
    

    In the VM I’m using vmxnet3 or Intel e1000; the clients have some Realtek NIC. The funny thing is, it worked before, so I think something just got corrupted. I can’t find xz-devel in aptitude, only xz-utils, which is installed, but the error persists. zlib is installed (both the standard and -dev packages), yet the make process still errors out there.

    May 22 02:12:39 mgmt1 in.tftpd[7757]: RRQ from 192.168.1.200 filename undionly.kpxe
    May 22 02:12:39 mgmt1 in.tftpd[7757]: tftp: client does not accept options
    May 22 02:12:39 mgmt1 in.tftpd[7758]: RRQ from 192.168.1.200 filename undionly.kpxe
    May 22 02:12:43 mgmt1 in.tftpd[7759]: RRQ from 192.168.1.200 filename /default.ipxe
    

    ^^^ This is how it looks when a task is set for the VM with the e1000e network adapter.



  • I had a similar problem with an Intel interface (or it could have been Broadcom/tg3).
    Upgrading the PXE firmware or rebuilding iPXE (you need xz-devel for that; the Debian package is liblzma-dev) could do the trick.


  • Moderator

    @marbus90 said:

    FOG 1.2.0 on Debian jessie with nginx + php-fpm. No errors in the nginx logs. The install is less than half a year old, so there’s no fiddling with old configs, upgrades, or stone-age iPXE versions.

    Debian “jessie” Release Information

    Debian 8.0 was released April 25th, 2015. The release included many major changes, described in our press release and the Release Notes.

    https://www.debian.org/releases/stable/

    FOG currently does not work on Debian 8.

    But let’s try to see what’s going on with this 127.0.1.1 stuff… that’s not even the usual loopback address (which would be 127.0.0.1), although it does fall inside the 127.0.0.0/8 loopback range.

    Let’s search every file for that address and see what comes up…
    grep -H -r "127.0.1.1" /
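    Walking all of / also descends into /proc and /sys and can take a long time. A narrower sketch, assuming FOG’s web files live under /var/www and the TFTP root is /tftpboot; find_ip_refs is a hypothetical helper. -F makes the dots literal (in a regex each . matches any character), and -w keeps 127.0.1.1 from also matching 127.0.1.10.

```shell
# find_ip_refs: list files under the given directories that contain IP as a
# whole word. -F: fixed string (dots are literal); -w: whole-word match only.
# Errors such as unreadable files are discarded.
# Example: find_ip_refs 127.0.1.1 /etc /var/www /tftpboot
find_ip_refs() {
  ip=$1
  shift
  grep -rlFw -- "$ip" "$@" 2>/dev/null
}
```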



  • FOG 1.2.0 on Debian jessie with nginx + php-fpm. No errors in the nginx logs. The install is less than half a year old, so there’s no fiddling with old configs, upgrades, or stone-age iPXE versions.


  • Moderator

    What version of FOG are you using? What Linux distro and version?

