Fresh Install of 1.5.9 with CentOS 7 issues


  • I am having an issue after a fresh install of 1.5.9 (stable) on CentOS 7, following this install guide: https://wiki.fogproject.org/wiki/index.php?title=CentOS_7. I restored the database and that all went well; it is only when I try to upload a new image I created that I get the following:

    bzImage…Connection timed out
    Could not boot: Connection timed out (http://ipxe.org/4c0a6035)

    What am I missing? The IP addresses are correct going through to this point.


  • @Sebastian-Roth This is not a new branch, just an upgrade and a move from Ubuntu to CentOS 7, as it is more stable when upgrading. The only thing I can do to get closer is to move this computer into the server room, but since it is a desktop that would be a little cumbersome. These are Juniper switches.

    For creating the schools’ base image, I use this same machine to do the work for all of them since it is pretty close to the “golden image” for each one. I am using this Dell Optiplex 7040. I create a legacy image (have some more impoverished districts with old machines) and a UEFI image.

    I have not tried different iPXE binaries and I wasn’t aware that you guys wanted me to do a mirror port. I will try and work on this today if I get some time.

    Thanks for the guidance.

  • Senior Developer

    @Chris-Whiteley Sounds like this is kind of a new branch where you set this up, right? The path from the data center down through three switches is something of a black box, and I was hoping we could take some of that out of the equation to make sure.

    Do you have the exact same Dell models in the other schools as well? If yes, then it can’t be an issue related to iPXE network drivers on that hardware. Nevertheless, have you tried different iPXE binaries? ipxe.(k)pxe for BIOS or snp(only).efi for UEFI based machines?

    Do you have the chance to set up a mirror port on the last switch you connect the PXE booting host to? I would be interested to see a network packet capture of the full PXE boot process.
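
    A minimal sketch of how such a capture could be taken with tcpdump from a machine plugged into the mirror port. The function name, the interface name and the output file name are assumptions for illustration only; the MAC address is that of the PXE booting client. Filtering on the client MAC keeps DHCP, TFTP and HTTP traffic together in one file.

```shell
# Capture every frame to or from one client MAC into a pcap file.
# Needs root; stop it with Ctrl-C once the client has failed or booted.
capture_pxe_boot() {
    iface=$1   # capture interface on the machine attached to the mirror port
    mac=$2     # MAC address of the PXE booting client
    out=$3     # output file, to be opened in Wireshark later
    tcpdump -i "$iface" -w "$out" ether host "$mac"
}
```

    For example, capture_pxe_boot eth0 aa:bb:cc:dd:ee:ff pxe-boot.pcap (both the interface and the MAC here are placeholders), started just before powering on the client.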


  • @Sebastian-Roth I will not be able to connect to the same switch as the FOG server, as it is a VM in our data center. I am 3 switches down from the data center and don’t have issues getting this to work with the other 5 schools I manage, which use the same setup. I have a switch at my desk with multiple VLANs and that is how I get to do imaging for each district. Does that help paint a picture at all?

  • Senior Developer

    @Chris-Whiteley Ok, I was misled by the Could not start download: Operation not supported (http://ipxe.org/3c092003) error you posted earlier. I suppose this only happens when it did not even pull the boot.php file in the first place. If you run imgfetch bzImage then it doesn’t know where to get this from I guess.

    Now, good that you are posting more pictures of this. We see that it is sometimes able to load boot.php (earlier picture) and sometimes not! More and more I think this is a network issue.

    Is that machine that is not able to PXE boot from your FOG server in the same subnet as the FOG server? Connected to the same switch? Would you be able to hook up a PC to that very same switch the FOG server is on and try again?


  • @Tom-Elliott Same issue with the [Connecting]… going across and failing, rebooting.


  • @Tom-Elliott I will do this right now and let you know the outcome.

  • Senior Developer

    @Chris-Whiteley alright.

    Something appears to be messed up but where/what is a big question.

    If it were a coding issue within 1.5.9 we’d have probably heard about this from many more than yourself.

    There are a lot of files we create, but I’d start by wondering whether rerunning the installer might help. Run it with the -y switch:

    cd /path/to/fogproject/bin
    ./installfog.sh -y

    Let it run until completion and see if things start working?

    It’s a long shot but worth a try I think.


  • @Tom-Elliott This is all I see

    2020-10-06_9-46-53.png

  • Senior Developer

    @Chris-Whiteley that’s the error log itself, there should also be one for www


  • @Tom-Elliott

    [04-Oct-2020 03:47:02] NOTICE: error log file re-opened
    [04-Oct-2020 15:32:43] NOTICE: [pool www] child 26930 exited with code 0 after 138892.087835 seconds from start
    [04-Oct-2020 15:32:43] NOTICE: [pool www] child 15042 started
    [04-Oct-2020 15:32:50] NOTICE: [pool www] child 26795 exited with code 0 after 138950.201260 seconds from start
    [04-Oct-2020 15:32:50] NOTICE: [pool www] child 15045 started
    [04-Oct-2020 15:35:30] NOTICE: [pool www] child 27071 exited with code 0 after 138908.805085 seconds from start
    [04-Oct-2020 15:35:30] NOTICE: [pool www] child 15194 started
    [04-Oct-2020 15:39:14] NOTICE: [pool www] child 27318 exited with code 0 after 138879.587345 seconds from start
    [04-Oct-2020 15:39:14] NOTICE: [pool www] child 15486 started
    [04-Oct-2020 15:39:42] NOTICE: [pool www] child 27320 exited with code 0 after 138907.167868 seconds from start
    [04-Oct-2020 15:39:42] NOTICE: [pool www] child 15512 started
    [04-Oct-2020 15:41:12] NOTICE: [pool www] child 27405 exited with code 0 after 138913.195601 seconds from start
    [04-Oct-2020 15:41:12] NOTICE: [pool www] child 15600 started
    [04-Oct-2020 16:46:31] NOTICE: [pool www] child 31676 exited with code 0 after 138896.314150 seconds from start
    [04-Oct-2020 16:46:31] NOTICE: [pool www] child 19773 started
    [04-Oct-2020 18:49:32] NOTICE: [pool www] child 7284 exited with code 0 after 138870.684686 seconds from start
    [04-Oct-2020 18:49:32] NOTICE: [pool www] child 27795 started
    [04-Oct-2020 21:13:51] NOTICE: [pool www] child 16588 exited with code 0 after 138930.860352 seconds from start
    [04-Oct-2020 21:13:51] NOTICE: [pool www] child 4701 started
    [05-Oct-2020 08:34:51] NOTICE: Terminating ...
    [05-Oct-2020 08:34:51] NOTICE: exiting, bye-bye!
    [05-Oct-2020 08:35:31] NOTICE: fpm is running, pid 1089
    [05-Oct-2020 08:35:31] NOTICE: ready to handle connections
    [05-Oct-2020 08:35:31] NOTICE: systemd monitor interval set to 10000ms
    [05-Oct-2020 20:25:46] NOTICE: [pool www] child 1813 exited with code 0 after 42614.612041 seconds from start
    [05-Oct-2020 20:25:46] NOTICE: [pool www] child 16127 started
    [05-Oct-2020 20:27:04] NOTICE: [pool www] child 1811 exited with code 0 after 42692.732221 seconds from start
    [05-Oct-2020 20:27:04] NOTICE: [pool www] child 16220 started
    [05-Oct-2020 20:27:47] NOTICE: [pool www] child 3239 exited with code 0 after 42730.396625 seconds from start
    [05-Oct-2020 20:27:47] NOTICE: [pool www] child 16263 started
    [05-Oct-2020 20:27:51] NOTICE: [pool www] child 1812 exited with code 0 after 42740.360500 seconds from start
    [05-Oct-2020 20:27:51] NOTICE: [pool www] child 16273 started
    [05-Oct-2020 20:27:52] NOTICE: [pool www] child 1815 exited with code 0 after 42740.447148 seconds from start
    [05-Oct-2020 20:27:52] NOTICE: [pool www] child 16275 started
    [05-Oct-2020 20:28:04] NOTICE: [pool www] child 1814 exited with code 0 after 42752.756222 seconds from start
    [05-Oct-2020 20:28:04] NOTICE: [pool www] child 16289 started
    [05-Oct-2020 20:29:55] NOTICE: [pool www] child 1939 exited with code 0 after 42862.776461 seconds from start
    [05-Oct-2020 20:29:55] NOTICE: [pool www] child 16407 started
    [06-Oct-2020 07:03:34] NOTICE: Terminating ...
    [06-Oct-2020 07:03:34] NOTICE: exiting, bye-bye!
    [06-Oct-2020 07:03:52] NOTICE: fpm is running, pid 1061
    [06-Oct-2020 07:03:52] NOTICE: ready to handle connections
    [06-Oct-2020 07:03:52] NOTICE: systemd monitor interval set to 10000ms
    
    

  • @Sebastian-Roth

    The error on the screen is the same one that I have posted below in this thread. Here are a couple of more pictures about it.

    IMG_0050.JPG
    IMG_0051.JPG

    It is connected through 3 different switches, but I have not had issues with this before. They are also on the same subnet: 192.168.20.1/24.

  • Senior Developer

    I was simply thinking of what could potentially be the issue. In the past I know we had a type of issue with fog/service being set as fogservice. So it was just a thought.

    As you’re using CentOS, can you provide logs for:

    /var/log/php-fpm/www-error.log (or very close)

    PHP errors will typically show up there on CentOS.

  • Senior Developer

    @Chris-Whiteley There must be something we are missing here. Is that machine that is not able to PXE boot from your FOG server in the same subnet as the FOG server? Connected to the same switch?

    Can you please take a picture of the error on screen and post here? Just wanna make sure we are not missing something here.


  • @Sebastian-Roth It just had a connection thing with my browser. At least that’s what I think it is.

    192.168.20.9 - - [06/Oct/2020:08:43:58 -0700] "POST /fog/management/index.php?node=client&sub=wakeEmUp HTTP/1.1" 200 4350 "-" "Mozilla/5.0 (Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0"
    
  • Senior Developer

    @Chris-Whiteley Nothing after that?


  • @Sebastian-Roth This is what I saw:

    192.168.20.41 - - [06/Oct/2020:08:37:18 -0700] "POST /fog/service/ipxe/boot.php HTTP/1.1" 200 652 "-" "iPXE/1.20.1+ (g4bd0)"
    

    192.168.20.41 is the client

  • Senior Developer

    @Chris-Whiteley Unfortunately there is no log file for this except the Apache logs.

    Please run tail -f /var/log/httpd/access_log while doing the PXE boot and see if you get the requests logged in there.
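
    To cut out browser traffic while watching, the log can be piped through a small filter. This is only a sketch: the function name is made up here, and it assumes the combined log format seen elsewhere in this thread, where the client IP is the first field and iPXE identifies itself with a User-Agent starting with "iPXE/".

```shell
# Keep only access-log lines whose User-Agent starts with "iPXE/".
# Optional first argument: only show requests from that client IP.
# Reads log lines on stdin and writes matching lines to stdout.
ipxe_requests() {
    awk -v ip="${1:-}" '
        index($0, "\"iPXE/") && (ip == "" || $1 == ip) {
            print
            fflush()   # flush per line so output keeps up with tail -f
        }'
}
```

    Usage would be tail -f /var/log/httpd/access_log | ipxe_requests 192.168.20.41 to watch a single client, or without the IP to see every iPXE request.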


  • Senior Developer

    @Tom-Elliott said in Fresh Install of 1.5.9 with CentOS 7 issues:

    This should be
    set boot-url http://${fog-ip}/${fog-webroot}/service/ipxe

    No, I don’t think so. iPXE pulls files that do not have a full URL from the same location it got the last file from. So it pulls http://${fog-ip}/${fog-webroot}/service/ipxe/boot.php and would download kernel and init from that same location as well.
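
    As a rough illustration of that rule (a simplified shell sketch, not iPXE’s actual code): a bare file name replaces everything after the last slash of the previously fetched URL. The function name and the address 192.0.2.10 are placeholders; the fog webroot matches the access log lines posted in this thread.

```shell
# Resolve a relative file name against the URL of the last fetched file
# by swapping out everything after the final "/".
# Simplified: ignores query strings, "../" segments and absolute paths.
resolve_relative() {
    last_url=$1   # URL the previous file was fetched from
    name=$2       # relative file name, e.g. bzImage
    printf '%s/%s\n' "${last_url%/*}" "$name"
}

resolve_relative "http://192.0.2.10/fog/service/ipxe/boot.php" "bzImage"
# prints http://192.0.2.10/fog/service/ipxe/bzImage
```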
