Fresh Install of 1.5.9 with CentOS 7 issues
-
@Sebastian-Roth I just set SELinux to disabled. It was previously set to permissive. That didn’t make a difference.
-
I tried using a different machine. All we have are Dells, but I tried 3 different models and hit the same issue on each: it would sometimes download the bzImage, but then get to init.xz, show the [connecting]… indicator that goes across the screen, and then fail.
-
I downloaded the iPXE binaries and that didn’t help either.
Is there a log I can look at specifically for this issue?
Thanks for the help!
-
-
@Chris-Whiteley I don’t have access to a server right now but if I remember correctly, there is a fog setting (fog configuration->fog settings) that stores the fog service directory.
If that field is fog/service then I’m not sure what’s wrong, but if it’s fogservice as we’ve seen in the past, I imagine this could be causing the problem.
Based on what I can see, this is currently just set to fog. If that’s the case, can you change it to fog/service?
The part that’s making me think this is the output of
set boot-url http://${fog-ip}/${fog-webroot}
This should be
set boot-url http://${fog-ip}/${fog-webroot}/service/ipxe
-
@Tom-Elliott said in Fresh Install of 1.5.9 with CentOS 7 issues:
This should be
set boot-url http://${fog-ip}/${fog-webroot}/service/ipxe
No, I don’t think so. iPXE pulls files that do not have a full URL from the same location it got the last file from. So it pulls
http://${fog-ip}/${fog-webroot}/service/ipxe/boot.php
and would download kernel and init from that same location as well.
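As a rough illustration of that point (a simplified sketch, not iPXE’s actual URI handling): resolving a bare file name against the last fetched URL amounts to replacing everything after the final slash. The function name `resolve_relative` and the host `fog-server` are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper, not part of iPXE or FOG: mimic how a bare file name
# is resolved against the URL of the last file that was fetched.
resolve_relative() {
    last_url=$1   # e.g. the URL boot.php was loaded from
    name=$2       # relative name such as bzImage or init.xz
    # Drop the last path component, then append the new name.
    printf '%s/%s\n' "${last_url%/*}" "$name"
}

# "fog-server" is a placeholder host name.
resolve_relative "http://fog-server/fog/service/ipxe/boot.php" "bzImage"
resolve_relative "http://fog-server/fog/service/ipxe/boot.php" "init.xz"
```

Both calls print a URL under /fog/service/ipxe/, which is why boot-url does not need to carry the full path for the kernel and init downloads.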
-
@Chris-Whiteley Unfortunately there is no log file for this except the Apache logs.
Please run
tail -f /var/log/httpd/access_log
while doing the PXE boot and see if you get the requests logged in there.
-
@Sebastian-Roth This is what I saw:
192.168.20.41 - - [06/Oct/2020:08:37:18 -0700] "POST /fog/service/ipxe/boot.php HTTP/1.1" 200 652 "-" "iPXE/1.20.1+ (g4bd0)"
192.168.20.41 is the client
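To keep only the PXE client’s requests while watching the log, a small filter along these lines might help. This is a sketch: the function name `ipxe_requests` is made up, and it assumes the log format shown above, where the client IP is the first field and the user agent contains iPXE.

```shell
#!/bin/sh
# Hypothetical helper: read Apache access log lines on stdin and print only
# those that come from the given client IP and mention iPXE (user agent).
ipxe_requests() {
    awk -v ip="$1" '$1 == ip && /iPXE/'
}

# Usage while the client PXE boots (log path as used earlier in this thread):
#   tail -f /var/log/httpd/access_log | ipxe_requests 192.168.20.41
```

If boot.php shows up but no bzImage or init.xz request follows, that would suggest those download requests are not reaching Apache at all.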
-
@Chris-Whiteley Nothing after that?
-
@Sebastian-Roth After that, the only entry was a request from my browser. At least that’s what I think it is.
192.168.20.9 - - [06/Oct/2020:08:43:58 -0700] "POST /fog/management/index.php?node=client&sub=wakeEmUp HTTP/1.1" 200 4350 "-" "Mozilla/5.0 (Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0"
-
@Chris-Whiteley There must be something we are missing here. Is that machine that is not able to PXE boot from your FOG server in the same subnet as the FOG server? Connected to the same switch?
Can you please take a picture of the error on screen and post here? Just wanna make sure we are not missing something here.
-
I was simply thinking of what could potentially be the issue. In the past I know we had a type of issue with fog/service being set as fogservice. So it was just a thought.
As you’re using centos, can you provide logs for:
/var/log/php-fpm/www-error.log (or very close)
PHP errors typically show up there on CentOS.
-
The error on the screen is the same one that I have posted below in this thread. Here are a couple more pictures of it.
It is connected through 3 different switches, but I have not had issues with this before. They are also on the same subnet: 192.168.20.1/24.
-
[04-Oct-2020 03:47:02] NOTICE: error log file re-opened
[04-Oct-2020 15:32:43] NOTICE: [pool www] child 26930 exited with code 0 after 138892.087835 seconds from start
[04-Oct-2020 15:32:43] NOTICE: [pool www] child 15042 started
[04-Oct-2020 15:32:50] NOTICE: [pool www] child 26795 exited with code 0 after 138950.201260 seconds from start
[04-Oct-2020 15:32:50] NOTICE: [pool www] child 15045 started
[04-Oct-2020 15:35:30] NOTICE: [pool www] child 27071 exited with code 0 after 138908.805085 seconds from start
[04-Oct-2020 15:35:30] NOTICE: [pool www] child 15194 started
[04-Oct-2020 15:39:14] NOTICE: [pool www] child 27318 exited with code 0 after 138879.587345 seconds from start
[04-Oct-2020 15:39:14] NOTICE: [pool www] child 15486 started
[04-Oct-2020 15:39:42] NOTICE: [pool www] child 27320 exited with code 0 after 138907.167868 seconds from start
[04-Oct-2020 15:39:42] NOTICE: [pool www] child 15512 started
[04-Oct-2020 15:41:12] NOTICE: [pool www] child 27405 exited with code 0 after 138913.195601 seconds from start
[04-Oct-2020 15:41:12] NOTICE: [pool www] child 15600 started
[04-Oct-2020 16:46:31] NOTICE: [pool www] child 31676 exited with code 0 after 138896.314150 seconds from start
[04-Oct-2020 16:46:31] NOTICE: [pool www] child 19773 started
[04-Oct-2020 18:49:32] NOTICE: [pool www] child 7284 exited with code 0 after 138870.684686 seconds from start
[04-Oct-2020 18:49:32] NOTICE: [pool www] child 27795 started
[04-Oct-2020 21:13:51] NOTICE: [pool www] child 16588 exited with code 0 after 138930.860352 seconds from start
[04-Oct-2020 21:13:51] NOTICE: [pool www] child 4701 started
[05-Oct-2020 08:34:51] NOTICE: Terminating ...
[05-Oct-2020 08:34:51] NOTICE: exiting, bye-bye!
[05-Oct-2020 08:35:31] NOTICE: fpm is running, pid 1089
[05-Oct-2020 08:35:31] NOTICE: ready to handle connections
[05-Oct-2020 08:35:31] NOTICE: systemd monitor interval set to 10000ms
[05-Oct-2020 20:25:46] NOTICE: [pool www] child 1813 exited with code 0 after 42614.612041 seconds from start
[05-Oct-2020 20:25:46] NOTICE: [pool www] child 16127 started
[05-Oct-2020 20:27:04] NOTICE: [pool www] child 1811 exited with code 0 after 42692.732221 seconds from start
[05-Oct-2020 20:27:04] NOTICE: [pool www] child 16220 started
[05-Oct-2020 20:27:47] NOTICE: [pool www] child 3239 exited with code 0 after 42730.396625 seconds from start
[05-Oct-2020 20:27:47] NOTICE: [pool www] child 16263 started
[05-Oct-2020 20:27:51] NOTICE: [pool www] child 1812 exited with code 0 after 42740.360500 seconds from start
[05-Oct-2020 20:27:51] NOTICE: [pool www] child 16273 started
[05-Oct-2020 20:27:52] NOTICE: [pool www] child 1815 exited with code 0 after 42740.447148 seconds from start
[05-Oct-2020 20:27:52] NOTICE: [pool www] child 16275 started
[05-Oct-2020 20:28:04] NOTICE: [pool www] child 1814 exited with code 0 after 42752.756222 seconds from start
[05-Oct-2020 20:28:04] NOTICE: [pool www] child 16289 started
[05-Oct-2020 20:29:55] NOTICE: [pool www] child 1939 exited with code 0 after 42862.776461 seconds from start
[05-Oct-2020 20:29:55] NOTICE: [pool www] child 16407 started
[06-Oct-2020 07:03:34] NOTICE: Terminating ...
[06-Oct-2020 07:03:34] NOTICE: exiting, bye-bye!
[06-Oct-2020 07:03:52] NOTICE: fpm is running, pid 1061
[06-Oct-2020 07:03:52] NOTICE: ready to handle connections
[06-Oct-2020 07:03:52] NOTICE: systemd monitor interval set to 10000ms
-
@Chris-Whiteley That’s the error log itself; there should also be one for www.
-
@Tom-Elliott This is all I see
-
@Chris-Whiteley alright.
Something appears to be messed up, but where and what is a big question.
If it were a coding issue within 1.5.9, we’d probably have heard about this from many more people than just you.
There are a lot of files we create, so I’d start by rerunning the installer to see if that helps. Run it with the -y switch:
cd /path/to/fogproject/bin
./installfog.sh -y
Let it run until completion and see if things start working?
It’s a long shot but worth a try I think.
-
@Tom-Elliott I will do this right now and let you know the outcome.
-
@Tom-Elliott Same issue with the [Connecting]… going across and failing, rebooting.
-
@Chris-Whiteley Ok, I was misled by the
Could not start download: Operation not supported (http://ipxe.org/3c092003)
error you posted earlier. I suppose this only happens when it did not even pull the boot.php file in the first place. If you run
imgfetch bzImage
then it doesn’t know where to get this from, I guess.
Now, it’s good that you are posting more pictures of this. We see that it is sometimes able to load boot.php (earlier picture) and sometimes not! More and more I think this is a network issue.
Is that machine that is not able to PXE boot from your FOG server in the same subnet as the FOG server? Connected to the same switch? Would you be able to hook up a PC to that very same switch the FOG server is on and try again?
-
@Sebastian-Roth I will not be able to log in to the same switch as the FOG server, as it is a VM in our data center. I am 3 switches down from the data center and don’t have issues getting this to work with the other 5 schools I manage. Same setup as this. I have a switch at my desk with multiple VLANs and that is how I get to do imaging for each district. Does that help paint a picture at all?
-
@Chris-Whiteley Sounds like this is kind of a new branch where you set this up, right? The data center and the three switches down from there are kind of a black box, and I was hoping we could take some of that out of the equation to make sure.
Do you have the exact same Dell models in the other schools as well? If yes, then it can’t be an issue related to iPXE network drivers on that hardware. Nevertheless, have you tried different iPXE binaries?
ipxe.(k)pxe
for BIOS or
snp(only).efi
for UEFI based machines?
Do you get the chance to set up a mirror port on the last switch you connect the PXE booting host to? I would be interested to see a network packet capture of the full PXE boot process.
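One possible way to take that capture, assuming tcpdump is available on a machine attached to the mirror port (the thread does not name a tool, so this is only a sketch). The helper name `pxe_capture_filter`, the interface `eth0` and the file name `pxe-boot.pcap` are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: build a pcap filter that keeps DHCP traffic
# (ports 67/68, exchanged before the client has an address) plus
# everything to or from the PXE client once it has its IP.
pxe_capture_filter() {
    printf 'port 67 or port 68 or host %s\n' "$1"
}

# Usage on the capturing machine (needs root; eth0 and pxe-boot.pcap
# are placeholders):
#   tcpdump -i eth0 -n -w pxe-boot.pcap "$(pxe_capture_filter 192.168.20.41)"
```

Filtering on the client host rather than on individual ports keeps the TFTP and HTTP transfers in the capture as well, whatever ports they end up using.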