Fresh Install of 1.5.9 with CentOS 7 issues
-
@Chris-Whiteley Nothing after that?
-
@Sebastian-Roth It just had a connection thing with my browser. At least that’s what I think it is.
192.168.20.9 - - [06/Oct/2020:08:43:58 -0700] "POST /fog/management/index.php?node=client&sub=wakeEmUp HTTP/1.1" 200 4350 "-" "Mozilla/5.0 (Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0"
-
@Chris-Whiteley There must be something we are missing here. Is that machine that is not able to PXE boot from your FOG server in the same subnet as the FOG server? Connected to the same switch?
Can you please take a picture of the error on screen and post here? Just wanna make sure we are not missing something here.
-
I was simply thinking about what could potentially be the issue. In the past I know we had an issue with fog/service being set as fogservice, so it was just a thought.
As you’re using CentOS, can you provide the logs from:
/var/log/php-fpm/www-error.log (or something very close to that)
PHP errors will typically show up there on CentOS.
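A minimal sketch for pulling the tail of those logs in one go, assuming the php-fpm logs live under /var/log/php-fpm as mentioned above. The function name show_fpm_logs is made up for illustration, and the exact file names may differ on your install:

```shell
#!/bin/sh
# Print the last N lines of every *.log file in a php-fpm log directory.
# The default directory is an assumption based on the path mentioned above.
show_fpm_logs() {
    log_dir="${1:-/var/log/php-fpm}"   # directory to scan
    lines="${2:-50}"                   # how many trailing lines to show per file
    for f in "$log_dir"/*.log; do
        [ -f "$f" ] || continue        # skip if the glob matched nothing
        echo "==> $f <=="
        tail -n "$lines" "$f"
    done
}
```

Calling `show_fpm_logs` with no arguments (as root, since the directory is usually not world-readable) uses the defaults; `show_fpm_logs /var/log/php-fpm 200` shows more history.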
-
The error on the screen is the same one that I posted earlier in this thread. Here are a couple more pictures of it.
It is connected through 3 different switches, but I have not had issues with this before. They are also on the same subnet: 192.168.20.1/24.
-
[04-Oct-2020 03:47:02] NOTICE: error log file re-opened
[04-Oct-2020 15:32:43] NOTICE: [pool www] child 26930 exited with code 0 after 138892.087835 seconds from start
[04-Oct-2020 15:32:43] NOTICE: [pool www] child 15042 started
[04-Oct-2020 15:32:50] NOTICE: [pool www] child 26795 exited with code 0 after 138950.201260 seconds from start
[04-Oct-2020 15:32:50] NOTICE: [pool www] child 15045 started
[04-Oct-2020 15:35:30] NOTICE: [pool www] child 27071 exited with code 0 after 138908.805085 seconds from start
[04-Oct-2020 15:35:30] NOTICE: [pool www] child 15194 started
[04-Oct-2020 15:39:14] NOTICE: [pool www] child 27318 exited with code 0 after 138879.587345 seconds from start
[04-Oct-2020 15:39:14] NOTICE: [pool www] child 15486 started
[04-Oct-2020 15:39:42] NOTICE: [pool www] child 27320 exited with code 0 after 138907.167868 seconds from start
[04-Oct-2020 15:39:42] NOTICE: [pool www] child 15512 started
[04-Oct-2020 15:41:12] NOTICE: [pool www] child 27405 exited with code 0 after 138913.195601 seconds from start
[04-Oct-2020 15:41:12] NOTICE: [pool www] child 15600 started
[04-Oct-2020 16:46:31] NOTICE: [pool www] child 31676 exited with code 0 after 138896.314150 seconds from start
[04-Oct-2020 16:46:31] NOTICE: [pool www] child 19773 started
[04-Oct-2020 18:49:32] NOTICE: [pool www] child 7284 exited with code 0 after 138870.684686 seconds from start
[04-Oct-2020 18:49:32] NOTICE: [pool www] child 27795 started
[04-Oct-2020 21:13:51] NOTICE: [pool www] child 16588 exited with code 0 after 138930.860352 seconds from start
[04-Oct-2020 21:13:51] NOTICE: [pool www] child 4701 started
[05-Oct-2020 08:34:51] NOTICE: Terminating ...
[05-Oct-2020 08:34:51] NOTICE: exiting, bye-bye!
[05-Oct-2020 08:35:31] NOTICE: fpm is running, pid 1089
[05-Oct-2020 08:35:31] NOTICE: ready to handle connections
[05-Oct-2020 08:35:31] NOTICE: systemd monitor interval set to 10000ms
[05-Oct-2020 20:25:46] NOTICE: [pool www] child 1813 exited with code 0 after 42614.612041 seconds from start
[05-Oct-2020 20:25:46] NOTICE: [pool www] child 16127 started
[05-Oct-2020 20:27:04] NOTICE: [pool www] child 1811 exited with code 0 after 42692.732221 seconds from start
[05-Oct-2020 20:27:04] NOTICE: [pool www] child 16220 started
[05-Oct-2020 20:27:47] NOTICE: [pool www] child 3239 exited with code 0 after 42730.396625 seconds from start
[05-Oct-2020 20:27:47] NOTICE: [pool www] child 16263 started
[05-Oct-2020 20:27:51] NOTICE: [pool www] child 1812 exited with code 0 after 42740.360500 seconds from start
[05-Oct-2020 20:27:51] NOTICE: [pool www] child 16273 started
[05-Oct-2020 20:27:52] NOTICE: [pool www] child 1815 exited with code 0 after 42740.447148 seconds from start
[05-Oct-2020 20:27:52] NOTICE: [pool www] child 16275 started
[05-Oct-2020 20:28:04] NOTICE: [pool www] child 1814 exited with code 0 after 42752.756222 seconds from start
[05-Oct-2020 20:28:04] NOTICE: [pool www] child 16289 started
[05-Oct-2020 20:29:55] NOTICE: [pool www] child 1939 exited with code 0 after 42862.776461 seconds from start
[05-Oct-2020 20:29:55] NOTICE: [pool www] child 16407 started
[06-Oct-2020 07:03:34] NOTICE: Terminating ...
[06-Oct-2020 07:03:34] NOTICE: exiting, bye-bye!
[06-Oct-2020 07:03:52] NOTICE: fpm is running, pid 1061
[06-Oct-2020 07:03:52] NOTICE: ready to handle connections
[06-Oct-2020 07:03:52] NOTICE: systemd monitor interval set to 10000ms
-
@Chris-Whiteley That’s the php-fpm error log itself; there should also be one for www.
-
@Tom-Elliott This is all I see
-
@Chris-Whiteley alright.
Something appears to be messed up but where/what is a big question.
If it were a coding issue within 1.5.9, we’d probably have heard about it from many more people than just you.
We create a lot of files, so I’d start by rerunning the installer to see if that helps. But run it with the -y switch.
cd /path/to/fogproject/bin
./installfog.sh -y
Let it run until completion and see if things start working?
It’s a long shot but worth a try I think.
-
@Tom-Elliott I will do this right now and let you know the outcome.
-
@Tom-Elliott Same issue with the [Connecting]… going across and failing, rebooting.
-
@Chris-Whiteley Ok, I was misled by the
Could not start download: Operation not supported (http://ipxe.org/3c092003)
error you posted earlier. I suppose this only happens when it did not even pull the boot.php file in the first place. If you run
imgfetch bzImage
then it doesn’t know where to get this from, I guess.
Now, good that you are posting more pictures of this. We see that it is sometimes able to load boot.php (earlier picture) and sometimes not! More and more I think this is a network issue.
Is that machine that is not able to PXE boot from your FOG server in the same subnet as the FOG server? Connected to the same switch? Would you be able to hook up a PC to the very same switch the FOG server is on and try again?
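To check whether boot.php loads only intermittently, one option is to request it repeatedly from another machine in the same VLAN as the failing client and count the failures. This is only a sketch: the function name probe_url is made up for illustration, and the URL in the usage note (including the /fog/service/ipxe/boot.php path and the placeholder server address) is an assumption that needs to be adjusted to your setup.

```shell
#!/bin/sh
# Fetch a URL several times and report how many attempts failed.
# curl flags: -f fail on HTTP errors, -sS quiet but still print errors,
# -o /dev/null discard the body, --max-time 5 give up after 5 seconds.
probe_url() {
    url="$1"
    tries="${2:-20}"
    fails=0
    i=0
    while [ "$i" -lt "$tries" ]; do
        curl -fsS -o /dev/null --max-time 5 "$url" || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails of $tries requests failed"
}
```

For example: `probe_url "http://<fog-server-ip>/fog/service/ipxe/boot.php" 50`. If a regular OS on the same network path never sees a failure, that would point more towards the iPXE side than the switches.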
-
@Sebastian-Roth I will not be able to login to the same switch as the FOG server as it is a VM in our data center. I am 3 switches down from the data center and don’t have issues with the other 5 schools I manage getting this to work. Same setup as this. I have a switch at my desk with multiple VLANs and that is how I get to do imaging for each district. Does that help paint a picture at all?
-
@Chris-Whiteley Sounds like this is kind of a new branch where you set this up, right? The data center and the three switches down from there are kind of a black box to us, and I was hoping we could take some of that out of the equation to make sure.
Do you have the exact same Dell models in the other schools as well? If yes, then it can’t be an issue related to iPXE network drivers on that hardware. Nevertheless, have you tried different iPXE binaries?
ipxe.(k)pxe
for BIOS or
snp(only).efi
for UEFI based machines?
Do you get the chance to set up a mirror port on the last switch you connect the PXE booting host to? I would be interested to see a network packet capture of the full PXE boot process.
-
@Sebastian-Roth This is not a new branch, just an upgrade and a move from Ubuntu to CentOS 7, as it is more stable when upgrading. The only thing I can do as far as getting closer is to move this computer into the server room, but since it is a desktop that would be a little cumbersome. These are Juniper switches.
For creating the schools’ base image, I use this same machine to do the work for all of them since it is pretty close to the “golden image” for each one. I am using this Dell Optiplex 7040. I create a legacy image (have some more impoverished districts with old machines) and a UEFI image.
I have not tried different iPXE binaries and I wasn’t aware that you guys wanted me to do a mirror port. I will try and work on this today if I get some time.
Thanks for the guidance.