FOG v1.5.7 on Ubuntu 18.04 random problem multicast
-
@Tom-Elliott I am not sure this is a good idea as default. This means that between every partition it needs to wait for at least one minute if one of the multicast client fails…
-
@Sebastian-Roth That, I think, is okay. 30 seconds could be too short a time (as the partition get’s re-expanded) potentially. It’s one minute vs 10 seconds or 30 seconds. I’m just taking a cautious approach. Only one minute to wait between a partition isn’t too much to ask I don’t think. I can certainly shorten the time. Or we could add a global setting for the admin to select an appropriate time.
-
@Tom-Elliott said in FOG v1.5.7 on Ubuntu 18.04 random problem multicast:
Or we could add a global setting for the admin to select an appropriate time.
That would surely be a nice feature for 1.6.
You are probably right hat 1 minute is not asking too much. Though we haven’t seen issues with 10 seconds for many years now. I wouldn’t change that for 1.5.x. The OP is welcome to just manually adjust this.
-
@Sebastian-Roth @Tom-Elliott thanks for your comments.
It is very possible that this is the reason for our multicast problems. It may also be the reason for the host database update problem at the end of the multicast task (post https://forums.fogproject.org/topic/14143/dev-branch-multicast-for-some-hosts-db-not-updated-after-restore/2?_=1580905777518).
I will modify that value (60 seconds), I will test it and I will tell you the result.
This new parameter would be a good improvement for the next version 1.6 -
Hello again. I think the problem of blocking when changing partition has been fixed. Yesterday I executed a multicast task with 12 computers and all deployed each partition. But 2 computers showed this other error message at the end of the task (at the end of the deployment of all partitions):
Then, in the “Imaging log” report it does not correctly show the end time of this computer’s task (“-0001-11-30 00:00:00” and the duration “2020 years 2 months …”).Is it possible to correct this too?
Thanks.
-
@tec618 How many hosts with fog-client software installed do you have in your network?
Please take a look at the apache and php-fpm logs (see my signature) and post valid information here.
-
@Sebastian-Roth said in FOG v1.5.7 on Ubuntu 18.04 random problem multicast:
@tec618 How many hosts with fog-client software installed do you have in your network?
In our subnet there are really 32 computers plus the fog server (virtual machine). The 32 computers have the fog client software installed on the 2 mounted operating systems (Windows and Ubuntu)
Please take a look at the apache and php-fpm logs (see my signature) and post valid information here.
We have just completed another multicast task with 12 computers to attach the lines of the requested files (and previously we have restarted the MV of the FOG server). In this case, only one computer failed to update the database at the end.
/var/log/apache2/error.log
[Wed Feb 12 00:06:04.802457 2020] [mpm_prefork:notice] [pid 988] AH00163: Apache/2.4.29 (Ubuntu) OpenSSL/1.1.1 configured -- resuming normal operations [Wed Feb 12 00:06:04.802492 2020] [core:notice] [pid 988] AH00094: Command line: '/usr/sbin/apache2' [Wed Feb 12 08:40:18.331499 2020] [proxy_fcgi:error] [pid 8178] [client 192.168.36.252:37258] AH01071: Got error 'PHP message: PHP Warning: Illegal offset type in /var/www/fog/lib/fog/fogmanagercontroller.class.php on line 296\n', referer: http://192.168.36.252/fog/management/index.php?node=host&sub=edit&id=96 [Wed Feb 12 08:47:22.410088 2020] [mpm_prefork:notice] [pid 988] AH00169: caught SIGTERM, shutting down [Wed Feb 12 08:47:46.067103 2020] [mpm_prefork:notice] [pid 940] AH00163: Apache/2.4.29 (Ubuntu) OpenSSL/1.1.1 configured -- resuming normal operations [Wed Feb 12 08:47:46.067352 2020] [core:notice] [pid 940] AH00094: Command line: '/usr/sbin/apache2' [Wed Feb 12 09:00:23.764176 2020] [proxy_fcgi:error] [pid 986] [client 192.168.36.116:55576] AH01071: Got error 'PHP message: PHP Warning: count(): Parameter must be an array or an object that implements Countable in /var/www/fog/lib/pages/imagemanagementpage.class.php on line 1125\n', referer: http://192.168.36.252/fog/management/index.php?node=image&sub=list
/var/log/php7.2-fpm.log
[11-Feb-2020 23:29:30] NOTICE: [pool www] child 5974 started [12-Feb-2020 00:12:32] NOTICE: [pool www] child 19941 exited with code 0 after 20696.983251 seconds from start [12-Feb-2020 00:12:32] NOTICE: [pool www] child 8678 started [12-Feb-2020 00:12:57] NOTICE: [pool www] child 19943 exited with code 0 after 20719.070195 seconds from start [12-Feb-2020 00:12:57] NOTICE: [pool www] child 8703 started [12-Feb-2020 00:13:40] NOTICE: [pool www] child 19942 exited with code 0 after 20762.680041 seconds from start [12-Feb-2020 00:13:40] NOTICE: [pool www] child 8740 started [12-Feb-2020 00:13:57] NOTICE: [pool www] child 19960 exited with code 0 after 20760.003372 seconds from start [12-Feb-2020 00:13:57] NOTICE: [pool www] child 8757 started [12-Feb-2020 00:14:08] NOTICE: [pool www] child 19970 exited with code 0 after 20763.628227 seconds from start [12-Feb-2020 00:14:08] NOTICE: [pool www] child 8767 started [12-Feb-2020 00:14:08] NOTICE: [pool www] child 19969 exited with code 0 after 20766.224074 seconds from start [12-Feb-2020 00:14:08] NOTICE: [pool www] child 8768 started [12-Feb-2020 00:16:29] NOTICE: [pool www] child 19999 exited with code 0 after 20869.088276 seconds from start [12-Feb-2020 00:16:29] NOTICE: [pool www] child 8890 started [12-Feb-2020 01:01:01] NOTICE: [pool www] child 20933 exited with code 0 after 22715.786627 seconds from start [12-Feb-2020 01:01:01] NOTICE: [pool www] child 11497 started [12-Feb-2020 03:10:38] NOTICE: [pool www] child 24868 exited with code 0 after 26547.790084 seconds from start [12-Feb-2020 03:10:38] NOTICE: [pool www] child 19375 started [12-Feb-2020 06:52:50] NOTICE: [pool www] child 5974 exited with code 0 after 26599.969300 seconds from start [12-Feb-2020 06:52:50] NOTICE: [pool www] child 32550 started [12-Feb-2020 07:34:55] NOTICE: [pool www] child 8678 exited with code 0 after 26543.871750 seconds from start [12-Feb-2020 07:34:55] NOTICE: [pool www] child 3028 started [12-Feb-2020 07:35:30] NOTICE: [pool www] child 8703 exited with code 0 after 26553.244558 seconds from start [12-Feb-2020 07:35:30] NOTICE: [pool www] child 3063 started [12-Feb-2020 07:36:38] NOTICE: [pool www] child 8740 exited with code 0 after 26577.416929 seconds from start [12-Feb-2020 07:36:38] NOTICE: [pool www] child 3124 started [12-Feb-2020 07:36:40] NOTICE: [pool www] child 8757 exited with code 0 after 26563.254436 seconds from start [12-Feb-2020 07:36:40] NOTICE: [pool www] child 3125 started [12-Feb-2020 07:36:55] NOTICE: [pool www] child 8767 exited with code 0 after 26567.924853 seconds from start [12-Feb-2020 07:36:55] NOTICE: [pool www] child 3137 started [12-Feb-2020 07:36:58] NOTICE: [pool www] child 8768 exited with code 0 after 26570.386211 seconds from start [12-Feb-2020 07:36:58] NOTICE: [pool www] child 3146 started [12-Feb-2020 07:39:10] NOTICE: [pool www] child 8890 exited with code 0 after 26560.913797 seconds from start [12-Feb-2020 07:39:10] NOTICE: [pool www] child 3396 started [12-Feb-2020 08:24:50] NOTICE: [pool www] child 11497 exited with code 0 after 26629.620151 seconds from start [12-Feb-2020 08:24:50] NOTICE: [pool www] child 6007 started [12-Feb-2020 08:47:22] NOTICE: Terminating ... [12-Feb-2020 08:47:22] NOTICE: exiting, bye-bye! [12-Feb-2020 08:47:46] NOTICE: fpm is running, pid 744 [12-Feb-2020 08:47:46] NOTICE: ready to handle connections [12-Feb-2020 08:47:46] NOTICE: systemd monitor interval set to 10000ms
In this log file there are no new lines referring to this last multicast task
-
@tec618 Ahhhh, now that I re-read this I remember that there was an issue with this in FOG 1.5.7. This should be fixed in the latest developer version. Give it a few more days till we push out the next release.
-
@Sebastian-Roth said in FOG v1.5.7 on Ubuntu 18.04 random problem multicast:
@tec618 Ahhhh, now that I re-read this I remember that there was an issue with this in FOG 1.5.7. This should be fixed in the latest developer version. Give it a few more days till we push out the next release.
Ok, perfect! We will wait to next versión. Thank you so much.
-
Hello again @Sebastian-Roth,
We have updated to version 1.5.8 with these commands:cd /root/fogproject git checkout master git pull cd bin ./installfog.sh
Using the stable version (not dev-branch). The error continues to occur when updating the database (in deployment tasks) and “Imaging log” report it does not correctly show the end time of this computer’s task (“-0001-11-30 00:00:00” and the duration “2020 years …”).
Any ideas to fix it?
-
Hello @Sebastian-Roth, yesterday we executed a multicast task of 30 computers and when the first one finished, it eliminated all the active tasks of the rest of the computers and this caused the error updating the database of all of them.
All computers failed at the end of the task (updating the database) except one (L3PC23)
I write this comment in case it gives you clues to correct this error. -
@tec618 Good you brought this up again. I had a look at the code and actually found what is causing this. Issue should be fixed with the latest commit (bf60833) on dev-branch.
cd /root/fogproject git checkout dev-branch git pull cd bin ./installfog.sh
If you want to give it a try… Or just take a look at the commit linked above and manually adjust the code. Whatever suites you best.
-
OK, I’ll try it soon
Thank you @Sebastian-Roth