Multicast De-Sync When Resizing Disks
-
Hello everyone,
I am currently having trouble multicasting to an older desktop (Lenovo M800). When the desktop is imaged with a regular deploy task, it completes as expected (it just takes a fair amount of time). When it is grouped with more modern computers, everything starts out fine and it makes it through the first Partclone phase. Once the disk-resizing phase begins, however, the modern computers finish it quickly while the older desktop takes about a minute.
Once the modern computers have resized their disks, they wait at the last Partclone screen for about 30 seconds, then complete and move on to the postdownload scripts. The desktop, however, hangs on resizing for around a minute, and by the time it reaches the last Partclone screen it waits indefinitely. I believe it has been left behind.
In the log (/opt/fog/log/multicast.log), I see that as soon as the modern computers finish, it starts reporting “$Groupname is no longer running”. Meanwhile, the multicast task continues to show in the GUI for the stuck host.
I have adjusted the UDPCAST_MAXWAIT setting, but it seems to affect only the first screen. I have also tried rebooting the stuck host; it returns to the multicast screen and continues to wait indefinitely.
FOG Version: 1.5.10.1726
I am looking for input on how others have dealt with this before.
-
@christop UDPCAST_MAXWAIT should be the time (in minutes) applied to all udpcast-started jobs.
If your setting isn’t working (I don’t know why it wouldn’t be, but a change could have broken this unexpectedly), it would fall back to 60 seconds in total.
The value is stored in minutes on the UI side and then converted to seconds when and where required.
I do see a potential typo in /var/www/fog/lib/service/multicasttask.class.php at line 660, though, and it seems to have been there for a while.
If you can, change it to multiply by 60; as written, it only multiplies by six:
So 10 * 6 = 60 seconds (one minute), and 15 * 6 = 90 seconds (1 minute 30 seconds).
You can work around this by scaling your UI value up to the period you’re expecting (for example, 100 instead of 10 for a 10-minute wait, since 100 * 6 = 600 seconds), or you can edit the file.
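To make the arithmetic above concrete, here is a minimal Python sketch of the minutes-to-seconds conversion. The function names are hypothetical and this is not FOG's actual PHP code; it only assumes, as described above, that the UI stores minutes and the suspect line multiplies by 6 instead of 60.

```python
# Hypothetical sketch of the UDPCAST_MAXWAIT conversion (not FOG's PHP code).


def maxwait_seconds_buggy(maxwait_minutes: int) -> int:
    """Reproduces the suspected typo: multiplies by 6 instead of 60."""
    return maxwait_minutes * 6


def maxwait_seconds_fixed(maxwait_minutes: int) -> int:
    """Intended conversion: one minute is 60 seconds."""
    return maxwait_minutes * 60


def workaround_ui_value(desired_minutes: int) -> int:
    """UI value to enter on an unpatched install so that the buggy
    conversion still yields the desired wait (10 * 6 == 60)."""
    return desired_minutes * 10


if __name__ == "__main__":
    for minutes in (10, 15):
        print(minutes, maxwait_seconds_buggy(minutes), maxwait_seconds_fixed(minutes))
    ui_value = workaround_ui_value(10)
    print(ui_value, maxwait_seconds_buggy(ui_value))
```

This is the same "multiply by 10" workaround that is reported later in the thread to resolve the issue on an unpatched install.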
I’m pushing what I hope will fix the issue to the dev-branch, if you’d rather go with a method that’s already “developed”.
-
@Tom-Elliott Thank you for the detailed response and the quick change.
Two things to note:
- Multiplying the value of UDPCAST_MAXWAIT by 10 worked in v.1726, resolving the issue.
- I updated to .1727 and attempted a multicast, but it was stuck on the initial multicast screen, reporting that no multicast task jobs were created or no task numbers were assigned. I am curious whether anyone’s multicast is currently working on the updated dev branch.
Let me know if you need me to perform further tests upon review.
-
Just to provide more details: it seems that the multicast service is not functioning after the update.
Here is a sample of the log file:
[10-23-25 10:34:19 am] * Starting MulticastManager Service
[10-23-25 10:34:19 am] * Checking for new items every 10 seconds
[10-23-25 10:34:19 am] * Starting service loop
The service starts but seems to get stuck at starting the service loop and never proceeds. I restarted the service, but the issue persists.
-
@christop I don’t know which OS version you’re using.
Most likely there’s an error I’m not seeing.
I have finally run into the odd issue of udp-sender not being where it was expected (‘/usr/sbin/udp-sender’ vs ‘/usr/local/sbin/udp-sender’) and just pushed a fix for it. Still, the fact that yours isn’t reporting a missing file and simply dead-stops is odd to me. That is usually indicative of a typo or syntax error, which is why I’m going to ask for the logs.
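If you want to confirm which udp-sender location your server actually has, a minimal Python sketch like the following checks the two paths mentioned above and optionally falls back to searching $PATH. The function name and the checking order are assumptions for illustration; this is not how FOG itself resolves the binary.

```python
import os
import shutil
from typing import Iterable, Optional

# The two install locations mentioned in the thread; checking
# /usr/local/sbin first is an arbitrary choice for this sketch.
CANDIDATE_PATHS = ("/usr/local/sbin/udp-sender", "/usr/sbin/udp-sender")


def find_udp_sender(candidates: Iterable[str] = CANDIDATE_PATHS,
                    search_path: bool = True) -> Optional[str]:
    """Return the first executable udp-sender found, or None."""
    for path in candidates:
        # Only accept regular files the current user may execute.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # Optionally fall back to whatever udp-sender is on $PATH.
    return shutil.which("udp-sender") if search_path else None


if __name__ == "__main__":
    found = find_udp_sender()
    print(found if found else "udp-sender not found")
```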
See my signature line here; it should point you to one of the variants of the error files.
If Red Hat based, you’ll likely see the error in the www-error.log from php-fpm.
If Ubuntu based, you’ll likely see the error in the apache2 error.log file.
If you can give us that, we might be able to point you in the right direction.
I did push a fix for a completely unrelated issue, so maybe that helps you too? I doubt it, but you never know.
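As a convenience, a minimal Python sketch for grabbing the last lines of those error logs is shown below. The paths are assumed distribution defaults (/var/log/php-fpm/www-error.log on Red Hat based systems, /var/log/apache2/error.log on Ubuntu based systems) and may differ on your install; reading them usually requires sudo.

```python
from collections import deque
from typing import List

# Assumed default error-log locations; your distribution or
# PHP/Apache configuration may put them elsewhere.
ERROR_LOGS = {
    "Red Hat based (php-fpm)": "/var/log/php-fpm/www-error.log",
    "Ubuntu based (apache2)": "/var/log/apache2/error.log",
}


def tail(path: str, n: int = 20) -> List[str]:
    """Return the last n lines of a text file, or [] if it does not exist.
    Permission errors are not caught, so a missing sudo is obvious."""
    try:
        with open(path, errors="replace") as fh:
            # A deque with maxlen keeps only the most recent n lines in memory.
            return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    for family, path in ERROR_LOGS.items():
        try:
            lines = tail(path)
        except PermissionError:
            print(f"== {family}: {path} (permission denied; try sudo)")
            continue
        print(f"== {family}: {path} ({len(lines)} lines)")
        for line in lines:
            print(line)
```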
-
@christop Another helpful piece of output would be:
sudo systemctl -l status FOGMulticastManager.service