Multicast image is pushed, but it does not "finish/finalize"

dmcadams

Hello, we first experienced this issue back in December during one of our store openings. Since it was a new install, we just dealt with it and assumed it was some strange glitch. Since then, we’ve noticed the same thing at 2 other locations. We made no changes at either of them, and neither had shown this issue before. We really don’t multicast often, so that’s why it’s been several months.

Issue:
Multicast image is deployed and gets to 100% on all partitions. No matter how many systems we have going, only 1 PC will actually “finish” the image and reboot. The other PCs all sit on the partclone screen, and we have to stop the FOG task, then manually reboot the PCs. The PCs that need to be manually rebooted do not get their PC name changed (it remains the same name as the source image PC). So whatever stage/phase changes the PC name is only running on 1 PC. All others remain stuck until manual intervention.

As mentioned, we didn’t see this issue prior to our December opening. We opened a location in November and successfully multicast to more than 120 PCs in a few chunks. That same location, with no changes made to FOG, now experiences this issue.
We have since tried a few different kernel versions with no luck.

Has anyone experienced this? Ultimately, we do have a workaround, so this isn’t a huge deal. It just gets frustrating because most of what we do is remote, so this issue delays the process while we wait for onsite assistance.

After Multicast screen

dmcadams

@dmcadams FYI, yes this is a Windows 11 image, but I can assure you that this has nothing to do with Windows 11. We first noticed this at 2 other locations using Windows 10. We just happen to be testing with 11 right now.

Sebastian Roth

@dmcadams I think what we see here is not the “end of partclone after the last partition” but more the “start of the last partition”. Looks like only one machine is able to pick up the multicast session of the last partition while all the others miss it and simply hang there till you shut them down manually.

Which version of FOG do you use?

dmcadams

@sebastian-roth
We are using 1.5.9
The kernel we use does vary between stores. 5.10.50 is a popular one, but we’ve used older ones and I’ve tried 5.10.71 (nothing newer than that).

That’s very strange too, especially considering that the issue seemingly appeared with no changes made to FOG and across multiple locations.

Sebastian Roth

@dmcadams I made some changes to the multicast management code but that was before 1.5.9 was released. So I am sure you have those fixes already.

I suggest you play with the timeout between partitions to see if that helps. FOG starts a fresh UDPCAST session for each partition and, between partitions, waits 10 seconds for every PC to join the session. To change that behaviour follow these steps:

1. Stop all multicast sessions.
2. Edit /var/www/html/fog/lib/service/multicasttask.class.php line 659 (see code on github) and increase that number from 10 to 30. For more details on this, read this topic in the forums: https://forums.fogproject.org/topic/15507/multicast-deploying-shutdown-pc
3. Restart the service: systemctl restart FOGMulticastManager
4. Schedule a fresh multicast session and see if the issue happens again.
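
To illustrate why a longer wait might help, here is a small toy model in Python. It is only a sketch of the reasoning above, not FOG's or UDPCAST's actual code: the function name, the host names and the "ready" times are all made up, and it assumes the sender starts transmitting a fixed number of seconds after the first client is ready, so any client that is ready later misses that partition's session.

```python
def split_by_join_window(ready_times, wait_seconds):
    """Toy model (hypothetical, not FOG code).

    ready_times maps a host name to the time (in seconds) at which that
    host is ready to join the session for the next partition.
    The sender is assumed to start `wait_seconds` after the first host
    is ready; hosts that are ready later than that miss the session.
    Returns (joined, missed) as two sorted lists of host names.
    """
    if not ready_times:
        return [], []
    deadline = min(ready_times.values()) + wait_seconds
    joined = sorted(h for h, t in ready_times.items() if t <= deadline)
    missed = sorted(h for h, t in ready_times.items() if t > deadline)
    return joined, missed


# Made-up example: one fast PC and two that need longer between partitions.
ready = {"pc-01": 0.0, "pc-02": 14.0, "pc-03": 22.0}

for wait in (10, 30):
    joined, missed = split_by_join_window(ready, wait)
    print(f"wait={wait}s joined={joined} missed={missed}")
```

With these invented numbers, only pc-01 makes it into the session with a 10 second wait, which matches the "only one PC finishes" symptom, while a 30 second wait lets all three join. How long real clients actually take between partitions is something you would have to observe in your own environment.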

dmcadams

@sebastian-roth
I’ve only been able to test this in one environment and only with 2 PCs. It did work in that particular test. In a couple of weeks, we are going to have an opportunity to hit a good number of PCs in another environment where it previously failed, so we’ll have more data then. So far so good though! Thank you much!
