Multicasting - Not All Clients Start Tasking
FOG Server Ver. - 1.5.4
FOG OS - Debian 9.4
HOST OS - WIN10
I have noticed on multiple occasions that in some of my labs the multicast sessions only complete on a select group of hosts and do not start on others.
What happens is:
ALL hosts in Group WOL
ALL hosts get to the PARTCLONE screen.
Only some hosts start tasking and complete tasking.
I noticed this in one lab where one host would start PARTCLONE tasking and all other hosts would not start. I had to unplug that host from the network in order to get the others to actually finish tasking.
In every situation, previous versions of FOG multicast to these labs correctly.
@george1421 - Any ideas on what might be going on here? Everything else has been stable since we talked last.
@joe-gill Well, in a way I’m glad this issue was not a FOG-specific one, but rather an infrastructure one. I can say multicasting is a bit of a fickle beast no matter what imaging technology you are using. When it works, it’s great.
It sounds like your network was configured for dense mode multicasting, where it would just send the multicast stream to all ports even if there were no multicast subscribers.
Thank you for providing solid feedback and sticking with this issue until it is resolved.
To update this issue… I had a network engineer look at this issue. He determined that we had our multicast setting set to flood all ports. So far disabling this has fixed the majority of our multicasting issues.
One thing we did notice is that we had one old switch going to one of our labs that seems to fail on multicast and not unicast. Multicast works everywhere else with that image. We will be replacing that switch at some point but I believe that switch is the culprit. It’s an older 10/100 “dumb” HP switch (commercial grade).
So far I have not experienced any other issues with that version of FOG.
Thanks again for all the help!
The only switches we have on campus now are Meraki or dumb switches. Most things are on Meraki’s. The vendor turned “port flooding” off. I’m going to check but I believe I’ll still have the issue. I do not think that’s it. I’ll post back as soon as I know.
It is set like that for the default stack.
IGMP snooping is enabled.
For all switches (even Meraki), on the VLAN where the machines are multicasting?
IGMP snooping is enabled.
@george1421 Thanks! I’ll check it out.
That is exactly correct. I am reaching out to our vendor to find out if there are any known issues here. I will post back for others to know.
For the record we are using Meraki MS225-48FP and Meraki MS42P.
Thanks for helping me troubleshoot this @Tom-Elliott .
@tom-elliott Also be aware in another thread we had to make some changes to the www.conf file (adding the memory limit of 256MB). He also had 3 versions of php and php-fpm installed. When I left the session (a few days ago) he had no more than 6 active php-fpm worker threads for 24 hosts in a multicast. Memory and CPU usage looked good, but understand I was only focused on why php-fpm was behaving so badly on his system.
@Joe-Gill Check to see if IGMP snooping is turned on for your network switches. That will allow the switches to handle multicast traffic a bit more intelligently.
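If you want to check from a client whether a multicast join actually gets traffic delivered through the switches, something like the following could help. This is only a rough sketch and has nothing to do with how FOG itself multicasts: the group address `239.1.2.3` and port `5007` are made-up test values, and `build_mreq`, `wait_for_multicast` and `send_test_datagram` are just helper names I picked for illustration.

```python
import socket
import struct

GROUP = "239.1.2.3"  # hypothetical test group (administratively scoped range)
PORT = 5007          # hypothetical test port


def build_mreq(group, iface_ip="0.0.0.0"):
    """Pack the 8-byte ip_mreq structure used by IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def wait_for_multicast(group=GROUP, port=PORT, timeout=10.0):
    """Join `group` and return (sender_ip, data) for the first datagram,
    or None if nothing arrives before the timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # Joining the group makes the host send an IGMP membership report,
        # which is what an IGMP snooping switch listens for.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group))
        sock.settimeout(timeout)
        try:
            data, sender = sock.recvfrom(65535)
            return sender[0], data
        except socket.timeout:
            return None
    finally:
        sock.close()


def send_test_datagram(group=GROUP, port=PORT, ttl=1, payload=b"multicast-test"):
    """Send one datagram to the group; TTL 1 keeps it on the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

Run `wait_for_multicast()` on a machine behind each switch, then call `send_test_datagram()` from another machine on the same VLAN. A receiver that returns `None` while others get the datagram points at the switch path rather than at FOG.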
So, chatting with @Joe-Gill, we narrowed down what the problem is.
First, the backstory.
So Multicast tasks would work for some machines, but not all. And, to add to that, it was always the same machines that failed to work. If the machines that failed were subsequently put into their own group, they would multicast without issue.
This led me to ask if the machines that would fail were on a different switch from the rest of the machines, and indeed @Joe-Gill found out that this was the case. They are using Meraki switches and they replaced a few of them. Those that were replaced do not work fluidly with the older switches in place.
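For anyone chasing the same pattern, lining up the failed hosts against the switch they hang off makes it obvious quickly. A minimal sketch, assuming you have a host-to-switch list from your own inventory; the host and switch names below are invented for illustration and are not from this site.

```python
from collections import defaultdict


def failures_by_switch(host_switch, completed):
    """Return {switch: (failed_hosts, total_hosts)} for one multicast session."""
    failed = defaultdict(list)
    total = defaultdict(int)
    for host, switch in host_switch.items():
        total[switch] += 1
        if host not in completed:
            failed[switch].append(host)
    return {sw: (sorted(failed[sw]), total[sw]) for sw in total}


# Hypothetical inventory: which switch each lab host is plugged into.
host_switch = {
    "lab1-pc01": "switch-old",
    "lab1-pc02": "switch-old",
    "lab1-pc03": "switch-new",
    "lab1-pc04": "switch-new",
}
# Hypothetical result: hosts that finished the multicast task.
completed = {"lab1-pc01", "lab1-pc02"}

for switch, (failed, total) in sorted(failures_by_switch(host_switch, completed).items()):
    print(f"{switch}: {len(failed)} of {total} hosts did not complete {failed}")
```

If all of the failures land on one switch, or on one side of an uplink between an old and a new switch, that is a strong hint the problem is in the network and not in the image or the task.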
@Joe-Gill keep me truthful here, but this is the gist of this information.
That is correct.
Some machines start multicasting like normal and others sit on the Partclone screen. All of them make it to the Partclone screen at the same time.
I’m assuming, for the log you posted, there were only 27 hosts in that group that needed the tasking?
Sorry for reasking the questions you’ve already answered, just trying to understand the whole situation.
@joe-gill So you’re saying some machines work fine, not all of them though?
The last line repeats until it finished.
If you would like the entire log file I can post it.
Yes I am on the working branch. I’ll post the log file here shortly.
Are you on the working branch? Can you show us the
Well it may have been Tom. But I thought you told me that. Ha!
Our machines are all on the same subnet.
had me set the Multicast rendezvous address to our FOG server IP previously
Nope not me, that bit of hocus-pocus sounds something like what Tom would say.
Are your FOG server and multicasting clients on the same subnet?
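If you are not sure, you can check it from the IP addresses and the netmask. A small sketch using Python's standard `ipaddress` module; the addresses and the `/24` prefix are placeholders, so substitute your own server IP, client IP and prefix length. `same_subnet` is just a helper name for this example.

```python
import ipaddress


def same_subnet(server_ip, client_ip, prefix_len):
    """True if both addresses fall inside the network defined by
    server_ip and prefix_len."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.ip_network(f"{server_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(client_ip) in network


# Placeholder addresses, not taken from this thread.
print(same_subnet("10.0.0.5", "10.0.0.200", 24))
print(same_subnet("10.0.0.5", "10.0.1.7", 24))
```

If the clients are on a different subnet from the server, the multicast stream has to be routed between them, which needs more than IGMP snooping on the switches.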