(1.1.1) Multicast Hang - Starting to restore image (-)
-
Question – I was able to multicast my laptops. Now, I have HP DC7800 SFFs that will not multicast.
I tried making a new group and restarting the service. Still running 1.1.1.
Log below.
[07-02-14 2:36:58 pm] | Task (32) Library is new!
[07-02-14 2:36:58 pm] | Task (32) Library image file found.
[07-02-14 2:36:58 pm] | Task (32) Library client(s) found.
[07-02-14 2:36:58 pm] | Task (32) Library sending on base port: 55758
[07-02-14 2:36:58 pm] CMD: cat "/images/WIN7HSLIBRARY/rec.img.000"|/usr/local/sbin/udp-sender --min-receivers 2 --portbase 55758 --interface eth0 --full-duplex --ttl 32 --nokbd;cat "/images/WIN7HSLIBRARY/sys.img.000"|/usr/local/sbin/udp-sender --min-receivers 2 --portbase 55758 --interface eth0 --full-duplex --ttl 32 --nokbd;
[07-02-14 2:36:58 pm] | Task (32) Library has started.
[07-02-14 2:37:08 pm] | Task (32) Library is already running PID 7625
[07-02-14 2:37:18 pm] | Task (32) Library is already running PID 7625
[07-02-14 2:37:29 pm] | Task (32) Library is already running PID 7625
[07-02-14 2:37:39 pm] | Task (32) Library is already running PID 7625
[07-02-14 2:37:49 pm] | Task (32) Library is already running PID 7625
[07-02-14 2:37:59 pm] | Task (32) Library is already running PID 7625
[07-02-14 2:38:09 pm] | Task (32) Library is already running PID 7625
[07-02-14 2:38:19 pm] | Task (32) Library is already running PID 7625
[07-02-14 2:38:29 pm] | Task (32) Library is already running PID 7625
[07-02-14 2:38:39 pm] | Task (32) Library is already running PID 7625
[07-02-14 2:38:49 pm] | Task (32) Library is already running PID 7625
Any idea about this? I have 1300 HP PCs to reimage in the next month.
-
This tells me you have only two hosts in the job: --min-receivers 2
Have those two systems been turned on?
The “Library is already running PID 7625” is telling you that the task is running. It’s simply waiting for the hosts that are attached to it.
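
To check how many receivers a task is waiting for, you can pull the --min-receivers value out of the CMD line in the multicast log. This is only a rough sketch: min_receivers is a made-up helper name, and the sample line is a shortened copy of the CMD entry from the log above. As far as I understand udp-sender, it holds off sending until that many receivers have connected.

```shell
# Hypothetical helper: print the --min-receivers value(s) found in log text on stdin.
min_receivers() {
  grep -oE -- '--min-receivers [0-9]+' | awk '{print $2}' | sort -u
}

# Sample line modelled on the CMD entry in the log above (shortened to one image file).
sample='CMD: cat "/images/WIN7HSLIBRARY/sys.img.000"|/usr/local/sbin/udp-sender --min-receivers 2 --portbase 55758 --interface eth0 --full-duplex --ttl 32 --nokbd;'

printf '%s\n' "$sample" | min_receivers
```

If the number printed is higher than the number of clients actually sitting at the multicast screen, the sender will keep waiting.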
-
To troubleshoot further, if you run [code]ps -ef | grep udp-sender[/code] you should see the task running on the server. You can also verify that the tasking for those systems is correct by going to: [url]http://IPOFFOGSERVER/fog/service/ipxe/boot.php?mac=XX:XX:XX:XX:XX:XX[/url]
Change IPOFFOGSERVER to the IP of the FOG server. Change the XX:XX:XX:XX:XX:XX portion to the MAC address of one of the clients you’ve tasked for this.
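
The two checks above could be scripted roughly like this. It is a sketch, not an official FOG tool: fog_boot_url is a hypothetical helper, and the IP and MAC are placeholders. The commands that need the real server are left as comments so the snippet runs anywhere.

```shell
# Hypothetical helper: build the boot.php URL for a given FOG server IP and client MAC.
fog_boot_url() {
  printf 'http://%s/fog/service/ipxe/boot.php?mac=%s\n' "$1" "$2"
}

# Placeholder values; substitute your own server IP and a tasked client's MAC.
url=$(fog_boot_url 192.0.2.10 00:11:22:33:44:55)
echo "$url"

# On the FOG server, or any host that can reach it:
#   ps -ef | grep '[u]dp-sender'   # bracket trick keeps grep from matching itself
#   curl -s "$url"                 # fetches the boot script served for that MAC
```

If the ps line shows nothing, the multicast task is not running on the server; if the page fetched for the client’s MAC shows no imaging task, the client was never tasked.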
-
The machines are getting the tasks, but once they reach the multicast grey screen (same error as in my OP), they hang. I verified that they have the task, and I can see that it’s running on the server itself. I tried with a single multicast host, then 2, and upwards of 8. All booted into the grey screen and hung.
-
Is your network, by chance, blocking UDP packets? Do these run under unicast?
-
Runs great in unicast – I can actually multicast in a different building (connected to the same core) with laptops. It worked well with three different laptop models: HP ProBook 4520s, 30s and 40s all multicast great. It’s just these desktops that hang. The network isn’t blocking any of the UDP packets.
-
And unicast works on these desktops? I don’t know why they’re not getting the data they need. Also, as I stated, it says you have 2 hosts. Are both of the hosts on and hanging, or are you testing with one? I know these sound like silly questions, but I’m not there, so I hope you understand.
-
Yep – I first made a multicast group of 5 machines. I rebooted those five, network booted them and they went right into the grey screen. I waited a good 10 minutes and they just sat there. Online, it says they were “in progress” but that’s only because it had technically ‘started’ the process (I’m assuming).
From there, I went ahead and deleted the group. I made a new group and added just two hosts: different hosts, same device type. Again, they booted in just fine and sat on the grey screen. (This is the section of the log I copied and pasted.)
I ended up having to do a group unicast for about 80 of them, which was obviously significantly slower, so I know it works that way. Multicast would just be nice to speed the process up and avoid congestion. Like you said, really odd. I have no clue why it would work on three laptop models but not on a desktop. I figured, if anything, with all of the eth0 and eth1 wireless/NIC problems we had previously, it would be the other way around. I appreciate the effort.
- Restarted multicast services
- Verified UDP status on server, everything looks fine. Permission is fine.
- Rebooted.
-
I really wish I could remote in and try to help troubleshoot. And it seems to only affect these machines now, right?
-
We haven’t had to try any other desktops yet. I can give it a shot after the long weekend. I did VPN in, though, to check the switch configurations. The only difference between the working environment in building A versus building B is that I have “[B]mls qos trust dscp[/B]” set on the ports. I highly doubt this has anything to do with it. Also, if you think of something, let me know and I can certainly try it Monday when I’m back at work. I could swing by there this weekend if needed. I’m also using your latest kernel, for what it’s worth.
-
I have upgraded to FOG 1.1.2 and also updated the kernel again. I’m getting the same results in the building where multicast works on laptops. Would you happen to have any suggestions? Also, if you would like to look at the logging, please let me know specifically which log you would like and I’ll post it. Thank you.
-
After hours of troubleshooting, I really feel like a moron. We established that the problem was a core switch/VLAN issue. The existing VLANs already had multicast set up when I inherited the environment (‘ip multicast-routing’ was enabled), but when I added new VLANs, among them the “Server” VLAN, I didn’t add the ‘ip pim sparse-dense-mode’ and ‘ip cgmp’ commands to them, so multicast traffic was not being routed for those VLANs. It’s disappointing that I spent a good week on this without checking it. Thanks, Tom, for the help – obviously not a FOG issue. It worked fantastically afterwards.
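
For anyone who hits the same thing, one quick way to spot a VLAN interface that is missing its PIM line is to scan a saved copy of the switch’s running config. This is a rough sketch only: vlans_missing_pim is a made-up helper, and the sample config (VLAN numbers and addresses) is invented for illustration, not taken from my switch.

```shell
# Hypothetical helper: read a saved Cisco-style running-config on stdin and
# print each "interface Vlan..." stanza that contains no "ip pim" line.
vlans_missing_pim() {
  awk '
    function report() { if (name != "" && !pim) print name; name = ""; pim = 0 }
    /^interface /              { report(); if ($2 ~ /^Vlan/) name = $2; next }
    /^[^ ]/                    { report(); next }   # any unindented line ends the stanza
    name != "" && /^ +ip pim / { pim = 1 }
    END                        { report() }
  '
}

# Invented sample: Vlan10 has the multicast commands, Vlan20 does not.
sample_config='interface Vlan10
 ip address 10.0.10.1 255.255.255.0
 ip pim sparse-dense-mode
 ip cgmp
!
interface Vlan20
 ip address 10.0.20.1 255.255.255.0
!'

printf '%s\n' "$sample_config" | vlans_missing_pim
```

Any VLAN it lists is a candidate for the missing ‘ip pim sparse-dense-mode’ and ‘ip cgmp’ lines, assuming multicast is supposed to be routed on that VLAN.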
-
I tried everything I could think of, and I’m sorry I couldn’t help any further. It felt/sounded like an environment issue. I appreciate the feedback though, thank you.