(1.1.1) Multicast Hang - Starting to restore image (-)



  • Hello -

    I have a fresh install of 1.1.1 and I’m unable to get multicast working properly (still).

    I’m getting the hang like it’s waiting for them to all join, even though they’ve all joined. See attached.

    Yesterday, in the same environment, I was able to get 2 going after lurking around the forums and seeing that deleting the group and re-adding it would sometimes work. I came in today and tried that again, but I cannot get any started. Unicast works great, reimages a 25 GB Win 7 image in ~6 minutes.

    I’m not sure what else to try. I tried the latest revision, 1868 (as of yesterday) and could not get the schema to update so I reverted back to 1.1.1.

    Any way to get around it would be great - thanks

    [url="/_imported_xf_attachments/1/1015_photo.JPG?:"]photo.JPG[/url]


  • Senior Developer

    [quote=“RLane, post: 32372, member: 23505”]After hours of troubleshooting, I really feel like a moron. We established the problem being a core switch/VLAN issue. Existing VLANs when I inherited the environment had ‘ip multicast-routing’ enabled - right… but when I added new VLANs, consequently the “Server” VLAN, I didn’t re-add ‘ip pim sparse-dense-mode’ and ‘ip cgmp’ commands to get the multicasts on the same broadcast. Disappointing I spent a good week on this but didn’t check it. Thanks Tom for the help – obviously not a FOG issue. Worked fantastic afterwards.[/quote]

    I tried everything I could think of, and I’m sorry I couldn’t help any further. It felt/sounded like an environment issue. I appreciate the feedback though, thank you.



  • After hours of troubleshooting, I really feel like a moron. We established the problem being a core switch/VLAN issue. Existing VLANs when I inherited the environment had ‘ip multicast-routing’ enabled - right… but when I added new VLANs, consequently the “Server” VLAN, I didn’t re-add ‘ip pim sparse-dense-mode’ and ‘ip cgmp’ commands to get the multicasts on the same broadcast. Disappointing I spent a good week on this but didn’t check it. Thanks Tom for the help – obviously not a FOG issue. Worked fantastic afterwards.



  • I have upgraded to FOG 1.1.2 and also updated the kernel again. I’m having the same results in the building where multicast works on laptops. Would you happen to have any suggestions? Also, if you would like to look at the logging, please let me know specifically what log you would like and I’ll post it. Thank you



  • We haven’t had to try any other desktops, yet. I can give it a shot after the long weekend. I did VPN in though to check the switch configurations. The only difference between the working environment in building A versus building B is that I have “[B]mls qos trust dscp[/B]” set on the ports. I highly doubt this has anything to do with it. Also, if you think of something, let me know and I can certainly try it Monday when I’m back to work. I could swing by there this weekend if needed. I’m also using your latest kernel for what it’s worth.


  • Senior Developer

    I really wish I could remote in and try to help troubleshoot. And it seems to only affect these machines now, right?



  • Yep – I first made a multicast group of 5 machines. I rebooted those five, network booted them and they went right into the grey screen. I waited a good 10 minutes and they just sat there. Online, it says they were “in progress” but that’s only because it had technically ‘started’ the process (I’m assuming).

    From there, I went ahead and deleted the group. I remade a new group and added just two hosts. Both different hosts, same device types. Again, booted in just fine and sat on the grey screen. (This is the section of the log I C+P)

    I ended up having to do a group unicast which was significantly slower obviously for about 80 of them, so I know it worked that way. Multicast would just be nice to speed the process up to avoid congestion. Like you said, really odd. I have no clue why it would work on three laptop models but not a desktop. I figured if anything with all of the eth0 and eth1 wireless/NIC problems we had previously it would be the other way around. I appreciate the effort.

    • Restarted multicast services
    • Verified UDP status on server, everything looks fine. Permission is fine.
    • Rebooted.

  • Senior Developer

    And unicast works on these desktops? I don’t know why they’re not getting the data you need. Also, as I stated, it says you have 2 hosts, are both of the hosts on and hanging or are you testing with one? I know these sound like silly questions, but I’m not there so I hope you understand.



  • Runs great in unicast – I can actually multicast in a different building (connected to the same core) – with laptops. It worked well with 3 different laptop models: HP ProBook 4520s, 30s and 40s all multicast great. It’s just these desktops that hang. Network isn’t blocking any of the UDP packets.


  • Senior Developer

    Is your network, by chance, blocking UDP packets? Do these run under unicast?



  • The machines are getting the tasks, but once to the multicast grey screen (same error as my OP), they hang. I did verify that they do have the task and I see that it’s running on the server itself. I tried with a single multicast host, 2, and upwards of 8. All booted into the grey screen and hung.


  • Senior Developer

    To troubleshoot further, if you run [code]ps -ef | grep udp-sender[/code] you should see the task running on the server. You can also verify that the tasking for those systems is correct by going to: [url]http://IPOFFOGSERVER/fog/service/ipxe/boot.php?mac=XX:XX:XX:XX:XX:XX[/url]

    Change the IPOFFOGSERVER to the ip of the fog server. Change the XX:XX:XX:XX:XX:XX portion to the mac address of one of the clients you’ve tasked for this.
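
    The two checks above can be sketched as a pair of shell helpers. This is only an illustration: the function names `fog_boot_url` and `filter_udp_sender` are made up here, the IP and MAC address in the usage comment are placeholders, and `curl` is assumed to be available on the machine you run it from.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; these function names are not part of FOG.

# Print the boot.php URL that FOG serves for a given client.
# $1 = FOG server IP or hostname, $2 = client MAC (XX:XX:XX:XX:XX:XX)
fog_boot_url() {
    printf 'http://%s/fog/service/ipxe/boot.php?mac=%s\n' "$1" "$2"
}

# Read `ps -ef` output on stdin and print only lines that contain the
# literal text "udp-sender". Writing the pattern as [u]dp-sender keeps
# grep's own command line from matching.
filter_udp_sender() {
    grep '[u]dp-sender'
}

# Typical use on the FOG server (placeholder IP and MAC, not run here):
#   ps -ef | filter_udp_sender
#   curl -s "$(fog_boot_url 192.168.1.10 00:11:22:33:44:55)"
```

    If the `ps` check prints nothing, udp-sender never started; if the `curl` output does not show a multicast task for that MAC, the host was probably not tasked as expected.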


  • Senior Developer

    This tells me you have only two hosts in the job: --min-receivers 2

    Have those two systems been turned on?

    The “Library is already running PID 7625” is telling you that the task is running. It’s simply waiting for the hosts that are attached to it.
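
    To pull the receiver count out of the log instead of reading it by eye, something like the following might work. It is a rough sketch: `min_receivers` is a hypothetical helper name, and the log path in the usage comment assumes the log lives at /opt/fog/log/multicast.log, which may differ on your install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every --min-receivers value found in the
# text on stdin, one value per line.
min_receivers() {
    # -e is needed so grep does not treat the leading "--" as an option.
    grep -o -e '--min-receivers [0-9][0-9]*' | awk '{print $2}'
}

# Typical use on the FOG server (assumed log path, not run here):
#   grep 'CMD:' /opt/fog/log/multicast.log | tail -n 1 | min_receivers
```

    A CMD line that chains two udp-sender calls, as in the log quoted in this thread, prints the value once per call. udp-sender holds off sending until that many receivers have connected, so the number should match the hosts that are actually powered on and booted into the task.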



  • Any idea about this? I have 1300 HP PCs to reimage in the next month. :(



  • Question – I was able to multicast my laptops. Now, I have HP DC7800 SFFs that will not multicast.

    I tried to make a new group, restart the service. Still running 1.1.1.

    Log below.

    [07-02-14 2:36:58 pm] | Task (32) Library is new!
    [07-02-14 2:36:58 pm] | Task (32) Library image file found.
    [07-02-14 2:36:58 pm] | Task (32) Library client(s) found.
    [07-02-14 2:36:58 pm] | Task (32) Library sending on base port: 55758
    [07-02-14 2:36:58 pm] CMD: cat "/images/WIN7HSLIBRARY/rec.img.000"|/usr/local/sbin/udp-sender --min-receivers 2 --portbase 55758 --interface eth0 --full-duplex --ttl 32 --nokbd;cat "/images/WIN7HSLIBRARY/sys.img.000"|/usr/local/sbin/udp-sender --min-receivers 2 --portbase 55758 --interface eth0 --full-duplex --ttl 32 --nokbd;
    [07-02-14 2:36:58 pm] | Task (32) Library has started.
    [07-02-14 2:37:08 pm] | Task (32) Library is already running PID 7625
    [07-02-14 2:37:18 pm] | Task (32) Library is already running PID 7625
    [07-02-14 2:37:29 pm] | Task (32) Library is already running PID 7625
    [07-02-14 2:37:39 pm] | Task (32) Library is already running PID 7625
    [07-02-14 2:37:49 pm] | Task (32) Library is already running PID 7625
    [07-02-14 2:37:59 pm] | Task (32) Library is already running PID 7625
    [07-02-14 2:38:09 pm] | Task (32) Library is already running PID 7625
    [07-02-14 2:38:19 pm] | Task (32) Library is already running PID 7625
    [07-02-14 2:38:29 pm] | Task (32) Library is already running PID 7625
    [07-02-14 2:38:39 pm] | Task (32) Library is already running PID 7625
    [07-02-14 2:38:49 pm] | Task (32) Library is already running PID 7625



  • Thanks Tom, changing the image to 8 works. Much appreciate all of your hard work.


  • Senior Developer

    I think I know what this issue is, but first, a question:

    Is the image dropdown set to Windows 8.1?

    My guess is that it is, can you try setting it to just plain jane Windows 8 and reattempt?



  • I have the same issue - clean centos 6.5 install, fog 1.1.2.

    The FOG services were not running; however, I have started them and added them as services. I did not see them listed in rc.local or as a service.

    chkconfig --list
    FOGImageReplicator   0:off 1:off 2:on 3:on 4:on 5:on 6:off
    FOGMulticastManager  0:off 1:off 2:on 3:on 4:on 5:on 6:off
    FOGScheduler         0:off 1:off 2:on 3:on 4:on 5:on 6:off

    multicast.log
    [06-26-14 1:53:40 pm] * Checking for new tasks every 10 seconds.
    [06-26-14 1:53:40 pm] * Starting service loop.
    [06-26-14 1:53:40 pm] | Task (30) is new!
    [06-26-14 1:53:40 pm] | Task (30) /images/WIN8 image file found.
    [06-26-14 1:53:40 pm] | Task (30) 1 client(s) found.
    [06-26-14 1:53:40 pm] | Task (30) sending on base port: 51358
    [06-26-14 1:53:40 pm] CMD:
    [06-26-14 1:53:40 pm] | Task (30) has started.
    [06-26-14 1:54:40 pm] | Task (30) is no longer running.

    [06-26-14 1:55:40 pm] * No tasks found!

    This repeats anytime we attempt to run multicast. udp-sender is not running or started.

    We can start the client in debug mode and use udp-receiver along with udp-sender on the server to get it going. It appears that it is bombing out on creating the cmd to start udp-sender but I have not looked into the multicast php files to see if I can spot the problem.

    Any suggestions on where to start looking?

    Thank you.
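
    The empty `CMD:` entry in the log above is the clearest symptom, so a quick check for it can save some reading. The sketch below is only illustrative: `has_empty_cmd` is a hypothetical helper name, and the log path in the usage comment is an assumption that may not match your install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the log text on stdin contains a "CMD:"
# entry with nothing (or only whitespace) after it, otherwise exit 1.
has_empty_cmd() {
    grep -q 'CMD:[[:space:]]*$'
}

# Typical use on the FOG server (assumed log path, not run here):
#   if has_empty_cmd < /opt/fog/log/multicast.log; then
#       echo "multicast manager built an empty udp-sender command"
#   fi
```

    An empty command means udp-sender was never launched, which matches the task being reported as "no longer running" on the next pass.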


  • Senior Developer

    Is the udp-sender command running on the FOG Server?
    [code]ps -ef | grep udp-send[/code]



  • Currently having the same issue: fresh 1.1.1 install, Ubuntu Server 14.04. I’ve reset FOGMulticastManager and the clients. Here is my multicast log.
    Thanks.

    [url="/_imported_xf_attachments/1/1060_Multicast.txt?:"]Multicast.txt[/url]

