Broke Multicast, not sure how.



  • Server
    • FOG Version: 1.3.5-RC-11
    • OS: Ubuntu 14.04
    Description

    I reconfigured the FOG server’s IP address and NIC, moved its network connection, and broke multicast.

    I changed the interface setting in FOG_UDPCAST_INTERFACE, FOG_NFS_ETH_MONITOR and in the Storage Manager.
    I’ve also verified all the settings here show the correct ip:
    https://wiki.fogproject.org/wiki/index.php/Change_FOG_Server_IP_Address

    Unicast works just fine, but Multicast fails. It starts partclone and just hangs there, not doing anything. Going through the steps here:
    https://wiki.fogproject.org/wiki/index.php?title=Troubleshooting_a_multicast

    I try the first test:
    sudo udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint

    Then “udp-receiver” on the client. Doesn’t work, the server doesn’t see the client connection attempt. It doesn’t work even if the client is directly connected to the server by a crossover cable.

    udp-receiver --mcast-rdv-address <fog server ip address>

    If I specify the mcast-rdv-address the test works. This works with the test machine directly connected to the fog server by crossover cable or with the test machine connected to the LAN on all my vlans. So it seems my network connection is fine.

    I’m not finding any incorrect settings. Any pointers?
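One detail about the second test is worth noting: the value passed to --mcast-rdv-address there was the FOG server's own IP address, which is normally not a multicast group at all. IPv4 multicast groups occupy 224.0.0.0/4 (first octet 224 through 239). A minimal sketch for checking which kind of address is in play; the helper name is_multicast_ipv4 and the sample addresses are made up for illustration:

```shell
# Report whether a dotted-quad IPv4 address falls in 224.0.0.0/4.
# is_multicast_ipv4 is a hypothetical helper, not part of FOG or udpcast.
is_multicast_ipv4() {
    first_octet=${1%%.*}
    if [ "$first_octet" -ge 224 ] && [ "$first_octet" -le 239 ]; then
        echo multicast
    else
        echo not-multicast
    fi
}

is_multicast_ipv4 192.168.1.10   # a typical server address (example value)
is_multicast_ipv4 239.1.2.3      # a group in the multicast range (example value)
```

If the test only succeeds when the rendezvous address is not a multicast group, that may indicate that ordinary reachability is fine while multicast delivery still needs checking.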


  • Developer

    @rbaldwin Sounds like you found and fixed the issue. Yes, PIM (modes) is definitely a good keyword. Marking this solved now. Hope that others might stumble upon this and find it helpful as well.



  • @Tom-Elliott

    You are correct, I found the issue imaging across VLANs. Stupid oversight on my part: not all my interfaces had the same PIM mode. So for anyone else having issues imaging across VLANs, pay special attention that your PIM modes are the same on the interfaces in question.

    https://wiki.fogproject.org/wiki/index.php?title=Cisco_Multi_Cast


  • Senior Developer

    @rbaldwin If the systems are separated, then I would suspect you’re right.

    Essentially, it seems as if multicast traffic is not traversing the network.

    mcast-rdv-address might be able to help fix the issue, but it’s not intended to be used in the manner it was used in the past. Essentially, mcast-rdv-address is the “rendezvous” address. This is not a “pointer” address to direct hosts to look at the FOG server; rather, it’s intended to be a “both sides send/receive” to the same point on a network.

    This setting is now adjustable, as needed, right from the gui.
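The rendezvous address is only how sender and receivers find each other. The image data goes to a separate multicast data address, and that is the group the network has to forward between VLANs. According to the udp-sender man page (treat this as an assumption and verify against your installed version), when --mcast-data-address is not given, the address is derived from the sender's IP by keeping its last 27 bits and prepending 232. A rough sketch of that rule, with derive_data_addr as a made-up helper name:

```shell
# Sketch of the documented default (an assumption, see above): keep the
# low 27 bits of the sender's IPv4 address and set the top 5 bits to
# those of 232 (binary 11101). Only the first octet changes, because
# the top 5 bits live there.
# derive_data_addr is a hypothetical helper, not a udpcast command.
derive_data_addr() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$old_ifs
    first=$(( ($1 & 7) | 232 ))
    echo "$first.$2.$3.$4"
}

derive_data_addr 192.168.1.10   # example sender address, not from this thread
```

Knowing the expected group can make it easier to look for it in a router's multicast routing table when checking whether traffic crosses VLANs.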



  • I’ve wiped out my 1.3.5-RC-11 installation and installed the latest stable release, 1.3.4, and it experiences the same issue. I mentioned I changed the IP address: previously I was using FOG on an isolated VLAN with just the FOG server, a 24-port Netgear 10/100/1000 switch and my test machines. With the new connection, the FOG server is on one VLAN and the machines are on another. I’m beginning to think this is the issue.

    I’m configured like this:
    https://wiki.fogproject.org/wiki/index.php?title=Cisco_Multi_Cast



  • Here is a fresh attempt:

    [03-06-17 7:30:08 pm] | Task (30) Multi-Cast Task is new!
    [03-06-17 7:30:08 pm] | Task (30) Multi-Cast Task has been cleaned.
    [03-06-17 7:30:08 pm] | Task (30) /images/base_W5xx_Win764_AUG2016 image file found.
    [03-06-17 7:30:08 pm] | Task (30) Multi-Cast Task 1 client found.
    [03-06-17 7:30:08 pm] | Task (30) Multi-Cast Task sending on base port: 63158.
    [03-06-17 7:30:08 pm] | Command: /usr/local/sbin/udp-sender --interface eth1 --min-receivers 1 --max-wait 600 --portbase 63158 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/base_W5xx_Win764_AUG2016/d1p1.img;
    [03-06-17 7:30:09 pm] | Task (30) Multi-Cast Task has started!
    [03-06-17 7:30:19 pm] | Task (30) Multi-Cast Task is already running with pid: 20332.
    [03-06-17 7:30:29 pm] | Task (30) Multi-Cast Task is already running with pid: 20332.
    [03-06-17 7:30:39 pm] | Task (30) Multi-Cast Task is already running with pid: 20332.

    And it’s running:
    ps -ef | grep 20332
    root 20332 11450 0 13:30 ? 00:00:00 sh -c /usr/local/sbin/udp-sender --interface eth1 --min-receivers 1 --max-wait 600 --portbase 63158 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/base_W5xx_Win764_AUG2016/d1p1.img;
    root 20333 20332 0 13:30 ? 00:00:00 /usr/local/sbin/udp-sender --interface eth1 --min-receivers 1 --max-wait 600 --portbase 63158 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/base_W5xx_Win764_AUG2016/d1p1.img
    root 20459 17105 0 13:31 pts/0 00:00:00 grep --color=auto 20332

    When I boot the host, it just says

    Partclone v0.2.89 http://partclone.org
    Starting to restore image (-) to device (/dev/md126p1)

    The task in the Active Task has a spinner that says “In Progress” and the Active Multicast task says “Queued” and it never starts actually imaging.
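The udp-sender command in the log above shows the interface, port base and TTL that FOG chose for this task. Those values are worth comparing with what the client side uses, since the receiver generally has to listen on the same port base. A small sketch that pulls them out of the logged command; opt_value is a hypothetical helper, and the command string is copied from the log:

```shell
# Print the value that follows a given long option in a command line.
# opt_value is a hypothetical helper; $1 = option name, $2 = command line.
opt_value() {
    printf '%s\n' "$2" | sed -n "s/.*--$1 \([^ ;]*\).*/\1/p"
}

cmd='/usr/local/sbin/udp-sender --interface eth1 --min-receivers 1 --max-wait 600 --portbase 63158 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/base_W5xx_Win764_AUG2016/d1p1.img;'

for opt in interface portbase ttl; do
    echo "$opt=$(opt_value "$opt" "$cmd")"
done
```

A TTL of 32 allows the packets to cross routers in principle, so if they still do not arrive on another VLAN, the multicast routing between the VLANs is a likely place to look.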



  • FOG Multicast log says:
    [03-06-17 5:35:48 pm] | Task (29) Multi-Cast Task is new!
    [03-06-17 5:35:48 pm] | Task (29) Multi-Cast Task has been cleaned.
    [03-06-17 5:35:48 pm] | Task (29) /images/base_W5xx_Win764_AUG2016 image file found.
    [03-06-17 5:35:48 pm] | Task (29) Multi-Cast Task 1 client found.
    [03-06-17 5:35:48 pm] | Task (29) Multi-Cast Task sending on base port: 58318.
    [03-06-17 5:35:48 pm] | Command: /usr/local/sbin/udp-sender --interface eth1 --min-receivers 1 --max-wait 600 --portbase 58318 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/base_W5xx_Win764_AUG2016/d1p1.img;
    [03-06-17 5:35:48 pm] | Task (29) Multi-Cast Task has started!
    [03-06-17 5:35:58 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:36:08 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:36:18 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:36:28 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:36:38 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:36:48 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:36:58 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:37:08 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:37:18 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:37:28 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:37:38 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:37:48 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:37:58 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:38:08 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:38:18 pm] | Task (29) Multi-Cast Task is already running with pid: 12190.
    [03-06-17 5:38:28 pm] | Cleaning 1 task as they have been removed
    [03-06-17 5:38:28 pm] | Task (29) Multi-Cast Task is being cleaned.
    [03-06-17 5:38:28 pm] | Task (29) Multi-Cast Task has been cancelled.

    Eth1 is the correct interface for the connection.
    And here I cancel it because it’s not started… The Image was made with FOG1.3.1 if it matters…
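In the log above, task 29 is reported as "already running" 15 times at 10-second intervals before it is cancelled, which is roughly two and a half minutes and is consistent with the sender waiting for a receiver that never connects. A minimal sketch for counting those lines for a given task; waiting_polls is a made-up helper name, and the log path in the commented example is an assumption based on the path used in the first test of this thread:

```shell
# Count how many times the FOG multicast log reported a task as
# "already running" (logged every 10 seconds in the excerpt above).
# waiting_polls is a hypothetical helper name.
waiting_polls() {
    # $1 = task id, $2 = path to the multicast log
    grep -c "Task ($1) Multi-Cast Task is already running" "$2"
}

# Example (log path is an assumption, adjust to your install):
# polls=$(waiting_polls 29 /opt/fog/log/multicast.log)
# echo "waited roughly $((polls * 10)) seconds"
```

A count that keeps growing while the client sits at the Partclone screen suggests the two sides have not found each other yet.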


  • Senior Developer

    What’s the FOG Multicast log show?

    FOG Configuration Page->Log Viewer->Multicast

