
Multicast problem: FOG 0.32 on CentOS 5.6

  • E
    Emil
    last edited by Apr 30, 2012, 3:33 PM

    Hello.

    I’m having a problem with multicasting on FOG 0.32 with CentOS 5.6 (Final) and Windows 7 machines (Dell OptiPlex GX620).

    It’s a rather strange problem. I’m trying to multicast to 145 machines, and the first small partition multicasts just fine, but when the second partition is about to start, the machines just sit and wait on the “please wait” screen.

    [B]Now the log in /opt/fog/log/ shows the following:[/B]
    [SIZE=2]([COLOR=#ff0000]This is the multicast.log.udpcast.28 log and not the one called only “multicast.log”[/COLOR]) [/SIZE]

    Udp-sender 2007-12-28
    Using mcast address 236.21.238.31
    UDP sender for (stdin) at 172.21.238.31 on eth0
    Broadcasting control to 224.0.0.1
    New connection from 172.21.238.123 (#0) 00000009
    New connection from 172.21.238.156 (#1) 00000009
    New connection from 172.21.238.152 (#2) 00000009
    New connection from 172.21.238.151 (#3) 00000009
    New connection from 172.21.238.136 (#4) 00000009 etc etc…

    [B]Then it shows:[/B]

    Starting transfer: 00000009
    bytes= 97 552 re-xmits=0000001 ( 1.4%) slice=0066 73 709 551 615 - 132
    bytes= 193 648 re-xmits=0000001 ( 0.7%) slice=0066 73 709 551 615 - 131
    bytes= 289 744 re-xmits=0000001 ( 0.5%) slice=0066 73 709 551 615 - 131
    bytes= 385 840 re-xmits=0000001 ( 0.3%) slice=0066 73 709 551 615 - 131
    bytes= 481 936 re-xmits=0000001 ( 0.3%) slice=0066 73 709 551 615 - 131
    bytes= 578 032 re-xmits=0000001 ( 0.2%) slice=0066 73 709 551 615 - 131
    bytes= 674 128 re-xmits=0000001 ( 0.2%) slice=0066 73 709 551 615 - 130 etc etc…

    [B]And then the interesting bits happen:[/B]

    bytes= 25 370 800 re-xmits=0000001 ( 0.0%) slice=0066 73 709 551 615 - 130
    bytes= 25 375 168 re-xmits=0000001 ( 0.0%) slice=0066 73 709 551 615 - 133
    bytes= 25 375 612 re-xmits=0000001 ( 0.0%) slice=0066 73 709 551 615 - 132
    Timeout notAnswered=[2,4,7,8,10,13,14,15,18,19,20,21,63,106,110,111,114,115,118,119,120,121,122,124,127,128,129,131,133,134,135,136,138,139,140,142,143,145] nrAns=108 nrRead=108 nrPart=146 avg=3661
    Disconnecting #24 (172.21.239.10)
    Disconnecting #89 (172.21.238.240)
    Disconnecting #78 (172.21.238.143)
    Disconnecting #28 (172.21.238.100)
    Disconnecting #23 (172.21.238.227)
    Disconnecting #31 (172.21.238.132)
    Disconnecting #25 (172.21.238.111)
    Disconnecting #22 (172.21.238.226) etc etc…

    [B]And then follows:[/B]

    Disconnecting #126 (172.21.238.174)
    Disconnecting #130 (172.21.238.119)
    Disconnecting #131 (172.21.238.194)
    Bad command 0300
    Bad command 0300
    Bad command 0300
    Bad command 0300
    Bad command 0300
    Bad command 0300 etc etc…

    [B]After that I’m getting:[/B]

    Dropping client #2 because of timeout
    Disconnecting #2 (172.21.238.152)
    Dropping client #4 because of timeout
    Disconnecting #4 (172.21.238.136)
    Dropping client #7 because of timeout
    Disconnecting #7 (172.21.238.149)
    Dropping client #8 because of timeout
    Disconnecting #8 (172.21.238.138)
    Dropping client #10 because of timeout
    Disconnecting #10 (172.21.238.205)
    Dropping client #13 because of timeout
    Disconnecting #13 (172.21.238.104) etc etc…

    [B]Almost at the end it says:[/B]

    Dropping client #142 because of timeout
    Disconnecting #142 (172.21.238.120)
    Dropping client #145 because of timeout
    Disconnecting #145 (172.21.238.173)
    Transfer complete.^G
    Disconnecting #0 (172.21.238.123)
    Disconnecting #1 (172.21.238.156)
    Disconnecting #3 (172.21.238.151)
    Disconnecting #5 (172.21.238.140)
    Disconnecting #6 (172.21.238.129)
    Disconnecting #9 (172.21.238.130)
    Disconnecting #12 (172.21.238.207)
    Disconnecting #16 (172.21.238.211)
    Disconnecting #27 (172.21.239.8)

    [B]And then finally:[/B]

    Udp-sender 2007-12-28
    Using mcast address 236.21.238.31
    UDP sender for (stdin) at 172.21.238.31 on eth0
    Broadcasting control to 224.0.0.1

    Any ideas what could be wrong? It complains about timeouts, although all the machines started up within 20 minutes, from the first to the last. After reading [URL=‘http://fogproject.org/forum/threads/multicast-timeout.529/’]this[/URL] post I also checked the UDPSENDER_MAXWAIT setting in Config.php, and it’s set to 0 (so I guess that means it will wait forever and never time anything out).
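
The log excerpts above can be summarized mechanically to see how many clients stopped answering. Below is a minimal Python 3 sketch (not part of FOG or udpcast) that counts connections, unanswered participants, and timeout drops. The regular expressions are inferred only from the lines quoted in this post, and `summarize` and `SAMPLE` are hypothetical names; other udp-sender versions may word these lines differently.

```python
import re

# Patterns inferred from the log lines quoted in this post (an assumption,
# not an official description of the udp-sender log format).
CONNECT_RE = re.compile(r"New connection from (\S+) \(#(\d+)\)")
TIMEOUT_RE = re.compile(
    r"Timeout notAnswered=\[([\d,]*)\] nrAns=(\d+) nrRead=(\d+) nrPart=(\d+)"
)
DROP_RE = re.compile(r"Dropping client #(\d+) because of timeout")


def summarize(log_text):
    """Collect connections, timeout reports and dropped clients from a log."""
    connected = {}   # client number -> IP address
    timeouts = []    # one dict per "Timeout notAnswered=..." line
    dropped = []     # client numbers dropped because of timeout
    for raw in log_text.splitlines():
        line = raw.strip()
        match = CONNECT_RE.search(line)
        if match:
            connected[int(match.group(2))] = match.group(1)
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            ids = [int(n) for n in match.group(1).split(",") if n]
            timeouts.append({
                "not_answered": ids,
                "answered": int(match.group(2)),
                "participants": int(match.group(4)),
            })
            continue
        match = DROP_RE.search(line)
        if match:
            dropped.append(int(match.group(1)))
    return {"connected": connected, "timeouts": timeouts, "dropped": dropped}


# A few lines copied from the excerpts in this post (most lines omitted).
SAMPLE = """\
New connection from 172.21.238.152 (#2) 00000009
New connection from 172.21.238.136 (#4) 00000009
Timeout notAnswered=[2,4,7,8,10,13,14,15,18,19,20,21,63,106,110,111,114,115,118,119,120,121,122,124,127,128,129,131,133,134,135,136,138,139,140,142,143,145] nrAns=108 nrRead=108 nrPart=146 avg=3661
Dropping client #2 because of timeout
Disconnecting #2 (172.21.238.152)
Dropping client #4 because of timeout
Disconnecting #4 (172.21.238.136)
"""

if __name__ == "__main__":
    result = summarize(SAMPLE)
    report = result["timeouts"][0]
    print("connected:", len(result["connected"]))
    print("not answered:", len(report["not_answered"]), "of", report["participants"])
    print("dropped:", result["dropped"])
```

For the quoted Timeout line this reports 38 unanswered participants, which matches nrPart minus nrAns (146 - 108). Running it over the full log file instead of `SAMPLE` would show whether the same clients drop in every session.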

    • B
      BryceZ
      last edited by May 2, 2012, 3:38 PM

      What’s your CPU usage look like? With unicasting FOG pushes a compressed image file, which the client uncompresses as it receives it; with multicast FOG uncompresses the image files and then pushes them out. So it could be an issue of failing to uncompress that second partition. Have you been able to unicast this image? Or maybe multicast to a smaller number of hosts?
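
One way to answer the CPU question while a multicast is running is to compare two samples of the aggregate `cpu` line in /proc/stat on the server. This is a minimal Python 3 sketch under assumptions: the helper names are hypothetical, the two sample lines are made up for illustration, and CentOS 5.6 does not ship Python 3 by default, so tools such as `top` or `vmstat` may be the quicker route there.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal ...
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    total = sum(values[:8])  # guest fields are already counted in user/nice
    return total - idle, total


def cpu_busy_percent(before, after):
    """Percentage of non-idle CPU time between two 'cpu' lines."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / delta_total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()


# Made-up sample lines, only to show the arithmetic.
BEFORE = "cpu  100 0 100 800 0 0 0 0"
AFTER = "cpu  150 0 150 900 0 0 0 0"

if __name__ == "__main__":
    print("CPU busy: %.1f%%" % cpu_busy_percent(BEFORE, AFTER))
```

On the server itself, calling `read_cpu_line()` twice a few seconds apart and passing both lines to `cpu_busy_percent` gives the busy share for that interval. Note that this sketch counts iowait as idle, so a server stuck waiting on its disk would show low CPU here.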

      • E
        Emil
        last edited by May 4, 2012, 7:56 AM

        I’m not able to check the CPU usage at the moment, but I did try to multicast the image to just 2 computers first, and that worked fine.
        I rejoiced and tried all of them again, but the issue came back.

        I then tried sending it to 25 computers. That worked, but it was going to take 6-7 hours: the server would send out around 100-200MB, freeze for around a minute, and then send more… The computer I host FOG on is rather old, so if multicasting is a CPU-heavy job for the FOG server, then your theory might indeed fit what I saw.

        For the fun of it I’m going to get some new fresh hardware and try it on that to see what results that yields.

        Unicasting the image went fine btw 😉

        • B
          BryceZ
          last edited by May 4, 2012, 1:29 PM

          Another thing that slows down multicasts is the performance of the hosts themselves. During a multicast the FOG server sends out the image in chunks and waits for confirmation from each host that it’s ready for the next chunk, so if one host has a bad hard drive or any other issue with receiving and writing the chunk to disk, it’ll slow down the entire session.
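
The point about one slow host holding back everyone can be shown with a toy model. This is only a sketch in Python 3 with made-up numbers: it assumes the sender waits for every host after each chunk, and it ignores the slices, retransmissions, and timeouts that the real udp-sender uses. `session_seconds` and the host names are hypothetical.

```python
def session_seconds(chunk_times_by_host):
    """Total session time when each chunk waits for the slowest host.

    chunk_times_by_host maps a host name to a list with the seconds that
    host needs to receive and write each chunk (same length for all hosts).
    """
    per_chunk = zip(*chunk_times_by_host.values())
    return sum(max(times) for times in per_chunk)


CHUNKS = 100  # hypothetical number of chunks

# 24 healthy hosts that need 1 second per chunk (made-up figure).
healthy = {"host%02d" % i: [1.0] * CHUNKS for i in range(24)}

# The same session plus one host with a slow disk (4 seconds per chunk).
with_slow_host = dict(healthy)
with_slow_host["slow-host"] = [4.0] * CHUNKS

if __name__ == "__main__":
    print(session_seconds(healthy))         # 100.0
    print(session_seconds(with_slow_host))  # 400.0
```

In this model a single host that is four times slower makes the whole session four times longer, no matter how many healthy hosts take part.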

          Copyright © 2012-2024 FOG Project