    Multicast problem: FOG 0.32 on CentOS 5.6

      Emil

      Hello.

      I’m having a problem multicasting with FOG 0.32 on CentOS 5.6 (Final) to Windows 7 machines (Dell OptiPlex GX620).

      It’s a rather strange problem: I’m trying to multicast to 145 machines, and the first small partition multicasts just fine, but when the second partition is about to start it just sits and waits on the “please wait” screen.

      Now the log in /opt/fog/log/ shows the following:
      (This is the multicast.log.udpcast.28 log, not the one called just “multicast.log”.)

      Udp-sender 2007-12-28
      Using mcast address 236.21.238.31
      UDP sender for (stdin) at 172.21.238.31 on eth0
      Broadcasting control to 224.0.0.1
      New connection from 172.21.238.123 (#0) 00000009
      New connection from 172.21.238.156 (#1) 00000009
      New connection from 172.21.238.152 (#2) 00000009
      New connection from 172.21.238.151 (#3) 00000009
      New connection from 172.21.238.136 (#4) 00000009 etc etc…

      Then it shows:

      Starting transfer: 00000009
      bytes= 97 552 re-xmits=0000001 ( 1.4%) slice=0066 73 709 551 615 - 132
      bytes= 193 648 re-xmits=0000001 ( 0.7%) slice=0066 73 709 551 615 - 131
      bytes= 289 744 re-xmits=0000001 ( 0.5%) slice=0066 73 709 551 615 - 131
      bytes= 385 840 re-xmits=0000001 ( 0.3%) slice=0066 73 709 551 615 - 131
      bytes= 481 936 re-xmits=0000001 ( 0.3%) slice=0066 73 709 551 615 - 131
      bytes= 578 032 re-xmits=0000001 ( 0.2%) slice=0066 73 709 551 615 - 131
      bytes= 674 128 re-xmits=0000001 ( 0.2%) slice=0066 73 709 551 615 - 130 etc etc…

      And then the interesting bits happen:

      bytes= 25 370 800 re-xmits=0000001 ( 0.0%) slice=0066 73 709 551 615 - 130
      bytes= 25 375 168 re-xmits=0000001 ( 0.0%) slice=0066 73 709 551 615 - 133
      bytes= 25 375 612 re-xmits=0000001 ( 0.0%) slice=0066 73 709 551 615 - 132
      Timeout notAnswered=[2,4,7,8,10,13,14,15,18,19,20,21,63,106,110,111,114,115,118,119,120,121,122,124,127,128,129,131,133,134,135,136,138,139,140,142,143,145] nrAns=108 nrRead=108 nrPart=146 avg=3661
      Disconnecting #24 (172.21.239.10)
      Disconnecting #89 (172.21.238.240)
      Disconnecting #78 (172.21.238.143)
      Disconnecting #28 (172.21.238.100)
      Disconnecting #23 (172.21.238.227)
      Disconnecting #31 (172.21.238.132)
      Disconnecting #25 (172.21.238.111)
      Disconnecting #22 (172.21.238.226) etc etc…

      And then follows:

      Disconnecting #126 (172.21.238.174)
      Disconnecting #130 (172.21.238.119)
      Disconnecting #131 (172.21.238.194)
      Bad command 0300
      Bad command 0300
      Bad command 0300
      Bad command 0300
      Bad command 0300
      Bad command 0300 etc etc…

      After that I’m getting:

      Dropping client #2 because of timeout
      Disconnecting #2 (172.21.238.152)
      Dropping client #4 because of timeout
      Disconnecting #4 (172.21.238.136)
      Dropping client #7 because of timeout
      Disconnecting #7 (172.21.238.149)
      Dropping client #8 because of timeout
      Disconnecting #8 (172.21.238.138)
      Dropping client #10 because of timeout
      Disconnecting #10 (172.21.238.205)
      Dropping client #13 because of timeout
      Disconnecting #13 (172.21.238.104) etc etc…

      Almost at the end it says:

      Dropping client #142 because of timeout
      Disconnecting #142 (172.21.238.120)
      Dropping client #145 because of timeout
      Disconnecting #145 (172.21.238.173)
      Transfer complete.^G
      Disconnecting #0 (172.21.238.123)
      Disconnecting #1 (172.21.238.156)
      Disconnecting #3 (172.21.238.151)
      Disconnecting #5 (172.21.238.140)
      Disconnecting #6 (172.21.238.129)
      Disconnecting #9 (172.21.238.130)
      Disconnecting #12 (172.21.238.207)
      Disconnecting #16 (172.21.238.211)
      Disconnecting #27 (172.21.239.8)

      And then finally:

      Udp-sender 2007-12-28
      Using mcast address 236.21.238.31
      UDP sender for (stdin) at 172.21.238.31 on eth0
      Broadcasting control to 224.0.0.1

      Any ideas what could be wrong? It complains about timeouts, although all the machines started up within 20 minutes from the first to the last. After looking at this post (http://fogproject.org/forum/threads/multicast-timeout.529/), I also checked the UDPSENDER_MAXWAIT setting in Config.php and see that it’s set to 0 (so I guess that means it will wait forever and never time anything out).
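
A log this long is easier to read once it is tallied. The following is a minimal sketch, not part of FOG or udpcast, that counts the clients that did not answer, the clients dropped for timeout, and the "Bad command" lines. The function name `summarize_udpcast_log` and the `sample` excerpt are hypothetical; the line patterns are assumed from the log quoted above only.

```python
import re

# Patterns assumed from the udp-sender log lines quoted in this post.
TIMEOUT_RE = re.compile(
    r"Timeout notAnswered=\[([\d,]*)\] nrAns=(\d+) nrRead=(\d+) nrPart=(\d+)"
)
DROP_RE = re.compile(r"Dropping client #(\d+) because of timeout")


def summarize_udpcast_log(text):
    """Tally timeouts, dropped clients and 'Bad command' lines in a log."""
    summary = {
        "not_answered": [],   # client numbers listed in the Timeout line
        "answered": None,     # nrAns
        "participants": None, # nrPart
        "dropped": [],        # client numbers dropped because of timeout
        "bad_commands": 0,    # number of "Bad command" lines
    }
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = TIMEOUT_RE.match(line)
        if match:
            ids = match.group(1)
            summary["not_answered"] = [int(n) for n in ids.split(",") if n]
            summary["answered"] = int(match.group(2))
            summary["participants"] = int(match.group(4))
            continue
        match = DROP_RE.match(line)
        if match:
            summary["dropped"].append(int(match.group(1)))
            continue
        if line.startswith("Bad command"):
            summary["bad_commands"] += 1
    return summary


# Shortened excerpt of the log above (the Timeout line is copied in full).
sample = """\
Timeout notAnswered=[2,4,7,8,10,13,14,15,18,19,20,21,63,106,110,111,114,115,118,119,120,121,122,124,127,128,129,131,133,134,135,136,138,139,140,142,143,145] nrAns=108 nrRead=108 nrPart=146 avg=3661
Disconnecting #24 (172.21.239.10)
Bad command 0300
Bad command 0300
Bad command 0300
Dropping client #2 because of timeout
Disconnecting #2 (172.21.238.152)
Dropping client #4 because of timeout
Disconnecting #4 (172.21.238.136)
Dropping client #7 because of timeout
Disconnecting #7 (172.21.238.149)
"""

result = summarize_udpcast_log(sample)
print(len(result["not_answered"]), "of", result["participants"], "did not answer")
print("dropped for timeout:", result["dropped"])
```

On the Timeout line quoted above, this counts 38 unanswered clients out of 146 participants, which is consistent with nrAns=108.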

        BryceZ

        What does your CPU usage look like? When unicasting, FOG pushes a compressed image file, which the client decompresses as it receives it; when multicasting, the FOG server decompresses the image files and then pushes them out. So it could be a problem decompressing that second partition. Have you been able to unicast this image, or multicast it to a smaller number of hosts?
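
One rough way to see whether server-side decompression could be the bottleneck is to time it on the server in question. The sketch below is only an illustration under assumptions: it uses Python's `gzip` module on synthetic data, whereas the real image format and decompression tool on a given FOG install may differ, and the function name `gzip_decompress_rate` is hypothetical.

```python
import gzip
import os
import time


def gzip_decompress_rate(raw_size_mb=4):
    """Return an approximate single-core gzip decompression rate in MB/s.

    Synthetic data only: a 1 KiB random block repeated, so it compresses
    well. Real disk images will behave differently.
    """
    raw = (os.urandom(1024) * 1024) * raw_size_mb  # raw_size_mb MiB of data
    compressed = gzip.compress(raw)

    start = time.perf_counter()
    restored = gzip.decompress(compressed)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero

    if restored != raw:
        raise RuntimeError("round trip failed")
    return (len(restored) / (1024 * 1024)) / elapsed


rate = gzip_decompress_rate()
print(f"decompression rate: {rate:.0f} MB/s (synthetic data)")
```

If the measured rate on an old server comes out well below what the network can carry, that would support the CPU theory; if it is far above, the bottleneck is probably elsewhere.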

          Emil

          I’m not able to check the CPU usage at the moment, but I first tried multicasting the image to 2 computers and that worked fine.
          I rejoiced and tried all of them again, but the issue came back.

          I then tried sending it to 25 computers. That worked, but it was going to take 6-7 hours: the image would send around 100-200 MB, freeze for around a minute, and then send more… The computer I host FOG on is rather old, so if multicasting is a CPU-heavy job for the FOG server, your theory might indeed fit what I saw.

          For the fun of it I’m going to get some new fresh hardware and try it on that to see what results that yields.

          Unicasting the image went fine btw 😉

            BryceZ

            Another thing that slows down multicasts is the performance of the hosts themselves. During a multicast the FOG server sends out the image in chunks and waits for confirmation from each host that it’s ready for the next chunk, so if one host has a bad hard drive or any other problem receiving and writing the chunk to disk, it’ll slow down the entire session.
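
The effect described above can be put into a toy model: because the sender waits for every host, the session runs no faster than the slowest participant. This is a deliberately simplified sketch, not how udp-sender actually schedules slices, and every number and name in it (`session_rate`, `estimated_hours`, the rates, the image size) is hypothetical.

```python
def session_rate(host_write_rates_mb_s, server_send_rate_mb_s):
    """Effective rate of a lock-step session: the slowest party sets the pace."""
    return min(server_send_rate_mb_s, min(host_write_rates_mb_s))


def estimated_hours(image_size_gb, rate_mb_s):
    """Time to push image_size_gb at rate_mb_s, in hours."""
    return image_size_gb * 1024 / rate_mb_s / 3600


# Hypothetical session: 24 healthy hosts and one with a failing disk.
hosts = [40.0] * 24 + [2.0]   # sustained write rates in MB/s (made up)
rate = session_rate(hosts, server_send_rate_mb_s=50.0)
hours = estimated_hours(image_size_gb=40, rate_mb_s=rate)

print(f"effective rate: {rate} MB/s, estimated time: {hours:.1f} h")
```

In this model a single slow host drags a 25-machine session down to its own speed, so removing or replacing that host helps far more than adding server capacity.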

            Copyright © 2012-2024 FOG Project