FOG v1.5.7 on Ubuntu 18.04 random problem multicast

  • T
    tec618
    last edited by Feb 5, 2020, 12:28 PM

    Hello.

    We have a classroom with 30 computers with identical hardware and a multi-boot setup with several partitions. Since version 1.5.7 (we do not know whether the version is the cause), when we multicast to 12 PCs, some PCs do not start deploying the next partition when the task moves from one partition to the next, and the task never finishes on those hosts. It does not always happen on the same PCs; it is random.

    We have reviewed logs, memory, free space, … but we did not find a possible cause.

    What parameter would have to be modified to correct this problem? What can we check?

    Thanks

    • G
      george1421 Moderator @tec618
      last edited by Feb 5, 2020, 12:48 PM

      Cross linking this post: https://forums.fogproject.org/topic/14143/dev-branch-multicast-for-some-hosts-db-not-updated-after-restore/2?_=1580905777518

      Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

      • S
        Sebastian Roth Moderator
        last edited by Feb 5, 2020, 12:52 PM

        @george1421 From what I read between the lines, I don’t think this topic is related. While that was an issue at the end of multicasting, when hosts update the DB, this topic seems to be about an issue when hosts step from one partition to the next.

        @tec618 What do you see on the screen of the hosts that don’t proceed? Do they come up with the blue partclone screen and wait like this forever?

        Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

        Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

        • T
          tec618
          last edited by Feb 5, 2020, 1:35 PM

          Exactly @Sebastian-Roth, the PCs wait on the blue screen forever and do not start the next partition. It does not always occur on the same computer or on the same partition. In this image we have up to nine partitions.

          I can also confirm what @george1421 describes, because it happens in the same classroom.

          • G
            george1421 Moderator @tec618
            last edited by george1421 Feb 5, 2020, 7:41 AM Feb 5, 2020, 1:40 PM

            @tec618 said in FOG v1.5.7 on Ubuntu 18.04 random problem multicast:

            I can also confirm what @george1421 describes, because it happens in the same classroom.

            I can see the logic in that: if there are not enough php-fpm workers to service the requests, the target systems may appear to hang when the server is too late to respond to a client’s request. At this moment I don’t know if it’s one worker per multicast client or if one php-fpm worker can service requests from multiple clients. We haven’t researched it to that level yet.

            How many computers do you have running the FOG client? What is your client check-in timeout?

            I agree that I’m probably off base here, but in the other post your conditions seem similar.

            • S
              Sebastian Roth Moderator
              last edited by Feb 5, 2020, 9:35 PM

              @tec618 said in FOG v1.5.7 on Ubuntu 18.04 random problem multicast:

              It does not always occur on the same computer or on the same partition

              So does that mean it’s not always happening on the first partition but usually on one of the later partitions?

              In this image we have up to nine partitions.

              While FOG should be able to handle that number of partitions, I am still wondering why you have that many. Dual boot Windows/Linux? ChromeOS?

              • T
                tec618
                last edited by Feb 6, 2020, 5:27 PM

                Hello again.
                After modifying the file “www.conf”, I ran another multicast task with 6 identical computers. This is the result:

                • 4 computers completed the task without problems
                • 1 computer failed to update the database (as described in the post: https://forums.fogproject.org/topic/14143/dev-branch-multicast-for-some-hosts-db-not-updated-after-restore/2?_=1580905777518)
                • 1 computer hung when switching to partition 8 (this is the photo)
                  img_fog.jpg

                During the operation, RAM and CPU were sufficient:
                Captura de pantalla de 2020-02-06 13-22-03.png

                Another interesting fact: after the other computers finished deploying, FOG removed the task (while one computer was still stuck on the blue screen)
                img_fog_2.png

                What parameter can we check? What could be the origin of this problem?

                • S
                  Sebastian Roth Moderator
                  last edited by Feb 6, 2020, 5:45 PM

                  @Tom-Elliott Have you ever seen a PC failing to proceed from one partition to the other in multicast?

                  @tec618 What happens if you deploy unicast to that PC that failed to pick up partition 8?

                  • T
                    Tom Elliott @Sebastian Roth
                    last edited by Feb 6, 2020, 5:53 PM

                    @Sebastian-Roth no I’ve never seen this before. While it doesn’t display, does it at least complete? I ask because of the nature of multicast.

                    If you have 10 machines to image and all 10 connect, imaging will proceed immediately. If one of those hosts were shut off after, then imaging would only proceed after a specified timeout, I think we default to 10 minutes.
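The start condition Tom describes can be sketched as a small model. This is a simplified illustration based on how the udpcast documentation describes --min-receivers and --max-wait (the wait timer is assumed to start when the first receiver connects). The function names are hypothetical, and this is not FOG or udpcast code. What happens to a receiver that connects after the transfer has started is also an assumption here: it is simply reported as late.

```python
from typing import List, Optional, Sequence


def sender_start_time(connect_times: Sequence[float],
                      min_receivers: int,
                      max_wait: float) -> Optional[float]:
    """Return when the sender starts (seconds), or None if nobody connects.

    Assumed model: start as soon as min_receivers have connected, otherwise
    start max_wait seconds after the first receiver connected.
    """
    if not connect_times:
        return None
    times = sorted(connect_times)
    deadline = times[0] + max_wait  # timer runs from the first connection
    if len(times) >= min_receivers and times[min_receivers - 1] <= deadline:
        return times[min_receivers - 1]
    return deadline


def late_receivers(connect_times: Sequence[float],
                   min_receivers: int,
                   max_wait: float) -> List[int]:
    """Indexes of receivers that connect only after the sender has started."""
    start = sender_start_time(connect_times, min_receivers, max_wait)
    if start is None:
        return []
    return [i for i, t in enumerate(connect_times) if t > start]


if __name__ == "__main__":
    # Six hosts; the last one needs 14 seconds to get ready (made-up timings).
    arrivals = [0, 1, 2, 2, 3, 14]
    print(late_receivers(arrivals, min_receivers=6, max_wait=10))
    print(late_receivers(arrivals, min_receivers=6, max_wait=60))
```

Under this model, a host that needs longer than the timeout to rejoin is left out of that session, while a longer timeout lets the sender wait for it.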

                    Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG! Get in contact with me (chat bubble in the top right corner) if you want to join in.

                    • S
                      Sebastian Roth Moderator
                      last edited by Feb 6, 2020, 6:32 PM

                      @tec618 Yeah, the timeout mentioned by Tom is a good point. When you see one machine not picking up one of the partitions, do the others sit there and wait for some amount of time as well?

                      The other thing that just came to my mind is checking /var/log/fog/multicast.log. While I don’t expect to see something out of the ordinary in there it’s still worth a try. You will see many lines with “No new tasks found” but at some point there should be a section of logs starting with “Task ID: xxx Name: Multi-Cast Task - yyy is new”. Please post the full block of lines here.
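Pulling that block out of a long log by hand is tedious, so here is a minimal sketch of one way to do it. It assumes the line layout shown in this thread ("Task ID: N Name: ..." lines, plus a "| Command:" line that directly follows a line of the same task). The function name is hypothetical, and repeated "is already running" lines are collapsed into a single summary line.

```python
from typing import Iterable, List


def task_block(lines: Iterable[str], task_id: int) -> List[str]:
    """Keep the log lines of one multicast task, collapsing repeated
    'is already running' lines into a short summary."""
    marker = f"Task ID: {task_id} "  # trailing space avoids matching ID 250 for 25
    kept: List[str] = []
    previous_matched = False  # did the previous log line belong to this task?
    running_repeats = 0
    for raw in lines:
        line = raw.rstrip("\n")
        is_task_line = marker in line
        is_command = "| Command:" in line and previous_matched
        if not (is_task_line or is_command):
            previous_matched = False
            continue
        previous_matched = True
        if "is already running" in line:
            running_repeats += 1
            if running_repeats > 1:
                continue  # keep only the first line of the run
        else:
            if running_repeats > 1:
                kept.append(f"... {running_repeats - 1} more 'is already running' lines ...")
            running_repeats = 0
        kept.append(line)
    if running_repeats > 1:
        kept.append(f"... {running_repeats - 1} more 'is already running' lines ...")
    return kept


if __name__ == "__main__":
    # Path taken from the post above; adjust the task ID to your own task.
    with open("/var/log/fog/multicast.log", encoding="utf-8", errors="replace") as log:
        print("\n".join(task_block(log, task_id=25)))
```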

                      • S
                        shruggy @Sebastian Roth
                        last edited by Feb 6, 2020, 9:15 PM

                        @Sebastian-Roth said in FOG v1.5.7 on Ubuntu 18.04 random problem multicast:

                        @Tom-Elliott Have you ever seen a PC failing to proceed from one partition to the other in multicast?

                        IIRC, I had this same problem once or twice after one of the FOG upgrades, probably half a year ago. I guess that prompted me to try out the dev-branch back then; the problem went away, and I have stayed on the dev-branch since.

                        @tec618 What happens if you deploy unicast to that PC that failed to pick up partition 8?

                        In my case, it never happened in the unicast mode. I have a setup with 4 legacy primary MBR partitions. NTFS (500 MB) - NTFS (237 GB) - ext2 (500 MB) - LVM2 (237 GB). d1.partitions:

                        label: dos
                        label-id: 0x871158f2
                        device: /dev/sda
                        unit: sectors
                        
                        /dev/sda1 : start=        2048, size=     1024000, type=7, bootable
                        /dev/sda2 : start=     1026048, size=   499081216, type=7
                        /dev/sda3 : start=   500107264, size=     1024000, type=83
                        /dev/sda4 : start=   501131264, size=   499083264, type=8e
                        

                        IIRC, the freeze up happened after restoring the 3rd partition.
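As a cross-check of the sizes quoted for this layout, the sector counts in d1.partitions can be converted to MiB. This is a minimal sketch that assumes 512-byte logical sectors (the dump only says "unit: sectors"); the function name is hypothetical.

```python
import re
from typing import Dict

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

# The d1.partitions content quoted in the post above.
DUMP = """\
label: dos
label-id: 0x871158f2
device: /dev/sda
unit: sectors

/dev/sda1 : start=        2048, size=     1024000, type=7, bootable
/dev/sda2 : start=     1026048, size=   499081216, type=7
/dev/sda3 : start=   500107264, size=     1024000, type=83
/dev/sda4 : start=   501131264, size=   499083264, type=8e
"""

PART_LINE = re.compile(r"^(\S+)\s*:\s*start=\s*(\d+),\s*size=\s*(\d+),\s*type=(\w+)")


def partition_sizes_mib(dump: str) -> Dict[str, float]:
    """Map each partition device to its size in MiB."""
    sizes: Dict[str, float] = {}
    for line in dump.splitlines():
        match = PART_LINE.match(line.strip())
        if match:
            device, _start, size, _type = match.groups()
            sizes[device] = int(size) * SECTOR_BYTES / 2**20
    return sizes


if __name__ == "__main__":
    for device, mib in partition_sizes_mib(DUMP).items():
        print(f"{device}: {mib:.0f} MiB ({mib / 1024:.2f} GiB)")
```

Under that assumption the two small partitions come out at exactly 500 MiB and the two large ones at just under 238 GiB, which matches the sizes given in the post.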

                        • T
                          tec618 @Sebastian Roth
                          last edited by Feb 7, 2020, 7:13 AM

                          @Sebastian-Roth said in FOG v1.5.7 on Ubuntu 18.04 random problem multicast:

                          @tec618 What happens if you deploy unicast to that PC that failed to pick up partition 8?

                          If we deploy to that PC in unicast, there is no problem. It does not fail in unicast, only in multicast.

                          • T
                            tec618 @Tom Elliott
                            last edited by Feb 7, 2020, 7:22 AM

                            @Tom-Elliott said in FOG v1.5.7 on Ubuntu 18.04 random problem multicast:

                            @Sebastian-Roth no I’ve never seen this before. While it doesn’t display, does it at least complete? I ask because of the nature of multicast.

                            If you have 10 machines to image and all 10 connect, imaging will proceed immediately. If one of those hosts were shut off after, then imaging would only proceed after a specified timeout, I think we default to 10 minutes.

                            The multicast task does not complete on the computer that fails. The workaround is to turn off the computer and start the task in unicast.
                            We have observed that at the partition change the computers do not wait 10 minutes. That wait only happens when the multicast deployment starts.

                            • T
                              tec618 @Sebastian Roth
                              last edited by Feb 7, 2020, 7:39 AM

                              @Sebastian-Roth These are the lines from the multicast.log file for the multicast task mentioned above:

                              [02-06-20 1:03:55 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is new
                              [02-06-20 1:03:55 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 image file found, file: /images/L4Ene-5
                              [02-06-20 1:03:55 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 6 clients found
                              [02-06-20 1:03:55 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 sending on base port 51530
                              [02-06-20 1:03:55 pm]  | Command: /usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 600 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p1.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p2.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p3.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p4.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p5.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p6.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p7.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p8.img;
                              [02-06-20 1:03:55 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 has started
                              [02-06-20 1:04:05 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is already running with pid: 4610
                              [02-06-20 1:04:15 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is already running with pid: 4610
                              [02-06-20 1:04:25 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is already running with pid: 4610
                              ..... [This line is repeated many times. Always the same.]
                              [02-06-20 2:08:24 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is already running with pid: 4610
                              [02-06-20 2:08:34 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is already running with pid: 4610
                              [02-06-20 2:08:44 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is already running with pid: 4610
                              [02-06-20 2:08:54 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is already running with pid: 4610
                              [02-06-20 2:09:04 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is already running with pid: 4610
                              [02-06-20 2:09:14 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is already running with pid: 4610
                              [02-06-20 2:09:24 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is already running with pid: 4610
                              [02-06-20 2:09:34 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is no longer running
                              [02-06-20 2:09:34 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 has been killed
                              [02-06-20 2:09:44 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is new
                              [02-06-20 2:09:44 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 image file found, file: /images/L4Ene-5
                              [02-06-20 2:09:44 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 6 clients found
                              [02-06-20 2:09:44 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 sending on base port 51530
                              [02-06-20 2:09:44 pm]  | Command: /usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 600 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p1.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p2.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p3.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p4.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p5.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p6.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p7.img;/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 --max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/L4Ene-5/d1p8.img;
                              [02-06-20 2:09:44 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 has started
                              [02-06-20 2:09:54 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 has been completed
                              [02-06-20 2:09:54 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 has been killed
                              [02-06-20 2:09:54 pm]  | Task ID: 25 Name: Multi-Cast Task - 15-16-27al30 is now completed
                              [02-06-20 2:10:04 pm] Task not created as there are no associated tasks
                              [02-06-20 2:10:04 pm] Or there was no number defined for joining session
                              [02-06-20 2:10:04 pm]  * No new tasks found
                              [02-06-20 2:10:14 pm]  * No new tasks found
                              
                              • S
                                Sebastian Roth Moderator
                                last edited by Feb 7, 2020, 1:50 PM

                                @tec618 said:

                                ... /usr/local/sbin/udp-sender --max-wait 600 ... d1p1.img;/usr/local/sbin/udp-sender ... --max-wait 10 ... d1p2.img; ...
                                

                                Good that I asked about the log and had a closer look at it now. We changed the timeouts a long time ago, so I didn’t remember. Currently the first timeout is set to 10 minutes, so that hosts that boot into the multicast quicker than others don’t rush off and leave the others behind. But for subsequent partitions the timeout is only 10 seconds. This was done because when one out of 20 hosts fails on the first partition (for whatever reason), the whole set of hosts would otherwise wait for 10 minutes on each partition.

                                Possibly 10 seconds is too short for some of your machines. Edit the file /var/www/html/fog/lib/service/multicasttask.class.php, line 662, and change the number 10 to 30 (seconds).

                                Cancel all multicast tasks that might still be running and restart the service: systemctl restart FOGMulticastManager
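The per-partition timeouts can be read straight from the "Command:" line in multicast.log, which is also a quick way to confirm that an edited value is in effect after the service restart. A minimal sketch, assuming the command format shown in the log above (one udp-sender call per image file, each terminated by a semicolon); the function name is hypothetical.

```python
import re
from typing import List, Tuple

# One udp-sender call: capture its --max-wait value and its --file argument.
SENDER = re.compile(r"--max-wait (\d+) .*?--file (\S+?);")


def max_waits(command_line: str) -> List[Tuple[str, int]]:
    """Return (image file name, max-wait seconds) for each udp-sender call."""
    return [(path.rsplit("/", 1)[-1], int(wait))
            for wait, path in SENDER.findall(command_line)]


if __name__ == "__main__":
    # Shortened to the first two calls of the command posted above.
    command = (
        "/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 "
        "--max-wait 600 --portbase 51530 --full-duplex --ttl 32 --nokbd "
        "--nopointopoint --file /images/L4Ene-5/d1p1.img;"
        "/usr/local/sbin/udp-sender --interface ens3 --min-receivers 6 "
        "--max-wait 10 --portbase 51530 --full-duplex --ttl 32 --nokbd "
        "--nopointopoint --file /images/L4Ene-5/d1p2.img;"
    )
    for image, wait in max_waits(command):
        print(f"{image}: waits up to {wait} s for receivers")
```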

                                • T
                                  Tom Elliott @Sebastian Roth
                                  last edited by Feb 7, 2020, 2:28 PM

                                  @Sebastian-Roth I made the change in working-1.6. I took a more cautious approach to set to one minute instead of 30 seconds.

                                  • S
                                    Sebastian Roth Moderator
                                    last edited by Feb 7, 2020, 2:56 PM

                                    @Tom-Elliott I am not sure this is a good idea as a default. It means that between every partition the sender needs to wait for at least one minute if one of the multicast clients fails…

                                    • T
                                      Tom Elliott @Sebastian Roth
                                      last edited by Feb 7, 2020, 3:10 PM

                                      @Sebastian-Roth That, I think, is okay. 30 seconds could potentially be too short a time (as the partition gets re-expanded). It’s one minute vs 10 seconds or 30 seconds. I’m just taking a cautious approach. Waiting only one minute between partitions isn’t too much to ask, I think. I can certainly shorten the time. Or we could add a global setting for the admin to select an appropriate time.
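The trade-off discussed here can be put into numbers. The sketch below assumes the worst case, where one client drops out during the first partition and never rejoins, so every later partition waits for the full timeout before the sender starts. The function name is hypothetical, and the 8 partitions match the task in the log posted earlier in this thread.

```python
def extra_wait_seconds(partitions: int, later_wait: int) -> int:
    """Total idle time when one host is missing for every partition after the first."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return (partitions - 1) * later_wait


if __name__ == "__main__":
    for timeout in (10, 30, 60, 600):
        total = extra_wait_seconds(partitions=8, later_wait=timeout)
        print(f"timeout {timeout:>3} s -> {total} s of extra waiting")
```

For 8 partitions this gives 70, 210 and 420 seconds for timeouts of 10, 30 and 60 seconds, compared with 4200 seconds (70 minutes) if every partition used the 10-minute timeout of the first one.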

                                      • S
                                        Sebastian Roth Moderator
                                        last edited by Feb 7, 2020, 3:18 PM

                                        @Tom-Elliott said in FOG v1.5.7 on Ubuntu 18.04 random problem multicast:

                                        Or we could add a global setting for the admin to select an appropriate time.

                                        That would surely be a nice feature for 1.6. 🙂

                                        You are probably right that 1 minute is not asking too much. Though we haven’t seen issues with 10 seconds for many years now. I wouldn’t change that for 1.5.x. The OP is welcome to just manually adjust this.

                                        • T
                                          tec618 @Sebastian Roth
                                          last edited by Feb 10, 2020, 9:07 AM

                                          @Sebastian-Roth @Tom-Elliott thanks for your comments.
                                          It is very possible that this is the reason for our multicast problems. It may also be the reason for the host database update problem at the end of the multicast task (post https://forums.fogproject.org/topic/14143/dev-branch-multicast-for-some-hosts-db-not-updated-after-restore/2?_=1580905777518).
                                          I will change that value to 60 seconds, test it, and report the result.
                                          This new parameter would be a good improvement for the next version 1.6 😉

                                          Copyright © 2012-2024 FOG Project