Partclone Upload Stalling

FOG Problems
    jemerson93
    last edited by Feb 18, 2019, 7:25 PM

    Running into a strange issue with a FOG deployment. Below is the information and issue.

    FOG Version: 1.5.5
    Kernel Version: 4.19.1

    FOG runs on Ubuntu 18.04.1 Desktop in a Hyper-V VM.
    Two separate VMs are created, one per image: a Gen 1 VM for the Legacy image and a Gen 2 VM for the UEFI image. The DHCP server is Server 2016, and the scope options for Legacy (undionly.kpxe) and UEFI (ipxe.efi) are both configured, so FOG boots fine. During image capture, it stalls shortly after starting the upload to /images/dev. First the percentage bar stops and network traffic slowly drops; eventually the whole thing stalls, and both time remaining and time elapsed are frozen. I’m stumped on what could be causing this, as it was working a few weeks ago. The FOG server was re-created last night just to test that. The image below was taken after it had completely frozen.

    ec7ee435-5c24-4bc5-9c26-fecdfb84c52b-image.png

      Sebastian Roth Moderator
      last edited by Feb 18, 2019, 11:02 PM

      @jemerson93 I am wondering if this is related: https://forums.fogproject.org/topic/6695/performance-decrease-using-hyper-v-win10-clients

      We should have that patch in all our kernels, but there is a slight chance that I missed adding it to the 4.19.1 kernel. Can you please update to 4.19.6 and see if that makes a difference?

      Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

      Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

        jemerson93 @Sebastian Roth
        last edited by Feb 19, 2019, 11:29 PM

        Hi Sebastian,

        To give an update on this…

        I updated the kernel and I was able to capture the UEFI image. Below is a copy of an error I am receiving when trying to capture the legacy image.

        e7135ff6-b9a4-4123-ae4f-cdfe6256cc23-image.png

        I verified I am not maxing out any resources on the host.

          george1421 Moderator @jemerson93
          last edited by Feb 20, 2019, 1:53 AM

          @jemerson93 It would be interesting to take the kernel in the other direction and see whether 4.15.2 has the same error. We’ve seen quite a few CPU stalls recently. Maybe Sebastian’s patch took care of them. But downgrading the kernel also seems to mask the issue.

          Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

            Sebastian Roth Moderator
            last edited by Sebastian Roth Feb 20, 2019, 2:39 AM Feb 20, 2019, 8:38 AM

            @jemerson93 said in Partclone Upload Stalling: “I updated the kernel and I was able to capture the UEFI image.”

            So you are saying the speed issue and stall is gone with 4.19.6 kernels? If that is the case I should look into re-building the 4.19.1 kernels and make sure the patch is included as well.

            About the rcu_sched message: Please search the forums for this and see what you can find out.

              jemerson93 @Sebastian Roth
              last edited by Feb 20, 2019, 3:53 PM

              Hi Sebastian,

              With the newest kernel, the stall and the speed issue seem to be resolved. The first upload took an extremely long time, but the second upload was much, much faster.

              I’ll try 4.15.2 and see if I can upload the legacy image.

                jemerson93
                last edited by Feb 20, 2019, 4:08 PM

                Trying the 4.15.2 kernels, I get the following…

                6dd182a6-1dab-4feb-83e9-bad50f0b8753-image.png

                  jemerson93 @george1421
                  last edited by Feb 20, 2019, 7:22 PM

                  Hi George,

                  Just to give an update, I tried that kernel and it goes into a kernel panic. I’ve also tried various other kernels (4.19, 4.18, etc.), and I either hit the kernel panic or eventually the rcu_sched stall.

                  Stumped on what I could do to get this image captured.

                    george1421 Moderator @jemerson93
                    last edited by Feb 20, 2019, 8:18 PM

                    @jemerson93 It is very strange to see a kernel panic like that. I’ve only experienced that when I’ve mixed the arch of the inits with the arch of the kernel (e.g. booting bzImage (64 bit kernel) but having init_32.xz (32 bit inits) for the virtual disk).

                    I see you are running this on a Hyper-V VM in BIOS mode. Are you getting the same results with a physical machine? Maybe it’s something in Hyper-V land causing this CPU stall. Can you provide details on the hypervisor version you are using (e.g. Windows 10 1803 with Hyper-V loaded)?

                      jemerson93 @george1421
                      last edited by Feb 20, 2019, 8:45 PM

                      Hi George,

                      The host running FOG (and our images) is Windows Server 2016 Standard 1607 with Hyper-V loaded. All that we are running on this server is a few VMs (FOG, Win10-UEFI, Win10-Legacy, and our DHCP server).

                      To give another example, in another location we run Proxmox as our hypervisor. FOG and all of our images (we house many more images there) are created in Proxmox, and we are currently having no issues deploying or capturing images. That FOG is version 1.5.5 and the kernel version is 4.19.1.

                      This VM specifically is running in BIOS mode (as I am trying to upload a legacy-only image for older workstations). We seem to be able to deploy to physical workstations and virtual machines perfectly fine. Capturing seems to have no issue except on this VM running in Gen 1 (BIOS mode).

                        george1421 Moderator @jemerson93
                        last edited by Feb 20, 2019, 8:51 PM

                        @jemerson93 While it doesn’t help you at the moment, can we say that the CPU stall and the other Linux kernel strangeness are related to Hyper-V under Windows Server 2016?

                        I have a Hyper-V server running Windows Server 2016 Datacenter in our backup hot site. This is for spinning up our Veeam images. The point is, I haven’t ever messed with Hyper-V (sorry, I’m a VMware guy), but I’m willing to see if I can create a VM and duplicate what you see. The first step is to see whether my server behaves the same way as yours. Then we can see if this is a common problem with Linux running under Hyper-V.

                          jemerson93 @george1421
                          last edited by Feb 21, 2019, 5:07 PM

                          Hi George,

                          Perfect. If you can, let’s see whether you get the same issue. I’m also going to spin up a VM at home (I have Hyper-V and Proxmox on 2 servers at my house) and I’ll see if I can replicate the issue.

                            Sebastian Roth Moderator
                            last edited by Feb 22, 2019, 7:32 AM

                            @jemerson93 Can you please do me a favor testing-wise? I just updated the 4.19.1 kernels to include the Hyper-V patch as well. Can you please downgrade to that kernel version on your FOG server and test again? The hard stall shouldn’t be happening with this version anymore.

                            About the other issue: I am looking forward to hearing what you guys find out testing this. Maybe we can report that to the kernel developers once we find out exactly which version is causing this.

                              jemerson93 @Sebastian Roth
                              last edited by Feb 22, 2019, 11:13 PM

                              Hi Sebastian,

                              Unfortunately, I still got the stall. Note that I did get the legacy image to upload on 4.19.6 yesterday. It was extremely slow but did upload. I’m trying again right now on 4.19.6.

                              d9b6b2a8-525f-4b75-a26a-328ddb9dd6f1-image.png

                              I am still setting it up at my home to test as well.

                                Sebastian Roth Moderator
                                last edited by Feb 22, 2019, 11:31 PM

                                @jemerson93 I meant the stall from the initial post.

                                  jemerson93 @Sebastian Roth
                                  last edited by Feb 22, 2019, 11:41 PM

                                  Hi Sebastian,

                                  I did not see the stall from the initial post. Now I am just running into the self-detected stall.

                                    jemerson93 @Sebastian Roth
                                    last edited by Feb 23, 2019, 12:09 AM

                                    I also attempted 4.19.6 and got the same error. I’ll chalk it up as luck that I got it uploaded, but now I can’t re-upload.

                                      Sebastian Roth Moderator
                                      last edited by Sebastian Roth Feb 23, 2019, 4:26 AM Feb 23, 2019, 10:25 AM

                                      @jemerson93 It’s very interesting that we see this or similar issues on many different platforms - hardware as well as virtualized environments. There does not seem to be a general answer, and these kinds of things have been around for years - when you search for rcu_sched you find tons of messages in kernel-related mailing lists and forums.

                                      Still, reports of this have become more frequent in the FOG forums lately, and we are not sure why yet. Moving back to the 4.15.2 kernel has helped most people. We have seen issues with the newer init files not being compatible with this kernel, so I would ask you to manually download those more compatible inits (64 bit and 32bit). Rename/back up the ones you have in /var/www/html/fog/service/ipxe and put the downloaded ones in place (named init.xz and init_32.xz). Then downgrade to the 4.15.2 kernel and see if you still get the ugly error you had before when trying to downgrade the kernel.
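
The backup-and-replace step can be sketched in Python. This is only an illustration under stated assumptions: the function name `swap_inits`, the `.bak` suffix, and the `downloaded_dir` argument are hypothetical, and downloading the replacement inits is not shown. On a real FOG server the target directory would be /var/www/html/fog/service/ipxe, and the script would likely need root permissions.

```python
import shutil
from pathlib import Path

# File names given in the post; everything else here is illustrative.
INIT_NAMES = ("init.xz", "init_32.xz")


def swap_inits(ipxe_dir, downloaded_dir, backup_suffix=".bak"):
    """Rename the current inits to backups, then copy the replacements in.

    Returns the list of backup paths that were created.
    """
    ipxe_dir = Path(ipxe_dir)
    downloaded_dir = Path(downloaded_dir)

    # Validate everything up front so a failure cannot leave the
    # directory half-swapped.
    for name in INIT_NAMES:
        if not (downloaded_dir / name).is_file():
            raise FileNotFoundError(f"replacement not found: {downloaded_dir / name}")
        if (ipxe_dir / (name + backup_suffix)).exists():
            raise FileExistsError(f"backup already exists for {name}")

    backups = []
    for name in INIT_NAMES:
        current = ipxe_dir / name
        if current.exists():
            backup = ipxe_dir / (name + backup_suffix)
            current.rename(backup)  # keep the old init so it can be restored
            backups.append(backup)
        shutil.copy2(downloaded_dir / name, current)
    return backups


# Example call (paths are assumptions, adjust to your setup):
# swap_inits("/var/www/html/fog/service/ipxe", "/tmp/compatible-inits")
```

To roll back, rename the `.bak` files to their original names again.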

                                      If that doesn’t work, then I expect this particular issue to be a problem with Hyper-V and the Linux kernel option PAGE_TABLE_ISOLATION, i.e. the Meltdown patch (ref). That would be kind of strange, though, as we have this option enabled in all later kernel versions. Nevertheless, you can try going back to even older versions of the kernel and init. Find those used in FOG 1.5.0 here (kernel 4.13.4).
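
To check whether a particular kernel build has this option enabled, one could parse its build configuration. A minimal sketch, assuming the config is available as text (for example from /proc/config.gz or a /boot/config-* file, which the FOG kernels may or may not provide); the helper name `kernel_option_state` and the sample config below are hypothetical.

```python
def kernel_option_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...).

    Returns None when the option is explicitly "not set" or is absent.
    The option may be given with or without the CONFIG_ prefix.
    """
    name = option if option.startswith("CONFIG_") else "CONFIG_" + option
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {name} is not set":
            return None
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None


# Hypothetical excerpt of a kernel .config, for illustration only.
sample_config = """\
CONFIG_X86_64=y
CONFIG_PAGE_TABLE_ISOLATION=y
# CONFIG_EXAMPLE_OPTION is not set
"""

pti = kernel_option_state(sample_config, "PAGE_TABLE_ISOLATION")
print("PAGE_TABLE_ISOLATION:", pti)
```

Comparing this value across the kernel versions being tested would show whether the option actually differs between a working and a failing kernel.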

                                      Besides that, I have tried to find current information specific to Hyper-V. Not much I could find, really. https://access.redhat.com/solutions/3743631 (anyone who has access to RedHat stuff? @Wayne-Workman?)

                                        Wayne Workman @Sebastian Roth
                                        last edited by Wayne Workman Feb 23, 2019, 8:23 AM Feb 23, 2019, 2:22 PM

                                        @Sebastian-Roth I don’t have access to their articles anymore, but when I did have access at a previous job, I found that if I googled enough I could find the same stuff elsewhere.

                                        Has anyone checked for a duplicate IP yet, for the FOG server and for the VMs?

                                        Daily Clean Installation Results:
                                        https://fogtesting.fogproject.us/
                                        FOG Reporting:
                                        https://fog-external-reporting-results.fogproject.us/

                                          jemerson93 @Sebastian Roth
                                          last edited by Feb 27, 2019, 10:03 PM

                                          Hi Sebastian,

                                          My apologies for the late response. I backed up the old inits, then downloaded the ones you requested and moved them into place. I then downgraded to the 4.15.2 kernel. I did not get the kernel panic, and the legacy image uploaded successfully. It took much longer than usual (about three and a half hours, as opposed to 30 minutes in the past), but there was no CPU stall or initial network stall.

                                          Copyright © 2012-2024 FOG Project