    Imaging Jobs Freezing

    FOG Problems
      atarone
      last edited by

      Server
      • FOG Version: 1.4.2
      • OS: Ubuntu Server 16.04.2 LTS
      Description

      Recently, when I image computers, the imaging job randomly hangs or freezes. The timers on the client stop, as do those on the server. No errors or warnings are displayed, and none appear in the server’s log files. The GUI is still usable and the server is still pingable. The only errors I can find are in the Apache Error Log on the FOG GUI. Below is what is present:

      [Mon Jun 05 10:01:46.164311 2017] [php7:warn] [pid 4300] [client 192.168.150.19:52636] PHP Warning: file_get_contents(/sys/class/net/bonding_masters/operstate): failed to open stream: No such file or directory in /var/www/fog/status/bandwidth.php on line 82
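One possible reading of this warning, offered as an assumption rather than a confirmed diagnosis: on Linux, /sys/class/net/bonding_masters is a plain file that the bonding driver creates to list bond devices. It is not an interface directory, so it has no operstate entry to read. A minimal shell sketch that lists link states while skipping such non-directory entries (the function name list_operstates and its optional base-directory argument are hypothetical, added only for illustration):

```shell
#!/bin/sh
# Print "<interface> <operstate>" for each network interface.
# Entries under the base directory that are not directories
# (for example the bonding_masters file) are skipped.
list_operstates() {
    base="${1:-/sys/class/net}"
    for entry in "$base"/*; do
        # -d follows symlinks, which is how sysfs exposes interfaces
        [ -d "$entry" ] || continue
        [ -r "$entry/operstate" ] || continue
        printf '%s %s\n' "$(basename "$entry")" "$(cat "$entry/operstate")"
    done
}

# Usage: list_operstates
```

If the bond itself is of interest, its state is normally reported under /proc/net/bonding/ rather than through an operstate file for bonding_masters.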

      I was running version 1.4.0 when the issue started and I have since upgraded to 1.4.2 which did not resolve the issues. Has anyone ever seen this before?

      Thanks,

      Anthony

        george1421 Moderator
        last edited by

        If you use a second target computer do you get the same results?

        We have seen target computers with bad hard drives cause imaging to fail.

        Also if you get an error message on the target computer, please snap a clear picture of it with a mobile and post it here. The context of the error is almost as important as the error itself.

        Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

          atarone
          last edited by

          Imaging the second target computer also fails. I also tried some different hard drives without any change. Still no errors on the target computer unfortunately.

          Thanks,

          Anthony

            atarone
            last edited by

            @george1421 I have tried three different hard drives on three different computers and on all three the imaging jobs freeze. When I cancel the job on the FOG server GUI, the frozen part clone screen is still displayed on the target. Please see the picture below:

            0_1496780590228_20170606_161845.jpg

            What would cause target to lock up? I have verified that there is still network connectivity during imaging via pings to both the server and the target computer and I have verified that the drives are healthy. Any ideas on where to go from here?

            Thanks,

            Anthony

              george1421 Moderator @atarone
              last edited by

              @atarone Is your FOG server a physical one? The error you posted in your OP mentions a bonding master. Are you running a LAG between your FOG server and your network switch?

              What it’s doing is confusing me a bit; it should not halt during a transfer. It sounds like access to the NFS service on the FOG server is breaking. If I were at your location there are a few things I might try to find out what is going on, but they would be a bit hard to describe.

              Does it only freeze with this specific computer or all computers?

                atarone @george1421
                last edited by

                @george1421 Thanks George! The server is a physical server. I do have a Cisco EtherChannel set up on the switch going to the FOG server. I have not tried undoing that yet. It freezes on all computers I try to image. This is very strange. Let me know what you might try. I am willing to do a remote session with you if you think it will help.

                Thanks,

                Anthony

                  george1421 Moderator @atarone
                  last edited by

                  @atarone Let’s take down that LAG group. For testing, let’s keep it simple: just a single GbE link. We have to start eliminating where the issue isn’t to find out what’s left.

                    atarone @george1421
                    last edited by

                    @george1421 I took down the LAG and connected one cable to a different port on our switch, and I am still getting the same issue. The target freezes after about 1 minute. Let me know what you think.

                    Thanks,

                    Anthony

                      george1421 Moderator @atarone
                      last edited by

                      @atarone Well, let’s see if we can identify what we know so far. Please correct any assumptions I’ve made.

                      1. Different images will pause/freeze going to the same target.
                      2. The same image will freeze going to different targets.
                      3. Both the FOG server and target remain “on the net” and are pingable
                      4. You can reset the process by rebooting the target computer.
                      5. It’s a physical server (no intervening hypervisor to deal with).
                      6. We’ve ruled out any strangeness with the LAG.

                      I just thought of a process (not a solution) to test your system. It’s a bit out there, but it will tell us whether FOS itself is freezing or whether it is still operational and only partclone is freezing on us.

                      1. Schedule a deployment to this target computer, but select the debug check box before submitting the task.
                      2. PXE boot the target computer. You will see a few pages of commands on the target computer, just press enter a few times to get past them.
                      3. On the target computer you should be dropped to a command prompt
                      4. At that command prompt, key in ip addr show and record the IP address of the FOS system.
                      5. Give root a password in FOS with passwd and use a simple password like hello. Don’t worry: since FOS runs entirely from memory, this change is gone after a reboot.
                      6. Now that you know the IP address of the target computer and have set root’s password, you should be able to connect to the target computer using PuTTY (from a Windows computer).
                      7. Connect to the target computer using PuTTY and leave the session open.
                      8. Now, back on the console of the target computer, key in the master script called fog.
                      9. At each step in the process the script will pause, waiting for an Enter keypress. Do this until partclone freezes.
                      10. Once partclone freezes, go back to your PuTTY session, key in ls -la /images, and see if you get a response.

                      This will tell us if the target computer can still reach the images stored on the FOG server. Once we know this we can choose a direction.
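Steps 4, 5 and 10 above come down to a handful of commands. A rough sketch, assuming the timeout utility is available in the FOS debug shell with the -s option (not confirmed in this thread) and using a hypothetical helper name, check_dir_responsive. Wrapping the listing in a timeout is meant to keep the SSH session from hanging if the NFS mount has stalled:

```shell
# On the target console (FOS debug prompt):
#   ip addr show      # note the target's IP address
#   passwd            # set a temporary root password
#   fog               # start the imaging script, pressing Enter at each pause

# From the SSH (PuTTY) session once partclone appears frozen:
check_dir_responsive() {
    dir="$1"
    secs="${2:-10}"
    # SIGKILL is used because a process blocked on a stalled NFS mount
    # typically does not react to gentler signals.
    if timeout -s KILL "$secs" ls -la "$dir" >/dev/null 2>&1; then
        echo "responsive: $dir"
    else
        echo "no response or error within ${secs}s: $dir"
        return 1
    fi
}

# Usage: check_dir_responsive /images 10
```

A "responsive" result would suggest the target can still reach the images on the FOG server; anything else points back at the network path or the NFS service.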

                        atarone @george1421
                        last edited by

                        @george1421 The only assumption that is incorrect is resetting by reboot does not work. When you reboot the target, it continues to try and boot off of the hard drive.

                        I will try the process you outlined and get back with you.

                        Thanks,

                        Anthony

                          george1421 Moderator @atarone
                          last edited by

                          @atarone said in Imaging Jobs Freezing:

                          incorrect is resetting by reboot does not work

                          I guess what I was getting at is that you can continue imaging if you only reset the target computer. Rebooting the fog server is not required to reimage (any) computer again.

                            atarone @george1421
                            last edited by

                            @george1421 The imaging never continues. I can cancel the job in the GUI, re-schedule the task, reboot target and it will start, but lock up around the same time again. Sorry for the confusion.

                            Thanks,

                            Anthony

                              Tom Elliott @atarone
                              last edited by

                              @atarone Is it ONLY this system, or multiple systems having the issue?

                              Sorry if this was already answered, I have been quite busy this week.

                              Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG! Get in contact with me (chat bubble in the top right corner) if you want to join in.

                              Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                              Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                                atarone @Tom Elliott
                                last edited by

                                @Tom-Elliott No worries, it has been crazy here too. This is happening on multiple systems.

                                  atarone @george1421
                                  last edited by

                                  @george1421
                                  @Tom-Elliott I followed these steps, and when PartClone freezes I lose SSH connectivity to the target and pings to it time out. I checked the switch that it is connected to, and the port stays up and error free. Changing cables makes no difference. Could we be hitting a bug or driver error? Please let me know your thoughts.
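To pin down exactly when the target drops off the network relative to the freeze, a timestamped reachability log kept on another machine can help. A minimal sketch; the function name log_reachability and the PROBE_CMD override are hypothetical, and the default ping flags assume the iputils version of ping on Linux:

```shell
# Probe a host once per second and print a timestamped up/down line.
# PROBE_CMD may be overridden; by default a single ping with a 1 s wait is used.
log_reachability() {
    host="$1"
    count="${2:-60}"
    i=0
    while [ "$i" -lt "$count" ]; do
        if ${PROBE_CMD:-ping -c 1 -W 1} "$host" >/dev/null 2>&1; then
            state=up
        else
            state=down
        fi
        printf '%s %s %s\n' "$(date '+%H:%M:%S')" "$host" "$state"
        i=$((i + 1))
        sleep 1
    done
}

# Usage: log_reachability <target-ip> 300
```

Comparing the first "down" timestamp with the moment the partclone counters stop would show whether the network loss comes before or after the freeze.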

                                  Thanks,

                                  Anthony

                                    sudburr
                                    last edited by

                                    I don’t see you actually say that you’ve tried using more than the one image.

                                    And have you tried manually copying the image from the server to another hard drive?

                                    [ Standing in between extinction in the cold and explosive radiating growth ]

                                      atarone @sudburr
                                      last edited by

                                      @sudburr This issue occurs with different images on different devices. I am using this device/image combination because it is the smallest image and the most critical one I have. I can copy images via SCP from the server to my workstation.

                                      Thanks,

                                      Anthony

                                        george1421 Moderator @atarone
                                        last edited by george1421

                                        @atarone Sorry I’ve been unavailable almost all day.

                                        OK, so your target is “lunching-out”. You lose your SSH session and the system is unpingable. So it’s sounding like the FOS kernel is crashing or there is a network issue.

                                        Is your network 100% GbE, including the link to the workstation?

                                        FOS is a multi-tasking, multi-user OS. You should not be able to take it down. A single thread or task may freeze but the OS should keep running. A hardware issue will take down a multi-tasking OS.

                                        In your picture you are deploying to an NCR device. Are these the only devices you are deploying to?

                                        I can say the test I set up did cover all of the bases. It didn’t give us an answer other than that the OS is freezing.

                                          atarone @george1421
                                          last edited by

                                          @george1421 Not a problem. I have been out most of the day myself. Yes, once PartClone freezes I lose all connectivity to the target. We are GbE with the exception of the NCR kiosk, which I think is only 10/100. But other images I deploy to other PCs are GbE all the way through and I still have the issue. I am using the NCR because it is the most critical at this point and it’s the smallest image, so it is easier to troubleshoot with.

                                            george1421 Moderator @atarone
                                            last edited by

                                            @atarone Well, this is a bit challenging. I have to think it’s something in your environment because (to this point) no one else has reported this issue.

                                            I have two thoughts on this.

                                            1. Put the target computer on the same switch as the FOG server for testing. This will (should) eliminate any off core switch networking issues.
                                            2. It’s still not clear in my mind that FOS is actually freezing. What we do know is that the console session is locked because partclone is waiting for data, and that the network interface went offline because you can’t communicate with it.

                                            With a traditional Linux OS in command-line mode there are multiple consoles enabled and you can switch between them using the Alt-Fx keys (I think). In the AM I’ll boot FOS into debug mode to see if FOS supports multiple consoles. If I can switch to another console then we might be able to gain access to a command prompt. If that’s the case then FOS is running and just the network subsystem went offline. I’m not sure what that will tell us other than that it’s not a FOS-specific issue.
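If a second console does turn out to be reachable, the kernel log is one place to look for evidence that the network interface went offline. A sketch under that assumption; the helper name filter_net_events is hypothetical, and the patterns are examples of messages Linux network drivers commonly emit, not messages seen in this thread:

```shell
# Read kernel log text on stdin and keep lines that hint at NIC trouble.
filter_net_events() {
    grep -i -E 'link is down|link is not ready|NETDEV WATCHDOG|transmit queue [0-9]+ timed out'
}

# Usage on the target console:
#   dmesg | filter_net_events
#   ip link show        # check whether the interface still reports UP
```

No matching lines would not rule out a network problem, but a watchdog or link-down message close to the time of the freeze would point toward the NIC or its driver.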

                                            Copyright © 2012-2024 FOG Project