    Failing to image after VPN drop between FOG Node & primary server

    • netbootdisk

      Hi All,

      Been using FOG in a multi-site setup for the past year quite successfully. Some 800+ jobs have been kicked off.

      The main FOG server is located in our data centre and we have a local storage node at each site.

      Recently during a unicast clone session at one site, the VPN link dropped out whilst imaging was taking place. When these machines finished imaging (from the local storage node) they then bombed out with an error message because the primary FOG server couldn’t be contacted. (Sorry I don’t recall what this message was!)

      Once the VPN was back up, the jobs were cancelled. That deployment still worked OK, but the couple of machines we have since attempted to reimage are not working. The machines still PXE boot into the FOG OS, but instead of running partclone they appear to skip straight past it and reboot. FOG then thinks the job completed successfully.

      My guess is some temporary file/flag is set somewhere still? Any ideas of where I should start looking to clean this up?

      EDIT: Running FOG 1.2.0 too 🙂

      • Tom Elliott

        The only place I’m aware of to “clean up” tasks is the task page. Since the start of the 1.x.x series, we no longer build files to drive PXE booting; we use database values directly. My only guess is that the database may have become corrupted when the VPN link dropped.

        You can try repairing your database tables. Make your life a little easier and install phpMyAdmin. Log in to your database and select the FOG database. It should show a list on the left containing all of the FOG tables. Scroll to the bottom and choose “Check all”. From the drop-down, choose “Repair” and let it run.

        It’s only a guess. It doesn’t necessarily mean there is a problem with the DB, but it’s where I would start.

        • netbootdisk

          Hi Tom,
          I’ve tried that and it’s no different.

          Looking at the Apache log on the primary server, I see the following when I PXE boot a problematic machine and then select the ‘quick image’ task:

          10.1.1.110 - - [28/Oct/2014:12:59:59 +1000] "POST /fog/service/ipxe/boot.php HTTP/1.1" 200 609 "-" "iPXE/1.0.0+ (3a02)"
          10.1.1.110 - - [28/Oct/2014:12:59:59 +1000] "POST /fog/service/ipxe/boot.php HTTP/1.1" 200 948 "-" "iPXE/1.0.0+ (3a02)"
          10.1.1.110 - - [28/Oct/2014:13:00:08 +1000] "POST /fog/service/inventory.php HTTP/1.1" 200 299 "-" "Wget"
          10.1.1.110 - - [28/Oct/2014:13:00:08 +1000] "GET /fog/service/Pre_Stage1.php?mac=78:45:c4:2f:61:09&type=down HTTP/1.1" 200 300 "-" "Wget"
          10.1.1.110 - - [28/Oct/2014:13:00:08 +1000] "GET /fog/service/Post_Stage3.php?mac=78:45:c4:2f:61:09&type=down HTTP/1.1" 200 297 "-" "Wget"

          Update:
          The ImagingLog DB table shows an entry for that machine ID with identical start and finish times, and the TaskLog table shows State 3 and 4 entries with identical times as well.
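
For anyone wanting to spot the same symptom across many hosts, here is a minimal Python sketch that measures the gap between the `Pre_Stage1.php` and `Post_Stage3.php` requests for each MAC address in an Apache access log. It assumes the log uses the combined format shown in the excerpt above; the function names (`parse_line`, `stage_gaps`) are made up for this example and are not part of FOG. A gap of a few seconds or less would suggest the imaging step never really ran, but that interpretation is an assumption, not documented FOG behaviour.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Matches the leading fields of an Apache "combined" format log line.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Two lines copied from the excerpt in the post above.
SAMPLE = [
    '10.1.1.110 - - [28/Oct/2014:13:00:08 +1000] "GET /fog/service/Pre_Stage1.php'
    '?mac=78:45:c4:2f:61:09&type=down HTTP/1.1" 200 300 "-" "Wget"',
    '10.1.1.110 - - [28/Oct/2014:13:00:08 +1000] "GET /fog/service/Post_Stage3.php'
    '?mac=78:45:c4:2f:61:09&type=down HTTP/1.1" 200 297 "-" "Wget"',
]

def parse_line(line):
    """Return (timestamp, script name, MAC or None), or None if the line does not match."""
    # Forum software often turns straight quotes into curly ones; undo that first.
    line = line.strip().replace("\u201c", '"').replace("\u201d", '"')
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    parts = urlsplit(m.group("url"))
    script = parts.path.rsplit("/", 1)[-1]
    mac = parse_qs(parts.query).get("mac", [None])[0]
    return ts, script, mac

def stage_gaps(lines, start="Pre_Stage1.php", end="Post_Stage3.php"):
    """Map each MAC to the seconds between its start request and its end request."""
    started, gaps = {}, {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, script, mac = parsed
        if mac is None:
            continue
        if script == start:
            started[mac] = ts
        elif script == end and mac in started:
            gaps[mac] = (ts - started.pop(mac)).total_seconds()
    return gaps

if __name__ == "__main__":
    for mac, seconds in stage_gaps(SAMPLE).items():
        print(mac, seconds)
```

To run it against a real log, pass the open file instead of `SAMPLE`, for example `stage_gaps(open("/var/log/apache2/access.log"))`; the path is the usual Debian/Ubuntu default and may differ on your server.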

          • netbootdisk

            Also, if I tell it to run a Memtest task, that works fine. It just seems to skip the imaging step entirely.

            • Junkhacker (Developer)

              Do those computers have partitions on the drives?

              • netbootdisk

                Ah man… :eek:

                Just ran GParted over it and it said there was no partition table. I created one with GParted and rebooted, and the machine then imaged fine. That also explains why a brand-new (replacement) drive didn’t work.

                I’d just assumed that FOG would restore all that from the image too (it’s a multi-partition image).
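
A quick way to check for this condition without GParted is to read the first sector of the disk and look at the MBR partition table. The Python sketch below is only an illustration under some assumptions: it covers classic MBR disks only (a GPT disk would show a single protective entry of type 0xEE), the helper names are invented for this example, and the device path in the usage note is hypothetical. It is not how FOG itself inspects the disk.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446             # first of the four 16-byte partition entries
BOOT_SIGNATURE = b"\x55\xaa"   # bytes 510-511 of a valid MBR

def mbr_partitions(sector):
    """Return (type, start LBA, sector count) for each non-empty MBR entry."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        return []  # no MBR signature, so treat it as having no partition table
    entries = []
    for i in range(4):
        raw = sector[TABLE_OFFSET + 16 * i:TABLE_OFFSET + 16 * (i + 1)]
        # Entry layout: status, CHS start, type, CHS end, LBA start, sector count.
        _status, _chs_first, ptype, _chs_last, lba_start, count = struct.unpack(
            "<B3sB3sII", raw
        )
        if ptype != 0:  # type 0 marks an unused slot
            entries.append((ptype, lba_start, count))
    return entries

def read_first_sector(path):
    """Read the first 512 bytes of a disk or image file (usually needs root)."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)
```

Usage would look like `mbr_partitions(read_first_sector("/dev/sda"))`, where `/dev/sda` stands in for whatever the target disk is called on your system. An empty list means no usable MBR partition table was found.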

                • netbootdisk

                  The odd thing is that the problem lab was still booting Win7 fine despite GParted showing a blank partition table! Back to reimaging now 🙂
