Imaging Jobs Freezing

atarone

@george1421 I will get a lab environment set up and report back what I find. Just for the sake of argument, I did perform a capture task with the NCR using all of the normal settings, and the capture task was successful.

Thanks,

Anthony

atarone

@george1421 I tried re-installing FOG without any success. I have a lab environment at home so I will re-create and test there. The only difference will be that the FOG server will be a VM instead of physical. I will report later next week what I find.

Thanks,

Anthony

atarone

@george1421
@Tom-Elliott
I set up a FOG server in a test environment. Different switch, different server, etc., and the imaging jobs are still freezing and locking up. What could be causing the client to lock up?

Thanks,

Anthony

Sebastian Roth

@atarone Reading all the things you have already tried I think we need to tackle this from a different angle to get a step forward. So I modified the most current init.xz image to spawn a second virtual terminal. Usually FOS does not spawn a second one because there is no need to. In this case I thought this might be helpful for the first time.

On your test FOG server, download init.xz here and put it in /var/www/fog/service/ipxe (move the original out of the way).
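A minimal sketch of the swap described above, for anyone who wants the original kept safe for a later rollback. The function name `swap_init`, the `.orig` suffix and the download location are assumptions for illustration; only the `/var/www/fog/service/ipxe` directory comes from this post.

```shell
# Hypothetical helper: put a replacement init.xz in place and keep the original.
# $1 = ipxe directory, $2 = path to the downloaded replacement (assumed location).
swap_init() {
    ipxe_dir=$1
    new_init=$2
    # Back up the original once; a second run must not overwrite that backup.
    if [ -f "$ipxe_dir/init.xz" ] && [ ! -e "$ipxe_dir/init.xz.orig" ]; then
        mv "$ipxe_dir/init.xz" "$ipxe_dir/init.xz.orig" || return 1
    fi
    cp "$new_init" "$ipxe_dir/init.xz"
}

# Example (as root on the FOG server; the download path is an assumption):
#   swap_init /var/www/fog/service/ipxe "$HOME/Downloads/init.xz"
# Roll back later with:
#   mv /var/www/fog/service/ipxe/init.xz.orig /var/www/fog/service/ipxe/init.xz
```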

Then schedule a deploy task for your test client and boot it up. It should boot just as it used to, and while FOG sets things up for imaging you should be able to switch between virtual terminals one and two with the well-known keys Ctrl+Alt+F2 and Ctrl+Alt+F1. On the second terminal, hit ENTER once and you get a shell. Here you are free to run commands while FOG is doing its thing on VT1.

I’d start with getting VT2 ready but then switching back to VT1 and watching partclone. After freezing, are you still able to switch back to VT2? If yes, what do you see when running dmesg for example? If you cannot switch back to VT2 then … (I am still thinking about what to do next then!).

george1421

@Sebastian-Roth said in Imaging Jobs Freezing:

Usually FOS does not spawn a second one because there is no need to. In this case I thought this might be helpful for the first time.

Really nice, we could have used that for the switch console test, to see if the target was freezing or just the session was hanging. Thanks!!

atarone

@Sebastian-Roth
@george1421 So I updated init.xz with the one created by Sebastian-Roth and now images deploy, but Windows no longer boots after imaging. So I did a capture job, thinking that the image was just messed up. It did the capture but returned errors trying to update the database using the username “fog”. I tested that login and it is still the default. What password is it trying to use, and could this be the root cause of my deployments freezing?

Thanks,

Anthony

Sebastian Roth

@atarone said in Imaging Jobs Freezing:

So I updated init.xz with the one created by Sebastian-Roth and now images deploy

This all sounds totally weird! Is this in your small simple test environment? The only thing I did was download the most current init.xz and add this one line to /etc/inittab:

tty2::askfirst:-/bin/bash
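For anyone repeating this on their own init image, here is a rough sketch of adding that line without duplicating it. It assumes the image's root filesystem is already unpacked or mounted somewhere writable (how to unpack and repack init.xz is not shown here); the function name `add_vt2` and the example mount point are made up for illustration.

```shell
# Hypothetical helper: append the VT2 entry to an inittab file exactly once.
# $1 = path to the inittab inside the unpacked init image.
add_vt2() {
    inittab=$1
    entry='tty2::askfirst:-/bin/bash'
    # -x matches whole lines, -F treats the entry as a fixed string.
    if ! grep -qxF -e "$entry" "$inittab"; then
        printf '%s\n' "$entry" >> "$inittab"
    fi
}

# Example (the mount point is an assumption):
#   add_vt2 /mnt/init/etc/inittab
```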

From my point of view it’s impossible that this change is actually solving your freezing problem!

but Windows no longer boots after imaging.

Which error do you see? Please post a picture!

It did the capture but returned errors trying to update the database using the username “fog”. I tested that login and it is still the default. What password is it trying to use and could this be the root cause of my deployments freezing?

Check this wiki page about database password. Again I highly doubt that this could cause the freezing!

atarone

@Sebastian-Roth
@george1421

This has been the weirdest issue since day one haha. This is all being done in the production environment. I got the password stuff sorted out. Below is what the target is doing. This occurs anytime FOG turns over booting to the OS.

Sebastian Roth

@atarone I am not really sure where this is taking us. Would you mind testing with the provided init.xz file (second virtual terminal) over and over till you run into the freezing issue again. Then see if you can still switch terminals. If freezes don’t occur anymore then you might want to go back to the init.xz you had before and try again. Maybe there was something screwed with that initrd you had. Would be good to rule that out.

I am not sure about your other issue. It looks like deploy is not working properly. Is this a completely new image you uploaded freshly? Could you please run a debug deploy task on this client again (schedule a normal deploy but tick the checkbox for debug)? When you get to the shell, run fdisk -l /dev/sda and post a clear picture of that here.

atarone

@Sebastian-Roth Below are the screenshots from fdisk -l.

I will be running the tests shortly with old init.xz and will get back to you.

Thanks,

Anthony

Sebastian Roth

@atarone So the partition table on sda (first HD) looks pretty straightforward. I don’t think there is much that can go wrong. Why did you run fdisk -l /dev/sda1? Just out of curiosity, or on purpose? There shouldn’t be a partition table within the first partition, I reckon. But maybe that’s just a coincidence?!

Can you please post a picture (or text output) when running the following commands on your FOG server:

ls -al /images/Vertix
cat /images/Vertix/d1.partitions
cat /images/Vertix/d1.minimum.partitions
cat /images/Vertix/d1.fixed_size_partitions
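If text output is easier to post than a picture, the four commands above could be wrapped roughly like this so each file is labelled in the result. The function name `dump_image_meta` and the output file name are assumptions; the file names inside the image directory are the ones listed above.

```shell
# Hypothetical helper: print the directory listing and the partition
# metadata files of one image, each under its own header line.
# $1 = image directory.
dump_image_meta() {
    image_dir=$1
    ls -al "$image_dir"
    for name in d1.partitions d1.minimum.partitions d1.fixed_size_partitions; do
        printf '== %s ==\n' "$name"
        cat "$image_dir/$name"
    done
}

# Example (the output file name is an assumption):
#   dump_image_meta /images/Vertix > vertix-meta.txt 2>&1
```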

atarone

@Sebastian-Roth The deployments started freezing again with the new init.xz and the old one.

Below are the screen captures you requested:

ls -al Vertix
total 9526116
drwxrwxrwx 2 fog root 4096 Jun 15 11:07 .
drwxrwxrwx 11 fog root 4096 Jun 15 11:07 ..
-rwxrwxrwx 1 root root 1 Jun 15 10:21 d1.fixed_size_partitions
-rwxrwxrwx 1 root root 512 Jun 15 10:21 d1.mbr
-rwxrwxrwx 1 root root 132 Jun 15 10:21 d1.minimum.partitions
-rwxrwxrwx 1 root root 15 Jun 15 10:21 d1.original.fstypes
-rwxrwxrwx 1 root root 0 Jun 15 10:21 d1.original.swapuuids
-rwxrwxrwx 1 root root 9754708186 Jun 15 11:07 d1p1.img
-rwxrwxrwx 1 root root 132 Jun 15 10:21 d1.partitions

cat /images/Vertix/d1.partitions
label: dos
label-id: 0x708be90c
device: /dev/sda
unit: sectors

/dev/sda1 : start= 63, size= 78134424, type=7, bootable

cat /images/Vertix/d1.minimum.partitions
label: dos
label-id: 0x708be90c
device: /dev/sda
unit: sectors

/dev/sda1 : start= 63, size= 24918072, type=7, bootable

cat /images/Vertix/d1.fixed_size_partitions

fogadmin@INC-FOG01:~$

I think I have found part of it given this output. Please let me know.

Thanks,

Anthony

Sebastian Roth

@atarone said in Imaging Jobs Freezing:

The deployments started freezing again with the new init.xz and the old one.

Are you able to switch between virtual terminal one and two (as described earlier) when deployment freezes?

The numbers in the outputs you posted look pretty ok to me. I can’t see where things are going wrong here yet.
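One way to sanity-check those numbers is to convert the sector counts to sizes. This is only a sketch and assumes 512-byte sectors, which the dumps do not state (they only say "unit: sectors"); the function names are made up for illustration.

```shell
# Assumption: 512-byte logical sectors.
SECTOR_BYTES=512

# Convert a sector count to bytes and to whole MiB (rounded down).
sectors_to_bytes() { echo $(( $1 * SECTOR_BYTES )); }
sectors_to_mib()   { echo $(( $1 * SECTOR_BYTES / 1048576 )); }

sectors_to_bytes 78134424   # d1.partitions size          -> 40004825088
sectors_to_bytes 24918072   # d1.minimum.partitions size  -> 12758052864
sectors_to_mib 78134424     # -> 38151
sectors_to_mib 24918072     # -> 12167
```

Under that assumption the full partition is about 40.0 GB and the minimum size about 12.8 GB, while d1p1.img is 9754708186 bytes, so the image file is smaller than both.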

I think I have found part of it given this output. Please let me know.

What do you mean by that?

atarone

@Sebastian-Roth “I think I have found part of it given this output. Please let me know.” I thought that the last output capture being blank may have been a problem, but you said all looks good to you. I am unable to switch between the VTs.

Thanks,

Anthony

Sebastian Roth

@atarone said in Imaging Jobs Freezing:

I am unable to switch between the VTYs.

So the client really seems to fully freeze. What if you hit caps lock by the way. Does the LED on the keyboard change state when it hangs? Just want to make sure…

Sebastian Roth

As well, could you please try this: Boot the client into deploy task using the new init.xz as normal. As FOG starts to prepare the disk for imaging (before the blue partclone screen) switch to VT2 (Ctrl+Alt+F2) and run this command: tail -f /var/log/messages. Just let it sit there. You should see all (kernel) messages coming in. Maybe this will give us a hint on what’s causing the hang. Please take a picture and upload here.

Unfortunately you can’t see when it freezes while you are in VT2 but you can run a ping from another machine to check if the client is still alive…
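A rough sketch of that ping check, run from another Linux machine, that prints one timestamped line per probe so the moment the client stops answering is visible. The function name `watch_client`, the `PROBE` and `INTERVAL` variables and the ping options are assumptions (`-c 1 -W 1` is the Linux iputils syntax for one probe with a one-second timeout; other ping implementations differ).

```shell
# Hypothetical helper: probe a host repeatedly and log whether it answered.
# $1 = client address, $2 = number of probes.
# PROBE    = command used to test reachability (default: Linux ping, assumed).
# INTERVAL = seconds to wait between probes (default: 1).
watch_client() {
    host=$1
    tries=$2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if ${PROBE:-ping -c 1 -W 1} "$host" >/dev/null 2>&1; then
            state=alive
        else
            state=NO-REPLY
        fi
        printf '%s %s %s\n' "$(date '+%H:%M:%S')" "$host" "$state"
        i=$((i + 1))
        sleep "${INTERVAL:-1}"
    done
}

# Example (replace <client-ip> with the address of the client being imaged):
#   watch_client <client-ip> 600
```

A run of NO-REPLY lines starting at some point would suggest the client stopped responding on the network at that time, which could then be compared with the last kernel message seen on VT2.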

atarone

@Sebastian-Roth Do you have a VT2 version of init_32.xz? I can only use the NCRs, and they are 32-bit.

Sebastian Roth

@atarone Yes, no problem. Find a fresh version of both 32 bit and 64 bit in the same place.

@Tom-Elliott What do you think about adding a virtual terminal to the official initrds? Would this use too many resources on the clients for no reason?

Tom Elliott

@Sebastian-Roth I don’t think it would. I think it’s just the access to those terminals can be rather limited, which is all the more reason I added the openssh utils. Anybody can remote in much easier than have a device that’s having issues right next to them the whole time.

Using the openssh elements also allows us (devs and whatnot) to SSH in remotely and look at the machine too.

Pair the postinit scripts with a means to associate the root password and you don’t even, fully, need debug mode to test things (though I’ll admit you’d be strained for time to get information).

Sebastian Roth

@Tom-Elliott You are absolutely right about SSH being the more advanced method to get access to such a client. But in this case when network connection is lost or it actually freezes it’s quite handy I suppose. On the other hand, I see that we never ever had such a case yet. So maybe just leave it.
