Imaging Jobs Freezing
- FOG Version: 1.4.2
- OS: Ubuntu Server 16.04.2 LTS
Recently, when I image computers, the imaging job randomly hangs or freezes. The timers on the client all stop, as do those on the server. No errors or warnings are displayed on screen, and none appear in the server’s log files. The GUI is still usable and the server still responds to ping. The only errors I can find are in the Apache error log shown in the FOG GUI. Below is what is present:
[Mon Jun 05 10:01:46.164311 2017] [php7:warn] [pid 4300] [client 192.168.150.19:52636] PHP Warning: file_get_contents(/sys/class/net/bonding_masters/operstate): failed to open stream: No such file or directory in /var/www/fog/status/bandwidth.php on line 82
I was running version 1.4.0 when the issue started and I have since upgraded to 1.4.2 which did not resolve the issues. Has anyone ever seen this before?
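The Apache warning above is about reading an operstate file under an entry named bonding_masters. On Linux that entry is normally a plain file created by the bonding driver, not an interface directory, so the warning by itself may well be unrelated to the freeze. A minimal sketch for checking which entries actually expose operstate follows; the function name list_operstate and its optional directory argument are made up for illustration and are not part of FOG:

```shell
# List every entry under a sysfs-style net directory and show its operstate,
# or note that the entry has no operstate file (e.g. bonding_masters).
# $1: directory to scan (defaults to /sys/class/net; the argument only exists
#     so the function can be tried against a scratch directory).
list_operstate() {
    base_dir="${1:-/sys/class/net}"
    for entry in "$base_dir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$entry" ] || continue
        name="${entry##*/}"
        if [ -r "$entry/operstate" ]; then
            printf '%s %s\n' "$name" "$(cat "$entry/operstate")"
        else
            printf '%s (no operstate)\n' "$name"
        fi
    done
}
```

Running list_operstate with no argument on the FOG server should show which names are real interfaces and which are not.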
@atarone Before or after it freezes??
@Sebastian-Roth Correct. I am not able to switch when using CTRL-ALT-Fx.
@atarone You mean you aren’t able to switch even when the imaging did not freeze yet? Yeah, Ctrl+Alt+Fx is the key…
@Sebastian-Roth Thanks! What is the key combination to switch VTY lines? I am unable to switch them using CTRL-ALT-Fx or any other combination of those keys.
@Tom-Elliott You are absolutely right about SSH being the more advanced method to get access to such a client. But in this case when network connection is lost or it actually freezes it’s quite handy I suppose. On the other hand, I see that we never ever had such a case yet. So maybe just leave it.
@Sebastian-Roth I don’t think it would. I think it’s just the access to those terminals can be rather limited, which is all the more reason I added the openssh utils. Anybody can remote in much easier than have a device that’s having issues right next to them the whole time.
Using the openssh elements also allows us (devs and whatnot) to remote in over SSH and look at the machine too.
Pair the postinit scripts with a means to associate the root password and you don’t even, fully, need debug mode to test things (though I’ll admit you’d be strained for time to get information).
@Tom-Elliott What do you think about adding a virtual terminal to the official initrds? Would this use too much resources on the clients for no reason?
@Sebastian-Roth Do you have a VT2 version for init_32.xz? I can only use the NCR machines, and they are 32-bit.
Also, could you please try this: Boot the client into a deploy task using the new init.xz as normal. As FOG starts to prepare the disk for imaging (before the blue partclone screen), switch to VT2 (Ctrl+Alt+F2) and run this command:
tail -f /var/log/messages
Just let it sit there. You should see all (kernel) messages coming in. Maybe this will give us a hint on what’s causing the hang. Please take a picture and upload it here.
Unfortunately you can’t see when it freezes while you are in VT2 but you can run a ping from another machine to check if the client is still alive…
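The ping check from another machine could be scripted roughly as below, so that the moment the client stops answering gets a timestamp. This is only a sketch: the function names probe_ping and watch_until_down are invented here, and the ping options assume the Linux iputils version of ping.

```shell
# Default probe: one ICMP echo request with a 2 second timeout
# (options as in Linux iputils ping).
probe_ping() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Poll a host until the probe fails, then print when that happened.
# $1: host or IP of the client being imaged
# $2: seconds to wait between probes (default 1)
# $3: name of the probe function or command (default probe_ping)
watch_until_down() {
    host="$1"
    interval="${2:-1}"
    probe="${3:-probe_ping}"
    while "$probe" "$host"; do
        sleep "$interval"
    done
    printf '%s stopped responding at %s\n' "$host" "$(date '+%H:%M:%S')"
}
```

For example, watch_until_down 192.0.2.10 (a placeholder address, replace it with the client’s IP) keeps running while the client answers and prints one line once it stops. If the client never freezes, the loop runs until interrupted with Ctrl+C.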
I am unable to switch between the VTYs.
So the client really seems to fully freeze. By the way, what happens if you hit Caps Lock? Does the LED on the keyboard change state when it hangs? Just want to make sure…
@Sebastian-Roth “I think I have found part of it given this output. Please let me know.” I thought that last output capture being blank might have been a problem, but you said everything looks good to you. I am unable to switch between the VTYs.
The deployments started freezing again with the new init.xz and the old one.
Are you able to switch between virtual terminal one and two (as described earlier) when deployment freezes?
The numbers in the outputs you posted look pretty ok to me. I can’t see where things are going wrong here yet.
I think I have found part of it given this output. Please let me know.
What do you mean by that?
@Sebastian-Roth The deployments started freezing again with the new init.xz and the old one.
Below are the screen captures you requested:
ls -al Vertix
drwxrwxrwx 2 fog root 4096 Jun 15 11:07 .
drwxrwxrwx 11 fog root 4096 Jun 15 11:07 ..
-rwxrwxrwx 1 root root 1 Jun 15 10:21 d1.fixed_size_partitions
-rwxrwxrwx 1 root root 512 Jun 15 10:21 d1.mbr
-rwxrwxrwx 1 root root 132 Jun 15 10:21 d1.minimum.partitions
-rwxrwxrwx 1 root root 15 Jun 15 10:21 d1.original.fstypes
-rwxrwxrwx 1 root root 0 Jun 15 10:21 d1.original.swapuuids
-rwxrwxrwx 1 root root 9754708186 Jun 15 11:07 d1p1.img
-rwxrwxrwx 1 root root 132 Jun 15 10:21 d1.partitions
/dev/sda1 : start= 63, size= 78134424, type=7, bootable
/dev/sda1 : start= 63, size= 24918072, type=7, bootable
I think I have found part of it given this output. Please let me know.
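To make the two size= values in the output above easier to compare, a small helper can turn sector counts into MiB. This is a rough sketch that assumes 512-byte sectors (the thread does not state the sector size), and the name sectors_to_mib is made up for illustration:

```shell
# Convert a sector count to whole MiB (rounded down).
# $1: number of sectors, e.g. the size= value from a partition line
# $2: bytes per sector (default 512, which is an assumption here)
sectors_to_mib() {
    sectors="$1"
    sector_bytes="${2:-512}"
    echo $(( sectors * sector_bytes / 1048576 ))
}
```

Under that assumption, sectors_to_mib 78134424 prints 38151 and sectors_to_mib 24918072 prints 12167, so the two layouts describe roughly 37 GiB and 12 GiB partitions.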
@atarone So the partition table on sda (first HD) looks pretty straightforward. I don’t think there is much that can go wrong. Why did you run fdisk -l /dev/sda1? Just out of curiosity or on purpose? There shouldn’t be a partition table within the first partition, I reckon. But maybe that’s just a coincidence?!
Can you please post a picture (or text output) when running the following commands on your FOG server:
ls -al /images/Vertix
cat /images/Vertix/d1.partitions
cat /images/Vertix/d1.minimum.partitions
cat /images/Vertix/d1.fixed_size_partitions
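If it is easier to paste text than to post pictures, the same commands can be wrapped so each file’s content is labelled in a single output. A minimal sketch, with dump_image_meta being a name invented here rather than anything shipped with FOG:

```shell
# Print the directory listing and the partition metadata files of one image,
# each file preceded by a header line so the output can be pasted as text.
# $1: image directory on the FOG server, e.g. /images/Vertix
dump_image_meta() {
    image_dir="$1"
    ls -al "$image_dir"
    for name in d1.partitions d1.minimum.partitions d1.fixed_size_partitions; do
        printf '== %s ==\n' "$name"
        cat "$image_dir/$name"
    done
}
```

Calling dump_image_meta /images/Vertix > vertix-meta.txt would collect everything in one file (the output file name is just an example).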
@Sebastian-Roth Below are the screenshots from fdisk -l.
I will be running the tests shortly with old init.xz and will get back to you.
@atarone I am not really sure where this is taking us. Would you mind testing with the provided init.xz file (second virtual terminal) over and over until you run into the freezing issue again? Then see if you can still switch terminals. If the freezes don’t occur anymore, then you might want to go back to the init.xz you had before and try again. Maybe there was something screwed up with that initrd you had. It would be good to rule that out.
I am not sure about your other issue. It looks like deploy is not working properly. Is this a completely new image you uploaded freshly? Could you please run a debug deploy task on this client again (schedule a normal deploy but tick the checkbox for debug)? When you get to the shell, run
fdisk -l /dev/sda
and post a clear picture of the output here.
This has been the weirdest issue since day one haha. This is all being done in the production environment. I got the password stuff sorted out. Below is what the target is doing. This occurs anytime FOG turns over booting to the OS.
So I updated init.xz with the one created by Sebastian-Roth and now images deploy
This all sounds totally weird! Is this in your small simple test environment? The only thing I did was download the most current init.xz and add this one line to /etc/inittab:
From my point of view it’s impossible that this change is actually solving your freezing problem!
but Windows no longer boots after imaging.
Which error do you see? Please post a picture!
It did the capture but returned errors trying to update the database using the username “fog”. I tested that login and it is still the default. What password is it trying to use and could this be the root cause of my deployments freezing?
Check this wiki page about database password. Again I highly doubt that this could cause the freezing!
@george1421 So I updated init.xz with the one created by Sebastian-Roth and now images deploy, but Windows no longer boots after imaging. So I did a capture job, thinking that the image was just messed up. It did the capture but returned errors trying to update the database using the username “fog”. I tested that login and it is still the default. What password is it trying to use and could this be the root cause of my deployments freezing?