Fog hangs while trying to upload
-
Hi all. I’ve been using Fog for a couple of years now and it has been fantastic. At least once a month I download my Windows “template” image and update it so we don’t have to spend hours running Windows updates after imaging.
This time on upload it gets to the step for “Setting permission on /images…” and just hangs. Thought maybe I had a corrupt image so I restored a backup and tried again with same issue.
Running Fog 1.5.9 on CentOS with latest kernel.
Any ideas would be awesome.
Thanks
-
@Scootframer Can you confirm you’re using the latest FOG version in the dev-branch branch?
How full is your /images partition/drive? Is it a local drive on your server or an NFS mount? Does /images/a4bb6d84ebf4 exist on your server?
-
@lukebarone Thanks for the response. I have 3.9GB available on this Dell Server.
I remoted in to my Images folder and noticed a dev directory. Under that directory there is indeed an a4bb6d84ebf4 folder, but it is empty. Is this supposed to be there?
Not sure how to check the version in the branch? I am pretty new to Linux but enjoying the heck out of it so far!
-
@Scootframer On your server, navigate to the folder you downloaded FOG to and run
git checkout dev-branch
Then re-run the ./bin/installfog.sh script to install the latest version.
If you only have 3.9 GB of free space, I’m going to guess that it’s running out of space before it can set the permissions (but someone else would have to verify that for me). The partitions are supposed to be saved into that dev folder until the capture finishes; then the folder gets moved over and renamed.
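The free-space and local-versus-NFS questions can both be answered from one df call. A minimal sketch, assuming GNU df (for the -T option); the function name fs_summary is made up for illustration and is not part of FOG:

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): print "<fstype> <available KiB>"
# for the filesystem that holds the given path.
#   -P  POSIX output format, one line per filesystem
#   -k  sizes in 1 KiB blocks
#   -T  add the filesystem type column (GNU df extension, an assumption here)
fs_summary() {
    # Line 1 is the header; on line 2, field 2 is the type and
    # field 5 is the available space.
    df -PkT "$1" | awk 'NR == 2 { print $2, $5 }'
}
```

Calling fs_summary /images on the server prints the filesystem type (for example xfs for a local volume, or nfs/nfs4 for a network mount) followed by the free space in KiB, which makes a GB/TB mix-up easy to spot.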
-
@lukebarone I believe I typed 3.9GB instead of 3.9TB. My apologies.
-
@Scootframer Oh good!
Anyway, I would try the latest dev-branch and see if the issue persists. If it does, then we would start looking at the logs on the server.
-
@lukebarone Thanks for all the assistance on this issue. When I run ./installfog.sh it runs through until I get a message that the account “fogproject” already exists. I tried to run userdel fogproject, as it recommends, and the message says “user fogproject is currently used by process 2643”.
-
@Scootframer To verify what it is first, run
ps aux | grep 2643
Then run
kill -9 2643
to forcefully kill it.
-
@lukebarone I finally have the latest dev-branch installed. Had my fingers crossed as I uploaded my Template image. It still gets stuck setting permission on that /images/a4bb6d84ebf4.
-
@Scootframer Please run the following commands on your FOG server console and post the output here:
lsblk
mount
-
NAME                MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                   8:0    0 931.5G  0 disk
├─sda1                8:1    0   200M  0 part /boot/efi
├─sda2                8:2    0     1G  0 part /boot
└─sda3                8:3    0 930.3G  0 part
  ├─centos-root     253:0    0    20G  0 lvm  /
  ├─centos-swap     253:1    0  15.8G  0 lvm  [SWAP]
  ├─centos-home     253:2    0    10G  0 lvm  /home
  └─centos-images   253:3    0   3.6T  0 lvm  /images
sdb                   8:16   0 931.5G  0 disk
└─sdb1                8:17   0 931.5G  0 part
  └─centos-images   253:3    0   3.6T  0 lvm  /images
sdc                   8:32   0 931.5G  0 disk
└─sdc1                8:33   0 931.5G  0 part
  └─centos-images   253:3    0   3.6T  0 lvm  /images
sdd                   8:48   0 931.5G  0 disk
└─sdd1                8:49   0 931.5G  0 part
  └─centos-images   253:3    0   3.6T  0 lvm  /images
sr0                  11:0    1  1024M  0 rom
[root@d9fogserver ~]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=16367536k,nr_inodes=4091884,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuacct,cpu)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_prio,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=22,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=8889)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
/dev/mapper/centos-images on /images type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/sda2 on /boot type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
/dev/mapper/centos-home on /home type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=3275884k,mode=700)
[root@d9fogserver ~]#
-
@Scootframer Can you also post the output of the following?
ls -la /images
-
@lukebarone I thought I would try to upload again from a different computer. It hung at the same spot but with a different image name.
total 8
drwxrwxrwx. 19 fogproject root 4096 Feb  6 08:13 .
dr-xr-xr-x. 20 root       root 4096 Feb 27 15:31 ..
drwxrwxrwx.  2 fogproject root  206 Oct 21 11:25 21H2GoldSophos
drwxrwxrwx.  2 fogproject root  206 Oct 21 11:06 21H2TemplateSophos
drwxrwxrwx.  2 fogproject root  206 Nov 16 07:54 22H2GoldSophos
drwxrwxrwx.  2 fogproject root  206 Feb  3 16:13 22H2TemplateSophos
drwxrwxrwx.  5 fogproject root  106 Mar  1 09:44 dev
drwxrwxrwx.  2 fogproject root  246 Jan 13 15:12 ImageforTestingalso
drwxrwxrwx.  2 fogproject root  206 Feb  6 08:13 ImageforTestingonly
-rwxrwxrwx.  1 fogproject root    0 Dec 21  2021 .mntcheck
drwxrwxrwx.  2 fogproject root   30 Dec 21  2021 postdownloadscripts
drwxrwxrwx.  2 fogproject root  246 Jan 13 15:34 TestImageONLY
drwxrwxrwx.  2 fogproject root  266 Oct 13 14:57 VanillaUEFI
drwxrwxrwx.  2 fogproject root  246 Sep 14 15:39 VanillaUPD
drwxrwxrwx.  2 fogproject root  206 Mar  9  2022 Win10-21H2-Vanilla
drwxrwxrwx.  2 fogproject root  222 Dec 23  2021 Windows10-EDU-20H2-UEFI-Template
drwxrwxrwx.  2 fogproject root  206 Oct 12 12:18 Windows10-EDU-21H2-UEFI
drwxrwxrwx.  2 fogproject root  206 Oct 12 12:05 Windows10-EDU-21H2-UEFI-Template
drwxrwxrwx.  2 fogproject root  206 Jan 13 13:53 Windows10-EDU-22H2-UEFI
drwxrwxrwx.  2 fogproject root  206 Jan 13 13:41 Windows10-EDU-22H2-UEFI-Template
[root@d9fogserver ~]#
-
@Scootframer Hmmm… I’m not sure off the top of my head. I know I’ve seen this issue before, but I do not recall what the fix was.
-
@lukebarone Thanks for all the assistance, Luke. I am still able to deploy my images, which is the most important thing. If you think of the solution, let me know.
Thanks,
Scott
-
@Scootframer So capturing an image has worked just fine before, as we can see from the images directory. It’s on local LVM storage, XFS format. Nothing there looks like it would cause trouble in general.
What happens at this stage is simply a
chmod -R 777 /images/a4bb6d84ebf4
which sets the access rights to allow full access for everyone on this directory. I really have no idea why it would hang here. In the step “Preparing backup location” just before this one the directory is created, which obviously does not hang or error out.
You might want to try a debug capture (go to where you would schedule a normal task, but in the last step choose to run it in debug mode). Boot up the PC and hit ENTER twice to get to the shell. Then start the process by running the command
fog
and start stepping through it. Just before the “Setting permission location” step, stop the run via Ctrl+C. Then run
chmod -R 777 /images/a4bb6d84ebf4
and see whether it returns or not.
-
@Sebastian-Roth Hi Sebastian,
I ran through the steps that you recommended. After hitting Ctrl+C after the “Preparing backup location” step and typing in your chmod command, it just leaves me with a blinking cursor on the next line. I tried to upload a picture of it but I just get an error on upload.
-
@Scootframer Do you get anything in the server logs?
dmesg | tail
tail /var/log/syslog /var/log/messages
journalctl -f
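If the tails turn out to be noisy, the NFS server messages can be pulled out and counted. A rough sketch, assuming the kernel prefixes such lines with "nfsd:"; both function names are invented for illustration, and the errno numbers are the Linux values:

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and count each distinct
# nfsd kernel message, most frequent first. Everything up to and
# including "nfsd: " (timestamp, hostname, "kernel:") is stripped.
nfsd_summary() {
    grep 'nfsd:' | sed 's/^.*nfsd: //' | sort | uniq -c | sort -rn
}

# Hypothetical helper: name a few Linux errno numbers that tend to
# appear in network-related kernel messages.
errno_name() {
    case "$1" in
        104) echo ECONNRESET ;;  # connection reset by peer
        107) echo ENOTCONN ;;    # transport endpoint is not connected
        110) echo ETIMEDOUT ;;   # connection timed out
        *)   echo "errno $1" ;;
    esac
}
```

For example, dmesg | nfsd_summary gives one line per distinct message with its count, and errno_name 107 names the error code from a "peername failed" message.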
-
@Sebastian-Roth
[root@d9fogserver ~]# dmesg | tail
[262168.127909] nfsd: peername failed (err 107)!
[262263.362858] nfsd: peername failed (err 107)!
[262524.491146] nfsd: peername failed (err 107)!
[262585.932952] nfsd: peername failed (err 107)!
[262592.077249] nfsd: recvfrom returned errno 104
[263814.772339] nfsd: peername failed (err 107)!
[264045.179522] nfsd: recvfrom returned errno 104
[264840.851890] nfsd: peername failed (err 107)!
[264957.591855] nfsd: peername failed (err 107)!
[265277.089716] nfsd: recvfrom returned errno 104
[root@d9fogserver ~]# tail /var/log/syslog /var/log/messages
tail: cannot open ‘/var/log/syslog’ for reading: No such file or directory
==> /var/log/messages <==
Mar 2 11:35:21 d9fogserver kernel: nfsd: peername failed (err 107)!
Mar 2 11:35:28 d9fogserver xinetd[24281]: START: tftp pid=10997 from=10.39.210.89
Mar 2 11:35:28 d9fogserver in.tftpd[10998]: Error code 8: User aborted the transfer
Mar 2 11:35:28 d9fogserver in.tftpd[10999]: Client 10.39.210.89 finished ipxe.efi
Mar 2 11:35:35 d9fogserver in.tftpd[11171]: Client 10.39.210.89 finished default.ipxe
Mar 2 11:36:00 d9fogserver rpc.mountd[27250]: authenticated mount request from 10.39.210.89:986 for /images (/images)
Mar 2 11:40:39 d9fogserver systemd: Created slice User Slice of root.
Mar 2 11:40:39 d9fogserver systemd-logind: New session 81 of user root.
Mar 2 11:40:39 d9fogserver systemd: Started Session 81 of user root.
Mar 2 11:40:41 d9fogserver kernel: nfsd: recvfrom returned errno 104
[root@d9fogserver ~]# journalctl -f
-- Logs begin at Mon 2023-02-27 09:59:41 PST. --
Mar 02 11:35:28 d9fogserver.### in.tftpd[10998]: Error code 8: User aborted the transfer
Mar 02 11:35:28 d9fogserver.### in.tftpd[10999]: Client 10.39.210.89 finished ipxe.efi
Mar 02 11:35:35 d9fogserver.### in.tftpd[11171]: Client 10.39.210.89 finished default.ipxe
Mar 02 11:36:00 d9fogserver.### rpc.mountd[27250]: authenticated mount request from 10.39.210.89:986 for /images (/images)
Mar 02 11:40:39 d9fogserver.### sshd[16750]: Accepted password for root from 10.39.210.79 port 59689 ssh2
Mar 02 11:40:39 d9fogserver.### systemd[1]: Created slice User Slice of root.
Mar 02 11:40:39 d9fogserver.### systemd-logind[907]: New session 81 of user root.
Mar 02 11:40:39 d9fogserver.### systemd[1]: Started Session 81 of user root.
Mar 02 11:40:39 d9fogserver.### sshd[16750]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 02 11:40:41 d9fogserver.### kernel: nfsd: recvfrom returned errno 104
-
@Scootframer
I found some references to these errors that suggest stopping or restarting the NFS service on the server. That doesn’t seem like the best workaround, but it could be something.
What PXE boot file are you using? Have you tried reverting to an older kernel or init? You can download the bzImage under a different name in the kernel downloader within the FOG GUI and then set the host you’re capturing to use the alternate test kernel (so you’re not changing the kernel that you know is working for deploying images). For example, download the kernel previous to the one you last updated to and name it bzImage-test, then on the host you’re capturing put bzImage-test in the host’s kernel field in the FOG GUI, redo the debug capture steps, and see what happens.
Also, are you able to run the chmod command on the server (instead of from the client in the debug session)?
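For the server-side test, wrapping the chmod in a time limit turns a hang into an exit status instead of a frozen prompt. A minimal sketch, assuming the coreutils timeout command is installed; chmod_with_timeout is a made-up name, not a FOG tool:

```shell
#!/bin/sh
# Hypothetical helper: run a recursive chmod but give up after a limit.
# Usage: chmod_with_timeout SECONDS MODE DIRECTORY
chmod_with_timeout() {
    # timeout (GNU coreutils) exits with status 124 when the limit is hit.
    timeout "$1" chmod -R "$2" "$3"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "chmod did not finish within $1 seconds" >&2
    fi
    return "$status"
}
```

On the server the capture folder sits under the dev directory, so the call would be chmod_with_timeout 30 777 /images/dev/a4bb6d84ebf4. If that returns 0 quickly, the local filesystem is probably fine and the hang is more likely on the NFS path between client and server. One caveat: a process stuck in uninterruptible I/O may not react to the signal that timeout sends, so the wrapper itself could still block in that case.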