Fog hangs while trying to upload
-
@lukebarone Thanks for all the assistance on this issue. When I run ./installfog.sh, it runs through until I get a message that the account “fogproject” already exists. I tried to run userdel fogproject, as it recommends, and the message says “user fogproject is currently used by process 2643”.
-
@Scootframer To verify what it is first, run
ps aux | grep 2643
Then run
kill -9 2643
to forcefully kill it. -
@lukebarone I finally have the latest dev-branch installed. Had my fingers crossed as I uploaded my Template image. It still gets stuck setting permissions on /images/a4bb6d84ebf4.
-
@Scootframer Please run the following commands on your FOG server console and post output here:
lsblk
mount
-
NAME                MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                   8:0    0 931.5G  0 disk
├─sda1                8:1    0   200M  0 part /boot/efi
├─sda2                8:2    0     1G  0 part /boot
└─sda3                8:3    0 930.3G  0 part
  ├─centos-root     253:0    0    20G  0 lvm  /
  ├─centos-swap     253:1    0  15.8G  0 lvm  [SWAP]
  ├─centos-home     253:2    0    10G  0 lvm  /home
  └─centos-images   253:3    0   3.6T  0 lvm  /images
sdb                   8:16   0 931.5G  0 disk
└─sdb1                8:17   0 931.5G  0 part
  └─centos-images   253:3    0   3.6T  0 lvm  /images
sdc                   8:32   0 931.5G  0 disk
└─sdc1                8:33   0 931.5G  0 part
  └─centos-images   253:3    0   3.6T  0 lvm  /images
sdd                   8:48   0 931.5G  0 disk
└─sdd1                8:49   0 931.5G  0 part
  └─centos-images   253:3    0   3.6T  0 lvm  /images
sr0                  11:0    1  1024M  0 rom
[root@d9fogserver ~]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=16367536k,nr_inodes=4091884,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuacct,cpu)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_prio,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=22,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=8889)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
/dev/mapper/centos-images on /images type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/sda2 on /boot type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
/dev/mapper/centos-home on /home type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=3275884k,mode=700)
[root@d9fogserver ~]#
-
@Scootframer
ls -la /images
? -
@lukebarone I thought I would try to upload again from a different computer. It hung at the same spot but with a different image name.
total 8
drwxrwxrwx. 19 fogproject root 4096 Feb  6 08:13 .
dr-xr-xr-x. 20 root       root 4096 Feb 27 15:31 ..
drwxrwxrwx.  2 fogproject root  206 Oct 21 11:25 21H2GoldSophos
drwxrwxrwx.  2 fogproject root  206 Oct 21 11:06 21H2TemplateSophos
drwxrwxrwx.  2 fogproject root  206 Nov 16 07:54 22H2GoldSophos
drwxrwxrwx.  2 fogproject root  206 Feb  3 16:13 22H2TemplateSophos
drwxrwxrwx.  5 fogproject root  106 Mar  1 09:44 dev
drwxrwxrwx.  2 fogproject root  246 Jan 13 15:12 ImageforTestingalso
drwxrwxrwx.  2 fogproject root  206 Feb  6 08:13 ImageforTestingonly
-rwxrwxrwx.  1 fogproject root    0 Dec 21  2021 .mntcheck
drwxrwxrwx.  2 fogproject root   30 Dec 21  2021 postdownloadscripts
drwxrwxrwx.  2 fogproject root  246 Jan 13 15:34 TestImageONLY
drwxrwxrwx.  2 fogproject root  266 Oct 13 14:57 VanillaUEFI
drwxrwxrwx.  2 fogproject root  246 Sep 14 15:39 VanillaUPD
drwxrwxrwx.  2 fogproject root  206 Mar  9  2022 Win10-21H2-Vanilla
drwxrwxrwx.  2 fogproject root  222 Dec 23  2021 Windows10-EDU-20H2-UEFI-Template
drwxrwxrwx.  2 fogproject root  206 Oct 12 12:18 Windows10-EDU-21H2-UEFI
drwxrwxrwx.  2 fogproject root  206 Oct 12 12:05 Windows10-EDU-21H2-UEFI-Template
drwxrwxrwx.  2 fogproject root  206 Jan 13 13:53 Windows10-EDU-22H2-UEFI
drwxrwxrwx.  2 fogproject root  206 Jan 13 13:41 Windows10-EDU-22H2-UEFI-Template
[root@d9fogserver ~]#
-
@Scootframer Hmmm… I’m not sure off the top of my head. I know I’ve seen this issue before, but I do not recall what the fix was.
-
@lukebarone Thanks for all the assistance, Luke. I am still able to deploy my images, which is the most important thing. If you think of the solution, let me know.
Thanks,
Scott -
@Scootframer So capturing an image has worked just fine before, as we can see from the images directory. It’s on local LVM storage, formatted as XFS. That does not look like something that would cause trouble in general.
What happens at this stage is simply a
chmod -R 777 /images/a4bb6d84ebf4
which sets access rights to allow full access for everyone on this directory. I really have no idea why it would hang here. In the step “Preparing backup location” just before this one the directory is being created, which obviously does not hang or error out.
You might wanna try a debug capture (go to where you would schedule a normal task, but in the last step you can tell it to run in debug mode). Boot up the PC and hit ENTER twice to get to the shell. Then start the process by running the command
fog
and start stepping through it. Then just before the “Setting permission location” step you wanna stop the run via ctrl+c. Then run
chmod -R 777 /images/a4bb6d84ebf4
and see if that’s returning or not. -
@Sebastian-Roth Hi Sebastian,
I ran through the steps that you recommended I try. After hitting ctrl-c after the “preparing backup location” and typing in your chmod command it just leaves me with a blinking cursor on the next line. I tried to upload a pic of it but I just get an error on upload. -
@Scootframer Do you get anything in the server logs?
dmesg | tail
tail /var/log/syslog /var/log/messages
journalctl -f
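Which of those log files exists depends on the distro (RHEL/CentOS-style systems normally log to /var/log/messages rather than /var/log/syslog), so a small wrapper that skips missing files can save a round trip. This is only a rough sketch; the function name tail_existing_logs is made up for illustration and is not part of FOG.

```shell
#!/bin/sh
# Sketch: tail only the log files that are actually readable on this box.
# tail_existing_logs is a hypothetical helper name, not a FOG command.
tail_existing_logs() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "==> $f <=="
            tail -n 10 "$f"
        else
            echo "skipping $f (not present)"
        fi
    done
}

# Example call on the FOG server (run as root):
# tail_existing_logs /var/log/syslog /var/log/messages
```

Unlike journalctl -f, this returns right away, so the output can be pasted into a reply as-is.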
-
@Sebastian-Roth
[root@d9fogserver ~]# dmesg | tail
[262168.127909] nfsd: peername failed (err 107)!
[262263.362858] nfsd: peername failed (err 107)!
[262524.491146] nfsd: peername failed (err 107)!
[262585.932952] nfsd: peername failed (err 107)!
[262592.077249] nfsd: recvfrom returned errno 104
[263814.772339] nfsd: peername failed (err 107)!
[264045.179522] nfsd: recvfrom returned errno 104
[264840.851890] nfsd: peername failed (err 107)!
[264957.591855] nfsd: peername failed (err 107)!
[265277.089716] nfsd: recvfrom returned errno 104
[root@d9fogserver ~]# tail /var/log/syslog /var/log/messages
tail: cannot open ‘/var/log/syslog’ for reading: No such file or directory
==> /var/log/messages <==
Mar 2 11:35:21 d9fogserver kernel: nfsd: peername failed (err 107)!
Mar 2 11:35:28 d9fogserver xinetd[24281]: START: tftp pid=10997 from=10.39.210.89
Mar 2 11:35:28 d9fogserver in.tftpd[10998]: Error code 8: User aborted the transfer
Mar 2 11:35:28 d9fogserver in.tftpd[10999]: Client 10.39.210.89 finished ipxe.efi
Mar 2 11:35:35 d9fogserver in.tftpd[11171]: Client 10.39.210.89 finished default.ipxe
Mar 2 11:36:00 d9fogserver rpc.mountd[27250]: authenticated mount request from 10.39.210.89:986 for /images (/images)
Mar 2 11:40:39 d9fogserver systemd: Created slice User Slice of root.
Mar 2 11:40:39 d9fogserver systemd-logind: New session 81 of user root.
Mar 2 11:40:39 d9fogserver systemd: Started Session 81 of user root.
Mar 2 11:40:41 d9fogserver kernel: nfsd: recvfrom returned errno 104
[root@d9fogserver ~]# journalctl -f
-- Logs begin at Mon 2023-02-27 09:59:41 PST. --
Mar 02 11:35:28 d9fogserver.### in.tftpd[10998]: Error code 8: User aborted the transfer
Mar 02 11:35:28 d9fogserver.### in.tftpd[10999]: Client 10.39.210.89 finished ipxe.efi
Mar 02 11:35:35 d9fogserver.### in.tftpd[11171]: Client 10.39.210.89 finished default.ipxe
Mar 02 11:36:00 d9fogserver.### rpc.mountd[27250]: authenticated mount request from 10.39.210.89:986 for /images (/images)
Mar 02 11:40:39 d9fogserver.### sshd[16750]: Accepted password for root from 10.39.210.79 port 59689 ssh2
Mar 02 11:40:39 d9fogserver.### systemd[1]: Created slice User Slice of root.
Mar 02 11:40:39 d9fogserver.### systemd-logind[907]: New session 81 of user root.
Mar 02 11:40:39 d9fogserver.### systemd[1]: Started Session 81 of user root.
Mar 02 11:40:39 d9fogserver.### sshd[16750]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 02 11:40:41 d9fogserver.### kernel: nfsd: recvfrom returned errno 104 -
@Scootframer
I found some things on some of these errors that suggest stopping or restarting the NFS service on the server. That doesn’t seem like the best workaround, but it could be something.
What PXE boot file are you using? Have you tried reverting to an older kernel or init? You can download the bzImage with a different name in the kernel downloader within the FOG GUI and then set the host you’re capturing to use the alternate test kernel (so you’re not changing the kernel that you know is working for deploying images). I.e., go download the kernel previous to the one you last updated to and name it
bzImage-test
and then on the host you’re capturing put
bzImage-test
in the host’s kernel field in the FOG GUI, redo the debug capture steps, and see what happens.
Also, are you able to run the chmod command on the server (instead of from the client in the debug session)?
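If you try the chmod on the server, it may help to wrap it in a time limit so a hang shows up as a clear result instead of a blinking cursor. A minimal sketch, assuming GNU coreutils timeout is available; try_chmod and the 30-second default are made up for illustration.

```shell
#!/bin/sh
# Sketch: run the same chmod the capture step runs, but give up after a
# time limit so a hang can be told apart from an ordinary failure.
# Assumptions: GNU coreutils `timeout` is installed; try_chmod is a
# hypothetical helper name and the 30s default is arbitrary.
try_chmod() {
    dir=$1
    limit=${2:-30}
    timeout "$limit" chmod -R 777 "$dir"
    rc=$?
    case $rc in
        0)   echo "chmod finished" ;;
        124) echo "chmod timed out after ${limit}s" ;;  # timeout's exit code
        *)   echo "chmod failed with exit code $rc" ;;
    esac
    return $rc
}

# Example (path is an assumption: as far as I know the client mounts the
# server's /images/dev export during a capture, so on the server the
# directory may be /images/dev/a4bb6d84ebf4 rather than /images/a4bb6d84ebf4):
# try_chmod /images/dev/a4bb6d84ebf4
```

A "timed out" result would point at the filesystem or NFS layer, while a plain failure such as a missing directory points at the path.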
-
@Scootframer said in Fog hangs while trying to upload:
[262168.127909] nfsd: peername failed (err 107)!
[262263.362858] nfsd: peername failed (err 107)!
[262524.491146] nfsd: peername failed (err 107)!
[262585.932952] nfsd: peername failed (err 107)!
[262592.077249] nfsd: recvfrom returned errno 104
[263814.772339] nfsd: peername failed (err 107)!
[264045.179522] nfsd: recvfrom returned errno 104
[264840.851890] nfsd: peername failed (err 107)!
[264957.591855] nfsd: peername failed (err 107)!
[265277.089716] nfsd: recvfrom returned errno 104
Yeah, definitely restart your whole FOG server as suggested by @JJ-Fullmer as well.
-
@JJ-Fullmer I appreciate everyone’s assistance on this issue. We are due for a Server upgrade next month so we are going to put this issue on hold, wait till April and create a new Fog server.
I did download an older kernel from September of last year, renamed it as you recommended, and had FOG use the alternate kernel for that client. The same issue happened. After doing the upload debug steps, I hit ctrl+c and tried the chmod (on both client and server), and it said directory not found? What the heck?
Thanks again for all the great advice. I will keep you guys on speed-dial and hope not to need you when I create our new Fog server next month. -
@Scootframer Have you done a server reboot yet? From what I read this would fix the current issue you see.
-
@Sebastian-Roth Yes, first thing this am after reading my notifications I rebooted the server.
-
@Scootframer Do I get this right? The issue persists even after the server reboot?
-
@Sebastian-Roth Indeed it does, gets stuck in the same spot. When I ran the chmod it said the directory didn’t exist.