Fog hangs while trying to upload
-
@Scootframer Do you get anything in the server logs?
dmesg | tail
tail /var/log/syslog /var/log/messages
journalctl -f
-
@Sebastian-Roth
[root@d9fogserver ~]# dmesg | tail
[262168.127909] nfsd: peername failed (err 107)!
[262263.362858] nfsd: peername failed (err 107)!
[262524.491146] nfsd: peername failed (err 107)!
[262585.932952] nfsd: peername failed (err 107)!
[262592.077249] nfsd: recvfrom returned errno 104
[263814.772339] nfsd: peername failed (err 107)!
[264045.179522] nfsd: recvfrom returned errno 104
[264840.851890] nfsd: peername failed (err 107)!
[264957.591855] nfsd: peername failed (err 107)!
[265277.089716] nfsd: recvfrom returned errno 104
[root@d9fogserver ~]# tail /var/log/syslog /var/log/messages
tail: cannot open ‘/var/log/syslog’ for reading: No such file or directory
==> /var/log/messages <==
Mar 2 11:35:21 d9fogserver kernel: nfsd: peername failed (err 107)!
Mar 2 11:35:28 d9fogserver xinetd[24281]: START: tftp pid=10997 from=10.39.210.89
Mar 2 11:35:28 d9fogserver in.tftpd[10998]: Error code 8: User aborted the transfer
Mar 2 11:35:28 d9fogserver in.tftpd[10999]: Client 10.39.210.89 finished ipxe.efi
Mar 2 11:35:35 d9fogserver in.tftpd[11171]: Client 10.39.210.89 finished default.ipxe
Mar 2 11:36:00 d9fogserver rpc.mountd[27250]: authenticated mount request from 10.39.210.89:986 for /images (/images)
Mar 2 11:40:39 d9fogserver systemd: Created slice User Slice of root.
Mar 2 11:40:39 d9fogserver systemd-logind: New session 81 of user root.
Mar 2 11:40:39 d9fogserver systemd: Started Session 81 of user root.
Mar 2 11:40:41 d9fogserver kernel: nfsd: recvfrom returned errno 104
[root@d9fogserver ~]# journalctl -f
-- Logs begin at Mon 2023-02-27 09:59:41 PST. --
Mar 02 11:35:28 d9fogserver.### in.tftpd[10998]: Error code 8: User aborted the transfer
Mar 02 11:35:28 d9fogserver.### in.tftpd[10999]: Client 10.39.210.89 finished ipxe.efi
Mar 02 11:35:35 d9fogserver.### in.tftpd[11171]: Client 10.39.210.89 finished default.ipxe
Mar 02 11:36:00 d9fogserver.### rpc.mountd[27250]: authenticated mount request from 10.39.210.89:986 for /images (/images)
Mar 02 11:40:39 d9fogserver.### sshd[16750]: Accepted password for root from 10.39.210.79 port 59689 ssh2
Mar 02 11:40:39 d9fogserver.### systemd[1]: Created slice User Slice of root.
Mar 02 11:40:39 d9fogserver.### systemd-logind[907]: New session 81 of user root.
Mar 02 11:40:39 d9fogserver.### systemd[1]: Started Session 81 of user root.
Mar 02 11:40:39 d9fogserver.### sshd[16750]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 02 11:40:41 d9fogserver.### kernel: nfsd: recvfrom returned errno 104
-
@Scootframer
I found some notes on these errors that suggest stopping or restarting the NFS service on the server. That doesn't seem like the best workaround, but it could be worth a try.
What PXE boot file are you using? Have you tried reverting to an older kernel or init? You can download the bzImage under a different name in the kernel downloader within the FOG GUI and then set the host you're capturing to use the alternate test kernel (so you're not changing the kernel that you know works for deploying images). I.e., download the kernel previous to the one you last updated to and name it bzImage-test, then on the host you're capturing put bzImage-test in the host's kernel field in the FOG GUI, redo the debug capture steps, and see what happens.
Also, are you able to run the chmod command on the server (instead of from the client in the debug session)?
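Before running chmod on the server, it may be worth confirming that the target directory is actually there. A minimal sketch, assuming GNU stat on the server; `report_dir` is just a made-up helper name, and /images is only used as an example because it is the export shown in your mount request log:

```shell
#!/bin/sh
# report_dir is a hypothetical helper, not part of FOG.
# It prints whether a directory exists and, if it does, its mode and owner.
report_dir() {
    if [ -d "$1" ]; then
        # stat -c is the GNU coreutils form (assumed to be available on the server)
        echo "present: $1 ($(stat -c '%a %U:%G' "$1"))"
    else
        echo "missing: $1"
    fi
}

# Example call on the server (path taken from the mount request in the log):
#   report_dir /images
```

If the directory shows as present on the server but chmod still fails from the client, that would point more at the NFS mount than at the directory itself.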
-
@Scootframer said in Fog hangs while trying to upload:
[262168.127909] nfsd: peername failed (err 107)!
[262263.362858] nfsd: peername failed (err 107)!
[262524.491146] nfsd: peername failed (err 107)!
[262585.932952] nfsd: peername failed (err 107)!
[262592.077249] nfsd: recvfrom returned errno 104
[263814.772339] nfsd: peername failed (err 107)!
[264045.179522] nfsd: recvfrom returned errno 104
[264840.851890] nfsd: peername failed (err 107)!
[264957.591855] nfsd: peername failed (err 107)!
[265277.089716] nfsd: recvfrom returned errno 104
Yeah, definitely restart your whole FOG server as suggested by @JJ-Fullmer as well.
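For reading those messages: on Linux, errno 107 is ENOTCONN and errno 104 is ECONNRESET, so nfsd is reporting connections that were dropped or reset on the client side. A rough sketch for decoding and counting them; both function names are made up for illustration, and only these two errno values are handled:

```shell
#!/bin/sh
# decode_errno: map the two errno values seen in the nfsd messages
# to their Linux names. Anything else is reported as unknown.
decode_errno() {
    case "$1" in
        104) echo "ECONNRESET: Connection reset by peer" ;;
        107) echo "ENOTCONN: Transport endpoint is not connected" ;;
        *)   echo "unknown errno: $1" ;;
    esac
}

# summarize_nfsd_errors: read kernel log text on stdin and print
# "<errno> <count>" for every nfsd line that carries an err/errno number.
summarize_nfsd_errors() {
    sed -n 's/.*nfsd: .*err[no]* \([0-9][0-9]*\).*/\1/p' \
        | sort -n | uniq -c | awk '{ print $2, $1 }'
}

# Example on the server:
#   dmesg | summarize_nfsd_errors
#   decode_errno 107
```

If the counts keep climbing while a capture is stuck, that would suggest the client keeps losing its NFS connection rather than the server refusing it outright.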
-
@JJ-Fullmer I appreciate everyone’s assistance on this issue. We are due for a Server upgrade next month so we are going to put this issue on hold, wait till April and create a new Fog server.
I did download an older kernel from September of last year, renamed it as you recommended, and had FOG use the alternate kernel for that client. The same issue happened. After doing the Upload Debug steps, I pressed Ctrl+C and tried the chmod (on both the client and the server), and it said directory not found? What the heck?
Thanks again for all the great advice. I will keep you guys on speed-dial and hope not to need you when I create our new Fog server next month.
-
@Scootframer Have you done a server reboot yet? From what I read this would fix the current issue you see.
-
@Sebastian-Roth Yes, first thing this am after reading my notifications I rebooted the server.
-
@Scootframer Do I get this right? The issue persists even after the server reboot?
-
@Sebastian-Roth Indeed it does, gets stuck in the same spot. When I ran the chmod it said the directory didn’t exist.
-
@Scootframer Maybe the server’s disk is failing? Have you checked the disk’s SMART values yet?
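If smartmontools is installed, `smartctl -H` prints an overall health verdict and `smartctl -a` dumps the full attribute list. A small sketch for checking the verdict; `smart_health_ok` is a made-up helper name, /dev/sda is only an assumed device name, and a VM or a disk behind a hardware RAID controller may not expose SMART data this way:

```shell
#!/bin/sh
# smart_health_ok is a hypothetical helper: it reads `smartctl -H` output
# on stdin and succeeds only if the overall health line reports a pass.
# It matches the ATA wording ("test result: PASSED") and the
# SCSI/SAS wording ("SMART Health Status: OK").
smart_health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Example on the server (device name is an assumption, adjust to your disk):
#   smartctl -H /dev/sda | smart_health_ok && echo "SMART health OK" || echo "check the disk"
#   smartctl -a /dev/sda    # full attributes, e.g. reallocated sectors
```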
-