• Recent
  • Unsolved
  • Tags
  • Popular
  • Users
  • Groups
  • Search
  • Register
  • Login
  • Recent
  • Unsolved
  • Tags
  • Popular
  • Users
  • Groups
  • Search
  • Register
  • Login

Intermittent TFTP Failure

Scheduled Pinned Locked Moved
Linux Problems
2
11
5.2k
Loading More Posts
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • L
    LindsayS
    last edited by Jul 14, 2014, 8:21 PM

    Hi All,
    I am currently struggling with my FOG server.
    Last week up updated from .32 to 1.1.2. I am running my FOG server on Ubuntu 11.10.
    Prior to the upgrade FOG had been working flawlessly for almost 2 years.
    Now I am having intermittent issues with TFTP. It goes like this:
    Upload an image, deploy an image. All works as it should. Then, randomly, I start getting TFTP open timeout errors on clients that are trying to boot to the network.
    I follow the instructions here [url]http://www.fogproject.org/wiki/index.php/Tftp_timeout[/url]… to check tftp from both windows and the fog server. Pre-TFTP open timeout error, I get the responses, just like I should. Post I get no response from either.
    I restart the tftp service, and everything starts working again.
    It typically keeps working for the rest of the day, and then fails sometime after I leave for work.
    I checked the system logs and I found what I suspect is the issue.
    I see:

    Jul 14 09:47:58 CLS-FOG kernel: [268144.646604] init: tftpd-hpa main process (27778) terminated with status 71
    Jul 14 09:47:58 CLS-FOG kernel: [268144.646661] init: tftpd-hpa main process ended, respawning
    Jul 14 09:47:58 CLS-FOG kernel: [268144.648328] init: Temporary process spawn error: Cannot allocate memory
    Jul 14 09:47:59 CLS-FOG CRON[14948]: (CRON) error (grandchild #18135 failed with exit status 1)

    and

    Jul 14 10:09:58 CLS-FOG in.tftpd[18966]: fork: Cannot allocate memory
    Jul 14 10:09:58 CLS-FOG kernel: [269464.412877] init: tftpd-hpa main process (18966) terminated with status 71
    Jul 14 10:09:58 CLS-FOG kernel: [269464.412939] init: tftpd-hpa main process ended, respawning
    Jul 14 10:09:58 CLS-FOG kernel: [269464.414225] init: Temporary process spawn error: Cannot allocate memory
    Jul 14 10:10:00 CLS-FOG in.tftpd[14940]: cannot daemonize: Cannot allocate memory
    Jul 14 10:10:00 CLS-FOG kernel: [269466.543148] init: tftpd-hpa main process (14940) terminated with status 71
    Jul 14 10:10:00 CLS-FOG kernel: [269466.543221] init: tftpd-hpa main process ended, respawning
    Jul 14 10:17:04 CLS-FOG CRON[356]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
    Jul 14 10:24:58 CLS-FOG CRON[2558]: (CRON) error (grandchild #7213 failed with exit status 1)

    along with a bunch of
    Jul 14 15:08:39 CLS-FOG in.tftpd[7749]: tftp: client does not accept options, but this, I gather, is normal.

    I am not particularly familiar with ubuntu beyond the basic stuff, and am not sure where to look from here. Can anyone point me in the right direction?

    Thanks,
    Lindsay

    1 Reply Last reply Reply Quote 0
    • I
      ianabc Testers
      last edited by Jul 15, 2014, 4:28 AM

      Hi Lindsay,

      The “Cannot allocate memory” messages look a bit worrying to me. Does the machine have enough RAM? Could you try running “free -m” in a command window? e.g. here is my fog server
      [CODE]
      total used free shared buffers cached
      Mem: 996 565 431 0 11 382
      -/+ buffers/cache: 172 824

      Swap: 2043 0 2043
      [/CODE]
      My fog server is a virtual machine with 1GB of ram, from the numbers above I have 431Mb free and around 400Mb in cache.

      1 Reply Last reply Reply Quote 0
      • L
        LindsayS
        last edited by Jul 15, 2014, 12:32 PM

        [ATTACH=full]1171[/ATTACH]
        This is from first thing this morning.
        tftp failed again last night, but, as usual, restarted without issue.
        I’ll try checking on the memory consumption again later today, since it looks like it is fine right now.
        FOG is currently running on what used to be a Windows server that was taken offline when we got new ones.

        [url=“/_imported_xf_attachments/1/1171_FOGMemory.jpg?:”]FOGMemory.jpg[/url]

        1 Reply Last reply Reply Quote 0
        • I
          ianabc Testers
          last edited by Jul 15, 2014, 5:33 PM

          I think you should be OK for memory, 4GB seems like plenty. As I said, my setup is only using 1GB and I’m not seeing the problems, before you dig too deep looking for answers, you should check that the problem isn’t firewalling.

          If 0.32 was working and there have been no network changes since then there probably isn’t a network firewall, but you should double/triple check that the fog server doesn’t have one. On Ubuntu 11.10 you should check
          [CODE]
          $ sudo iptables -nL
          Chain INPUT (policy ACCEPT)
          target prot opt source destination

          Chain FORWARD (policy ACCEPT)
          target    prot opt source              destination       
          
          Chain OUTPUT (policy ACCEPT)
          
          target    prot opt source              destination   
          

          $ sestatus
          [/CODE]
          You should also check SELINUX, make sure it is either disabled or in permissive mode
          [CODE]
          $ sestatus
          SELinux status: enabled
          SELinuxfs mount: /selinux
          Current mode: permissive
          Mode from config file: permissive
          Policy version: 24

          Policy from config file:        targeted
          

          [/CODE]
          And you might want to just double check that you are NOT using tcp_wrappers: just check that /etc/hosts.allow and /etc/hosts.deny are empty or only have comments in them. Once you have confirmed those things we can start tracking down the problem without going in circles 🙂

          1 Reply Last reply Reply Quote 0
          • L
            LindsayS
            last edited by Jul 15, 2014, 7:00 PM

            Results of iptables -nL looked exactly like yours.
            Running sestatus gives me "The program ‘sestatus’ is currently not installed’. I assume that this means SELinux is not installed also?
            hosts.allow and hosts.deny both had only comments in them.

            Thanks for your help. I appreciate the inclusion of the commands I need to find the info out. Saves me a lot of digging.:)

            1 Reply Last reply Reply Quote 0
            • I
              ianabc Testers
              last edited by Jul 15, 2014, 7:52 PM

              OK, the network sounds clear and those error messages you were getting from tftp.hpa are very suspicious.

              I don’t have an ubuntu system to test with, but on CentOS systems tftpd is started by inetd, if ubuntu is the same then when a client makes an incoming connection to the fog server for tftp, inetd spawns an instance of in.tftpd to handle the connection. It looks like, in your case, when that happens your system can’t allocate memory for tftpd so it never gets spawned, that is why your tftp sessions fail. The difficult part is figuring out why there is no memory for tftpd.

              As I mentioned above, I don’t think it is a physical problem, 4GB seems like plenty of RAM - I get by with much less. It is possible that some runnaway process is eating up all of your ram, but if your system was genuinely memory starved I would expect more problems than just a failing tftpd server.

              Could you attach a copy of your /var/log/messages or /var/log/syslog file? We might be able to spot the root cause in there.

              Also could you periodically have a look for any tftp processes
              [CODE]
              $ ps -ef | grep tftp
              root 5007 1138 0 12:38 ? 00:00:00 in.tftpd -s /tftpboot
              root 5010 4980 0 12:38 pts/0 00:00:00 grep tftp
              [/CODE]
              It would be interesting to see if it is running and if more than one process is spawned.

              Finally after your machine has been running for a while you can get a list of the most memory hungry processes by running ps, e.g.
              [CODE]
              $ ps -eo pid,rss,vsz,comm= | sort -n -k 3 | tail -n 20
              5461 688 100944 tail
              5459 996 108124 ps
              1192 1540 108168 mysqld_safe
              4980 1924 108308 bash
              1406 1360 117300 crond
              5460 868 155432 sort
              955 1560 249084 rsyslogd
              1413 11140 250640 FOGTaskSchedule
              1427 10944 250640 FOGImageReplica
              1420 11212 250896 FOGMulticastMan
              1398 9544 271660 httpd
              1462 10548 370800 httpd
              1460 10576 370808 httpd
              1458 10480 371024 httpd
              1461 10764 371024 httpd
              1457 10724 371040 httpd
              1463 18528 378748 httpd
              1459 18280 378752 httpd
              1456 19196 379668 httpd
              1294 27708 640800 mysqld
              [/CODE]
              If you could do this around the time that tftp dies we might have a chance of catching the reason why.

              1 Reply Last reply Reply Quote 0
              • L
                LindsayS
                last edited by Jul 17, 2014, 12:57 PM

                Good morning,
                I haven’t forgotten about your reply. I was just waiting for tftp to fail so that the logs and responses I give you are actually useful. Its been two days, and I haven’t had to restart once.
                The only thing I’ve done differently is not uploaded a new image. I have one that is almost ready for upload, so I’ll do that and see if I can get it to fail again. If that works, I should have the requested info soon.
                Thanks again for your help.

                1 Reply Last reply Reply Quote 0
                • I
                  ianabc Testers
                  last edited by Jul 17, 2014, 5:03 PM

                  No problem. I hope it works for you 🙂

                  1 Reply Last reply Reply Quote 0
                  • L
                    LindsayS
                    last edited by Jul 17, 2014, 5:41 PM

                    Alright, here is the latest:
                    It appears that the failures are somehow related to Image uploads to Fog. After 2 days without failure, I started an image upload, went to a meeting, and two hours later, when I got back, tftp was broken. The failure doesn’t seem to happen immediately, since I’ve seen clients continue to connect to Fog immediately after an image upload.

                    The info you requested is here:

                    [CODE]$ ps -ef | grep tftp
                    fog 11648 8967 0 12:03 pts/0 00:00:00 grep --color=auto tftp
                    [/CODE]

                    [CODE]$ ps -eo pid,rss,vsz,comm= | sort -n -k 3 |tail -n 20
                    1577 4584 60608 indicator-sessi
                    1592 5340 63672 indicator-messa
                    1581 6104 69116 indicator-datet
                    1641 4860 69804 indicator-sessi
                    32691 14640 74444 gnome-terminal
                    1560 6132 77264 indicator-datet
                    32186 18416 77768 unity-panel-ser
                    1405 18236 77936 unity-greeter
                    27376 22500 85372 unity-2d-panel
                    1615 3456 87192 pulseaudio
                    28537 11252 87348 nm-applet
                    27090 3940 87444 pulseaudio
                    1584 4736 115080 indicator-sound
                    28576 36108 123692 nautilus
                    25822 13444 124244 gnome-settings-
                    12697 122096 126196 find
                    1528 5724 132244 indicator-sound
                    832 28392 159192 mysqld
                    3170 73260 219168 soffice.bin
                    27380 39584 244452 unity-2d-launch
                    fog@CLS-FOG:~$
                    [/CODE]

                    I checked the memory usage during the failure
                    [code] $ free -m
                    total used free shared buffers cached
                    Mem: 4031 3757 273 0 60 3193
                    -/+ buffers/cache: 503 3527
                    Swap: 4092 0 4092
                    [/code]
                    and after running sudo restart tftpd-hpa
                    [CODE]free -m
                    total used free shared buffers cached
                    Mem: 4031 1349 2681 0 78 289
                    -/+ buffers/cache: 981 3049
                    Swap: 4092 0 4092
                    [/CODE]

                    Attached is the syslog from today

                    [url=“/_imported_xf_attachments/1/1175_syslog.txt?:”]syslog.txt[/url]

                    1 Reply Last reply Reply Quote 0
                    • I
                      ianabc Testers
                      last edited by Jul 17, 2014, 6:02 PM

                      I’m not sure at the moment. I’ll take a look at your syslog later, it certainly still looks like a memory leak in some process to me (my guess would be gnome somewhere - if you don’t need a desktop on your fog server just uninstall it).

                      If you don’t want to uninstall gnome just yet make sure you apt-get update && apt-get upgrade the system. It’s possible whatever program is leaking has been fixed already.

                      If you want to try the system without a desktop for a while you could try
                      [CODE]
                      $ sudo service lightdm stop
                      [/CODE]
                      You should end up at a black console with white text. Run your upload/download tests and see if the problem is fixed.

                      1 Reply Last reply Reply Quote 0
                      • I
                        ianabc Testers
                        last edited by Jul 17, 2014, 6:15 PM

                        One thing I just noticed, the timestamp on your tftpd entries in syslog are all really weird, I’m not sure why, and I don’t see how it could cause a problem, but it is odd.

                        1 Reply Last reply Reply Quote 0
                        • 1 / 1
                        1 / 1
                        • First post
                          9/11
                          Last post

                        220

                        Online

                        12.0k

                        Users

                        17.3k

                        Topics

                        155.2k

                        Posts
                        Copyright © 2012-2024 FOG Project