FOG doesn't detect the status of the clients


  • Developer

    Hi,

    I have searched in the forum about this and I have found some entries:
    Host management page show red exclamation for all hosts
    Ubuntu 14.04 LTS with dnsmasq no external DNS

    But these entries are old and did not solve the problem.

    Scenario

    • FOG Server version: 1.5.4
    • OS: RHEL 7
    • I use dnsmasq to control access to the server; DHCP and DNS are provided by external servers.

    dnsmasq.conf file (this file is created dynamically, on the fly, so that only clients with an active task can access the server. This way we can have two or more PXE servers in the same VLAN):

    /etc/dnsmasq.conf

    port=0
    log-dhcp
    tftp-root=/tftpboot
    
    # Disable re-use of the DHCP servername and filename fields as extra
    # option space. That's to avoid confusing some old or broken DHCP clients.
    dhcp-no-override
    
    # inspect the vendor class string and match the text to set the tag
    dhcp-vendorclass=BIOS,PXEClient:Arch:00000
    dhcp-vendorclass=UEFI32,PXEClient:Arch:00006
    dhcp-vendorclass=UEFI,PXEClient:Arch:00007
    dhcp-vendorclass=UEFI64,PXEClient:Arch:00009
    
    # Set the boot file name based on the matching tag from the vendor class (above)
    dhcp-boot=net:UEFI32,i386-efi/ipxe_delay.efi,,10.0.15.8
    dhcp-boot=net:UEFI,ipxe_delay.efi,,10.0.15.8
    dhcp-boot=net:UEFI64,ipxe_delay.efi,,10.0.15.8
    
    # The boot filename, Server name, Server Ip Address
    dhcp-boot=undionly_delay.kpxe,,10.0.15.8
    
    # PXE menu.  The first part is the text displayed to the user.  The second is the timeout, in seconds.
    pxe-prompt=Booting FOG Client, 1
    
    dhcp-reply-delay=0
    

    The server has two interfaces:

    # more ifcfg-ens192
    TYPE="Ethernet"
    BOOTPROTO="none"
    DEFROUTE="yes"
    IPV4_FAILURE_FATAL="no"
    IPV6INIT="yes"
    IPV6_AUTOCONF="yes"
    IPV6_DEFROUTE="yes"
    IPV6_FAILURE_FATAL="no"
    NAME="ens192"
    UUID="78259bb6-4b04-438f-aa60-33197d32dcf4"
    ONBOOT="yes"
    DNS1="10.10.13.6"
    DNS2="10.20.13.6"
    DNS3="10.30.13.6"
    DOMAIN="lg.ehu.es"
    HWADDR="00:50:56:B8:58:54"
    IPADDR=10.0.15.8
    PREFIX=24
    GATEWAY=10.0.15.1
    IPV6_PEERDNS=yes
    IPV6_PEERROUTES=yes
    

    /etc/resolv.conf file:

    # more /etc/resolv.conf 
    # Generated by NetworkManager
    search lg.ehu.es lgp.ehu.es
    nameserver 10.10.13.6
    nameserver 10.20.13.6
    nameserver 10.30.13.6
    

    FOGPingHosts service status:

    # systemctl -l status FOGPingHosts
    ● FOGPingHosts.service - FOGPingHosts
       Loaded: loaded (/usr/lib/systemd/system/FOGPingHosts.service; enabled; vendor preset: disabled)
       Active: active (running) since vie 2018-10-26 11:42:03 CEST; 2 weeks 4 days ago
     Main PID: 1799 (FOGPingHosts)
       CGroup: /system.slice/FOGPingHosts.service
               ├─1799 /usr/bin/php -q /opt/fog/service/FOGPingHosts/FOGPingHosts &
               └─1824 /usr/bin/php -q /opt/fog/service/FOGPingHosts/FOGPingHosts &
    
    Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
    

    The clients’ domain names vary: hostname.lgp.ehu.es, hostname.lg.ehu.es, hostname.ll.ehu.es, hostname.ehu.es, …

    Any ideas?
    How does FOG detect the client status?


  • Developer

    Hi again,

    Just to give more information about this in a huge scenario: in our case the FOGPingHosts service takes about 4 hours and 30 minutes to ping all clients.

    Capture of the traffic in one client:

    18:10:43.216290 IP fog7.lgp.ehu.es.39006 > u032668.lgp.ehu.es.microsoft-ds: Flags [S], seq 3515736006, win 29200, options [mss 1460,sackOK,TS val 2691861063 ecr 0,nop,wscale 7], length 0
    18:10:44.216894 IP fog7.lgp.ehu.es.39006 > u032668.lgp.ehu.es.microsoft-ds: Flags [S], seq 3515736006, win 29200, options [mss 1460,sackOK,TS val 2691862064 ecr 0,nop,wscale 7], length 0
    22:46:19.552188 IP fog7.lgp.ehu.es.47968 > u032668.lgp.ehu.es.microsoft-ds: Flags [S], seq 14809462, win 29200, options [mss 1460,sackOK,TS val 2708397399 ecr 0,nop,wscale 7], length 0
    22:46:20.552909 IP fog7.lgp.ehu.es.47968 > u032668.lgp.ehu.es.microsoft-ds: Flags [S], seq 14809462, win 29200, options [mss 1460,sackOK,TS val 2708398400 ecr 0,nop,wscale 7], length 0
    03:19:37.718318 IP fog7.lgp.ehu.es.51734 > u032668.lgp.ehu.es.microsoft-ds: Flags [S], seq 3380421067, win 29200, options [mss 1460,sackOK,TS val 2724795565 ecr 0,nop,wscale 7], length 0
    03:19:38.720889 IP fog7.lgp.ehu.es.51734 > u032668.lgp.ehu.es.microsoft-ds: Flags [S], seq 3380421067, win 29200, options [mss 1460,sackOK,TS val 2724796568 ecr 0,nop,wscale 7], length 0
    07:52:47.623417 IP fog7.lgp.ehu.es.57902 > u032668.lgp.ehu.es.microsoft-ds: Flags [S], seq 1502300674, win 29200, options [mss 1460,sackOK,TS val 2741185470 ecr 0,nop,wscale 7], length 0
    07:52:48.624903 IP fog7.lgp.ehu.es.57902 > u032668.lgp.ehu.es.microsoft-ds: Flags [S], seq 1502300674, win 29200, options [mss 1460,sackOK,TS val 2741186472 ecr 0,nop,wscale 7], length 0
    

  • Developer

    I think I will disable it: I ran tcpdump to watch the traffic on port 445 and the performance is not good. We are very far from 10 hosts per second; it is more like 1 host per second :(

    # tcpdump port 445 -i ens192
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
    13:21:53.447925 IP fog7.lgp.ehu.es.40826 > u030011.xa.bn.ehu.es.microsoft-ds: Flags [S], seq 2649384977, win 29200, options [mss 1460,sackOK,TS val 2588131295 ecr 0,nop,wscale 7], length 0
    13:21:54.448890 IP fog7.lgp.ehu.es.40826 > u030011.xa.bn.ehu.es.microsoft-ds: Flags [S], seq 2649384977, win 29200, options [mss 1460,sackOK,TS val 2588132296 ecr 0,nop,wscale 7], length 0
    13:21:55.591994 IP fog7.lgp.ehu.es.48832 > u030012.xa.bn.ehu.es.microsoft-ds: Flags [S], seq 3272040579, win 29200, options [mss 1460,sackOK,TS val 2588133438 ecr 0,nop,wscale 7], length 0
    13:21:56.592905 IP fog7.lgp.ehu.es.48832 > u030012.xa.bn.ehu.es.microsoft-ds: Flags [S], seq 3272040579, win 29200, options [mss 1460,sackOK,TS val 2588134440 ecr 0,nop,wscale 7], length 0
    13:21:57.724226 IP fog7.lgp.ehu.es.57840 > u030013.xa.bn.ehu.es.microsoft-ds: Flags [S], seq 1809232999, win 29200, options [mss 1460,sackOK,TS val 2588135571 ecr 0,nop,wscale 7], length 0
    13:21:58.724890 IP fog7.lgp.ehu.es.57840 > u030013.xa.bn.ehu.es.microsoft-ds: Flags [S], seq 1809232999, win 29200, options [mss 1460,sackOK,TS val 2588136572 ecr 0,nop,wscale 7], length 0
    13:21:59.797692 IP fog7.lgp.ehu.es.39344 > u030014.xa.bn.ehu.es.microsoft-ds: Flags [S], seq 969204538, win 29200, options [mss 1460,sackOK,TS val 2588137644 ecr 0,nop,wscale 7], length 0
    13:22:00.798892 IP fog7.lgp.ehu.es.39344 > u030014.xa.bn.ehu.es.microsoft-ds: Flags [S], seq 969204538, win 29200, options [mss 1460,sackOK,TS val 2588138646 ecr 0,nop,wscale 7], length 0
    

    I will wait for version 1.6 and see :)


  • Senior Developer

    @Fernando-Gietz The reason we moved to a service method is that the old approach blocked things from the main page. Say, for example, you have a list of 100 hosts: the ping used before called out to 4 hosts at a time, in succession. This seemed efficient as long as you stayed on the page long enough for all 100 to update, but if you did not wait and clicked on a host, the page would wait until the pending requests had returned. That could take 1 second or 30 seconds before you could go where you needed to go.

    Using the service method allows continuous checking without blocking the GUI. I understand it’s not exactly convenient, but it works. The service can ping around 10 hosts per second, on average. As this item is not a functionally important part of FOG, it makes more sense, to me, to have it as a service. Even if it’s not fully real time, it’s still a fairly good indicator. Maybe I can make the ping methods a bit faster in the future and make them non-blocking even on the service side.

    That said, if you want to make it ajax dependent again, it’s not too hard to implement, just remember what it means from a GUI responsiveness point of view. Mind you, this will be much better with 1.6 as we have pagination, so at any time it only really needs to do the ping for the hosts present on the current page.
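
To illustrate the trade-off described above, here is a rough sketch of probing a small batch of hosts (for example, one page of results) in parallel instead of one after another. This is not FOG's code: the function names, the worker count and the 2-second timeout are all assumptions chosen for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def is_reachable(host, port=445, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False

def sweep(targets, workers=32, timeout=2.0):
    """Probe (host, port) pairs concurrently; return {(host, port): bool}."""
    targets = list(targets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda t: is_reachable(t[0], t[1], timeout), targets)
        return dict(zip(targets, results))
```

Calling `sweep([(name, 445) for name in page_hosts])`, where `page_hosts` is a hypothetical list of the hostnames on the current page, would take about one timeout period in the worst case as long as the batch is no larger than `workers`, instead of one timeout per unreachable host.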


  • Developer

    My suggestion :)

    Don’t use a service to check the status. It is not very useful or efficient. If you want to know the status of a client, you want to know it right now, not the status from 5 minutes ago.

    The old way was more efficient: after the search and a short timeout, the JS script pinged only the clients that were found. Couldn’t the Ping class’s execute method be called after the search?

    pinghosts.class.php

    foreach ((array)$hostids as $index => &$hostid) {
        // Skip hosts that have no matching IP or hostname entry.
        if (false === array_key_exists($index, $hostips)
            || false === array_key_exists($index, $hostnames)
        ) {
            continue;
        }
        // Prefer the stored IP; otherwise try to resolve the hostname,
        // and as a last resort pass the raw hostname to Ping.
        $ip = $hostips[$index];
        if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
            $ip = self::resolveHostname($hostnames[$index]);
        }
        if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
            $ip = $hostnames[$index];
        }
        unset($hostnames[$index], $hostips[$index]);
        // Probe the host and store the result in its pingstatus field.
        $ping = self::getClass('Ping', $ip)
            ->execute();
        self::getClass('HostManager')
            ->update(
                array('id' => $hostid),
                '',
                array('pingstatus' => $ping)
            );
        // ... (excerpt ends here)
    }
    


  • @Tom-Elliott said in FOG doesn't detect the status of the clients:

    FOG uses port 445 to detect the status of the client machines. This is usually UDP, though I think opening UDP and TCP would help things out.

    We use this port as it can give a more direct status than a simple ICMP request.

    Hopefully this helps.

    #wiki worthy


  • Developer

    @Fernando-Gietz said in FOG doesn't detect the status of the clients:

    Maybe in my case or for huge environments this service is not very useful.

    Probably a good point. But I can’t think of a different check that would work through such a load of clients as you’ll always need to wait for timeouts in such an environment. Maybe it’s just wise to disable this service in your case and leave it to that.

    Let us know if you have a great idea on how to implement a faster or on-demand check. Would the latter really be useful?


  • Developer

    Hi Tom,

    And how long does FOGPingHosts take to check the status of 8192 hosts? If the FOGPingHosts service checks all the hosts and takes 1 second per host (for example), then 8192 seconds ≈ 136 minutes … Maybe in my case, or for huge environments, this service is not very useful.

    Maybe checking the status another way would be more useful and practical. For example, in the old versions (0.30 and earlier) the status was checked on demand, when you made a search.
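
The estimate above is easy to reproduce for other per-host costs. A small sketch, assuming a strictly sequential sweep; the 0.1 s figure corresponds to the "10 hosts per second" mentioned in this thread, and the 2 s figure is an assumption read off the tcpdump captures in this thread, where each unanswered host gets two SYNs one second apart before the next host is tried:

```python
def sweep_minutes(hosts, seconds_per_host):
    """Minutes needed to probe every host one after another."""
    return hosts * seconds_per_host / 60

for cost in (1.0, 0.1, 2.0):
    print(f"{cost:>4} s/host -> {sweep_minutes(8192, cost):.1f} min")
```

At 1 s per host this gives about 136.5 minutes, and at 2 s per host about 273 minutes, which is close to the roughly 4.5 hours reported for a full pass.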


  • Senior Developer

    @Fernando-Gietz it’s a blocking type system, 5 minutes after the last item is updated is when it runs. It’s not arbitrarily updating every 5 minutes.


  • Developer

    Umm … Maybe the problem is the number of hosts, if FOGPingHosts service checks all the hosts that are in the hosts table. We have 8192 hosts in this table.


  • Senior Developer

    @Fernando-Gietz it’s using the fogpinghosts service which is run every 5 minutes I believe.


  • Developer

    Hi Tom,

    Thanks for the info.

    I have disabled the firewall on one client and now the icon appears in green :) but if I shut down the client the icon still appears in green :(

    What detects the status? The FOGPingHosts service? Is the status detected on the fly, on demand?


  • Senior Developer

    FOG uses port 445 to detect the status of the client machines. This is usually UDP, though I think opening UDP and TCP would help things out.

    We use this port as it can give a more direct status than a simple ICMP request.

    Hopefully this helps.
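
As a rough illustration of why a port probe can say more than a plain ICMP echo, the sketch below classifies a single TCP connection attempt to port 445. It is not FOG's Ping class; the function name, the result labels and the 2-second timeout are assumptions, and how FOG maps such results to the green or red icon is not shown in this thread.

```python
import socket

def probe(host, port=445, timeout=2.0):
    """Classify one TCP connect attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # something accepted the connection
    except socket.gaierror:
        return "unresolved"      # the hostname could not be resolved
    except socket.timeout:
        return "timeout"         # no answer: host down or packets filtered
    except ConnectionRefusedError:
        return "refused"         # host answered, but nothing listens on the port
    except OSError:
        return "error"           # any other socket-level failure
```

Under these assumptions, "open" and "refused" both show that the machine is powered on, while "timeout" cannot tell a powered-off client from one whose firewall silently drops the SYN. That would be consistent with the observation earlier in the thread that the icon only changed after the client firewall was disabled.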

