Dual NIC clients



  • Hi,

    We use dual NIC client boxes, with each NIC on a different VLAN. Machines will PXE boot, but the screens go blank during that process. In earlier versions (0.28, 0.29 and 0.32) we’d get a TFTP timeout.

    The quick and dirty fix is to kill the relevant switch ports on the LAN not used for FOG while imaging.

    Since it works fine with only one active NIC, it would seem that the PXE-booted kernel gets confused about which interface to use for TFTP. I tried changing TFTP_ADDRESS="0.0.0.0:69" in /etc/default/tftpd-hpa to point at the server IP, but that didn’t help. I also tried adding option tftp-server-name X.X.X.X; to my dhcpd config file, but no luck.

    Any ideas?

    Thanks.

    Server:
    FOG 1.2 (pretty much standard setup except for bonded NICs)
    Ubuntu 10.04.4 LTS

    Clients:
    HP z620
    Lenovo ThinkStation S10


  • Moderator

    @tag correct. No getting around that.



  • @Wayne-Workman

    That would work nicely, seeing as you can use different inits. I didn’t know that, as it isn’t possible in 1.2.0.

    The caveat is that I would have to redo the server as trunk requires a newer Ubuntu according to @Quazz.


  • Moderator

    I’ve been working on this.

    This is my first go-round with a custom init, so I’m asking @Sebastian-Roth, @Tom-Elliott and @george1421 to take a look, too.

    I haven’t tested this, as I don’t have a machine with multiple interfaces readily available, but I think I’ve got a universal init that you can pass a custom kernel argument to, which will ensure the correct interface is up and the others are down. So far, I’ve coded for three possible interfaces. I already had many of these functions written for another project I’ve been working on.

    In the init, I’ve edited the file /usr/share/fog/lib/funcs.sh to include these functions:

    cidr2mask() {
            # Expects CIDR notation (a single integer between 0 and 32).
            local i
            local mask=""
            local full_octets=$(($1/8))
            local partial_octet=$(($1%8))
            for ((i=0;i<4;i+=1)); do
                    if [[ $i -lt $full_octets ]]; then
                            mask+=255
                    elif [[ $i -eq $full_octets ]]; then
                            mask+=$((256 - 2**(8-partial_octet)))
                    else
                            mask+=0
                    fi
                    test $i -lt 3 && mask+=.
            done
            echo $mask
    }
    
    getCidr() {
            # Expects an interface name. Prints the prefix length of its global
            # IPv4 address, or 32 if it has none, so cidr2mask always gets a number.
            local cidr
            cidr=$(ip -f inet -o addr show dev "$1" 2>/dev/null | awk -F'[ /]+' '/global/ {print $5; exit}')
            echo "${cidr:-32}"
    }
    
    getIP() {
            # Expects an interface name. Prints its IPv4 address, or 127.0.0.1
            # as a placeholder if it has none.
            local addr
            addr=$(ip -f inet -o addr show dev "$1" 2>/dev/null | awk '{split($4, a, "/"); print a[1]; exit}')
            echo "${addr:-127.0.0.1}"
    }
    
    mask2network() {
            # Expects IP address passed 1st, and Subnet Mask passed 2nd.
            local i1 i2 i3 i4 m1 m2 m3 m4
            IFS=. read -r i1 i2 i3 i4 <<< "$1"
            IFS=. read -r m1 m2 m3 m4 <<< "$2"
            printf "%d.%d.%d.%d\n" "$((i1 & m1))" "$((i2 & m2))" "$((i3 & m3))" "$((i4 & m4))"
    }
    
    GetInterfaceInfo() {
            # "ip link show" prints two lines per interface with lo first, so
            # lines 3, 5 and 7 start the next three interfaces ("2: eth0: <...>").
            local links name
            links=$(ip link show)
            interface1name=$(sed -n '3p' <<< "$links" | cut -d: -f2 | cut -c2-)
            interface2name=$(sed -n '5p' <<< "$links" | cut -d: -f2 | cut -c2-)
            interface3name=$(sed -n '7p' <<< "$links" | cut -d: -f2 | cut -c2-)
    
            # Bring up interfaces, skipping any slot this machine doesn't have.
            # NOTE: "ip link set ... up" only raises the link; the address must
            # still come from a DHCP client before getIP/getCidr can find it.
            for name in "$interface1name" "$interface2name" "$interface3name"; do
                    [[ -n $name ]] || continue
                    echo "iface $name inet dhcp" >> /etc/network/interfaces
                    ip link set "$name" up
            done
            sleep 4
    
            interface1ip=$(getIP "$interface1name")
            interface2ip=$(getIP "$interface2name")
            interface3ip=$(getIP "$interface3name")
    
            interface1network=$(mask2network "$interface1ip" "$(cidr2mask "$(getCidr "$interface1name")")")
            interface2network=$(mask2network "$interface2ip" "$(cidr2mask "$(getCidr "$interface2name")")")
            interface3network=$(mask2network "$interface3ip" "$(cidr2mask "$(getCidr "$interface3name")")")
    }
    
    setCorrectInterface() {
            local arg desiredNetwork=""
            # Read the network ID from the USE_NETWORK=<network> kernel argument.
            for arg in $(cat /proc/cmdline); do
                    if echo "$arg" | grep -q '^USE_NETWORK='; then
                            desiredNetwork=$(echo "$arg" | cut -d= -f2)
                    fi
            done
    
            GetInterfaceInfo
    
            # Spaces around == matter: [[ $a==$b ]] is a non-empty-string test
            # and would always be true.
            if [[ $interface1network == "$desiredNetwork" ]]; then
                    ip link set "$interface1name" up
                    ip link set "$interface2name" down
                    ip link set "$interface3name" down
            elif [[ $interface2network == "$desiredNetwork" ]]; then
                    ip link set "$interface1name" down
                    ip link set "$interface2name" up
                    ip link set "$interface3name" down
            elif [[ $interface3network == "$desiredNetwork" ]]; then
                    ip link set "$interface1name" down
                    ip link set "$interface2name" down
                    ip link set "$interface3name" up
            fi
    }
    

    And I’ve added a call to the main function in the main fog script file /bin/fog between the usb part and the task calling, around line 10 like this:

    #!/bin/bash
    . /usr/share/fog/lib/funcs.sh
    ### If USB Boot device we need a way to get the kernel args properly
    if [[ $boottype == usb && ! -z $web ]]; then
        mac=$(getMACAddresses)
        wget -q -O /tmp/hinfo.txt "http://${web}service/hostinfo.php?mac=$mac"
        [[ -f /tmp/hinfo.txt ]] && . /tmp/hinfo.txt
    fi
    
    setCorrectInterface
    
    if [[ -n $mode && $mode != +(*debug*) ]]; then
        case $mode in
            wipe)
                fog.wipe
                ;;
            checkdisk)
                fog.testdisk
                ;;
            photorec)
                fog.photorec
                ;;
            badblocks)
                fog.surfacetest
                ;;
            clamav)
                fog.av
                ;;
            autoreg)
                fog.auto.reg
                ;;
            manreg)
                fog.man.reg
                ;;
            inventory)
                fog.inventory
                ;;
            capone)
                fog.capone
                ;;
            winpassreset)
                fog.chntpw
                ;;
            quickimage)
                fog.quickimage
                ;;
            sysinfo)
                fog.sysinfo
                ;;
            "donate.full")
                fog.donatefull
                ;;
            *)
                handleError "Fatal Error: Unknown mode :: $mode ($0)\n   Args Passed: $*"
                ;;
        esac
    else
        case $type in
            down)
                fog.download
                ;;
            up)
                fog.upload
                ;;
            *)
                [[ -z $type ]] && type="Null"
                handleError "Fatal Error: Unknown request type :: $type"
                ;;
        esac
    fi
    

    With these modifications to the init (and using FOG trunk), you’d simply specify this custom init and pass the kernel argument for the network you want to use. For example:

    [Screenshot: USE_NETWORK kernel argument]
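    A hedged alternative to the grep/cut loop in setCorrectInterface for reading that argument: the sketch below pulls the value of any NAME=value kernel argument. The helper name get_kernel_arg is hypothetical, and a value like USE_NETWORK=192.168.10.0 is only an example; the value should be the network ID that mask2network would produce for the wanted subnet.

    ```bash
    #!/bin/bash
    # Hypothetical helper: print the value of a NAME=value kernel argument.
    # Reads /proc/cmdline by default; a second argument can point at another
    # file (handy for testing). Returns 1 if the argument is absent.
    get_kernel_arg() {
        local name=$1 file=${2:-/proc/cmdline} arg
        local -a args
        # The kernel command line is a single space-separated line.
        read -r -a args < "$file"
        for arg in "${args[@]}"; do
            if [[ $arg == "$name="* ]]; then
                echo "${arg#"$name="}"
                return 0
            fi
        done
        return 1
    }
    ```

    On the client, desiredNetwork=$(get_kernel_arg USE_NETWORK) would then hold the network ID, and the call returns 1 if the host has no such argument.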


  • Moderator

    @tag Well the way I write it, it’ll work with however many NICs a system has. We will need to use the new feature that @Tom-Elliott so kindly implemented maybe a month ago, the host’s Host Init field. Basically we will build an init for each of your subnets, then use groups to assign the right inits to the right computers - so that those computers in those subnets use the correct interface.

    Sounds like a lot - but I really don’t think it is. I think this is going to be very easy.
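    One way the "however many NICs" version might look, as a minimal sketch rather than tested FOG code: it compares every IPv4 address on the box against the desired network ID and takes all other interfaces down. It assumes the NICs already hold DHCP addresses and that `ip -o -f inet addr show` works in the init; network_of, pick_interface and set_correct_interface are hypothetical names.

    ```bash
    #!/bin/bash
    # Hypothetical sketch: keep only the NIC on the desired network, for any
    # number of NICs. Assumes the NICs already have DHCP addresses.

    # network_of ADDR/PREFIX -> network address. Bash arithmetic is 64-bit,
    # so the shifted mask is trimmed back to 32 bits.
    network_of() {
        local addr=${1%/*} prefix=${1#*/} a b c d n
        IFS=. read -r a b c d <<< "$addr"
        n=$(( ((a << 24) | (b << 16) | (c << 8) | d) & ((0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF) ))
        echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
    }

    # pick_interface DESIRED: reads "ip -o -f inet addr show" output on stdin
    # and prints the first non-loopback interface on network DESIRED.
    pick_interface() {
        local desired=$1 idx name family cidr rest
        while read -r idx name family cidr rest; do
            [[ $family == inet && $name != lo && $cidr == */* ]] || continue
            if [[ $(network_of "$cidr") == "$desired" ]]; then
                echo "$name"
                return 0
            fi
        done
        return 1
    }

    # set_correct_interface DESIRED: take every other NIC down. If nothing
    # matches, change nothing rather than leave the client offline.
    set_correct_interface() {
        local keep dev
        keep=$(ip -o -f inet addr show | pick_interface "$1") || return 1
        for dev in /sys/class/net/*; do
            dev=${dev##*/}
            [[ $dev == lo || $dev == "$keep" ]] && continue
            ip link set "$dev" down
        done
    }
    ```

    Called as set_correct_interface "$desiredNetwork" from /bin/fog, it leaves every link untouched when no NIC matches, which seems safer than taking everything down.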



  • @Wayne-Workman
    Thanks again.

    Yes, that code will only work on the specific network defined in $nwid, only if the kernel names the interfaces ethX, and probably only if the number of interfaces matches that particular piece of hardware…

    Mighty nice of you to help me out here… Appreciate it.

    Thanks.


  • Moderator

    @tag You’ve got the right idea - but that specific code is inflexible.

    Tonight I’ll put something together that will take the IPs, and the subnet mask, and calculate the subnet ID and use that for comparison.
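    The calculation described here, sketched with 32-bit integer arithmetic instead of octet-by-octet masking (a swapped-in technique that gives the same result): the address and mask are ANDed and converted back to dotted-quad form. ip2int, int2ip and network_id are hypothetical names.

    ```bash
    #!/bin/bash
    # Hypothetical sketch: subnet (network) ID from an IPv4 address and a
    # prefix length, using 32-bit integer arithmetic.

    # ip2int a.b.c.d -> 32-bit integer
    ip2int() {
        local a b c d
        IFS=. read -r a b c d <<< "$1"
        echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
    }

    # int2ip N -> dotted-quad
    int2ip() {
        local n=$1
        echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
    }

    # network_id IP PREFIX -> network address (IP AND mask)
    network_id() {
        local ip=$1 prefix=$2 mask
        # Bash arithmetic is 64-bit, so trim the shifted mask back to 32 bits.
        mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
        int2ip $(( $(ip2int "$ip") & mask ))
    }
    ```

    For example, network_id 192.168.10.57 24 prints 192.168.10.0, the same as masking 192.168.10.57 with 255.255.255.0; an interface is on the wanted subnet when its network ID equals the configured one.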



  • @Wayne-Workman

    Thanks for the reply.

    Seems kind of inflexible, though… The same init is used for all, right? We even have some clients with three NICs at other locations… If it has to take various hardware scenarios into account, it might take some fancy scripting.

    I know some basic scripting but nothing really fancy.

    The only way to determine the correct interface would be to filter on IP, as I see it.

    So maybe a list of interfaces and then for each ethX in the list:

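    #!/bin/csh
    
    # Network ID of the correct network: the first three octets (a /24).
    set nwid = X.X.X
    set list = (eth0 eth1)
    
    foreach eth ($list)
      # Take the IPv4 address (older ifconfig prints "addr:x.x.x.x") and keep
      # the first three octets so it can be compared with $nwid.
      set ip = `ifconfig $eth | grep 'inet ' | awk '{print $2}' | sed 's/addr://' | cut -d. -f1-3`
      if ("$ip" == "$nwid") then
          ifup $eth
      else
          ifdown $eth
      endif
    end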
    

    In my case, that would get the network ID of the correct network, and dissimilar output from the other interfaces in the list, which could then be compared to a set value for the correct network ID. Based on that comparison, you could then turn the interfaces on or off.

    I’m sure someone else could do something a lot niftier.

    I haven’t tested any of this and it might screw up if the number of interfaces actually present is different from the number in the list.


  • Moderator

    @tag I think you’re going to have to build a custom init. You can change the fog.upload and fog.download scripts. The idea would be to use shell script to determine which interface is on the right network, and then disable the other interface (or enable it). It should be pretty simple.

    How experienced are you with shell scripting?

    Also, here’s a link on how to unpack and re-pack the inits:
    https://wiki.fogproject.org/wiki/index.php?title=Build_FOG_file_system_with_BuildRoot
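    For reference, the unpack/edit/repack cycle is roughly the following, assuming init.xz is an xz-compressed ext2 filesystem image, that it runs as root, and that INIT_SRC points at your actual init.xz (the web root differs between distros). The default path and the init_custom.xz output name are placeholders; set DRY_RUN=1 to print the commands instead of running them.

    ```bash
    #!/bin/bash
    # Sketch of the unpack/edit/repack cycle for FOG's init.xz.
    # ASSUMPTIONS: init.xz is an xz-compressed ext2 image, this runs as root,
    # and INIT_SRC is set to your real init.xz (the default is a placeholder).
    INIT_SRC=${INIT_SRC:-/var/www/fog/service/ipxe/init.xz}
    WORK=${WORK:-/tmp/initwork}

    # Run a command, or only print it when DRY_RUN is set.
    run() {
        if [[ -n $DRY_RUN ]]; then
            echo "$*"
        else
            "$@"
        fi
    }

    unpack_init() {
        run mkdir -p "$WORK/mnt"
        run cp "$INIT_SRC" "$WORK/init.xz"
        run xz -d "$WORK/init.xz"                # leaves $WORK/init
        run mount -o loop "$WORK/init" "$WORK/mnt"
        # Now edit e.g. $WORK/mnt/usr/share/fog/lib/funcs.sh and $WORK/mnt/bin/fog
    }

    repack_init() {
        run umount "$WORK/mnt"
        # The kernel's xz decoder expects CRC32 (or no) checks, not xz's default CRC64.
        run xz -C crc32 -9 "$WORK/init"          # produces $WORK/init.xz
        run cp "$WORK/init.xz" "${INIT_SRC%.xz}_custom.xz"
    }
    ```

    Source the script, run unpack_init, edit the files under $WORK/mnt, then run repack_init. Writing a separately named init keeps the stock one intact for hosts that don't need the change.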

    I’m willing to help do this - but I wouldn’t have time until tonight to mess with it.



  • @Quazz and @Wayne-Workman

    Thanks for the suggestions.

    I tried playing around with grcan.enable0=[0|1] and grcan.enable1=[0|1] as well as grcan.select=[0|1] but none had any effect. The kernel continues to choose the slower link in most cases. Seemed promising, though…

    What I do notice, for the first time, is that whichever NIC is not chosen gets disabled. I hadn’t noticed before, as I can’t see the backs of the boxes very well. This also makes it obvious that the active NIC changes on occasion, as the LEDs die on the disabled NIC.


  • Moderator

    @Quazz very nice. I read its description, but the one below it caught my eye:

    grcan.select= [HW] Select which physical interface to use. Format: 0 | 1 Default: 0

    So perhaps try this for the host’s kernel arguments (web gui -> host management -> desired host -> kernel arguments)

    grcan.select=1

    See what happens?


  • Moderator

    @Wayne-Workman The problem is, what arguments can we use to differentiate? It sounds like they’re basically identical NICs connecting to different network outlets, and they’re probably getting inconsistent names as well (based on his results).

    https://www.kernel.org/doc/Documentation/kernel-parameters.txt

    Scroll down to grcan.enable0

    Could those options be useful?


  • Moderator

    @Quazz do we know of any linux kernel arguments that specify only using a particular nic, or disabling a particular nic?

    Maybe even a custom kernel could be an answer? Or custom init?


  • Moderator

    @tag said in Dual NIC clients:

    If I understand you correctly, your suggestion of trunking would enable the client to connect to the TFTP server on either link?

    No, the developmental version of fog is called “fog trunk”.



  • @Quazz
    No, they are registered and deployed through tasks. Actually, it’s not so much a question of one NIC being faster than the other (they’re more or less identical dual NICs on the motherboard) as of the link speed. The slower link uses 100Mbps switch ports on a 100Mbps trunk to the layer 3 switch. That network was never intended for large data transfers, just remote access and so on.

    The primary MAC in the host registration is the faster link, so that has no effect, I’m afraid. The faster links are 1Gbps switch ports for the clients and two ports in an EtherChannel for the server.
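    A quick sanity check on those link speeds (plain arithmetic, ignoring protocol overhead and taking FOG's displayed rate at face value): a link's ceiling in MB/min is Mbit/s × 60 / 8.

    ```bash
    #!/bin/bash
    # Theoretical ceiling of a link in MB/min (decimal MB), ignoring protocol
    # overhead: Mbit/s * 60 s / 8 bits per byte.
    link_ceiling_mb_per_min() {
        echo $(( $1 * 60 / 8 ))
    }

    for speed in 100 1000; do
        echo "${speed} Mbit/s -> at most $(link_ceiling_mb_per_min "$speed") MB/min"
    done
    ```

    That gives 750 MB/min for 100Mbps and 7500 MB/min for 1Gbps, so the 225MB/min deployments mentioned in this thread fit under the 100Mbps ceiling, while 5GB/min would only be possible over the gigabit link.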


  • Moderator

    @tag Are you using quick image?

    It might be possible to tell it to use the faster NIC by registering them, assigning the faster NIC as the primary NIC and deploying that way, but I really can’t be certain of that. I was kind of hoping someone else would chime in on this, heh.



  • @Quazz
    Yes, sometimes an image will deploy at 5GB/min - a few minutes later when trying again with the same client it will only deploy at a fraction of that speed…

    Otherwise I agree; I too believe it has to do with the order of the NICs.

    I tried swapping the cables at first to see if it just chose one specific hardware ID first, but that was not the case.

    Since then I’ve done quite a few test deployments on the same eight machines. Mostly they deploy slowly, but every now and again one will run on the faster NIC, which can be verified by pulling the cables and seeing which one makes it pause.

    I’m not saying it’s arbitrary, but I don’t see a pattern so it seems arbitrary to me. ;)


  • Moderator

    @tag Is it arbitrary? If you try the same PC over and over, does it report different speeds?

    I think it has to do with the order your system reports the NICs in. If it reports the slower NIC first it will use that. Perhaps that’s something you can alter on your end?
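    To check the order, and which port gets which name on a given boot, something like the following could be run from a debug task. It reads sysfs to list each NIC's name, MAC address, link state and negotiated speed in Mbit/s. list_nics is a hypothetical helper, and the optional argument exists only so it can be pointed at a test directory.

    ```bash
    #!/bin/bash
    # Hypothetical helper: list each NIC with MAC address, link state and
    # negotiated speed (Mbit/s), read from sysfs. The optional argument is an
    # alternative root directory, e.g. for testing.
    list_nics() {
        local root=${1:-/sys/class/net} dir name mac state speed
        for dir in "$root"/*; do
            [[ -e $dir ]] || continue
            name=${dir##*/}
            [[ $name == lo ]] && continue
            mac=$(cat "$dir/address" 2>/dev/null)
            state=$(cat "$dir/operstate" 2>/dev/null)
            # Reading "speed" may fail (or read -1) while the link is down.
            speed=$(cat "$dir/speed" 2>/dev/null)
            printf '%s %s %s %s\n' "$name" "${mac:-?}" "${state:-?}" "${speed:-unknown}"
        done
    }
    ```

    Comparing the MAC/speed pairs across a few boots would show whether the eth0/eth1 names really swap between the 1Gbps and 100Mbps ports.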



  • OK - did some thinking. It times out because there is no default gateway set on the secondary link. With that set, it connects. The problem is, how do I know which network it chooses? I’m now getting inconsistent transfer speeds, an average of 5GB/min versus 225MB/min, apparently depending on which NIC it connects through.

    I should mention that inter-VLAN routing is enabled on the layer 3 switch of the primary network. Removing the secondary network from the static route list or pulling the physical link kills it again - this time at trying to send an inventory before deploying.

    If I pull the power on the secondary network they will all deploy at high speeds.
    With the secondary network on (and inter-VLAN routing), some will deploy normally, others slowly - apparently arbitrarily, as the same machines will act differently from task to task.
    It would seem the kernel arbitrarily sets which NIC is eth0 from boot to boot? That would perhaps explain why it would appear to use different NICs.

    If I pull the plug on the secondary network while deploying it stops deploying until plugged back in. So it’s using the “wrong” NIC…
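    One way to see which NIC the client will actually use to reach the server is to ask the routing table with ip route get, which prints the chosen route including dev followed by the interface name. The sketch below extracts that name; route_dev is a hypothetical helper, and FOG_SERVER in the comment is a placeholder for the server's IP.

    ```bash
    #!/bin/bash
    # Hypothetical helper: read "ip route get ADDR" output on stdin and print
    # the interface named after "dev", e.g. from
    #   192.168.10.5 dev eth1 src 192.168.10.57 uid 0
    route_dev() {
        awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
    }

    # Usage on a booted client (FOG_SERVER is a placeholder for the server IP):
    #   ip route get "$FOG_SERVER" | route_dev
    ```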

    Anyone ever see something like this?



  • @Quazz said in Dual NIC clients:

    Sorry, it’s still morning, things got mixed up in my head.

    LOL, tank up on your morning stimulant of choice…

