Using dumb switch to troubleshoot imaging, network related



  • @Mastriani said in Using dumb switch to troubleshoot imaging, network related:

    the one I typically use is Win 10, and carries too much information/applications to be reformatted.

    There is no need to reformat. You can use a Live Linux disk, meaning you boot from a CD or USB drive, and run the OS completely from that, without ever changing the contents of your local HDD. See this for more information: https://www.ubuntu.com/download/desktop/try-ubuntu-before-you-install

    Nowadays I default to using Linux to solve any sort of problem that isn’t specifically a Windows problem, just because I’ve become so accustomed to the tools available on Linux. If you’re into high-quality Information Technology / Computer Science software at a low price, Linux is the literal jackpot.



  • @george1421 said in Using dumb switch to troubleshoot imaging, network related:

    IGMP snooping is only used if you are using multicasting. We are only talking about unicast image deployment, right?

    Yes, unicast only; the IGMP comment was just for reference. I am attempting to be as inclusive as possible with anything that your team might or might not deem relevant, to help refine the possible scope.


  • Moderator

    @Mastriani said in Using dumb switch to troubleshoot imaging, network related:

    Is there something else then, in the protocol stack, that should be looked at more closely? The switches currently have IGMP snooping disabled, by factory default.

    IGMP snooping is only used if you are using multicasting. We are only talking about unicast image deployment, right?



  • @george1421 said in Using dumb switch to troubleshoot imaging, network related:

    This is an interesting puzzle. You should not have this issue (which you already know). When I run into these puzzles I remember what one of my university professors told me about debugging an electrical circuit: “You first have to find out where the problem isn’t to find out where the problem is.” So you start where the problem should be and then work your way away from where the problem isn’t to where it is. So far his statement has worked well for me.

    If you have a better approach, then I am all … well, reading glasses, as it were. Resolution is what is most important. Hopefully I can also make use of Mr. Workman’s talents as well, and grab a few new neuronal connections in the process.



  • @Tom-Elliott said in Using dumb switch to troubleshoot imaging, network related:

    Just shedding light that UDP, unless using multicast, is limited only to TFTP traffic, and even then is minimal at best.

    Is there something else then, in the protocol stack, that should be looked at more closely? The switches currently have IGMP snooping disabled, by factory default.


  • Moderator

    @Mastriani said in Using dumb switch to troubleshoot imaging, network related:

    I have Fog servers at both buildings, suffering the same issues. I will connect as you specified first thing in the morning, and let the capture run.

    This is an interesting puzzle. You should not have this issue (which you already know). When I run into these puzzles I remember what one of my university professors told me about debugging an electrical circuit: “You first have to find out where the problem isn’t to find out where the problem is.” So you start where the problem should be and then work your way away from where the problem isn’t to where it is. So far his statement has worked well for me.


  • Senior Developer

    Just shedding light that UDP, unless using multicast, is limited only to TFTP traffic, and even then is minimal at best.



  • @george1421

    I have Fog servers at both buildings, suffering the same issues. I will connect as you specified first thing in the morning, and let the capture run.

    Easy enough to get the data you require, I will post that whenever the test completes.

    Greatly appreciate the input, loss of imaging capability has made this school year … frustrating and inefficient.


  • Moderator

    @Mastriani Well the 755 to 390s are not the most powerful systems but they should not take 30 minutes to image either.

    When you run your test, please try to get an accurate time to deploy and the actual image size. That way we can calculate the true throughput. The number from partclone is a bit deceiving.

    The compression value is a scale from 0 (no compression) to 9 (maximum compression). That also means:
    0 = maximum file size on the FOG server and over the network but images faster since the client doesn’t need to decompress the image.
    9 = smaller file size on the FOG server and network but images slower since the client must squeeze all of the air possible out of the image to capture and deploy it.

    5 is the middle of the road between a larger file and higher CPU requirements on the client.
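
    To make the size side of that trade-off concrete, here is a minimal sketch using Python's zlib module. It assumes the compression setting behaves like a gzip-style level (0 to 9), which matches the scale described above but is not confirmed here; the sample data is synthetic and far more repetitive than a real disk image, so the ratios it prints are illustrative only.

```python
import zlib

# Synthetic, highly repetitive stand-in for image data (an assumption;
# real disk images compress far less predictably).
sample = b"FOG image block " * 65536  # 1 MiB


def compressed_sizes(data, levels=(0, 5, 9)):
    """Return {level: compressed size in bytes} for each zlib level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


sizes = compressed_sizes(sample)
for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({size / len(sample):.1%} of original)")
```

    Level 0 stores the data uncompressed (plus a little framing overhead), so that is the most that has to cross the network; higher levels shrink the payload at the cost of CPU time on the client.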



  • @george1421

    All machines that we use are Dell refurbs, ranging from OptiPlex 755 up to Dell 390 with Win 10.

    All images at this point are Win 7, 35-55 GB. After researching through community commentary, I changed compression to 5, as most reported this setting maximized throughput in their experience. I previously tried compression 3 and 9, but overall there was no observable change in the capture time frame.



  • @Wayne-Workman

    Thank you sir, I will see if I have a laptop available and answer back when that is confirmed. The one I typically use is Win 10, and carries too much information/applications to be reformatted.

    My Linux/Ubuntu capabilities are intermediate. I have never used a live disk or utilities for network troubleshooting, so, given my ignorance, I might require more information on how this is handled.



  • @Tom-Elliott

    Mr. Elliott, you sir are a phenom.

    I had tried 2 months ago to do exactly that, but the trunk version, at that time, errored out continually on PXE image boot. At that time, you and your team were working on a fix, but I reformatted the server back to the previously stated conditions.


  • Moderator

    @Mastriani I think then I would do as I mentioned. Plug a target computer into the same switch as the FOG server and make sure both are on the same VLAN. Test deploy in this configuration.

    The Dell should not have this problem, but I have seen some old computers not properly negotiate with the switch to set up GbE speeds. You should confirm from the switch management interface that the port plugged into the Dell is indeed connecting as GbE.

    How big (in GB) is your image that is being deployed?



  • @george1421

    According to just the theoretical throughput of the Enterasys switch, yes, all are listed as GbE.

    Yes, I have even moved ports and rechecked, no issue with the FOG server connecting. The server itself is Dell PE 2950 / Dual Quad core / 16 GB RAM / Broadcom Extreme GbE NIC.

    Yes, there are some other issues. The DNS retardation was cleared up yesterday; I retested this morning and there was no change in performance outcome. A major issue (in my thinking at least) is the interface for these switches. Correct me if I am wrong, but FOG utilizes UDP heavily, and as of yet I can find no utility for looking into the protocol stack settings for this hardware.

    Our ISP/Support is ENA, and they have done all they can, even extending past their “contractual” level of support, but they cannot pinpoint the issue because they can only look inside individual LANs on the WAN, not down to device level. From my observations, some issue is causing chronic IP conflicts. No rogue DHCP/DNS servers were found, but the tech who installed the switch hardware didn’t seem particularly detail oriented; for example, yesterday I found that jumbo frames/MTU settings were different on every switch in both my buildings (now corrected).

    It looks directly like a switching issue to me, because during the 3.5+ hour capture, you can watch the bandwidth continually degrade, typically starting between 2.4 - 4 Gb/s down to less than 100Mb/s, continual drop off.


  • Moderator

    While I agree with Tom that you should upgrade, there does sound like something is not right with your network. For example, I can push a 25GB host image from the FOG server to a target computer in a little over 4 minutes, with a partclone-measured throughput of 6.5GB/min.
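
    As a rough cross-check of figures like these, the arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and the sketch treats the image size as the number of bytes that actually crossed the wire, which is an assumption (compressed and uncompressed sizes differ). The second call uses the upper image size and the 3.5 hour capture time reported elsewhere in this thread.

```python
def throughput(image_gb, minutes):
    """Return (GB per minute, megabits per second) for one transfer.

    Uses decimal units: 1 GB = 8000 megabits.
    """
    gb_per_min = image_gb / minutes
    mbit_per_s = image_gb * 8000 / (minutes * 60)
    return gb_per_min, mbit_per_s


# 25 GB in about 4 minutes
fast_gb_min, fast_mbit_s = throughput(25, 4)
print(f"{fast_gb_min:.2f} GB/min, {fast_mbit_s:.0f} Mbit/s")

# 55 GB in 3.5 hours
slow_gb_min, slow_mbit_s = throughput(55, 3.5 * 60)
print(f"{slow_gb_min:.2f} GB/min, {slow_mbit_s:.0f} Mbit/s")
```

    Under these assumptions the first case works out to roughly 6.25 GB/min, and the second to a small fraction of what a GbE link can carry.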

    Understand that the work load on the FOG server and network is very light (other than just copying the image), all of the heavy CPU usage is on the target computer to compress and send or receive and decompress the image. This is where all of the real work is done. If you have a very slow computer it WILL take a very long time to image.

    Is all of your network running at GbE speeds?

    You have confirmed from the switch management page that the FOG server is connected via one or more GbE links?

    Do you have any other speed related issues on your network?

    If it was me and I could not purchase any new hardware to test, I would start out with a simple test. Move a target computer and plug it into the same switch the FOG server is plugged into. Start there. Test to see what your transfer rates are. If it’s not good there, then you can focus on the switch settings or the FOG server. If it’s good on the switch connected to the FOG server, then move your target computer to the next switch away from the switch plugged into the FOG server. Keep doing this until your transfer rate becomes bad. Then focus on that area for your problem.
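
    The hop-by-hop procedure above can be written down as a small Python sketch. The function name, switch names, readings, and the 500 Mbit/s threshold are all hypothetical placeholders, not data from this thread; the point is only the order of the search, starting at the FOG server's switch and moving outward.

```python
def first_slow_hop(measurements, threshold_mbps):
    """Return the first switch where throughput falls below the threshold.

    measurements: list of (switch_name, measured_mbps), ordered from the
    switch the FOG server is plugged into, moving outward.
    Returns None if every hop meets the threshold.
    """
    for switch_name, mbps in measurements:
        if mbps < threshold_mbps:
            return switch_name
    return None


# Hypothetical readings, not data from this thread.
readings = [("server-switch", 850.0), ("switch-2", 830.0), ("switch-3", 95.0)]
print(first_slow_hop(readings, threshold_mbps=500.0))  # prints switch-3
```

    If the very first entry is already slow, the switch settings or the FOG server itself are the place to look; otherwise the link between the last good hop and the first slow one is the suspect.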



  • I’m willing to help you figure out what’s up with the network. Install iperf and ethtool, grab a laptop with a gigabit network interface, boot up an Ubuntu live disk on it, and install the same things again there, and we will get to testing where the bottleneck is.

    Also, for your testing on the dumb switch, you could (temporarily, of course) install FOG with DHCP, using FOG 1.3.0 RC as Tom suggested. You can very easily turn the DHCP service off and on via the CLI with: service dhcpd stop and service dhcpd start
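
    Before running any bandwidth tests it helps to confirm that each Linux machine has actually negotiated a gigabit link. ethtool reports this; as an alternative, the minimal Python sketch below reads the same value from the Linux sysfs file /sys/class/net/<interface>/speed instead of calling ethtool. The function names are hypothetical, and "eth0" is a placeholder interface name that will differ per machine.

```python
from pathlib import Path


def link_speed_mbps(interface, sysfs_root="/sys/class/net"):
    """Return the negotiated link speed in Mb/s, or None if unavailable."""
    speed_file = Path(sysfs_root) / interface / "speed"
    try:
        value = int(speed_file.read_text().strip())
    except (OSError, ValueError):
        # Missing interface, link down, or a driver that does not report speed.
        return None
    # The kernel reports -1 when the speed is unknown.
    return value if value > 0 else None


def is_gigabit(interface, sysfs_root="/sys/class/net"):
    """True only if the interface reports at least 1000 Mb/s."""
    speed = link_speed_mbps(interface, sysfs_root)
    return speed is not None and speed >= 1000


if __name__ == "__main__":
    # "eth0" is a placeholder; substitute the real interface name.
    print("eth0 gigabit:", is_gigabit("eth0"))
```

    A result of 100 on a port that should be GbE points at cabling or auto-negotiation rather than at FOG.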


  • Senior Developer

    You might try using one of the RC versions of FOG. I say this because, while it may not fix the networking issues, we did have quite a breakthrough in the speeds to deploy an image (the part that’s not network speed limited). While it won’t make the network transfer any faster, it should make deploying to a system a little faster, as we’re no longer limiting the decompression to a single core.

    Using dnsmasq (aka ProxyDHCP) might work better to redirect the traffic coming from a dumb switch too.


 
