Using dumb switch to troubleshoot imaging, network related

Mastriani

Our district has been using Fog since 0.29/Ubuntu 10.04.

It has been the most formidable tool in cutting time costs, and saving the minds of most of the very small IT team. Many thanks to the venerable Fog team and community.

3 years ago we had a one-sided network overhaul, new switches through all schools, addition of wireless to all schools. Extreme Networks/Enterasys switches. I despise this equipment. Fog has never operated correctly since the new hardware install.

Currently on 1.0.2/Ubuntu 14.04 LTS, no DHCP from Fog server, handled by Win AD/DC, newer server because certain members of the team thought the server was the issue. Not a chance.

Issues, definitely network hardware related: Capture, minimum 3.5 to 5.5 hours.
Download: maximum of 4 machines, 45m - 1 hr/per. Anything over 4 machines, imaging hangs/fails/sits in queue indefinitely.

I need to test on a dumb switch but haven’t been able to get it to work: the client requests DHCP, which isn’t available there. Why am I a moron, and what do I need to change/correct to get this done?

Many thanks to all members of the Fog team and community for effort and expertise nearly unmatched.

Mastriani

@Wayne-Workman said in Using dumb switch to troubleshoot imaging, network related:

This was due to a known issue with the onboard NIC chipset the server had.

@Mastriani could you post the chipset model please? I forgot what it was.

Both of the Broadcom NICs used in Dell servers are an issue, which is now known thanks to your workup on my server, Mr. Workman.

Broadcom NetXtreme 5721 Single Port Gigabit Ethernet NIC, is what is in mine.

Hands down, Fog, its development team and support professionals are inarguably in a class of their own. My sincerest thanks and professional appreciation to all, Mr. Elliott, and a note of exceptional knowledge and top class support and effort to Mr. Workman.

There is in my estimation, none better. Were it that others in the IT community would take lessons from your organization and team. Well done.

Tom Elliott

You might try using one of the RC versions of FOG? I say this because while it may not fix the networking issues, we did have quite a breakthrough in the speeds to deploy an image (that’s not network speed limited). While it won’t make the network transfer any faster, it should make deploying to a system be a little faster as we’re not limiting the decompression to a single core.

Using dnsmasq (aka proxyDhcp) might work better to redirect the traffic coming from a dumb switch too.

Wayne Workman

I’m willing to help you figure out what’s up with the network. Install iperf and ethtool on the FOG server, then grab a laptop with a gigabit network interface, boot it from an Ubuntu live disk, and install the same two tools there. Then we will get to testing where the bottleneck is.

Also, for your testing on the dumb switch, you could (temporarily, of course) install FOG with DHCP enabled, using FOG 1.3.0 RC as Tom suggested. You can very easily turn the DHCP service off and on via the CLI: service dhcpd stop and service dhcpd start (on Ubuntu the service is typically named isc-dhcp-server rather than dhcpd).

george1421

While I agree with Tom that you should upgrade, it does sound like something is not right with your network. For example, I can push a 25GB host image from the FOG server to a target computer in a little over 4 minutes, with a partclone measured throughput of 6.5GB/min.

Understand that the work load on the FOG server and network is very light (other than just copying the image), all of the heavy CPU usage is on the target computer to compress and send or receive and decompress the image. This is where all of the real work is done. If you have a very slow computer it WILL take a very long time to image.

Is all of your network running at GbE speeds?

You have confirmed from the switch management page that the FOG server is connected via one or more GbE links?

Do you have any other speed related issues on your network?

If it were me and I could not purchase any new hardware to test, I would start out with a simple test. Move a target computer and plug it into the same switch the FOG server is plugged into. Start there and see what your transfer rates are. If it’s not good there, then you can focus on the switch settings or the FOG server. If it’s good on the switch connected to the FOG server, then move your target computer to the next switch away from it. Keep doing this until your transfer rate becomes bad, then focus on that area for your problem.
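The hop-by-hop approach above can be sketched as a small helper. This is only an illustration: the switch labels, the rates, and the 500 Mb/s cutoff are made-up values, and the measurements themselves would come from something like iperf run at each position.

```shell
# Gather one measurement per switch, nearest to the FOG server first.
# For example, with iperf (assumed to be installed on both ends):
#   on the FOG server:     iperf -s
#   on the test machine:   iperf -c <fog-server-ip> -t 30
#
# first_bad_hop THRESHOLD_MBPS
# Reads "label rate_in_mbps" lines on stdin (labels without spaces) and
# prints the first label whose rate is below the threshold, or "none".
first_bad_hop() {
    awk -v min="$1" '
        $2 < min { print $1; found = 1; exit }
        END      { if (!found) print "none" }
    '
}

# Hypothetical readings, ordered from the server's own switch outward.
printf '%s\n' "switch-A 920" "switch-B 905" "switch-C 88" | first_bad_hop 500
```

With the hypothetical readings shown, the helper points at switch-C, which is where the closer look at ports, uplinks, and settings would begin.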

Mastriani

@george1421

According to just theoretical throughput of the Enterasys switch, yes, all are listed as GbE.

Yes, I have even moved ports and rechecked, no issue with the FOG server connecting. The server itself is a Dell PE 2950 / dual quad-core / 16 GB RAM / Broadcom NetXtreme GbE NIC.

Yes, there are some other issues. The DNS retardation was cleared up yesterday, I retested this morning; no change in performance outcome. A major issue, (in my thinking at least), is the interface for these switches. Correct me if I am wrong, but FOG utilizes UDP heavily, and as of yet I can find no utility for looking into the protocol stack settings for this hardware.

Our ISP/Support is ENA, and they have done all they can, even extending past their “contractual” level of support, but they cannot pinpoint the issue as they can only look inside individual LANs on the WAN, not down to device level. There is, from my observations, some issue that is causing chronic IP conflicts. No rogue DHCP/DNS servers were found, but the tech who installed the switch hardware didn’t seem particularly detail oriented, i.e., yesterday I found that jumbo frames/MTU settings were different on every switch in both my buildings, now corrected.

It looks directly like a switching issue to me, because during the 3.5+ hour capture, you can watch the bandwidth continually degrade, typically starting between 2.4 - 4 Gb/s down to less than 100Mb/s, continual drop off.

george1421

@Mastriani I think then I would do as I mentioned. Plug a target computer into the same switch as the FOG server and make sure both are on the same VLAN. Test a deploy in this configuration.

The Dell should not have this problem, but I have seen some old computers not properly negotiate with the switch to setup GbE speeds. You should confirm from the switch management interface that the port plugged into the Dell is indeed connecting as GbE.
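In addition to the switch management page, the negotiated speed can also be read from the server side with ethtool. The helper below is a minimal sketch that reads `ethtool <interface>` output on stdin and flags anything under 1000Mb/s; the interface name eth0 is an assumption and may differ on your server.

```shell
# check_gbe: reads `ethtool <iface>` output on stdin and reports whether
# the link negotiated at gigabit speed or better.
check_gbe() {
    awk '
        /Speed:/ {
            speed = $2            # e.g. "1000Mb/s" or "Unknown!"
            mbps  = speed + 0     # leading number; 0 if there is none
            if (mbps >= 1000) print "ok: " speed
            else              print "below GbE: " speed
            seen = 1
        }
        END { if (!seen) print "no Speed line found" }
    '
}

# Usage on the FOG server (eth0 is an assumed interface name):
#   sudo ethtool eth0 | check_gbe
```

A result below GbE on the server's own port would suggest a negotiation problem between the NIC and the switch rather than anything further out in the network.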

How big (in GB) is your image that is being deployed?

Mastriani

@Tom-Elliott

Mr. Elliott, you sir are a phenom.

I had tried 2 months ago to do exactly that, but the trunk version, at that time, errored out continually on pxe image boot. At that time, you and your team were working on a fix, but I reformatted the server back to the previously stated conditions.

Mastriani

@Wayne-Workman

Thank you sir, I will see if I have a laptop available and answer back when that is confirmed. The one I typically use is Win 10, and carries too much information/applications to be reformatted.

My Linux/Ubuntu capabilities are intermediate. I have never used a live disk or utilities for network troubleshooting, so given my ignorance, I might require more information on how this is handled.

Mastriani

@george1421

All machines that we use are Dell refurbs, ranging from OptiPlex 755 up to Dell 390 with Win 10.

All images at this point are Win 7, 35 - 55 GB. After researching through community commentary, I changed compression to 5, as most reported this setting maximized throughput in their experience. I previously tried compression 3 and 9, but overall there was no observable change in capture time.

george1421

@Mastriani Well the 755 to 390s are not the most powerful systems but they should not take 30 minutes to image either.

When you run your test, please try to get an accurate time to deploy and the actual image size. That way we can calculate the true throughput. The number from partclone is a bit deceiving.
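The throughput calculation is just image size divided by wall-clock time. A minimal sketch, assuming decimal units (1 GB = 8000 megabits) and a size and time you have measured yourself:

```shell
# throughput SIZE_GB SECONDS
# Prints the average rate in GB/min and in megabits per second.
# Assumes decimal units: 1 GB = 8000 megabits.
throughput() {
    awk -v gb="$1" -v sec="$2" 'BEGIN {
        printf "%.2f GB/min, %.0f Mb/s\n", gb * 60 / sec, gb * 8000 / sec
    }'
}

# The 25 GB deploy in about 4 minutes mentioned earlier in the thread:
throughput 25 240
```

For that example the helper prints 6.25 GB/min, 833 Mb/s, which is close to what a healthy GbE link can carry. Plugging in your own image size and elapsed time gives an end-to-end figure to compare against.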

The compression value is a scale from 0 (no compression) to 9 (maximum compression). That also means:
0 = maximum file size on the FOG server and over the network but images faster since the client doesn’t need to decompress the image.
9 = smaller file size on the FOG server and network but images slower since the client must squeeze all of the air possible out of the image to capture and deploy it.

5 is a middle of the road between larger file and higher cpu requirements on the client.

Mastriani

@george1421

I have Fog servers at both buildings, suffering the same issues. I will connect as you specified first thing in the morning, and let the capture run.

Easy enough to get the data you require, I will post that whenever the test completes.

Greatly appreciate the input, loss of imaging capability has made this school year … frustrating and inefficient.

Tom Elliott

Just shedding light that udp, unless using multicast, is limited only to tftp traffic and even then is minimal at best

george1421

@Mastriani said in Using dumb switch to troubleshoot imaging, network related:

I have Fog servers at both buildings, suffering the same issues. I will connect as you specified first thing in the morning, and let the capture run.

This is an interesting puzzle. You should not have this issue (which you already know). When I run into these puzzles I remember what one of my university professors told me about debugging an electrical circuit: “You first have to find out where the problem isn’t to find out where the problem is.” So you start where the problem shouldn’t be and then work your way outward from where the problem isn’t to where it is. So far his statement has worked well for me.

Mastriani

@Tom-Elliott said in Using dumb switch to troubleshoot imaging, network related:

Just shedding light that udp, unless using multicast, is limited only to tftp traffic and even then is minimal at best

Is there something else then, in the protocol stack, that should be looked at more closely? The switches currently have igmp snooping disabled, by factory default.

Mastriani

@george1421 said in Using dumb switch to troubleshoot imaging, network related:

This is an interesting puzzle. You should not have this issue (which you already know). When I run into these puzzles I remember what one of my university professors told me about debugging an electrical circuit: “You first have to find out where the problem isn’t to find out where the problem is.” So you start where the problem shouldn’t be and then work your way outward from where the problem isn’t to where it is. So far his statement has worked well for me.

If you have a better approach, then I am all … well, reading glasses, as it were. Resolution is what is most important. Hopefully I can also make use of Mr. Workman’s talents as well, and grab a few new neuronal connections in the process.

george1421

@Mastriani said in Using dumb switch to troubleshoot imaging, network related:

Is there something else then, in the protocol stack, that should be looked at more closely? The switches currently have igmp snooping disabled, by factory default.

IGMP snooping is only used if you are multicasting. We are only talking about unicast image deployment, right?

Mastriani

@george1421 said in Using dumb switch to troubleshoot imaging, network related:

IGMP snooping is only used if you are multicasting. We are only talking about unicast image deployment, right?

Yes, unicast only. The IGMP comment was just for reference. I am attempting to be as inclusive as possible with anything that your team might deem relevant, or not, to help refine the possible scope.

Wayne Workman

@Mastriani said in Using dumb switch to troubleshoot imaging, network related:

the one I typically use is Win 10, and carries too much information/applications to be reformatted.

There is no need to reformat. You can use a Live Linux disk, meaning you boot from a CD or USB drive, and run the OS completely from that, without ever changing the contents of your local HDD. See this for more information: https://www.ubuntu.com/download/desktop/try-ubuntu-before-you-install

Nowadays I default to using Linux to solve any sort of problem that isn’t specifically a Windows problem, just because I’ve become so accustomed to the tools available on Linux. If you’re into high quality Information Technology / Computer Science software at a low price, Linux is the literal jackpot.

Mastriani

@Wayne-Workman said in Using dumb switch to troubleshoot imaging, network related:

There is no need to reformat. You can use a Live Linux disk, meaning you boot from a CD or USB drive, and run the OS completely from that, without ever changing the contents of your local HDD. See this for more information: https://www.ubuntu.com/download/desktop/try-ubuntu-before-you-install

Nowadays I default to using Linux to solve any sort of problem that isn’t specifically a Windows problem, just because I’ve become so accustomed to the tools available on Linux. If you’re into high quality Information Technology / Computer Science software at a low price, Linux IS the literal jackpot.

Thank you Mr. Workman, understood. I will burn a disc in the morning. Because of my ignorance, how then do I “install” iperf and ethtool functionally, or is this all accomplished via virtual environment? Yes, my moron is showing currently.

Wayne Workman

@Mastriani Stop down-talking yourself.

apt-get install iperf ethtool -y

apt-get is Ubuntu’s preferred package manager. There is also apt and dpkg

install is one of many commands to do. There is also remove and others.

iperf and ethtool are package names.

-y means do whatever needs done to just install it and don’t ask for permission, this is my permission here.

Fedora 24’s package manager is dnf, and the rest of the command is the same. In CentOS 7, it’s yum. In Arch Linux, it’s pacman, whose syntax differs (pacman -S instead of install).
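Put side by side, the equivalent commands look roughly like this. The install_cmd helper is hypothetical and only prints the command for a given package manager; the package names are assumed to be iperf and ethtool on every distribution, and some systems may need an extra repository enabled (for example universe on Ubuntu or EPEL on CentOS) before iperf is found.

```shell
# install_cmd PACKAGE_MANAGER
# Prints the command that installs iperf and ethtool without prompting.
install_cmd() {
    case "$1" in
        apt-get|dnf|yum)
            echo "sudo $1 install -y iperf ethtool" ;;
        pacman)
            echo "sudo pacman -S --noconfirm iperf ethtool" ;;
        *)
            echo "unsupported package manager: $1" >&2
            return 1 ;;
    esac
}

# On an Ubuntu live session this prints the apt-get form:
install_cmd apt-get
```

In a live session, running sudo apt-get update first is usually needed so the package lists are available.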
