Proliant ML110G7
-
Hello again,
I have actually done both captures (WAN and FOG) in the past, but never with a mirrored port; that is a good idea.
I used a Meraki location, since Meraki has a capture tool, and it’s very good.
On the WAN side, I captured all traffic for the host’s DHCP IP, and on the FOG server I captured the ports mentioned. The client actually loops for a while and can take some time to finally give up and time out, so I set the capture to 180 seconds on both ends.
Once it took 30 minutes to finally give up.
The screenshot will show you … as if it is not loading.
The usual behavior is a graphical representation of pipes and slashes while loading the PXE boot file, instead of …
Attached screenshot from iLO that gives me the view / control on the client:
I attached both captures.
Fog Server Capture
WAN Side / Server
I’m really curious to see if you can discover the reason on these specific servers, even if I can’t solve it remotely for some odd reason.
I’m not used to not getting answers; I always like to find out why something doesn’t work the way it should.
Thanks for everything.
PS - Edit: uploading on the forums doesn’t seem to be working for me, it throws “Something went wrong while parsing server response”. Hold on, let me fix it.
-
@NTex I really need to see the pcap from the witness computer at the remote site. The whole pcap and not just screenshots. We need to identify what the target server is presenting itself as (DHCP option 94 [I think]) and who the actors are in the DHCP OFFER. The 4 packet DHCP sequence is important, as well as what the target computer does after it gets the ACK from the DHCP server.
-
@george1421
Fixed the previous post; I was having issues uploading files directly to the forum.
Attached screenshot from iLO that gives me the view / control on the client:
I attached both captures.
Fog Server Capture
WAN Side / Server
-
@NTex The WAN side doesn’t contain the details I need. I don’t know if your capture filter is set wrong or the witness computer’s network interface isn’t on the same subnet as the PXE booting computer. It’s good we see the tftp packets in it, but we are missing the DHCP packets. What I’m expecting to see is a DHCP DISCOVER packet from the target computer, one or more OFFER packets from your DHCP server(s), a REQUEST packet from the PXE computer and then an ACK from the DHCP server. The first two packets are the telling ones.
From the screenshot it appears that it’s timing out on the tftp call, but we are seeing tftp requests to the FOG server. I can also tell the client is in BIOS mode. So it should be requesting undionly.kpxe from the FOG server.
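As a rough checklist for reading such a capture, the sketch below names the four DHCP message types (option 53) and the client architecture codes that RFC 4578 defines for option 93. The function names and the architecture-to-file mapping are assumptions for illustration; only the two boot file names come from this thread.

```python
# DHCP option 53 (message type) codes for the four-packet exchange.
DHCP_MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 5: "ACK"}
EXPECTED_SEQUENCE = ["DISCOVER", "OFFER", "REQUEST", "ACK"]

# DHCP option 93 (client system architecture) codes from RFC 4578.
ARCH_NAMES = {0: "Intel x86PC (BIOS)", 6: "EFI IA32", 7: "EFI BC", 9: "EFI x86-64"}

# Hypothetical mapping: which boot file each architecture would be handed.
BOOT_FILES = {0: "undionly.kpxe", 7: "ipxe.efi", 9: "ipxe.efi"}


def missing_dhcp_steps(seen_type_codes):
    """Return the steps of the DISCOVER/OFFER/REQUEST/ACK exchange not seen."""
    seen = {DHCP_MESSAGE_TYPES.get(code) for code in seen_type_codes}
    return [step for step in EXPECTED_SEQUENCE if step not in seen]


def boot_file_for(arch_code):
    """Return the boot file for an option 93 code, or None if unmapped."""
    return BOOT_FILES.get(arch_code)


# A capture that only shows tftp traffic has none of the four DHCP packets.
print(missing_dhcp_steps([]))
print(ARCH_NAMES[0], "->", boot_file_for(0))
```

If a capture is missing any of the four steps, the capture filter or the capture point is the first thing to check.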
-
@NTex Looking at the FOG server pcap, it appears normal too. I see the client asking for the size (tsize) of the file, then it requests the file. The issue appears to be that the FOG server doesn’t send the file, so the client waits 30 seconds and requests it again, and so on. This makes me think that the DHCP process is fine, but for some reason the tftp server is not sending the requested file. It can’t be ignoring the request, because it must have answered the tsize request for the client to then send the request for the file.
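For anyone following along, here is a minimal sketch of what those two requests could look like on the wire, assuming the client uses the standard TFTP option extension (RFC 2347) with the `tsize` and `blksize` options. The `build_rrq` helper is a hypothetical name, and the exact options a given PXE ROM sends may differ.

```python
import struct

OP_RRQ = 1  # TFTP read request opcode (RFC 1350)


def build_rrq(filename, mode="octet", options=None):
    """Build a TFTP read request with optional name/value options."""
    packet = struct.pack("!H", OP_RRQ)
    packet += filename.encode("ascii") + b"\x00"
    packet += mode.encode("ascii") + b"\x00"
    for name, value in (options or {}).items():
        packet += name.encode("ascii") + b"\x00"
        packet += str(value).encode("ascii") + b"\x00"
    return packet


# Size probe: tsize=0 asks the server to report the file size.
size_probe = build_rrq("undionly.kpxe", options={"tsize": 0})

# Actual transfer request, asking for 1456-byte blocks.
file_request = build_rrq("undionly.kpxe", options={"blksize": 1456})

print(size_probe)
print(file_request)
```

A server that answers the first request but never delivers data for the second points at the transfer itself rather than at name resolution or DHCP.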
-
Yes, you’re right. I started capturing before the actual bootp.
The problem was using the capture on the appliance. This capture was taken on the switch port where the actual server is connected, so you will see a lot more traffic.
iLO IP is .2 and gateway .254.
See if this has what you want. I filtered on dhcp and saw option 594 or something.
Thanks
-
@NTex OK, now taking a step back and looking at the tftp protocol on the WAN side, it’s working as designed. Not how you want, and not working completely, but it is working. So we can discount everything up to the tftp file transfer, because everything before it is working.
Now it looks like the file transfer is not complete. There are not enough packets in the transfer to contain all of undionly.kpxe. I see the fog server sending block 0 and then the client sending an ACK for block 0 and then the fog server sends block 1 but the client never ACKs block one. The FOG server tries to resend block 1 several times and then stops. The client then waits 30 seconds and requests the file all over again. The cycle continues until the client gives up.
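The lock-step behaviour described above can be sketched as a small simulation. This is only an illustration: `simulate_transfer`, the `is_acked` callback and the retry limit of 5 are assumptions, not values taken from the FOG tftp server, and 68 is the block count worked out later in this thread.

```python
def simulate_transfer(total_blocks, is_acked, max_retries=5):
    """Send blocks 1..total_blocks in lock-step; return (blocks_acked, log).

    is_acked(block, attempt) is a hypothetical stand-in for the network:
    it returns True when the DATA block arrives and its ACK comes back.
    """
    log = []
    for block in range(1, total_blocks + 1):
        for attempt in range(1, max_retries + 1):
            if is_acked(block, attempt):
                log.append((block, attempt, "ACK"))
                break
            log.append((block, attempt, "timeout"))
        else:
            # Every retry timed out, so the sender gives up on the transfer.
            return block - 1, log
    return total_blocks, log


# A path where no DATA block is ever acknowledged: the sender is stuck on
# block 1 and never gets to send block 2.
acked, log = simulate_transfer(68, lambda block, attempt: False)
print(acked, log)
```

Because the next block is only sent after the previous one is acknowledged, a single block that never gets through stalls the whole file.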
So, the next test: is it the FOG server (doubt it), the PXE client, or the network causing the pain? On a Windows computer at the remote site, on the same subnet as the PXE booting computer, install the tftp client program. Drop the Windows firewall and use the tftp client on that computer to pull undionly.kpxe from the FOG server. Do the same WAN packet capture as the first time and let’s see what we get.
-
So I could deploy something like a tftpd64 server on a Windows client, change my DHCP to point at that client instead, and capture all the action?
Would it work?
-
@NTex said in Proliant ML110G7:
Yes, you’re right. I started capturing before the actual bootp.
OK, this second pcap contains more data. FYI, if you enter a display filter of
bootp
you can see the DHCP process, and with
tftp
you can see the tftp process. The DHCP process looks textbook normal (but I kind of guessed that from the last pcap). But the target computer IS stating that it’s a BIOS mode computer. I just wanted to make sure the client wasn’t doing one thing and the network doing something else.
From this pcap we can see the tsize of the file is 99002, so at 1456 bytes per packet it should take 68 blocks to transfer undionly.kpxe to the remote computer. Right now it’s only ACKing 1 block.
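The block count can be checked with a couple of lines of arithmetic. The two numbers come from the capture discussed above; the variable names are just for illustration. A TFTP transfer ends with a block shorter than the block size, so the total is always the number of full blocks plus one.

```python
TSIZE = 99002    # file size reported in the capture
BLKSIZE = 1456   # bytes of file data per DATA block

full_blocks, last_block_bytes = divmod(TSIZE, BLKSIZE)

# The final block is the short one that signals end of file.
total_blocks = full_blocks + 1

print(full_blocks, last_block_bytes, total_blocks)
```

That works out to 67 full blocks plus a final block of 1450 bytes, so 68 blocks in total.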
I don’t know why, but something is telling me MTU. If the MTU is below 1456 it could be fragmenting the packet and causing this problem, but why??
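To put rough numbers on the MTU hunch: a 1456-byte block also carries a 4-byte TFTP header, an 8-byte UDP header and a 20-byte IPv4 header, so the full packet is 1488 bytes. The sketch below estimates how many IP fragments one full block needs for a given path MTU. The MTU values in the loop are hypothetical, not measured on this link, and `ip_fragments` is a name made up for the example.

```python
import math

IP_HEADER = 20    # IPv4 header without options
UDP_HEADER = 8
TFTP_HEADER = 4   # 2-byte opcode + 2-byte block number on a DATA packet


def ip_fragments(tftp_block_size, mtu):
    """Return how many IP fragments one full TFTP DATA block needs."""
    ip_payload = UDP_HEADER + TFTP_HEADER + tftp_block_size
    if IP_HEADER + ip_payload <= mtu:
        return 1
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = ((mtu - IP_HEADER) // 8) * 8
    return math.ceil(ip_payload / per_fragment)


for mtu in (1500, 1400):  # hypothetical path MTUs
    print(mtu, ip_fragments(1456, mtu))
```

On a tunnel or VPN that lowers the usable MTU below 1488, every full-size block would be split in two, and a small PXE ROM stack may not cope with reassembling it.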
-
@NTex said in Proliant ML110G7:
So I could deploy something like a tftpd64 server on a Windows client, change my DHCP to point at that client instead, and capture all the action?
Would it work?
Yes, as long as you transport undionly.kpxe and ipxe.efi to the remote site for tftpd64, that will work… oh wait, undionly.kpxe will again send out a DHCP request to find what it thinks is the FOG server, listed as the next server (DHCP option 66). In this case that will point to the Windows machine again and not the FOG server. I’d have to look, but I think I can create a one-off version of those files that will only reference your FOG server.
Just to confirm, your FOG server is at 10.200.0.67? Once iPXE gets loaded and running, it accesses the FOG server over http, which is a bit more WAN friendly than tftp.
-
Yes, I noticed the MTU is smaller at this location, so it gets 106 bytes on the 2nd window.
These WAN links are all fiber, 20 Mbps minimum.
It might be due to the VPN using part of the MTU, though.
My thoughts always went towards the NIC firmware: I wondered if it might be bogus and fail to load the boot file, but it is the same version as on the working servers.
And like I said in the initial post, if I run the tftp command from this very server (OS terminal) to download undionly.kpxe from our FOG server, it works with no problem.
-
@george1421 said in Proliant ML110G7:
Just to confirm, your FOG server is at 10.200.0.67? Once iPXE gets loaded and running, it accesses the FOG server over http, which is a bit more WAN friendly than tftp.
Yes, that’s the IP.
-
@NTex said in Proliant ML110G7:
if I run the tftp command from this very server (OS terminal) to download undionly.kpxe from our FOG server, it works with no problem.
But in this case you are using the OS’ tftp client, whereas when you are PXE booting you are using the NIC’s PXE ROM, which contains its own tftp client. I don’t remember HP servers, but I know Dell, and there you can update the BIOS, but that doesn’t necessarily mean you update the NIC firmware. Through the lifecycle controller, the NIC and RAID firmware are separate installs.
-
@NTex said in Proliant ML110G7:
Yes, that’s the IP.
OK, let me remote into the office and see if my dev box is still powered on. I had to do something similar not too long ago, so that project should still be set up.
-
@george1421 said in Proliant ML110G7:
@NTex said in Proliant ML110G7:
if I run the tftp command from this very server (OS terminal) to download undionly.kpxe from our FOG server, it works with no problem.
But in this case you are using the OS’ tftp client, whereas when you are PXE booting you are using the NIC’s PXE ROM, which contains its own tftp client. I don’t remember HP servers, but I know Dell, and there you can update the BIOS, but that doesn’t necessarily mean you update the NIC firmware. Through the lifecycle controller, the NIC and RAID firmware are separate installs.
Yes, there is a difference between the OS client and PXE.
I checked with HPE: all these servers have the latest NIC firmware.
I mean, these servers are pretty old!
They release packages to patch on Linux, so I’ve done all that in the past.
-
@NTex Ok here is a “special” version of undionly.kpxe https://drive.google.com/file/d/1XYe4SsM0ZLiJae1paIb8PFDnPVV0M3D7/view?usp=sharing
Once loaded it will ignore any direction given by dhcp and request default.ipxe from 10.200.0.67 over the tftp protocol. Once that file is loaded it will then switch to http.
Well, now that I think about it, the default undionly.kpxe would work too (ugh) as long as you bring over default.ipxe to your tftpd64 server too. THAT file points directly at your FOG server. I didn’t think far enough ahead in the process. That makes this special undionly.kpxe not that special.
-
@george1421 said in Proliant ML110G7:
@NTex Ok here is a “special” version of undionly.kpxe https://drive.google.com/file/d/1XYe4SsM0ZLiJae1paIb8PFDnPVV0M3D7/view?usp=sharing
Once loaded it will ignore any direction given by dhcp and request default.ipxe from 10.200.0.67 over the tftp protocol. Once that file is loaded it will then switch to http.
Well, now that I think about it, the default undionly.kpxe would work too (ugh) as long as you bring over default.ipxe to your tftpd64 server too. THAT file points directly at your FOG server. I didn’t think far enough ahead in the process. That makes this special undionly.kpxe not that special.
Yes, you’re right.
While you were compiling your project, I did this:
Copied the portable tftpd64.
Then I copied ALL files from the FOG server located at /tftpboot.
I saw the boot file being loaded immediately.
I captured the event using the local tftpd nevertheless, if you want to look at it:
Capture using local tftpd
Once the FOG menu loaded, I selected my “Install CentOS” option and it’s loading:
Still, I downloaded your special version; it might be useful in the future.
I’m going to try now on a server that I know worked before, to see whether we see the MTU fragmentation there and prove whether this was the root cause.
-
@NTex Good going. Now I did work on a project to turn a Windows server into a FOG storage node. Once I proved that it worked I dropped the project because, why?? I have it documented here: https://forums.fogproject.org/topic/6941/windows-server-as-fog-storage-node-proof-of-concept-blog
I realize this is a one-off situation, but if you need it then use it. But I think the fragmentation, or whatever is going on with your MPLS circuit, will be a problem when you get to the imaging point, because FOG uses NFS to transfer the file from the FOG server to FOS Linux running on the target computer. Having a storage node at the remote sites might be the better solution if you can’t image over your WAN connection.
-
@george1421 said in Proliant ML110G7:
@NTex Good going. Now I did work on a project to turn a Windows server into a FOG storage node. Once I proved that it worked I dropped the project because, why?? I have it documented here: https://forums.fogproject.org/topic/6941/windows-server-as-fog-storage-node-proof-of-concept-blog
I realize this is a one-off situation, but if you need it then use it. But I think the fragmentation, or whatever is going on with your MPLS circuit, will be a problem when you get to the imaging point, because FOG uses NFS to transfer the file from the FOG server to FOS Linux running on the target computer. Having a storage node at the remote sites might be the better solution if you can’t image over your WAN connection.
So it might be the actual MTU and fragmentation; it probably just happens with this old NIC and at these locations, who knows.
Come to think of it, these sites are located more in the countryside, far from big cities, where ISPs usually have more issues like this due to distance / infrastructure, etc.
Working server, one of those I didn’t have issues with, capture file
It has no fragmentation, right?
I mean, you can see it loading fine here:
I think I (at least) learned something: MTU can cause issues like this.
I wish I had had this idea sooner: use another workstation with a portable TFTP server while keeping the same DHCP, and just change Option 66 to point to the workstation.
I actually copied ALL the PXE files from our FOG server.
I can use this workaround for 4 locations, and it saved us a couple thousand miles of driving to replace the servers physically, at least for now.
Nevertheless, I will keep your special version that you compiled for me.
Brainstorming this puzzle with you was a pleasure. Thanks for all the help and support, truly awesome. @george1421
-
@NTex About the pcap in the last one: on the DHCP side it was textbook perfect. On the tftp side I did see occasional block retransmissions, but overall for a WAN connection it’s acceptable.
From the MTU/fragmentation side, you have to remember a few things.
- tftp uses the UDP protocol, which is not very forgiving of dropped or lost packets.
- The PXE ROM implementation originally didn’t support PXE booting across subnets. That was added later on. Because of the minimal size of the PXE ROM, they made certain assumptions about the transfer and eliminated code from the drivers that might have been thought unnecessary at the time. Later versions of the PXE ROM had more space and are much more tolerant of communication problems.
Happy FOGGING.