Proliant ML110G7
-
Yes, you understood correctly.
Across different locations we did around 200 migrations with no issue whatsoever. These locations all use the same template. All other servers of this specific model boot fine, as do all other models, even some recent Lenovo machines; I have no issues with them.
I also ran the tftp command from the actual server while inside the OS shell, and it downloaded the PXE file from our Fog server with no problem.
This happens only when we try to PXE boot those.
I have some Windows workstations on all these subnets that I can use. At only one site I have a Meraki MX; those have a built-in Wireshark-style capture tool, and I can save the capture too.
Thanks
-
@NTex Ok what I would like to see is a pcap (packet capture) of the pxe booting process on these computers that are failing.
The witness computer needs to be on the same subnet as the failing server. Ideally, if you can swing a mirror port, that would give us the best quality pcap. If you can’t, then a computer on the same subnet will work. I need the pcap on this witness computer to use this Wireshark capture filter:
port 67 or port 68 or port 69
That will ensure we only get pxe booting packets and not any incidental packets with information you shouldn’t share.
If you can’t use a mirror port then at the same time run the pcap defined in this tutorial on the FOG server. This will give us the tftp side of the booting process. https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue
Upload the pcap to this forum or to a file share site, share as public with the link. You can either post the link here or DM me the link using the FOG Forum chat function. I’ll take a look at the pcap and see if I can find why these are not booting.
It would also be helpful to grab a screen shot of the error on the server’s console when the boot fails.
-
Hello again,
I have actually done both (WAN and Fog) in the past, but never with a mirror port; this is a good idea.
I used the Meraki location, since it has a capture tool; it’s very good.
On the WAN side, I captured all traffic for the host’s DHCP IP, and on the Fog Server the ports mentioned.
It actually loops for a while and can take some time to finally give up and time out, so I set the capture to 180 seconds on both ends.
Once it took 30 minutes to finally give up.
The screenshot will show you … like not loading.
The usual behavior is a graphical representation of pipes and slashes while loading the PXE bootfile, instead of …
Attached screenshot from iLO that gives me the view / control on the client:
I attached both captures.
Fog Server Capture
WAN Side / Server
I’m really curious to see if you can discover the reason on these specific servers, even if I can’t solve it remotely for some odd reason.
I’m not used to not getting answers; I always like to find out why something doesn’t work the way it should.
Thanks for everything.
PS - Edit: uploading on the forums doesn’t seem to be working for me, throwing “Something went wrong while parsing server response”. Hold on, let me fix it.
-
@NTex I really need to see the pcap from the witness computer at the remote site. The whole pcap and not just screen shots. We need to identify what the server is presenting itself as. (dhcp option 94 [I think]) and who the actors are in the dhcp OFFER. The 4 packet DHCP sequence is important as well as what the target computer does after it gets the ACK from the dhcp server.
-
@george1421
Fixed the previous post, I was having issues uploading files directly on the forums.
Attached screenshot from iLO that gives me the view / control on the client:
I attached both captures.
Fog Server Capture
WAN Side / Server
-
@NTex The WAN side doesn’t contain the details I need. I don’t know if your capture filter is set wrong or the witness computer’s network interface isn’t on the same subnet as the pxe booting computer. It’s good we see the tftp packets in it, but we are missing the DHCP packets. What I’m expecting to see is a DHCP DISCOVER packet from the target computer, one or more OFFER packets from your dhcp server(s), a REQUEST packet from the pxe computer and then an ACK from the dhcp server. The first two packets are the telling ones.
From the screen shot it appears that it’s timing out on the tftp call, but we are seeing tftp requests to the fog server. I can also tell the client is in bios mode, so it should be requesting undionly.kpxe from the fog server.
-
@NTex Looking at the fog server pcap it appears normal too. I see the client asking for the size (tsize) of the file, then it requests the file. The issue appears to be that the FOG server doesn’t send the file, so the client waits 30 seconds and requests it again, and so on. This makes me think that the dhcp process is fine, but for some reason the tftp server is not sending the requested file. It can’t be ignoring the requests outright, because it must have answered the tsize request for the client to go on and request the file.
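For anyone following along in the pcap, here is a rough Python sketch of what a TFTP read request (RRQ) carrying a tsize option looks like on the wire, per RFC 1350 and the option extension in RFC 2347. The helper names `build_rrq` and `parse_rrq` are made up for illustration, and the exact options the HP PXE ROM sends are an assumption; check them against the capture.

```python
import struct

OPCODE_RRQ = 1  # read request opcode (RFC 1350)


def build_rrq(filename, mode="octet", options=None):
    """Build a TFTP RRQ: opcode, filename, mode, then name/value option pairs,
    every string terminated by a NUL byte (RFC 2347)."""
    packet = struct.pack("!H", OPCODE_RRQ)
    packet += filename.encode("ascii") + b"\x00"
    packet += mode.encode("ascii") + b"\x00"
    for name, value in (options or {}).items():
        packet += name.encode("ascii") + b"\x00"
        packet += str(value).encode("ascii") + b"\x00"
    return packet


def parse_rrq(packet):
    """Split an RRQ back into (filename, mode, options)."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OPCODE_RRQ:
        raise ValueError("not a read request")
    # The trailing NUL leaves one empty piece at the end; drop it.
    fields = [f.decode("ascii") for f in packet[2:].split(b"\x00")[:-1]]
    filename, mode = fields[0], fields[1]
    options = dict(zip(fields[2::2], fields[3::2]))
    return filename, mode, options


# A client that only wants to learn the file size sends tsize=0 and expects
# the server to answer with the real size in an OACK.
probe = build_rrq("undionly.kpxe", options={"tsize": 0})
print(parse_rrq(probe))
```

A server that answers this probe with an OACK has clearly seen the request, which is why the pcap suggests the tftp service itself is reachable.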
-
Yes, you’re right I start capturing before the actual bootp.
The problem was using the capture on the appliance.
This capture was taken on the switch port where the actual server is connected, so you will see a lot more traffic.
iLO IP is .2 and gateway .254.
See if this has what you want. I filtered to dhcp and saw option 594 or something.
Thanks
-
@NTex OK now taking a step back and looking at the WAN side at the tftp protocol, it’s working as designed. Not how you want, and not working completely, but it is working. So we can discount everything up to the tftp file transfer, because everything before it is working.
Now it looks like the file transfer is not complete. There are not enough packets in the transfer to contain all of undionly.kpxe. I see the fog server sending block 0 and then the client sending an ACK for block 0 and then the fog server sends block 1 but the client never ACKs block one. The FOG server tries to resend block 1 several times and then stops. The client then waits 30 seconds and requests the file all over again. The cycle continues until the client gives up.
So the next test: is it the FOG server (doubt), the pxe client, or the network causing the pain? On a windows computer at the remote site, on the same subnet as the pxe booting computer, install the tftp client program. Drop the windows firewall and use the tftp client on that windows computer to pull undionly.kpxe from the FOG server. Do the same wan packet capture as the first time and let’s see what we get.
-
So I can deploy like tftpd64 server on Windows client and then change my DHCP to get that client instead and capture all the action ?
Would it work ?
-
@NTex said in Proliant ML110G7:
Yes, you’re right I start capturing before the actual bootp.
OK this second pcap contains more data. FYI, if you enter a display filter of
bootp
you can see the dhcp process, and with
tftp
you can see the tftp process. The DHCP process looks textbook normal (but I kind of guessed that from the last pcap). But the target computer IS stating that it’s a BIOS mode computer. I just wanted to make sure the client wasn’t doing one thing and the network doing something else.
From this pcap we can see the tsize of the file is 99002, so at 1456 bytes per packet it should take 68 blocks to transfer undionly.kpxe to the remote computer. Right now it’s only ACKing 1 block.
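The block count is easy to double check. A small Python sketch, assuming the 99002 byte tsize and 1456 byte block size read from the pcap (the function name is only illustrative): a TFTP transfer always ends with a block shorter than the block size, so the count is the integer quotient plus one.

```python
def tftp_block_count(file_size, block_size):
    """Number of DATA blocks in a TFTP transfer.

    The transfer ends with a block that is shorter than block_size
    (it may even be empty), so it is always floor(size / blksize) + 1.
    """
    return file_size // block_size + 1


file_size = 99002   # tsize reported in the pcap
block_size = 1456   # bytes of file data per DATA packet in the pcap

blocks = tftp_block_count(file_size, block_size)
last_block = file_size - (blocks - 1) * block_size

print(blocks)      # 68
print(last_block)  # 1450 bytes in the final, short block
```

So a healthy transfer should show DATA/ACK pairs numbered 1 through 68 in the capture, not a single ACK followed by retransmissions.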
I don’t know why but something is telling me MTU and if the MTU is below 1456 it could be fragmenting the packet causing this problem, but why??
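To put rough numbers on the MTU hunch: the 1456 bytes are only the TFTP payload, and the headers ride on top of it. A back-of-the-envelope Python sketch, assuming plain IPv4 with no IP options and no VLAN or tunnel accounting; the 1400 byte path MTU below is a made-up example value, not something measured on this link.

```python
TFTP_HEADER = 4    # 2-byte opcode + 2-byte block number
UDP_HEADER = 8
IPV4_HEADER = 20   # assumes no IP options


def tftp_ip_packet_size(blksize):
    """Size of the IPv4 packet that carries one full TFTP DATA block."""
    return blksize + TFTP_HEADER + UDP_HEADER + IPV4_HEADER


def needs_fragmentation(blksize, path_mtu):
    """True if a full DATA block cannot cross the path in one piece."""
    return tftp_ip_packet_size(blksize) > path_mtu


def max_blksize(path_mtu):
    """Largest TFTP block size that still fits in one unfragmented packet."""
    return path_mtu - TFTP_HEADER - UDP_HEADER - IPV4_HEADER


print(tftp_ip_packet_size(1456))        # 1488, fits a standard 1500 byte MTU
print(needs_fragmentation(1456, 1500))  # False
print(needs_fragmentation(1456, 1400))  # True on the hypothetical 1400 byte path
print(max_blksize(1400))                # 1368
```

In other words, the threshold to watch is about 1488 bytes of path MTU rather than 1456. If a VPN or tunnel eats into the 1500 bytes, every full DATA block has to be fragmented, and a PXE ROM that cannot reassemble fragments would behave much like what the pcap shows.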
-
@NTex said in Proliant ML110G7:
So I can deploy like tftpd64 server on Windows client and then change my DHCP to get that client instead and capture all the action ?
Would it work ?
Yes, as long as you transport undionly.kpxe and ipxe.efi to the remote site for tftpd64, that will work… oh wait, undionly.kpxe will again send out a dhcp request to find what it thinks is the fog server, listed as the next server (dhcp option 66). In this case it will point to the windows server again and not the FOG server. I’d have to look, but I think I can create a one-off version of those files that will only reference your FOG server.
Just to confirm, your fog server is at 10.200.0.67? Once iPXE gets loaded and running it accesses the FOG server over http, which is a bit more WAN friendly than tftp.
-
Yes, I noticed the MTU is smaller at this location, so it gets 106 bytes on the 2nd window.
These WAN links are all fiber, 20 Mbps minimum.
Might be due to the VPN using part of the MTU, though.
My thinking was always that the card firmware might be buggy and fail to load the bootfile, but it is the same version as on the working servers.
And like I said in my initial post, if I run the tftp command from this very server (OS terminal) to download undionly.kpxe from our fog server, it works with no problem.
-
@george1421 said in Proliant ML110G7:
Just to confirm your fog server is at 10.200.0.67? Once iPXE gets loaded and running it access the FOG server over http which is a bit more WAN friendly than tftp.
Yes, that’s the IP.
-
@NTex said in Proliant ML110G7:
if I do from this very server (OS terminal) i do the command of tftp to our fog server to download undionly.kpxe and does no problem.
But in this case you are using the OS’ tftp client, whereas when you are pxe booting you are using the nic card’s PXE rom, which contains its own tftp client. I don’t remember HP servers, but I know Dell, and there you can update the bios without that necessarily meaning you update the NIC firmware. Through the lifecycle controller the NIC and RAID firmware is a separate install.
-
@NTex said in Proliant ML110G7:
Yes, that’s the IP.
OK let me remote into the office and see if my dev box is still powered on. I had to do something similar not too long ago, so that project should still be set up.
-
@george1421 said in Proliant ML110G7:
@NTex said in Proliant ML110G7:
if I do from this very server (OS terminal) i do the command of tftp to our fog server to download undionly.kpxe and does no problem.
But in this case you are using the OS’ tftp client, whereas when you are pxe booting you are using the nic card’s PXE rom, which contains its own tftp client. I don’t remember HP servers, but I know Dell, and there you can update the bios without that necessarily meaning you update the NIC firmware. Through the lifecycle controller the NIC and RAID firmware is a separate install.
Yes, there is a difference between client and PXE.
I checked with HPE: all these servers have the latest NIC firmware.
I mean, these servers are pretty old!
They release packages to patch on Linux, so I’ve done all that in the past.
-
@NTex Ok here is a “special” version of undionly.kpxe https://drive.google.com/file/d/1XYe4SsM0ZLiJae1paIb8PFDnPVV0M3D7/view?usp=sharing
Once loaded it will ignore any direction given by dhcp and request default.ipxe from 10.200.0.67 over the tftp protocol. Once that file is loaded it will then switch to http.
Well now that I think about it, the default undionly.kpxe would work too (ugh) as long as you bring over default.ipxe to your tftpd64 server too. THAT file points directly at your FOG server. I didn’t think far enough ahead in the process. That makes this special undionly.kpxe not that special.
-
@george1421 said in Proliant ML110G7:
@NTex Ok here is a “special” version of undionly.kpxe https://drive.google.com/file/d/1XYe4SsM0ZLiJae1paIb8PFDnPVV0M3D7/view?usp=sharing
Once loaded it will ignore any direction given by dhcp and request default.ipxe from 10.200.0.67 over the tftp protocol. Once that file is loaded it will then switch to http.
Well now that I think about it, the default undionly.kpxe would work too (ugh) as long as you bring over default.ipxe to your tftpd64 server too. THAT file points directly at your FOG server. I didn’t think far enough ahead in the process. That makes this special undionly.kpxe not that special.
Yes, you’re right
While you were compiling your project, I did this:
Copied the portable tftpd64.
Then I copied ALL files from the Fog Server located at /tftpboot.
I saw the boot file being loaded immediately.
I captured the event using the local tftpd nevertheless, if you want to look at it:
Capture using local tftpd
Once the Fog Menu loaded, I selected my “Install CentOS” option and it’s loading:
Still, I downloaded your special version; it might be useful in the future?
I’m going to try now on a server that I know worked before, to see whether we see the MTU fragmentation there too and can prove whether this was the root cause.
-
@NTex Good going. Now I did work on a project to turn a Windows server into a FOG storage node. Once I proved that it worked I dropped the project because, why?? I have it documented here: https://forums.fogproject.org/topic/6941/windows-server-as-fog-storage-node-proof-of-concept-blog
I realize this is a one-off situation, but if you need it then use it. But I think the fragmentation, or whatever is going on with your MPLS circuit, will be a problem when you get to the imaging point, because FOG uses NFS to transfer the file from the FOG server to FOS Linux running on the target computer. Having a storage node at the remote sites might be the better solution if you can’t image over your WAN connection.