Proliant ML110G7
-
Hi,
I’ve been enjoying these forums, both as an avid reader and as a source of technical help.
I managed to build my Windows images (UEFI and Legacy) for deployment.
I managed to get my Linux OS installs working over HTTP.
The forums also helped me get some diagnostic/maintenance tools, such as GParted, onto PXE boot.
I can deploy my images and OS installs easily and really fast, even across my MPLS WAN links.
I even completely discontinued my WDS; it’s gone forever.
So, overall it’s been working very well.
We’re doing a mass Linux migration from the old SLES 11 systems to CentOS 7/8 using HP iLO.
I have many hardware generations, from G7 up to G10.
For some reason, 4 of the 35 servers of this G7 model won’t PXE boot at all. They get an IP from DHCP fine, no problem.
The FOG server receives the request (I checked the logs), but it aborts and then loops on this:
Aug 28 03:19:47 fogserver xinetd[1328]: START: tftp pid=10927 from=10.173.72.153
Aug 28 03:19:47 fogserver in.tftpd[10928]: Error code 0: TFTP Aborted
Aug 28 03:19:53 fogserver in.tftpd[10929]: Client 10.173.72.153 finished undionly.kpxe
Aug 28 03:19:53 fogserver in.tftpd[10929]: Client 10.173.72.153 timed out
I also checked with tcpdump, and the traffic does get back to the server.
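To see at a glance how often each client goes around this loop, the xinetd and in.tftpd lines above can be tallied per client. This is a minimal Python 3.8+ sketch that assumes the syslog format shown in the quoted lines (other syslog configurations may format them differently). `tally_tftpd` is a hypothetical helper name. Error lines carry no client IP, so they are counted under `"-"`.

```python
import re
from collections import Counter

# Patterns for the xinetd / in.tftpd syslog lines in the format quoted above
START_RE = re.compile(r"START: tftp pid=\d+ from=(\d+\.\d+\.\d+\.\d+)")
CLIENT_RE = re.compile(r"in\.tftpd\[\d+\]: Client (\d+\.\d+\.\d+\.\d+) (finished|timed out)")
ERROR_RE = re.compile(r"in\.tftpd\[\d+\]: Error code (\d+): (.+)$")


def tally_tftpd(lines):
    """Count (client_ip, event) pairs; Error lines have no IP, so they use "-"."""
    counts = Counter()
    for line in lines:
        line = line.rstrip()
        if m := START_RE.search(line):
            counts[(m.group(1), "start")] += 1
        elif m := CLIENT_RE.search(line):
            counts[(m.group(1), m.group(2))] += 1
        elif m := ERROR_RE.search(line):
            counts[("-", f"error {m.group(1)}: {m.group(2)}")] += 1
    return counts
```

Feed it the lines of whichever syslog file your FOG server writes these messages to. A client whose `start` and `timed out` counts keep climbing together is one that is stuck in the loop.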
Since it’s MPLS, there’s no firewall between these servers.
Also, it’s just these 4 servers; the other 31 had no issues.
I also checked that PXE boot is enabled on this Intel 82574L NIC.
The NIC firmware is also exactly the same on all 35.
I tried all the FOG Legacy PXE files as well.
Is there any way I can find the root cause of this problem?
Thanks for your help and keep up the good work.
-
OK, so if I understand this, you only have 4 of this specific model that won’t PXE boot. I take it that these 4 servers are on the opposite end of the MPLS link from the FOG server? Is that correct?
If so, do you have a witness computer (a second computer at the remote site) that we can use for a packet capture? This can be another Linux or Windows computer on the same subnet as the PXE booting computer.
-
Yes, you understood correctly.
Different locations. We did around 200 migrations with no issue whatsoever, and these locations all use the same template.
All the other servers of this specific model boot fine, as do all other models, even some recent Lenovos; I have no issues with them.
I also tried running the tftp command from the affected server itself, inside the OS shell, and it downloads the PXE file from our FOG server with no problem.
This only happens when we try to PXE boot those servers.
I have some Windows workstations on all these subnets that I can use.
Only one site has a Meraki MX, which has a built-in Wireshark-style capture tool, and I can save those captures too.
Thanks
-
@NTex OK, what I would like to see is a pcap (packet capture) of the PXE booting process on the computers that are failing.
The witness computer needs to be on the same subnet as the failing server. Ideally, if you can swing a mirror port, that will give us the best-quality pcap. If you can’t, being on the same subnet will work. For this witness computer’s pcap, use this Wireshark capture filter:
port 67 or port 68 or port 69
That will ensure we only get PXE booting packets and not any incidental packets with information you shouldn’t share.
If you can’t use a mirror port, then at the same time run the pcap defined in this tutorial on the FOG server. This will give us the TFTP side of the booting process. https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue
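If you want to sanity-check a capture before uploading it, a few lines of Python can list the UDP traffic in it. This is a minimal standard-library sketch that assumes a classic libpcap file with Ethernet framing and IPv4. It rejects pcapng, which is Wireshark’s default save format, so save as plain pcap first. `udp_packets` is a hypothetical name. One caveat: per RFC 1350, the TFTP data and ACK packets after the initial read request use ephemeral ports on both ends. A port-69-only filter can therefore catch the request without catching the transfer itself.

```python
import struct


def _ip(raw):
    return ".".join(str(b) for b in raw)


def udp_packets(data):
    """Yield (src_ip, src_port, dst_ip, dst_port, udp_payload) for each IPv4/UDP
    packet in a classic libpcap file (given as bytes) with Ethernet link type."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):    # little-endian (usec, nsec)
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):  # big-endian (usec, nsec)
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (convert pcapng to pcap first)")
    if struct.unpack(endian + "I", data[20:24])[0] != 1:
        raise ValueError("only Ethernet (link type 1) captures are handled")
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        incl_len = struct.unpack(endian + "I", data[offset + 8:offset + 12])[0]
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 14:
            continue
        eth_type, ip_start = struct.unpack("!H", frame[12:14])[0], 14
        if eth_type == 0x8100 and len(frame) >= 18:  # skip a single 802.1Q VLAN tag
            eth_type, ip_start = struct.unpack("!H", frame[16:18])[0], 18
        ip = frame[ip_start:]
        if eth_type != 0x0800 or len(ip) < 20:
            continue  # not IPv4
        ihl = (ip[0] & 0x0F) * 4
        total_len = struct.unpack("!H", ip[2:4])[0]
        frag_offset = struct.unpack("!H", ip[6:8])[0] & 0x1FFF
        if ip[9] != 17 or frag_offset != 0 or len(ip) < ihl + 8:
            continue  # not UDP, or a later fragment that has no UDP header
        ip = ip[:total_len]  # drop any Ethernet padding
        src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
        yield _ip(ip[12:16]), src_port, _ip(ip[16:20]), dst_port, ip[ihl + 8:]
```

For example, `udp_packets(open("capture.pcap", "rb").read())` yields one tuple per packet. Keep the tuples whose source or destination port is 67, 68 or 69 to reproduce the capture filter, or keep those involving the PXE client’s IP to follow the whole TFTP transfer.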
Upload the pcap to this forum or to a file-sharing site and share it publicly via a link. You can either post the link here or DM me the link using the FOG Forum chat function. I’ll take a look at the pcap and see if I can find why these are not booting.
It would also be helpful to grab a screenshot of the error on the server’s console when the boot fails.
-
Hello again,
I’ve actually done both captures (WAN and FOG) in the past, but never with a mirror port; that’s a good idea.
I used the Meraki location, since it has a capture tool, and it’s very good.
On the WAN side, I captured all traffic for the host’s DHCP IP, and on the FOG server the ports you mentioned.
It actually loops for a while and can take some time to finally give up and time out, so I set the capture to 180 seconds on both ends.
Once it took 30 minutes to finally give up.
The screenshot will show you … like it’s not loading.
The usual behavior is a graphical spinner of pipes and slashes while the PXE boot file loads, instead of …
Attached is a screenshot from iLO, which gives me view/control of the client:
I attached both captures.
Fog Server Capture
WAN Side / Server
I’m really curious to see if you can discover the reason on these specific servers, even if I can’t solve it remotely for some odd reason.
I’m not used to not finding the answers; I always like to find out why something doesn’t work the way it should.
Thanks for everything.
PS: Edit: uploading on the forums doesn’t seem to be working for me; it throws “Something went wrong while parsing server response”. Hold on, let me fix it.
-
@NTex I really need to see the pcap from the witness computer at the remote site: the whole pcap, not just screenshots. We need to identify what the server is presenting itself as (DHCP option 93, the client architecture) and who the actors are in the DHCP OFFER. The 4-packet DHCP sequence is important, as is what the target computer does after it gets the ACK from the DHCP server.
-
@george1421
Fixed the previous post; I was having issues uploading files directly to the forums.
Attached is a screenshot from iLO, which gives me view/control of the client:
I attached both captures.
Fog Server Capture
WAN Side / Server
-
@NTex The WAN side capture doesn’t contain the details I need. I don’t know if your capture filter is set wrong or the witness computer’s network interface isn’t on the same subnet as the PXE booting computer. It’s good that we see the TFTP packets in it, but we are missing the DHCP packets. What I’m expecting to see is a DHCP DISCOVER packet from the target computer, one or more OFFER packets from your DHCP server(s), a REQUEST packet from the PXE computer, and then an ACK from the DHCP server. The first two packets are the telling ones.
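The DISCOVER/OFFER/REQUEST/ACK check can also be done by decoding the packets directly. This is a minimal Python sketch that assumes you already have the UDP payload of each port 67/68 packet (for example, from a pcap reader). The function names are hypothetical. The sketch reads these fields:

- the message type (option 53)
- the client system architecture (option 93, RFC 4578)
- the vendor class (option 60, where PXE ROMs announce themselves as `PXEClient`)
- the next-server address (`siaddr`)
- the boot file name

Together these answer “what is the client claiming” and “who is answering”.

```python
import struct

MSG_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
             5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}
MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options (RFC 2131)


def parse_options(payload):
    """Return {option_code: raw_value} from a BOOTP/DHCP UDP payload."""
    if payload[236:240] != MAGIC_COOKIE:
        raise ValueError("missing DHCP magic cookie")
    opts, i = {}, 240
    while i < len(payload):
        code = payload[i]
        if code == 0:  # pad
            i += 1
            continue
        if code == 255:  # end
            break
        length = payload[i + 1]
        opts[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return opts


def _text(raw):
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def summarize(payload):
    """Pick out the fields that show what the client claims and who answers."""
    opts = parse_options(payload)
    msg = opts.get(53, b"")
    arch = opts.get(93, b"")
    return {
        "type": MSG_TYPES.get(msg[0], "UNKNOWN") if msg else "UNKNOWN",
        "client_arch": struct.unpack("!H", arch[:2])[0] if len(arch) >= 2 else None,
        "vendor_class": _text(opts.get(60, b"")),
        "next_server": ".".join(str(b) for b in payload[20:24]),  # siaddr field
        "tftp_server_opt66": _text(opts.get(66, b"")),
        # option 67 if present, otherwise the fixed BOOTP 'file' field
        "bootfile": _text(opts.get(67, b"")) or _text(payload[108:236]),
    }


def is_dora(types):
    """True if DISCOVER, OFFER, REQUEST, ACK occur in that order (others may interleave)."""
    remaining = iter(types)
    return all(step in remaining for step in ("DISCOVER", "OFFER", "REQUEST", "ACK"))
```

Run `summarize` on each DHCP packet in capture order. Then pass the list of `"type"` values to `is_dora` to confirm the exchange completed. The OFFER rows show which server(s) answered and what next-server and boot file they handed out.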
From the screenshot it appears that it’s timing out on the TFTP call, but we are seeing TFTP requests to the FOG server. I can also tell the client is in BIOS mode, so it should be requesting undionly.kpxe from the FOG server.
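The BIOS-vs-UEFI decision is usually made from the architecture the client reports. Below is a sketch of that mapping. It assumes the common FOG convention of `undionly.kpxe` for BIOS clients and `ipxe.efi` for 64-bit UEFI clients; your DHCP policy may hand out different files. The architecture names follow RFC 4578, and the helper names are hypothetical.

```python
# DHCP option 93 values (RFC 4578 names) relevant to this thread
ARCH_NAMES = {0: "Intel x86PC (BIOS)", 6: "EFI IA32", 7: "EFI BC", 9: "EFI x86-64"}

# Assumed FOG-style policy: BIOS clients get undionly.kpxe, 64-bit UEFI clients ipxe.efi.
# Architectures not listed here (e.g. 6) are deliberately left unmapped.
BOOTFILE_BY_ARCH = {0: "undionly.kpxe", 7: "ipxe.efi", 9: "ipxe.efi"}


def arch_from_vendor_class(vendor_class):
    """Parse the architecture from a PXE vendor class such as
    'PXEClient:Arch:00000:UNDI:002001'; returns None for non-PXE clients."""
    parts = vendor_class.split(":")
    if len(parts) >= 3 and parts[0] == "PXEClient" and parts[1] == "Arch" and parts[2].isdigit():
        return int(parts[2])
    return None


def expected_bootfile(arch):
    """Boot file the assumed policy would hand out, or None if the arch is not covered."""
    return BOOTFILE_BY_ARCH.get(arch)
```

A client announcing `PXEClient:Arch:00000` maps to architecture 0 and therefore, under this assumed policy, to `undionly.kpxe`.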
-
@NTex Looking at the FOG server pcap, it appears normal too. I see the client ask for the size (tsize) of the file, and then it requests the file. The issue appears to be that the FOG server doesn’t send the file, so the client waits 30 seconds and requests it again, and so on. This makes me think the DHCP process is fine, but for some reason the TFTP server is not sending the requested file. It can’t be ignoring the request outright, because it must be answering the tsize request: the client goes on to request the file.
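The tsize-then-request pattern is visible at the packet level. Below is a minimal sketch of the two packet types involved, based on RFC 1350 (packet layout), RFC 2347 (option extension), RFC 2349 (`tsize`) and RFC 2348 (`blksize`). The names are illustrative, and the exact options a given PXE ROM sends may differ.

Many PXE ROMs send a first read request only to learn the size. They then abort it with an ERROR packet once the server’s option acknowledgment (OACK) arrives. If that is what these NICs do, it would also explain the “Error code 0: TFTP Aborted” line in the log at the top of the thread as expected behavior rather than the fault.

```python
import struct

OP_RRQ, OP_OACK = 1, 6


def build_rrq(filename, mode="octet", **options):
    """TFTP read request: opcode 1, filename, mode, then option/value pairs (RFC 2347),
    each NUL-terminated, e.g. build_rrq("undionly.kpxe", tsize=0, blksize=1456)."""
    fields = [filename, mode]
    for name, value in options.items():
        fields += [name, str(value)]
    return struct.pack("!H", OP_RRQ) + b"".join(f.encode("ascii") + b"\x00" for f in fields)


def parse_oack(packet):
    """Return the options a server acknowledged in an OACK (opcode 6) packet."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OP_OACK:
        raise ValueError(f"expected OACK (6), got opcode {opcode}")
    fields = packet[2:].split(b"\x00")[:-1]  # drop the empty item after the final NUL
    return {fields[i].decode("ascii").lower(): fields[i + 1].decode("ascii")
            for i in range(0, len(fields) - 1, 2)}
```

A request carrying `tsize=0` asks the server to report the file size. In the OACK, the server answers with the real value (99002 for undionly.kpxe in this thread’s capture).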
-
Yes, you’re right, I started capturing before the actual BOOTP exchange.
The problem was using the capture on the appliance. This capture was taken on the switch port where the server is connected, so you will see a lot more traffic.
The iLO IP is .2 and the gateway is .254.
See if this has what you want. I filtered on DHCP and saw option 594 or something.
Thanks
-
@NTex OK, now taking a step back and looking at the TFTP protocol on the WAN side, it’s working as designed. Not how you want, and not completely, but it is working. So we can discount everything up to the TFTP file transfer, because everything before that is working.
Now it looks like the file transfer is not completing. There are not enough packets in the transfer to contain all of undionly.kpxe. I see the FOG server send its option acknowledgment and the client ACK it as block 0. Then the FOG server sends data block 1, but the client never ACKs block 1. The FOG server resends block 1 several times and then stops. The client then waits 30 seconds and requests the file all over again. The cycle continues until the client gives up.
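Finding where a transfer stalls can be done mechanically: count how many times the server sends each DATA block, and record the highest block the client ACKs. This sketch assumes the TFTP packets have already been pulled out of the capture and tagged with their direction. The tuple layout and function names are hypothetical.

```python
import struct
from collections import Counter

OP_DATA, OP_ACK = 3, 4


def opcode_and_block(payload):
    """Decode a TFTP payload into (opcode, block); block is None for non-DATA/ACK packets."""
    opcode = struct.unpack("!H", payload[:2])[0]
    block = struct.unpack("!H", payload[2:4])[0] if opcode in (OP_DATA, OP_ACK) else None
    return opcode, block


def find_stall(packets):
    """packets: (direction, payload) in capture order, where direction "S" means
    server->client and "C" means client->server.
    Returns (highest block ACKed by the client, {DATA block: times sent by the server})."""
    last_acked = None
    sends = Counter()
    for direction, payload in packets:
        opcode, block = opcode_and_block(payload)
        if direction == "S" and opcode == OP_DATA:
            sends[block] += 1
        elif direction == "C" and opcode == OP_ACK:
            last_acked = block if last_acked is None else max(last_acked, block)
    return last_acked, dict(sends)
```

With the pattern described above, the result would be block 0 as the highest ACK and block 1 sent several times. That points at the first full-size DATA packet as the one that never arrives.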
So, the next test: is it the FOG server (doubtful), the PXE client, or the network causing the pain? Take a Windows computer at the remote site, on the same subnet as the PXE booting computer, and install the TFTP client program on it. Drop the Windows firewall and use that TFTP client to pull undionly.kpxe from the FOG server. Do the same WAN packet capture as the first time, and let’s see what we get.
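For this test, a small script can stand in for the TFTP client. It also lets you pick the block size, so the same download can be tried with the 512-byte RFC 1350 default and with the 1456-byte blocks this PXE ROM negotiates. This is a minimal stop-and-wait sketch that assumes Python 3 on the witness machine and UDP reachability to the FOG server. `tftp_get` and its parameters are hypothetical, and it is a diagnostic aid rather than a complete RFC-compliant client.

```python
import socket
import struct


def _exchange(sock, packet, dest, retries):
    """Send `packet` to `dest` and wait for one reply, resending on timeout."""
    for _ in range(retries):
        sock.sendto(packet, dest)
        try:
            return sock.recvfrom(65536)
        except socket.timeout:
            continue
    raise TimeoutError(f"no reply from {dest} after {retries} attempts")


def tftp_get(server, filename, blksize=512, timeout=5.0, retries=5, port=69):
    """Download `filename` over TFTP (stop-and-wait), requesting `blksize` if not 512."""
    rrq = b"\x00\x01" + filename.encode("ascii") + b"\x00octet\x00"
    if blksize != 512:
        rrq += b"blksize\x00" + str(blksize).encode("ascii") + b"\x00"
    block_size, expected, data = 512, 1, bytearray()
    dest, peer, outgoing = (server, port), None, rrq
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        while True:
            pkt, addr = _exchange(sock, outgoing, dest, retries)
            if peer is None:
                peer = dest = addr  # the server answers from a new port (its transfer ID)
            elif addr != peer:
                continue  # ignore packets from any other source
            opcode = struct.unpack("!H", pkt[:2])[0]
            if opcode == 5:  # ERROR
                code = struct.unpack("!H", pkt[2:4])[0]
                message = pkt[4:].split(b"\x00", 1)[0].decode("ascii", errors="replace")
                raise RuntimeError(f"TFTP error {code}: {message}")
            if opcode == 6:  # OACK: server accepted our options; acknowledge with block 0
                fields = pkt[2:].split(b"\x00")
                options = {k.lower(): v for k, v in zip(fields[0::2], fields[1::2])}
                block_size = int(options.get(b"blksize", b"512"))
                outgoing = struct.pack("!HH", 4, 0)
            elif opcode == 3:  # DATA
                block = struct.unpack("!H", pkt[2:4])[0]
                if block == expected:
                    data += pkt[4:]
                    outgoing = struct.pack("!HH", 4, block)
                    if len(pkt) - 4 < block_size:  # a short block ends the transfer
                        sock.sendto(outgoing, peer)
                        return bytes(data)
                    expected = (expected + 1) % 65536
                # otherwise it is a duplicate block: the loop re-sends the last ACK
```

For example, compare `tftp_get("10.200.0.67", "undionly.kpxe")` with `tftp_get("10.200.0.67", "undionly.kpxe", blksize=1456)`. If only the second one times out waiting for block 1, that would point at the network path rather than at the NIC.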
-
So I could deploy something like a tftpd64 server on a Windows client, change my DHCP to point to that client instead, and capture all the action?
Would that work?
-
@NTex said in Proliant ML110G7:
Yes, you’re right I start capturing before the actual bootp.
OK, this second pcap contains more data. FYI, if you enter a display filter of
bootp
you can see the DHCP process, and with
tftp
you can see the TFTP process. The DHCP process looks textbook normal (but I kind of guessed that from the last pcap). And the target computer IS stating that it’s a BIOS mode computer. I just wanted to make sure the client wasn’t saying one thing while the network did something else.
From this pcap we can see the tsize of the file is 99002, so at 1456 bytes per packet it should take 68 blocks to transfer undionly.kpxe to the remote computer. Right now it’s only ACKing 1 block.
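The 68-block figure follows from TFTP’s end-of-transfer rule (RFC 1350): the transfer ends on the first DATA block shorter than the block size. A file therefore always needs `size // blksize + 1` blocks. Here that is 67 full 1456-byte blocks plus a final 1450-byte block. This is a minimal sketch, and the function names are illustrative.

```python
def tftp_blocks(tsize, blksize=512):
    """DATA blocks needed for a `tsize`-byte file. A transfer ends on the first block
    shorter than blksize, so an exact multiple of blksize needs one extra, empty block."""
    return tsize // blksize + 1


def last_block_bytes(tsize, blksize=512):
    """Payload size of the final (short) DATA block."""
    return tsize % blksize
```

`tftp_blocks(99002, 1456)` gives 68, with a 1450-byte final block. With the RFC 1350 default of 512-byte blocks, `tftp_blocks(99002)` would be 194.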
I don’t know why, but something is telling me it’s the MTU: if the MTU along the path is too small for a 1456-byte block plus its headers, the packets could be getting fragmented and causing this problem. But why??
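Here are numbers for the MTU hunch. A full 1456-byte DATA block travels in a 1488-byte IPv4 packet: 20-byte IP header, 8-byte UDP header and 4-byte TFTP header. It crosses a link unfragmented only if the MTU there is at least 1488. This sketch assumes IPv4 without options. The 1400-byte MTU in the example below is an illustrative VPN-reduced value, not one measured in this thread.

```python
IPV4_HEADER = 20       # assumes no IP options
UDP_HEADER = 8
TFTP_DATA_HEADER = 4   # opcode + block number


def ip_packet_size(blksize):
    """Size of the IPv4 packet carrying one full TFTP DATA block."""
    return IPV4_HEADER + UDP_HEADER + TFTP_DATA_HEADER + blksize


def max_blksize(mtu):
    """Largest blksize whose DATA packets fit in `mtu` without fragmentation."""
    return mtu - IPV4_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def fragment_sizes(blksize, mtu):
    """Total IP length of each fragment when a full DATA packet crosses a link with `mtu`."""
    remaining = UDP_HEADER + TFTP_DATA_HEADER + blksize   # IP payload to be split
    per_frag = (mtu - IPV4_HEADER) // 8 * 8               # non-final fragments: multiple of 8
    sizes = []
    while remaining > 0:
        chunk = min(per_frag, remaining)
        sizes.append(IPV4_HEADER + chunk)
        remaining -= chunk
    return sizes
```

For example, `fragment_sizes(1456, 1400)` returns `[1396, 112]`. The block arrives as two fragments, and losing either one loses the whole block. `max_blksize(1500)` is 1468.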
-
@NTex said in Proliant ML110G7:
So I can deploy like tftpd64 server on Windows client and then change my DHCP to get that client instead and capture all the action ?
Would it work ?
Yes, as long as you copy undionly.kpxe and ipxe.efi to the remote site for tftpd64, that will work… oh wait, undionly.kpxe will send out a DHCP request again to find what it thinks is the FOG server, listed as the next server (DHCP option 66). In this case that will point back to the Windows tftpd64 machine and not the FOG server. I’d have to look, but I think I can create a one-off version of those files that will only reference your FOG server.
Just to confirm, your FOG server is at 10.200.0.67? Once iPXE is loaded and running, it accesses the FOG server over HTTP, which is a bit more WAN-friendly than TFTP.
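The WAN-friendliness point comes down to TFTP being stop-and-wait. Every DATA block waits for its ACK, so each block costs at least one round trip. HTTP, by contrast, runs over TCP, which keeps many segments in flight. This is a back-of-the-envelope sketch. The 50 ms round-trip time in the example is hypothetical, not measured on these links.

```python
def tftp_min_seconds(tsize, blksize, rtt_ms):
    """Lower bound on a TFTP download: one round trip per DATA/ACK pair (stop-and-wait),
    ignoring serialization delay, server processing and retransmission timeouts."""
    blocks = tsize // blksize + 1  # the last block is always short (RFC 1350)
    return blocks * rtt_ms / 1000.0
```

With a hypothetical 50 ms RTT, `tftp_min_seconds(99002, 1456, 50)` gives 3.4 s and `tftp_min_seconds(99002, 512, 50)` gives 9.7 s, before any retransmission timeouts.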
-
Yes, I noticed the MTU is smaller at this location, so it gets 106 bytes in the 2nd window.
These WAN links are all fiber, 20 Mbps minimum.
It might be due to the VPN using part of the MTU, though.
My thinking has always been that the card firmware might be bogus and not load the boot file, but it’s the same version as on the working servers.
And like I said in my initial post, if I run the tftp command from this very server (in the OS terminal) against our FOG server to download undionly.kpxe, it works with no problem.
-
@george1421 said in Proliant ML110G7:
Just to confirm your fog server is at 10.200.0.67? Once iPXE gets loaded and running it access the FOG server over http which is a bit more WAN friendly than tftp.
Yes, that’s the IP.
-
@NTex said in Proliant ML110G7:
f I do from this very server (OS terminal) i do the command of tftp to our fog server to download undionly.kpxe and does no problem.
But in this case you are using the OS’s tftp client, whereas when you PXE boot you are using the TFTP client built into the NIC’s PXE ROM. I don’t remember how HP servers handle it, but on Dell you can update the BIOS, and that doesn’t necessarily mean you’ve updated the NIC firmware. Through the Lifecycle Controller, the NIC and RAID firmware are separate installs.
-
@NTex said in Proliant ML110G7:
Yes, that’s the IP.
OK, let me remote into the office and see if my dev box is still powered on. I had to do something similar not too long ago, so that project should still be set up.
-
@george1421 said in Proliant ML110G7:
@NTex said in Proliant ML110G7:
f I do from this very server (OS terminal) i do the command of tftp to our fog server to download undionly.kpxe and does no problem.
But in this case you are using the OS’ tftp client, where when you are pxe booting you are using the nic card’s PXE rom that contains the tftp client. I don’t remember HP servers, but I know Dell and you can update the bios, but that doesn’t necessary mean you update the NIC firmware. Through the lifecycle controller the NIC and RAID firmware is a separate install.
Yes, there is a difference between the OS client and the PXE ROM.
I checked with HPE: all these servers have the latest NIC firmware.
I mean, these servers are pretty old!
HPE releases firmware packages that can be applied from Linux, so I’ve done all that in the past.