Proliant ML110G7



  • Hi,

    I’ve been enjoying these forums as an avid reader and as a source of technical help.
    I managed to build my Windows images (UEFI and Legacy) for deployment.
    I managed to get my Linux OSes installed over HTTP.
    The forums also helped me get some diagnostic / maintenance tools, such as GParted, onto PXE boot.

    I can deploy my images / OS installs easily and really fast, even over my MPLS WAN links.

    I even discontinued my WDS completely; it is gone forever.

    So, overall it’s been working very well.

    We’re doing a mass Linux migration from the old SLES11 systems to CentOS 7 / 8 using HP iLO.

    I have many hardware generations, from G7 up to G10.
    Now, for some reason, 4 of my 35 G7 servers won’t boot at all.

    They will get an IP from DHCP fine, no problem.

    The FOG server receives the request (I checked the logs), but the transfer aborts and then loops on this:
    Aug 28 03:19:47 fogserver xinetd[1328]: START: tftp pid=10927 from=10.173.72.153
    Aug 28 03:19:47 fogserver in.tftpd[10928]: Error code 0: TFTP Aborted
    Aug 28 03:19:53 fogserver in.tftpd[10929]: Client 10.173.72.153 finished undionly.kpxe
    Aug 28 03:19:53 fogserver in.tftpd[10929]: Client 10.173.72.153 timed out

    I also checked with tcpdump, and the traffic does go back to the server.

    Since this is MPLS, I have no firewall between these servers.

    Also, it’s just these 4 servers; the other 31 servers had no issues.

    I also checked that PXE boot is enabled on this Intel 82574L.
    The card firmware is also exactly the same on all 35.

    I tried all the FOG legacy PXE files as well.

    Is there any way I can find the root cause of this problem?

    Thanks for your help and keep up the good work.
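
The in.tftpd lines quoted in the post above can be tallied by event type to see how often the abort / timeout loop repeats. A minimal Python sketch, assuming the log keeps the in.tftpd syslog format shown in the post; the names `classify` and `summarize` are hypothetical, and only the three message shapes that appear in the post are recognized:

```python
import re
from collections import Counter

# Matches the message part of an in.tftpd syslog line, e.g.
# "... fogserver in.tftpd[10928]: Error code 0: TFTP Aborted"
TFTPD_RE = re.compile(r"in\.tftpd\[\d+\]: (?P<msg>.*)$")


def classify(line):
    """Return an event label for an in.tftpd line, or None for any other line."""
    match = TFTPD_RE.search(line.strip())
    if match is None:
        return None  # e.g. the xinetd START line
    msg = match.group("msg")
    if "TFTP Aborted" in msg:
        return "aborted"
    if msg.endswith("timed out"):
        return "timed_out"
    if " finished " in msg:
        return "finished"
    return "other"  # message shapes not seen in the post


def summarize(lines):
    """Count in.tftpd events by label, ignoring non-tftpd lines."""
    return Counter(label for label in map(classify, lines) if label is not None)


# The four lines quoted in the post.
SAMPLE = [
    "Aug 28 03:19:47 fogserver xinetd[1328]: START: tftp pid=10927 from=10.173.72.153",
    "Aug 28 03:19:47 fogserver in.tftpd[10928]: Error code 0: TFTP Aborted",
    "Aug 28 03:19:53 fogserver in.tftpd[10929]: Client 10.173.72.153 finished undionly.kpxe",
    "Aug 28 03:19:53 fogserver in.tftpd[10929]: Client 10.173.72.153 timed out",
]

if __name__ == "__main__":
    print(dict(summarize(SAMPLE)))
```

Feeding it the full syslog instead of `SAMPLE` would show whether every attempt from the failing client ends in a timeout.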



  • @george1421

    Awesome!

    BTW, a little off-topic: do you know why I initially had issues uploading to the forum?
    Was it because of my fresh user account and low reputation?

    I tested it again now and it seems fine.


  • Moderator

    @NTex Regarding the pcap in the last one: on the dhcp side it was textbook perfect. On the tftp side I did see occasional block retransmissions, but overall, for a WAN connection, it’s acceptable.

    From the MTU/fragmentation side, you have to remember a few things.

    1. tftp uses the UDP protocol, which is not very forgiving of dropped or lost packets.
    2. The PXE ROM implementation originally didn’t support PXE booting across subnets. That was added later on. Because of the minimal size of the PXE ROM, they made certain assumptions about the transfer and eliminated code from the drivers that might have been thought unnecessary at the time. Later versions of the PXE ROM had more space and are much more tolerant of communication problems.

    Happy FOGGING.



  • @george1421 said in Proliant ML110G7:

    @NTex Good going. Now I did work on a project to turn a Windows server into a FOG storage node. Once I proved that it worked I dropped the project because, why?? I have it documented here: https://forums.fogproject.org/topic/6941/windows-server-as-fog-storage-node-proof-of-concept-blog

    I realize this is a one-off situation, but if you need it then use it. But I think the fragmentation, or whatever is going on with your MPLS circuit, will be a problem when you get to the imaging point, because FOG uses NFS to transfer the file from the FOG server to FOS Linux running on the target computer. Having a storage node at the remote sites might be the better solution if you can’t image over your WAN connection.

    So it might actually be MTU and fragmentation, which probably only bites this old NIC at these locations, who knows.

    Come to think of it, these sites are located more in the countryside, far from big cities, where ISPs usually have more issues like this due to distance / infrastructure, etc.

    Working Server (one of those I didn’t have issues with), capture file
    It has no fragmentation, right?

    I mean, you can see it loading fine here:
    [screenshot]

    I think we (or at least I) learned something: MTU can cause issues like this.

    I wish I had had this idea sooner: using another workstation with a portable TFTP server while keeping the same DHCP. I just had to change Option 66 to point to the workstation.
    I actually copied ALL the PXE files from our FOG server.

    I can use this workaround for the 4 locations, which saves us a couple thousand miles of driving to replace the servers physically, at least for now.

    Nevertheless, I will keep your special version that you compiled for me.

    Brainstorming this puzzle with you was a pleasure. Thanks for all the help and support you gave, truly awesome. @george1421


  • Moderator

    @NTex Good going. Now I did work on a project to turn a Windows server into a FOG storage node. Once I proved that it worked I dropped the project because, why?? I have it documented here: https://forums.fogproject.org/topic/6941/windows-server-as-fog-storage-node-proof-of-concept-blog

    I realize this is a one-off situation, but if you need it then use it. But I think the fragmentation, or whatever is going on with your MPLS circuit, will be a problem when you get to the imaging point, because FOG uses NFS to transfer the file from the FOG server to FOS Linux running on the target computer. Having a storage node at the remote sites might be the better solution if you can’t image over your WAN connection.



  • @george1421 said in Proliant ML110G7:

    @NTex Ok here is a “special” version of undionly.kpxe https://drive.google.com/file/d/1XYe4SsM0ZLiJae1paIb8PFDnPVV0M3D7/view?usp=sharing

    Once loaded it will ignore any direction given by dhcp and request default.ipxe from 10.200.0.67 over the tftp protocol. Once that file is loaded it will then switch to http.

    Well, now that I think about it, the default undionly.kpxe would work too (ugh) as long as you bring default.ipxe over to your tftpd64 server too. THAT file points directly at your FOG server. I didn’t think far enough ahead in the process. That makes this special undionly.kpxe not that special.

    Yes, you’re right 🙂

    While you were compiling your project, I did this:

    I copied the portable tftpd64.
    Then I copied ALL the files from /tftpboot on the FOG server.

    I saw the boot file being loaded immediately:
    [screenshot]

    I captured the event using the local tftpd nevertheless, if you want to look at it 🎯
    Capture using local tftpd

    Once the FOG menu loaded, I selected my “Install CentOS” option and it’s loading:
    [screenshot]

    I still downloaded your special version; it might be useful in the future.

    I’m now going to try a server that I know worked before, to see whether we see the MTU fragmentation there and can prove whether this was the root cause.


  • Moderator

    @NTex Ok here is a “special” version of undionly.kpxe https://drive.google.com/file/d/1XYe4SsM0ZLiJae1paIb8PFDnPVV0M3D7/view?usp=sharing

    Once loaded it will ignore any direction given by dhcp and request default.ipxe from 10.200.0.67 over the tftp protocol. Once that file is loaded it will then switch to http.

    Well, now that I think about it, the default undionly.kpxe would work too (ugh) as long as you bring default.ipxe over to your tftpd64 server too. THAT file points directly at your FOG server. I didn’t think far enough ahead in the process. That makes this special undionly.kpxe not that special.



  • @george1421 said in Proliant ML110G7:

    @NTex said in Proliant ML110G7:

    If I run the tftp command from this very server (OS terminal) to download undionly.kpxe from our fog server, it works with no problem.

    But in this case you are using the OS’ tftp client, whereas when you are pxe booting you are using the NIC’s PXE ROM, which contains its own tftp client. I don’t remember HP servers, but I know Dell, and there you can update the BIOS without that necessarily meaning you update the NIC firmware. Through the lifecycle controller, the NIC and RAID firmware are separate installs.

    Yes, there is a difference between the OS client and the PXE ROM.
    According to HPE, all these servers have the latest NIC firmware.
    I mean these servers are pretty old! 🙂

    They release packages to patch on Linux, so I’ve done all that in the past.


  • Moderator

    @NTex said in Proliant ML110G7:

    Yes, that’s the IP.

    OK, let me remote into the office and see if my dev box is still powered on. I had to do something similar not too long ago, so that project should still be set up.


  • Moderator

    @NTex said in Proliant ML110G7:

    If I run the tftp command from this very server (OS terminal) to download undionly.kpxe from our fog server, it works with no problem.

    But in this case you are using the OS’ tftp client, whereas when you are pxe booting you are using the NIC’s PXE ROM, which contains its own tftp client. I don’t remember HP servers, but I know Dell, and there you can update the BIOS without that necessarily meaning you update the NIC firmware. Through the lifecycle controller, the NIC and RAID firmware are separate installs.



  • @george1421 said in Proliant ML110G7:

    Just to confirm, your fog server is at 10.200.0.67? Once iPXE gets loaded and running, it accesses the FOG server over http, which is a bit more WAN friendly than tftp.

    Yes, that’s the IP.



  • @george1421

    Yes, I noticed the MTU is smaller at this location, so it gets 106 bytes in the 2nd window.
    These WAN links are all fiber, 20 Mbps minimum.
    It might be due to the VPN using part of the MTU, though.

    My thinking always went towards the card firmware: I wondered whether it might be bogus and fail to load the boot file, but it is the same version as on the working servers.

    And like I said in the initial post, if I run the tftp command from this very server (OS terminal) to download undionly.kpxe from our fog server, it works with no problem.


  • Moderator

    @NTex said in Proliant ML110G7:

    So I could deploy something like a tftpd64 server on a Windows client, then change my DHCP to point to that client instead, and capture all the action?
    Would that work?

    Yes, as long as you transport undionly.kpxe and ipxe.efi to the remote site for tftpd64, that will work… oh wait, undionly.kpxe will again send out a dhcp request to find what it thinks is the fog server listed as the next server (dhcp option 66). In this case that will point to the windows server again and not the FOG server. I’d have to look, but I think I can create a one-off version of those files that will only reference your FOG server.

    Just to confirm, your fog server is at 10.200.0.67? Once iPXE gets loaded and running, it accesses the FOG server over http, which is a bit more WAN friendly than tftp.


  • Moderator

    @NTex said in Proliant ML110G7:

    Yes, you’re right I start capturing before the actual bootp.

    OK, this second pcap contains more data. FYI, if you enter a display filter of bootp you can see the dhcp process, and with tftp you can see the tftp process. The DHCP process looks textbook normal (but I kind of guessed that from the last pcap). And the target computer IS stating that it’s a BIOS mode computer. I just wanted to make sure the client wasn’t doing one thing and the network something else.

    From this pcap we can see the tsize of the file is 99002, so at 1456 bytes per packet it should take 68 blocks to transfer undionly.kpxe to the remote computer. Right now it’s only ACKing 1 block.

    I don’t know why, but something is telling me MTU: if the path MTU is too small for a 1456-byte block plus its headers, the packet could be getting fragmented, causing this problem, but why??
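
The block-count arithmetic and the MTU suspicion in the post above can be checked with a few lines of Python. This is only a sketch under stated assumptions: IPv4 with no IP options (20-byte header), an 8-byte UDP header, and the 4-byte TFTP DATA header from RFC 1350. The 1400-byte path MTU is a made-up example value, since the thread does not state the real one, and the function names are hypothetical.

```python
TFTP_DATA_HEADER = 4  # 2-byte opcode + 2-byte block number (RFC 1350)
UDP_HEADER = 8
IPV4_HEADER = 20      # assumption: no IP options


def tftp_block_count(tsize, blksize):
    """DATA blocks needed; the transfer ends with a block shorter than blksize."""
    return tsize // blksize + 1


def ip_packet_size(blksize):
    """Size of the IPv4 packet that carries one full TFTP DATA block."""
    return blksize + TFTP_DATA_HEADER + UDP_HEADER + IPV4_HEADER


def fits_without_fragmentation(blksize, path_mtu):
    return ip_packet_size(blksize) <= path_mtu


def max_blksize(path_mtu):
    """Largest blksize whose full DATA packet still fits in path_mtu."""
    return path_mtu - (TFTP_DATA_HEADER + UDP_HEADER + IPV4_HEADER)


if __name__ == "__main__":
    print(tftp_block_count(99002, 1456))           # values taken from the pcap
    print(ip_packet_size(1456))
    print(fits_without_fragmentation(1456, 1500))  # standard Ethernet MTU
    print(fits_without_fragmentation(1456, 1400))  # hypothetical reduced MTU
    print(max_blksize(1400))
```

With these assumptions, a full 1456-byte block travels in a 1488-byte IP packet, so any hop whose MTU is below that has to fragment it.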



  • @george1421

    So I could deploy something like a tftpd64 server on a Windows client, then change my DHCP to point to that client instead, and capture all the action?
    Would that work?


  • Moderator

    @NTex OK, now taking a step back and looking at the tftp protocol on the WAN side: it’s working as designed. Not how you want, and not working completely, but it is working. So we can discount everything up to the tftp file transfer, because everything before it is working.

    Now it looks like the file transfer is not complete. There are not enough packets in the transfer to contain all of undionly.kpxe. I see the fog server sending block 0, then the client sending an ACK for block 0, and then the fog server sends block 1 but the client never ACKs block 1. The FOG server tries to resend block 1 several times and then stops. The client then waits 30 seconds and requests the file all over again. The cycle continues until the client gives up.

    So, the next test: is it the FOG server (doubt), the pxe client, or the network causing the pain? From a windows computer, install the tftp client program. Drop the windows firewall and use the tftp client on the windows computer to pull undionly.kpxe to the remote site, on the same subnet as the pxe booting computer. Do the same WAN packet capture as the first time and let’s see what we get.
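
The stall described above (block 1 sent repeatedly but never ACKed) can be spotted mechanically once the TFTP payloads are pulled out of a capture. A rough Python sketch, assuming you already have the UDP payloads as bytes in capture order (extracting them from the pcap is not shown). The opcodes are those of RFC 1350 plus OACK from RFC 2347; the function names are hypothetical, and block-number wraparound is ignored.

```python
import struct
from typing import Iterable, List, Optional, Tuple

# TFTP opcodes: RFC 1350, plus OACK (6) from RFC 2347.
OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR", 6: "OACK"}


def describe(payload: bytes) -> Tuple[str, Optional[int]]:
    """Return (packet type, block number); block is None unless DATA or ACK."""
    (opcode,) = struct.unpack("!H", payload[:2])
    name = OPCODES.get(opcode, "UNKNOWN")
    block = None
    if name in ("DATA", "ACK"):
        (block,) = struct.unpack("!H", payload[2:4])
    return name, block


def unacked_blocks(payloads: Iterable[bytes]) -> List[int]:
    """DATA block numbers that were sent but never acknowledged."""
    sent, acked = set(), set()
    for payload in payloads:
        name, block = describe(payload)
        if name == "DATA":
            sent.add(block)
        elif name == "ACK":
            acked.add(block)
    return sorted(sent - acked)


if __name__ == "__main__":
    # Synthetic trace shaped like the one described in the post.
    rrq = b"\x00\x01undionly.kpxe\x00octet\x00blksize\x001456\x00tsize\x000\x00"
    oack = b"\x00\x06blksize\x001456\x00tsize\x0099002\x00"
    ack0 = struct.pack("!HH", 4, 0)
    data1 = struct.pack("!HH", 3, 1) + b"\x00" * 1456
    trace = [rrq, oack, ack0, data1, data1, data1]  # block 1 resent, no ACK
    print(unacked_blocks(trace))
```

Running the same check on a capture from a working server should return an empty list, which makes the two cases easy to compare.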



  • @george1421

    Yes, you’re right I start capturing before the actual bootp.
    The problem was that I ran the capture on the appliance.

    This capture was taken on the switch port where the actual server is connected, so you will see a lot more traffic.

    iLO IP is .2 and gateway .254.

    See if this has what you want. I filtered on dhcp and saw option 594 or something.

    Thanks


  • Moderator

    @NTex Looking at the fog server pcap, it appears normal too. I see the client asking for the size (tsize) of the file, then it requests the file. The issue appears to be that the FOG server doesn’t send the file, so the client waits 30 seconds and requests it again, and so on. This makes me think that the dhcp process is fine, but for some reason the tftp server is not sending the requested file. It can’t be ignoring the request, because it must have answered the tsize request for the client to then send the request for the file.


  • Moderator

    @NTex The WAN side doesn’t contain the details I need. I don’t know if your capture filter is set wrong or the witness computer’s network interface isn’t on the same subnet as the pxe booting computer. It’s good we see the tftp packets in it, but we are missing the DHCP packets. What I’m expecting to see is a DHCP DISCOVER packet from the target computer, one or more OFFER packets from your dhcp server(s), a REQUEST packet from the pxe computer, and then an ACK from the dhcp server. The first two packets are the telling ones.

    From the screenshot it appears that it’s timing out on the tftp call, but we are seeing tftp requests to the fog server. I can also tell the client is in BIOS mode, so it should be requesting undionly.kpxe from the fog server.



  • @george1421
    Fixed the previous post, I was having issues uploading files directly on forums.

    Attached screenshot from iLO that gives me the view / control on the client:
    [screenshot]

    I attached both captures.
    Fog Server Capture
    WAN Side / Server

