Proliant ML110G7

NTex

Yes, you’re right, I started capturing before the actual bootp.
The problem was that I was capturing on the appliance.

This capture was on Switch port where the actual server is connected, so you will see a lot more traffic.

iLO IP is .2 and gateway .254.

See if this has what you want. I filtered on dhcp and saw option 594 or something.

Thanks

george1421

@NTex OK, now taking a step back and looking at the tftp protocol on the WAN side, it’s working as designed. Not how you want, and not working completely, but it is working. So we can discount everything up to the tftp file transfer, because everything before that is working.

Now it looks like the file transfer is not complete. There are not enough packets in the transfer to contain all of undionly.kpxe. I see the fog server sending block 0 and then the client sending an ACK for block 0 and then the fog server sends block 1 but the client never ACKs block one. The FOG server tries to resend block 1 several times and then stops. The client then waits 30 seconds and requests the file all over again. The cycle continues until the client gives up.
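The retry cycle described above can be sketched as a tiny simulation. This is only an illustration of lock-step behaviour, not the FOG server's actual code: the function name `simulate_lockstep` and the retry limit of 5 are assumptions (the capture only shows "several" resends).

```python
def simulate_lockstep(total_blocks, lost_block=None, max_retries=5):
    """Simulate a TFTP-style lock-step transfer.

    The sender transmits one block and waits for its ACK before moving on.
    If `lost_block` is never acknowledged, the sender resends it up to
    `max_retries` times (an assumed value) and then gives up.

    Returns (blocks_acked, datagrams_sent).
    """
    acked = 0
    sent = 0
    for block in range(1, total_blocks + 1):
        attempts = 0
        while True:
            sent += 1
            attempts += 1
            if block != lost_block:
                acked += 1          # ACK arrived, move to the next block
                break
            if attempts > max_retries:
                return acked, sent  # sender stops resending


    return acked, sent


# Healthy transfer: every block is sent once and acknowledged.
print(simulate_lockstep(68))                 # (68, 68)
# Block 1 is never ACKed: one send plus five resends, nothing delivered.
print(simulate_lockstep(68, lost_block=1))   # (0, 6)
```

Because nothing past the unacknowledged block is ever sent, a single block that cannot get through stalls the whole file.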

So, the next test: is it the FOG server (doubtful), the pxe client, or the network causing the pain? On a Windows computer at the remote site, on the same subnet as the pxe booting computer, install the tftp client program. Drop the Windows firewall and use the tftp client on that computer to pull undionly.kpxe from the FOG server. Do the same WAN packet capture as the first time and let’s see what we get.
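If installing the Windows tftp client is not convenient, roughly the same test can be run from a few lines of Python. This is a minimal sketch, assuming the server follows RFC 1350 with the RFC 2347/2348 option extensions; the helper names (`build_rrq`, `build_ack`, `parse_oack`, `fetch`) are made up for this example, and it does not retransmit on timeout the way a real client would.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ACK, OP_ERROR, OP_OACK = 1, 3, 4, 5, 6


def build_rrq(filename, blksize=1456):
    """Read request asking for tsize and blksize, like the PXE rom does."""
    fields = [filename, "octet", "tsize", "0", "blksize", str(blksize)]
    body = b"".join(f.encode("ascii") + b"\0" for f in fields)
    return struct.pack("!H", OP_RRQ) + body


def build_ack(block):
    return struct.pack("!HH", OP_ACK, block)


def parse_oack(packet):
    """Return the options a server acknowledged, as a dict of strings."""
    fields = packet[2:].split(b"\0")[:-1]
    return {fields[i].decode().lower(): fields[i + 1].decode()
            for i in range(0, len(fields) - 1, 2)}


def fetch(server, filename, blksize=1456, timeout=5.0):
    """Download one file; raises on timeout instead of retrying."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_rrq(filename, blksize), (server, 69))
        received = bytearray()
        expected = 1
        size = 512  # protocol default unless the server confirms blksize
        while True:
            packet, peer = sock.recvfrom(65536)
            opcode = struct.unpack("!H", packet[:2])[0]
            if opcode == OP_OACK:
                size = int(parse_oack(packet).get("blksize", 512))
                sock.sendto(build_ack(0), peer)
            elif opcode == OP_DATA:
                block = struct.unpack("!H", packet[2:4])[0]
                sock.sendto(build_ack(block), peer)
                if block != expected:
                    continue  # duplicate of a block we already have
                received += packet[4:]
                if len(packet) - 4 < size:
                    return bytes(received)  # short block ends the transfer
                expected = (expected + 1) & 0xFFFF
            elif opcode == OP_ERROR:
                raise OSError(packet[4:-1].decode(errors="replace"))
    finally:
        sock.close()
```

Calling `fetch("10.200.0.67", "undionly.kpxe")` from the remote subnet while the WAN capture is running should show whether an OS-level client gets past block 1 with the same block size.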

NTex

@george1421

So I can deploy something like a tftpd64 server on a Windows client, then change my DHCP to point to that client instead, and capture all the action?
Would it work?

george1421

@NTex said in Proliant ML110G7:

Yes, you’re right, I started capturing before the actual bootp.

OK, this second pcap contains more data. FYI, if you enter a display filter of bootp you can see the dhcp process, and with tftp you can see the tftp process. The DHCP process looks textbook normal (but I kind of guessed that from the last pcap). But the target computer IS stating that it’s a BIOS mode computer. I just wanted to make sure the client was doing one thing and the network doing something else.

From this pcap we can see the tsize of the file is 99002, so at 1456 bytes per packet it should take 68 blocks to transfer undionly.kpxe to the remote computer. Right now it’s only ACKing 1 block.
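For reference, the block count works out like this. A small sketch; the function name `tftp_data_blocks` is just for illustration. TFTP ends a transfer with a block shorter than the block size, so a file that is an exact multiple of the block size needs one extra, empty block.

```python
def tftp_data_blocks(file_size, blksize):
    """Number of DATA blocks needed, including the final short block."""
    return file_size // blksize + 1


def last_block_size(file_size, blksize):
    """Payload bytes carried by the final block (0 for exact multiples)."""
    return file_size % blksize


print(tftp_data_blocks(99002, 1456))  # 68
print(last_block_size(99002, 1456))   # 1450
```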

I don’t know why, but something is telling me MTU. If the MTU is below 1456 it could be fragmenting the packet and causing this problem, but why??
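To put numbers on the MTU hunch: each 1456-byte block also carries TFTP, UDP and IP headers, so the packet on the wire is larger than the block itself. The sketch below assumes a plain IPv4 header without options; the 1400-byte path MTU is a hypothetical value for a VPN link, not something measured in this thread.

```python
IP_HEADER = 20         # IPv4 header without options (assumption)
UDP_HEADER = 8
TFTP_DATA_HEADER = 4   # 2-byte opcode + 2-byte block number
OVERHEAD = IP_HEADER + UDP_HEADER + TFTP_DATA_HEADER


def datagram_size(blksize):
    """Size of the IP packet carrying one full TFTP DATA block."""
    return blksize + OVERHEAD


def needs_fragmentation(blksize, path_mtu):
    return datagram_size(blksize) > path_mtu


def largest_blksize(path_mtu):
    """Biggest block size that still fits in one unfragmented packet."""
    return path_mtu - OVERHEAD


print(datagram_size(1456))              # 1488
print(needs_fragmentation(1456, 1500))  # False
print(needs_fragmentation(1456, 1400))  # True
print(largest_blksize(1400))            # 1368
```

So a full block fits on a standard 1500-byte Ethernet path, but any tunnel that pulls the path MTU below 1488 would force the block to be fragmented.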

george1421

@NTex said in Proliant ML110G7:

So I can deploy something like a tftpd64 server on a Windows client, then change my DHCP to point to that client instead, and capture all the action?
Would it work?

Yes, as long as you transport undionly.kpxe and ipxe.efi to the remote site for tftpd64, that will work… oh wait, undionly.kpxe will again send out a dhcp request to find what it thinks is the fog server, listed as the next server (dhcp option 66). In this case it will point to the Windows server again and not the FOG server. I’d have to look, but I think I can create a one-off version of those files that will only reference your FOG server.

Just to confirm, your fog server is at 10.200.0.67? Once iPXE gets loaded and running it accesses the FOG server over http, which is a bit more WAN friendly than tftp.

NTex

@george1421

Yes, I noticed the MTU is smaller at this location, so it gets 106 bytes on the 2nd window.
These WAN links are all fiber, 20 Mbps minimum.
It might be due to the VPN using part of the MTU, though.

My thoughts were always that the card firmware might be bogus and not loading the boot file, but it is the same version as on the working servers.

And like I said in the initial post, if I run the tftp command from this very server (OS terminal) to download undionly.kpxe from our fog server, it works with no problem.

NTex

@george1421 said in Proliant ML110G7:

Just to confirm, your fog server is at 10.200.0.67? Once iPXE gets loaded and running it accesses the FOG server over http, which is a bit more WAN friendly than tftp.

Yes, that’s the IP.

george1421

@NTex said in Proliant ML110G7:

if I run the tftp command from this very server (OS terminal) to download undionly.kpxe from our fog server, it works with no problem.

But in this case you are using the OS’ tftp client, whereas when you are pxe booting you are using the NIC’s PXE rom, which contains its own tftp client. I don’t remember HP servers, but I know Dell, and there you can update the bios, but that doesn’t necessarily mean you update the NIC firmware. Through the lifecycle controller the NIC and RAID firmware are separate installs.

george1421

@NTex said in Proliant ML110G7:

Yes, that’s the IP.

OK, let me remote into the office and see if my dev box is still powered on. I had to do something similar not too long ago, so that project should still be set up.

NTex

@george1421 said in Proliant ML110G7:

@NTex said in Proliant ML110G7:

if I run the tftp command from this very server (OS terminal) to download undionly.kpxe from our fog server, it works with no problem.

But in this case you are using the OS’ tftp client, whereas when you are pxe booting you are using the NIC’s PXE rom, which contains its own tftp client. I don’t remember HP servers, but I know Dell, and there you can update the bios, but that doesn’t necessarily mean you update the NIC firmware. Through the lifecycle controller the NIC and RAID firmware are separate installs.

Yes, there is a difference between the OS client and PXE.
I checked with HPE; all these servers have the latest NIC firmware.
I mean, these servers are pretty old!

They release packages to patch on Linux, so I’ve done all that in the past.

george1421

@NTex Ok here is a “special” version of undionly.kpxe https://drive.google.com/file/d/1XYe4SsM0ZLiJae1paIb8PFDnPVV0M3D7/view?usp=sharing

Once loaded it will ignore any direction given by dhcp and request default.ipxe from 10.200.0.67 over the tftp protocol. Once that file is loaded it will then switch to http.

Well, now that I think about it, the default undionly.kpxe would work too (ugh) as long as you bring over default.ipxe to your tftpd64 server too. THAT file points directly at your FOG server. I didn’t think far enough ahead in the process. That makes this special undionly.kpxe not that special.

NTex

@george1421 said in Proliant ML110G7:

@NTex Ok here is a “special” version of undionly.kpxe https://drive.google.com/file/d/1XYe4SsM0ZLiJae1paIb8PFDnPVV0M3D7/view?usp=sharing

Once loaded it will ignore any direction given by dhcp and request default.ipxe from 10.200.0.67 over the tftp protocol. Once that file is loaded it will then switch to http.

Well, now that I think about it, the default undionly.kpxe would work too (ugh) as long as you bring over default.ipxe to your tftpd64 server too. THAT file points directly at your FOG server. I didn’t think far enough ahead in the process. That makes this special undionly.kpxe not that special.

Yes, you’re right

While you were compiling your project, I did this:

Copied the portable tftpd64.
Then I copied ALL the files from /tftpboot on the Fog server.

I saw the boot file being loaded immediately:
[screenshot]

I captured the event using the local tftpd nevertheless, if you want to look at it:
Capture using local tftpd

Once the Fog menu loaded, I selected my “Install CentOS” option and it’s loading:
[screenshot]

Still, I downloaded your special version; it might be useful in future.

I’m going to try now on a server that I know worked before, to see whether we see the MTU fragmentation there and prove whether this was the root cause.

george1421

@NTex Good going. Now I did work on a project to turn a Windows server into a FOG storage node. Once I proved that it worked I dropped the project because, why?? I have it documented here: https://forums.fogproject.org/topic/6941/windows-server-as-fog-storage-node-proof-of-concept-blog

I realize this is a one-off situation, but if you need it then use it. But I think the fragmentation, or whatever is going on with your MPLS circuit, will be a problem when you get to the imaging point, because FOG uses NFS to transfer the file from the FOG server to FOS Linux running on the target computer. Having a storage node at the remote sites might be the better solution if you can’t image over your WAN connection.

NTex

@george1421 said in Proliant ML110G7:

@NTex Good going. Now I did work on a project to turn a Windows server into a FOG storage node. Once I proved that it worked I dropped the project because, why?? I have it documented here: https://forums.fogproject.org/topic/6941/windows-server-as-fog-storage-node-proof-of-concept-blog

I realize this is a one-off situation, but if you need it then use it. But I think the fragmentation, or whatever is going on with your MPLS circuit, will be a problem when you get to the imaging point, because FOG uses NFS to transfer the file from the FOG server to FOS Linux running on the target computer. Having a storage node at the remote sites might be the better solution if you can’t image over your WAN connection.

So it might be the actual MTU and fragmentation; it probably just happens with this old NIC at these locations, who knows.

Come to think of it, these sites are located more in the countryside, far from big cities, where ISPs usually have more issues like this due to distance, infrastructure, etc.

Working server (one of those I didn’t have issues with), capture file
It has no fragmentation, right?

I mean, you can see it loading fine here:
[screenshot]

I think I (at least) learned something: MTU can cause issues like this.

I wish I’d had this idea sooner: using another workstation with a portable TFTP server while keeping the same DHCP. I just had to change Option 66 to point to the workstation.
I actually copied ALL the PXE files from our Fog.

I can use this workaround for 4 locations, and it saved us a couple thousand miles of driving to replace the servers physically, at least for now.

Nevertheless, I will keep your special version that you compiled for me.

Brainstorming this puzzle with you was a pleasure, thanks for all the help you gave and support, truly awesome. @george1421

george1421

@NTex The pcap in the last one: on the dhcp side it was textbook perfect. On the tftp side I did see occasional block retransmissions, but overall for a WAN connection it’s acceptable.

From the MTU/fragmentation side, you have to remember a few things.

tftp uses the udp protocol, which is not very forgiving of dropped or lost packets.
The PXE rom implementation originally didn’t support pxe booting across subnets. That was added later on. Because of the minimal size of the PXE ROM they made certain assumptions about the transfer and eliminated code from the drivers that might have been thought unnecessary at the time. Later versions of the PXE rom had more space and are much more tolerant of communication problems.

Happy FOGGING.

NTex

@george1421

Awesome!

BTW, a little off-topic: do you know why I initially had issues uploading to the forum?
Was it because of the fresh user account and low reputation?

Now that I’ve tested it again, it seems fine.
