Fresh VM and Fog 1.2.0 install having issues with iPXE boot
-
If you can tolerate a little instability, I would recommend upgrading to the latest trunk build (pre-1.3.0 release). That way you can be sure you have the latest drivers and boot images, plus support for UEFI firmware, GPT disks and Windows 10.
-
Using ipxe.pxe I get the original error again. With undionly.kkpxe I get "Could not start download: Operation not supported (http://ipxe.org/3c092003)".
In another post (that I think most of you were helping me with too :D) I was trying to use 0.32 and I think the error was the same. I am beginning to believe it's the network and not FOG. I might try to see if Adtran support can help me if I can explain the problem to them.
-
I agree with George: you should try FOG trunk before escalating the issue, since simply upgrading to trunk might resolve it entirely.
-
@jcook That’s interesting. For me, the file command is reporting the kernel version:
/var/www/html/fog/service/ipxe/bzImage: Linux kernel x86 boot executable bzImage, version 4.5.0 (root@debian64) #1 SMP Mon Mar 14 06:35:01 EDT 2016, RO-rootFS, swap_dev 0x6, Normal VGA
/var/www/html/fog/service/ipxe/bzImage32: Linux kernel x86 boot executable bzImage, version 4.5.0 (root@debian64) #1 SMP Mon Mar 14 06:36:15 EDT 2016, RO-rootFS, swap_dev 0x6, Normal VGA
This is the FOG trunk kernel. You’ll probably see a 3.x.y version. See if you can find out the kernel version via web GUI -> FOG Configuration -> Kernel Update.
-
When I go to the “Kernel Update” section of the GUI I don’t see an indication of my current kernel. I don’t mind updating the kernel to make sure I have the latest, I’m just not sure which one to choose.
Upgrading to trunk isn’t a problem either. If someone can point me to a guide, I can do that too, or do it first.
-
@jcook Here is a guide for updating to the latest trunk build. https://wiki.fogproject.org/wiki/index.php/Upgrade_to_trunk
-
I have upgraded to trunk and this is what my PXE boot looks like. I noticed another error I might have overlooked. Since I’m not sure what any of it means, I’ll just let you see the screenshot. This is with undionly.kpxe.
I also forgot to mention that the FOG server and the clients are on different VLANs. I was thinking of moving FOG to the same VLAN to see if that helps, I’m just not sure it will.
-
That picture speaks volumes.
The PXE ROM is requesting iPXE (good). The iPXE kernel gets loaded (good). iPXE isn’t able to pick up a DHCP address (bad). Do you have access to a hub or an unmanaged switch you can place between the computer and your building switch?
So far this is not a VLAN issue, but an issue with iPXE being able to pick up a DHCP address. The most frequent reason is that spanning tree is not forwarding right away (but you said RSTP was enabled). So by the time STP switches the port to the forwarding state, the iPXE kernel has already given up.
-
I am actually a lone IT guy here at a small school. The firewall and core switches are in my “office”. My understanding (and I may have things wrong) is that each VLAN has its own DHCP server. Our “Wired” VLAN has the FOG TFTP info, but the “Management” VLAN has a TFTP server for our access points. However, for the “Management” VLAN the TFTP server isn’t supplied via option 67 but via another setting in the DHCP server. Should I try leaving the other TFTP setting alone and adding options 66 and 67?
Should I put the dumb switch between the client and the core, between the client and the firewall/DHCP, or just between the client and a smart switch (if that makes sense lol)?
EDIT: Also, is this something I might be able to fix by adjusting RSTP timings?
-
@jcook The first step is to see if we can use an unmanaged switch between the target computer and your core switch.
The rest of your environment (up to this point) is set up correctly. The root of the issue right now is that when the iPXE kernel starts to run, it resets the network adapter, causing the link to drop for a second while the adapter is being configured (you can see this if you watch the link light on the target computer). It’s only out for a second, but with spanning tree in default mode it takes 27 seconds for the port to start forwarding data again. There are 3 such network “winks” as the FOS kernel boots (PXE ROM -> iPXE, iPXE -> iPXE, and iPXE -> FOS kernel).
By using the unmanaged switch, the wink happens between the target and the unmanaged switch. The core switch port never winks, so it stays forwarding. Understand this is only a test to see if it is a spanning tree issue.
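To make the timing argument concrete, here is a rough sketch in Python. The 27 second figure is the one quoted above; the iPXE give-up time is only an assumed placeholder for illustration, not a measured or documented value, and the function name is made up.

```python
def port_ready_before_giveup(forward_delay_s: float, client_giveup_s: float) -> bool:
    """Return True if the switch port forwards traffic before the boot client stops retrying DHCP."""
    return forward_delay_s < client_giveup_s


# Delay quoted above for spanning tree in default mode.
STP_DEFAULT_DELAY_S = 27.0
# Behind an unmanaged switch the core port never drops link, so there is no new delay.
NO_WINK_DELAY_S = 0.0
# ASSUMPTION: placeholder for how long iPXE keeps retrying DHCP; not a documented value.
ASSUMED_IPXE_GIVEUP_S = 10.0

for label, delay in [
    ("default spanning tree", STP_DEFAULT_DELAY_S),
    ("behind unmanaged switch", NO_WINK_DELAY_S),
]:
    ok = port_ready_before_giveup(delay, ASSUMED_IPXE_GIVEUP_S)
    print(f"{label}: {'DHCP can succeed' if ok else 'DHCP gives up first'}")
```

The point is only the comparison: whenever the port takes longer to start forwarding than the client is willing to wait, DHCP fails, and this can happen at each of the three winks.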
-
DHCP is much faster. It asks for the tftp server and after entering it I get the above and it seems to hang.
-
@jcook Good, then issue #1 has something to do with spanning tree. The next issue: if iPXE asks for the FOG server address, the proper DHCP options are not getting to the target. Can you confirm that the DHCP scope for this subnet is sending out DHCP option 66 {next-server} and option 67 {boot file}?
<edit> Crud, that’s not the problem, because the iPXE image is getting to the client. Are the FOG server and the target computer on the same subnet? </edit>
-
@jcook if you use a web browser to go to here, what do you see?
http://172.18.164.6/fog/service/ipxe/boot.php?mac0=78:45:c4:0e:5d:a3
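If a browser is not handy on the client VLAN, the same check can be sketched in Python. This is a minimal sketch: the helper names are made up for illustration, and the path is simply the one from the URL above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def boot_php_url(server_ip: str, mac: str) -> str:
    """Build the boot.php URL for a given FOG server IP and client MAC."""
    # Keep the colons in the MAC readable instead of percent-encoding them.
    query = urlencode({"mac0": mac}, safe=":")
    return f"http://{server_ip}/fog/service/ipxe/boot.php?{query}"


def fetch_boot_script(server_ip: str, mac: str, timeout_s: float = 5.0) -> str:
    """Fetch the iPXE script that the FOG server returns for this MAC."""
    with urlopen(boot_php_url(server_ip, mac), timeout=timeout_s) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the URL; call fetch_boot_script() from a machine on the client VLAN.
    print(boot_php_url("172.18.164.6", "78:45:c4:0e:5d:a3"))
```

Running fetch_boot_script() from a machine on the client VLAN tests the same HTTP path through the gateway that iPXE has to use. A healthy reply is a text script whose first line is #!ipxe.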
-
@george1421 No they are on separate subnets.
@Wayne-Workman I get the following (I changed the IP to my fogserver):
#!ipxe
set fog-ip 172.18.164.6
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
cpuid --ext 29 && set arch x86_64 || set arch i386
goto get_console
:console_set
colour --rgb 0x00567a 1 ||
colour --rgb 0x00567a 2 ||
colour --rgb 0x00567a 4 ||
cpair --foreground 7 --background 2 2 ||
goto MENU
:alt_console
cpair --background 0 1 ||
cpair --background 1 2 ||
goto MENU
:get_console
console --picture http://172.18.164.6/fog/service/ipxe/bg.png --left 100 --right 80 && goto console_set || goto alt_console
:MENU
menu
colour --rgb 0xff0000 0 ||
cpair --foreground 1 1 ||
cpair --foreground 0 3 ||
cpair --foreground 4 4 ||
item --gap Host is NOT registered!
item --gap -- -------------------------------------
item fog.local Boot from hard disk
item fog.memtest Run Memtest86+
item fog.reginput Perform Full Host Registration and Inventory
item fog.reg Quick Registration and Inventory
item fog.quickimage Quick Image
item fog.multijoin Join Multicast Session
item fog.sysinfo Client System Information (Compatibility)
choose --default fog.local --timeout 3000 target && goto ${target}
:fog.local
sanboot --no-describe --drive 0x80 || goto MENU
:fog.memtest
kernel memdisk iso raw
initrd memtest.bin
boot || goto MENU
:fog.reginput
kernel bzImage32 loglevel=4 initrd=init_32.xz root=/dev/ram0 rw ramdisk_size=127000 keymap= web=172.18.164.6/fog/ consoleblank=0 rootfstype=ext4 loglevel=4 mode=manreg
imgfetch init_32.xz
boot || goto MENU
:fog.reg
kernel bzImage32 loglevel=4 initrd=init_32.xz root=/dev/ram0 rw ramdisk_size=127000 keymap= web=172.18.164.6/fog/ consoleblank=0 rootfstype=ext4 loglevel=4 mode=autoreg
imgfetch init_32.xz
boot || goto MENU
:fog.quickimage
login
params
param mac0 ${net0/mac}
param arch ${arch}
param username ${username}
param password ${password}
param qihost 1
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
:fog.multijoin
login
params
param mac0 ${net0/mac}
param arch ${arch}
param username ${username}
param password ${password}
param sessionJoin 1
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
:fog.sysinfo
kernel bzImage32 loglevel=4 initrd=init_32.xz root=/dev/ram0 rw ramdisk_size=127000 keymap= web=172.18.164.6/fog/ consoleblank=0 rootfstype=ext4 loglevel=4 mode=sysinfo
imgfetch init_32.xz
boot || goto MENU
:bootme
chain -ar http://172.18.164.6/fog/service/ipxe/boot.php##params || goto MENU
autoboot
-
I took a packet capture from the time the client booted until a minute or so after it asked me to put in the TFTP IP. The MAC address of the client is 78:45:c4:0e:5d:a3 and the FOG server is 172.18.164.6, if that helps with filtering. If more info would help interpret the capture, let me know. https://drive.google.com/file/d/0BxsOsMJZGNhYWklZNldIUWpCcU0/view?usp=sharing
-
Also, as a test I set up a new router and moved the FOG server and the client onto it with a dumb switch, and everything seems to be working. I was able to get to the FOG boot screen, so it must be something with the network. I just don’t know enough to figure it out. I am going to see if I can get STP disabled on the network to see if that does the trick.
-
@jcook Your FOG server (172.18.164.6) and the client (172.18.165.245) are on two different subnets (the netmask being 255.255.255.0). This does not have to be an issue, but I am wondering if you are aware that the client and the FOG server need to use a gateway to talk to each other.
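As a quick way to double check the subnet claim, here is a small Python sketch using the standard ipaddress module. The function name is just for illustration; the addresses and netmask are the ones from the capture.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, netmask: str) -> bool:
    """Return True if both addresses fall in the same network for the given netmask."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{netmask}").network
    return net_a == net_b


# FOG server and client from the capture, with the /24 netmask seen there.
print(same_subnet("172.18.164.6", "172.18.165.245", "255.255.255.0"))
```

With 255.255.255.0 this reports that the two hosts are in different networks (172.18.164.0/24 and 172.18.165.0/24), so every packet between them has to be routed.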
What kind of DHCP server do you use? I see options 66 and 67 (they seem fine), but the DHCP server does not set the matching fields in the DHCP header (next-server and filename). I am not sure whether this is what causes the “Please enter tftp server” message, because iPXE does not find option 66 in the DHCP answers - kind of strange… not sure about that.
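To show the difference between the header fields and the options, here is a rough Python sketch that pulls both out of a raw DHCP payload (the UDP payload, without Ethernet/IP/UDP headers). The offsets follow the standard BOOTP layout; the function name is made up, and the sketch ignores option overload (option 52), so treat it as an illustration only.

```python
import ipaddress

BOOTP_FIXED_LEN = 236                     # fixed BOOTP header size before the magic cookie
MAGIC_COOKIE = bytes([99, 130, 83, 99])   # marks the start of the DHCP options


def boot_fields(payload: bytes) -> dict:
    """Extract next-server/filename from the header and options 66/67 from a DHCP payload."""
    if len(payload) < BOOTP_FIXED_LEN + 4 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")
    fields = {
        # Header field "siaddr" (next-server) sits at bytes 20..23.
        "siaddr": str(ipaddress.IPv4Address(payload[20:24])),
        # Header field "file" (boot filename) is a NUL-padded 128-byte string at 108..235.
        "file": payload[108:236].split(b"\x00", 1)[0].decode("ascii", "replace"),
        "option66": None,
        "option67": None,
    }
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:        # end option
            break
        if code == 0:          # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break              # truncated option, stop parsing
        length = payload[i + 1]
        data = payload[i + 2 : i + 2 + length]
        if code == 66:
            fields["option66"] = data.decode("ascii", "replace").rstrip("\x00")
        elif code == 67:
            fields["option67"] = data.decode("ascii", "replace").rstrip("\x00")
        i += 2 + length
    return fields
```

An offer like the one in the capture would come back with siaddr 0.0.0.0 and an empty file, while option66 and option67 are filled in.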
Let me guess - you captured the packets from a different host in your network. This is why we see the DHCP conversation but no TFTP packets! Can you please do it again, but capture the packets on the FOG server? Either use Wireshark if you have a GUI installed, or install tcpdump and run it like this:
tcpdump -w /tmp/bootup.pcap port 67 or port 68 or port 69 or port 80
(Just leave the command running, boot your client until you get the error/hang, then stop tcpdump with Ctrl+C and upload the file /tmp/bootup.pcap to the forum.)
-
Our DHCP is handled by an Adtran NetVanta 3140. I think at first the clients on the 165 subnet were trying to use 172.18.165.1 as the TFTP server, so a rule is in place to forward it to FOG (172.18.164.6). Clients could get to the FOG boot screen after those changes on the old FOG server running 0.32, so I thought we were all good.
You were correct about the previous capture, I’m a networking novice. Here is the capture file
-
@jcook That’s funny. This time I only see the TFTP traffic but no DHCP traffic…
The TFTP traffic looks OK, but the HTTP request for boot.php is being terminated (reset flag) by the client just a few microseconds before the HTTP server would send its answer?!? Maybe that is caused by some kind of HTTP filter on the gateway?