040ee119 error on boot
-
Wow never expected that result.
Yes there appears to be something in the network itself causing this newer problem. Took my test machine out to the “Production” workstation and it too failed to boot (reboot loop). Brought it back to the “Test” desk and it works fine.
Grabbed the box from the “Production” workstation and brought it to my “test” desk. It booted up flawlessly.
Upon initial inspection, the only major difference in network protocol is that the “Test” desk is 10/100, while the “Production” desk is running on gigabit.
Further inspection shows that all the core switches are either Dell Powerconnect, or Netgear Prosafe Gigabits. The “Test” desk has a cheap-o C-net 10/100 24-port switch to give me the number of ports I need at the desk, which effectively drops the network speed to 10/100 for that cable run. The “Production” desk is running directly off the core switches at full gigabit speed.
I then pulled the cable from my 10/100 switch and tried booting a PC on the “Test” desk from the direct line, without the 10/100 switch reducing the transmission speed. The reboot loop occurred once again.
Therefore I must conclude that there is an issue with undionly and gigabit speeds.
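The swap tests above can be laid out as a small table to see which factor lines up with the failures. This is only a sketch in Python: the field names (`machine`, `desk`, `via_10_100`, `booted`) and the helper `factors_matching_outcome` are made up for illustration, and the four rows are simply the four tests described in this post.

```python
# Each row is one boot test from the post above; field names are illustrative.
observations = [
    {"machine": "test box", "desk": "Production", "via_10_100": False, "booted": False},
    {"machine": "test box", "desk": "Test", "via_10_100": True, "booted": True},
    {"machine": "production box", "desk": "Test", "via_10_100": True, "booted": True},
    {"machine": "test-desk PC", "desk": "Test", "via_10_100": False, "booted": False},
]


def factors_matching_outcome(rows, outcome="booted"):
    """Return factors that take several values, each always seen with one outcome."""
    matching = []
    for factor in (key for key in rows[0] if key != outcome):
        outcome_by_value = {}
        consistent = True
        for row in rows:
            # Remember the first outcome seen for this value; any later
            # disagreement means the factor does not explain the result.
            first_seen = outcome_by_value.setdefault(row[factor], row[outcome])
            if first_seen != row[outcome]:
                consistent = False
                break
        if consistent and len(outcome_by_value) > 1:
            matching.append(factor)
    return matching


if __name__ == "__main__":
    print(factors_matching_outcome(observations))
```

With these rows only `via_10_100` survives. Note that the 10/100 switch changes two things at once (the link speed, and whether the PC hangs directly off a core switch), so this data alone cannot tell those two explanations apart.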
-
[QUOTE]Netgear Prosafe Gigabits. [/QUOTE]
Be absolutely sure that STP is disabled on these switches. Otherwise, PXE or iPXE will not work.
-
The Netgear switches are isolated to our DMZ; I didn’t take the time to trace all the cables. But to allay your fears, .33 worked perfectly for all our machines w/pxe.
The “Production” desk is directly connected to a Dell PowerConnect, that I know for certain.
This is a recent issue, appearing only since 1.0.1. More specifically, if STP were the issue, I would not expect it to suddenly “go away” just because I put another switch in-line after it. In fact, I would expect it to get worse, or at least stay the same, since there would be more usage of STP with another switch in place.
In this particular office, we only have 2 subnets: our DMZ and the main network. In fact, I only ever use FOG here in this office to clone new machines, and I’ve done that well over 100 times on .33 with no problems.
Well, other than multicast taking the network down when we try to use it, but that’s neither here nor there and probably has more to do with our highly distributed network and massive VPN usage to handle it all. In either case, I’m not worried about multicasting, and we don’t push images over the WAN anyway.
-
[QUOTE]But to allay your fears, .33 worked perfectly for all our machines w/pxe.[/QUOTE]
Fears eased. Just had to throw that out there just in case. I have Netgear Prosafes (1GB) w/ 1GB GBIC’s here and it works…except for the few bugs I keep hitting…cough TOM!..
Have you tried any of the other undionly.ipxe files from [url]https://mastacontrola.com/ipxe/[/url]? I would strongly suggest [URL='https://mastacontrola.com/ipxe/d6300-DEFAULT-GOOD/']d6300-DEFAULT-GOOD/[/URL]. This is the last one I did in the rounds of testing here, and it seems to be running smoothly.
-
When connected to the 10/100 switch everything functions as well as can be expected. They boot just fine right into the menu and then on to windows.
Not that it’s necessarily relevant yet, but it locks up at “loading boot sector… booting…” when trying to do a memtest. Let’s not trouble Tom with other issues here until we can get everyone to the menu first.
However, bump the speed up to 1 Gb (by removing the 10/100 switch) and it picks up DHCP, goes to configuring, twirls there for a bit, then flashes an error message which I can’t read because it instantly reboots.
-
Tom,
I got rid of the “Operation not Supported” message on a Dell 390 today by using the “e0478-DEFAULT-TEST” ipxe.pxe, though that version did not work on the Dell 990s. I went through all of the most current files you have posted, and here is what I found.
e0478-DEFAULT-TEST
undionly.kkpxe - reboot loop on Dell 990, operation not supported
ipxe.pxe - operation not supported, then freezes
undionly.kpxe - operation not supported, then freezes
undionly.pxe - assuming it errors, but it goes way too fast to read the error, then reboots
ipxe.kpxe - operation not supported, then freezes
ipxe.kkpxe - operation not supported, then freezes
d6300-DEFAULT-GOOD
ipxe.pxe - operation not supported, then freezes
undionly.kpxe - operation not supported, then freezes
undionly.pxe - assuming it errors, but it goes way too fast to read the error, then reboots
ipxe.kpxe - operation not supported, then freezes
ipxe.kkpxe - operation not supported, then freezes
undionly.kkpxe - operation not supported, then reboots
-
The undionly.kpxe file is not your problem. Something else is not configured properly.
Use the d6300-DEFAULT-GOOD file.
-
What would you suggest? Should I run a backup and re-install? Or would that carry the same problem over? It’s just weird that the same issue happened on a Dell 390, and when I changed to ipxe.pxe it worked fine.
-
[quote=“Tribble, post: 29072, member: 17221”]When connected to the 10/100 switch everything functions as well as can be expected. They boot just fine right into the menu and then on to windows.
Not that it’s necessarily relevant yet, but it locks up at “loading boot sector… booting…” when trying to do a memtest. Let’s not trouble Tom with other issues here until we can get everyone to the menu first
However bump up the speed to 1gb, (by removing the 10/100 switch) and it, picks up DHCP, goes to configuring, twirls there for a bit, then flashes an error message which i can’t read, because it instantly reboots.[/quote]
In reference to the “twirling” you’re experiencing, do your switches have STP enabled on them? Do you have an option for portfast on these switches?
Being connected to the 10/100 doesn’t really mean anything; it simply means it can receive the DHCP request from iPXE. If you take out all managed switches and connect directly to the FOG server, does everything work as expected at 1 Gb?
This will let you know if it’s a 10/100->1000 problem. I run gig switches on all my tests and don’t experience the boot loop issue. (BTW for all, the boot loop is kind of intentional.)
Doing this won’t necessarily be easy, but if you can, connect your FOG server, client test system, and DHCP server to the same switch. Start off easy and use a “dummy” (unmanaged) switch, preferably rated for 1 Gb.
This will, again, determine if it’s truly the undionly OR if something’s on your network not sending the DHCP back in time.
-
[quote=“Buddy, post: 29100, member: 225”]What would you suggest? Should I run a backup and re-install? Or would that carry the same problem over. Its just weird that the same issue happened on a Dell 390 and I change to ipxe.pxe and it worked fine.[/quote]
The reason I asked in past posts whether you performed a schema update is that there are pieces of information pulled from the database. If the DB is not accessible, it can’t properly create the boot scripts.
That all said, you stated that the schema is fine and you can log in to the FOG Web GUI, correct? Using the “latest-ish” undionly/ipxe files has helped you get further, but the 990s are still not working, correct?
Is it possible these systems aren’t receiving all the data intended? Meaning, they have only downloaded a portion of default.ipxe, if any?
Is your system actually trying to follow the right path? (PXE -> download undionly.kpxe -> load iPXE -> receive new DHCP -> load /tftpboot/default.ipxe -> load menu system ([url]http://FOGIP/fog/service/ipxe/boot.php?mac=XX:XX:XX:XX:XX:XX[/url])) We already know they are PXE booting, downloading undionly.kpxe, and attempting to load default.ipxe.
The link to boot.php is not 100% accurate, as the iPXE loading uses POST variables as opposed to GET variables, but if you go to the link you should see your tasking or menu information. Replace the XX’s with the client MAC, or just stop at boot.php without sending the MAC at all. What do you see in your browser if you go to this link?
All of your other systems are working, just not the 990s. Is there a wireless NIC, or some other NIC, on the system?
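For anyone who would rather script the boot.php check than paste the link into a browser, here is a rough Python sketch. The helper names `boot_php_url` and `fetch_boot_script` are invented for this example, `192.0.2.10` is a placeholder address, and it sends a plain GET, so (as noted above) it only approximates what iPXE itself sends.

```python
import re
from urllib.request import urlopen

MAC_PATTERN = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def boot_php_url(fog_host, mac=None):
    """Build the boot.php URL; with no MAC, stop at boot.php as suggested above."""
    base = f"http://{fog_host}/fog/service/ipxe/boot.php"
    if mac is None:
        return base
    mac = mac.strip().replace("-", ":")  # accept 00-11-22-... as well
    if not MAC_PATTERN.fullmatch(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return f"{base}?mac={mac}"


def fetch_boot_script(url, timeout=5):
    """Fetch the URL with a plain GET and return the body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; substitute the real FOG server IP.
    print(boot_php_url("192.0.2.10", "00-11-22-AA-BB-CC"))
```

Calling `fetch_boot_script(boot_php_url(...))` from a machine on the same network as the failing client should show whether the server hands back a menu or tasking script at all.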
-
What BIOS version do the Dell OptiPlex 990s have?
-
[quote=“Tom Elliott, post: 29104, member: 7271”]In reference to the “twirling” you’re experiencing, do your switches have STP enabled on them? Do you have an option for portfast on these switches?
Being connected to the 10/100 doesn’t really mean anything, it simply means it can receive the dhcp request from ipxe. If you take out all managed switches and connect directly to the FOG server, does all work as expected at 1gb?
This will let you know if it’s a 10/100->1000 problem. I run gig switches on all my tests, and don’t experience the boot loop issue. (BTW for all, the boot loop is intentional kind of).
Doing this won’t necessarily be easy, but I suppose, if you could for DHCP, connect your FOG Server, Client test system, and DHCP server to the same switch. Start off easy and use a “dummy” switch preferably rated for 1GB.
This will, again, determine if it’s truly the undionly OR if something’s on your network not sending the DHCP back in time.[/quote]
Hey Tom, I’ll see what I can do about running them all through a single unmanaged gigabit switch, but I doubt I’ll be able to do that without significant network disruption. Nearly everything runs through those 2 Dell PowerConnects.
What I can do at the moment is try an unmanaged gigabit-capable switch in place of the 10/100 C-net.
I’ll also have the network admin check the managed switches for STP and portfast. Keep in mind, however, that this issue never came up while using .32/.33 to image well over 100 PCs this spring for our Win-7 rollout. We’ve had no changes to our network infrastructure since then. Those 100+ PCs were loading through with no issues up until I updated FOG to 1.0.1.
Approximately 25 of the machines mentioned are in this office; I have not verified that all of the PCs in this office are affected. I have not updated the DHCP PXE image name at our other 14 branch offices as of yet, as that’s just way too many “OMG my computer won’t boot” phone calls every morning lol. Each office has its own subnet (10.X.0.x) and local DHCP server.
All in all, I’ve imaged and installed 80 DC7900s and 80+ DC5750s since January using FOG .32 before the update, with no issues like I’m describing here. A PXE\STP issue would have reared its ugly head already.
I’m certainly not complaining, FOG has already saved me hundreds of hours of work, lol. I just want to give you as much info as I can since I’ll be pretty busy today with DR testing.
-
Tribble check out this: [url]http://fogproject.org/forum/threads/nothing-is-working.10598/page-2#post-28305[/url]
I’m sure that the Dell PowerConnects have Spanning Tree Protocol (STP) enabled by default. Please confirm that STP has been disabled on them.
There also seems to be a portfast function. With portfast disabled, STP would interfere with iPXE but not with PXE.
-
[quote=“Tribble, post: 29149, member: 17221”]Keep in mind however this issue never came up while using .32.33 to image well over 100 pc’s this spring for our Win-7 roll out. We’ve had no changes to our network infrastructure since then. Those 100+ PC’s were loading through with no issues up until i updated FOG to 1.0.1 .[/quote]
The reason there is a difference is that iPXE has to establish its own DHCP connection. The timing is shorter than that for regular PXE. Your systems are “getting” to undionly.kpxe, which means the PXE side of the house is working properly. To enable the new “protocol” usages, iPXE has to re-establish the link to the switches, which is what was first causing the 040ee119 error. It happened because of a timing problem and a CPU usage issue in the iPXE source, which has since been fixed. Now if you see this error, it’s most likely a network-related issue such as STP, or not having PortFast in the case of Cisco switches.
I hope this makes sense.
[quote=“Tribble, post: 29149, member: 17221”]All in all, i’ve imaged and installed 80 DC7900’s, and 80+ DC5750’s since january using fog .32 before the update with no issues like i’m describing here. A PXE\STP issue would have reared it’s ugly head already.[/quote]
The PXE/STP issue wouldn’t have reared its head because old PXE didn’t care about that, and still doesn’t, which is WHY you’re able to get to undionly.kpxe; the failure comes after that point. Just wanted to give clarification.
-
[quote=“Buddy, post: 29092, member: 225”]Tom,
I got rid of the “Operation not Supported” message on a Dell 390 today by using the “e0478-DEFAULT-TEST” ipxe.pxe . Though that version did not work on the Dell 990’s. I went through all of the most current files you have posted and here is what I have found.
e0478-DEFAULT-TEST
undionly.kkpxe - reboot loop on Dell 990, operation not supported
ipxe.pxe - operation not supported, then freezes
undionly.kpxe - operation not supported, then freezes
undionly.pxe - assuming it errors, but it goes way too fast to read the error, then reboots
ipxe.kpxe - operation not supported, then freezes
ipxe.kkpxe - operation not supported, then freezes
d6300-DEFAULT-GOOD
ipxe.pxe - operation not supported, then freezes
undionly.kpxe - operation not supported, then freezes
undionly.pxe - assuming it errors, but it goes way too fast to read the error, then reboots
ipxe.kpxe - operation not supported, then freezes
ipxe.kkpxe - operation not supported, then freezes
undionly.kkpxe - operation not supported, then reboots[/quote]
For what it’s worth, our Dell 990s work on the out-of-the-box PXE configuration.
-
Thanks fractal, that is good info to know. So it is definitely something unique to my setup.
I am running A18 on the BIOS, and I will run through boot again and post that in a sec. No wireless or other NICs in any of the machines.
-
So, you’re saying that the issue would only appear when a PXE booting device using Undionly.kpxe is directly attached to a managed switch with STP enabled. And that adding an unmanaged, Non-STP switch as an intermediary between the PC and the managed switch can prevent the problem because iPXE communicates directly with the switch for DHCP? Therefore since the PC is not “directly” communicating with a switch that has STP enabled, there is less delay establishing a connection with the DHCP server.
Now, I will admit that I do notice a shorter time to pick up DHCP when I have the intermediary switch in place, but I never realized that switches had anything to do with DHCP packets and requests other than routing them to the correct ports.
I won’t be able to get on those switches until possibly Friday due to our DR testing schedule, but I’ll be sure to let you know what happens. I know we haven’t done much, if any, customizing to those managed switches from their default state, so STP being enabled is certainly possible.
-
[quote=“Tribble, post: 29175, member: 17221”]So, you’re saying that the issue would only appear when a PXE booting device using Undionly.kpxe is directly attached to a managed switch with STP enabled. And that adding an unmanaged, Non-STP switch as an intermediary between the PC and the managed switch can prevent the problem because iPXE communicates directly with the switch for DHCP? Therefore since the PC is not “directly” communicating with a switch that has STP enabled, there is less delay establishing a connection with the DHCP server.[/quote]
The intermediary switch just makes the connection between your DHCP server and the FOG Server so things can still boot.
iPXE does not communicate directly with the switch for DHCP. PXE boots, gets DHCP, and then loads the undionly.kpxe file. After this point, to use the “new” protocols within iPXE, iPXE requests its own DHCP information. The timeouts for PXE and iPXE are vastly different. iPXE must first wait for a proxyDHCP address, then, if that times out, it requests regular DHCP. Mind you, these “waiting” intervals are in the millisecond-to-second range. STP unintentionally “blocks” the DHCP address during this time.
STP works like this:
It establishes that there is a link on the port, clears the port, and only then enables forwarding on the port. PXE doesn’t mind all of this and will ‘wait’ for a much longer time to get DHCP. This isn’t all bad, BUT when iPXE is re-requesting DHCP, STP ‘blocks’ the connection by disabling and resetting the port, re-opening the port, and re-enabling forwarding on it. Sometimes this causes a longer delay than iPXE is expecting, so the iPXE DHCP requests time out.
[quote=“Tribble, post: 29175, member: 17221”]Now, I will admit that i do notice a shorter time to pick up DHCP when i have the intermediary switch in place, but i never realized that switches had anything to do with DHCP packets and requests other than routing them to the correct ports.
I won’t be able to get on those switches until possibly friday due to our DR testing schedule, but i’ll be sure to let you know what happens. I know we haven’t done much if any customizing to those managed switches from their default state, so STP being enabled is certainly possible.[/quote]
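To put rough numbers on the STP timing explanation above, here is a small Python sketch. It assumes the classic 802.1D default forward delay of 15 seconds (listening plus learning, so about 30 seconds before a port forwards) and uses made-up retry schedules; the two schedules and both function names are hypothetical and are not taken from the PXE or iPXE sources.

```python
def seconds_until_forwarding(forward_delay_s=15.0, portfast=False):
    """Time after link-up before the port forwards frames.

    Classic STP holds a port in listening and then learning, each for one
    forward delay; an edge/PortFast port skips both.
    """
    return 0.0 if portfast else 2 * forward_delay_s


def first_attempt_that_gets_through(attempt_times_s, port_ready_s):
    """Return the index of the first DHCP attempt sent once the port forwards."""
    for index, sent_at in enumerate(attempt_times_s):
        if sent_at >= port_ready_s:
            return index
    return None  # every attempt was sent while the port was still blocked


# Hypothetical send times in seconds after link-up, not real PXE/iPXE values.
PATIENT_CLIENT = [0, 4, 12, 28, 60]
IMPATIENT_CLIENT = [0, 1, 3, 7]

if __name__ == "__main__":
    for portfast in (False, True):
        ready = seconds_until_forwarding(portfast=portfast)
        print(portfast, ready,
              first_attempt_that_gets_through(PATIENT_CLIENT, ready),
              first_attempt_that_gets_through(IMPATIENT_CLIENT, ready))
```

Under these assumptions the patient client eventually gets an answer even without PortFast, while the impatient one gives up before the port starts forwarding; with PortFast both succeed on the first try.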
-
Hi, I’m getting the error “could not start: download operation not supported”, but this only happens in one domain. In the others we have the same server with no problems. The switches are the same on both domains (Cisco). Can anybody help me please?
Best regards,
Bruno
[url=“/_imported_xf_attachments/0/905_fog.png?:”]fog.png[/url]
-
[quote=“bveiga, post: 29369, member: 24091”]hi i´m getting the error “could not start: download operation not supported” but this only happens in one domain, in the others we have the same server with no problems,the switches are equals on both domains(cisco), can anybody help me please?
best regardsBruno[/quote]
This is not the iPXE “0x040ee119” error; please make a new thread.