@lakk I have had to deal with them from time to time. I did the exact same thing with them (breaking the mirror) by assuming the Intel RAID controller acts like a traditional RAID controller. It does not, because it exposes both the RAID device and the JBOD disks to the OS. The OS needs to be smart enough to know how to manage the array.
I can give you another example (possibly of what you are seeing). We have several Dell Precision rack mount workstations that use these RAID controllers for their local disks. Somewhere in 2018-2019 they moved the OS from Windows 7 to Windows 10. About 6 months later we got a call that 2 of the workstations had reverted back to Windows 7. This wasn’t possible because it was a clean install of Windows 10 and not an upgrade from Windows 7. It's just not possible to do what they said it did. We had them reboot the workstation and take a few screenshots. They called back and said that it had switched back to Windows 10. Thinking they were just crazy, we said to give us a call the next time it happened. About a month later it did it again. To not make this example any longer, I’ll cut to the point. We found that the RAID-1 mirror had been split (akin to split brain) some time before Windows 10 was installed. So, not knowing the mirror was broken, they installed Windows 10 and it went onto one disk while the other disk kept the Windows 7 install. It appears that the Intel RAID controller picks at random which disk will be the leader and which the follower in the mirror (for the Intel controller the leader disk has read/write activity, while the follower only has write activity). That is how it would start up as Win10 on one boot and Win7 on another.
Now you said that isc-dhcp server worked? The reason I ask is that isc-dhcp server and dnsmasq do the same thing. So why would isc-dhcp server issue IP addresses while dnsmasq does not respond?
To be honest, this is beyond me.
Initially, when I configured the FOG server, I installed isc-dhcp server, configured a pool of addresses and added “next-server” and “filename”. Then I set the bootprelay of the switch to point to the FOG server and everything worked (and still works if I go back to this scenario).
Afterwards I realized that our expensive switch supports DHCP and decided to give it a shot (why use a separate service for something that already runs on the switch, right?). Since then I’ve run into this issue, i.e. the switch supports only numeric option codes (66, 67, etc.). Analyzing the frames showed that my clients are unable to evaluate those options even though they do exist in the DHCP packets! Having “next-server” and “Boot file name” in the packet body (the fixed BOOTP fields rather than the options list) seemed like the only way it works. I managed to confirm this by analyzing frames coming from my working isc-dhcp server.
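For anyone who wants to check the same thing on their own capture, here is a minimal Python sketch of the difference described above. It is not taken from the posts; the function name `boot_info` is made up for illustration, and it assumes you already have the raw UDP payload of a DHCP offer/ack as bytes. It reads the fixed BOOTP fields (the "next server" address and "boot file name" that isc-dhcp's next-server and filename fill in) and, separately, options 66 and 67 from the options list. It ignores option overload (option 52) to stay short.

```python
import socket

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options area


def boot_info(payload: bytes) -> dict:
    """Report boot server/file from the fixed BOOTP fields and from options 66/67."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP packet")

    info = {
        # Fixed header fields: siaddr is bytes 20-23, file is bytes 108-235.
        "siaddr": socket.inet_ntoa(payload[20:24]),
        "file": payload[108:236].split(b"\x00", 1)[0].decode("ascii", "replace"),
        # Options list entries (stay None when the option is absent).
        "option66": None,
        "option67": None,
    }

    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:      # end option
            break
        if code == 0:        # pad option, single byte with no length
            i += 1
            continue
        if i + 1 >= len(payload):
            break            # truncated option, stop parsing
        length = payload[i + 1]
        data = payload[i + 2:i + 2 + length]
        if code == 66:
            info["option66"] = data.rstrip(b"\x00").decode("ascii", "replace")
        elif code == 67:
            info["option67"] = data.rstrip(b"\x00").decode("ascii", "replace")
        i += 2 + length
    return info
```

If the switch's reply shows `option66`/`option67` filled in but `siaddr` is `0.0.0.0` and `file` is empty, while the isc-dhcp reply has `siaddr` and `file` set, that would match what the frame analysis above suggests.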
@sebastian-roth @george1421 I may have posted this thread prematurely. I ended up having my colleague try a different type of Ethernet dongle (USB-C vs regular USB) and that did seem to fix the halting issue. I can verify tomorrow, but I think the message in the picture is still there; it just keeps imaging until complete now. I also found out a new BIOS came out for this model, so it is possible an update will fix the message. I can test the BIOS update on one at some point, but we are pretty busy with students returning in person next Tuesday.
I think we can mark as solved in the meantime.
@matthew73 This is a unique condition. I can understand what is going on because we use NAC and VLAN switching on my campus. I can say that I have not seen this issue (anywhere) on my campus.
I think I understand what needs to happen. Basically iPXE needs to say something and then wait XX seconds for your NAC system to identify the hardware and switch it to the right VLAN. The network link light winks twice during a normal PXE boot: first when PXE turns over control of the network adapter to iPXE, and again when iPXE turns over control of the network adapter to FOS Linux. We see a similar issue when the network switches are using standard spanning tree and not one of the fast protocols (RSTP, MSTP, port-fast).
The developers have created a specific group of iPXE boot loaders that have an embedded 10 second delay before iPXE tries to request an IP address. This gives STP and power-saving functions on the switch a chance to react before iPXE starts to talk. These files are in the 10secdelay folder. To use them, update the DHCP option from ipxe.efi to 10secdelay/ipxe.efi. This will call in the 10 second delay boot loader. See if that makes things better or not.