@fenix_team said in Error "rcu_sched self detected stall on CPU" on legacy BIOS Capture job:
My original FOS image is the latest available on the Kernel Update GUI page, which currently is 4.19.6 (both bzImage and bzImage32).
I just finished one of the two machines with systems as described in the OP. The capture job succeeded without so much as a single warning! The system with the American Megatrends BIOS also made it smoothly past the point at which the issue was happening.
Hey, thanks for reporting so many details about this! I have started looking into this and reading all the messages posted. That one really caught my attention. Are you saying that it does work “sometimes” without an issue? Is that on the same kernel version 4.19.6 that caused the error initially posted? That would make it even harder for us to nail this issue down.
And a quick comment on the kernel/init versions: there is no strict rule that kernels are compiled against exactly one init version or vice versa. But looking into this more closely, I just figured something out that I wasn’t aware of until now: there is an option within buildroot (the toolstack we use for the inits) that is used to optimize the glibc compilation. The more recent the kernel version you choose, the less compatibility code needs to be built into glibc, and therefore the smaller the binaries are. Sounds pretty straightforward, and if I had known this before (there are hundreds of buildroot options and I really don’t know exactly what they all do), I would have built with more compatibility!
I will compile a new set of inits with more compatibility now and see if it makes much of a difference in size. I guess it won’t, as the inits are huge (just under 20 MB) anyway. We’ll see. I will let you all know.
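For anyone who wants to follow along: as far as I understand it, buildroot passes the chosen kernel headers series down to glibc’s configure step (glibc’s --enable-kernel option), and that is what decides how much runtime compatibility code ends up in the binaries. Just as a sketch, assuming the option names from a recent buildroot release (the exact symbols and the version string here are illustrative, not our actual config):

    # buildroot defconfig fragment (sketch; option names assumed from
    # a recent buildroot release - adjust to the tree you build from)
    BR2_KERNEL_HEADERS_VERSION=y          # manually specified headers series
    BR2_DEFAULT_KERNEL_VERSION="4.19.6"   # kernel the toolchain targets
    # Picking an older series here makes glibc keep more runtime
    # compatibility code: bigger binaries, but they run on older kernels.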
Ok, back to the initially posted issue: trying to figure out what might be causing this on your hardware, I started by reading the kernel docs on this topic. Essentially they say that this can be caused by many different things (see the detailed list in the linked document), and we might need to turn on CONFIG_RCU_TRACE in the kernel to get an idea of where things go wrong. But as a start we need a clear picture of the exact error messages on screen.
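If we do need to go down that road, it’s only a small kernel config change plus a rebuild of bzImage/bzImage32. A minimal sketch of the .config fragment (both symbols exist in mainline Kconfig; 21 seconds is just the mainline default stall timeout, shown here for completeness):

    # kernel .config fragment (sketch): turn on RCU diagnostics
    CONFIG_RCU_TRACE=y
    # stall detector timeout in seconds (21 is the mainline default);
    # lowering it makes the warning fire sooner while reproducing
    CONFIG_RCU_CPU_STALL_TIMEOUT=21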
@fenix_team said:
I also noticed another change: all of these machines sometimes got stuck in iPXE boot while loading “/default.ipxe” at 0%, forcing me to reboot lots of times until it randomly booted correctly. After changing the kernel and init versions, that problem vanished (I don’t know if the two things are related, though).
From my point of view those two things can’t be related, as the Linux kernel is not even running yet when default.ipxe is being loaded! It’s interesting that changing the Linux kernel and inits seems to have fixed it for you, though; I suspect that is just a coincidence. Usually when things don’t load properly at that stage it’s a network driver problem within the iPXE code. That is another thing that is very hard to debug, as it is hardware specific and needs to be reproduced to find and fix. But for iPXE there might be a different solution for you. We provide a set of different binaries, which you will all find in /tftpboot on your FOG server. The default for legacy BIOS machines is undionly.kkpxe. You can try undionly.*pxe (UNDI network stack only), ipxe.*pxe (native driver stack, all drivers included), intel.*pxe (native drivers, but Intel NICs only) and realtek.*pxe (native drivers, but Realtek NICs only). See the DHCP sketch below for how to point your machines at one of these.
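Which binary a machine boots is decided by your DHCP server via the boot filename (DHCP option 067). Just as a sketch for ISC dhcpd, assuming the FOG server is also your TFTP server and using a placeholder address (swap in your own values):

    # dhcpd.conf fragment (sketch; 10.0.0.10 is a placeholder address)
    next-server 10.0.0.10;    # TFTP server = the FOG server
    filename "ipxe.kpxe";     # try instead of the default undionly.kkpxe

After changing it, renew the client’s DHCP lease or simply reboot the machine and watch whether it gets past loading default.ipxe.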