Hey Tom, take your time and get yourself settled!
I am wondering if this is also a good point to reach out to the community for people to join the FOG team and help work on the code, fix issues and improve things here and there.
@Tom-Elliott, @Developers, @Moderators
I got carried away trying to improve the current iPXE script. Now that I’ve dug into the syntax and found some interesting new features, I want to hear what you guys think about this:
#!ipxe
isset ${net0/mac} && dhcp net0 || goto dhcpnet1
echo Received DHCP answer on interface net0 && goto proxycheck
:dhcpnet1
isset ${net1/mac} && dhcp net1 || goto dhcperror
echo Received DHCP answer on interface net1 && goto proxycheck
:dhcperror
prompt --key s --timeout 10000 DHCP failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot
:proxycheck
isset ${proxydhcp/next-server} && isset ${next-server} && echo Duplicate option 66 (next server) from DHCP proxy and DHCP server && echo Using IP sent by DHCP proxy ${proxydhcp/next-server} && prompt --timeout 5000 || goto nextservercheck
:nextservercheck
isset ${proxydhcp/next-server} && set next-server ${proxydhcp/next-server} ||
isset ${next-server} && goto netboot || goto setserv
:setserv
echo -n Please enter tftp server: && read next-server && goto netboot || goto setserv
:netboot
chain tftp://${next-server}/default.ipxe ||
prompt --key s --timeout 10000 Chainloading failed, hit 's' for the iPXE shell; reboot in 10 seconds && shell || reboot
Feel free to comment and improve. I’ve tested the script and tried to remember all the issues I came across over the last few months, but I am sure we’re not there yet.
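To make the next-server decision in the :proxycheck/:nextservercheck part of the script easier to review, here is the same precedence written as a small shell function. This is only an illustration of the logic, not something iPXE runs; the function name pick_next_server is hypothetical. The proxyDHCP value wins, the regular DHCP value is the fallback, and if neither is set the script ends up at the interactive :setserv prompt.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: mirrors the precedence that the
# :proxycheck/:nextservercheck labels of the iPXE script apply.
#   $1 = next-server from the proxyDHCP answer (empty if none)
#   $2 = next-server from the regular DHCP answer (empty if none)
# Prints the chosen server; returns 1 when neither is set, which is the case
# where the script falls through to the interactive :setserv prompt.
pick_next_server() {
    proxy_ns=$1
    dhcp_ns=$2
    if [ -n "$proxy_ns" ]; then
        if [ -n "$dhcp_ns" ]; then
            # Same warning the script shows when both servers send a value
            echo "Duplicate option 66 (next server) from DHCP proxy and DHCP server" >&2
        fi
        echo "$proxy_ns"          # proxyDHCP value wins
    elif [ -n "$dhcp_ns" ]; then
        echo "$dhcp_ns"           # fall back to the DHCP server's value
    else
        return 1                  # nothing set: the user has to be asked
    fi
}
```

For example, pick_next_server 192.168.1.5 192.168.1.1 prints 192.168.1.5 and writes the duplicate warning to stderr, just like the script prefers ${proxydhcp/next-server} over ${next-server}.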
@TrialAndError said:
Obviously the development of FOG stopped after a long time of hard work.
It would be friendly by the developers to inform the users about that.
I am not sure what exactly you mean. This is an open source project and we have no time schedule for new releases and we don’t have a set list of features to add or bugs to resolve. We simply do what we can and like.
It’s quite bizarre to state that FOG development stopped… totally wrong!
I want to thank Tom again for this! I tried to assist and help, but I just don’t know the code (and its history) as well as Tom does. He’s done all the hard work to find and fix this issue. Along the way he fixed a couple of other things as well.
@Tom-Elliott Don’t say sorry. FOG is work in progress and you are pushing things way forward!! :metal:
We are working towards the most stable release of the 1.5.x line of FOG possible and will publish release candidates of FOG 1.5.9 for this over the next weeks. We ask people to participate and help test so we can make the final release as good as possible.
@Wirefall Great that you posted the full kernel message listing. At first I didn’t notice anything, but looking closer I found the problem:
igb: probe of 0000:01:00.0 failed with error -2
The PCI ID perfectly matches the one you mentioned in your first post. It didn’t take long to find several recent reports on this issue:
https://lkml.org/lkml/2016/11/24/172
https://bugzilla.suse.com/show_bug.cgi?id=1009911
https://patchwork.ozlabs.org/patch/700615/
Some say it might work if you disable PXE boot for this NIC in the BIOS. I don’t think this is a great solution, as FOG heavily relies on PXE booting the clients. Let’s hope that this will be fixed in the kernel fairly soon!
@sudburr Why not re-install the whole system from scratch?
@Developers @Moderators Ok, I just opened a pull request to remove the 7156 binaries. Please keep that in mind. It might cause some minor confusion over the next few weeks, but it’s good to get rid of them.
As you can all see, the tests of the most recent iPXE version show that things are back to normal again. I reported back to Michael Brown on the iPXE devel mailing list, so he’s got that in mind as well.
@Avaryan To me this sounds like a different issue. Would you mind opening a new thread on this and posting more details (FOG version, USB ID of the NIC as shown by lsusb, …)?
I’ll mark this solved/closed now. @Psycholiquid Thanks heaps for the great work on testing all the binaries and such!
@Poelie Not sure if your specific NAS will work, but here is some information you can start working with: https://wiki.fogproject.org/wiki/index.php?title=NAS_Storage_Node
@seppim said in Capture Ubuntu 20.04 just free diskspace:
The Raid I need when a disk get failure (not so unusual)
If the disk dies, you hopefully have a backup of the data as well as the FOG image, so you can deploy to a fresh disk in a few minutes.
Not saying that RAID is useless. Not at all. There are setups where RAID is great. But with FOG it’s not that simple if you want to use such setups.
The setup as you have it now seems to be spread across three MD/RAID containers. I have no idea how to handle that in FOG. Simply setting the Host Primary Disk will probably not work.
I have not played with RAID setups yet, so this is just off the top of my head. Maybe there are other users with more insight into that topic. I just want to give you some hints to make this easier for you.
@kek I have a feeling that it might be the CA certificate not being valid anymore. On install, the fog-client software grabs that CA cert from your FOG server and installs it into Mono’s certificate store.
Run certmgr -list -c -v -m Trust as root to see if a CA cert named "FOG Server CA" is there and still valid.
@seppim said in Capture Ubuntu 20.04 just free diskspace:
Host Primary Disk: /dev/md126
This might be specific to Intel (software) RAID, but I am not sure. Why do you need to use a complex setup with LVM or RAID?
@georgebells said in Persistent Groups - Snapins added to host but not deployed:
not only have it assign snap-ins etc to the host but also create a task to deploy them…
May I ask again: are you creating the task by answering yes to deploy right at the end of registration?
Just to see if it makes a difference, can you answer no to deploy at the end of registration and instead schedule a deploy task through the web UI? Does it deploy the snapins then?
I think Tom is on the right track in saying it could be a bug in the code.
@danieln said in Clients imaging despite recieving "Read ERROR: No such file or directory" and "ata1.00: failed command" errors:
I’d like to circle back around to this to get some more clarifcation. My master node has version 1.5.9-RC2 and this particular node (as well as all the other nodes) have version 1.5.9. I’m unsure as to what the differences between RC2 and the 1.5.9 version are, but do you think this is something worth investigating?
Sure, taking a quick look doesn’t hurt. 1.5.9-RC2 was a release candidate published not long before 1.5.9 was released. It possibly uses a different kernel, depending on when it was installed.
You should be able to get the kernel version by running the following command on all your nodes: file /var/www/html/fog/service/ipxe/bzImage* (the * includes both the 64-bit and the 32-bit kernel; the output should show kernel version 4.19.x or possibly 5.6.x).
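If you have several nodes to compare, it can help to cut the file output down to just the version number. The sketch below assumes the usual "Linux kernel x86 boot executable bzImage, version X.Y.Z (...)" format that file(1) prints for kernel images; the function name kernel_version is hypothetical, and it prints nothing if the format differs.

```shell
#!/bin/sh
# Hypothetical helper: read file(1) output on stdin and print only the
# kernel version. Assumes the ", version X.Y.Z (...)" format file(1) uses
# for Linux bzImage kernels; prints nothing if no version is found.
kernel_version() {
    sed -n 's/.*, version \([^ ]*\).*/\1/p'
}

# Run on each node: report the version of every bzImage* kernel found.
for k in /var/www/html/fog/service/ipxe/bzImage*; do
    [ -e "$k" ] || continue   # glob matched nothing on this machine
    printf '%s: %s\n' "$k" "$(file -b "$k" | kernel_version)"
done
```

If the versions differ between the master and the other nodes, that would be a good first lead.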
@jkozee @Tom-Elliott Sorry for bringing up such an old topic again. While working on moving to the new 5.10.x kernel, I was looking at the patches we still apply to our kernel. Most are now part of the upstream kernel, but not the fix discussed in this topic.
However, the kernel code has changed a bit, and I am wondering whether we would still see the slowness without our fix. Would any of you be able to replicate the issue with a 5.10.x kernel (with and without the fix)?
Searching the web a little more I stumbled upon this patch that made it into the official kernel not long ago: https://patchwork.kernel.org/project/linux-input/patch/20200910143455.109293-12-boqun.feng@gmail.com/
I am not sure, but it could play a role in this case. Either way, it would be great to see whether the issue can still be replicated with the newer kernel without the fix.
@seppim Your Ubuntu install was set up with LVM, but FOG does not yet support capturing the filesystem within an LVM volume. So FOG will capture the disk as RAW, which takes very long and results in huge image files on the FOG server.
Do you need the LVM setup? Or did you just blindly skip through the partition setup when installing Ubuntu?
The other question is why /boot would have to be 398 GB. That doesn’t make sense at all.
@tom-elliott said in FOG 1.5.9.57 on Debian 10 mysql root password is blank:
We are a fully clean install. Meaning no packages except git are installed. No Apache, php, or MySQL. Why should the installer ask for root password here? It has never been setup before this point. There would be no password. FOG, in my opinion, should not be defining the root user password here either.
I think it should force the user to have a DB root password unless it’s a setup with local socket access as described below. That was one of the major points of re-writing that part of the installer. I tested a lot and would hope that the installer does what I say on all officially supported systems.
@george1421 Good you are bringing this up. I really hope I got that stuff all right but you never can be sure with just two eyes looking at it.
As far as I can tell off the top of my head, Debian and Ubuntu changed to a DB root user that should only be able to connect to the DB through a local socket but not via the network. The idea is that if you have Linux root access to the machine, then connecting to the DB through the socket is allowed without (or with an empty) DB root password.
That’s why on current Debian and Ubuntu you are not asked to give/set the DB root password.
The mysql_secure_installation command mentioned by Tom is just a shell script running some SQL commands. I looked through that script and put all of those commands right into the FOG installer, because it was a pain to run that script without user interaction.
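To check which case a given server is in, you can look at the authentication plugin of the DB root account while logged in as Linux root. A minimal sketch, assuming the server exposes a plugin column in mysql.user (true for MySQL 5.7/8.0 and MariaDB 10.3 as shipped with Debian 10; newer MariaDB versions may report this differently). The helper names root_auth_plugin and uses_socket_auth are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the auth plugin of root@localhost.
# Run as Linux root so the mysql client connects over the local socket.
# -N skips the column header, -B gives plain batch output.
root_auth_plugin() {
    mysql -u root -N -B -e \
        "SELECT plugin FROM mysql.user WHERE user = 'root' AND host = 'localhost';"
}

# Hypothetical helper: succeed if the plugin means "socket only, no DB
# root password needed":
#   unix_socket = MariaDB (e.g. Debian), auth_socket = MySQL (e.g. Ubuntu)
uses_socket_auth() {
    case "$1" in
        unix_socket|auth_socket) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: uses_socket_auth "$(root_auth_plugin)" && echo "DB root uses socket authentication". A plugin such as mysql_native_password instead means the account is password based.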