Posts made by 3mu

3mu

Thanks, @george1421. I haven’t worked out the DHCP problem yet, though - I just need to walk away from it for a while. I’m stumped.

3mu

Sorry for the delay, @george1421 - I’ve been battling with a DHCP issue (handing out the same IP to every machine) and it’s beaten me! I’ve just run through your test on one machine and, although it is reporting 7 urandom warnings like the 5.6 kernel, it is successfully imaging now.

3mu

@Sebastian-Roth - It is the FOG installer option that I am referring to. The only customisation of the Ubuntu 18.04 server that I am installing on is setting a static IP address on the FOG interface during installation. I’ve had the same result on 1.5.8 (installed by downloading the tar) and 1.5.9-RC2 (using Git).

3mu

@george1421 - 5.6 fails (I think the messages above were from 5.6). I think 4.19.123 was the same. 4.18.3 works, but does have “3 urandom warning(s) missed due to ratelimiting.” 4.17 gives no urandom warnings.

Would you like me to test each and record the results?

3mu

I really don’t know why this breaks it, sorry @Sebastian-Roth - I had run out of ideas and started changing one setting at a time. It is easy to replicate, though:

Fresh Ubuntu server 18.04 install
Patch it
Install FOG with no router, DNS server = the FOG server IP (with no DNS server running)

The log events start immediately on completion of the FOG installation, and don’t stop when I disable SNAPINREPLICATORGLOBALENABLED.

To avoid the issue, I reinstalled from scratch with no router or DNS server, then installed BIND9 and edited the DHCP config to add it. I haven’t tried installing BIND before the FOG installation.
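
For anyone checking the same thing, here is a minimal sketch that lists the DNS servers a dhcpd.conf hands out to clients. It assumes the ISC DHCP server and its "option domain-name-servers" syntax; the function name dhcp_dns_servers is my own, not part of FOG.

```shell
# Print the DNS servers that a dhcpd.conf (read from stdin) offers to clients.
# Commented-out lines are ignored because the match must start at "option".
dhcp_dns_servers() {
    sed -n 's/^[[:space:]]*option domain-name-servers[[:space:]]*\(.*\);.*/\1/p'
}

# Usage (path assumed to be the Ubuntu default for isc-dhcp-server):
#   dhcp_dns_servers < /etc/dhcp/dhcpd.conf
```

If the address printed is a host with no DNS service running, clients will be pointed at a resolver that never answers.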

3mu

I had two machines that booted successfully once on a different switch, but then never again. I did a packet capture and saw that the client wasn’t requesting an IP address when booting FogOS. I changed KERNELLOGLEVEL to 7 to get some more clues:

Starting haveged: haveged: listening socket at 3
OK
random: crng init done
random: 7 urandom warning(s) missed due to ratelimiting
starting enp2s0 interface and waiting for the link to come up
Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
r8169 0000:02:00.0 enp2s0: no native access to PCI extended config space, falling back to CSI
No link detected on enp2s0 for 35 seconds, skipping it.
Failed to get an IP via DHCP! Tried on interface(s): enp2s0
Please check your network setup and try again

Reading up on urandom and haveged led me (eventually) to the Kernel Update function, and changing to kernel 4.18.3 now gives me a reliable boot.
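
A minimal sketch for checking whether the kernel sees a link at all, assuming a shell on the client (a debug task, for example) and the standard Linux sysfs layout. The function name link_state and its optional second argument (an alternative sysfs directory, there only so the function can be tested) are my own.

```shell
# Report the carrier (link) state of a network interface via sysfs.
# Usage: link_state IFACE [SYSFS_NET_DIR]
# SYSFS_NET_DIR defaults to /sys/class/net; the override is only for testing.
link_state() {
    iface=$1
    base=${2:-/sys/class/net}
    if [ ! -d "$base/$iface" ]; then
        echo "missing"
        return 1
    fi
    # Reading 'carrier' fails while the interface is administratively down.
    carrier=$(cat "$base/$iface/carrier" 2>/dev/null) || carrier=""
    case $carrier in
        1) echo "up" ;;
        0) echo "down" ;;
        *) echo "unknown" ;;
    esac
}

# Usage:
#   link_state enp2s0
```

A result of "down" with the cable connected points at the driver or PHY rather than at DHCP, which matches the "No link detected" message above.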

3mu

I had specified the DNS server to be the FOG server, as I had intended to install DNS on the FOG server as the next step. It appears that this was causing the issue. Installing with no DNS server does not produce the error messages. Given the amount of time that I’ve spent trying to work this out, I thought that it was still worth posting.

3mu

I’ve tried a number of fresh installations of 1.5.8 on a dual-homed server (with the second NIC connected to an isolated network to be used for imaging). On completion of the installation, syslog immediately starts filling with the following block of messages - about one per second:

Aug  7 09:05:00 node2 systemd[1]: Started FOGSnapinReplicator.
Aug  7 09:05:00 node2 php[9501]: Could not open input file: env
Aug  7 09:05:00 node2 systemd[1]: FOGSnapinReplicator.service: Main process exited, code=exited, status=1/FAILURE
Aug  7 09:05:00 node2 systemd[1]: FOGSnapinReplicator.service: Failed with result 'exit-code'.
Aug  7 09:05:01 node2 systemd[1]: FOGSnapinReplicator.service: Service hold-off time over, scheduling restart.
Aug  7 09:05:01 node2 systemd[1]: FOGSnapinReplicator.service: Scheduled restart job, restart counter is at 1.
Aug  7 09:05:01 node2 systemd[1]: Stopped FOGSnapinReplicator.
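
To put a number on the restart loop, a minimal sketch that counts the "Main process exited" lines for one unit. It reads syslog-style text from stdin; the function name count_unit_failures is my own, and /var/log/syslog is assumed to be where these messages land on this Ubuntu install.

```shell
# Count how often a systemd unit's main process exited, reading
# syslog-style lines from stdin.
# Usage: count_unit_failures UNIT < LOGFILE
count_unit_failures() {
    unit=$1
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status to keep callers simple.
    grep -F -c "$unit: Main process exited" || true
}

# Usage:
#   count_unit_failures FOGSnapinReplicator.service < /var/log/syslog
```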

I’m installing on a fresh, fully patched installation of Ubuntu Server 18.04, using the following process:

sudo -i
wget https://github.com/FOGProject/fogproject/archive/1.5.8.tar.gz
tar -xzvf 1.5.8.tar.gz
cd fogproject-1.5.8/bin
./installfog.sh

.fogsettings:

## Start of FOG Settings
## Created by the FOG Installer
## Find more information about this file in the FOG Project wiki:
##     https://wiki.fogproject.org/wiki/index.php?title=.fogsettings
## Version: 1.5.8
## Install time: Fri 07 Aug 2020 09:05:30 AM UTC
ipaddress='192.168.170.1'
copybackold='0'
interface='enp2s0'
submask='255.255.255.0'
hostname='node2'
routeraddress=''
plainrouter=''
dnsaddress='192.168.170.1'
username='fogproject'
password='****'
osid='2'
osname='Debian'
dodhcp='y'
bldhcp='1'
dhcpd='isc-dhcp-server'
blexports='1'
installtype='N'
snmysqluser='fogmaster'
snmysqlpass='****'
snmysqlhost='localhost'
mysqldbname='fog'
installlang='0'
storageLocation='/images'
fogupdateloaded=1
docroot='/var/www/'
webroot='/fog/'
caCreated='yes'
httpproto='http'
startrange='192.168.170.10'
endrange='192.168.170.254'
bootfilename='undionly.kpxe'
packages='apache2 bc build-essential cpp curl g++ gawk gcc genisoimage git gzip htmldoc isc-dhcp-server isolinux lftp libapache2-mod-php7.2 libc6 libcurl4 liblzma-dev m4 mariadb-client mariadb-server net-tools nfs-kernel-server openssh-server php7.2 php7.2-bcmath php7.2-cli php7.2-curl php7.2-fpm php7.2-gd php7.2-json php7.2-ldap php7.2-mbstring php7.2-mysql php-gettext tar tftpd-hpa tftp-hpa unzip vsftpd wget xinetd zlib1g '
noTftpBuild=''
sslpath='/opt/fog/snapins/ssl/'
backupPath='/home/'
armsupport='0'
php_ver='7.2'
php_verAdds='-7.2'
sslprivkey='/opt/fog/snapins/ssl//.srvprivate.key'
## End of FOG Settings

enp2s0 is the built-in NIC and is on the isolated network with a static IP (192.168.170.1/24). This build has a second USB NIC connected to the main network (10.0.0.0/24), which has Internet access.
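
Since .fogsettings is plain key='value' lines, individual settings such as interface or dnsaddress can be pulled out and compared with the live configuration. A minimal sketch follows; the function name fogsetting is my own, and it only handles the single-quoted entries (not unquoted ones like fogupdateloaded=1).

```shell
# Print the value of one key from a .fogsettings-style file (key='value').
# Only single-quoted values are matched; unquoted entries are skipped.
# Usage: fogsetting KEY FILE
fogsetting() {
    key=$1
    file=$2
    sed -n "s/^${key}='\(.*\)'\$/\1/p" "$file"
}

# Usage:
#   fogsetting dnsaddress /opt/fog/.fogsettings
#   fogsetting interface /opt/fog/.fogsettings
```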

3mu

I have several Lenovo M73 MT-M 10B7-S01900 desktops. I am using one as a FOG server and trying to manage the others as clients. The clients successfully PXE boot to the FOG menu but when attempting to register the clients I get the following error:

starting enp2s0 interface and waiting for the link to come up

failed to get an IP via DHCP! tried on interface(s): enp2s0
please check your network setup and try again

FOG Version: 1.5.9-RC2
.fogsettings.txt
Note: the FOG server is dual-homed. Interface enp2s0 is on the isolated imaging network (192.168.168.0/24) and the other interface is on the main LAN (10.0.0.0/24). The imaging network is using FOG DHCP.

Other machines, including a Lenovo ThinkPad T500 and a Dell OptiPlex 9020, register successfully.
The M73s also successfully register on another FOG server on a separate network (single-homed with separate DHCP/DNS) that has been upgraded from 1.5.8 to 1.5.9-RC2.

Does anyone have any tips on where to start troubleshooting this?

3mu

@Sebastian-Roth - Sorry for the delay - some other unrelated issues and then a pandemic to deal with! The fix works perfectly.

3mu

Thanks, @Sebastian-Roth. I’ll test as soon as I finish sorting out a few disasters - hopefully in the next day.

3mu

That would be great, thanks @Sebastian-Roth. I’m happy to test.

3mu

@Sebastian-Roth - I look after a network with over 600 servers at work. The ones that cause me problems are the ones with static IP addresses - generally because somebody makes a typo when they are configuring an interface. We also do a lot of migrations, mergers and other changes, so reserved addresses in DHCP make that much easier, as well as enforcing consistency and a basic level of self-documentation. DHCP also makes it easier to deploy in environments where the user does not have control of the network.

I’m sorry for taking up your time, @george1421, and thank you both.

3mu

@Sebastian-Roth said in Wiki errors - "Troubleshooting a multicast":

/sbin/ip route

default via 10.0.0.138 dev ens3 proto dhcp src 10.0.0.203 metric 100
10.0.0.0/24 dev ens3 proto kernel scope link src 10.0.0.203
10.0.0.138 dev ens3 proto dhcp scope link src 10.0.0.203 metric 100

(run under sudo)

Storage Node config:

3mu

@george1421 - Current uptime was 1 day 14 hours. I have just now:

Cleared all tasks, rebooted and created a new multicast task - the log file still says “--interface dev”.
Cleared the tasks, changed the multicast interface to “eth0” (which doesn’t exist) and created a new multicast task - same result.
Cleared the tasks, rebooted and created a new multicast task - same result.

The only things running on this server are FOG and dnsmasq. It is a VM under KVM and has 4GB RAM and a single processor. I have done a single in-place upgrade from 1.5.7 since my original install. I did have to recover the MySQL password (I was distracted and lost the new password before I could record it), but I recovered it without issue.

I have also updated the fogservice.class.php and FOGSnapinReplicator.service files as per the “Interface not ready, waiting for it to come up” post.

3mu

@george1421 -

There don’t appear to be any spaces before or after “ens3” - I deleted the contents and typed it in again.

3mu

@george1421 - I can’t remember if it picked up the correct interface on install or if I changed it later when preparing to try multicasting, but it has been set to ens3 for at least a few weeks.

ps only gives:

[username]     18913  0.0  0.0  13136  1092 pts/0    S+   01:39   0:00 grep --color=auto udp-send

The multicast task appears as “queued” for each host in the Active Tasks pane, and the multicast log continues as if all is well:

[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 image file found, file: /images/T500Linux19-10desktopclean
[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 2 clients found
[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 sending on base port 61792
[04-07-20 11:59:25 am] | Command: /usr/local/sbin/udp-sender --interface dev --min-receivers 2 --max-wait 600 --portbase 61792 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/T500Linux19-10desktopclean/d1p1.img;
[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 has started

If I start a multicast from the command line with

sudo udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log  --ttl 1 --nopointopoint

I get

root     20233  0.0  0.1  66696  4268 pts/1    S+   01:48   0:00 sudo udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint
root     20234  0.0  0.0   8884   896 pts/1    S+   01:48   0:00 udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint
[username]     20351  0.0  0.0  13136  1000 pts/0    S+   01:49   0:00 grep --color=auto udp-send

3mu

@george1421 said in Wiki errors - "Troubleshooting a multicast":

/images/T500Linux19-10desktopclean

Thanks, @george1421. I’ve changed the image name, but the UDPCAST INTERFACE is already set to ens3 (my FOG server is running under KVM). I’ve tried changing it to a dummy name, saving and then changing it back, but it is still reporting “--interface dev”. That explains why the troubleshooting tests work - they use the correct interface.
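
A quick way to see which interface the generated command really uses is to pull the --interface value out of the multicast log. This is a minimal sketch; the function name udp_sender_interface is my own, and the log path in the usage line is the one from my command-line test.

```shell
# Print the value passed to --interface on each udp-sender command line
# read from stdin (for example the FOG multicast log).
udp_sender_interface() {
    sed -n 's/.*udp-sender.* --interface \([^ ]*\).*/\1/p'
}

# Usage:
#   udp_sender_interface < /opt/fog/log/multicast.log | sort | uniq -c
```

Anything other than a real interface name (ens3 in my case) means the setting is not reaching the command.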

I’ve also noticed that the FOG_UDPCAST_STARTINGPORT changes. For my first test this morning it was 56494. I rebooted and now it is 62822.

I’m expecting to image no more than 20 machines at once. It won’t be often so it’s not critical that I get it working, but it could help others if I do.

3mu

@george1421 - I only have a single unmanaged switch and a basic broadband router, so any multicast traffic will be treated as a broadcast (I’m planning to use FOG to maintain PCs for Scouts, Guides and the other groups that I’m involved with, so this is being done as cheaply as possible). All machines are on the same L2 network.

I have successfully completed the “Testing 1 client” and “Testing 2 client” tests, and started a multicast session a few minutes ago with two computers joining, but I don’t think that the image is actually being deployed. All that I get in the log is

[04-06-20 11:52:39 pm] | Task ID: 6 Name: Multi-Cast Task - Multicast t500 image file found, file: /images/T500Linux19.10desktopclean
[04-06-20 11:52:39 pm] | Task ID: 6 Name: Multi-Cast Task - Multicast t500 2 clients found
[04-06-20 11:52:39 pm] | Task ID: 6 Name: Multi-Cast Task - Multicast t500 sending on base port 60328
[04-06-20 11:52:39 pm] | Command: /usr/local/sbin/udp-sender --interface dev --min-receivers 2 --max-wait 600 --portbase 60328 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/T500Linux19.10desktopclean/d1p1.img;
[04-06-20 11:52:39 pm] | Task ID: 6 Name: Multi-Cast Task - Multicast t500 has started

every 10 seconds. I’m too tired to continue now, so will have another crack tomorrow and post in the technical section - this post was just to save some time for anyone else who needs to troubleshoot.

Thanks.

3mu

Another helpful addition - the “Something else to try” section has

gunzip -c "/images/anyimagename/file"

The -S switch is helpful here, as the default extension for image files isn’t the .gz that gunzip expects:

gunzip -S ".img" -c "/images/anyimagename/file"
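
Another option that avoids the extension question altogether is to feed the file to gzip on stdin, where no suffix check applies. A minimal sketch, assuming the image was captured with gzip compression; the function name is_gzip_image is my own.

```shell
# Succeed if the file is a valid gzip stream, whatever its extension.
# Reading from stdin sidesteps gzip's filename-suffix check.
# Usage: is_gzip_image FILE
is_gzip_image() {
    gzip -t < "$1" 2>/dev/null
}

# Usage:
#   is_gzip_image "/images/anyimagename/file" && echo "gzip stream OK"
```

This only tests the integrity of the compressed stream; it says nothing about the partition data inside it.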