Thanks, @george1421. I haven’t worked out the DHCP problem yet, though - I just need to walk away from it for a while. I’m stumped.
Posts made by 3mu
-
RE: Lenovo M73 network fails on FOG OS boot
-
RE: Lenovo M73 network fails on FOG OS boot
Sorry for the delay, @george1421 - I’ve been battling with a DHCP issue (handing out the same IP to every machine) and it’s beaten me! I’ve just run through your test on one machine and, although it is reporting 7 urandom warnings like the 5.6 kernel, it is successfully imaging now.
-
RE: FOGSnapinReplicator.service failing on fresh install
@Sebastian-Roth - It is the FOG installer option that I am referring to. The only customisation of the Ubuntu 18.04 server that I am installing on is setting a static IP address on the FOG interface during installation. I’ve had the same result on 1.5.8 (installed by downloading the tar) and 1.5.9-RC2 (using Git).
-
RE: Lenovo M73 network fails on FOG OS boot
@george1421 - 5.6 fails (I think the messages above were from 5.6). I think 4.19.123 was the same. 4.18.3 works, but does have “3 urandom warning(s) missed due to ratelimiting.” 4.17 gives no urandom warnings.
Would you like me to test each and record the results?
-
RE: FOGSnapinReplicator.service failing on fresh install
I really don’t know why this breaks it, sorry @Sebastian-Roth - I had run out of ideas and started changing one setting at a time. It is easy to replicate, though:
- Fresh Ubuntu server 18.04 install
- Patch it
- Install FOG with no router, DNS server = the FOG server IP (with no DNS server running)
The log events start immediately on completion of the FOG installation, and don’t stop when I disable SNAPINREPLICATORGLOBALENABLED.
To avoid the issue, I reinstalled from scratch with no router or DNS server, then installed BIND9 and edited the DHCP config to add it. I haven’t tried installing BIND before the FOG installation.
-
RE: Lenovo M73 network fails on FOG OS boot
I had two machines that booted successfully once on a different switch, but then never again. I did a packet capture and saw that the client wasn’t requesting an IP address when booting FogOS. I changed KERNELLOGLEVEL to 7 to get some more clues:
Starting haveged: haveged: listening socket at 3
OK
random: crng init done
random: 7 urandom warning(s) missed due to ratelimiting
starting enp2s0 interface and waiting for the link to come up
Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
r8169 0000:02:00.0 enp2s0: no native access to PCI extended config space, falling back to CSI
No link detected on enp2s0 for 35 seconds, skipping it.
Failed to get an IP via DHCP! Tried on interface(s): enp2s0
Please check your network setup and try again
Reading up on urandom and haveged led me (eventually) to the Kernel Update function, and changing to kernel 4.18.3 now gives me a reliable boot.
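The two clues in that console output (the urandom warning count and the missing link) can be pulled out of a captured log with a short script. This is only a sketch: the function name `summarise_boot_log` is made up for illustration, and the patterns are based solely on the lines quoted in this post, so other kernels may word them differently.

```python
import re

# Patterns copied from the console lines quoted in this post; other kernel
# versions may phrase these messages differently (assumption).
URANDOM_RE = re.compile(r"(\d+) urandom warning\(s\) missed due to ratelimiting")
NO_LINK_RE = re.compile(r"No link detected on (\S+) for (\d+) seconds")


def summarise_boot_log(text):
    """Return the urandom warning count, interfaces without link, and DHCP status."""
    urandom = URANDOM_RE.search(text)
    # Each entry is (interface name, seconds waited before giving up).
    no_link = [(m.group(1), int(m.group(2))) for m in NO_LINK_RE.finditer(text)]
    return {
        "urandom_warnings": int(urandom.group(1)) if urandom else 0,
        "no_link": no_link,
        "dhcp_failed": "Failed to get an IP via DHCP" in text,
    }
```

Running this over the captures from each kernel version makes it easier to compare them side by side rather than by eye.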
-
RE: FOGSnapinReplicator.service failing on fresh install
I had specified the DNS server to be the FOG server, as I had intended to install DNS on the FOG server as the next step. It appears that this was causing the issue. Installing with no DNS server does not produce the error messages. Given the amount of time that I’ve spent trying to work this out, I thought that it was still worth posting.
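A minimal sketch of checking an existing .fogsettings file for this configuration, assuming the file uses one key='value' pair per line as the installer writes it. The function names `parse_fogsettings` and `dns_points_at_self` are hypothetical and not part of FOG, and the check only flags the setup described here; it does not prove that this is the cause on any other system.

```python
def parse_fogsettings(text):
    """Parse key='value' lines from .fogsettings into a dict, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values are usually single-quoted, but some (fogupdateloaded=1) are bare.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def dns_points_at_self(settings):
    """True when dnsaddress is set and equals the server's own ipaddress."""
    dns = settings.get("dnsaddress", "")
    return bool(dns) and dns == settings.get("ipaddress", "")
```

For example, `dns_points_at_self(parse_fogsettings(open("/opt/fog/.fogsettings").read()))` would return True for the settings from the failing install.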
-
FOGSnapinReplicator.service failing on fresh install
I’ve tried a number of fresh installations of 1.5.8 on a dual-homed server (with the second NIC connected to an isolated network to be used for imaging). On completion of the installation, syslog immediately starts filling with the following block of messages - about one per second:
Aug 7 09:05:00 node2 systemd[1]: Started FOGSnapinReplicator.
Aug 7 09:05:00 node2 php[9501]: Could not open input file: env
Aug 7 09:05:00 node2 systemd[1]: FOGSnapinReplicator.service: Main process exited, code=exited, status=1/FAILURE
Aug 7 09:05:00 node2 systemd[1]: FOGSnapinReplicator.service: Failed with result 'exit-code'.
Aug 7 09:05:01 node2 systemd[1]: FOGSnapinReplicator.service: Service hold-off time over, scheduling restart.
Aug 7 09:05:01 node2 systemd[1]: FOGSnapinReplicator.service: Scheduled restart job, restart counter is at 1.
Aug 7 09:05:01 node2 systemd[1]: Stopped FOGSnapinReplicator.
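To see which units are stuck in a loop like this and how often they fail, the syslog can be tallied with a few lines of Python. This is a rough sketch that assumes the systemd message format shown in the block above; `count_service_failures` is a made-up helper name.

```python
import re
from collections import Counter

# Matches the systemd failure line quoted above, for example:
# "systemd[1]: FOGSnapinReplicator.service: Main process exited, code=exited, status=1/FAILURE"
FAILURE_RE = re.compile(
    r"systemd\[\d+\]: (\S+\.service): Main process exited, code=exited, status=\d+/\w+"
)


def count_service_failures(syslog_text):
    """Count 'Main process exited' events per systemd unit name."""
    return Counter(m.group(1) for m in FAILURE_RE.finditer(syslog_text))
```

Calling `count_service_failures(open("/var/log/syslog").read()).most_common()` lists the noisiest units first.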
I’m installing on a fresh installation of Ubuntu Server 18.04, fully patched using the following process:
sudo -i
wget https://github.com/FOGProject/fogproject/archive/1.5.8.tar.gz
tar -xzvf 1.5.8.tar.gz
cd fogproject-1.5.8/bin
./installfog.sh
.fogsettings:
## Start of FOG Settings
## Created by the FOG Installer
## Find more information about this file in the FOG Project wiki:
## https://wiki.fogproject.org/wiki/index.php?title=.fogsettings
## Version: 1.5.8
## Install time: Fri 07 Aug 2020 09:05:30 AM UTC
ipaddress='192.168.170.1'
copybackold='0'
interface='enp2s0'
submask='255.255.255.0'
hostname='node2'
routeraddress=''
plainrouter=''
dnsaddress='192.168.170.1'
username='fogproject'
password='****'
osid='2'
osname='Debian'
dodhcp='y'
bldhcp='1'
dhcpd='isc-dhcp-server'
blexports='1'
installtype='N'
snmysqluser='fogmaster'
snmysqlpass='****'
snmysqlhost='localhost'
mysqldbname='fog'
installlang='0'
storageLocation='/images'
fogupdateloaded=1
docroot='/var/www/'
webroot='/fog/'
caCreated='yes'
httpproto='http'
startrange='192.168.170.10'
endrange='192.168.170.254'
bootfilename='undionly.kpxe'
packages='apache2 bc build-essential cpp curl g++ gawk gcc genisoimage git gzip htmldoc isc-dhcp-server isolinux lftp libapache2-mod-php7.2 libc6 libcurl4 liblzma-dev m4 mariadb-client mariadb-server net-tools nfs-kernel-server openssh-server php7.2 php7.2-bcmath php7.2-cli php7.2-curl php7.2-fpm php7.2-gd php7.2-json php7.2-ldap php7.2-mbstring php7.2-mysql php-gettext tar tftpd-hpa tftp-hpa unzip vsftpd wget xinetd zlib1g '
noTftpBuild=''
sslpath='/opt/fog/snapins/ssl/'
backupPath='/home/'
armsupport='0'
php_ver='7.2'
php_verAdds='-7.2'
sslprivkey='/opt/fog/snapins/ssl//.srvprivate.key'
## End of FOG Settings
enp2s0 is the built-in NIC and is on the isolated network with a static IP (192.168.170.1/24). This build has a second USB NIC connected to the main network (10.0.0.0/24), which has Internet access.
-
Lenovo M73 network fails on FOG OS boot
I have several Lenovo M73 MT-M 10B7-S01900 desktops. I am using one as a FOG server and trying to manage the others as clients. The clients successfully PXE boot to the FOG menu but when attempting to register the clients I get the following error:
starting enp2s0 interface and waiting for the link to come up
failed to get an IP via DHCP! tried on interface(s): enp2s0
please check your network setup and try again
FOG Version: 1.5.9-RC2
.fogsettings.txt
Note: the FOG server is dual-homed. Interface enp2s0 is on the isolated imaging network (192.168.168.0/24) and the other interface is on the main LAN (10.0.0.0/24). The imaging network is using FOG DHCP. Other machines, including a Lenovo ThinkPad T500 and a Dell OptiPlex 9020, register successfully.
The M73s also successfully register on another FOG server on a separate network (single-homed with separate DHCP/DNS) that has been upgraded from 1.5.8 to 1.5.9-RC2.
Does anyone have any tips on where to start troubleshooting this?
-
RE: Wiki errors - "Troubleshooting a multicast"
@Sebastian-Roth - Sorry for the delay - some other unrelated issues and then a pandemic to deal with! The fix works perfectly.
-
RE: Wiki errors - "Troubleshooting a multicast"
Thanks, @Sebastian-Roth. I’ll test as soon as I finish sorting out a few disasters - hopefully in the next day.
-
RE: Wiki errors - "Troubleshooting a multicast"
That would be great, thanks @Sebastian-Roth. I’m happy to test.
-
RE: Wiki errors - "Troubleshooting a multicast"
@Sebastian-Roth - I look after a network with over 600 servers at work. The ones that cause me problems are the ones with static IP addresses - generally because somebody makes a typo when they are configuring an interface. We also do a lot of migrations, mergers and other changes, so reserved addresses in DHCP make that much easier, as well as enforcing consistency and a basic level of self-documentation. DHCP also makes it easier to deploy in environments where the user does not have control of the network.
I’m sorry for taking up your time, @george1421, and thank you both.
-
RE: Wiki errors - "Troubleshooting a multicast"
@Sebastian-Roth said in Wiki errors - "Troubleshooting a multicast":
/sbin/ip route
default via 10.0.0.138 dev ens3 proto dhcp src 10.0.0.203 metric 100
10.0.0.0/24 dev ens3 proto kernel scope link src 10.0.0.203
10.0.0.138 dev ens3 proto dhcp scope link src 10.0.0.203 metric 100
(run under sudo)
Storage Node config:
-
RE: Wiki errors - "Troubleshooting a multicast"
@george1421 - Current uptime was 1 day 14 hours. I have just now:
- Cleared all tasks, rebooted and created a new multicast task - the log file still says “--interface dev”.
- Cleared the tasks, changed the multicast interface to “eth0” (which doesn’t exist) and created a new multicast task - same result.
- Cleared the tasks, rebooted and created a new multicast task - same result.
The only things running on this server are FOG and dnsmasq. It is a VM under KVM and has 4GB RAM and a single processor. I have done a single in-place upgrade from 1.5.7 since my original install. I did have to recover the MySQL password (I was distracted and lost the new password before I could record it), but I recovered it without issue.
I have also updated the fogservice.class.php and FOGSnapinReplicator.service files as per the Interface not ready, waiting for it to come up post.
-
RE: Wiki errors - "Troubleshooting a multicast"
There don’t appear to be any spaces before or after “ens3” - I deleted the contents and typed it in again.
-
RE: Wiki errors - "Troubleshooting a multicast"
@george1421 - I can’t remember if it picked up the correct interface on install or if I changed it later when preparing to try multicasting, but it has been set to ens3 for at least a few weeks.
ps only gives:
[username] 18913 0.0 0.0 13136 1092 pts/0 S+ 01:39 0:00 grep --color=auto udp-send
The multicast task appears as “queued” for each host in the Active Tasks pane, and the multicast log continues as if all is well:
[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 image file found, file: /images/T500Linux19-10desktopclean
[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 2 clients found
[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 sending on base port 61792
[04-07-20 11:59:25 am] | Command: /usr/local/sbin/udp-sender --interface dev --min-receivers 2 --max-wait 600 --portbase 61792 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/T500Linux19-10desktopclean/d1p1.img;
[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 has started
If I start a multicast from the command line with
sudo udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint
I get
root 20233 0.0 0.1 66696 4268 pts/1 S+ 01:48 0:00 sudo udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint
root 20234 0.0 0.0 8884 896 pts/1 S+ 01:48 0:00 udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint
[username] 20351 0.0 0.0 13136 1000 pts/0 S+ 01:49 0:00 grep --color=auto udp-send
-
RE: Wiki errors - "Troubleshooting a multicast"
@george1421 said in Wiki errors - "Troubleshooting a multicast":
/images/T500Linux19-10desktopclean
Thanks, @george1421. I’ve changed the image name, but the UDPCAST INTERFACE is already set to ens3 (my FOG server is running under KVM). I’ve tried changing it to a dummy name, saving and then changing it back, but it is still reporting “--interface dev”. That explains why the troubleshooting tests work - they use the correct interface.
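For anyone wanting to spot this mismatch without reading the log by eye, here is a minimal sketch that pulls the `--interface` value out of a logged `| Command:` line and compares it with the interfaces present on the server. It assumes the multicast log format quoted in this thread; `logged_interface` and `is_known_interface` are hypothetical helper names, not part of FOG or udpcast.

```python
import shlex
import socket


def logged_interface(log_line):
    """Return the value passed to --interface in a logged udp-sender command."""
    marker = "| Command:"
    if marker not in log_line:
        return None
    # Drop the log prefix and the trailing ';' that the log appends.
    command = log_line.split(marker, 1)[1].strip().rstrip(";")
    args = shlex.split(command)
    if "--interface" in args:
        idx = args.index("--interface")
        if idx + 1 < len(args):
            return args[idx + 1]
    return None


def is_known_interface(name, known=None):
    """Check a name against the given list, or against this host's interfaces."""
    if known is None:
        # socket.if_nameindex() returns (index, name) pairs on Unix-like systems.
        known = [if_name for _, if_name in socket.if_nameindex()]
    return name in known
```

With the log line from this thread, `logged_interface` returns `dev`, and `is_known_interface("dev")` would be False on a server whose only NIC is ens3.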
I’ve also noticed that the FOG_UDPCAST_STARTINGPORT changes. For my first test this morning it was 56494. I rebooted and now it is 62822.
I’m expecting to image no more than 20 machines at once. It won’t be often so it’s not critical that I get it working, but it could help others if I do.
-
RE: Wiki errors - "Troubleshooting a multicast"
@george1421 - I only have a single unmanaged switch and a basic broadband router, so any multicast traffic will be treated as a broadcast (I’m planning to use FOG to maintain PCs for Scouts, Guides and the other groups that I’m involved with, so this is being done as cheaply as possible). All machines are on the same L2 network.
I have successfully completed the “Testing 1 client” and “Testing 2 client” tests, and started a multicast session a few minutes ago with two computers joining, but I don’t think that the image is actually being deployed. All that I get in the log is
[04-06-20 11:52:39 pm] | Task ID: 6 Name: Multi-Cast Task - Multicast t500 image file found, file: /images/T500Linux19.10desktopclean
[04-06-20 11:52:39 pm] | Task ID: 6 Name: Multi-Cast Task - Multicast t500 2 clients found
[04-06-20 11:52:39 pm] | Task ID: 6 Name: Multi-Cast Task - Multicast t500 sending on base port 60328
[04-06-20 11:52:39 pm] | Command: /usr/local/sbin/udp-sender --interface dev --min-receivers 2 --max-wait 600 --portbase 60328 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/T500Linux19.10desktopclean/d1p1.img;
[04-06-20 11:52:39 pm] | Task ID: 6 Name: Multi-Cast Task - Multicast t500 has started
every 10 seconds. I’m too tired to continue now, so will have another crack tomorrow and post in the technical section - this post was just to save some time for anyone else who needs to troubleshoot.
Thanks.
-
RE: Wiki errors - "Troubleshooting a multicast"
Another helpful addition - the “Something else to try” section has
gunzip -c "/images/anyimagename/file"
The -S switch is helpful here, as the default extension for image files isn’t the .gz that gunzip expects:
gunzip -S ".img" -c "/images/anyimagename/file"
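As a cross-check that does not depend on the file extension at all, a small Python sketch can test whether an image file starts with a readable gzip stream. This assumes the image was captured with gzip compression (other compression settings would simply report False), and `is_gzip_stream` is a made-up helper name.

```python
import gzip
import zlib


def is_gzip_stream(path, probe_bytes=1024):
    """Return True if the file at `path` starts with a readable gzip stream.

    Only the first `probe_bytes` of decompressed data are read, so this is a
    quick sanity check rather than a full integrity test of the image.
    """
    try:
        with gzip.open(path, "rb") as handle:
            handle.read(probe_bytes)
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers "not a gzipped file"; EOFError and zlib.error cover
        # truncated or corrupted streams.
        return False
```

For example, `is_gzip_stream("/images/anyimagename/file")` takes the same placeholder path as the wiki command, whatever extension the real file has.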