SOLVED FOG Trunk 5161 AutoNumbering
Hello. New-ish FOG user here.
I’m a work-study student helping my school set up a refurbishing lab for a non-profit. We were able to get a functional FOG 0.32 setup running on CentOS 6.5 after some hurdles and stumbling.
Our current project is setting up a 1.2 server on CentOS 6.5. FOG 0.32 served well enough, but the capabilities of the 1.x+ versions made us want to upgrade.
FOG 1.2 was installed fresh on a new machine (different from the 0.32 machine we used; the 0.32 system is currently our fallback server). We had issues with multicasting, so we updated to the trunk version (via SVN). Multicasting still didn’t work for us, but the newer torrent-casting works well enough (tested on 5 machines), so we decided this will do.
However, the Quick Registration Autonumbering system seems to have a bug in it. It doesn’t increment the number properly and apparently inserts invalid data into the DB. This invalid data manifests itself as “ghost” hosts. The “hosts” appear in the DB but can’t be manipulated in any way from the web GUI (they don’t even appear there). This includes ghost entries in Groups if I have a default group set in the autoreg options.
The autoreg numbering system is VERY useful to us since there will be refurb’ed machines we’re returning, especially because the lab this is being done in is for fellow students to get some hands-on training. We do have a timetable to eventually follow (we have leeway since we’re still in the set-up phase). Having the autoname/number option work properly would reduce the workload on the core staff for the lab (an instructor and 2 work-studies). We currently have 72 machines to re-image.
Is there anything I could check/edit that would correct the behavior of the auto-numbering system? Any help would be appreciated. We’d prefer not defaulting to using MAC addresses as the hostnames nor manually assigning hostnames via full registration or post-imaging.
@StahnAileron I’ve added a ton of functionality (I think) to the installer.
It even works with typical switches and output to assist in knowing more.
./installfog.sh -? will print potential usage options.
@Tom-Elliott Oh, did not know that. I was using the Wiki as a reference and it never mentioned that switch. The Wiki simply states to delete the .fogsettings file. Nice to know that now. I’ll have to take note of that for our documentation.
@StahnAileron the script for the installer already has a switch to disable update and perform a full/fresh install. It can be done with:
Final Progress Report
So I actually got Multicasting to work. Apparently the switch from 1.2 to Trunk left some files behind and/or wasn’t truly complete. Some stuff I had tinkered with in 1.2 was held over in Trunk. So I essentially screwed myself over on that one.
In any event, trimming down the install (i.e. deleting almost everything from FOG), pulling a fresh copy of the trunk via SVN, and MAKING SURE the installation truly completed all the way fixed my problems. I hate self-induced problems like this, but live and learn.
Thanks for all the help! I now have a better feel for mucking around in FOG.
I noticed that if I had DHCP already running, the script would abort because setting up DHCP failed. This apparently was part of my problem. (I wasn’t paying enough attention until my attempts at re-installing the system pointed me to an incomplete re-installation process.) I had to comment out that line in the script to make it easier to re-install FOG. DHCP was always running. (I’d just restart it manually afterwards, just in case.) It seems that the script interprets DHCP already running as a failure. (It doesn’t seem to have this problem with any other running service.)
Also, would it be too much to ask for an option in the installation script to purge the current settings and start fresh? I had to delete the hidden .fogsettings file a couple of times during my endeavor.
@Wayne-Workman I’ll try the procedures you posted. We do have a managed switch in the mix out of 3 total. The others are dumb switches. I’ll look into getting it arranged (physically) so the managed switch is out of the loop. It didn’t give us problems with the FOG 0.32 server we had originally, but shrug never know, right?
@StahnAileron Try to start the fog services 30 seconds after boot. There is an example in this article: https://wiki.fogproject.org/wiki/index.php/Fedora_21_Server
Also, try to clear out the two relevant tables in MySQL, there are steps for that here: https://wiki.fogproject.org/wiki/index.php/Troubleshoot_Downloading_-_Multicast
Also, to further simplify the problem, you might try to use a basic non-managed Layer 2 switch to multicast with till you get it working.
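For anyone following along, the delayed start suggested above could be sketched roughly like this on a SysV-init box such as CentOS 6.5. This is only a sketch: the service names in FOG_SERVICES are an assumption (check /etc/init.d/ on your own server), the run and delayed_restart helpers are hypothetical names made up for illustration, and the wiki article linked above is the real reference.

```shell
#!/bin/sh
# Sketch only: restart the FOG services after a delay, e.g. from /etc/rc.d/rc.local.
# ASSUMPTION: these service names match your install; verify them in /etc/init.d/.
FOG_SERVICES="FOGMulticastManager FOGImageReplicator FOGScheduler"

# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# actually run it when DRY_RUN is set to anything else.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Hypothetical helper: $1 = seconds to wait, remaining arguments = service names.
delayed_restart() {
    delay="$1"
    shift
    sleep "$delay"
    for svc in "$@"; do
        run service "$svc" restart
    done
}
```

With the functions loaded, `delayed_restart 30 $FOG_SERVICES` only prints what it would do after 30 seconds; `DRY_RUN=0 delayed_restart 30 $FOG_SERVICES` (as root) would actually restart the services.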
Well, the auto-naming/-numbering now works. Registering hosts for imaging is now far easier. Thanks for the quick fix!
As for multicasting: still having problems. I did check the interface name for the relevant settings (for the master node and one under FOG Settings). Still stalls at the Partclone screen. However, the FOG log still states that the various multicasting services are perpetually crashing and restarting. I’m guessing this is the current issue I need to resolve to get multicasting to work. They each stop working with the log stating they exit with error 255. If the services keep crashing, I assume that would screw over the multicasting jobs I set, no? What can I look at and/or try to stabilize the services?
For future readers, I’ve further updated the troubleshooting multicast article based on things I’ve posted in here.
I thought I already checked the interface. I was having minor issues with that when we switched from 1.2 to Trunk. Guess I didn’t check hard enough (though I never would’ve thought to check the interface name in relation to UDP Multicasting.) I’ll be double-checking once I’m in to look at the server.
@Tom-Elliott Thanks for the quick fixes to that (those) bug(s)! I’ll be heading in to my school in a few hours. I’m guessing I’ll just have to update the trunk copy I have (via SVN) and “re-install”/update FOG, correct?
Again, thank you for the help and support. I’ll follow up with a progress report once I get to work on the server once more.
@StahnAileron Look at the pictures below, these pictures were taken from my home FOG server. On it, if I wanted, I could have many nodes; even though it’s only just one “self contained” server. To multicast, I think you have to have a Master Node set, and the interface for that node must be correct.
Nevermind what I said.
I found the autonumber bug and fixed it, along with a rather disastrous bug and a couple of other minor bugs. I even added some partial functionality so that the auto-number system auto-populates itself and keeps incrementing until it finds a number that no existing host already has.
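To make the "increase until it finds a free number" idea concrete, here is a minimal shell sketch. It is not FOG's actual code (FOG implements this on the server side); next_free_hostname is a hypothetical name, and the existing host names are simply assumed to arrive one per line on stdin.

```shell
#!/bin/sh
# Sketch only, not FOG's implementation.
# $1 = hostname prefix, $2 = number to start from.
# Existing host names are read from stdin, one per line.
# The body runs in a subshell so its variables do not leak into the caller.
next_free_hostname() (
    prefix="$1"
    n="$2"
    existing="$(cat)"
    # Keep incrementing while a host with exactly this name already exists.
    while printf '%s\n' "$existing" | grep -qxF "${prefix}${n}"; do
        n=$((n + 1))
    done
    printf '%s%s\n' "$prefix" "$n"
)
```

For example, `printf 'LAB1\nLAB2\nLAB4\n' | next_free_hostname LAB 1` prints LAB3, since LAB1 and LAB2 are taken and LAB3 is the first free name.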
@StahnAileron I’d recommend checking the Storage node. This is especially true on Trunk.
1.2 and possibly prior always made the assumption that eth0 was the interface your NIC was on. You could change it, yes, but it wasn’t until about a month or two ago that I realized it was specifically the interface causing the problem as you describe (Task starts, and next checkin it “Completes and cleaned”)
This is because the UDP sender job is created, but most likely looking for interface named eth0.
The UDP Sender command starts successfully, but then fails. The MulticastTask is tracking this, and because none of the tasks have checked in, and none of them have been cancelled, the only viable solution is that the task must’ve completed successfully (even if it was only 10 seconds…or so).
To fix the multicast issue, go to the Storage Management page, choose your relevant storage node (probably named DefaultMember), and look at the interface setting on it. It is most likely eth0 or blank. Open a terminal, or ssh, or whatever, on your FOG server and run the command:
ip addr show or ifconfig -a
Look for your relevant interface.
It will probably look something like:
[root@fogserver ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether hh:hh:hh:hh:hh:hh brd ff:ff:ff:ff:ff:ff
    inet 333.333.333.333/333 brd 333.333.333.355 scope global eno16777728
       valid_lft forever preferred_lft forever
    inet6 hhhh::hhh:hhhh:hhhh:hhhh/64 scope link
       valid_lft forever preferred_lft forever
[root@fogserver ~]# ifconfig -a
eno16777728: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 333.333.333.333 netmask 333.333.333.333 broadcast 333.333.333.355
        inet6 hhhh::hhhh:hhhh:hhhh:hhhh prefixlen 64 scopeid 0x20<link>
        ether hh:hh:hh:hh:hh:hh txqueuelen 1000 (Ethernet)
        RX packets 9437234 bytes 6288619143 (5.8 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 8709833 bytes 1822128317 (1.6 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 0 (Local Loopback)
        RX packets 625597 bytes 84696575 (80.7 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 625597 bytes 84696575 (80.7 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Yes I’m well aware my mac addresses are impossible as well as my ip addresses and what not. That’s on purpose.
So if my IP were possible and I knew it to be 333.333.333.333, using either or both of the commands there I get the same interface name: eno16777728, which is the name I would set my Storage Node’s interface to.
Multicast should magically start working.
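If you already know the server's IP, the lookup described above can be scripted. The following is a rough sketch, assuming the one-line-per-address output format of `ip -o -4 addr show` (interface name in the second field, address/prefix in the fourth); iface_for_ip is a hypothetical helper name, not something FOG ships.

```shell
#!/bin/sh
# Sketch only: print the interface that carries a given IPv4 address.
# $1 = IPv4 address to look for.
# Reads the output of `ip -o -4 addr show` on stdin, where each line looks like:
#   2: eno16777728    inet 333.333.333.333/333 brd ... scope global eno16777728
iface_for_ip() {
    awk -v ip="$1" '{
        split($4, a, "/")                   # strip the /prefix from the address
        if (a[1] == ip) { print $2; exit }  # field 2 is the interface name
    }'
}
```

On the server you would run something like `ip -o -4 addr show | iface_for_ip 333.333.333.333` (with your real address) and put the printed name into the storage node's interface field.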
I will take a look at the code dealing with auto numbering and see if I can figure out what’s wrong and get a suitable fix. I’m a bit tired today, so I hope to have it solved maybe tomorrow.
Hopefully I’ve helped a little bit.
@Wayne-Workman For the multicasting, could you define “master storage node set”? I only just got into FOG (and Linux). We currently have a single machine with CentOS 6.5 and just FOG Trunk 5161 on it. All the services required to run FOG are run from that single server. So the only storage node we have is technically the FOG server itself. I’m currently home, so I won’t be able to look at the machine again until Monday, but I’ll take a look at the interfaces like you mentioned. I do recall having a slight issue with interfaces at one point, though that seems to be corrected for now. (Unicast and Torrent-cast worked.)
@Tom-Elliott Actually, now that I think about it again, the multicast issue I had was similar to other threads I’ve seen here: The job is started, but the log says it stops and “completed” 10 seconds later. The hosts that are part of the multicast job just hang at the PartClone screen, waiting. This was in FOG 1.2 (so not quite relevant currently, I guess.) Currently in Trunk 5161, I think the 3 multicast services were spamming the log, terminating with a 255 error repeatedly.
I haven’t looked into the current multicast issues as much as I would’ve liked. I got hung up on testing and troubleshooting the Auto-reg/Auto-numbering issue I came across. For now, I just want to focus on the AutoReg/-Number problem I have.
Thanks for the quick replies!
@StahnAileron for Multicasting, you have to set the interface for the master storage node correctly.
CentOS and Fedora can use weird names. You can get the names of the interfaces like this:
ip link show
That will just show the names and info. For addresses too, you would use
ip addr
You’d put that interface information into the master storage node’s settings.
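If you only want the bare interface names out of that output, a small filter can help. This is a sketch that assumes the one-line format of `ip -o link show` (index, then the name followed by a colon); list_ifaces is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print just the interface names.
# Reads the output of `ip -o link show` on stdin, where each line starts like:
#   2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
list_ifaces() {
    awk '{
        name = $2
        sub(/:$/, "", name)    # drop the trailing colon
        sub(/@.*$/, "", name)  # drop any @parent suffix (e.g. on VLAN interfaces)
        print name
    }'
}
```

Usage on the server would be `ip -o link show | list_ifaces`; pick the name of the NIC your clients reach the server on (usually not lo).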
I think it also helps to have a master storage node set… I believe in Trunk, multicasting only happens from a master storage node.
As far as the auto-numbering, I’m afraid I’ll need to leave that question to the more experienced users @Developers @Moderators , I’m not familiar with it. When they reply about it, I’ll be familiar though, because I read every post on here.
And Tom is right, we need to know an exact version.
What version of fog are you running?
You state FOG Trunk 5161, but then reference 1.2.0 throughout the post.
What is the value in the Cloud on the GUI?