Disappearing hosts from host list
-
@foglalt I think the issue has been present for a long while. Once we know enough about it and can reproduce it, I would expect a fix to be created quickly. The problem is knowing what's causing it and how to reproduce it. We were grasping at straws, so I wrote this script to grasp at straws more efficiently, lol.
-
@foglalt What is the MAC address of the host with hostID 43?
-
@developers I have found a pattern. I'm not sure what it means, but I wanted to share it to get your eyes on it.
These next two code blocks are from the first log the script provided:
Date & time: 2018. febr. 28., szerda, 09:29:01 CET
Found hostID '5' without a primary MAC.
From the history table, there are some events about one hour earlier with the same hostID.
[2018-02-28,08:28:10],Host,ID: 5 NAME: laci1 has been successfully updated. laci 2018-02-28 08:28:10 10.10.36.124
[2018-02-28,08:28:10],MACAddressAssociation,ID: 131 has been successfully updated. laci 2018-02-28 08:28:10 10.10.36.124
[2018-02-28,08:27:45],Host,ID: 5 NAME: laci1 has been successfully updated. laci 2018-02-28 08:27:45 10.10.36.124
These next two blocks are from the most recent log the script provided:
Date & time: 2018. ápr. 10., kedd, 09:52:01 CEST
Found hostID '43' without a primary MAC.
Again, we have an event in the history table from about one hour earlier with the same hostID.
[2018-04-05,08:53:10],Host,ID: 43 NAME: laci6 has been successfully updated. laci 2018-04-05 08:53:10 146.110.36.124
Could it be that the timestamps in the history table are just wrong and these events actually happened at the same time? If so, I'm betting that changing the name of a host can somehow, sometimes, cause this missing-primary-MAC issue.
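Subtracting the timestamps directly makes the two gaps explicit. This is a minimal sketch that just copies the times from the excerpts above and assumes the script's log and the history table share the same server clock, so they are compared as naive datetimes:

```python
from datetime import datetime

# Timestamps copied from the two log excerpts above. Assumption: the cron log
# and the history table use the same (local server) clock.
CASES = {
    # host: (last history event for the host, script detection time)
    "hostID 5 (laci1)": (datetime(2018, 2, 28, 8, 28, 10),
                         datetime(2018, 2, 28, 9, 29, 1)),
    "hostID 43 (laci6)": (datetime(2018, 4, 5, 8, 53, 10),
                          datetime(2018, 4, 10, 9, 52, 1)),
}


def detection_gaps(cases):
    """Map each host to the time between its last history event and detection."""
    return {host: detected - updated
            for host, (updated, detected) in cases.items()}


for host, gap in detection_gaps(CASES).items():
    print(f"{host}: {gap}")
```

It prints a gap of 1:00:51 for hostID 5 and 5 days, 0:58:51 for hostID 43: the time-of-day offset is roughly one hour in both cases, but the second history event is also five days older than its detection.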
-
@wayne-workman The timestamps are accurate. The event you're showing is from February 28th, vs April 5th. I don't know if the timestamps are in UTC or in the user's local timezone.
-
@tom-elliott The timestamps are from whatever is set on his server. You do see the one-hour pattern on both occasions, though? It makes no sense to me. The only thing I can think of is a problem in a service that runs hourly. This is the script he's running every minute via cron: https://github.com/FOGProject/fog-community-scripts/blob/master/troubleshootingTools/monitor-missing-primary-mac.sh
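The core of a check like this is a single query for hosts that have no MAC row flagged as primary. Below is a minimal sketch of that idea against an in-memory SQLite stand-in, not the linked script itself; the table and column names (`hosts.hostID`, `hostMAC.hmHostID`, `hostMAC.hmPrimary` with '0'/'1' values) are assumptions modeled on FOG's schema, and the sample MACs are made up:

```python
import sqlite3

# In-memory stand-in for the FOG MySQL tables. Names are assumptions;
# check them against your own database before running anything similar.
SCHEMA = """
CREATE TABLE hosts (hostID INTEGER PRIMARY KEY, hostName TEXT);
CREATE TABLE hostMAC (
    hmID      INTEGER PRIMARY KEY,
    hmHostID  INTEGER NOT NULL,
    hmMAC     TEXT NOT NULL,
    hmPrimary TEXT NOT NULL DEFAULT '0'
);
"""

# A host is "missing" when none of its hostMAC rows is flagged primary.
MISSING_PRIMARY_SQL = """
SELECT h.hostID, h.hostName
FROM hosts AS h
WHERE NOT EXISTS (
    SELECT 1 FROM hostMAC AS m
    WHERE m.hmHostID = h.hostID AND m.hmPrimary = '1'
)
ORDER BY h.hostID
"""


def hosts_missing_primary_mac(conn):
    """Return (hostID, hostName) for every host without a primary MAC."""
    return conn.execute(MISSING_PRIMARY_SQL).fetchall()


def demo_connection():
    """Tiny made-up dataset: laci1 is fine, laci6 has lost its primary flag."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO hosts VALUES (?, ?)",
                     [(5, "laci1"), (43, "laci6")])
    conn.executemany(
        "INSERT INTO hostMAC (hmHostID, hmMAC, hmPrimary) VALUES (?, ?, ?)",
        [(5, "aa:aa:aa:aa:aa:01", "1"),
         (43, "aa:aa:aa:aa:aa:02", "0")])
    conn.commit()
    return conn


print(hosts_missing_primary_mac(demo_connection()))
```

`NOT EXISTS` is used instead of `NOT IN` so that a stray NULL `hmHostID` cannot silently empty the result.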
-
I'll send you another log later tonight or tomorrow. Yes, it happened again. Maybe it will show the same pattern, at least. By the way, the host is not renamed, but its MAC is changed. Every time!
-
@foglalt When you say the MAC is changed, are you changing it via the GUI or is this something that’s happening that is not supposed to happen? Please elaborate.
-
@wayne-workman It is simply the following:
- pc1 comes in for reinstallation/installation, and its MAC is registered on a “dummy-like host” (for example “laci1”).
- An image is selected, the task is set to deploy, pc1 finishes and is turned off.
- pc2 comes in for the same purpose, and host “laci1” gets its MAC overwritten (select the MAC field in the GUI, type in the new MAC, click the Update button).
The missing host is then detected by your script; previously, it was noticed when pc3 came in for processing.
This is why I asked earlier how others do mass cloning. My colleagues use this method because with it you don't need to rebuild the cloning groups (and if the image isn't updated, you don't need to change the image name either). It goes like this: another 5 PCs arrive, you put them on the table, plug in the cables, register the new MACs, and launch the cloning process. We normally never keep hosts in the database, as you may have seen in our database before. We only have a few dedicated ones in it (like the image-creator machine and a few others).
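For reproducing the problem, it may help to pin down what step 3 should leave behind: the reused host still has exactly one primary MAC, now pc2's. Here is a hypothetical sketch of that overwrite with an invariant check, using an in-memory SQLite stand-in and assumed FOG column names; it is not how the FOG GUI performs the update:

```python
import sqlite3


def primary_mac_count(conn, host_id):
    """How many of the host's MAC rows are flagged primary (should be 1)."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM hostMAC WHERE hmHostID = ? AND hmPrimary = '1'",
        (host_id,),
    ).fetchone()
    return count


def overwrite_primary_mac(conn, host_id, new_mac):
    """Replace a reused host's MAC and verify the invariant (hypothetical helper).

    Rewrites the existing primary row in place inside one transaction, so the
    flag is never removed, and rolls back if the host does not end up with
    exactly one primary MAC.
    """
    with conn:  # commit on success, roll back if an exception escapes
        conn.execute(
            "UPDATE hostMAC SET hmMAC = ? WHERE hmHostID = ? AND hmPrimary = '1'",
            (new_mac, host_id),
        )
        if primary_mac_count(conn, host_id) != 1:
            raise RuntimeError(f"host {host_id} has no single primary MAC")


def demo():
    """laci1 (hostID 5) is registered with pc1's MAC, then reused for pc2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hostMAC (hmID INTEGER PRIMARY KEY, hmHostID INTEGER,
                              hmMAC TEXT, hmPrimary TEXT DEFAULT '0');
        INSERT INTO hostMAC (hmHostID, hmMAC, hmPrimary)
            VALUES (5, 'aa:aa:aa:aa:aa:01', '1');
    """)
    overwrite_primary_mac(conn, 5, "aa:aa:aa:aa:aa:02")
    return conn


conn = demo()
print(conn.execute("SELECT hmMAC, hmPrimary FROM hostMAC").fetchall())
```

If the host had already lost its primary flag before the overwrite, the helper refuses and leaves the row unchanged instead of quietly producing a host with no primary MAC.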
-
One more thing about yesterday's instance of the problem: during the host update, the new MAC was registered on the host (this time laci2, as you may see in the log), and when the task start was being prepared (those hosts are in a group for mass deployment), my colleague realised that the group had fewer members than needed. Voilà, missing host. So it disappeared after (or during) the host update.
-
@foglalt So is it accurate to say “When an existing host’s MAC address is changed to something else, sometimes the primary MAC address is lost.” ?
Also, you know you can image without registering, right?
-
@wayne-workman What do you mean by this? We do it because it makes mass actions easy (multicast is done with registered hosts, etc., or did I miss something?)
-
@foglalt said in Disappearing hosts from host list:
multicast is done with registered hosts, etc or did i miss something?
This thread suggests it’s possible:
https://forums.fogproject.org/topic/9669/image-multicast-without-host-registration-ipxe-input-output-error
-
@wayne-workman But how? Anyway, sadly, it doesn't change the mysterious disappearing.
-
@developers the pattern I thought I found earlier no longer holds with this 3rd log.
-
Since I am, unfortunately, now regularly detecting and removing hosts with a missing primary MAC, may I use stored procedures for this? Would they cause any problems with the FOG database? Would they be wiped by a FOG upgrade? I'm trying to find a proper way for a colleague to fix the issue without me when I'm out of the office, and I'm thinking about using stored procedures for this (the same approach might also be nice for other database maintenance commands we use in FOG).
Could they be a source of problems, or may I use them?
-
@foglalt FOG doesn't add or remove any triggers or procedures, so if you can think of one to write that helps fix the issue for you, I say go for it. If you're feeling kind, please post it here; what you write might allow me to apply a more direct fix to the core of FOG. I'm hoping, however, that this will be fixed in 1.6 due to how we're approaching MAC addresses.
-
@Tom-Elliott Nothing special; I would just look up hosts with a missing primary MAC and kill the orphans, so in practice I would move my garbage-collection scripts into the database. Maybe I'll put some authentication in front of it and let others (my colleagues, I mean) use it. So nothing new at all.
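As a sketch of what that garbage collection could look like with deletion kept under explicit control, here is a Python version that defaults to a dry run. It runs against an in-memory SQLite stand-in; the table and column names are assumptions modeled on FOG's schema, and an actual stored procedure would be written in MySQL instead:

```python
import sqlite3

# Assumed FOG-like names: hosts(hostID) and hostMAC(hmHostID, hmPrimary).
ORPHANS_SQL = """
SELECT h.hostID FROM hosts AS h
WHERE NOT EXISTS (
    SELECT 1 FROM hostMAC AS m
    WHERE m.hmHostID = h.hostID AND m.hmPrimary = '1')
ORDER BY h.hostID
"""


def remove_orphans(conn, dry_run=True):
    """Return hostIDs lacking a primary MAC; delete them only when asked.

    Deletion here only touches hosts and hostMAC. A real FOG cleanup would
    also have to handle other per-host tables (group membership, tasks and
    so on), which this sketch does not know about.
    """
    orphan_ids = [row[0] for row in conn.execute(ORPHANS_SQL)]
    if not dry_run and orphan_ids:
        params = [(host_id,) for host_id in orphan_ids]
        with conn:  # both deletes are committed together or not at all
            conn.executemany("DELETE FROM hostMAC WHERE hmHostID = ?", params)
            conn.executemany("DELETE FROM hosts WHERE hostID = ?", params)
    return orphan_ids


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hosts (hostID INTEGER PRIMARY KEY, hostName TEXT);
        CREATE TABLE hostMAC (hmID INTEGER PRIMARY KEY, hmHostID INTEGER,
                              hmMAC TEXT, hmPrimary TEXT DEFAULT '0');
        INSERT INTO hosts VALUES (5, 'laci1'), (43, 'laci6');
        INSERT INTO hostMAC (hmHostID, hmMAC, hmPrimary)
            VALUES (5, 'aa:aa:aa:aa:aa:01', '1'), (43, 'aa:aa:aa:aa:aa:02', '0');
    """)
    return conn


conn = demo_connection()
print(remove_orphans(conn))                 # dry run: only reports
print(remove_orphans(conn, dry_run=False))  # actually deletes
```

Keeping the dry run as the default means a colleague can safely run the report on their own, and the destructive path needs a deliberate `dry_run=False`.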
-
@foglalt You could alter the script you have for detecting them to fix them. FOG wouldn’t touch cron or the script.
-
The plan is that I will write some extra scripts for my colleague and for logging, and if everything goes into the database as stored procedures, maybe the script to detect and delete the disabled ones will go there too. But deletion is something I'd like to keep more control over; I don't want a script to fix things by deleting.
But… on second thought, what if we check whether the missing ones still have an actual MAC address and only the primary flag is missing? If I set the primary MAC flag back to 1 instead of simply deleting the host as before, will they maybe come back “online”?
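A minimal sketch of that repair, assuming FOG-like `hostMAC` columns and an in-memory SQLite stand-in. It only touches hosts that still have at least one MAC row; when a host has several, choosing the lowest `hmID` as the new primary is an arbitrary assumption, and whether restored hosts actually reappear in the host list is exactly what would need testing:

```python
import sqlite3

# For every host that still has MAC rows but none flagged primary,
# pick the row with the lowest hmID (arbitrary choice, see above).
CANDIDATES_SQL = """
SELECT MIN(m.hmID)
FROM hostMAC AS m
WHERE NOT EXISTS (
    SELECT 1 FROM hostMAC AS p
    WHERE p.hmHostID = m.hmHostID AND p.hmPrimary = '1')
GROUP BY m.hmHostID
"""


def restore_primary_flags(conn):
    """Set hmPrimary back to '1' instead of deleting; return the hmIDs changed.

    Selecting first and updating second avoids an UPDATE whose subquery reads
    the same table, which MySQL rejects.
    """
    hm_ids = sorted(row[0] for row in conn.execute(CANDIDATES_SQL))
    with conn:
        conn.executemany("UPDATE hostMAC SET hmPrimary = '1' WHERE hmID = ?",
                         [(hm_id,) for hm_id in hm_ids])
    return hm_ids


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hostMAC (hmID INTEGER PRIMARY KEY, hmHostID INTEGER,
                              hmMAC TEXT, hmPrimary TEXT DEFAULT '0');
        INSERT INTO hostMAC (hmID, hmHostID, hmMAC, hmPrimary) VALUES
            (1, 5,  'aa:aa:aa:aa:aa:01', '1'),  -- healthy host
            (2, 43, 'aa:aa:aa:aa:aa:02', '0'),  -- lost its primary flag
            (3, 43, 'aa:aa:aa:aa:aa:03', '0');
    """)
    return conn


conn = demo_connection()
print(restore_primary_flags(conn))
```

Hosts with no MAC rows at all are left alone, so they would still go through the manual deletion path.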
-
@foglalt said in Disappearing hosts from host list:
But… As a second thought, what if we check if missing ones have actual mac address but flag is missing only for primary mac. If i set primary mac back to 1 nstead of simply deleting the host as before, will they come back “online” maybe?
Sounds reasonable to me. Try it.