Disappearing hosts from host list
-
Er, my active version is not 1.3; when this first started happening, it was around 1.2. As time passed, we tried many things, upgrades, etc. Not long ago mine was the most up-to-date one (1.4.4); then, of course, as development goes on (happy to see that, by the way), it "grew older". But this has happened to me on every single up-to-date version, so I think the real cause goes far deeper than it seems at first glance. I have hope in your script.
-
@foglalt I think the issue has been present for a long while. As soon as we know enough about it and can reproduce it, I would expect a fix to be created quickly. The problem is knowing what's causing it and how to reproduce it. We were grasping at straws, so I wrote this script to grasp at straws more efficiently lol.
-
@foglalt What is the MAC address of the host with hostID 43?
-
@developers I have found a pattern. I'm not sure what it means, but I wanted to share it to get your eyes on it.
These next two code blocks are from the first log the script provided:
Date & time: 2018. febr. 28., szerda, 09:29:01 CET Found hostID '5' without a primary MAC.
From the history table, there are some events about one hour earlier with the same hostID.
[2018-02-28,08:28:10],Host,ID: 5 NAME: laci1 has been successfully updated. laci 2018-02-28 08:28:10 10.10.36.124
[2018-02-28,08:28:10],MACAddressAssociation,ID: 131 has been successfully updated. laci 2018-02-28 08:28:10 10.10.36.124
[2018-02-28,08:27:45],Host,ID: 5 NAME: laci1 has been successfully updated. laci 2018-02-28 08:27:45 10.10.36.124
These next two blocks are from the recent log the script provided.
Date & time: 2018. ápr. 10., kedd, 09:52:01 CEST Found hostID '43' without a primary MAC.
Again, we have an event in the history table from about one hour earlier with the same hostID.
[2018-04-05,08:53:10],Host,ID: 43 NAME: laci6 has been successfully updated. laci 2018-04-05 08:53:10 146.110.36.124
Could it be that the timestamps from the history table are just wrong and these events actually happened at the same time? If so, I'm betting that changing the name of a host can somehow, sometimes, cause this missing primary MAC issue.
-
@wayne-workman The timestamps are accurate. The event you're showing is from February 28th vs. April 5th. I don't know whether the timestamps are in UTC or adjusted to the time zone the user is in.
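The gap between each detection and the history event it was paired with can be checked directly from the log lines quoted above. A minimal Python sketch; it drops the CET/CEST suffixes and assumes both timestamps come from the same server clock, which is exactly what is uncertain if the history table stores UTC.

```python
from datetime import datetime, timedelta


def parse_stamp(entry: str) -> datetime:
    """Return the bracketed [YYYY-MM-DD,HH:MM:SS] timestamp at the start of a history entry."""
    stamp = entry[1:entry.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%d,%H:%M:%S")


def gap(detected: datetime, entry: str) -> timedelta:
    """Time between a history-table event and the script's detection of the missing MAC."""
    return detected - parse_stamp(entry)


# (detection time, history entry) pairs copied from the two logs above.
# Assumption: CET/CEST suffixes are ignored, i.e. both come from the same clock.
cases = [
    (datetime(2018, 2, 28, 9, 29, 1),
     "[2018-02-28,08:28:10],Host,ID: 5 NAME: laci1 has been successfully updated."),
    (datetime(2018, 4, 10, 9, 52, 1),
     "[2018-04-05,08:53:10],Host,ID: 43 NAME: laci6 has been successfully updated."),
]

for detected, entry in cases:
    print(gap(detected, entry))
```

Run as-is, it prints 1:00:51 for the February case and 5 days, 0:58:51 for the April case: the time of day is about an hour apart both times, but in the second log the history event is from a different day.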
-
@tom-elliott The timestamps come from whatever is set on his server. But do you see the 1-hour pattern on both occasions? It makes no sense to me. The only thing I can think of is a problem in a service that runs hourly. This is the script he's running every minute via cron: https://github.com/FOGProject/fog-community-scripts/blob/master/troubleshootingTools/monitor-missing-primary-mac.sh
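The check that the linked cron script performs every minute (finding hosts without a primary MAC) can be sketched as one query. Below it is wrapped in Python with an in-memory SQLite database so it runs standalone. The table and column names (hosts.hostID, hostMAC.hmHostID, hostMAC.hmPrimary) are assumptions modeled on the FOG schema, not copied from the script, so verify them before pointing anything at a live MySQL database.

```python
import sqlite3

# Hypothetical, simplified schema modeled on FOG's `hosts` and `hostMAC` tables.
# Table and column names are assumptions; verify them against your own database.
SCHEMA = """
CREATE TABLE hosts   (hostID INTEGER PRIMARY KEY, hostName TEXT);
CREATE TABLE hostMAC (hmID INTEGER PRIMARY KEY, hmHostID INTEGER,
                      hmMAC TEXT, hmPrimary INTEGER DEFAULT 0);
"""

# A host is "missing its primary MAC" when none of its hostMAC rows has hmPrimary = 1.
MISSING_PRIMARY_SQL = """
SELECT h.hostID, h.hostName
FROM hosts AS h
WHERE NOT EXISTS (
    SELECT 1 FROM hostMAC AS m
    WHERE m.hmHostID = h.hostID AND m.hmPrimary = 1
)
ORDER BY h.hostID
"""


def hosts_missing_primary(conn: sqlite3.Connection) -> list:
    """Return (hostID, hostName) tuples for every host without a primary MAC."""
    return conn.execute(MISSING_PRIMARY_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO hosts VALUES (?, ?)",
                     [(5, "laci1"), (43, "laci6"), (44, "laci2")])
    # Made-up demo data: host 5 is healthy, host 43 lost its primary flag,
    # host 44 has no MAC rows at all.
    conn.executemany(
        "INSERT INTO hostMAC (hmHostID, hmMAC, hmPrimary) VALUES (?, ?, ?)",
        [(5, "aa:bb:cc:00:00:05", 1), (43, "aa:bb:cc:00:00:43", 0)])
    for host_id, name in hosts_missing_primary(conn):
        print(f"Found hostID '{host_id}' ({name}) without a primary MAC.")
```

Run as a script, the demo reports hosts 43 and 44 in a format similar to the monitor's log. The NOT EXISTS form catches both possibilities raised in this thread: a host whose MAC rows survive with the primary flag cleared, and a host with no MAC rows at all.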
-
I'll send you another one later tonight or tomorrow. Yes, it happened again. Maybe it will show the same pattern, at least. By the way, the host is not renamed, but its MAC is changed. Every time!
-
@foglalt When you say the MAC is changed, are you changing it via the GUI, or is this something that's happening that is not supposed to happen? Please elaborate.
-
@wayne-workman It is simply the following:
- pc1 comes in for reinstallation/installation; its MAC is registered to a dummy-like host (for example "laci1").
- An image is selected and a deploy task is set; pc1 finishes and is turned off.
- pc2 comes in for the same purpose, and host "laci1" gets its MAC overwritten (MAC field selected in the GUI, new MAC typed in, Update button clicked).
The missing host is then detected by your script; before the script, it was noticed when pc3 came in for processing.
This is why I asked earlier how others do mass cloning. My colleagues use this method because it means you don't need to rebuild cloning groups (and if the image hasn't been updated, you don't need to change the image name either). Say another 5 PCs arrive: you put them on the table, plug the cables in, register the new MACs and launch the cloning. We normally never keep hosts in the database, as you may have seen in our database before; we only have a few dedicated ones (like the image creator's machine and some others).
-
One more thing about yesterday's instance of the problem: during the host update, the new MAC was registered on the host (this time laci2, as you may see in the log), and when the task start was being prepared (those hosts are in a group for mass deployment), my colleague realised that the group had fewer members than needed. Voilà, a missing host. So it disappeared after (or during) the host update.
-
@foglalt So is it accurate to say "When an existing host's MAC address is changed to something else, sometimes the primary MAC address is lost"?
Also, you know you can image without registering, right?
-
@wayne-workman What do you mean by this? We do this because it makes mass actions easy (multicast is done with registered hosts, etc., or did I miss something?)
-
@foglalt said in Disappearing hosts from host list:
multicast is done with registered hosts, etc., or did I miss something?
This thread suggests it's possible:
https://forums.fogproject.org/topic/9669/image-multicast-without-host-registration-ipxe-input-output-error
-
@wayne-workman But how? Anyway, sadly, it doesn't change the mysterious disappearing.
-
@developers the pattern I thought I found earlier no longer holds with this 3rd log.
-
Since I have, unfortunately, become a regular at detecting and killing hosts with a missing primary MAC, may I use stored procedures for this? Do they cause any problems with the FOG database? Will they be wiped by a FOG upgrade? I am trying to find a proper way for a colleague to deal with the issue without me when I'm out of the office, and for that I am thinking about using stored procedures (the same would be nice for other database commands we might use for maintenance in FOG).
Could they be a source of the problem, or may I use them?
-
@foglalt FOG doesn't remove or add any triggers or procedures, so if you can think of one to write that can help fix the issue for you, then I say go for it. If you're feeling kind, please post it here; maybe what you write will allow me to apply a more direct fix to the core of FOG. I'm hoping, however, that with 1.6 this may be fixed due to how we're approaching MAC addresses.
-
@Tom-Elliott Nothing special: I would just look up hosts with a missing primary MAC and kill the orphans, so in practice I would move my garbage-collection scripts into the database. Maybe I'd put some authentication in front of it and let others use it, my colleagues I mean. So nothing new at all.
-
@foglalt You could alter the script you have for detecting them to fix them. FOG wouldn't touch cron or the script.
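Altering the detection script to also clean up could look roughly like this minimal Python sketch. It assumes the same hypothetical, simplified hosts/hostMAC schema (names modeled on FOG, not verified) and runs against SQLite. It defaults to a dry run that only reports candidates, so deletion happens only when someone explicitly asks for it.

```python
import sqlite3

# Hypothetical, simplified schema modeled on FOG's `hosts` and `hostMAC` tables (assumed names).
SCHEMA = """
CREATE TABLE hosts   (hostID INTEGER PRIMARY KEY, hostName TEXT);
CREATE TABLE hostMAC (hmID INTEGER PRIMARY KEY, hmHostID INTEGER,
                      hmMAC TEXT, hmPrimary INTEGER DEFAULT 0);
"""


def purge_hosts_missing_primary(conn: sqlite3.Connection, dry_run: bool = True) -> list:
    """Find hosts with no primary MAC; delete them and their MAC rows only if dry_run is False."""
    orphans = [row[0] for row in conn.execute(
        "SELECT h.hostID FROM hosts AS h WHERE NOT EXISTS ("
        " SELECT 1 FROM hostMAC AS m WHERE m.hmHostID = h.hostID AND m.hmPrimary = 1)"
        " ORDER BY h.hostID")]
    if dry_run or not orphans:
        return orphans  # report only; nothing is changed
    marks = ",".join("?" * len(orphans))  # one "?" placeholder per host ID
    with conn:  # one transaction: committed on success, rolled back on error
        conn.execute(f"DELETE FROM hostMAC WHERE hmHostID IN ({marks})", orphans)
        conn.execute(f"DELETE FROM hosts WHERE hostID IN ({marks})", orphans)
    return orphans
```

Run from cron with the default dry_run=True, it only logs candidates; pass dry_run=False once a person has checked the list. In a real FOG database, other tables (group membership, for example) likely reference hostID as well; this sketch does not touch them, which is one reason deleting through the FOG GUI may be safer.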
-
The plan is that I will make some extra scripts for my colleague and for some logging, and if everything goes into the database as stored procedures, maybe the script to detect and delete disabled ones goes there too. But deletion is where I want more control; I don't want the fix to be an automatic deletion by a script.
But… on second thought, what if we check whether the missing ones still have an actual MAC address and only the primary flag is missing? If I set the primary flag back to 1 instead of simply deleting the host as before, will they maybe come back "online"?
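Restoring the flag instead of deleting can be sketched as follows, again in Python against an in-memory SQLite copy of an assumed, simplified hosts/hostMAC schema (names modeled on FOG, not verified). It only re-flags hosts that still have exactly one MAC row, where the choice is unambiguous, and returns hosts with no MAC or several MACs for a person to review instead of deleting anything. Whether FOG then lists such a host normally again is the open question here; the sketch only restores the "has a primary MAC" condition that the monitoring script checks.

```python
import sqlite3

# Hypothetical, simplified schema modeled on FOG's `hosts` and `hostMAC` tables (assumed names).
SCHEMA = """
CREATE TABLE hosts   (hostID INTEGER PRIMARY KEY, hostName TEXT);
CREATE TABLE hostMAC (hmID INTEGER PRIMARY KEY, hmHostID INTEGER,
                      hmMAC TEXT, hmPrimary INTEGER DEFAULT 0);
"""


def restore_primary_flags(conn: sqlite3.Connection):
    """Re-flag a primary MAC where it is unambiguous; report everything else."""
    # One row per host that has no primary MAC, with how many MAC rows it still has.
    rows = conn.execute("""
        SELECT h.hostID, COUNT(m.hmID)
        FROM hosts AS h
        LEFT JOIN hostMAC AS m ON m.hmHostID = h.hostID
        GROUP BY h.hostID
        HAVING COALESCE(SUM(m.hmPrimary = 1), 0) = 0
        ORDER BY h.hostID
    """).fetchall()
    restored = [hid for hid, n in rows if n == 1]   # exactly one MAC: safe to flag it
    review = {hid: n for hid, n in rows if n != 1}  # 0 or several MACs: leave to a person
    with conn:  # single transaction for all updates
        conn.executemany("UPDATE hostMAC SET hmPrimary = 1 WHERE hmHostID = ?",
                         [(hid,) for hid in restored])
    return restored, review


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO hosts VALUES (?, ?)",
                     [(5, "laci1"), (43, "laci6"), (44, "laci2"), (45, "laci3")])
    # Made-up demo data: 5 healthy, 43 one MAC without flag, 44 no MACs, 45 two unflagged MACs.
    conn.executemany(
        "INSERT INTO hostMAC (hmHostID, hmMAC, hmPrimary) VALUES (?, ?, ?)",
        [(5, "aa:bb:cc:00:00:05", 1), (43, "aa:bb:cc:00:00:43", 0),
         (45, "aa:bb:cc:00:00:45", 0), (45, "aa:bb:cc:00:00:46", 0)])
    print(restore_primary_flags(conn))
```

With the demo data, host 43 is re-flagged, while host 44 (no MAC rows) and host 45 (two candidate MACs) are returned for review. Picking one of several MACs automatically would be guesswork, which is why the sketch leaves that decision to a person.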