UNSOLVED Disappearing hosts from host list

  • Hi!

    We have noticed a strange issue since installing 1.3.0. Hosts we know well are no longer visible when listed (e.g. in all hosts or in group memberships). At first we thought those hosts simply weren't there and we were misremembering. However, they are still counted in the group and host lists (a group shows 5 PCs but only 4 are listed), and if we try to create a new host with that name, FOG says a host with that name already exists.

    How can we kill this bug? 🙂

    Additional info: the affected hosts are used only for cloning. Their names are permanent, but the MAC or image sometimes changes for each new cloning run.

    Has anyone else seen this?

  • Guys, while struggling to reproduce the issue, I found something strange. Let me describe it.

    Starting point: the host has a valid, up-to-date MAC, and I want to change it. This can produce two different outcomes, and I think the difference may be related to the problem. (I will do more tests to see whether it really is, but the difference made me curious.)

    MAC update method 1: the host's primary MAC field is the only field; there is no additional field. I select the whole field, then overwrite the MAC. The result is the new MAC as primary, plus an additional MAC field containing the previous address. This is wrong: the host has no other interface, it simply has a new MAC.

    MAC update method 2: same starting point, but I only click at the end of the field without selecting any characters. I press backspace to delete one character and type a different one in its place. Here NO additional MAC field appears. Why? Whichever method I use, I want to change the MAC, not add a new one. (I will test with this method too; maybe the problem disappears this way.)

    EDIT: sorry, somehow I did not notice what you wrote before me.

  • @foglalt No, the pending MAC is not the issue. FOG looks up pending MACs based on a 1 in the pending field (hmPending).

    The issue, it seems, is that a host's primary MAC is being changed. The old primary should move to the additional MACs and the new MAC should become primary. However, the move is probably happening before the new primary MAC is set, which makes FOG think the host is no longer valid.
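
    To make the suspected ordering problem concrete, here is a minimal sketch in Python using sqlite3 in place of MariaDB. It is not FOG's code (FOG is PHP). The simplified hostMAC table reuses the column names from the query output posted in this thread, and the function names replace_primary_mac and primary_count are hypothetical. Both steps of the swap run in one transaction: the old primary is demoted to an additional MAC, and the new MAC is promoted or inserted. As a result, no committed state ever leaves the host with zero primary MACs.

```python
import sqlite3

# Hypothetical, simplified copy of FOG's hostMAC table (column names taken from
# the query output posted in this thread); FOG's real schema has more columns.
SCHEMA = """
CREATE TABLE hostMAC (
    hmID      INTEGER PRIMARY KEY AUTOINCREMENT,
    hmHostID  INTEGER NOT NULL,
    hmMAC     TEXT    NOT NULL,
    hmPrimary TEXT    NOT NULL DEFAULT '0',
    hmPending TEXT    NOT NULL DEFAULT '0'
)
"""


def primary_count(conn, host_id):
    """Number of rows flagged primary for a host; it should always be exactly 1."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM hostMAC WHERE hmHostID = ? AND hmPrimary = '1'",
        (host_id,),
    ).fetchone()
    return n


def replace_primary_mac(conn, host_id, new_mac):
    """Make new_mac the primary MAC and keep the old primary as an additional MAC.

    Both steps run in one transaction, so no committed state ever has zero
    primary MACs for the host (the window suspected in this thread).
    """
    with conn:  # commits if the block succeeds, rolls back on an exception
        # Step 1: demote the current primary to an additional MAC.
        conn.execute(
            "UPDATE hostMAC SET hmPrimary = '0' WHERE hmHostID = ? AND hmPrimary = '1'",
            (host_id,),
        )
        # Step 2: promote new_mac if the host already has it, otherwise insert it.
        cur = conn.execute(
            "UPDATE hostMAC SET hmPrimary = '1' WHERE hmHostID = ? AND hmMAC = ?",
            (host_id, new_mac),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO hostMAC (hmHostID, hmMAC, hmPrimary, hmPending) "
                "VALUES (?, ?, '1', '0')",
                (host_id, new_mac),
            )
```

    The design point is that the invariant "exactly one primary per host" is checked only against committed data. If step 1 were committed on its own, any listing that runs before step 2 would see a host with no primary MAC.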

  • Normally the issue happens when a host's MAC is updated (the primary MAC field is selected with Ctrl+A, then the new MAC is entered). We never intentionally use more than one MAC on a host. These PCs don't have other interfaces, unlike laptops that have Wi-Fi, Ethernet, etc. at the same time. So how can a host have a secondary MAC in the database? When I first saw one, I asked my colleagues what they had done. I have a faint memory of being told that on update, the previously overwritten MAC becomes secondary. But that is far from intentional; these are PCs with only one MAC.

    Could this be a clue for the investigation…?

    Like this:

    MariaDB [fog]> select * from hostMAC where hmHostID=37;
    | hmID | hmHostID | hmMAC             | hmDesc | hmPrimary | hmPending | hmIgnoreClient | hmIgnoreImaging |
    |  170 |       37 | 8c:89:a5:53:1b:f5 |        | 0         | 0         | 0              | 0               |
    |  298 |       37 | 6c:4b:90:4e:53:9e |        | 1         |           | 0              | 0               |

    As you can see, the primary MAC differs strangely from the additional one (298 is the primary, 170 is the additional MAC ID). Why does the primary row have no 0 in hmPending while the other row does? That field is never edited manually, so the software logic must set it. But why differently? I still have a feeling that this issue is connected with the “pending MAC”…
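
    A host that is counted but not listed is, going by this thread, one whose MAC rows exist but none of which is flagged primary. The following is a minimal sketch of a detection query, run against sqlite3 so it can be tested. Several details are assumptions to verify against the real database before running the SELECT at the MariaDB prompt: the `hosts` table and its `hostID`/`hostName` columns, the simplified schema, and the helper name hosts_missing_primary. NOT EXISTS catches both cases: hosts whose MAC rows all have hmPrimary = '0', and hosts with no MAC rows at all. The query only compares hmPrimary with '1', so the empty hmPending on row 298 does not affect it.

```python
import sqlite3

# ASSUMED, simplified schema: FOG's real hosts/hostMAC tables have more columns,
# and the hosts table/column names here are not confirmed by this thread.
SCHEMA = """
CREATE TABLE hosts (
    hostID   INTEGER PRIMARY KEY,
    hostName TEXT NOT NULL UNIQUE
);
CREATE TABLE hostMAC (
    hmID      INTEGER PRIMARY KEY,
    hmHostID  INTEGER NOT NULL,
    hmMAC     TEXT    NOT NULL,
    hmPrimary TEXT    NOT NULL DEFAULT '0',
    hmPending TEXT
);
"""

# A host is "invisible" in the lists if no hostMAC row is flagged primary for it.
MISSING_PRIMARY_SQL = """
SELECT h.hostID, h.hostName
FROM hosts AS h
WHERE NOT EXISTS (
    SELECT 1
    FROM hostMAC AS m
    WHERE m.hmHostID = h.hostID AND m.hmPrimary = '1'
)
ORDER BY h.hostID
"""


def hosts_missing_primary(conn):
    """Return (hostID, hostName) for every host without a primary MAC."""
    return conn.execute(MISSING_PRIMARY_SQL).fetchall()
```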

  • @foglalt said in Disappearing hosts from host list:

    But… On second thought, what if we check whether the missing ones still have an actual MAC address and only the primary flag is missing? If I set the primary flag back to 1 instead of simply deleting the host as before, maybe they will come back “online”?

    Sounds reasonable to me. Try it.

  • The plan is to write some extra scripts for my colleague and for logging. If all of that goes into the database as stored procedures, maybe a script to detect and delete the disabled hosts could go there too. But I want more control over deletion; I don't want to fix things by having a script delete hosts.

    But… On second thought, what if we check whether the missing ones still have an actual MAC address and only the primary flag is missing? If I set the primary flag back to 1 instead of simply deleting the host as before, maybe they will come back “online”?

  • @foglalt You could alter the script you have for detecting them to fix them. FOG wouldn’t touch cron or the script.
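
    Along the lines suggested here (repairing instead of deleting), a detection script could restore the flag directly. Below is a minimal sketch, again run in sqlite3 so it can be tested. For every host that has MAC rows but none flagged primary, it promotes the row with the highest hmID. That rule is an assumption: it treats the highest hmID as the most recently added MAC, i.e. the one meant to be primary, which matches row 298 in the output above. The helper name restore_missing_primaries is hypothetical. Nothing is deleted. Whether FOG then lists the host again is exactly what the thread proposes to test.

```python
import sqlite3

# Hypothetical, simplified hostMAC table with only the columns this repair needs.
SCHEMA = """
CREATE TABLE hostMAC (
    hmID      INTEGER PRIMARY KEY,
    hmHostID  INTEGER NOT NULL,
    hmMAC     TEXT    NOT NULL,
    hmPrimary TEXT    NOT NULL DEFAULT '0'
);
"""

# For every host that has MAC rows but none flagged primary, promote the row with
# the highest hmID. ASSUMPTION: the highest hmID is the most recently added MAC.
# The derived table ("candidates") is there because some MySQL/MariaDB versions
# reject an UPDATE whose subquery reads the target table directly; SQLite accepts
# either form.
PROMOTE_SQL = """
UPDATE hostMAC
SET hmPrimary = '1'
WHERE hmID IN (
    SELECT newest FROM (
        SELECT MAX(hmID) AS newest
        FROM hostMAC
        GROUP BY hmHostID
        HAVING SUM(CASE WHEN hmPrimary = '1' THEN 1 ELSE 0 END) = 0
    ) AS candidates
)
"""


def restore_missing_primaries(conn):
    """Promote one MAC per affected host; return how many hosts were fixed."""
    with conn:  # single transaction: all affected hosts are fixed, or none
        return conn.execute(PROMOTE_SQL).rowcount
```

    Running it repeatedly (e.g. from the same cron job that does the detection) is harmless. Once a host has a primary MAC it no longer matches the HAVING clause, so a second run reports 0.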

  • @Tom-Elliott Nothing special. I would just look up hosts with a missing primary MAC and kill the orphans 😀 So in practice, I would move my garbage collection scripts into the database. Maybe I'll put some authentication in front of it and let others use it, my colleagues I mean. So nothing new at all.

  • @foglalt FOG doesn’t use any triggers or stored procedures for removing or adding, so if you can think of one to write that helps fix the issue for you, I say go for it. If you’re feeling kind, please post it here; what you write might allow me to apply a more direct fix to the core of FOG. I’m hoping, however, that this will be fixed in 1.6 due to how we’re approaching MAC addresses.

  • Since I now, unfortunately, regularly have to detect and kill hosts with a missing primary MAC: may I use stored procedures for this? Do they cause any problems for the FOG database? Will they be wiped by a FOG upgrade? I am trying to find a proper way for a colleague to deal with the issue without me when I am out of the office, and for this I am thinking about using stored procedures. (The same would be nice for other database maintenance commands we may use in FOG.)

    Could they be a source of problems, or may I use them?

  • @developers the pattern I thought I found earlier no longer holds with this 3rd log.

  • @wayne-workman But how? Anyway, sadly it doesn't change the mysterious disappearing 😞

  • @foglalt said in Disappearing hosts from host list:

    multicast is done with registered hosts, etc., or did I miss something?

    This thread suggests it’s possible:

  • @wayne-workman What do you mean by this? We do this because it makes mass actions easy (multicast is done with registered hosts, etc., or did I miss something?)

  • @foglalt So is it accurate to say “When an existing host’s MAC address is changed to something else, sometimes the primary MAC address is lost”?

    Also, you know you can image without registering, right?

  • One more thing about yesterday’s instance of the problem: during the host update, the new MAC was registered on the host (this time laci2, to use the name you will see in the log). When the task was being prepared (these hosts are in a group for mass deploy), my colleague realised that the group had fewer members than needed. Voilà, a missing host. So it disappeared after (or during) the host update.

  • @wayne-workman It is simply the following:

    • pc1 comes in for (re)installation; its MAC is registered on a “dummy-like host” (for example “laci1”).
    • The image is selected, the task is set to deploy, pc1 finishes and is turned off.
    • pc2 comes in for the same purpose, and host “laci1” gets its MAC overwritten (MAC field selected in the GUI, new MAC typed in, Update button clicked).

    The missing host is then detected by your script; before that, it was only noticed when pc3 came in for processing.

    This is why I asked before how others do mass cloning. My colleagues use this method because you don't need to rebuild cloning groups (and as long as the image isn't updated, you don't need to change the image name either). Another 5 PCs arrive, you put them on the table, plug in the cables, register the new MACs and launch the cloning. We normally don't keep hosts in the database, as you may have seen in our database before. We only have a few dedicated ones (like the image creator's machine and a few others).

  • @foglalt When you say the MAC is changed, are you changing it via the GUI or is this something that’s happening that is not supposed to happen? Please elaborate.

  • Tomorrow or later tonight I'll send you another one. Yes, it happened again. Maybe it will at least show the same pattern. By the way, the host is not renamed, but the MAC is changed. Every time!

  • @tom-elliott The timestamps are from whatever is set on his server. You see the 1-hour pattern on both occasions, though? It makes no sense to me. The only thing I can think of is a problem in a service that runs hourly. This is the script he’s running every minute via cron: https://github.com/FOGProject/fog-community-scripts/blob/master/troubleshootingTools/monitor-missing-primary-mac.sh