Disappearing hosts from host list
-
Guys, I know this topic is old, but I don't want to create a new topic, as my issue now is exactly the same as the one in this topic. Back then we had a solution (“fix database”) for this issue, and I strongly hoped that the problem was only caused by some twisted database state, but now it is happening again.
ATM the server is up to date at 1.4.4, the server OS is Debian 9.3, and the database is practically brand new (I kept only 2-3 images from the old installation so I could build a new and flawless DB). And colleagues report that hosts disappear from the database from time to time.
As I now have the key to find and eliminate it, I can solve this, but… god, it is so frustrating that I can't see the true reason for it. The issue still arises the same way as before: a “dummy host” has a MAC and gets deployed. A new machine comes in, the original dummy is updated with the new MAC, and it is deployed again. And on the MAC update the host sometimes disappears! (The MAC is literally overwritten with the new one, which makes the previous MAC secondary, or what?) In the host/MAC pairing I see that the new machine has the actual MAC in the DB, but it has no primary MAC. And as it has only one MAC, it is “a host without a MAC” (as I cannot register a host without a MAC, maybe the GUI won't display such hosts).
We cooperated on this before; maybe with this post we can continue towards a solution, I hope.
-
@foglalt Are you sure it’s the same problem? Do the SQL commands I posted fix it still? I have some ideas on how to track down the issue.
-
@wayne-workman I am open to any idea. The issue is the same on my end, and I can still fix it with the previous methods you mention.
-
@foglalt Ok, give me a day or two - I’ll put together a monitoring script that will record when the problem occurs - the script will also grab the last 100 lines from apache access logs and error logs - and say the last 50 entries from the history table. That information will have the clues that we can use to find what’s causing this.
-
OK, just take your time. I am glad that you guys are so responsive in every kind of user interaction! I don't even get that moron on the forum claiming that fogproject is dead… We should thank you for your work, not complain in such a nonsensical way.
-
@foglalt I put together a script:
https://github.com/wayneworkman/fog-community-scripts/blob/master/troubleshootingTools/monitor-missing-primary-mac.sh
For future readers, this script will be here in the troubleshooting tools:
https://github.com/FOGProject/fog-community-scripts
It monitors the count of hosts that are missing a primary MAC. If the count goes above 0, it gets those host IDs, the last 100 lines of the apache access log, the last 100 lines of the apache error log, and the last 50 entries from FOG’s history table and dumps them to the file
/root/troubleshooting.log
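As a rough illustration of the check described here (this is not the linked script, which is a shell script run against the real FOG database), the idea of "find hosts with no primary MAC" can be sketched against a throwaway in-memory SQLite database. The table and column names (`hosts`, `host_macs`, `is_primary`) are hypothetical stand-ins chosen for this example and are not FOG's actual schema.

```python
import sqlite3


def hosts_missing_primary_mac(conn):
    """Return IDs of hosts that have no MAC row flagged as primary.

    Table and column names are hypothetical, not FOG's real schema.
    """
    rows = conn.execute(
        """
        SELECT h.host_id
        FROM hosts AS h
        LEFT JOIN host_macs AS m
               ON m.host_id = h.host_id AND m.is_primary = 1
        WHERE m.host_id IS NULL
        ORDER BY h.host_id
        """
    ).fetchall()
    return [row[0] for row in rows]


def build_demo_db():
    """Create a tiny demo database with one healthy and one broken host."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE hosts (host_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE host_macs (host_id INTEGER, mac TEXT, is_primary INTEGER);
        INSERT INTO hosts VALUES (4, 'ok-host'), (5, 'dummy-host');
        -- Host 4 has a primary MAC; host 5 only has a non-primary one.
        INSERT INTO host_macs VALUES (4, '00:11:22:33:44:55', 1);
        INSERT INTO host_macs VALUES (5, '66:77:88:99:aa:bb', 0);
        """
    )
    return conn


missing = hosts_missing_primary_mac(build_demo_db())
print(missing)  # [5]
```

A monitor built on this idea would run the query on a schedule and only collect logs when the returned list is non-empty.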
Once that log file is written to, the script will not write anything else.
Get the script onto your FOG server, and then you will need to set up this script to run as a cron task every minute on your FOG server. There are lots of tutorials if you google search cron or crontab, but generally the steps are as follows:
sudo -i
crontab -e
# This might ask you if you want to use vim basic or vim tiny, it doesn't matter which.
# Go into insert mode with:
i
# Paste in the below line:
* * * * * /root/monitor-missing-primary-mac.sh
# Or wherever you put the script.
# Leave insert mode with the escape key
esc
# Save and close in vim with
:wq
There are probably better tutorials out there on how to use Vim & create a cron job, but I put together this little Vim tutorial some years ago:
https://wiki.fogproject.org/wiki/index.php?title=Vi
If you need further instructions/clarification on how to set up the script, just ask. When the file
/root/troubleshooting.log
appears on your server, just look through it yourself first - remove any sensitive information. Maybe this file will even help you figure out the bug yourself. If not, you can share the file here with us and we can probably get a very good idea on how to reproduce this issue.
-
And again, impressive response time! Thanks again. OK, I will do the preparation and wait for the rabbit to jump out of its nest. If I understand correctly, if a log appears, it found something. Let's wait and hope. I will report back on the next hit. (Anyway, reading through your script gave me good ideas on how to do some things myself, nice work!)
-
@foglalt Do we have the rabbit yet?
-
No, not yet. During the weekend we do nothing, and our imaging comes in pulses. As we change PCs we create new imaging waves, so trust me, it will happen. (Why is the post marked solved?)
-
@foglalt said in Disappearing hosts from host list:
Why is the post marked solved?
Because we found a solution that fixed the problem earlier, and I marked that post as a solution. Now we are continuing in this same thread because it’s the same problem, trying to find the root cause to this issue. We can ask the @moderators to mark the thread as unsolved I suppose.
-
Oh, I see. I will let you know if the file appears.
-
Here it comes again. I now have a log file that your script created, and I think this is not the usual situation. Here is what happened now:
A task was deploying, and DURING that task the host data was updated with a new MAC address (I am aware that this is not the best usage; it was not me).
So I have the log, where should I put it? Mail, maybe? It has a lot of info about our setup, so maybe it is not best to have it on a forum ;) Let me know how to share it with you.
-
@foglalt You can sanitize the log and post it here, or you could email it to me and I can look over it - PM me for my email address. To reset the ‘trap’, you just need to delete/move/rename the log that appeared. If you choose to sanitize and post here, don’t just wholesale delete stuff. For example, instead of completely removing an address or hostname, replace it with a sanitized one. Use search & replace for this - and ensure the integrity of the log is still valid.
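To illustrate the "replace, don't delete" advice, here is a minimal sketch of a consistent search-and-replace for IPv4 addresses. It is only an assumption about how one might do it, not a tool from the FOG project: the function name `sanitize_ips` is made up for this example, and the placeholder range `192.0.2.x` is the address block reserved for documentation. Each distinct address always maps to the same placeholder, so the relationships between log lines stay intact.

```python
import re

# Matches dotted-quad IPv4 addresses such as 10.10.36.124.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sanitize_ips(text, prefix="192.0.2."):
    """Replace each distinct IPv4 address with a stable placeholder.

    Returns the sanitized text and the mapping that was applied.
    Note: with more than 254 distinct addresses the placeholders stop
    being valid addresses; this sketch does not handle that case.
    """
    mapping = {}

    def swap(match):
        ip = match.group(0)
        if ip not in mapping:
            mapping[ip] = f"{prefix}{len(mapping) + 1}"
        return mapping[ip]

    return IPV4.sub(swap, text), mapping


sample = (
    "10.10.36.124 - - GET /a\n"
    "10.10.36.124 - - GET /b\n"
    "10.10.36.7 - - GET /c\n"
)
clean, mapping = sanitize_ips(sample)
print(clean)
```

Keeping the returned mapping private lets you translate findings back to the real machines later.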
-
@wayne-workman Hehe, sanitize I will. Well, I already started reading it, but work came into the office in waves, so I postponed it until tomorrow. I will send it or post it here from work. BTW, I kept your address from the last DB transaction we did together.
-
@foglalt said in Disappearing hosts from host list:
here is what happened now:
A task was deploying, and DURING that task the host data was updated with a new MAC address (I am aware that this is not the best usage; it was not me).
No matter how it happened, we can learn from this and possibly improve FOG while doing so.
-
I realised that only the IP could have been sensitive, so I altered it; the rest is unmodified. I am curious whether the log helps this time (as I mentioned, it now has an extra step over normal daily work: the MAC was changed while the selected host was being deployed).
(As logs this long cannot be inserted here, and uploaded files can only be pictures, I am sending it to your email.)
-
Got it, I will look over it tonight. At any rate, reset the trap.
-
@foglalt I looked over the log.
What I am seeing is that hostID 5 lost its primary MAC address on February 28th at 09:29:01 your time.
Knowing that you set this check up to run every minute, that means this problem happened between 9:28 AM and 9:29 AM.
Good thing the script gets the last 100 lines of the apache access log; these relevant lines were nearly at the top of the output:
10.10.36.124 - - [28/Feb/2018:09:28:10 +0100] "POST /fog/management/index.php?node=host&sub=edit&id=5&tab=host-general HTTP/1.1" 302 726 "http://SERVER/fog/management/index.php?node=host&sub=edit&id=5" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/58.0"
10.10.36.124 - - [28/Feb/2018:09:28:10 +0100] "GET /fog/management/index.php?node=host&sub=edit&id=5 HTTP/1.1" 302 698 "http://SERVER/fog/management/index.php?node=host&sub=edit&id=5" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/58.0"
10.10.36.124 - - [28/Feb/2018:09:28:10 +0100] "GET /fog/management/index.php?node=host HTTP/1.1" 200 2848 "http://SERVER/fog/management/index.php?node=host&sub=edit&id=5" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/58.0"
One POST, two GET requests. I’m going to assume the POST request caused the primary mac to be disassociated from the host.
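The narrowing-down done here (problem detected at 09:29:01, check runs every minute, so look for POST requests in the preceding minute) can be sketched as a small filter over access log lines. This is only an illustration under assumptions: the function name `posts_before` is made up, the regular expression covers just the leading fields of the log format shown above, and the sample lines have their referrer and user agent shortened to "-".

```python
import re
from datetime import datetime, timedelta, timezone

# Leading fields of an apache access log line: client, timestamp, request, status.
LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def posts_before(log_text, detected_at, window=timedelta(minutes=1)):
    """Return (timestamp, client, path) for POSTs within `window` before detection."""
    hits = []
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m or m.group("method") != "POST":
            continue
        # %b expects English month abbreviations (default C locale).
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if detected_at - window <= ts <= detected_at:
            hits.append((ts, m.group("client"), m.group("path")))
    return hits


BASE = "/fog/management/index.php"
STAMP = "10.10.36.124 - - [28/Feb/2018:09:28:10 +0100]"
sample_log = "\n".join([
    f'{STAMP} "POST {BASE}?node=host&sub=edit&id=5&tab=host-general HTTP/1.1" 302 726 "-" "-"',
    f'{STAMP} "GET {BASE}?node=host&sub=edit&id=5 HTTP/1.1" 302 698 "-" "-"',
    f'{STAMP} "GET {BASE}?node=host HTTP/1.1" 200 2848 "-" "-"',
])

# Detection time from the thread: February 28th, 09:29:01, UTC+1.
detected = datetime(2018, 2, 28, 9, 29, 1, tzinfo=timezone(timedelta(hours=1)))
hits = posts_before(sample_log, detected)
for ts, client, path in hits:
    print(ts.isoformat(), client, path)
```

On the three sample lines only the POST to the host edit page for id=5 is reported, which matches the request singled out in this post.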
I do not see any relevant logs from the history table during this time, and the apache error log is basically totally silent.
So now we have something to go on. I’ll try to replicate the problem this weekend but I do have a very busy weekend so I might not get to it. I am very confident that the primary MAC of this host is being either deleted or disassociated during a host information update via the GUI - on a POST request.
@Foglalt if you can attempt to replicate this problem via editing & saving host information and figure out exactly what’s causing it, it would help. @developers any insight or help in isolating the issue further would be appreciated.
-
If I had to guess, this is due to how FOG does MAC address updates. Manually entering a new MAC in the primary MAC field should cause the new MAC to become primary and turn the original MAC address into an associated MAC. I haven’t played too much with this action, however, as it can become extremely difficult to manage. I mean, we’re having to check four very different things at the same time, not including the primary check itself (mac, pending, ignore client, ignore image). In a future version, and possibly 1.5.1, this will be handled, I think, much better. My intent is to move all MAC addresses to their own tab. Within that tab you have a single text entry to insert new MACs, then you have a table that will present the primary, pending, image ignore, and client ignore. Getting the logic designed for 1.5.x is a bit more difficult, but I do have this already coded for 1.6 (nowhere near ready for even alpha testing yet, though much closer than most may think).
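The intended behaviour described above (the newly entered MAC becomes primary, the previous primary is kept as an associated MAC) can be written down as a small sketch. This is not FOG's code; the function name `set_primary_mac` and the list-of-dicts representation are assumptions made purely for illustration. The point is the invariant: after an update, a host should always have exactly one primary MAC.

```python
def set_primary_mac(macs, new_mac):
    """Return a new MAC list in which `new_mac` is the only primary entry.

    `macs` is a list of dicts like {"mac": "...", "primary": bool}.
    Existing entries are kept as associated (non-primary) MACs.
    """
    new_mac = new_mac.lower()
    # Demote every existing entry, including the old primary.
    updated = [{"mac": m["mac"].lower(), "primary": False} for m in macs]
    for entry in updated:
        if entry["mac"] == new_mac:
            # The MAC was already associated with the host: promote it.
            entry["primary"] = True
            break
    else:
        # Unknown MAC: add it as the new primary.
        updated.append({"mac": new_mac, "primary": True})
    # Invariant: exactly one primary MAC per host.
    assert sum(e["primary"] for e in updated) == 1
    return updated


before = [{"mac": "00:11:22:33:44:55", "primary": True}]
after = set_primary_mac(before, "66:77:88:99:AA:BB")
print(after)
```

The failure reported in this thread corresponds to ending up with a MAC row for the host but no entry flagged as primary, which is exactly the state this invariant rules out.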