Issues with Fog 1.5.2
-
@george1421 Similar, but not quite the same. I did stumble upon that post but didn’t know if that fix was right for me. Before I did the fix from here: https://forums.fogproject.org/topic/10006/ubuntu-is-fog-s-enemy
I could not access the web interface at all, even if I put /fog at the end; it was just a blank page with “cannot connect to database.” I ran the fix and all was well, or so it seemed. The client machines can, and do, still connect to the FOG server, and I can access the GUI. It’s just that when I’m unicasting to the machines it’s like FOG freezes. I say FOG specifically because the computer itself is still running fine. This doesn’t happen with 1.5.0, though.
-
@george1421 If you’d like me to try it though I can do that
-
@k-hays I would like you to check the website file for FOG. The issue also manifests if you just go to the FOG web server's root directory: it's not redirected to /fog, but displays some text like that listed in the post.
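If it helps, one quick way to check that redirect from another machine on the network is with curl (substitute the real server address for the placeholder; this assumes the server answers on plain HTTP):
# Fetch only the headers for the web root; a healthy install should redirect to /fog/
curl -I http://<fog-server-ip>/
# The FOG page itself should then return an HTTP response rather than a blank page
curl -I http://<fog-server-ip>/fog/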
-
@george1421 This is what that file looks like post upgrade
-
@k-hays OK, good, then you don’t have the other issue we’ve been chasing.
OK, I see your FOG server is at 10.100.0.253 and your PXE-booting client is at 10.10.0.232, so they appear to be on different subnets. Is that accurate? My intuition is telling me something is a bit strange here.
If the subnets are correct, do you have routing set up between the two subnets? Are there any screening routers (i.e. firewalls) between them?
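As a rough sketch of what to check from one of the clients on the 10.10.0.x subnet (assuming standard Linux tools; adjust if you're testing from Windows):
ping -c 4 10.100.0.253      # basic reachability to the FOG server
traceroute 10.100.0.253     # shows the hops, i.e. any routers/firewalls sitting between the subnets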
-
@george1421 Yeah, we have 6 different buildings with different IP ranges. This building is the high school, with a 10.10 IP range. The 10.100 range is also at the high school, but it's in our server room. We have this all set up to work, as we've been using FOG for a few years now. We do have routing set up between subnets, but no screening routers.
-
@k-hays OK, so you've confirmed that from the 10.10.0.x subnet you can ping the FOG server? If so, then it may be that double slash in the URL after fog. If that is the case, please check the FOG settings as well as the storage node configuration to see if you have a trailing slash.
-
Welp, I goofed. I'm going to have to get that website file again for you because apparently I forgot to reinstall 1.5.2 before getting the file. If there are any differences between the two I will let you know. As far as your most recent question: yes, we can ping from that subnet (the clients also do still connect, see second picture). There was not a second / in the storage node, but there was one under TFTP PXE KERNEL DIR.
-
Due to hardware changes and accompanying issues (and time constraints) we are just reloading our FOG server on a different machine. Sorry for the delay, summer is our busiest time and of course when we run into the most issues :,D Thank you though!
-
@K-Hays So is this solved from your point of view?
-
@sebastian-roth Yessir!
-
Welpppp, I suppose not. We were running this server on a Hyper-V and decided to just try out two separate options. We had a new computer we were going to try and load it on, as well as setting up a new Hyper-V with Debian instead. Both fresh installs had the same error when we tried to test either of them. I went ahead and checked the file that George mentioned prior (on the Hyper-V server) and this is the outcome.
-
@george1421 I adjusted the file to match what you put in this post: https://forums.fogproject.org/topic/11797/updated-from-fog-v1-50-to-v1-52-issue/25
I'll test soon. Any other ideas?
-
@k-hays Can you deploy an image to a single client without issue?
This seems to me like an issue with PHP-FPM getting overloaded.
On Debian, the pm.max_children directive is often stuck at the default of 5, which is generally too low for most people (40 is a good starting point for testing).
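As a rough sketch of what that looks like on a Debian-based install (the PHP version in the paths below, 7.0, is an assumption; adjust to whatever your system actually runs):
# Show the current value in the FPM pool config
grep -n "pm.max_children" /etc/php/7.0/fpm/pool.d/www.conf
# Raise it to 40 and restart the FPM service so the change takes effect
sudo sed -i 's/^pm.max_children = .*/pm.max_children = 40/' /etc/php/7.0/fpm/pool.d/www.conf
sudo systemctl restart php7.0-fpm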
-
@quazz Yes, a single client works fine. I don't know the specific number at which it starts to fail, but in a lab we image anywhere from 20 to 35-ish computers, and it seems to fail whenever we try those numbers. The issue also persisted when we started with Ubuntu.
-
@k-hays There are currently some known issues with PHP-FPM settings/getting overloaded, particularly on Debian-based systems.
I definitely recommend checking the pm.max_children value (not 100% sure where it is on Debian).
You should be able to check the PHP-FPM logs in the WebUI to see if it's been complaining about not having enough max_children, or about memory exhaustion or timeouts, I think.
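If the WebUI log viewer doesn't show anything, the same information should be in the FPM log on the server itself; the paths below are guesses based on a typical Debian/Apache layout, so adjust for your PHP version:
# FPM warns here when the pool hits its child limit
grep -i "max_children" /var/log/php7.0-fpm.log
# Memory exhaustion and timeouts tend to land in the web server's error log instead
grep -iE "memory|timeout" /var/log/apache2/error.log | tail -n 20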
-
@quazz I’ll go digging. Any ideas?
-
@k-hays Try
grep -irl pm.max_children /etc    # recursively lists any file under /etc that mentions the directive
-
@quazz OK, I found it! Now, what do you think the max I should put there is? Sometimes we might image up to two, maybe three labs at a time. Would 90 be a stretch?
-
@k-hays It depends on how many resources your server has available to it.
We haven't done a ton of testing on exact numbers. The right value will depend on the amount of RAM and how much memory the average PHP-FPM worker uses.
Start with 40 as a safe value and go from there, imo.
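If you want to sanity-check a bigger number like 90 before committing to it, one rough approach (assuming the Debian process name php-fpm7.0; it may just be php-fpm on other setups) is to measure the average worker size and divide it into the RAM you can spare:
# Average resident memory per PHP-FPM worker, in MB
ps --no-headers -o rss -C php-fpm7.0 | awk '{sum+=$1; n++} END { if (n) printf "%d workers, avg %.0f MB each\n", n, sum/n/1024 }'
# Then roughly: pm.max_children = (RAM you can dedicate to PHP, in MB) / (avg MB per worker)
free -m    # shows total and available memory in MB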