Multicast image issue with FOG 1.5.2

utilman

With 1.4.2 we never had any problems, but yesterday we tried to multicast multiple sets of 25 PCs and they all hung at the end of the multicast task. Even 12 PCs hang; 4 PCs, however, do not. The FOG VM was still configured with 1 CPU and 1GB of memory, so I increased it to 2 CPUs and 16GB of memory. It uses 3.5GB with 12 PCs. Storage is on a SAN, and the network speed at the FOG server is 40Gbit.

After 100% the web interface of the FOG server becomes unresponsive. I can only reboot the server through SSH, and then the PCs reboot automatically too. However, the PCs now do not join the domain.

Does anyone know a solution for this problem?

george1421

FOG 1.5.2 uses a new php engine instead of the default apache php engine. I want to see if this new engine is causing you some pain.

So for this first test, let's adjust a file setting and then restart the FOG services.

Edit this file /etc/php-fpm.d/www.conf
Look for a line that reads:
;pm.max_requests = 500
Uncomment that line and change the parameter to 2000 to make it look like this:
pm.max_requests = 2000
Save and exit the editor.
Restart php-fpm and apache.
systemctl restart php-fpm
systemctl restart httpd
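
The edit above can also be scripted. This is only a minimal sketch, assuming GNU sed; `set_max_requests` is a made-up helper name (not part of FOG or php-fpm), and the pool file path may differ on your distribution.

```shell
#!/bin/sh
# Sketch: enable pm.max_requests in a php-fpm pool file.
# set_max_requests is a hypothetical helper, not part of FOG or php-fpm.
set_max_requests() {
    conf="$1"    # path to the pool file, e.g. /etc/php-fpm.d/www.conf
    value="$2"   # requests a worker serves before it is respawned
    # Keep a backup next to the original before touching it.
    cp "$conf" "$conf.bak" || return 1
    # Match the directive whether or not it is commented out with ';'.
    sed -i -E "s/^[;[:space:]]*pm\.max_requests[[:space:]]*=.*/pm.max_requests = $value/" "$conf"
}

# Usage (as root), followed by the restarts described above:
#   set_max_requests /etc/php-fpm.d/www.conf 2000
#   systemctl restart php-fpm
#   systemctl restart httpd
```

The function only rewrites the one directive and leaves a `.bak` copy, so the change is easy to undo.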

If that doesn’t resolve your issue, then let’s switch back to the Apache PHP engine.
Depending on your distribution, you will need to look in /etc/httpd or /etc/apache2. Once you are in that directory, search for the file that contains 127.0.0.1:9000

Execute this command
grep -R -e "127.0.0.1:9000" *
That should find the config file for apache we need.
You should see a section that has

#SetHandler application/x-httpd-php
SetHandler "proxy:fcgi://127.0.0.1:9000"

Move the pound sign ( # ) from the first handler to the second so it looks like this.

SetHandler application/x-httpd-php
#SetHandler "proxy:fcgi://127.0.0.1:9000"

Save the config file
Restart apache
systemctl restart httpd
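
The handler swap above could be scripted roughly as follows. Treat it as a sketch under assumptions: GNU sed, a config file that contains exactly the two SetHandler lines shown, and the Apache PHP module already installed. `use_apache_php` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: switch an Apache config from the php-fpm proxy handler to the
# Apache PHP handler. use_apache_php is a hypothetical helper name.
use_apache_php() {
    conf="$1"   # the file found with: grep -R -l "127.0.0.1:9000" /etc/httpd /etc/apache2
    # Keep a backup so the php-fpm handler can be restored later.
    cp "$conf" "$conf.bak" || return 1
    # 1st expression: remove the '#' in front of the Apache PHP handler.
    # 2nd expression: put a '#' in front of the still-active proxy handler.
    sed -i -E \
        -e 's|^([[:space:]]*)#[[:space:]]*(SetHandler application/x-httpd-php)|\1\2|' \
        -e 's|^([[:space:]]*)(SetHandler "proxy:fcgi://127\.0\.0\.1:9000")|\1#\2|' \
        "$conf"
}

# Usage (as root), then restart Apache as described above:
#   use_apache_php /path/to/the/file/grep/reported
```

Running it a second time changes nothing, because the proxy line is then already commented out.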

I’m pretty sure that the first fix will address your issue though.

utilman

Hello george… thx for the very fast response.

My www.conf location is strangely enough: /etc/php/7.1/fpm/pool.d/www.conf

Changed pm.max_requests, rebooted the server, and tried again with 12 PCs.
The PCs waited at the “Erasing current MBR/GPT Tables…” screen for almost 10 minutes…
Why, I do not know… Is that a result of the change?

11 of the 12 PCs started multicasting, and after they finished the web interface was still
responsive and the 11 PCs disappeared from the task window and finished. The change
worked…! thx again

We will give it another try with more PCs next week.

george1421

@utilman I can’t explain the wait, other than that there is an issue with the 2 of the 12 that didn’t complete.

Since you have PHP 7.1 installed, the path to the file is a bit crazy. I should have had you search for it instead of relying on my memory of the path.
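
A search along these lines would find the pool file wherever the distribution puts it. It is only a sketch, assuming GNU grep; `find_fpm_pool_conf` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: list .conf files that mention pm.max_requests.
# find_fpm_pool_conf is a hypothetical helper name.
find_fpm_pool_conf() {
    root="${1:-/etc}"   # directory tree to search; /etc by default
    # -l prints only the names of matching files;
    # errors for unreadable files are discarded.
    grep -R -l --include='*.conf' 'pm\.max_requests' "$root" 2>/dev/null
}

# Usage:
#   find_fpm_pool_conf
```

On the server in this thread, the result would be expected to include /etc/php/7.1/fpm/pool.d/www.conf.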

What is at issue is a memory leak in the FOG code (FOG asks the OS for a block of memory and does not return it after the task is over). The developers are having a hard time finding this one line of code among the thousands of lines that make up the FOG server. What this “fix” does is restart each php-fpm worker process after it has responded to 2000 requests. This keeps the memory usage from ballooning out of control and keeps the UI responsive. It’s good practice to have this option on anyway. I’m told the next release of FOG (1.5.3) will have this feature enabled by default.
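
To see whether the workers are in fact growing during a task, something like the following could be used. It is a rough sketch, not a FOG tool: `sum_fpm_rss_kb` is a made-up helper, and it assumes the worker process names start with `php-fpm` (for example `php-fpm` or `php-fpm7.1`).

```shell
#!/bin/sh
# Sketch: total resident memory (KiB) of php-fpm processes.
# sum_fpm_rss_kb is a hypothetical helper; it reads "RSS COMMAND" lines on stdin.
sum_fpm_rss_kb() {
    # Add up column 1 (RSS) for every line whose command starts with php-fpm.
    # "sum + 0" prints 0 instead of an empty line when nothing matched.
    awk '$2 ~ /^php-fpm/ { sum += $1 } END { print sum + 0 }'
}

# Usage: print the total every 10 seconds while a task runs.
#   while sleep 10; do ps -eo rss=,comm= | sum_fpm_rss_kb; done
```

If the total keeps climbing during a multicast and drops after the workers are recycled, that would be consistent with the leak described above.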

TrialAndError

@george1421

This is worth reporting as a bug! As you already know a solution, you should report it.

(For me, only switching back to the Apache PHP engine helped. Otherwise the web interface did not respond, not even with normal imaging.)

george1421

@trialanderror said in Multicast image issue with FOG 1.5.2:

@george1421

This is worth reporting as a bug! As you already know a solution, you should report it.

The pm.max_requests = 2000 setting is a known fix and has already been addressed in the 1.5.3 working release.

(For me, only switching back to the Apache PHP engine helped. Otherwise the web interface did not respond, not even with normal imaging.)

Can you tell me more about what OS you have on your fog server and what you tried before switching back to the apache php handler? I would think that you would have more noticeable slowness with the apache-php engine over a dedicated php engine.

TrialAndError

@george1421
It is Ubuntu 16.04. (But I cannot answer more questions because I downgraded to 1.4.3.)

I would like to suggest publishing “known issues” like that obviously known multicast problem. It would be VERY helpful.

Tom Elliott

@trialanderror that’s not an obviously known issue, and it has been addressed. The known issues come and go as versions increment, and they aren’t all “common”. 1.5.2 brought in the use of php-fpm, but the need to set the requests to 2000 wasn’t known until well into the release. Even that isn’t strong enough to count as a known issue, as it only seems to have impacted a few people, not everybody.

Joe Gill

For those reading this… I had this same issue during a Multicast in 1.5.2. In my case, restarting the server did not bring things back to life. I upgraded to 1.5.3 and things came back online. I have not tried Multicasting since. I am Unicasting 8 right now and they are about done. I will try Multicasting a large group here next week and see.

Thanks for posting this!

Joe Gill

@Tom-Elliott
To update this… In my case, updating to 1.5.3 did not resolve this issue. I was able to Unicast one batch of 8 PCs… That worked. Then when I went to do a wipe task on 26, the interface locked up and the task did not complete.

Any suggestions?

Joe Gill

@Tom-Elliott
So my interface completely locked up. I restarted the server and had no luck at all getting to my interface. I ended up having to remove PHP completely and then let the FOG installer re-install PHP. My interface is back up. I have not run any tasks yet. I’m hoping it stays up long enough to get something done.
