WIN10 Multicast Imaging Issues

Joe Gill

@george1421
Well, I’m taking off for the weekend. This can wait until next week. Re-imaging was not successful: it still fails via multicast but succeeds via unicast.

I’ll touch base with FOG on Monday.

Thanks!

Joe Gill

@george1421
I wanted to report in on my findings… I tested things again and I can indeed push images via Unicast with my current image. I am currently pushing a lab of computers now. Unfortunately this isn’t very efficient.

So I have set up a Windows Server 2016 with MDT and ADK installed on it. I am in the process of creating a golden image with that… I’m hoping by mid-week I will have a new image to use.

Let me know if you’d like to see any log files or if you want to remote in.

Thanks!

george1421

@joe-gill So just multicasting is not working as you need it? (sorry too many threads).

Joe Gill

@george1421

That is correct.

george1421

@joe-gill OK between imaging I want you to test something for me.

Let’s disable php-fpm (you’re on FOG 1.5.4, right?). If so, let’s check whether it’s php-fpm causing the problem.

From the /etc directory search for the php-fpm handler with
grep -R -e ":9000" .
That should locate the correct configuration file. It should be in /etc/httpd or /etc/apache2 depending on your configuration. I suspect

Look for lines that look like this

#SetHandler application/x-httpd-php
SetHandler "proxy:fcgi://127.0.0.1:9000"

Move the hash mark so the lines read like this

SetHandler application/x-httpd-php
#SetHandler "proxy:fcgi://127.0.0.1:9000"

Save and exit your editor.
Now restart apache to reload the updated configuration
systemctl restart httpd for RHEL based systems
systemctl restart apache2 for Debian based systems

I’m also suggesting that you create the info.php page as described here so we can test to see when php-fpm is running with apache: https://forums.fogproject.org/topic/12030/fog-using-a-huge-amount-of-memory-and-failing/11
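A quick way to confirm which handler Apache will actually use after the edit is to list only the SetHandler lines that are not commented out. This is a minimal sketch: `active_handlers` is a hypothetical helper name, and `$conf_file` stands for whatever file the grep above located.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG or Apache): print the SetHandler
# lines in a config file that are NOT commented out.
active_handlers() {
    # $1 = path to the Apache config file that the grep for ":9000" located
    grep -E '^[[:space:]]*SetHandler[[:space:]]' "$1"
}

# Usage, assuming conf_file holds the path you found:
#   active_handlers "$conf_file"
```

If the only line printed is the application/x-httpd-php one, the php-fpm proxy handler is disabled in that file.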

Joe Gill

@george1421

I believe I have already done this.

0_1528740738265_3b5a2eeb-ebd8-4a32-a530-55ca64674a83-image.png

george1421

@joe-gill Well then it’s pretty clear that php-fpm isn’t causing the multicasting issue. (nuts). Does nothing of any value show up in the multicasting logs when things go sideways?

Joe Gill

@george1421
I can post any logs, if you’d like. Multicast does complete. The image just does not function. I was out sick the last couple days. I’m back at it today. I’m hoping to get my new image done here this week. We’ll see.

Joe Gill

@george1421

The multicast logs look normal.

george1421

@joe-gill So is imaging working correctly now with apache-php? If so I want to break it to find out what is going sideways.

Joe Gill

@george1421
Multicast fails on both Apache-PHP and FPM.

Tom Elliott

@joe-gill Mind installing the working branch to see if it helps with multicast? I’d also recommend trying the rendezvous option for multicast. This can be set up in FOG settings. Set the address to the IP of your FOG server, then as a failsafe restart the multicast manager service and then the clients.

Joe Gill

@tom-elliott
Thanks! Will do! I’ll report back. I will try a few labs today and see what happens.

Joe Gill

@tom-elliott
@george1421

So… I tasked 5 machines for a test batch. Multicast worked on those 5 just fine. The image seems to be running fine. The task session was extremely fast. They were finished in less than 8 minutes.

I tasked 30 more… and the interface locks up. I cannot get to the interface now at all.

My server is running Debian 9.4. It ran fine using Apache. Now that we are back to FPM, the interface is locking up tight.

Thanks guys! I’ll be around today until noon MST, then again tomorrow 7 - 3 MST.

Joe Gill

I restarted PHP-FPM and the interface came back up.

Joe Gill

Well I think I may have something for you guys now.

I had a batch of clones multicasting after I restarted the PHP-FPM services when things locked up… Some of the clones in the group did not start. Some did. The ones that started hung at this screen.

Then after a bit of time, I noticed the interface was locked up once again… So I restarted PHP-FPM services and got the following message…

Let me know if you need any log files. Thanks!

Joe Gill

@Tom-Elliott
@george1421

This might help with the Debian 9 issue.

0_1528994080142_Screenshot from 2018-06-14 10-33-30.png

Looks like the pm.max_children setting has been reached. That is when the server interface locks up.

Tom Elliott

@joe-gill can you try setting max children to say 200 and restart php fpm

Joe Gill

@tom-elliott
I am playing with it right now. Based on having 4 GB of RAM allotted for the server, I came up with these values from another website about php-fpm.

I reset the settings to the following:

pm.max_children = 40
pm.start_servers = 15
pm.min_spare_servers = 15
pm.max_spare_servers = 25
pm.max_requests = 500

So far PHP-FPM hasn’t bombed out. I will report back after I run a few more things.
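For anyone following along, the usual rule of thumb for sizing pm.max_children is the RAM you can spare for php-fpm divided by the average size of one worker. Here is a minimal sketch of that arithmetic. The helper name `estimate_max_children` is hypothetical, and the 1024 MB reserve and 75 MB per worker are illustrative assumptions only, not measurements from this server.

```shell
#!/bin/sh
# Hypothetical helper: estimate pm.max_children from available memory.
# Arguments: TOTAL_MB RESERVED_MB AVG_WORKER_MB
estimate_max_children() {
    total_mb=$1        # RAM allotted to the server
    reserved_mb=$2     # RAM kept back for the OS, database, etc. (assumption)
    avg_worker_mb=$3   # average RSS of one php-fpm worker (assumption)
    if [ "$avg_worker_mb" -le 0 ]; then
        echo "average worker size must be positive" >&2
        return 1
    fi
    # Integer division rounds down, which errs on the safe side.
    echo $(( (total_mb - reserved_mb) / avg_worker_mb ))
}

# 4 GB server, 1 GB reserved, 75 MB per worker (illustrative numbers)
estimate_max_children 4096 1024 75
```

Swap in a measured average worker size before trusting the result.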

Thanks!

george1421

@joe-gill Excellent find.

OK we can start debugging (i.e. better understanding what is going on) from here.

While your multicast is running with 20-30 systems, I need you to run this command on the FOG server:
ps -ylC php-fpm --sort:rss
It will print out all of the php-fpm processes. We are interested in the RSS column. In your mind, just estimate the average amount of memory consumed.

FWIW: ps --no-headers -o "rss,cmd" -C php-fpm | awk '{ sum+=$1 } END { printf ("%d%s\n", sum/NR/1024,"Mb") }' will average the memory consumed by all worker threads.

Once imaging is done stop php-fpm and check the free ram and record that value.

Report back the number.
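The averaging one-liner above divides by the number of matching processes, so it errors out if ps finds none. A slightly more defensive sketch is below; `avg_rss_mb` is a hypothetical helper name. One assumption to check: ps -C matches the exact command name, and on some distributions the php-fpm binary carries a version suffix, so the name passed to -C may need adjusting.

```shell
#!/bin/sh
# Hypothetical helper: read "rss cmd" lines (RSS in KB) on stdin and
# print the average RSS in MB, or a notice if nothing matched.
avg_rss_mb() {
    awk '{ sum += $1 }
         END {
             if (NR > 0) printf "%d MB\n", sum / NR / 1024
             else print "no matching processes"
         }'
}

# Usage on the FOG server (adjust the process name if yours differs):
#   ps --no-headers -o rss,cmd -C php-fpm | avg_rss_mb
```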

The next test is I want to see if there is a direct correlation between the number of php-fpm worker threads and the number of hosts in your multicast pool. With your new numbers you will need more than 15 hosts in your deployment queue to see any change in the number of worker threads.

You can probably drop back the start_servers, min_spare_servers back to 5, and then drop your max_spare_servers back to 8-10.
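One thing to watch when lowering these values: as I understand it, php-fpm in dynamic mode expects min_spare_servers <= start_servers <= max_spare_servers <= max_children and will complain at startup otherwise, so change the values together. A minimal sketch of that sanity check follows; `check_pool` is a hypothetical helper, not a php-fpm tool.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check dynamic pool settings before restarting.
# Arguments: MIN_SPARE START_SERVERS MAX_SPARE MAX_CHILDREN
check_pool() {
    min_spare=$1; start=$2; max_spare=$3; max_children=$4
    if [ "$min_spare" -lt 1 ] ||
       [ "$start" -lt "$min_spare" ] ||
       [ "$start" -gt "$max_spare" ] ||
       [ "$max_spare" -gt "$max_children" ]; then
        echo "invalid"
        return 1
    fi
    echo "ok"
}

# Suggested values: min_spare 5, start 5, max_spare 10, max_children 40
check_pool 5 5 10 40
```

For example, lowering start_servers to 5 while leaving min_spare_servers at 15 would fail this check.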

The idea is that the worker threads are dynamic. Starting up with 5 is not a bad number, and setting min spare to 5 will leave 5 running even if there are no requests coming in.

There is a default timer: if a worker thread is not awakened within 10 seconds, the php-fpm management process will kill it off, returning its memory to the system. The management process keeps killing off stagnant workers that have sat around for 10 seconds until min_spare_servers is reached. In your case, any workers above 25 will be killed immediately to get back to 25 spare threads. The point of all of this is a balance between idle workers that do nothing but consume memory and keeping system resources free.

My suspicion is that there is a direct correlation between multicast clients and worker threads (1:1). Setting the number of max_children too high might starve your FOG server of ram.
