FOG 1.40 - Network traffic does not stop

wanderson

Since I activated the Location plugin, the network traffic never stops, even though there is no image replication or snapin replication in progress. What could be causing it?

0_1532546608249_2006ce8e-8389-401a-b5ee-2446fde8dadd-image.png

wanderson

35 nodes

Tom Elliott

@wanderson looks like you have 2 active tasks.

wanderson

Where do I see these tasks?

0_1532551288307_e0cc3314-59fe-4d92-8410-4603e61088a7-image.png

Dahrell

In the first picture you posted, Storage Group Activity showed two active tasks at the time of posting. Those tasks must have completed before you checked Active Tasks.

wanderson

They still show as active. They never finish, do they?

0_1532571652349_fa903c1d-2150-4e28-8e57-3f632cb79aa4-image.png

Tom Elliott

@wanderson So you likely need to clean the tasks table. This can be done with mysql. You use the SQL statement:

UPDATE `tasks` SET `taskStateID` = '4' WHERE `taskStateID` NOT IN ('4','5');

Hopefully that will help with the storage group activity issue.
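
If it helps, a read-only count before and after the UPDATE shows whether any tasks are still stuck. This is only a sketch: the database name `fog` and the MySQL user `root` are assumptions about a default install, and the helper name `count_open_tasks` is made up for illustration. Only the `tasks` table and the `taskStateID` column come from the statement above.

```shell
# Assumptions: default database name "fog" and MySQL user "root"; adjust to your install.
FOG_DB="fog"
FOG_DB_USER="root"

# Read-only query: counts the tasks that the UPDATE statement would touch.
OPEN_TASKS_SQL="SELECT COUNT(*) FROM tasks WHERE taskStateID NOT IN ('4','5');"

# Hypothetical helper: prompts for the MySQL password and prints the count.
count_open_tasks() {
    mysql -u "$FOG_DB_USER" -p "$FOG_DB" -e "$OPEN_TASKS_SQL"
}
```

Run `count_open_tasks` once before the UPDATE and once after it; the second run should report 0.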

As to what’s transmitting data, I have no idea. As you have so many nodes, how many of those nodes are masters? As far as I can tell, your FOG server can only see the TRE-TO node. But transmit shouldn’t be too high. I mean, it is polling the server every so often to update, which can use bandwidth, but 5 Mbps is a bit much.

Try stopping the replication services to see if your transmit settles down.

systemctl stop FOGImageReplicator FOGSnapinReplicator FOGSnapinHash

(SnapinHash is a program that just gets the hash of snapins, but in 1.4, I believe, if the snapin is not local it will use FTP to gather the hash, which could also be using your bandwidth.)
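
A sketch of the full stop, check, and restart cycle, so the services are not left off once the test is done. The service names are the ones from the command above; the function names are only illustrative, and `sudo` assumes you are not already root.

```shell
# Service names taken from the stop command above.
FOG_REPL_SERVICES="FOGImageReplicator FOGSnapinReplicator FOGSnapinHash"

# $FOG_REPL_SERVICES is left unquoted on purpose so it splits into three unit names.
stop_replication()  { sudo systemctl stop $FOG_REPL_SERVICES; }
check_replication() { systemctl is-active $FOG_REPL_SERVICES; }
start_replication() { sudo systemctl start $FOG_REPL_SERVICES; }
```

Call `stop_replication`, watch the transmit graph for a few minutes, then call `start_replication` to put things back. `check_replication` prints one state line per service.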

For what it’s worth, I’d highly recommend updating your FOG Server. While there are some known issues in 1.5.4, even with those issues upgrading should greatly improve your experience with FOG.

wanderson

I stopped the services and cleaned the 2 tasks, but the problem persists.

0_1532605431253_3e0868ff-cdea-436f-9df0-87e556e673f5-image.png

wanderson

There is only one master node in the TRE-TO group.

0_1532605653452_7b267393-ebfe-4498-9df1-492f59125a23-image.png

Wayne Workman

@wanderson That could be chatter from the FOG Clients you have deployed. You could test this by turning down the checkin time in FOG Configuration.

Also, there’s a Linux package called iftop you can install to see a visual graph of network traffic and the IPs involved on the CLI.
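
For example, something like this shows per-connection rates with raw IPs and ports. The interface name `eth0` is an assumption (check `ip link` for yours), and `watch_traffic` is just an illustrative wrapper. iftop needs root to capture packets.

```shell
# Assumption: the FOG server's network interface is eth0.
IFACE="eth0"

# -n: do not resolve hostnames, -P: show ports, -i: interface to listen on.
watch_traffic() {
    sudo iftop -n -P -i "$IFACE"
}
```

Run `watch_traffic` while the transmit rate is high and note which remote IPs sit at the top of the list. That should show whether the traffic is going to client machines or to the storage nodes.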

Tom Elliott

@wayne-workman These are all transmit. If it were “chatter” it would likely be on the receive side. I suppose on the transmit side it could be old snapins trying to download information. (Similar to the task issue that was present.)

@wanderson Please also consider that node checking is done in rolling curl style (limited to 5 per try, I suppose). What this means is that 1.4.0 would check all nodes to find their “up” status. If I remember correctly, 1.4.0 would constantly check the nodes on each bandwidth check cycle. These requests (transmit) could be causing the usage. Are you seeing something wrong with the network because of this?

Tom Elliott

Also consider that the information used to determine the bandwidth rate was not very accurate in 1.4. In particular, the time span that was used was based on each cycle check. (2 minutes = 120 samples of information, at a one-second check time.) This, while seemingly correct, was highly inaccurate, as the steps involved to send, obtain, process, and return the data often took more than 1 second to complete. So one iteration to the next may have spanned around 4-5 seconds (or more with many, many nodes on the bandwidth chart).

Why is this important? Well, imagine nothing is returned for the first iteration, and then a change of 7000000 bytes shows up in the next 4-5 seconds. This would be reported as a 1-second rate of 7Mbps, whereas the actual span of time was likely around 7-8 seconds, making the real rate likely around 1 Mbps or less.
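
To make the arithmetic concrete, here is a small sketch that divides the same byte count by the assumed interval and by the real one. It is not FOG's actual code; `rate_mbytes` is a made-up helper, and it reports megabytes per second (1 MB = 1,000,000 bytes), which is the unit that lines up with the 7 and 1 figures above. In megabits the values would be eight times larger, but the 7:1 overstatement is the same.

```shell
# rate_mbytes BYTES SECONDS: average rate in megabytes per second.
rate_mbytes() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", bytes / secs / 1000000 }'
}

delta_bytes=7000000   # change in the transmit counter between two samples

rate_mbytes "$delta_bytes" 1   # sample assumed to span 1 second: prints 7.00
rate_mbytes "$delta_bytes" 7   # sample really spanned about 7 seconds: prints 1.00
```

The counter change is identical in both calls; only the divisor differs, so a graph that assumes one second per sample overstates the rate by however many seconds the sample really took.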

Hopefully this gives some understanding.

wanderson

When I disable the location plugin the network traffic stops.

Traffic started after I activated this plugin.

0_1532624647772_80ba9f79-69fe-496b-acdc-a41713b3ef77-image.png

Tom Elliott

@wanderson The location plugin, by itself, doesn’t transmit or receive any data. It is just a way to tell hosts where to get information from. This can include init/bzImage, images, and snapins. So unless some of your machines are stuck in a boot loop trying to get data from one place to another and failing, this is unlikely to be the direct issue. But again, I still recommend updating from 1.4.0 to 1.5.4, as many issues present in 1.4.0 may already be addressed.

wanderson

@tom-elliott, OK, I will upgrade to FOG 1.5.4.

Wayne Workman

@tom-elliott said in FOG 1.40 - Network traffic does not stop:

These are all transmit. If it were “chatter” it would likely be on the receive side

Consider also - the data returned from the FOG Server every time the FOG Client makes a request…
A very simple and non-technical change via the web gui for the client checkin time would either rule it out or identify it as the source.

wanderson

Isn’t this high?

What can I do to reduce this traffic?

0_1532711439215_8d67e8ad-1645-487e-8c06-5d9ecfd20978-image.png

wanderson

In less than 1 hour it transmits more than 1 GB. It started after I put all the nodes in the same group and activated the Location plugin. Even version 1.5.4 has the same problem.

HELP PLEASE!!!

0_1532716157225_4264b2ed-8e05-4735-babf-15c778095f85-image.png

wanderson

When I disable the Location plugin, network traffic falls to 1 Mbps.

0_1532716774427_9dc9fe6b-e4d0-4f6a-872f-edcaa7d89c67-image.png

wanderson

With the TRE-TO group (35 nodes) disabled, traffic is down.

0_1532717126244_bf812ded-97ac-47aa-843a-723b015064d2-image.png
