ablohowiak

1.3.5-RC 10 and working fine.

All clients are back on. I’ve already reduced the client check-in time, and the load was fine at the start of the school day.

The Image replicator service is still off.

ablohowiak

@Joe-Schmitt
RC11 has started out well. I did need to rerun the install and reboot the system before the client updater started working. The GUI remains responsive even after all the clients start up in the morning. After a few days I may try shortening the client communication period from 900 so that snapins run more promptly.

ablohowiak

@Tom-Elliott I did run the query with the correction. Most of our groups were already built this summer. RC11 has been very stable and responsive. I’ve reduced the client communication time from 900 to 300 with no impact on performance.
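As a rough illustration of what that change means for server load, here is a minimal sketch that estimates the average check-in rate. It assumes the 900 and 300 values are seconds and that check-ins are spread evenly over the period; the client count of 1500 is hypothetical and does not come from this thread.

```python
def checkins_per_minute(clients, period_seconds):
    """Average check-ins per minute if every client checks in once per period."""
    return clients * 60 / period_seconds


# Hypothetical fleet size, for illustration only.
clients = 1500
for period in (900, 300):
    print(period, checkins_per_minute(clients, period))
```

Under those assumptions, cutting the period to a third triples the average request rate, so it is worth watching the GUI responsiveness for a few days after the change.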

I think your team is in the “home stretch” with 1.3!

ablohowiak

Manually copied images to new storage nodes and added image entries, but when we try to use the old images we get this error:

We stood up a new Fog server and storage nodes because we didn’t have time to migrate everything at once or to clean up the old database. Now the old manually copied images produce errors about “No image files found that would match the partitions to be restored”.

I see that in our newly captured images a number of the d1.* files have content, while the same files are blank in the old images. Is there a way to fix the old images, or is deploying the old image and capturing it again my best option?
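To see exactly which metadata files differ, a minimal sketch like the following lists the zero-byte d1.* files in an image directory so an old image can be compared with a newly captured one. The helper name and the directory paths in the usage comment are assumptions for illustration. An empty file is not necessarily a fault: in the directory listing later in this thread, d1.original.swapuuids is 0 bytes.

```python
from pathlib import Path


def empty_metadata_files(image_dir):
    """Return the sorted names of zero-byte d1.* files in image_dir.

    The pattern "d1.*" matches files such as d1.mbr or d1.partitions,
    but not partition images such as d1p1.img.
    """
    return sorted(
        p.name
        for p in Path(image_dir).glob("d1.*")
        if p.is_file() and p.stat().st_size == 0
    )


# Example usage (paths are hypothetical):
# print(empty_metadata_files("/images/old-image"))
# print(empty_metadata_files("/images/new-image"))
```

Files that are empty in the old image but populated in the new one are the first candidates to look at.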

ablohowiak

It doesn’t appear to be a permissions issue. I’m not seeing an entire image as the problem, usually just the largest file, but it’s inconsistent as to which image or storage node will find that the “Files do not match on server”.

[07-06-18 8:25:24 am] * All files synced for this item.
[07-06-18 8:25:25 am] | 000-10-Golden: No need to sync d1.fixed_size_partitions file to FogAllis
[07-06-18 8:25:26 am] | 000-10-Golden: No need to sync d1.mbr file to FogAllis
[07-06-18 8:25:27 am] | 000-10-Golden: No need to sync d1.minimum.partitions file to FogAllis
[07-06-18 8:25:28 am] | 000-10-Golden: No need to sync d1.original.fstypes file to FogAllis
[07-06-18 8:25:28 am] | 000-10-Golden: No need to sync d1.original.swapuuids file to FogAllis
[07-06-18 8:25:29 am] | 000-10-Golden: No need to sync d1.partitions file to FogAllis
[07-06-18 8:25:30 am] | 000-10-Golden: No need to sync d1p1.img file to FogAllis
[07-06-18 8:32:15 am] | Files do not match on server: FogAllis
[07-06-18 8:32:16 am] | Deleting remote file: /images/000-10-Golden/d1p2.img
[07-06-18 8:32:16 am] * Starting Sync Actions
[07-06-18 8:32:16 am] | CMD:
lftp -e 'set xfer:log 1; set xfer:log-file "/opt/fog/log/fogreplicator.000-10-Golden.transfer.FogAllis.log"

drwxrwxrwx 16 fog root 4096 Jun 11 14:12 ..
-rwxr-xr-x 1 fog fog 3 Jun 11 10:59 d1.fixed_size_partitions
-rwxr-xr-x 1 fog fog 1048576 Jun 11 10:59 d1.mbr
-rwxr-xr-x 1 fog fog 190 Jun 11 10:59 d1.minimum.partitions
-rwxr-xr-x 1 fog fog 15 Jun 11 10:59 d1.original.fstypes
-rwxr-xr-x 1 fog fog 0 Jun 11 10:59 d1.original.swapuuids
-rwxr-xr-x 1 fog fog 9118461 Jun 11 11:00 d1p1.img
-rwxr-xr-x 1 fog fog 13976945113 Jun 11 12:12 d1p2.img
-rwxr-xr-x 1 fog fog 190 Jun 11 10:59 d1.partitions

But size isn’t the issue, because this is happening with the small files too.

[07-06-18 9:05:29 am] | Files do not match on server: 301Rack
[07-06-18 9:05:29 am] | Deleting remote file: /images/postdownloadscripts/fog.postdownload
[07-06-18 9:05:29 am] | postdownloadscripts: No need to sync fog.postdownload.orig file to 301Rack
[07-06-18 9:05:29 am] * Starting Sync Actions
[07-06-18 9:05:29 am] | CMD:
lftp -e 'set xfer:log 1; set xfer:log-file "/opt/fog/log/fogreplicator…transfer.301Rack.log";set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; mirror -c --parallel=20 -R --ignore-time -vvv --exclude ".srvprivate" "/images/postdownloadscripts" "/images/postdownloadscripts"; exit' -u fog,[Protected] 10.129.0.17
[07-06-18 9:05:29 am] * Started sync for Image postdownloadscripts
[07-06-18 9:05:29 am] | Replication already running with PID: 8510

The file isn’t deleted, and no transfer log file is ever created. My nodes aren’t all connected at the same speed so I’m going to try throttling the replication to see if it has any impact.
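Because it is inconsistent which node and file get flagged, tallying the mismatches over a longer stretch of the log may show a pattern. The sketch below is only an illustration: it pairs each "Files do not match on server" line with the "Deleting remote file" line that follows it, using the line format seen in the excerpts above. The function name is hypothetical, and the location of the full replicator log is something you would need to supply.

```python
import re
from collections import Counter

MISMATCH = re.compile(r"Files do not match on server: (\S+)")
DELETING = re.compile(r"Deleting remote file: (\S+)")


def tally_mismatches(lines):
    """Count (node, remote file) pairs that the replicator flagged."""
    counts = Counter()
    node = None
    for line in lines:
        found = MISMATCH.search(line)
        if found:
            node = found.group(1)  # remember the node until the delete line
            continue
        deleted = DELETING.search(line)
        if deleted and node is not None:
            counts[(node, deleted.group(1))] += 1
            node = None
    return counts


sample = [
    "[07-06-18 8:32:15 am] | Files do not match on server: FogAllis",
    "[07-06-18 8:32:16 am] | Deleting remote file: /images/000-10-Golden/d1p2.img",
    "[07-06-18 9:05:29 am] | Files do not match on server: 301Rack",
    "[07-06-18 9:05:29 am] | Deleting remote file: /images/postdownloadscripts/fog.postdownload",
]
for (node_name, path), count in tally_mismatches(sample).items():
    print(node_name, path, count)
```

To run it on a real log, pass an open file object instead of the sample list.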

ablohowiak

I’m not replicating the snapins, just images. Things worked okay when the storage group was small and I wasn’t replicating images across storage groups.

Now, as storage nodes are added, the images sometimes get there, but usually things get hung up on a node that neither deletes nor copies an image. The process loops before making it through all the storage nodes.
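One way to check independently whether a copy on a node really differs from the master is to fingerprint the same file on both machines and compare the results by hand. This is a minimal sketch and not the replicator's own comparison method, which this thread does not describe; the function name and the example path are for illustration only.

```python
import hashlib
import os


def fingerprint(path, chunk_size=1024 * 1024):
    """Return (size in bytes, SHA-256 hex digest) of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


# Example usage, run once on the master and once on the storage node:
# print(fingerprint("/images/000-10-Golden/d1p2.img"))
```

Hashing a multi-gigabyte partition image takes a while, so comparing the size first is a cheap initial check.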

ablohowiak

@jgallo the working branch didn’t appear to fix the issue. I’m observing exactly what you described. It makes adding a storage node to a group, or adding a new image, unreliable.

ablohowiak

I’ve installed the trunk version on the Master storage node. Is there a way to confirm the version on a storage node?

There’s still a replication issue. Seems that the d1p2.img is always the problematic file?

[07-02-18 9:18:03 am] | Files do not match on server: FogKennedy
[07-02-18 9:18:03 am] | Deleting remote file: /images/000-10-Golden/d1p2.img
[07-02-18 9:18:03 am] * Starting Sync Actions
[07-02-18 9:18:03 am] | CMD:
lftp -e 'set xfer:log 1; set xfer:log-file "/opt/fog/log/fogreplicator.000-10-Golden.transfer.FogKennedy.log";set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; mirror -c --parallel=20 -R --ignore-time -vvv --exclude ".srvprivate" "/images/000-10-Golden" "/images/000-10-Golden"; exit' -u fog,[Protected] 10.97.0.17
[07-02-18 9:18:03 am] * Started sync for Image 000-10-Golden
[07-02-18 9:18:03 am] | Replication already running with PID: 7307

The log file says that it’s deleting the file and starting a sync. There is no change in the timestamp on the destination node’s image file, and no transfer log file is being created on the master storage node.
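The "Replication already running with PID" line suggests checking whether that process still exists and what it is actually running. The following is a minimal sketch for a Linux master node; the helper names are hypothetical, and nothing here depends on FOG itself.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def command_line(pid):
    """Return the command line of a process from /proc (Linux only), or None."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as fh:
            raw = fh.read()
    except OSError:
        return None
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()


# Example usage with the PID reported in the log above:
# print(pid_is_running(7307), command_line(7307))
```

If the PID is alive and its command line is an older lftp mirror, that would be consistent with the new sync never starting and no new transfer log appearing; if the PID is gone, the replicator may be tracking a stale process.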

ablohowiak

Ubuntu 16.04 and CentOS 7, all nodes on 1.5.4

I’m having issues with the replication getting stuck on the same node.

Replication steps through postdownloadscripts: a number of nodes report no need to sync, but there is one that “needs it” every time even though the files are there. The problem is that it never syncs with any node past that one, even if I wait for days. Restarting the FOGImageReplication service lets me observe the behavior repeat.

I also have the same problem with an image, but it occurs on a different node than the one for postdownloadscripts.

If I disable the problematic storage nodes the replication process will move on to the other nodes.

ablohowiak

I was able to correct the issue.

ablohowiak

The PXE works in our production network.

ablohowiak

Yes, it appears there’s something not 100% right with my development VLAN. Though why the one model, the first one I tried, worked is still a head scratcher.

ablohowiak

CentOS 7, FOG 1.5.4

I’m having a TFTP timeout issue with a number of my HP EliteBook 840 G1 and G2 laptops, but the G4s are working with my new FOG server.

My old environment, Ubuntu 16.04 with FOG 1.4.4, doesn’t have this issue with these models.

I tried a different kernel and updated the BIOS, but no luck.

Any idea what I did wrong?
Thanks.
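To separate a network path problem from a laptop firmware problem, one option is to send a single TFTP read request from another machine on the same VLAN and see whether anything comes back. This is a minimal sketch of the standard TFTP read request (opcode 1, file name, transfer mode); the function names are hypothetical, and the server address and the boot file name undionly.kpxe in the usage comment are assumptions that should be replaced with the values from your DHCP configuration.

```python
import socket
import struct


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request: opcode 1, file name, NUL, mode, NUL."""
    return (
        struct.pack("!H", 1)
        + filename.encode("ascii") + b"\0"
        + mode.encode("ascii") + b"\0"
    )


def probe_tftp(host, filename, timeout=5.0, port=69):
    """Send one read request and classify the first reply from the server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (host, port))
        try:
            # The server answers from a new source port, so use recvfrom.
            reply, _ = sock.recvfrom(4 + 512)
        except socket.timeout:
            return ("timeout", None)
    if len(reply) < 4:
        return ("unexpected", None)
    opcode = struct.unpack("!H", reply[:2])[0]
    if opcode == 3:  # DATA block
        return ("data", len(reply) - 4)
    if opcode == 5:  # ERROR packet: 2-byte code, then a message
        return ("error", reply[4:].rstrip(b"\0").decode("ascii", "replace"))
    return ("unexpected", opcode)


# Example usage (address and file name are assumptions):
# print(probe_tftp("192.0.2.10", "undionly.kpxe"))
```

The probe never acknowledges the data block, so the server simply times out that transfer. A timeout from a machine on the development VLAN but a data reply from the production network would point at the VLAN rather than the laptops.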
