This is not a technical problem but maybe someone can give me an idea. I need to run a script as root, on the server, after the multicast is done.
Server OS is Ubuntu 16.04,
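One possible approach, as a minimal sketch: poll for the udp-sender process and run the script once it has been gone for a few checks in a row. This assumes that "multicast is done" can be read as "no udp-sender process is left", which is my assumption and not something FOG documents. The function names, the poll interval and the script path in the usage note are all hypothetical.

```python
import subprocess
import time


def sender_running():
    """Return True if a udp-sender process exists (pgrep exits 0 on a match)."""
    result = subprocess.run(["pgrep", "-x", "udp-sender"],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0


def wait_for_multicast(is_running=sender_running, poll=10, idle_polls=3,
                       sleep=time.sleep):
    """Block until a sender was seen and then stayed absent for idle_polls checks.

    Several idle checks are required because FOG starts one udp-sender per
    partition image, so there can be a short gap between two senders.
    """
    seen = False
    idle = 0
    while True:
        if is_running():
            seen, idle = True, 0
        elif seen:
            idle += 1
            if idle >= idle_polls:
                return
        sleep(poll)


def run_after_multicast(script):
    """Wait for the multicast session to end, then run the given script."""
    wait_for_multicast()
    subprocess.run([script], check=True)
```

Started as root before the task is created (for example with sudo), a call such as `run_after_multicast("/root/after-multicast.sh")` would do the waiting and then launch the script; that path is only a placeholder.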
@wayne-workman
I also checked the udp-sender command line from the terminal and the interface was the correct one.
I’ll collect more data at the next deploy (in the next few days) and get back here with it.
Went through debug but I still cannot figure out why this is happening.
First, I found that the network interface used by udp-sender was wrong. Corrected that in the config file. Tested again, but the issue repeats exactly as the first time.
I checked, the MulticastManager service was running, status OK. udp-sender was started, network interface ok, number of clients ok. All seems perfect.
But after each server restart, multicast doesn’t start.
If I reinstall fog, all works OK.
Very strange.
I will. Thank you. I have 3 more deployments coming up. I’ll get back here if the issue repeats and I find the solution.
Back with the same problem.
I started one of my biannual lab redeployments.
The multicast doesn’t start. All computers are ready for the image:
https://photos.app.goo.gl/H00sLD8uhSOi0rL72
The tasks are created:
https://photos.app.goo.gl/xEM7pZPqrzyREU5A2
And yet nothing is happening.
First, I removed the task and rebooted, but it didn’t help. All computers remained stuck at the same window.
The solution was, as last time, to reinstall FOG. I started the installer (it was the same version, 1.4.4).
When the install (or let’s call it upgrade) was done, I created the tasks again, rebooted the computers and this time it worked like a charm.
https://photos.app.goo.gl/RsxaAAAstQTen3fR2
What could be the issue? Multicast service not starting? The reinstall seems to start the service…
Thanks for your help.
P.S.
OS: Ubuntu 16.04 (all updates installed, no fancy settings, the default OS).
FOG: latest version, 1.4.4
@wayne-workman
Yes, checking “Schedule Shutdown after task completion” causes the running task to hang at the end (in both capture and deployment). I am going to take some pictures/videos and get back.
@sebastian-roth Yes, but my solution was to reinstall FOG. I didn’t know about the solution you mentioned. The reinstall also seems to solve the problem.
@wayne-workman
Yes, I am almost sure that that is the problem. There are “leftover” tasks and this is related to an issue I reported in another thread, the fact that checking the “Schedule Shutdown after task completion” causes the task to hang (remain unfinished).
The next time I have to deploy I am going to do a test and deploy twice, once with that checkbox checked and once unchecked and I am going to capture a video of the last steps of the process on the client, to see what happens.
Sorry for the late reply…
Yes, I triple-checked. All machines were connected and waiting. But I have an idea as to why this is happening. I’ll test it as soon as I can and get back with the results.
This is what I have in the log file:
[11-30-17 12:59:02 pm]
==================================
=== ==== ===== ====
=== ========= == === == ===
=== ======== ==== == ==== ===
=== ======== ==== == =========
=== ==== ==== == =========
=== ======== ==== == === ===
=== ======== ==== == ==== ===
=== ========= == === == ===
=== ========== ===== ====
==================================
===== Free Opensource Ghost ======
==================================
============ Credits =============
= https://fogproject.org/Credits =
==================================
== Released under GPL Version 3 ==
==================================
[11-30-17 12:59:03 pm] Interface Ready with IP Address: 127.0.0.1
[11-30-17 12:59:03 pm] Interface Ready with IP Address: 127.0.1.1
[11-30-17 12:59:03 pm] Interface Ready with IP Address: 172.16.1.1
[11-30-17 12:59:03 pm] Interface Ready with IP Address: 192.168.199.199
[11-30-17 12:59:03 pm] Interface Ready with IP Address: 193.231.17.37
[11-30-17 12:59:03 pm] * Starting MulticastManager Service
[11-30-17 12:59:03 pm] * Checking for new items every 10 seconds
[11-30-17 12:59:03 pm] * Starting service loop
[11-30-17 12:59:03 pm] * No tasks found!
[11-30-17 12:59:13 pm] * No tasks found!
[11-30-17 12:59:23 pm] * No tasks found!
[11-30-17 12:59:33 pm] * No tasks found!
[11-30-17 12:59:43 pm] * No tasks found!
[11-30-17 12:59:53 pm] * No tasks found!
[11-30-17 1:00:03 pm] * No tasks found!
[11-30-17 1:00:13 pm] * No tasks found!
[11-30-17 1:00:23 pm] * No tasks found!
[11-30-17 1:00:34 pm] | Task (19) Multi-Cast Task is new!
[11-30-17 1:00:34 pm] | Task (19) /images/S02R1 image file found.
[11-30-17 1:00:34 pm] | Task (19) Multi-Cast Task 19 clients found.
[11-30-17 1:00:34 pm] | Task (19) Multi-Cast Task sending on base port: 55946.
[11-30-17 1:00:34 pm] | Command: /usr/local/sbin/udp-sender --interface enp9s0 --min-receivers 19 --max-wait 600 --portbase 55946 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/S02R1/d1p1.img;/usr/local/sbin/udp-sender --interface enp9s0 --min-receivers 19 --max-wait 10 --portbase 55946 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/S02R1/d1p2.img;/usr/local/sbin/udp-sender --interface enp9s0 --min-receivers 19 --max-wait 10 --portbase 55946 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/S02R1/d1p3.img;
[11-30-17 1:00:34 pm] | Task (19) Multi-Cast Task has started!
[11-30-17 1:00:44 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:00:54 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:01:04 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:01:14 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:01:24 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:01:34 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:01:44 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:01:54 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:02:04 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:02:14 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:02:24 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:02:34 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:02:44 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
[11-30-17 1:02:54 pm] | Task (19) Multi-Cast Task is already running with pid: 30052.
And this line goes on and on…
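To put a number on how long the task has been sitting in that state, here is a small sketch that reads lines in the format shown above and reports, per task, when it started, when it was last reported as running, and with which pid. It only relies on the "has started!" and "already running with pid" messages visible in my log; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as: [11-30-17 1:00:34 pm] | Task (19) Multi-Cast Task has started!
LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s*\|\s*Task \((?P<id>\d+)\) (?P<msg>.*)$")
PID = re.compile(r"already running with pid: (\d+)")


def parse_ts(text):
    """Parse the log timestamp, e.g. '11-30-17 1:00:34 pm'."""
    return datetime.strptime(text, "%m-%d-%y %I:%M:%S %p")


def running_tasks(lines):
    """Map task id -> {'started', 'last_seen', 'pid'} from the task lines."""
    tasks = {}
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # banner, "No tasks found!" and "Command:" lines
        ts = parse_ts(match.group("ts"))
        task = tasks.setdefault(
            int(match.group("id")),
            {"started": None, "last_seen": None, "pid": None},
        )
        msg = match.group("msg")
        if msg.endswith("has started!"):
            task["started"] = ts
        pid = PID.search(msg)
        if pid:
            task["last_seen"] = ts
            task["pid"] = int(pid.group(1))
    return tasks


def seconds_running(task):
    """Seconds between the start and the last 'already running' line, or None."""
    if task["started"] is None or task["last_seen"] is None:
        return None
    return (task["last_seen"] - task["started"]).total_seconds()
```

Feeding it the open log file (for example `running_tasks(open(path))`, with `path` pointing at wherever the MulticastManager log lives on your install) gives a quick way to compare the waiting time against the `--max-wait` value in the udp-sender command.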
I just started a new deployment. All computers got the IP, booted and got to the part where partclone starts. And … this:
https://photos.app.goo.gl/q4EZ7qoHDmCxHEul2
All are frozen like that. And task manager looks like this:
https://photos.app.goo.gl/VwqaAPDqlTd4CQg42
All workstations are connected and waiting.
If I reboot the server and do it again, same result.
The only way I could find to fix this is to reinstall FOG.
https://photos.app.goo.gl/96CLWmsBmsCsBqN83
It doesn’t take long because all files are already installed but I believe the installer fixes some services and then it works:
https://photos.app.goo.gl/zqj1WoCoC2RJI8km1
Yes, sure. I have to set up a machine for another one of our labs. I am going to do a capture with the “shutdown” option on, and one without it. Let’s see what happens.
Also, I am going to gather data during deployment. I have 2 more labs to deploy next week. Will post here as soon as I have it.
Latest. 1.4.4 if I’m not mistaken (I don’t have access to the server right now).
The task in the UI, yes. But the effect is different depending on the operation.
If I start a capture operation, the computer I capture from shuts down (so the part on the client is done) but the web interface still shows the task as ongoing. And the folder containing the image stays in the /images/dev/<random string> folder. It isn’t moved to the /images folder and named as I set it.
But if I don’t check the “Schedule Shutdown after task completion”, the capture task finishes OK and the computer I capture from reboots.
When deploying, all computers finish and shut down, but the web UI still says that the task is ongoing (it shows somewhere around 95% complete). In this case it doesn’t seem to have any negative impact. I start the computers, they boot OK and continue with the after-deployment scripts.
I even checked the computers with sfc /scannow and a boot-time disk check, both OK. So Windows doesn’t seem to be affected, even if I do get a warning at first boot, saying to “Check the disk for errors”. I did the checking and no errors were found on any computer. So the warning must be due to the dirty bit being set. But that can be ignored.
P.S. I got the “Check disk for errors” message again and here is the scan:
As you can see, the dirty bit is set. Partclone (or some other script/app) forgot to clear it. But the volume is otherwise OK.
@sebastian-roth Sure, I will. I have 2 more labs planned for next week, so if I still encounter any problems, I’ll collect as much data as I can.
I noticed this in both capture and deployment tasks. If I check the checkbox mentioned in the title, the tasks never finish.
During capture this is a problem, because the captured image is unfinished.
During deployment not so much of a problem. The computers seem to work fine when I start them. But in the task management page the tasks appear in progress, even though they are finished.
Ubuntu 16.04 LTS.
DHCP is handled by FOG.
I didn’t touch in any way the fog services. My setup is standard, nothing custom.
Good to know where the services are.
Maybe a services overview page in the web interface would be useful. Just an idea…
Thanks for your reply.
This happened to me twice already, so I am writing here, maybe someone can help.
I go into the lab, set up server cables, boot the server, create the deployment task (multicast) and then start the workstations one by one.
First problem, workstations get no IP. I check, the DHCP service didn’t start. I start it manually. Now the workstations get the IP, they boot, get to the part where deployment should start (partclone is ready) and … nothing. Deployment doesn’t start. I don’t know exactly which service does the actual deployment so I cancel the deployment task and reboot the server.
I do it all over again, and now it works.
Is there a list somewhere of all the services related to FOG (including the DHCP service)? When problems occur, I would like to at least be able to start the services manually, but I don’t know their names. I saw that some have a fog prefix, but are those all?
Or maybe there is a place in the web interface where the services status can be checked and I missed it?
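In the meantime, something like the following sketch could serve as a poor man's status page. It asks systemd (`systemctl is-active`) about a list of units and prints the ones that are not active. The unit names in `SERVICES` are my assumptions about what a default FOG install on Ubuntu uses and have to be checked against the actual server; the function names are made up.

```python
import subprocess

# ASSUMED unit names for a default FOG server on Ubuntu; verify them on the
# real machine (for example with "systemctl list-units") before relying on this.
SERVICES = [
    "FOGMulticastManager",
    "FOGScheduler",
    "FOGImageReplicator",
    "FOGSnapinReplicator",
    "isc-dhcp-server",
    "tftpd-hpa",
    "nfs-kernel-server",
    "apache2",
    "mysql",
]


def systemctl_state(name):
    """Return the state printed by 'systemctl is-active', e.g. 'active'."""
    result = subprocess.run(["systemctl", "is-active", name],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout.strip() or "unknown"


def service_report(names=SERVICES, state_of=systemctl_state):
    """Map each service name to its reported state."""
    return {name: state_of(name) for name in names}


def not_running(report):
    """Return the sorted names of all services whose state is not 'active'."""
    return sorted(name for name, state in report.items() if state != "active")
```

Running `print(not_running(service_report()))` right after boot would have shown me straight away that DHCP had not started.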
@tom-elliott
Yes, I saw this issue also and replaced all references in fstab with /dev/sdx instead of UUID.
@sebastian-roth
Yes, it is the same machine.
I did some more testing today, went through the Ubuntu apps and everything seems OK. So if we ignore the splash screen coming up after deployment and a few seconds of delay in the first boot (I don’t think that in subsequent reboots the time difference remains), then we can safely consider this a “strange thing” but not a problem.
So thank you all for your help and your time. Let’s consider this matter closed.
Thanks for your reply. No, I didn’t use the FOG client. If you have time to look at my videos, you will see that the boot changes. After deploy, the splash screen appears and the boot is slower by 50% (10 seconds before deploy, 15 seconds after). That wouldn’t be a big issue but I’m curious why this is happening and, if that changed, what else is affected or could be affected?
I did a test by capturing and redeploying the same hard drive with Clonezilla. I used disk image and went with expert mode and disabled all optional items (like partition resize, and other things). I wanted just a plain simple image. I then restored the image created with Clonezilla and the boot time and behaviour are exactly the same as they were after install. So Clonezilla doesn’t cause any changes in boot behaviour or OS response time.
I think maybe FOG passes some parameters to partclone on capture or restore that could create the problems? Or the way it recreates or formats the partition?