Issues with FOG services



  • This happened to me twice already, so I am writing here, maybe someone can help.

    I go into the lab, set up server cables, boot the server, create the deployment task (multicast) and then start the workstations one by one.
    First problem, workstations get no IP. I check, the DHCP service didn’t start. I start it manually. Now the workstations get the IP, they boot, get to the part where deployment should start (partclone is ready) and … nothing. Deployment doesn’t start. I don’t know exactly which service does the actual deployment so I cancel the deployment task and reboot the server.
    I do it all over again, now it works.

    Is there a list somewhere of all the services related to FOG (including the DHCP service)? When problems occur, I would like to at least be able to start the services manually, but I don’t know their names. I saw that some have a fog prefix, but are those all of them?

    Or maybe there is a place in the web interface where the services status can be checked and I missed it?



  • @andreiv I moved your last post to here since it’s a separate issue that you found: https://forums.fogproject.org/topic/11157/checking-schedule-shutdown-after-task-completion-causes-the-running-task-to-hang/2 Please follow it there.



  • @sebastian-roth said in Issues with FOG services:

    Don’t think a re-install would have cleared the left over entries in the DB. At least not that I am aware of.

    The FOG Installer does not DELETE anything from the DB at all. The most it ever does is A) make a new database or B) upgrade an older schema database to the latest schema.

    A little fog history:
    There’s a great number of problems that re-running the installer will FIX though. FTP Passwords are one I pushed hard and helped with. So if the FTP passwords (storage management node passwords used for FTP purposes) are not correct for a node, all kinds of crap breaks in FOG and tons of people were needing help diagnosing & fixing that, so we changed the fog installer to just fix it every single time for all storage node Addresses that match the local machine’s address. This fixed the FTP password problem basically instantly for both master nodes and storage nodes for everyone.



  • @andreiv said in Issues with FOG services:

    checking the “Schedule Shutdown after task completion” causes the task to hang (remain unfinished).

    That feature doesn’t work for you?

    @andreiv said in Issues with FOG services:

    The next time I have to deploy I am going to do a test and deploy twice, once with that checkbox checked and once unchecked and I am going to capture a video of the last steps of the process on the client, to see what happens.

    Please do.


  • Developer

    @andreiv Don’t think a re-install would have cleared the left over entries in the DB. At least not that I am aware of. Tom would know better though. I’ll mark this solved anyway.



  • @sebastian-roth Yes, but my solution was to reinstall FOG. I didn’t know about the solution you mentioned. The reinstall also seems to solve the problem.


  • Developer

    @andreiv So were you able to make it work again??



  • @wayne-workman
    Yes, I am almost sure that that is the problem. There are “leftover” tasks and this is related to an issue I reported in another thread, the fact that checking the “Schedule Shutdown after task completion” causes the task to hang (remain unfinished).
    The next time I have to deploy I am going to do a test and deploy twice, once with that checkbox checked and once unchecked and I am going to capture a video of the last steps of the process on the client, to see what happens.





  • @sebastian-roth

    Sorry for the late reply…
    Yes, I triple-checked. All machines were connected and waiting. But I have an idea as to why this is happening. I’ll test it as soon as I can and get back with the results.


  • Developer

    @andreiv From the log this looks pretty good. Maybe just one of the clients is missing? You scheduled a task for 19 machines. Are you absolutely sure all 19 come up to the blue screen? It doesn’t start if just a single one is missing.
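The dependence on every client being present can be read off the udp-sender command that FOG writes to the multicast log: `--min-receivers` is set to the number of scheduled clients. As far as I understand the udpcast documentation, `--max-wait` lets the sender start anyway once that many seconds have passed since the first receiver connected, so the exact start condition is worth verifying on your own setup. The sketch below is a hypothetical helper (the function names are mine, not part of FOG) that pulls those values out of a logged command line.

```python
import shlex

# Options of interest; all three appear in the udp-sender command in the log.
WANTED = ("--min-receivers", "--max-wait", "--portbase")


def udp_sender_options(command):
    """Return the numeric options from a single udp-sender invocation."""
    tokens = shlex.split(command)
    options = {}
    for index, token in enumerate(tokens):
        if token in WANTED and index + 1 < len(tokens):
            options[token.lstrip("-")] = int(tokens[index + 1])
    return options


def describe_commands(command_line):
    """The log joins one udp-sender call per partition with ';'."""
    parts = [part for part in command_line.split(";") if part.strip()]
    return [udp_sender_options(part) for part in parts]


if __name__ == "__main__":
    logged = (
        "/usr/local/sbin/udp-sender --interface enp9s0 --min-receivers 19 "
        "--max-wait 600 --portbase 55946 --file /images/S02R1/d1p1.img;"
    )
    for options in describe_commands(logged):
        print(options)
```

If the number reported for `min-receivers` is higher than the number of machines actually sitting at the partclone screen, the sender is most likely still waiting for the missing ones.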



  • @sebastian-roth

    This is what I have in the log file:

    [11-30-17 12:59:02 pm] 
    ==================================
    ===        ====    =====      ====
    ===  =========  ==  ===   ==   ===
    ===  ========  ====  ==  ====  ===
    ===  ========  ====  ==  =========
    ===      ====  ====  ==  =========
    ===  ========  ====  ==  ===   ===
    ===  ========  ====  ==  ====  ===
    ===  =========  ==  ===   ==   ===
    ===  ==========    =====      ====
    ==================================
    ===== Free Opensource Ghost ======
    ==================================
    ============ Credits =============
    = https://fogproject.org/Credits =
    ==================================
    == Released under GPL Version 3 ==
    ==================================
    
    
    [11-30-17 12:59:03 pm] Interface Ready with IP Address: 127.0.0.1
    [11-30-17 12:59:03 pm] Interface Ready with IP Address: 127.0.1.1
    [11-30-17 12:59:03 pm] Interface Ready with IP Address: 172.16.1.1
    [11-30-17 12:59:03 pm] Interface Ready with IP Address: 192.168.199.199
    [11-30-17 12:59:03 pm] Interface Ready with IP Address: 193.231.17.37
    [11-30-17 12:59:03 pm]  * Starting MulticastManager Service
    [11-30-17 12:59:03 pm]  * Checking for new items every 10 seconds
    [11-30-17 12:59:03 pm]  * Starting service loop
    [11-30-17 12:59:03 pm]  * No tasks found!
    [11-30-17 12:59:13 pm]  * No tasks found!
    [11-30-17 12:59:23 pm]  * No tasks found!
    [11-30-17 12:59:33 pm]  * No tasks found!
    [11-30-17 12:59:43 pm]  * No tasks found!
    [11-30-17 12:59:53 pm]  * No tasks found!
    [11-30-17 1:00:03 pm]  * No tasks found!
    [11-30-17 1:00:13 pm]  * No tasks found!
    [11-30-17 1:00:23 pm]  * No tasks found!
    [11-30-17 1:00:34 pm]  | Task (19) Multi-Cast Task is new!
    [11-30-17 1:00:34 pm]  | Task (19) /images/S02R1 image file found.
    [11-30-17 1:00:34 pm]  | Task (19) Multi-Cast Task 19 clients found.
    [11-30-17 1:00:34 pm]  | Task (19) Multi-Cast Task sending on base port: 55946.
    [11-30-17 1:00:34 pm]  | Command: /usr/local/sbin/udp-sender --interface enp9s0 --min-receivers 19 --max-wait 600 --portbase 55946 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/S02R1/d1p1.img;/usr/local/sbin/udp-sender --interface enp9s0 --min-receivers 19 --max-wait 10 --portbase 55946 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/S02R1/d1p2.img;/usr/local/sbin/udp-sender --interface enp9s0 --min-receivers 19 --max-wait 10 --portbase 55946 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/S02R1/d1p3.img;
    [11-30-17 1:00:34 pm]  | Task (19) Multi-Cast Task has started!
    [11-30-17 1:00:44 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:00:54 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:01:04 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:01:14 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:01:24 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:01:34 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:01:44 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:01:54 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:02:04 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:02:14 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:02:24 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:02:34 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:02:44 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    [11-30-17 1:02:54 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
    

    And this line goes on and on…
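To put a number on "goes on and on", the log can be scanned for the gap between the "has started!" line and the most recent "is already running" line. This is only a rough sketch: the function names are hypothetical, the timestamp format is taken from the excerpt above, and I am assuming the number in parentheses is the task ID. Note that "already running" only says the udp-sender process is alive, not whether it is waiting or transferring.

```python
import re
from datetime import datetime

# Matches lines such as:
# [11-30-17 1:00:44 pm]  | Task (19) Multi-Cast Task is already running with pid: 30052.
LINE = re.compile(
    r"\[(\d{2})-(\d{2})-(\d{2}) (\d{1,2}):(\d{2}):(\d{2}) (am|pm)\]\s+\|\s+"
    r"Task \((\d+)\) (.*)"
)


def parse_time(match):
    """Build a datetime from the 12-hour timestamp used in multicast.log."""
    month, day, year, hour, minute, second = (int(match.group(i)) for i in range(1, 7))
    hour = hour % 12 + (12 if match.group(7) == "pm" else 0)
    return datetime(2000 + year, month, day, hour, minute, second)


def running_seconds(log_text):
    """Map task ID to seconds between 'has started!' and the last 'already running' line."""
    started, last_seen = {}, {}
    for line in log_text.splitlines():
        match = LINE.search(line)
        if not match:
            continue
        task, message, when = match.group(8), match.group(9), parse_time(match)
        if message.endswith("has started!"):
            started[task] = when
        elif "is already running with pid" in message:
            last_seen[task] = when
    return {
        task: int((last_seen[task] - started[task]).total_seconds())
        for task in started
        if task in last_seen
    }


if __name__ == "__main__":
    # Path mentioned earlier in the thread; adjust if your install differs.
    try:
        with open("/opt/fog/log/multicast.log") as handle:
            print(running_seconds(handle.read()))
    except OSError as error:
        print("could not read log:", error)
```

A task that has been "running" for many minutes while every client is still frozen at the partclone screen suggests the sender never got the receivers it was waiting for.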


  • Developer

    @andreiv So if you restart the server you will see the same hang on multicast again? What do you see in /opt/fog/log/multicast.log? Which version of FOG do you use?



  • I just started a new deployment. All computers got an IP, booted and got to the part where partclone starts. And … this:
    https://photos.app.goo.gl/q4EZ7qoHDmCxHEul2
    All are frozen like that. And task manager looks like this:
    https://photos.app.goo.gl/VwqaAPDqlTd4CQg42
    All workstations are connected and waiting.

    If I reboot the server and do it again, same result.

    The only way I could find to fix this is to reinstall FOG.
    https://photos.app.goo.gl/96CLWmsBmsCsBqN83
    It doesn’t take long because all files are already installed but I believe the installer fixes some services and then it works:
    https://photos.app.goo.gl/zqj1WoCoC2RJI8km1



  • @sebastian-roth Sure, I will. I have 2 more labs planned for next week, so if I still encounter any problems, I’ll collect as much data as I can.


  • Developer

    @andreiv When this is happening on your server after turning it on, run systemctl status isc-dhcp-server and post what you see here (text or picture).



  • Ubuntu 16.04 LTS.
    DHCP is handled by FOG.
    I didn’t touch the FOG services in any way. My setup is standard, nothing custom.
    Good to know where the services are.
    Maybe a services overview page in the web interface would be useful. Just an idea…

    Thanks for your reply.


  • Moderator

    What OS is the host OS for the FOG server?

    What device is providing your dhcp address for your network?

    I can tell you the services that are specific to FOG are in /opt/fog/service directory.

    From what you describe, it sounds like the services are not configured to start after the FOG host system reboots. Normally FOG enables these services so that they start automatically when the host OS boots.
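Whether the services are running and set to start at boot can be checked with `systemctl is-active` and `systemctl is-enabled`. The sketch below wraps those two calls. The unit names other than `isc-dhcp-server` are my assumption of what a default install registers; compare them with the entries under /opt/fog/service on your own server before relying on the list. The function names are hypothetical.

```python
import subprocess

# ASSUMPTION: unit names for a default FOG install on Ubuntu 16.04.
# Only isc-dhcp-server is confirmed in this thread; verify the rest locally.
UNITS = [
    "isc-dhcp-server",
    "FOGMulticastManager",
    "FOGImageReplicator",
    "FOGScheduler",
]


def systemctl_state(verb, unit):
    """Run 'systemctl is-active' or 'systemctl is-enabled' and return its one-word answer."""
    try:
        result = subprocess.run(
            ["systemctl", verb, unit],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except OSError:
        return "unknown"  # systemctl not available on this machine
    return result.stdout.strip() or "unknown"


def report(units, query=systemctl_state):
    """Return (unit, active state, enabled state) for each unit."""
    return [(unit, query("is-active", unit), query("is-enabled", unit)) for unit in units]


def needs_attention(rows):
    """Units that are not running now or will not start at the next boot."""
    return [unit for unit, active, enabled in rows if active != "active" or enabled != "enabled"]


if __name__ == "__main__":
    rows = report(UNITS)
    for unit, active, enabled in rows:
        print("{:<24} {:<10} {}".format(unit, active, enabled))
    print("check these:", needs_attention(rows))
```

Anything listed under "check these" can then be started by hand with `systemctl start <unit>` and made persistent with `systemctl enable <unit>`.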


 
