Error Restoring Images - Clients having identity crisis



  • Upgrading to the latest git version solved my image resizing problems, but now I’m having another issue. I did a complete backup of my latest master and tried to deploy it to a computer lab, and each client seems to be reliant on xxxx-001. For example, I backed up from a computer named IMG-A85XMA and am restoring to LAB103-001 through LAB103-027. As soon as the task completes on 001, every other machine fails with “There is no active task for 001”, even though the menu shows their names correctly, they report the correct MAC, and when they error out and go to the reboot screen they show the correct MAC and name.

    If I power off unit 001 and give it a deploy task, then unit 002 or any other unit will complete its task, but when it does, it removes task 001 from the scheduled tasks, and the next unit in line, along with every other unit tasked with a deploy, fails. See the screenshot below showing the hostname as 002 while it expects a task for 001. The odd part is that, if anything, I would expect it to be tied to the original hostname of the computer that was captured, but that is not what it is expecting.

    [Screenshot: client showing hostname 002 while expecting a task for 001]

    I have tried completely deleting the host registrations, recreating them, and creating a new image. It appears to be some massive confusion in the task scheduling or the database.


  • Moderator

    @entr0py said in Error Restoring Images - Clients having identity crisis:

    MSI

    Can I ask you to post to this thread: https://forums.fogproject.org/topic/10987/what-can-we-do-when-we-don-t-trust-uuid with the inventory data from one of those MSI boards? We only need one entry unless you have computer models other than the mainstream Dell, HP, and Lenovo. We are looking for good examples of how different motherboard manufacturers populate SMBIOS (which is what FOG queries to identify the target machines).


  • Developer

    @entr0py @Wayne-Workman I think I have come up with a nice and reliable solution to this. Though it’s still only made up in my mind and I need to put it down into code so we can start testing. Will get to it soon.



  • @wayne-workman

    I wish it was that simple. It looks like it isn’t just one specific motherboard from MSI, and I have a bunch of ASRock machines that might show similar behavior; I will investigate this week.

    When we were working in the database, our focus was the MSI units that were delivering ffffffff-based UUIDs, but we also noticed some with all-zero (000000) UUIDs, and those seem to be the ASRock units.

    Not sure what the ideal solution is here but it appears UUID is problematic across multiple vendors.
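    A minimal sketch of how a tool could flag the two placeholder patterns described above (entirely `f` as on the MSI units, entirely `0` as on the ASRock units) before trusting a UUID as a machine identifier. It assumes the raw SMBIOS UUID string is already available; `is_placeholder_uuid` is a hypothetical helper, not part of FOG. It only catches UUIDs made of a single repeated hex digit, so other non-unique values (such as ones that merely start with ffffffff) would still slip through.

```python
import re

# Hypothetical helper for illustration; not part of FOG.
# Strips everything except hex digits so dashes and case do not matter.
_NON_HEX = re.compile(r"[^0-9a-f]")


def is_placeholder_uuid(uuid: str) -> bool:
    """Return True if the UUID cannot plausibly identify a single machine.

    Flags UUIDs that are one repeated hex digit (e.g. all 'f' or all '0')
    and anything that is not 32 hex digits long (missing or malformed).
    """
    digits = _NON_HEX.sub("", uuid.lower())
    if len(digits) != 32:
        # Missing or malformed UUID: treat it as unusable as well.
        return True
    # A real UUID will not consist of a single repeated digit.
    return len(set(digits)) == 1
```

    A check like this can only reject obviously bogus values; it cannot prove that a normal-looking UUID is unique across a lab.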



  • @sebastian-roth Well then I’d say it’s a tough problem to solve. Maybe we use UUID anyway and just publicly shame the MSI engineers.


  • Developer

    @Wayne-Workman Definitely possible, but still not perfect. What if tablets have a non-unique UUID? More often than not, people are using those with just one single USB NIC…



  • @sebastian-roth I’ve been thinking about that problem a little. Is it possible to create a primary key from two fields, UUID and MAC?
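    One way to read the composite-key idea is sketched below with Python's built-in `sqlite3` so it runs standalone. The `hosts` table, its columns, and the MAC addresses are hypothetical and are not FOG's schema (FOG itself stores its data in MySQL/MariaDB, which also supports a multi-column `PRIMARY KEY`). Two hosts that share a bogus UUID can coexist as long as their MACs differ; as noted above for tablets sharing one USB NIC, the pair still collides when both the UUID and the MAC repeat.

```python
import sqlite3


def make_hosts_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema for illustration only; not FOG's actual tables.
    conn.execute(
        """
        CREATE TABLE hosts (
            uuid TEXT NOT NULL,
            mac  TEXT NOT NULL,
            name TEXT NOT NULL,
            PRIMARY KEY (uuid, mac)  -- only the (uuid, mac) pair must be unique
        )
        """
    )


def register_host(conn: sqlite3.Connection, uuid: str, mac: str, name: str) -> bool:
    """Insert a host; return False if this (uuid, mac) pair already exists."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute(
                "INSERT INTO hosts (uuid, mac, name) VALUES (?, ?, ?)",
                (uuid.lower(), mac.lower(), name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def lookup_host(conn: sqlite3.Connection, uuid: str, mac: str):
    """Return the host name for a (uuid, mac) pair, or None if unknown."""
    row = conn.execute(
        "SELECT name FROM hosts WHERE uuid = ? AND mac = ?",
        (uuid.lower(), mac.lower()),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_hosts_table(conn)
    bogus = "ffffffff-ffff-ffff-ffff-ffffffffffff"  # same UUID on every board
    register_host(conn, bogus, "00:11:22:33:44:01", "LAB103-001")
    register_host(conn, bogus, "00:11:22:33:44:02", "LAB103-002")
    print(lookup_host(conn, bogus, "00:11:22:33:44:02"))
```

    Running it prints `LAB103-002`: the lookup needs both fields, so a shared UUID alone no longer maps every machine to the same host.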


  • Developer

    @Wayne-Workman Though you are right about keeping our hardware lists up to date, I am sure we had better come up with a more reliable way of identifying client machines so we don’t have to rely on sysuuid alone, since we see that some are problematic. I’ll work on this soon.



  • @entr0py We need to update our problematic hardware list with the model of this problematic motherboard.


  • Developer

    After some more digging together with @entr0py (thanks heaps for your patience and playing nicely!) we figured that some of his systems (mostly MSI motherboards) report non-unique UUIDs. Pushed a quick fix for now but we’ll need to work out a proper system identifier.

    https://github.com/FOGProject/fogproject/commit/80b6eb1d7c8d654c6550767cdba4b183fb259968
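    To spot non-unique UUIDs in existing data, a sketch that groups inventory rows by UUID and reports every UUID claimed by more than one host. The `(hostname, uuid)` input format is an assumption (for example, rows exported from an inventory report), `find_shared_uuids` is a hypothetical helper, and it is not taken from the commit linked above.

```python
from collections import defaultdict
from typing import Iterable


def find_shared_uuids(hosts: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each UUID reported by more than one host to the sorted host names.

    `hosts` is an iterable of (hostname, uuid) pairs; UUIDs are normalized
    to lowercase without surrounding whitespace before grouping.
    """
    by_uuid: dict[str, set[str]] = defaultdict(set)
    for name, uuid in hosts:
        by_uuid[uuid.strip().lower()].add(name)
    # Keep only UUIDs that at least two distinct hosts claim.
    return {u: sorted(names) for u, names in by_uuid.items() if len(names) > 1}
```

    Any UUID in the result cannot identify a single machine on its own and would need a second field, such as the MAC, to tell the hosts apart.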



  • 1.5.0-RC9 SVN 6080

    dev-branch

    I’m not using multicast; I select the group, or a few individual machines for that matter, and schedule an instant deploy. They already have fog-client on them, and they reboot as expected and start their task until they get to the point of actually doing something. At that point they either error out or, worst case, they all start, the first one to finish deletes the task for 001, and then all the remaining units fail as soon as partclone finishes, because they try to fetch their tasks and there are none. They reboot and the whole fiasco starts over again: 002 still has an active task, so PXE sends it to restore, but once it gets to that point it says there is no task for 001 and reboots.

    I can give access to the FOG server as well as anything else that might be helpful. I really think at this point it is some kind of database issue. I sent Tom a PM because I felt this was more of a localized problem, not a project issue, but I never got a reply, and I have to get this running today.

    I also took a screenshot of the reports page so you can see that each host shows up correctly with its own MAC address, so it isn’t a MAC duplication issue.
    [Screenshot 1: reports page listing each host with its own MAC address]

    Thanks


  • Developer

    @entr0py said in Error Restoring Images - Clients having identity crisis:

    Upgrading to the latest git version solved my image resizing problems but now I’m having another issue.

    Can you please be more specific about which version you upgraded to? Latest git can mean different things: is it dev-branch or working? Also, please post the version numbers you see in the blue cloud on the web UI.

    When you schedule a task for a client, you don’t use multicast for all 27 lab PCs, right? When you schedule xxx-002, for example, what do you see in “Active Tasks” on the web UI? Does it say 001 or 002? Something seems odd here.

