Imaging Issue



  • I’ve got a weird issue going on. Fog SVN 3281, Ubuntu 12.04. Deploying an image works fine on most stations, but on a certain type of machine it never starts. I had to take a video of the screen and play it back to catch what’s going on because it happens too fast, but basically it gets to the screen where it’s checking things. It says Checking OS…Win7, checking cpu cores…2, Send method…NFS, Attempting to send Inventory… and then after that point, it does something that looks like you’re holding down the enter key. The screen is all black and there’s a white cursor in the bottom left-hand corner. If I press a letter, I see it flash by and go up off the top of the screen - just as if I was holding enter in a terminal screen.

    It definitely appears to be an issue with this certain group of machines as imaging works fine elsewhere. The only thing that comes to mind is to change the boot loader, so I’ve tried undionly.kpxe and undionly.kkpxe but they both exhibit the same behavior. I’m not even sure if that’s a good place to start, but thought it might be worth a try.

    Any ideas?



  • Finally got a chance to get back to this!
    Setting the usb nic flag doesn’t make a difference.


  • Senior Developer

    On a host under kernel args add has_usb_nic=1



  • Hmmm… you lost me there. Or I’m having a brain moment. How do I set the USB nic flag?


  • Moderator

    Can you try 3.19.2, and then set the USB nic flag and see if it works?
    If that does, can you do the same with the next version, until we find where things go wrong?
    This was Tom’s idea, btw.



  • Sure, these are the files currently working for me:

    bzImage: https://www.dropbox.com/s/m9ti2dieab1eqn9/bzImage?dl=0
    init.xz: https://www.dropbox.com/s/r3lsc4301h58jwx/init.xz?dl=0


  • Moderator

    [quote=“Ben Warfield, post: 46158, member: 17746”]
    The culprits seem to be init.xz and bzImage.[/quote]

    Can you post the two files that are working for you?

    I’d like to try them.


  • Moderator

    [quote=“Ben Warfield, post: 46163, member: 17746”]Yep! - finally making progress haha

    uname -rm returns this: 3.18.5 x86_64[/quote]

    From a previous post:

    [quote=“sudburr, post: 44203, member: 4706”]I rolled the kernel back through successive older versions.

    – svn: 3127
    [B]-- Kernel: 3.18.5[/B]
    – iPXE: 1.0.0+ (acc27)

    … works.[/quote]

    Just tying these two things together because there’s clearly an issue.



  • That number seems… familiar. :)



  • Yep! - finally making progress haha

    uname -rm returns this: 3.18.5 x86_64


  • Senior Developer

    While on that prompt, what’s the output of uname -rm?


  • Moderator

    Ok now we’re getting somewhere, good job! It’s not your network after all.

    From here, it’s up to Tom.

    I’d recommend making a copy of those two files and putting them in a safe spot for the future…
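
A minimal sketch of that backup step, assuming Python is available on the FOG server. It copies bzImage and init.xz together (the thread found they only work as a pair) and records a SHA-256 for each so the known-good copies can be identified later. The function names and any directories passed in are hypothetical, not part of FOG.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_boot_files(src_dir, dest_dir, names=("bzImage", "init.xz")):
    """Copy the kernel and init together and return {name: sha256}."""
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    # Check both files first so a half-complete backup is never produced.
    missing = [n for n in names if not (src_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {src_dir}: {missing}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    checksums = {}
    for name in names:
        shutil.copy2(src_dir / name, dest_dir / name)
        checksums[name] = sha256_of(dest_dir / name)
    return checksums
```

It could be called as `backup_boot_files(src, dest)`, where `src` is wherever your install keeps the two files (often /var/www/fog/service/ipxe on installs of this era, but check your own server) and `dest` is any directory outside the web root.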



  • [QUOTE]Where is this new r2961 build located, physically? Building “A” or the troubled building “B”[/QUOTE]
    The new server is physically located in building A. All of our virtual infrastructure is there.

    So I do have good news - it’s working, with some recovered files. Undionly.kpxe appears to have nothing to do with it. The problem appears no matter which version of that I have.

    The culprits seem to be init.xz and bzImage. I restored both of them from a backup from 2 weeks ago. When those restored files are in place, everything works happily on both r2961 and r3278. As Tom pointed out earlier, it looks like the revision version of FOG does not apply to these 2 files. It’s worth noting that replacing one or the other didn’t fix the problem - I had to replace both of them for it to work.

    I see one major difference in debug mode; I attached a picture for clarity. With the restored files I see the “sending discover” messages: it sends one discover, then the link comes up, it sends another discover, and bingo - it gets an IP. That sending discover process does not show up with the bzImage/init.xz file combination that the installer script is currently downloading.

    [url="/_imported_xf_attachments/1/1920_Debug.JPG?:"]Debug.JPG[/url]
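
The debug output described above, where an address is only obtained once the link has come up, is the situation a link-wait loop guards against. Below is a minimal sketch of such a loop, assuming a Linux system that exposes link state in /sys/class/net/<interface>/carrier. It is not the code FOG’s init uses; the function names are hypothetical.

```python
import time
from pathlib import Path


def link_is_up(carrier_file) -> bool:
    """True if the carrier file reads "1" (link detected)."""
    try:
        return Path(carrier_file).read_text().strip() == "1"
    except OSError:
        # Reading carrier fails while the interface is administratively down.
        return False


def wait_for_link(carrier_file, timeout=30.0, interval=0.5,
                  sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll until the link is up or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if link_is_up(carrier_file):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A DHCP client would then only be started after something like `wait_for_link("/sys/class/net/eth0/carrier")` returns True; `eth0` is an assumed interface name.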


  • Moderator

    Where is this new r2961 build located, physically? Building “A” or the troubled building “B”



  • Okay, good to know. I’m going to try to get those files off the VM backup just to see if they make a difference.



  • Senior Developer

    It is getting the latest kernels and inits every time. No, it’s not possible to download one from February anymore, but this is not where things are failing…directly. Nor is it the kernel if the interface comes up outside of that scope. The only thing I can think of is that the other nodes/servers you have are on older inits?



  • [QUOTE]What’s most intriguing to me is that it doesn’t work at one level but works fine elsewhere?[/QUOTE]
    That’s very interesting to me too. And yet there are other machines connected to the same switch that are working, exact same port configurations.

    So the latest update: I did a fresh install of Ubuntu 12.04 and fog r2961. Much to my surprise, I’m seeing the same behavior - the network is starting before the link is up, and it’s not getting an IP. The installer prints “Downloading kernels and inits…OK”. Is it downloading the latest and greatest kernel/init, no matter which revision I’m on? I did go into settings and revert back to a kernel from February, but got the same results. What does the init.xz file do, and is there a way I can get a version of that from back in February?


  • Senior Developer

    What’s most intriguing to me is that it doesn’t work at one level but works fine elsewhere?


  • Moderator

    If it does work, we should slowly iterate up in the revisions until we find where it breaks.

    This will help the developers with creating a fix.
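
The search for the breaking revision can be sketched as follows. This uses a binary search rather than stepping up one revision at a time, which needs fewer test deployments but assumes that once a version breaks, every later version is broken too. The `works` callback is hypothetical: in practice it stands for a person installing that version, attempting a deploy on an affected machine, and reporting the result.

```python
from typing import Callable, Optional, Sequence


def first_failing(versions: Sequence[str],
                  works: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest version for which works() is False, or None.

    Assumes versions are ordered oldest to newest and that once a version
    fails, every later one fails too.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            lo = mid + 1   # breakage is after mid
        else:
            hi = mid       # mid fails; breakage is at or before mid
    return versions[lo] if lo < len(versions) else None
```

Given an ordered list that starts at the known-good 3.18.5 and ends at the current kernel, the function returns the first version to report to the developers, or None if every version in the list works.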

