SOLVED RC19 - Server sending magic packets to all hosts

  • Testers

    • FOG Version: RC19
    • OS: Ubuntu 14.04 LTS

    Since upgrading to RC19, I’ve randomly been getting notifications from a few switches about packet loss. All of these switches are in the same VLAN as the Fog server. After taking a packet capture, it’s clear that the Fog server is sending massive amounts of WoL (magic packets) to seemingly all the hosts in the DB (over 10k). There was only one task on the server at the time.

    Restarting the server stops this behavior. Starting a single task (and selecting WoL) results in the appropriate WoL packets going to the host’s primary MAC.

  • Testers

    @Tom-Elliott Haven’t had it happen again yet.

  • Is this still occurring?

  • @MRCUR There isn’t a “global” on/off switch for WOL. This is semi-intentional though. There are only a few cases where FOG performs WOL at all:

    1. Powermanagement -> Ondemand or scheduled.

    2. Host deploy tasking -> Every task now has a “WOL” checkbox with the exception being the “pure” WOL tasking.

    3. Group deploy tasking -> Every task now has a “WOL” checkbox with the exception being the “pure” WOL tasking.

    FOG Task Scheduler (FOGScheduler service) checks every minute. If any “scheduled” Powermanagement task or any deployed tasking was scheduled (either by cron or delayed), its time is up, and WOL was enabled, it will send WOL packets to those systems. If an instant tasking has been created and the host hasn’t checked in yet, Task Scheduler will re-send WOL packets as well (to try to limit human involvement as much as possible).
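
The selection rule described above can be sketched roughly as follows. This is not FOG's actual code (FOG is written in PHP); `Task`, its fields, and `wol_targets` are hypothetical names used only to restate the rule: once a minute, wake hosts whose scheduled time has passed, plus hosts with an instant task that have not checked in yet, and only when WOL was ticked.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical stand-in for a FOG tasking; not FOG's real schema."""
    mac: str
    wol: bool                           # was the "WOL" checkbox ticked?
    run_at: Optional[datetime] = None   # None means an instant tasking
    checked_in: bool = False            # has the host contacted the server?


def wol_targets(tasks: List[Task], now: datetime) -> List[str]:
    """Return the MACs that one scheduler pass would try to wake."""
    targets = []
    for task in tasks:
        if not task.wol:
            continue  # WOL not requested for this task
        if task.run_at is not None:
            # Scheduled (cron or delayed): only once its time is up.
            if task.run_at <= now:
                targets.append(task.mac)
        elif not task.checked_in:
            # Instant tasking: re-send until the host checks in.
            targets.append(task.mac)
    return targets
```

Under this reading, a single instant task should only ever produce packets for that one host's MAC, which is why thousands of packets for unrelated hosts point at something else.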

  • Testers

    @Tom-Elliott Just want to clarify one bit of the WoL issue I saw - it wasn’t one big packet trying to wake a ton of MACs. It was thousands of individual WoL packets, each with its own MAC to wake.

    When I caught this yesterday, there was one task in the system to do a deploy. I couldn’t even find any WoL packets trying to wake this machine coming from Fog. Instead, of the few packets I checked, they were for random machines all over the district. And for many of them, there would be upwards of 50-100 WoL packets for just one MAC.
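
For anyone who wants to tally a capture like this, here is a minimal sketch. It relies on the standard magic packet layout (six 0xFF bytes followed by the target MAC repeated 16 times). It assumes the UDP payloads have already been exported from the capture as raw bytes; `magic_packet_target` and `count_targets` are illustrative names, not part of FOG or any capture tool.

```python
from collections import Counter
from typing import Iterable, Optional

SYNC = b"\xff" * 6  # a magic packet starts with six 0xFF bytes


def magic_packet_target(payload: bytes) -> Optional[str]:
    """Return the target MAC of a magic packet, or None if it is not one."""
    start = payload.find(SYNC)
    if start < 0:
        return None
    # After the sync stream the target MAC is repeated 16 times (96 bytes).
    body = payload[start + 6:start + 6 + 96]
    if len(body) < 96:
        return None
    mac = body[:6]
    if body != mac * 16:
        return None
    return ":".join(f"{byte:02x}" for byte in mac)


def count_targets(payloads: Iterable[bytes]) -> Counter:
    """Count how many magic packets were seen per target MAC."""
    counts = Counter()
    for payload in payloads:
        mac = magic_packet_target(payload)
        if mac is not None:
            counts[mac] += 1
    return counts
```

Sorting the result with `count_targets(payloads).most_common()` would show which MACs are being hit 50-100 times and whether the tasked host appears at all.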

    Is there a system level way for me to disable WoL? Could that be added easily? We don’t use WoL at all currently and that isn’t going to change anytime soon.

  • @Wayne-Workman I don’t understand the concern here. While I can publish static binaries when I release, having it download hasn’t been a problem for a VERY long time.

    FOG 1.2.0 could not be installed in a closed-system unless it had already been installed with an “open-system”.

    The issues that are occurring now are unrelated to static vs. dynamic binaries. They’re literally repository related.

    The reason I decided to use specific repos? To ensure everybody is on the same page for the installation. How can we expect people to just know how to install PHP 5.5 or greater? I mean we have an installer, and we’re requiring this version. How can we “break” the installer simply because their system doesn’t have the version of software we’re requiring?

    The major differences between 1.2.0 and 1.3.0-RC series are solely because we’re trying to make a better product with limited requirement on the administrator/user installing. 1.2.0 did not install any repositories or update any of the “required” packages. If it were a fresh install, it would most certainly require internet connection to perform the package installation. If it couldn’t get to the internet, it would fail in much the same way.

    Ondrej repository literally changed things from how they were to a totally new system. Granted the new system they’ve adopted is a bit simpler, but it did mean a slight bit of a headache until it was all figured out.

    I don’t know why you’re concerned with the init’s, kernels, and client when that had nothing to do with the problems that were being seen.

  • @x23piracy What’s scary?

    You can look in the code base and try to see what’s causing it, and I cannot see it. I did add a “safety” if you will, but even then it’s still really strange. I am only guessing here as to the cause if FOG is indeed the one responsible – directly. The only thing I could think of that would make all systems keep “stacking up” is if there’s a “stuck” element in either powermanagement or scheduledtasks.

    These two items are cycled on a timer and will be operated on the next time their “turn” comes around. If something is found in the tasking relationship but the item is blank, it could return ALL items.
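
To illustrate the kind of failure being guessed at here (this is a generic sketch, not FOG's code, and the function and field names are hypothetical): a lookup that treats an empty filter as "no restriction" returns every host, while a guarded version returns nothing.

```python
from typing import Dict, List


def select_hosts_naive(hosts: List[Dict], host_ids: List[int]) -> List[Dict]:
    """Empty filter is treated as 'match everything' (the risky behaviour)."""
    if not host_ids:
        return list(hosts)
    return [host for host in hosts if host["id"] in host_ids]


def select_hosts_safe(hosts: List[Dict], host_ids: List[int]) -> List[Dict]:
    """Empty filter matches nothing, so a blank item cannot wake every host."""
    if not host_ids:
        return []
    return [host for host in hosts if host["id"] in host_ids]
```

With 10k hosts in the database, one blank relationship passed to the naive version on every timer cycle would be enough to explain a flood of per-host WOL packets.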

    That said, the way I’m seeing it described is rather a “massive” wol packet that has all the macs rolled into one. It sounds, to me, like the prior WOL requests are being stacked upon rather than each WOL item being treated separately. There’s only one point in the code that “could” do this but it’s a localized variable. This localized variable basically means that it’s only available at the initial call. Once it’s gone, the variable is gone.

    Again these are just my guesses at this point. You could validate it relatively quickly though.

    If you have two systems side by side.

    Reboot your fog server.

    Task one of them with something WOL related.

    Once system turns on, turn it off.

    Task the other system in the same way.

    Does the first system ALSO turn on?

  • @Tom-Elliott said in RC19 - Server sending magic packets to all hosts:

    Remoted in to help out. The install of the working RC-24 branch with the new “items” failed because the code I added to try to ensure things were cleaned up had some issues. Now we wait, I suppose, to see if WOL is still doing the “additionals”.

    Hopefully RC24 will be released soon. Scary

  • @Tom-Elliott said in RC19 - Server sending magic packets to all hosts:

    @Wayne-Workman it didn’t.

    I guess the major difference is the inits, kernels, and client. I think those three things should be packaged with the release, and then used if the download server cannot be contacted instead of just flat failing. I guess that 1.2.0 could be installed in a closed system because it came with these things, and because perhaps there was a Red Hat Satellite server available or a repo server available or something, or people pre-installed the needed components.

  • Remoted in to help out. The install of the working RC-24 branch with the new “items” failed because the code I added to try to ensure things were cleaned up had some issues. Now we wait, I suppose, to see if WOL is still doing the “additionals”.

  • @Wayne-Workman it didn’t.

  • @Tom-Elliott How did 1.2.0 accomplish it?

  • @Wayne-Workman There’s not much of a way around it though, unless you plan to compile every package for each OS during installation (and provide the binaries to compile from).

  • @Tom-Elliott said in RC19 - Server sending magic packets to all hosts:

    We found some issues, now, with ondrej repository.

    This is one of the many things that scares me about having a 1.3.0 release that requires internet access to install stuff.

  • @MRCUR Please use the RC-24 working branch.

    We found some issues, now, with ondrej repository.

    This has been corrected (albeit without a full test of the installation, just the implementation).

    Please see if it helps you and if it works as intended?

  • Testers

    @Tom-Elliott Upgraded both primary & storage node to RC 23. Unfortunately now I can’t get the image replicator service to start (I’ll open a new thread). I’ll see if this crops up again and take another packet cap if it does.

  • @x23piracy, @MRCUR I was already aware of the potential (albeit accidental).

    This, from what I can tell, has been corrected for in the working branch.

    This seems, to me, to be coming from normalization? I’m just guessing. When I set up the MAC, it now associates its magic packet. This was done from RC-19 to current, and for most it appears to work relatively fine. I don’t see how it can be getting multiple systems’ MACs though. Maybe it’s related to the $hwAddr line, though that still makes little sense.

    I’m only guessing though (remember).

  • Strange, yesterday at work I had a call from one of our employees about a computer that always turned on when another computer in the same room had started Windows. I saw it myself: when I turned off the computer that turned on by magic and rebooted the other PC in the same room, it turned on again once Windows was showing the login screen. These two computers had nothing to do with FOG (not registered, no client).

    I checked the MACs of the computers and they were unknown to FOG, so everything should be good and I thought this wasn’t FOG’s mistake.

    Then I checked the NIC of the computer that was turning on by magic. It was an Intel 219, and I know that they have a driver that supports wake on pattern match, so I disabled that setting and turned off the machine to test if it would turn on again, and it did.

    Then I disabled WOL (wake on magic packet) and the machine stopped turning on. Since this was only a computer for one of our customers, I forgot about what had happened.

    Since I am not sure why this computer only woke up when the second one was rebooted, I cannot see any relation to FOG as the cause of that, but I thought it might be good to mention…

  • Could you move to RC-23 and see if the behaviour still exists?