Best posts made by Sebastian Roth

Sebastian Roth

The relay agent IP being 192.168.1.1 makes me think this might be a DHCP answer from your ISP. But this is just a wild guess. Would you mind uploading the full packet dump (use display filter bootp || tftp and then save those packets to a new PCAP file) so I can have a closer look? You can also send me a chat message with a link if you don’t want to publicly upload the file - although there should not be any reason to be concerned as these seem to be all private IP addresses!

Sebastian Roth

@Wayne-Workman Thanks for looking into this! About normalized DB structure: Usually one should move information to an extra table if it is redundant or used in various combinations. That’s not the case here because CIDRs won’t be redundant within one organization. So I think “the harder route” - as you call it - is not advisable anyhow. I am with you there.

Sebastian Roth

@Quazz Thanks for the PCAP file. Three things strike me as a bit odd:

Even the very first DHCP discovery in this capture file has DHCP option 175 set. This is usually not the case on a PXE booting NIC but only when iPXE/etherboot is being used. So where is the very first DHCP communication or why is option 175 set (what kind of client is this?)?
This first DHCP discovery is answered twice by the same DHCP server (192.168.1.156), offering two different IP addresses to the client! It seems the client can handle this, but it is not a clean setup and might be part of the issue.
After ipxe.kpxe has been transferred via TFTP there is another round of DHCP communication (perfectly fine, as iPXE requests an IP) but this time I see two different DHCP servers answering (and offering different IPs): 192.168.1.156 and 192.168.1.1

From my point of view this DHCP setup might need a bit of a “cleanup” and hopefully your issues will go away.
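
A minimal Python sketch of this kind of check, assuming the transaction ID, server IP and offered IP have already been read out of each DHCP Offer (for example by hand in Wireshark). The function name and the input format are hypothetical; it does not parse the PCAP file itself.

```python
from collections import defaultdict


def find_conflicting_offers(offers):
    """Return DHCP transactions that got more than one distinct offer.

    offers: iterable of (xid, server_ip, offered_ip) tuples, one per
    DHCP Offer packet (hypothetical input format, filled in by hand).
    """
    by_xid = defaultdict(lambda: {"servers": set(), "addresses": set()})
    for xid, server_ip, offered_ip in offers:
        by_xid[xid]["servers"].add(server_ip)
        by_xid[xid]["addresses"].add(offered_ip)

    # A clean setup has exactly one server and one address per transaction.
    return {
        xid: info
        for xid, info in by_xid.items()
        if len(info["servers"]) > 1 or len(info["addresses"]) > 1
    }
```

Fed with one offer from 192.168.1.156 and one from 192.168.1.1 for the same transaction, the result lists that transaction with both servers. A transaction answered once by a single server does not show up at all.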

Sebastian Roth

@Thiago You should be able to skip deploying particular partition files by renaming them (e.g. mv d1p2.img d1p2.disabled). FOG should tell you “Partition File Missing” for sda2 but will still continue doing all the other things.

@Tom-Elliott There is also the “Partition” option within the image definition. Selecting “Partition 2 only” works great for upload but will totally mess up your client if you try to deploy this to a machine because it does not have d1.partitions or d1.mbr but creates the default full size NTFS partition…

Sebastian Roth

Interesting post! Great that you got as far as iPXE being loaded from TFTP. iPXE itself does another round of DHCP to be able to communicate and load things like the kernel or menu via HTTP. You may wonder why this is not working if DHCP does work on the first run before iPXE. Spanning tree settings on your switch might be the issue here, although quite often this would cause trouble earlier on already. You can still try setting “port fast” on the switch port where your client is connected. Newer versions of iPXE are better at handling this, but not the one coming with FOG 1.2.0 AFAIK.

I really love diving into a PCAP dump as this is right down to the details. I don’t need to guess what’s going on. I just see it. I am more than happy to have a look if you upload a PCAP file! Wireshark filter bootp || tftp is great for this (or port 67 or port 68 or port 69 if you are using tcpdump)…

Sebastian Roth

@abos_systemax The kernel panic you are seeing occurs because the FOG 1.2.0 default kernel (can’t remember the version right now but you can check with the command file /var/www/fog/service/ipxe/bzImage* on your FOG server) does not have the so-called EFI_STUB yet! Please update the kernel (web gui -> FOG configuration -> Kernel update) and you should be able to boot UEFI devices!

That said I totally agree with what Wayne told you already. FOG 1.2.0 is not very good at handling newer devices and I am sure you will run into trouble soon. I’d suggest trying FOG trunk: you may spend a few hours on one of the most recently introduced bugs, but you save yourself days (if not weeks) of work trying to make 1.2.0 work for all your devices (compiling your own iPXE binaries is just the tip of the iceberg really!!).

Sebastian Roth

The PCAP file shows a perfect connection from client to server on port 111 (portmap) to get the NFS port. Then… silence. After some more digging and testing it turns out that an intermediate Cisco switch is causing the problem!

Sebastian Roth

@plegrand You have the most interesting partition layouts in the universe. Recovery and extended partition. Let’s see if we can still handle this. I did some tests with the information you gave and upload is working for me without corrupting the partition table on the source system (yours was corrupted if I understood correctly). So what I need you to do is get the machine back into its original state (you have done this already as I know from the chat!), then schedule a debug upload task and run sfdisk -d /dev/sda. Please take a picture of the screen because I need the exact numbers! If those numbers are different from what you posted as d1.partitions then you can just stop here and wait till I have figured it out. But if the numbers you see (again - please take a picture so we can check as well!) are exactly the same as in d1.partitions then you need to run the capture (command fog) and check the output of sfdisk -d /dev/sda right after the capture has finished to see if it is corrupted again and possibly why (take a picture and post here!).
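
Comparing the numbers by eye is error-prone, so here is a minimal Python sketch of the comparison. It assumes both the sfdisk -d /dev/sda output and the contents of d1.partitions are available as text; the function names are hypothetical. It only looks at the /dev/... entry lines and ignores differences in whitespace padding.

```python
def partition_entries(dump_text):
    """Extract the partition entry lines from `sfdisk -d` style output.

    Only lines starting with /dev/ are kept, and all whitespace is
    removed so that differently padded numbers still compare equal.
    """
    entries = []
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("/dev/"):
            entries.append("".join(stripped.split()))
    return entries


def differing_entries(live_dump, saved_dump):
    """Return (live, saved) pairs of entry lines that do not match."""
    live = partition_entries(live_dump)
    saved = partition_entries(saved_dump)
    # Pad the shorter list so missing partitions show up as differences.
    length = max(len(live), len(saved))
    live += [None] * (length - len(live))
    saved += [None] * (length - len(saved))
    return [(a, b) for a, b in zip(live, saved) if a != b]
```

An empty result means the two partition tables agree line by line. Any pair that is returned shows exactly which entry changed between the original state and the captured image.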

Sebastian Roth

@jes6309 As you can see here this NIC model has been supported since kernel version 2.6.x. So it should not be a driver issue! Also, “Sending discovery…” actually means that we found a network interface and are trying to get an IP for it via DHCP. So it could be:

Layer 1 issue like cable (you already checked that)
Spanning tree issue (make sure you have RSTP or configured “port fast”)
Auto-negotiation issue (try configuring static speed instead of auto-negotiation for that port)
Ethernet energy saving (see if your switch has EEE/802.3az feature and disable if possible)

Sebastian Roth

Tom is absolutely right that shutting down before the job actually finishes is not a good idea.

But just in case people want to do other things in the postdownloadscript and run into the same “command is unknown” issue: my guess is that the PATH environment variable is not properly set within the scripts. So you’d need to call /sbin/shutdown to make it work…

Sebastian Roth

Sorry for this off topic post but I need to thank @Wayne-Workman for his comment and feel like I want to add something to it.

“But I’m still light-years behind some of these other guys.”

I guess you already know this but better to say it out loud - for others to hear/read as well. We are all light-years behind someone else. But that’s not the point. Everyone can learn, everyone can help. So I can just repeat Wayne’s suggestion on helping us in the forums. Don’t feel like you might not know enough or could give a wrong answer. As I said, everyone knows something and wrong answers can still help to learn (for you and others).

Sebastian Roth

Is this Ubuntu 16.04? See here: https://wiki.fogproject.org/wiki/index.php?title=Ubuntu_16.04#MySQL_password_behavior

Sebastian Roth

@Lenain As Quazz said it would be good to know which version of FOG you use. There have been many changes in the code to support this kind of disk over the last few months. So if you still use FOG 1.2.0 it won’t work for sure. Best if you run the latest release candidate.

Did you set /dev/nvme0n1 as disk device for that host in the host configuration (web gui)?

Sebastian Roth

@alh said:

To me it is not clear why at this stage (after we booted from the undionly.kpxe and therefore already found our desired boot server) one does not simply hard-code the IP of the FOG server here during installation?

The iPXE binaries are not compiled at installation time but shipped as prebuilt binaries. Therefore we can’t just hard-code the FOG server IP.

Sebastian Roth

I just had a quick look at the dislocker code and figured out that it’s fairly simple to detect a BitLocker partition. For those interested, see here and here. It should be fairly simple to add some detection code to our inits. I’ve got that on my (long) list of things to do…
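
As a rough idea of what such detection could look like, here is a minimal Python sketch. It assumes the common case where a BitLocker volume carries the 8-byte signature -FVE-FS- at offset 3 of its first sector; other variants (for example BitLocker To Go) may look different and are not covered. The function name is made up, and since the FOG inits are shell scripts this only illustrates the check, it is not code that could be dropped in.

```python
BITLOCKER_SIGNATURE = b"-FVE-FS-"  # assumed signature, common case only
SIGNATURE_OFFSET = 3               # directly after the 3-byte jump instruction


def looks_like_bitlocker(path):
    """Return True if the first sector of `path` carries the signature.

    `path` can be a partition device such as /dev/sda2 (needs read
    permission) or an image file.
    """
    with open(path, "rb") as volume:
        first_sector = volume.read(512)
    end = SIGNATURE_OFFSET + len(BITLOCKER_SIGNATURE)
    return first_sector[SIGNATURE_OFFSET:end] == BITLOCKER_SIGNATURE
```

Reading only the first sector keeps the check cheap enough to run on every partition before imaging starts.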

Sebastian Roth

@FallingWax I think we found what is causing this. Seems like some systems report faulty system UUIDs, which we have been using in FOG for a couple of months now. Turns out that some MSI motherboards (more precisely the firmware running on those) have non-unique system UUIDs like FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF.

I just pushed a fix to disable that code in FOG for now. Please upgrade to the very latest working and see if you still have issues.

I am fairly sure it’ll all go away after upgrading.
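
A minimal Python sketch of how such placeholder UUIDs could be spotted, assuming that a UUID made of one repeated hex digit is never a real identifier. Only the all-F value comes from the case above; treating other repeated digits (such as all zeros) the same way is an assumption, and the function name is hypothetical. This is not the code used in FOG.

```python
import string


def is_placeholder_uuid(uuid_text):
    """Return True for UUIDs like FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF."""
    digits = uuid_text.replace("-", "").lower()
    if len(digits) != 32 or any(c not in string.hexdigits for c in digits):
        return False  # not a well-formed UUID, so not judged here
    # Assumption: one hex digit repeated 32 times is a firmware placeholder.
    return len(set(digits)) == 1
```

A check like this would let hosts with a placeholder UUID fall back to another identifier instead of all being matched to the same host.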

Sebastian Roth

@RobTitian16 From what you say it sounds as if Wayne is pointing in the right direction, since you say that this seems to happen at the same spot. On the other hand it could also be a disk issue on the Hyper-V server. I suggest running SMART tests on both. For Windows I’d suggest a tool called CrystalDiskInfo and on Linux just use smartctl.

Sebastian Roth

@jhuesser Just note that the xxx7156.* binaries will be gone in the next release as the iPXE project has fixed the issue and we decided to remove those binaries. So you will need to change your DHCP server to serve ipxe.efi once the next FOG version is released and you want to upgrade.

Sebastian Roth

@deepak727 Tried it myself… This worked like a charm in a VirtualBox VM:

kernel http://${fog-ip}/fog/service/ipxe/clonezilla/vmlinuz
initrd http://${fog-ip}/fog/service/ipxe/clonezilla/initrd.img
imgargs vmlinuz initrd=initrd.img boot=live username=user union=overlay config components quiet noswap edd=on nomodeset nodmraid locales= keyboard-layouts= ocs_live_run="ocs-live-general" ocs_live_extra_param="" ocs_live_batch=no net.ifnames=0 nosplash noprompt fetch=http://${fog-ip}/fog/service/ipxe/clonezilla/filesystem.squashfs
boot || goto MENU

Your first two parameter sets were missing the very important union=overlay. The one you found on the Clonezilla website was right, you just needed to use ‘http://’ in place of ‘tftp://’… So this works just as well for me:

initrd http://${fog-ip}/fog/service/ipxe/clonezilla/initrd.img
chain -ar http://${fog-ip}/fog/service/ipxe/clonezilla/vmlinuz initrd=initrd.img boot=live username=user union=overlay config components quiet noswap edd=on nomodeset nodmraid locales= keyboard-layouts= ocs_live_run="ocs-live-general" ocs_live_extra_param="" ocs_live_batch=no net.ifnames=0 nosplash noprompt fetch=http://${fog-ip}/fog/service/ipxe/clonezilla/filesystem.squashfs
boot || goto MENU

It’s all about the right parameters.

Sebastian Roth

Not sure if we ought to change the default for all the people out there. But you can always do this for yourself if you need it. Open source rocks.

Take a look at this manual on how to extract and mount the initrd file to be able to modify it: https://wiki.fogproject.org/wiki/index.php/Modifying_the_Init_Image (section ‘Version 1.0.0 and up’)

After mounting it you should find the file for full host registration in ./initmountdir/bin/fog.man.reg. Check out this script and you’ll surely find out where to change the default values for AD and ImageNow… (should be somewhere around line numbers 280 and 310).