Unable to capture Windows 10 Image
-
Afternoon all. I have a new virtual FOG server running the dev branch (it was on 1.5.6) on Debian 9, but when capturing a Windows 10 image from a Hyper-V server it starts at 3 GB/min and gradually drops to 1.28 MB/min over the course of a few hours.
The capture I'm running has been going for 18 hours and has only done 3.74% of a 37 GB image.
If we capture a freshly installed Windows 10 image it does complete, but it slows to 70 MB/min by the end of the image.
I've tried dropping the kernel back to 4.15.2 (and all the versions in between), based on a couple of other posts, with no success.
What is strange is that the capture seems to hang after several thousand blocks, then process a batch and pause again.
FOG server specs:
12 CPUs
8 GB RAM
Legacy network adapter (but I tried the synthetic adapter as well)
Client PC:
12 CPUs
8 GB RAM
Legacy network adapter
The virtual host doesn't seem to be overstretched on performance.
Any suggestions would be appreciated.
-
This isn't the first time I'm hearing about terrible capture performance with Hyper-V. Can we change one of the parameters to see if it's the FOG server or the target computer? What happens if you attempt to capture a physical machine? On a healthy 1 GbE network I would expect you to be able to capture at a rate of 6 GB/min to a modern target computer with SSD or NVMe disks.
-
Thanks. When I captured a physical Windows 7 computer the speed was much better, 1.2 GB/min. I'll run a Windows 10 install tomorrow morning and capture it, but I would expect that to be better too.
I have access to several FOG servers. Previously another server (FOG 1.5.4) captured a Windows 10 VM at 300 MB/min, which is slow, but it does finish in about 8 hours. I moved the VM I'm trying to capture across to it and started the capture, with the same results (1 MB/min capture rate). Off the back of this I started again with my image (same Windows 10 version, Windows 10 Pro Education 1903) but had the same result on both FOG servers.
-
A Gen1 machine using the legacy adapter maxes out at 100 Mb/s.
A Gen2 machine using the default 'network adapter' maxes out at the host's max.
It sounds like you are creating and capturing on the same physical box, to the same single physical drive.
You want to create your VMs on one drive while capturing to a different drive.
Also, you really want to use an SSD for your VM drive.
-
@sudburr Thanks for the reply. I realise that I didn't add this, but I did move the FOG server to a different virtual host and tried to capture the Windows 10 VM, and it had the same issue.
Also, one of my colleagues (about 20 minutes ago) copied the Windows 10 VM and tried to capture it to his physical FOG box. I don't have the specs for that server, but it did exactly the same as the virtual FOG servers.
We have been using FOG for years on a virtual server (with limited speeds of 300 MB/min, which is totally acceptable for our setup and how often we capture images; it's much faster when deploying). We have only started having issues with Windows 10 Pro Education.
-
@Notalot I'm trying to build a truth table in my head on this and I'm still not seeing a pattern in the combinations.
During all FOG operations the target computer does (really) all of the work. It reads the image from the local disk, compresses it, and sends it to the FOG server. The FOG server takes the stream from the network adapter, writes it to disk, and manages the entire process.
For a 100 Mb/s network I would expect to see about 700 MB/min transfer rates, and for 1 GbE about 6 GB/min to modern target hardware.
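If you want to sanity-check those numbers, here's a quick back-of-the-envelope sketch. The 90% link efficiency and the 2:1 compression ratio in the comments are illustrative assumptions of mine, not FOG constants:

```python
# Back-of-the-envelope imaging-rate estimates (illustrative assumptions only).

def wire_rate_mb_per_min(link_mbit: float, efficiency: float = 0.90) -> float:
    """Convert a link speed in Mbit/s to usable MB/min, allowing for overhead."""
    return link_mbit / 8 * efficiency * 60

# The capture pipeline (disk read -> compress -> network) runs at the pace
# of its slowest stage, all expressed in MB/min of uncompressed image data.
def capture_rate(disk_read: float, compress: float, network: float) -> float:
    return min(disk_read, compress, network)

print(f"100 Mb/s wire: ~{wire_rate_mb_per_min(100):.0f} MB/min")   # ~675
print(f"1 GbE wire:    ~{wire_rate_mb_per_min(1000):.0f} MB/min")  # ~6750

# With ~2:1 compression, each MB on the wire carries ~2 MB of image data,
# so in practice the target's disk or CPU usually bottlenecks before 1 GbE does.
```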
While this is a bit off point, I get about 1.2 GB/min transfer rates to my FOG-Pi3 server at home. The point is the FOG server doesn't need much horsepower during imaging.
I seem to remember a thread in the FOG forums a while ago that talked about cruddy Hyper-V performance because of the disk controller selected on the target VM. I seem to remember something about IDE/SATA vs SCSI, and one gave a lot better performance than the other. But since I don't use Hyper-V I only half remember it, and of course I can't find it in the forums now.
-
For comparison, here is my setup running on a Hyper-V 2019 host (it behaved the same when I had it running on Hyper-V 2016).
My development server, sitting right beside me:
- a desktop i7-7700
- 32 GB RAM
- onboard Intel 1 Gb NIC
- 256 GB NVMe boot drive
- 2 TB SATA-III SSD
- 2 TB 7200rpm SATA-III HDD
- running Server 2019 Standard with Hyper-V
The FOG server VM:
- disk 1 = 20 GB .vhdx which is stored on the NVMe boot drive above.
- disk 2 = the 2 TB HDD above (for /images storage)
- Gen1 machine
- 2 GB Memory
- 4 virtual processors (marginal differences going higher, noticeable difference going lower)
- Network adapter (not LEGACY, connected to the onboard Intel 1 Gb NIC)
- running CentOS 7.x minimal
- FOG server 1.4.4
- bzImage/bzImage32 kernel 4.15.2
The VMs I use to build images are Gen1 or Gen2, and are built on the 2 TB SSD.
Captures are saved to the 2 TB HDD.
Our field servers for deployment are ancient 11-year-old Lenovo M58s with Pentium E2200 CPUs, 2 GB memory, and 500-2000 GB HDDs, also running CentOS 7.x but on bare metal.
You really, really want to get off the legacy adapter; it's only 100 Mb/s.
-
Hey, sorry for the delay in getting back to you yesterday.
I installed the same version of Windows 10 on a laptop this morning and captured it at 1.2 GB/min. It did still do the pausing on the blocks, just not as often or for as long (the whole capture took about 10 minutes for a 12 GB fresh install of Windows).
My colleague lowered the CPU count on the Windows 10 VM from 12 to 4 and re-ran the job on his physical FOG server, and it captured in 2 hours at 250 MB/min. I did the same on the virtual host, which didn't make any change.
I've also just tested unteaming one network connection from each of the virtual hosts and setting up a dedicated switch for imaging. That also didn't make any difference.
-
@Notalot said in Unable to capture Windows 10 Image:
My colleague lowered the CPU count on the Windows 10 VM from 12 to 4
How many physical cores does the host have on it?
On the VM host server where the FOG server is running, what does the disk subsystem look like? Is it a RAID array, SSD, NVMe, HDD?
If I remember right, the FOS Linux kernel is capped at 8 CPUs for some reason, so your capture/deploy will only use 8 (v)CPUs even if you give it more.
-
The virtual hosts have 2x 16-core processors (32 cores total), for both the server hosting the VM and the FOG server.
The disk setup is two RAID 5 arrays: the system is on 3x 300 GB 15k SAS drives, and the storage (where the VHD is) is on 3x 1 TB 10k SAS drives.
Good to know about the CPU cap.
-
@Notalot I was concerned about over-provisioning the VM host by promising more vCPUs to the VM client than the VM host had available. That is always a recipe for a crappy VM experience, but in your case that's not it.
The 3-drive RAID 5 on spinning disks is not the best solution, but at least it's better than a single-spindle HDD for a VM host server. That 3-disk RAID 5 probably isn't your speed issue; a rough estimate below shows why.
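This sketch assumes ~130 MB/s of streaming write per 10k SAS drive, which is my own ballpark figure (actual drives vary):

```python
# Rough sequential-write estimate for a small RAID 5 array (illustrative).

def raid5_seq_write_mb_s(disks: int, per_disk_mb_s: float) -> float:
    # An n-disk RAID 5 has n-1 disks' worth of data per stripe, so large
    # sequential writes approach (n-1) * single-disk streaming speed.
    return (disks - 1) * per_disk_mb_s

rate = raid5_seq_write_mb_s(3, 130)  # assume ~130 MB/s per 10k SAS drive
print(f"~{rate:.0f} MB/s, or ~{rate * 60 / 1000:.1f} GB/min")  # ~260 MB/s, ~15.6 GB/min
```

Even if that estimate is off by a factor of a few, it's still orders of magnitude above the 1-35 MB/min rates you're seeing, so the array shouldn't be the bottleneck.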
It's still not clear if the pausing is on the target VM end or on the FOG server, since they are both running under Hyper-V.
-
Over the weekend I've been playing around. I cleared some space on our Hyper-V dev server, which has 2 SSDs in RAID 0, and moved the Windows 10 VM across to it.
It has been capturing steadily at 35 MB/min for 10 hours. The stuttering on the block count is much better: still happening, but not for very long each time.
-
@george1421 So I spoke too soon…
The image is still running but looks like it's hung; see attached (I've blanked out the identifying info).
-
@Notalot Tell me about your Hyper-V environment: what is the host OS for both the FOG server and the target system? That performance is pretty bad no matter how you look at it.
-
The hosts are all Windows Server 2012 R2 Datacenter.
2x 16-core 2.1 GHz processors (AMD Opteron 6272)
64 GB RAM
3x 300 GB 15k SAS drives in RAID 5 for the OS
3x 1 TB 10k SAS drives in RAID 5 for the storage of the VMs
4x 1 Gb network links teamed together, shared with the OS
The dev server I'm working with:
Windows Server 2012 R2 Datacenter
2x 6-core 2.4 GHz processors (AMD Opteron 2431)
32 GB RAM
2x 250 GB SSDs in RAID 0
2x 1 Gb network links teamed together, shared with the OS.
Below are the specs for the individual VMs.
-
@Notalot I have a Hyper-V install on a system at our hot site for Veeam replication. I'm in the process of spinning up a new Win10 install on that host. It's a VM under vSphere running 2016 Datacenter, so this should be interesting: a Win10 VM inside a 2016 VM. I want to see what the capture rates are to my production FOG server running at my local site. I suspect the bottleneck will be the WAN link between the two servers.
Then I'll spin up a FOG server at the hot site and see what the capture rates are for the same server image. I just can't believe that Hyper-V is only able to capture at less than 100 MB/min; at that rate you might as well be using floppy disks…
-
@george1421 Hey, thanks. Today I accessed a known good FOG server running 1.5.4 (fog1.5.4 from now on) on Hyper-V with the same specs as above (I used those settings as a guide for setting up the new ones, but it was a fresh install each time).
fog1.5.4 historically would capture at 300 MB/min and deploy images at 1.2 GB/min (limited by the network connection). I'm currently uploading the image at 21 MB/min (this is the same if I attempt to capture the last known good image; whilst writing this I got the rcu_sched self-detected stall error). I've updated its network connection to the synthetic adapter and it will deploy images at 7-8 GB/min now.
I'm about to upload a previously imaged computer into fog1.5.4 to see what it does.
When uploading a Windows 7 image from Hyper-V to fog1.5.4 (it's running currently) I'm getting 220 MB/min, but that's fluctuating (between 210 and 230 MB/min, due to the hang on the current block as described before).
-
So the image from the physical laptop finished at 2.1 GB/min, which would suggest that the issue is on the Windows 10 VM side.
-
@Notalot I'm currently installing FOG under a Hyper-V instance. I have a second Hyper-V instance with the disk connected to the SCSI adapter instead of the IDE adapter. I want to see if there is a speed difference on the FOG side between the two adapters, with all else being equal.
MDT just finished building the VM target system under Hyper-V, so I'm ready to capture with my production FOG server soon. I should be able to get benchmark numbers in the next few hours.
-
I understand this information will not help you with your capture performance, but it does give us a baseline to contrast and compare against.
Hardware:
Both virtualization servers are Dell R540 servers with 2x 14-core processors running on a Dell RAID 10 8-disk array. The link between the primary site and the hot site is 1 GbE. The hypervisor is vSphere 6.5 at both sites. The Hyper-V server is running as a VM at the hot site on a 2016 Datacenter server. It has 4 vCPUs and 16 GB of RAM allocated, with 2 virtual hard drives: one for the OS and one for the Hyper-V disks. For the Hyper-V host I installed Hyper-V alongside the existing Windows 2016 Datacenter; I did not install Hyper-V on "bare metal", so I don't know if that would have any performance impact or not (sorry, not a Hyper-V admin).
The disk image/compression was set up as Windows 10 with zstd level 6. I picked zstd because it's a bit more CPU intensive than gzip.
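As an aside, if you want to see how the two compressors trade off on your own data, here's a minimal benchmark sketch. It assumes the third-party zstandard package (pip install zstandard), and the file name is just a placeholder:

```python
# Minimal zstd-6 vs gzip benchmark sketch (placeholder file name; adjust to taste).
# Requires: pip install zstandard
import gzip
import time

import zstandard

def bench(name, compress, data):
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(data) / len(out):.2f}:1 ratio, "
          f"{len(data) / elapsed / 2**20:.0f} MB/s in")

data = open("sample.img", "rb").read()  # any representative chunk of disk data
bench("zstd-6", zstandard.ZstdCompressor(level=6).compress, data)
bench("gzip-6", lambda d: gzip.compress(d, compresslevel=6), data)
```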
Test #1
FOG server: Main site, running as a vSphere client.
FOG target: Hot site, running as a Hyper-V client (inside a vSphere client).
Test #2
FOG server: Main site, running as a vSphere client.
FOG target: Main site, running as a vSphere client (on the same server as the FOG server).
Test #3
FOG server: Hot site, running as a Hyper-V client (inside a vSphere client).
FOG target: Hot site, running as a Hyper-V client (inside a vSphere client).
Test #4
FOG server: Hot site, running as a Hyper-V client (inside a vSphere client).
FOG target: Main site, running as a vSphere client.
Conclusion:
It appears that the VM client has a bigger impact on FOG capture performance than the FOG server, which is understandable because the target computer does all of the heavy lifting during image capture and deployment; the FOG server does very little other than take the image stream from the network, write it to its local hard drive, and manage the overall imaging process. So adding 12 vCPUs to your FOG server will not make imaging go faster.
For the Hyper-V target I had to use the legacy network adapter, which allowed PXE booting; the native network adapter would not PXE boot. I suspect some of the slowness was in the legacy network adapter, as @sudburr posted already. Nevertheless, I still can't explain the 32 MB/min transfer rates you are seeing. I'm almost half tempted to install VirtualBox here and see what kind of performance I get out of that slowpoke type-2 hypervisor.