Performance issues with BIG images



  • Hi all,

    I’ve been using FOG, and it’s brilliant. My requirements have changed, though, and it’s struggling to perform adequately. Mind you, I don’t blame it: at the moment it’s running on SATA drives with no RAID, on a 1 Gb switch.

    For various reasons I won’t bore you with, I need to deploy 12 different sector-by-sector images, each about 160 GB in size, at the same time. The server does nothing else. There are only 12 hosts, possibly going to 16, but never more than that.

    My poor existing server and its 1 Gb infrastructure are crying.

    I know I’ve got to upgrade the FOG server and the associated topology, and I’d appreciate your thoughts.

    As regards the network topology, I’m thinking of a 12- or 24-port 1 Gb switch with 2 x 10 Gb uplinks going to the FOG server, with 2 x 10 Gb cards in it.

    Would the fog Server be able to use both cards simultaneously when serving the images?

    As regards the FOG server, I’m thinking of a RAID card with multiple SSDs in a RAID 0 configuration. I’ll manually copy the images to a USB drive, as they never change. It’s a class environment.

    Are there any particular cards or chipsets you would recommend? I’m thinking of https://www.newegg.com/global/uk/Product/Product.aspx?Item=N82E16816117402

    Are these supported?

    The server itself is an HP MicroServer, the old version, with 8 GB of RAM. What recommendations would you give me regarding its replacement?

    Is there any other advice you could offer me?

    Thanks

    Julian.



  • Thanks for all the advice, I appreciate it. I’m going to go with a RAID 5, five-drive SSD array in my new FOG server, plus a few SATA drives, and spend some time looking at the scripts to implement HSM.

    Have a good day as they say

    Yours

    Julian


  • Moderator

    @Julianh said in Performance issues with BIG images:

    One feature that would be nice for me would be an “upcoming deployment” feature: something that allowed me to use cheap SATA as my “all images” storage, and SSD “disks” as my “next image to be deployed” storage. An option to tick images, or groups of images, that would then be moved to the active store, ready to deploy. I can see that being useful to quite a few people, actually.

    What you are referring to here is called hierarchical storage management. That would be implemented at the OS level, not in FOG. You can do this today with some bash scripting on the Linux side to move the images and then update the FOG database to reflect where each image is currently stored. It wouldn’t be too hard to do outside of FOG, without changing FOG at all. You would use the storage node methodology. I just wrote a tutorial about adding a second disk to a FOG server for more disk space; you might do something similar for your hierarchical storage concept. Your SATA datastore would be similar to the second hard drive.

    Ref: https://forums.fogproject.org/topic/10450/adding-additional-image-storage-space-to-fog-server
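
    As a rough illustration of the file-moving half of that idea, here is a minimal sketch in Python rather than bash. The directory layout and the function names are assumptions made for this example, not anything FOG provides, and the step of updating the FOG database to point at the new location is deliberately left out because the thread does not describe it.

```python
import shutil
from pathlib import Path


def promote_image(name: str, archive_root: Path, active_root: Path) -> Path:
    """Copy one image directory from slow archive storage to the fast active store.

    The archive copy is left in place, so the SATA tier stays the master copy.
    """
    src = archive_root / name
    dst = active_root / name
    if not src.is_dir():
        raise FileNotFoundError(f"no such image directory: {src}")
    if dst.exists():
        return dst  # already staged, nothing to do

    # Copy under a temporary name first so a half-finished copy is never
    # mistaken for a complete image.
    tmp = active_root / (name + ".partial")
    if tmp.exists():
        shutil.rmtree(tmp)
    shutil.copytree(src, tmp)
    tmp.rename(dst)
    # NOTE: telling FOG where the image now lives is not shown here.
    return dst


def demote_image(name: str, active_root: Path) -> None:
    """Remove the staged copy from the fast store; the archive copy is untouched."""
    staged = active_root / name
    if staged.is_dir():
        shutil.rmtree(staged)
```

    A call such as `promote_image("lab01", Path("/images-archive"), Path("/images"))` would stage one image ahead of a deployment; both paths and the image name are hypothetical.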


  • Moderator

    @Julianh said in Performance issues with BIG images:

    I do however have 10 HP ML 115 G5 servers available to me from an old class, but they are quad core 2.2 AMD, with 8GB of RAM. Unfortunately the RAM cannot be increased, that’s their maximum.

    Two things come to mind.

    1. Why not use FOG storage nodes and spread the load among a few of the HP servers instead of one heavily loaded server? Since these are 12 different images, you will not need any FOG replication running between the storage nodes.
    2. You do not need 8GB of RAM to run FOG. I have FOG running on a Raspberry Pi 2 and a Pi 3. The Pi 2 has 1GB of RAM. The Pi will not operate at the scale that you are running, but it shows that 4GB of RAM is more than enough. The 8GB of RAM in the ML 115 is spot on.
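
    To get a feel for what spreading the load would mean per server, here is a small planning sketch. It simply deals the 12 images out round-robin and reports the worst-case outbound bandwidth each node would need if every client pulled at a full 1 Gbit/s. The node and image names are made up, and how such a plan maps onto FOG storage groups is not covered here.

```python
def spread_images(images, nodes):
    """Assign images to nodes round-robin; returns {node: [images]}."""
    if not nodes:
        raise ValueError("need at least one storage node")
    plan = {node: [] for node in nodes}
    for index, image in enumerate(images):
        plan[nodes[index % len(nodes)]].append(image)
    return plan


def worst_case_gbit(plan, client_link_gbit=1.0):
    """Outbound Gbit/s each node needs if all of its clients pull at link speed."""
    return {node: len(assigned) * client_link_gbit for node, assigned in plan.items()}


# Hypothetical names: 12 images spread over 3 of the spare servers.
images = [f"image{n:02d}" for n in range(1, 13)]
nodes = ["ml115-a", "ml115-b", "ml115-c"]

plan = spread_images(images, nodes)
for node, load in worst_case_gbit(plan).items():
    print(f"{node}: {len(plan[node])} images, up to {load:.0f} Gbit/s outbound")
```

    With three nodes, each one serves four clients, so a node with a single 1 Gb card would itself become the limit. More nodes or faster cards in each node change that picture.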

    As for the version of FOG, I would definitely install 1.4.4 since it’s the newest. The FOG 1.3.0+ releases are also faster than 1.2.0 in several ways. FOG 1.4.0+ also has a different compression/decompression engine you can use for faster performance if you are compressing those raw images for storage on the FOG server. To use the new engine you will want to recapture your images using the partclone zstd format. With zstd you will pay a slight time penalty on image capture compared with gzip, but deployments will be faster and the images will be smaller on the FOG server.

    You could set up your microserver as the master FOG node and then add a few of the ML 115s as FOG storage nodes if you need to spread the workload.

    Also, Ubiquiti has a reasonably priced 10G switch. I have no idea how good it is at actual data throughput. It would allow you to add the ML115s in at 10G too.

    I know I keep throwing more ideas out. You have a unique situation, so it might require a bit of sideways thinking to get all of the bits to move in one direction, fast.



  • Thank you so much for your help, a little more regarding the scenario.

    The 12 clients all have 4-6 cores at 4.7 GHz, 32 GB of RAM, single SSDs and 1 Gb network cards. The image has to be raw, sector by sector, as it’s not a Windows or Linux file system and isn’t recognised by FOG. To be fair, no one images this system. Well, except me, it seems. They are all in a classroom and are reimaged at the end of the week, hence why all at the same time. Imaging times are about 18 hours.

    The HP Microserver is a brilliant piece of kit, but it is just that, a microserver. The cards you recommended, thank you, are full height and wouldn’t fit. I do however have 10 HP ML 115 G5 servers available to me from an old class, but they are quad core 2.2 AMD, with 8GB of RAM. Unfortunately the RAM cannot be increased, that’s their maximum.

    I think I’ll initially go with the suggestion of a RAID controller with 4 x 480 GB SSDs and see what the imaging times are from there.

    Do you think the ML115 G5 is up to the job?

    I’ll also have to look at the infrastructure. The cards you suggested are perfect. I can get a Netgear GS748TS, which is effectively a 48-port 1 Gb switch with 2 x 10 Gb uplink ports, for £100. That’s the way forward for me.

    I’ll upgrade from FOG 1.2.0, old but very stable, to the latest. I’ll probably just reinstall on the ML 115, move the disks with the images across, and add the image definitions pointing to the old disks, then copy the big image to the new SSD controller “disk”. Any known issues with Ubuntu 16.04 LTS?

    Do you have any particular suggestions or warnings I should be aware of?

    In the future I may need to increase the SSD “disks” as I add new images. One feature that would be nice for me would be an “upcoming deployment” feature: something that allowed me to use cheap SATA as my “all images” storage, and SSD “disks” as my “next image to be deployed” storage. An option to tick images, or groups of images, that would then be moved to the active store, ready to deploy. I can see that being useful to quite a few people, actually.

    Thank you again for your help and suggestions.

    Yours

    Julian


  • Moderator

    Just running the numbers, and assuming the server is infinitely fast: in an ideal environment it will take about 22-24 minutes to push a single uncompressed 160GB image to a client over your network.

    Transferring 12 unicast images at once will push about 1.5GB/sec of data across your network switch backplane, which in an ideal situation consumes a bit more than one 10G link.

    For what it’s worth, your single SATA drive can probably sustain 30-40MB/s transfer rates. A single 1GbE link can manage (practically) about 90MB/s throughput.
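
    The arithmetic behind those figures can be reproduced with a short sketch. It uses decimal units (1 GB = 1000 MB) and only the rates quoted in this thread. The 35MB/s disk figure is simply the midpoint of the 30-40MB/s estimate, and the last line assumes the single disk is the only bottleneck and shares its throughput evenly, which is a simplification.

```python
IMAGE_GB = 160   # size of one raw image, from the thread
CLIENTS = 12     # simultaneous unicast deployments


def minutes_to_transfer(size_gb: float, rate_mb_per_s: float) -> float:
    """Minutes needed to move size_gb (decimal GB) at rate_mb_per_s (decimal MB/s)."""
    return size_gb * 1000 / rate_mb_per_s / 60


def aggregate_gbit_per_s(clients: int, per_client_mb_per_s: float) -> float:
    """Total server-side bandwidth in Gbit/s when every client is fed at this rate."""
    return clients * per_client_mb_per_s * 8 / 1000


LINE_RATE = 125.0   # 1 GbE theoretical maximum: 1000 Mbit/s divided by 8
PRACTICAL = 90.0    # practical 1 GbE throughput quoted in the post
SATA_DISK = 35.0    # assumed midpoint of the 30-40 MB/s estimate

print(f"one image at 1 GbE line rate : {minutes_to_transfer(IMAGE_GB, LINE_RATE):.1f} min")
print(f"one image at ~90 MB/s        : {minutes_to_transfer(IMAGE_GB, PRACTICAL):.1f} min")
print(f"12 clients at line rate need : {aggregate_gbit_per_s(CLIENTS, LINE_RATE):.1f} Gbit/s")

# If one disk has to feed all 12 streams, each client gets a twelfth of it.
shared_hours = minutes_to_transfer(IMAGE_GB, SATA_DISK / CLIENTS) / 60
print(f"12 clients sharing one disk  : {shared_hours:.1f} hours")
```

    The line-rate result comes out just under the 22-24 minute estimate, and the shared-disk result lands in the same range as the roughly 18 hour imaging times reported, which supports the view that the single SATA drive, not the network, is the first thing to fix.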

    I am a bit curious about your intent and the requirement to deploy all of these images simultaneously.


  • Moderator

    The RAID controller you mentioned will work. I checked, and there are Linux drivers for it. You just have to be careful when selecting RAID controllers to make sure they are not Windows-only cards with no Linux drivers available. But the one you selected will be OK. Note that you will also need to purchase SFF-8643 octopus cables to connect this RAID card to the SATA drives. Make sure you get the octopus cable for SATA.

    Regarding the server and imaging: the heavy lifting during imaging is done by the target computers, not the FOG server. The FOG server’s job is imaging management and moving data from the storage to the network interface. That is the data path you need to optimize. If that HP microserver has a dual or quad core CPU it should be more than sufficient. If you are looking for something different, you could always pick up a Dell R420 or R620 from eBay for just a few hundred and have all of the bits already assembled for you. Just add fuel and go.


  • Moderator

    On the networking side, using 2 10G network interfaces will be much better than 4 or 6 1GbE network connections. The question in my mind is whether 2 independent 10G cards would give better performance than a single dual-port 10G card. I would think the 2 independent cards would perform better, since each would have its own PCIe lanes. You can get older Mellanox ConnectX-2 single-port 10G cards for $30USD. You can get the Mellanox ConnectX-3 dual-port cards for about $125USD.

    To answer your question, will FOG use both of these cards? FOG doesn’t care. The underlying Linux OS is what manages LAG groups. Yes, Linux will support a 2-port 10G LAG. Setting up the LAG takes a little skill, but it does work really well. You just need to make sure the switch LAG type and the Linux LAG type are in agreement. 802.3ad (LACP) is probably a safe bet.


  • Moderator

    Let’s pick this apart in the order you posted.
    1 SATA drive. This is going to be your biggest bottleneck (before the NICs). You didn’t mention your deployment interval, but I would try to add more spindles to this setup, or even SSD drives. You have 160GB x 12 (~2TB) of images. Regardless of the image size, consider what that HD head is doing trying to service 12 data streams at the same time. That head is going to be bouncing all over that platter trying to recall data. If it were me, I would pick up a used Dell (or LSI) H700 RAID controller from eBay for about $100USD. Just use 1 channel of it to give you the ability to create a 4-drive RAID 0 array. Yes, I said RAID 0. The goal is to go fast here. You don’t care about data resilience. Just back up your data to an external 4TB hard drive in an eSATA dock. You could do the same thing with 512GB SSD drives for even better speed without blowing too much money. The 2TB worth of data adds quite a bit to the expense.


  • Moderator

    @Julianh said in Performance issues with BIG images:

    Is there any other advice you could offer me?

    I want you to understand the XY problem before going any further:
    http://xyproblem.info/
    https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem

    What you’ve done here is simply give us this ridiculous requirement you have to do all these large deployments all at once, and you’re asking us to help you perform some sort of miracle. This is the text-book definition of the XY problem.

    I want to know the real problem first.

