unable to deploy RAID 1 disk
-
@george1421 Just a note to myself:
[Wed May 31 root@fogclient ~]# mdadm --create --verbose /dev/md/imsm /dev/sd[a-b] --raid-devices 2 --metadata=imsm
[Wed May 31 root@fogclient ~]# mdadm -C /dev/md124 /dev/md125 -n 2 -l 1
mdadm: array /dev/md124 started.
mdadm: failed to launch mdmon. Array remains readonly
Ok, after about 5 hours of working on this I have a solution. There is a missing array management utility (mdmon) that needs to be in FOS to get the array to switch from active (read-only) to active (auto-read-only) [a small but important difference]. Once I copied the utility over and recreated the array by hand, it started syncing (rebuilding) the RAID 1 array. I need to talk to the developers to see if we can get this utility built into FOS.
The document that led to the solution: https://www.spinics.net/lists/raid/msg35592.html
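For the record, this is roughly what the manual fix looked like (a sketch; the source host for mdmon is hypothetical, and the device names are from my notes above and will vary per system):
# copy the missing array manager into FOS from another Linux box
# (hypothetical source host; any box with mdadm/mdmon installed should work)
scp root@someserver:/sbin/mdmon /sbin/mdmon
chmod +x /sbin/mdmon
# recreate the imsm container and the RAID 1 volume inside it
mdadm --create --verbose /dev/md/imsm /dev/sd[a-b] --raid-devices 2 --metadata=imsm
mdadm -C /dev/md124 /dev/md125 -n 2 -l 1
# with mdmon available the volume should now show (auto-read-only) and start syncing
cat /proc/mdstat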
-
@george1421 said in unable to deploy RAID 1 disk:
Ok after about 5 hours of working on this I have a solution
5 hours! It's unbelievable!!
So, if you want the output of some test commands, tell us and we will gladly help you debug FOG. Many thanks for your investigation!
-
OK, we have a functional fix in place now. This fix will only work for FOG 1.4.0 and 1.4.1. You will need to go to where you installed FOG from: for git installs it may be /root/fogproject, for svn it may be /root/fog_trunk, or whatever you used. The idea is that there are
binariesXXXXXX.zip
files there, one for each version of FOG you installed. Remove all of those files; the FOG installer will download what it needs again. Once those files are removed, rerun the installer with its default values (you already configured it). This will download the updated kernels and inits from the FOG servers.
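Something like this, assuming a git install under /root/fogproject (adjust the path for svn or wherever you installed from):
cd /root/fogproject
find . -name 'binaries*.zip' -delete   # remove the cached kernel/init bundles
cd bin
./installfog.sh                        # rerun with your existing defaults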
Now PXE boot your target computer with the RAID using a debug deploy [or capture, depending on what you wanted to do] like was done before. Once you are at the FOS command prompt, key in
cat /proc/mdstat
If md126 now says (auto-read-only), then you win!! If it still says (read-only), then you might not have the most current inits; we will deal with that once we see the output of cat /proc/mdstat. I was able to deploy an image to a test system in the lab using the Intel RAID, so I know it does work with the new inits.
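For reference, a healthy result looks roughly like this (the device names, sizes, and layout are illustrative, not captured from this thread):
Personalities : [raid1]
md126 : active (auto-read-only) raid1 sda[1] sdb[0]
      312568832 blocks super external:/md127/0 [2/2] [UU]
md127 : inactive sda[1](S) sdb[0](S)
      5288 blocks super external:imsm
unused devices: <none>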
-
@george1421 Correction, it will only work for 1.4.1
-
@george1421 You are absolutely great!!
-
@Tom-Elliott said in unable to deploy RAID 1 disk:
@george1421 Correction, it will only work for 1.4.1
Then corrected, I do stand.
-
@george1421
I have downloaded fog.1.4.1.tgz. After extracting this file, I removed the binaries1.4.1.zip file, then started
./installfog.sh
After the installation finished, I checked md126 in debug mode. It was not read-only.
Then I started to deploy the image.
It looks good. I have to leave the office; we will see the result tomorrow.
Thanx for everything.
-
@eistek That first screenshot is perfect!!
What it tells us is that the newly created RAID array is resyncing (copying the master disk to the slave disk and thus building the array). It is rebuilding the array at 70MB/s (which is about the top speed of your SATA disks).
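If you want to watch the rebuild progress from the debug shell, something like this works (assuming the watch applet is in the init; if not, just re-run the cat by hand):
watch -n 5 cat /proc/mdstat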
-
@eistek said in unable to deploy RAID 1 disk:
i have changed undionly.kpxe to undionly.kkpxe
The iPXE kernel only manages the FOG iPXE menu and the launching of the FOS image. Once FOS Linux has started, the iPXE kernel (undionly.kpxe) is discarded, so changing this boot kernel will not have an impact on the error message you now have.
What these images show is that the FOS Linux kernel is crashing. This is typically related to a hardware error. This is only a wild guess, but I would say memory (a RAM chip) or a hard drive.
-
@eistek Do you have a smaller image you can deploy to this system? The image doesn't have to run on the target computer; I'm more interested in whether it deploys completely to the target computer. I'm looking for one in the 20-40GB range to test deployment. 135GB is a bit rare and I can't say for sure whether FOG can handle that size; I simply don't know.
According to the partclone info, the image failed at about 28GB of the file being transferred to the target computer.
-
Here is my solution:
1- I created a new, clean RAID.
2- I added mdraid=true and /dev/md126 to my host configuration (see the verification sketch after this list).
3- I removed one of the RAID disks from the PC.
4- I started to deploy. Deploying the image finished without any error.
5- I restarted the PC and Windows started without any problem.
6- Then I powered off the PC, plugged the second disk back in, and turned it on.
7- The RAID started to rebuild.
8- The rebuild finished and everything looks well now.
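For anyone following along, before deploying you can confirm the host settings took effect by booting the target in debug mode and checking the array (a quick sketch; md126 matches the device used above):
cat /proc/mdstat                  # the RAID volume should appear as md126
mdadm --detail /dev/md126         # state, level, and member disks
-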
@eistek Well done!!
-
I don’t know why, but if I try to deploy with both disks plugged in, it gives a kernel panic.
Special thanx to @george1421 @Jonathan-Cool @Tom-Elliott
-
@eistek Well, it's nice that you found a solution, but you had to work physically at the machine to do it. It would be really nice if the problem could be solved while deploying to a working RAID, instead of letting it rebuild after deploying to just a single drive.
Regards X23
-
@eistek It's hard to describe this, but the error means that the hardware can't keep up with the CPU. It's like (not technically correct) the disk subsystem can't keep up with the volume of data being written to the disks, and you get buffer overruns in the disks. From researching this, the error happens more often when writing a lot of data to the local console and the console can't keep up with the data stream.
-
Tom and I chatted about this issue a little this morning. I think I have an explanation of why it worked when you removed one of the disks from the array.
Understand these are only anecdotal understandings of what might be going on.
- Tom said you shouldn’t be able to image any array that is degraded or rebuilding.
- I said that the error was representing the hardware can’t keep up with the processor causing a thread to timeout.
- You created the array using the built-in Intel RAID firmware and then attempted to image the computer.
- You couldn't image because the inits were missing a critical utility.
- Your RAID controller is one of those “fake-raid” controllers, otherwise known as hardware-assisted software RAID.
- Hardware-assisted software RAID relies on the main CPU for array activities. Unlike a hardware RAID, which has its own processor to manage the array, a software RAID uses the main CPU's spare processing capacity.
- Once the inits were fixed, you booted into FOG and attempted to image the computer with an array.
- At the time FOS loaded on the target computer, it saw the array was uninitialized, so Linux started to rebuild the array.
This is where things went sideways.
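As an aside, a quick way to confirm from a FOS debug shell that you are on one of these Intel fake-raid (imsm) setups (a sketch; sda is just an example member disk):
mdadm --detail-platform          # shows the controller's Intel Matrix (imsm) capabilities
mdadm --examine /dev/sda         # member disks carry imsm container metadata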
Now consider what is going on here. The Linux OS is trying to rebuild the uninitialized array using the main CPU. At the same time, FOG is trying to push the image to the disk subsystem as fast as it can. So this is where you have a chicken-and-egg situation.
The OS is busy building the array at some rate; for the sake of argument, let's say 70MB/s, which works out to 4.2GB/min. My FOG server and target computer can push images at 6.2GB/min. At some point we are going to have a data collision between the array being built and FOG laying the image down on the disk. This may explain why it gets to about 5GB deployed and then crashes.
The other side of this is that the array is being built at the same time FOG is laying down the image, and the FOG thread has to wait too long for the disk subsystem to complete, so it times out.
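Purely as a thought experiment (this is not something the FOG inits do today), the kernel does let you throttle the md rebuild rate from a debug shell, which would keep the resync from fighting the deployment for disk bandwidth:
# cap the background rebuild while imaging (value is in KB/s per disk)
echo 1000 > /proc/sys/dev/raid/speed_limit_max
# restore the stock ceiling afterwards (200000 is the kernel default)
echo 200000 > /proc/sys/dev/raid/speed_limit_max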
Understand the above is just based on noodling about this issue all day and not really based on any hard evidence.
In my case, on my test system, I found the utility was missing, so I copied it from another server. Once the array came up, the OS started building it. Tom patched the inits and wanted me to confirm they worked before he pushed them to the production server. I was in debug mode just watching the array rebuild, so I updated the inits and rebooted the test box (mind you, I was doing this debugging remotely). The box rebooted, but I forgot it was in debug deploy mode, so I lost access to it remotely. Deciding to call it a night, I logged out of the dev system. That dev PXE target computer was then running all night, and since the OS was up, it was rebuilding the array. When I came in in the morning, I keyed in
fog
on the console and it started to deploy. Since I was only pushing out a 5GB image AND the array was already built, it deployed correctly. So what to do next time?
I guess if you are building an array, once it's initialized, PXE boot the target computer into hardware compatibility mode and let it sit until its disk activity is done. If we run across a lot of RAID systems, we might add a function to the hardware compatibility tests to report the percentage of array synchronization. I wouldn't want to ask the developers to spend time on this for just a one-off situation, but it's something to consider.
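Until something like that exists, a rough way to check the sync state by hand from a debug shell (md126 as used earlier in the thread):
cat /proc/mdstat                      # shows a progress bar and percentage while resyncing
cat /sys/block/md126/md/sync_action   # prints "idle" once the rebuild is done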