"Bad Sectors" when uploading image (Abort), RAID-1 crashed


  • hello,

    i've got the following situation:

    i was capturing an image from a computer (windows xp) which has raid-1.
    during this process an error message showed up, saying that some “bad blocks” were found and that i should try to save my data.
    after this, the upload process was cancelled and restarted. after the restart the upload process started again, ended with the same “bad blocks” message, restarted again, and so on.

    so i did cancel this task in fog.

    when i start the computer now a message that “no os system” was found is showing up.

    in the raid bios-menu a message is shown that the array is degraded.
    this array consists of two disks.
    the disk on port-0 is not in use.

    i guess that maybe the “shrinking” of the partition did break the mirror in raid-1. but this should not be the reason why windows-xp is not being found.

    i still have the data of the image-upload-process in /images/dev folder.

    my question:
    how can i find the original size of the partition?
    because it seems that after every restart and retry to upload the image the last (so shrunken) partition-size is being written to the /dev folder?

    what else could cause the problem?
    (of course, there are also the “bad blocks/sectors”. but until the backup these did not seem to cause any trouble.)

    i am in trouble and i would really need some help 🙂
    (and yes, i have never had to deal with RAID)

    thanks in advance.

    greetings lakk

  • Moderator

    @lakk I have had to deal with them from time to time. I can tell you I did the exact same thing with them (breaking the mirror) by assuming the Intel RAID controller acts like a traditional RAID controller. I can tell you it does not, because it exposes both the RAID device and the JBOD disks to the OS. The OS needs to be smart enough to know how to manage the array.

    I did write a tutorial on how to use FOG with this type of RAID adapter here: https://forums.fogproject.org/topic/7882/capture-deploy-to-target-computers-using-intel-rapid-storage-onboard-raid (oh my, all the way back in 2016…)

    I can tell you another example (possibly of what you are seeing). We have several Dell Precision rack-mount workstations that use these RAID controllers for their local disks. Somewhere in 2018-2019 they upgraded the OS from Windows 7 to Windows 10. About 6 months later we got a call that 2 of the workstations had reverted back to Windows 7. This wasn’t possible because it was a clean install of Windows 10 and not an upgrade from Windows 7. It’s just not possible to do what they said it did. We had them reboot the workstation and take a few screenshots. They called back and said that it switched back to Windows 10. Thinking they were just crazy, we said the next time it happened give us a call. About a month later it did it again. To not make this example any longer, I’ll cut to the point. We found that the RAID-1 mirror was split (akin to split brain) some time before Windows 10 was installed. So, not knowing the mirror was broken, they installed Windows 10 and it went onto one disk while the other disk remained at the Windows 7 install. It appears that the Intel RAID controller picks at random which disk will be the leader and which the follower in the mirror (for the Intel controller the leader disk has read/write activity, while the follower only has write activity). That is how on one boot it would start up as Win10 and on another boot as Win7.


  • hello sebastian,

    thanks for your reply!

    i have taken a backup of both HDDs with “ddrescue” too.

    Why did you use the resizable image type?:
    to be honest: I am using FOG for backups and have captured a lot of images using the default settings - and i never had trouble with it. the other computers were using windows-7 or 10, and also did not have RAID.
    so, in this case, while capturing windows-xp with raid, i did not think about this.

    because i have never had to deal with this computer, nor with RAID, all of this seems a little bit odd to me.

    i guess i will have to test around a little bit to get more information.

    greetings lakk

  • Senior Developer

    @lakk said in "Bad Sectors" when uploading image (Abort), RAID-1 crashed:

    how can i find the original size of the partition?
    because it seems that after every restart and retry to upload the image the last (so shrunken) partition-size is being written to the /dev folder?

    Yes, subsequent captures will overwrite the earlier information. FOG does not store all the old information. You can only try to put back an older partition layout manually. But this is very tricky and many things can go wrong.

    Before proceeding I would take a RAW (dd) backup of both the disks.
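
    Once those raw backups exist, the partition sizes can be read from the images instead of from the live disks. Below is a minimal Python sketch, assuming the disks carry a classic MBR partition table (usual for Windows XP) with 512-byte sectors and that the table sits at the very start of each image. The function name and the image file names are made up for illustration; this is not part of FOG.

```python
import struct

SECTOR_SIZE = 512  # assumption: classic 512-byte logical sectors


def read_mbr_partitions(image_path):
    """Return the non-empty primary partition entries found in an MBR."""
    with open(image_path, "rb") as f:
        mbr = f.read(512)
    # A valid MBR ends with the boot signature 0x55 0xAA at offset 510.
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no valid MBR boot signature found")

    partitions = []
    for index in range(4):  # an MBR holds four 16-byte primary entries
        entry = mbr[446 + 16 * index: 446 + 16 * (index + 1)]
        boot_flag = entry[0]
        part_type = entry[4]
        # Start LBA and sector count are little-endian 32-bit values.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if part_type == 0 or num_sectors == 0:
            continue  # unused slot
        partitions.append({
            "number": index + 1,
            "bootable": boot_flag == 0x80,
            "type": part_type,
            "start_sector": start_lba,
            "sectors": num_sectors,
            "size_bytes": num_sectors * SECTOR_SIZE,
        })
    return partitions


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python read_mbr.py sda.img sdb.img
    for image in sys.argv[1:]:
        print(image)
        for p in read_mbr_partitions(image):
            print(f"  partition {p['number']}: start={p['start_sector']} "
                  f"sectors={p['sectors']} size={p['size_bytes']} bytes")
```

    Running it against both images side by side would show whether the two disks still agree on the layout. If one disk was never touched by the capture, its entry may still hold the size from before the shrink, but that depends on how the controller lays out the member disks, so treat the numbers as a hint only.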

    All in all it sounds like a chain of a couple of things that have gone wrong.

    • RAID got degraded some time ago, so both disks are now in a different state and should not be mixed up!
    • Why did you use the resizable image type? FOG is not a backup tool, and I would not use the resizable image type if I didn’t plan to deploy the image to a different size disk.
    • While I can’t be absolutely sure I would think that shrinking should not leave the system in an un-bootable state. So my guess is the bad sector error is playing into that too.

    I think you are raising the right questions about your system but I am not sure this is the right forum for this. There are so many things we don’t know about your setup and I don’t feel confident to give guidance in such a situation (speaking for myself).


  • hello george1421,

    thank you for your support. it's a “3ware” raid controller.

    its possible to use FOG to clone these disk structures but you need to know ahead of time
    

    yes, i did understand this later 🙂

    sorry for going to ask maybe some “dumb” questions now , but i am really a little bit nervous about this situation. its a very old system (win-xp, so over 15 years old) and it contains very important data. but i could create a backup - so thats the good news.
    i am trying not to miss something which could cause a lot more work.

    a.) what if i do remove the wrong hdd and reattach it again?
    e.g. i remove sdb, start the computer, notice the mistake, turn off the computer, reattach sdb and remove sda.

    do i have to worry about losing data?

    b.) hdd on port 0 not in use?
    what does this mean?
    is this the reason why i get the message about the “missing os system”?

    because i can see the sda if i start a live system and can also create a backup with clonezilla.

    c.) is it a normal behaviour of RAID-1 not to boot if the mirror is broken?

    should this behaviour be expected, or could it also be possible to boot a system even if the mirror is broken?
    e.g. the mirror was broken a long time ago - but this was not recognized.
    so the system was running in a “degraded” mode.

    thank you for your help!

    greetings lakk

    [attached image: 3ware.jpg]

  • Moderator

    @lakk What is your RAID controller? The built-in Intel RAID? If so, that setup is, to put it politely, a hardware-assisted software RAID. It’s possible to use FOG to clone these disk structures, but you need to know ahead of time that you are using one of these hybrid RAID configurations. This Intel RAID presents these disks as normal disks to the operating system, so you can easily break the RAID if you were to just change one of the disks. You need to start up the software RAID manager and talk to the logical RAID disk. It’s a bit confusing to explain, but your source disk is now damaged.

    If you remove disk0 (/dev/sda) from the computer you may be able to boot the computer using disk1 (/dev/sdb). Since FOG wasn’t told about the array, it probably only interacted with disk0 (/dev/sda). Your computer should boot, but in a degraded state, since disk0 is out of sync with disk1.
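
    If raw images of both disks are available, one way to see how far disk0 and disk1 have drifted apart is to compare the images chunk by chunk. A rough Python sketch follows; the function name and file names are hypothetical, and it only reports where the images first differ and how many chunks differ, it does not say which disk holds the "good" copy.

```python
def compare_images(path_a, path_b, chunk_size=1024 * 1024):
    """Return (first_diff_offset, differing_chunks).

    first_diff_offset is None when both files are byte-for-byte identical.
    """
    first_diff = None
    differing = 0
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if not chunk_a and not chunk_b:
                break  # both files exhausted
            if chunk_a != chunk_b:
                differing += 1
                if first_diff is None:
                    # Narrow down to the exact byte inside this chunk.
                    common = min(len(chunk_a), len(chunk_b))
                    pos = next(
                        (i for i in range(common) if chunk_a[i] != chunk_b[i]),
                        common,  # one file simply ends earlier
                    )
                    first_diff = offset + pos
            offset += max(len(chunk_a), len(chunk_b))
    return first_diff, differing


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python compare_images.py sda.img sdb.img
    if len(sys.argv) == 3:
        first, count = compare_images(sys.argv[1], sys.argv[2])
        if first is None:
            print("images are identical")
        else:
            print(f"first difference at byte {first}, {count} differing chunks")
```

    A mirror that has been split for a long time can be expected to differ in many places, so a large count by itself does not point to the capture as the cause.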
