Replacing a going-to-fail RAID-0 disk

Sensenmann

(@sensenmann)

Posts: 5

Member

Topic starter

I have a question about planning my RAID setup.
In another thread, I've already got an excellent answer for a disk which fails the certify-process. Thank you very much again! :-)

Now to the scenario:

SoftRAID claims to monitor the disks and inform me when a disk is going to fail.
When one disk in the RAID-0 starts to die (for example, reallocating bad sectors or showing other SMART-related errors) but is still fully readable and working, is there a way to "replace" the failing disk?

For example, by installing a new disk and doing a dd (disk dump) or something like it to clone all bits and bytes from the failing disk to the new one?

I know RAID-5 would be a better solution for my 4-disk RAID-0 scenario, but I need the performance for video editing, and RAID-5 also takes about 50% of one CPU while writing (calculating the checksum, I think).

Or isn't this possible, because SoftRAID somehow remembers the serial numbers of the RAID members?

Posted : 08/03/2016 5:53 pm

SoftRAID Support

(@softraid-support)

Posts: 9210

Member Admin

There is no user interface method to replace a disk in a RAID 0.

However, knowledgeable users can use the command-line tool dd, as you indicated.

I can give the framework for the command here, but note:
THIS IS NOT FOR USERS WITHOUT EXPERIENCE WITH THE COMMAND LINE!

The dd command can just as easily erase all data as copy it, so you must be absolutely certain what you are doing.

The basic command would be this:

sudo dd bs=1m if=/dev/rdisk1 of=/dev/rdisk2 conv=noerror,sync
rdisk1 would be the source disk
rdisk2 would be the target disk.

The process is this:
Remove ALL disks in the stripe except for the bad disk.
Insert just the new disk.
Run SoftRAID and, in the DISKS file, identify each disk's number (disk0, disk1, disk2, etc.).
Substitute the correct disk numbers into the command above.
Paste the command into Terminal.
Wait for it to complete.
Then remove the old disk, insert the original remaining disks of the stripe, and the volume should mount.
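
The substitution step above is where mistakes tend to happen, so one option is to let a small script assemble the command and print it for review instead of typing it by hand. What follows is only a minimal sketch and not part of SoftRAID: the helper names is_disk_id and build_dd_cmd are made up for illustration, and the script prints the command without ever running dd.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SoftRAID or macOS): build the dd command
# from two disk identifiers and print it for review. Nothing is executed.

# Succeeds only for identifiers of the form diskN (N = one or more digits).
is_disk_id() {
    n=${1#disk}
    [ "$n" != "$1" ] || return 1      # did not start with "disk"
    case $n in
        ''|*[!0-9]*) return 1 ;;      # empty or non-digit suffix
    esac
    return 0
}

# Usage: build_dd_cmd SOURCE TARGET   (e.g. build_dd_cmd disk3 disk4)
build_dd_cmd() {
    src=$1
    dst=$2
    if ! is_disk_id "$src" || ! is_disk_id "$dst"; then
        echo "error: expected identifiers like disk3" >&2
        return 1
    fi
    if [ "$src" = "$dst" ]; then
        echo "error: source and target are the same disk" >&2
        return 1
    fi
    # The r prefix selects the raw device, as in the command above.
    echo "sudo dd bs=1m if=/dev/r$src of=/dev/r$dst conv=noerror,sync"
}

build_dd_cmd disk3 disk4
```

The printed line can then be compared against the disk numbers shown in SoftRAID before it is pasted into Terminal. Partition identifiers such as disk3s2 are rejected on purpose, since the copy is meant to cover the whole disk.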

I want to repeat, this is a dangerous process for the uninitiated.

The conv=noerror,sync option is optional. If you are copying a "bad" disk, or one you suspect has bad sectors, then use this option.

With this option, if dd encounters any unreadable areas of the disk, it fills them with zeros and keeps copying. This ensures that only the bad areas are damaged and the rest of the disk remains in sync. Without noerror, dd stops at the first read error; with noerror but without sync, it skips the bad areas, so everything copied after them ends up at the wrong offset.

You can also reduce the block size for transfers. This improves recovery chances with bad disks, but slows down the disk copy quite a bit. Available settings for bs include 1k, 4k, 8k, 16k, 32k, 64k, 128k, etc. (512 bytes is the default, but incredibly slow!) bs=1m is 1 megabyte and the best setting.
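
To put rough numbers on that trade-off: with conv=noerror,sync a failed read is replaced by a full block of zeros, so the block size is also the amount of data given up per failed read, while a smaller block size means many more reads. Below is a minimal sketch of the arithmetic, assuming a disk of exactly 3,000,000,000,000 bytes (an illustrative figure, not a measured size); blocks_needed is a hypothetical helper name.

```shell
#!/bin/sh
# Illustrative arithmetic only; no disk is touched.
# DISK_BYTES is an assumed size for a nominal 3 TB disk.
DISK_BYTES=3000000000000

# Number of reads dd would need at a given block size (rounded up).
blocks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

for bs in 512 4096 65536 1048576; do
    echo "bs=$bs bytes: $(blocks_needed "$DISK_BYTES" "$bs") reads, $bs bytes zero-filled per failed read"
done
```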

I repeat: Do not try this at home, unless you have backups and consider yourself comfortable with the terminal app.

Posted : 08/03/2016 6:46 pm

bdphifer

(@bdphifer)

Posts: 9

Member

When performing the dd task, do you use 'rdiskX' or just 'diskX'?

sudo dd bs=1m if=/dev/rdisk1 of=/dev/rdisk2 conv=noerror,sync

Posted : 01/09/2016 10:32 am

SoftRAID Support

(@softraid-support)

Posts: 9210

Member Admin

Always use rdisk; that way dd talks to the raw disk.

Be careful: if you get something wrong, there is no going back.

Posted : 01/09/2016 11:27 am

bdphifer

(@bdphifer)

Posts: 9

Member

Thanks!! I have 99% of my drives (all but ~300 MB of 4+ TB) backed up to an online source (CrashPlan), including the two 3 TB drives in the RAID. The errors on the disk I am replacing are causing my Mac to reboot, so the backup does not finish.

While waiting I googled the dd command a bit, and saw many people mentioning that I need to unmount the disk, however your instructions do not mention that. Is that because these are RAID disks and not really 'mounted' anyway?

Thanks!

Posted : 01/09/2016 11:54 am

SoftRAID Support

(@softraid-support)

Posts: 9210

Member Admin

You absolutely must have the volume unmounted. The problem is that OS X cannot deal with two volumes that have identical GUIDs; unpredictable things can happen. So you cannot have a situation where both volumes are mounted.

Since all you have is two "halves" of the stripe, there is no risk from identical mounted volumes. Just be careful to remove the original before connecting up the other half of the volume.
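
Before the copy, every volume on both the source and the target disk has to be unmounted. Here is a minimal sketch of that step, assuming the source is disk3 and the target is disk4 (placeholders; check the real numbers first). diskutil unmountDisk is the macOS command that unmounts all volumes on a disk. The run and unmount_both helpers and the DRY_RUN switch are made up for this example, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1 (the default) commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unmount every volume on the source and target disks (whole-disk identifiers).
unmount_both() {
    run diskutil unmountDisk "/dev/$1" &&
    run diskutil unmountDisk "/dev/$2"
}

unmount_both disk3 disk4
```

Setting DRY_RUN=0 would execute the two diskutil commands for real, which only makes sense on a Mac and only after the disk numbers have been double-checked.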

Posted : 01/09/2016 1:04 pm

bdphifer

(@bdphifer)

Posts: 9

Member

I am copying data from a 3 TB disk that is part of a 2-disk 6 TB RAID. I ran the dd and got this message.

dd: /dev/rdisk3: Input/output error
123889+0 records in
123889+0 records out
129907032064 bytes transferred in 872.206742 secs (148940642 bytes/sec)
dd: /dev/rdisk3: Input/output error
dd: /dev/rdisk3: Input/output error
123890+0 records in
123890+0 records out
129908080640 bytes transferred in 875.584727 secs (148367230 bytes/sec)

There was about 5 TB on the RAID.

I have it on a 4 bay drive enclosure, and the lights are still flashing as if the copying is still happening.

Any thoughts?

TIA.

P.S. I have been very happy with SoftRAID and the support that you guys give. I will be purchasing SoftRAID when I put everything back together on my main Mac (doing this on a MacBook Air because the iMac doesn't like the old drive).

Posted : 02/09/2016 8:38 am

bdphifer

(@bdphifer)

Posts: 9

Member

While waiting for an answer from you guys, I have left the drives as they are. Does the fact that the original drive has 18.7 MB free and the new one shows 134 present a problem? (This is shown in SoftRAID.)

Posted : 02/09/2016 11:09 am

SoftRAID Support

(@softraid-support)

Posts: 9210

Member Admin

The two disks may not be identical.

On the original problem, this means at least 1MB of your disk was not copied.

Because the dd command you used was in the form of:
sudo dd bs=1m if=/dev/rdisk1 of=/dev/rdisk2 conv=noerror,sync

this means dd copied 1MB of zeros where the error happened.

I recommend repeating the dd command. Hopefully there is not an error again. If disk3 was the "source", it means that the disk cannot be read, so you have a 1MB "hole" in your data. A better solution for you may be to back up the volume and create it from scratch. When you back up, you will learn which file(s) are damaged, because OS X won't be able to copy them. Using dd, you do not know which files are broken from the bad disk until you get a corrupted-file message when you try to open one.

This is why dd works as a "geek" approach but is bad practice: if you are replacing a failing disk, you get a result that leaves you uncertain where your data corruption (loss) is.
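
The numbers in the dd output can be checked by hand. With bs=1m each record is 1,048,576 bytes, and when conv=noerror is set, dd prints its running block counts after each read error and keeps going, so counts printed mid-copy are progress figures rather than a final total. Below is a minimal sketch of the arithmetic using the record counts from the log above; bytes_for_records is a hypothetical helper name.

```shell
#!/bin/sh
# Illustrative arithmetic on the dd log; no disk is touched.
BLOCK_SIZE=1048576   # bytes per record with bs=1m

# Bytes covered by a given number of full records.
bytes_for_records() {
    echo $(( $1 * BLOCK_SIZE ))
}

# 123889 full records were copied before the first read error, so the
# first unreadable block starts at this byte offset on the source disk.
first_bad_offset=$(bytes_for_records 123889)
echo "first unreadable block starts at byte $first_bad_offset"

# The next status line in the log reports one more record.
echo "bytes after the next record: $(bytes_for_records 123890)"
```

The two results, 129907032064 and 129908080640, match the byte counts dd reported, which confirms that the copy was working in 1 MB records and that the first error sits roughly 130 GB into the source disk.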

Posted : 02/09/2016 1:17 pm

bdphifer

(@bdphifer)

Posts: 9

Member

Here is the text from my dd this morning.

/dev/disk0 (internal, physical):
   #:  TYPE                                  NAME                     SIZE        IDENTIFIER
   0:  GUID_partition_scheme                                          *121.3 GB   disk0
   1:  EFI                                   EFI                      209.7 MB    disk0s1
   2:  Apple_CoreStorage                     Macintosh HD             120.5 GB    disk0s2
   3:  Apple_Boot                            Recovery HD              650.0 MB    disk0s3
/dev/disk1 (internal, virtual):
   #:  TYPE                                  NAME                     SIZE        IDENTIFIER
   0:  Apple_HFS                             Macintosh HD             +120.2 GB   disk1
                                             Logical Volume on disk0s2
                                             2440A4A2-0978-4BDE-81FE-8E519C8DB1A7
                                             Unlocked Encrypted
/dev/disk2 (internal, physical):
   #:  TYPE                                  NAME                     SIZE        IDENTIFIER
   0:  FDisk_partition_scheme                                         *62.5 GB    disk2
   1:  Apple_HFS                             MacBook Air Extra St...  62.5 GB     disk2s1
/dev/disk3 (external, physical):
   #:  TYPE                                  NAME                     SIZE        IDENTIFIER
   0:  GUID_partition_scheme                                          *3.0 TB     disk3
   1:  EFI                                   EFI                      209.7 MB    disk3s1
   2:  FA709C7E-65B1-4593-BFD5-E71D61DE9B02                           3.0 TB      disk3s2
   3:  Apple_Boot                            Boot OSX                 134.2 MB    disk3s3
   4:  B6FA30DA-92D2-4A9A-96F1-871EC6486200                           2.3 MB      disk3s5
/dev/disk4 (external, physical):
   #:  TYPE                                  NAME                     SIZE        IDENTIFIER
   0:  GUID_partition_scheme                                          *3.0 TB     disk4
   1:  EFI                                   EFI                      209.7 MB    disk4s1
   2:  Apple_HFS                             Media (New)              3.0 TB      disk4s2

Familys-MacBook-Air:~ PhiferFamily$ $ diskutil unmount/dev/disk3
-bash: $: command not found
Familys-MacBook-Air:~ PhiferFamily$ $ diskutil unmount /dev/disk3
-bash: $: command not found
Familys-MacBook-Air:~ PhiferFamily$ diskutil unmount /dev/disk3
disk3 was already unmounted or it has a partitioning scheme so use "diskutil unmountDisk" instead
Familys-MacBook-Air:~ PhiferFamily$ diskutil unmountdisk /dev/disk3
Unmount of all volumes on disk3 was successful
Familys-MacBook-Air:~ PhiferFamily$ diskutil unmountdisk /dev/disk4
Unmount of all volumes on disk4 was successful
Familys-MacBook-Air:~ PhiferFamily$ sudo dd bs=1m if=/dev/rdisk3 of=/dev/rdisk4 conv=noerror,sync

WARNING: Improper use of the sudo command could lead to data loss
or the deletion of important system files. Please double-check your
typing when using sudo. Type "man sudo" for more information.

To proceed, enter your password, or type Ctrl-C to abort.

Password:

dd: /dev/rdisk3: Input/output error
123889+0 records in
123889+0 records out
129907032064 bytes transferred in 872.206742 secs (148940642 bytes/sec)
dd: /dev/rdisk3: Input/output error
dd: /dev/rdisk3: Input/output error
123890+0 records in
123890+0 records out
129908080640 bytes transferred in 875.584727 secs (148367230 bytes/sec)
dd: /dev/rdisk3: Input/output error

I think I typed the command correctly, but I may just have bad eyes!

Also, the box that has the two drives in it is still sputtering its lights as if the two drives are talking. If I stop that drive, is it going to hurt anything?

Again, thanks for your assistance.

Posted : 02/09/2016 2:22 pm

SoftRAID Support

(@softraid-support)

Posts: 9210

Member Admin

disk3 is the source. My comments above are accurate.

It may be best not to go forward with this plan, but to back up and recreate the volume. At least you will know for certain which file(s) are not readable.

Using dd worked, but you have at least a 2MB portion of your volume (it's a stripe) that contains damaged files, and you don't know where that is.

Posted : 02/09/2016 2:31 pm

bdphifer

(@bdphifer)

Posts: 9

Member

Thanks for the quick response! Part of me would like to avoid doing a download of 4+ TB, so I have a couple of thoughts on solving that.

1) putting the old RAID back together and moving some (if not all) of the data to a third drive, then creating a new RAID with the good drive and the new one and moving back to there.

2) purchasing a new drive and creating a RAID from scratch. Then with the old RAID back together moving data to the new drive (or d/l if necessary).

I think I can do the copy from the old RAID, if I can put it together, by going to the online backup and seeing what was copied.

My other question is whether the good drive of the old RAID is still OK. When all is done, can I just wipe it clean and use it for something? Is the 'bad' drive even good for anything? If I wipe that clean, would it be usable? (Although not for anything super-important, since it's not in the best of shape.)

Again, I can't say how pleased I have been with the support from SoftRAID. Thanks!!!

Posted : 02/09/2016 3:36 pm

bdphifer

(@bdphifer)

Posts: 9

Member

Also, can I shut off the drive case that is holding the two drives without doing any damage to the original good drive of the RAID?

Posted : 02/09/2016 3:40 pm

SoftRAID Support

(@softraid-support)

Posts: 9210

Member Admin

The plan of putting the original disks and volume back together and backing up is best. Then if there are parts of one disk that cannot be read, you can isolate that to a specific folder/files.

The second (good) drive from the stripe appears fine, unless SoftRAID tells you otherwise. So yes, you can repurpose it.
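
Backing up file by file is what makes the damage visible: any file that sits on an unreadable area fails to copy and can be written down. The following is a minimal sketch of that idea, assuming the stripe is mounted at a path such as /Volumes/OldRAID and the backup goes to /Volumes/Backup (both placeholder paths). copy_and_log is a hypothetical helper, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: copy a tree file by file and record every file
# that could not be copied, instead of stopping at the first failure.
# Usage: copy_and_log SOURCE_DIR DEST_DIR FAILURE_LOG
copy_and_log() {
    src=$1
    dst=$2
    log=$3
    : > "$log"                                   # start with an empty log
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        # Create the target folder, then copy; -p preserves timestamps
        # and permissions where possible. Any failure is logged.
        mkdir -p "$dst/$(dirname "$f")" 2>/dev/null &&
            cp -p "$src/$f" "$dst/$f" 2>/dev/null ||
            printf '%s\n' "$f" >> "$log"
    done
}

# Example call (placeholder paths):
# copy_and_log /Volumes/OldRAID /Volumes/Backup "$HOME/failed-files.txt"
```

After a run, the failure log lists the relative paths that could not be copied, which are the candidates to restore from another backup.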

Posted : 02/09/2016 5:15 pm

bdphifer

(@bdphifer)

Posts: 9

Member

I am now in the process of trying to back up my drives. The two drives do not show up as a RAID. How do I get my Mac to recognize them as a RAID?

TIA!

Posted : 16/09/2016 8:30 am