Does SoftRAID support using a Hot Spare with RAID 4/5?

9 Posts · 5 Users · 0 Reactions · 6,909 Views
(@geoff)
Posts: 1
Member
Topic starter
 

It would seem like a no-brainer to be able to connect/assign a drive that (after certification) is kept in Idle/Standby/Sleep mode, and then used to automatically rebuild redundancy upon notification that a drive is degraded or has failed. But I don't see anything like this in the docs?

I know there are options for RAID 10, etc., but the idea is that the warm spare *doesn't have* the R/W wear that any of the other drives do. To me, it seems a lot better... is there a way to do this?

 
Posted : 27/12/2018 9:58 am
(@softraid-support)
Posts: 9201
Member Admin
 

No, we do not have this feature.

The main reason is that the Google study of 100,000 disks found no relationship between the failure rates of heavily used and lightly used drives. So the idea that a warm spare is like "new" is a false assumption. It will have the same failure pattern as a heavily used disk.

We may add this in the future, but we believe a dual-redundant (active) drive is better than a warm spare.

 
Posted : 27/12/2018 10:44 am
(@mkush)
Posts: 1
Member
 

I wonder how applicable the Google study is to SSDs. Also, one of the benefits of a hot spare is that it can immediately begin rebuilding with no user intervention, thus lowering the chances of a second drive failing before the first one is rebuilt. It is for that reason that I’d use the feature if it were available. I agree that double parity may be better still, especially in the case of HDDs, but for SSDs, which are recommended to run RAID 4 in SR, will there even be a dual parity equivalent? If not, hot spares would be an easy-to-implement alternative.
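
As a rough back-of-the-envelope illustration of that rebuild-window argument, here is a minimal sketch (not SoftRAID code; the failure rate, rebuild time, and replacement delay are illustrative assumptions) comparing the chance of a second failure while the array is degraded, with and without a hot spare:

```python
import math

def p_second_failure(n_surviving: int, afr: float, window_hours: float) -> float:
    """Probability that at least one surviving drive fails during the
    exposure window, assuming independent drives with a constant
    annualized failure rate (AFR) -- a deliberately crude model."""
    rate_per_hour = afr / (365.0 * 24.0)
    return 1.0 - math.exp(-n_surviving * rate_per_hour * window_hours)

# Illustrative assumptions, not measured values:
N_SURVIVING = 4            # a 5-drive RAID 5 volume after one drive fails
AFR = 0.05                 # assumed 5% annualized failure rate per drive
REBUILD_HOURS = 24         # assumed rebuild time onto the replacement
REPLACE_DELAY_HOURS = 72   # assumed time to order, receive, install a disk

hot_spare = p_second_failure(N_SURVIVING, AFR, REBUILD_HOURS)
manual = p_second_failure(N_SURVIVING, AFR, REPLACE_DELAY_HOURS + REBUILD_HOURS)
print(f"hot spare (rebuild starts at once): {hot_spare:.4%}")
print(f"manual replacement:                 {manual:.4%}")
```

Under these assumptions the exposure window, and with it the risk of a second failure, shrinks by roughly 4x. Note the model says nothing about drive quality, only about how long the array stays degraded.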


 
Posted : 21/06/2019 10:23 am
(@softraid-support)
Posts: 9201
Member Admin
 

The Google study is not relevant to SSDs at all.

We are trying to collect stats on SSDs. As far as we are aware, and we have discussed this with other vendors and data recovery experts, there is no way to "predict" an SSD failure yet. All you can do is look at the wear indicator, where it is available.
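
For anyone who wants to look at that wear indicator themselves, here is a hedged sketch using smartmontools (a third-party tool, not part of SoftRAID; the device path is hypothetical, and the attribute names vary by vendor and interface):

```python
import subprocess

# Common wear-related attribute names; vendors differ, and NVMe drives
# report wear differently ("Percentage Used"), so treat this as a sketch.
WEAR_ATTRIBUTES = ("Wear_Leveling_Count", "Media_Wearout_Indicator",
                   "Percent_Lifetime_Remain")

def ssd_wear(device: str) -> dict:
    """Return any wear-related SMART attributes smartctl reports."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=False).stdout
    found = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] in WEAR_ATTRIBUTES:
            found[fields[1]] = fields[3]   # normalized VALUE column
    return found

print(ssd_wear("/dev/disk2"))   # hypothetical device path
```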

 
Posted : 21/06/2019 2:20 pm
(@bcg365)
Posts: 4
Member
 

While I appreciate the position taken in interpreting the Google study, I'd like to add my voice to the request for hot spare functionality. I come from a slightly different background, perhaps, in that my day job is as a large-scale storage system engineer for commercial service providers. In my world, the single most important factor is data integrity. Data cannot be lost for any reason. Performance, ultimately, is secondary.

The RAID levels recognize the dichotomy between performance design and integrity design: hence the variations between pure speed, pure redundancy, and choices in between. The concept of a hot spare sits at the integrity end. When a disk is detected as failing, some level of redundancy is lost at that moment, no matter the parity scheme (4/5/6 or even higher parity; triple parity is common for really big drives these days). If some of the data immediately begins rebuilding onto another drive, the desired level of redundancy starts being restored right away. The original level of redundancy was chosen for a reason. Getting back to that level as quickly as possible is why I want RAID solutions in the first place, and I wish SoftRAID would provide that functionality via hot spare handling.

It doesn't matter whether the hot spare happens to be the next drive to randomly fail. If the hot spare fails, you're right back where you were after the first failure. If a different drive fails while the hot spare is rebuilding, you've either lost your RAID (if single parity) or you're closer to getting back to at least single parity than if you had to wait for a user to order, receive, and install the first replacement. The only thing we know for sure is that spinning disks will certainly fail. I want to be back at my desired redundancy level as fast as possible, and hot spare capability is the way that can happen. The hot spare can be SMART-monitored for failure just like any other drive, and periodically exercised to check its status as well.
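
As a hedged sketch of that "monitor the spare like any other drive" idea (not a SoftRAID feature; it assumes smartmontools is installed, and the device path and interval are made up), a scheduled SMART health check might look like:

```python
import subprocess
import time

SPARE = "/dev/disk5"        # hypothetical spare device
INTERVAL_S = 6 * 60 * 60    # assumed polling interval: every six hours

def spare_is_healthy(device: str) -> bool:
    """smartctl -H prints an overall-health self-assessment; exit code 0
    plus 'PASSED' in the output is treated as healthy here."""
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True, check=False)
    return result.returncode == 0 and "PASSED" in result.stdout

while True:
    if not spare_is_healthy(SPARE):
        print(f"warning: hot spare {SPARE} failed its SMART health check")
    time.sleep(INTERVAL_S)
```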

If hot spare functionality is added, I'd further request that a hot drive swap-out ability be added. Assume, for example, I've had a RAID running for 2 years on some drives. I might want to begin swapping those out one by one, as opposed to trying to copy them wholesale to a new set of drives in a full enclosure. Granted, it would take longer to do, but depending on overall size, duplicating the container space may not be feasible. A manual hot drive swap-out would have the advantage of retaining the desired redundancy while the target drive is being "replaced", as opposed to simply pulling a drive and pushing in a replacement for a forced rebuild.

I grant that my situation may be unique. I am currently running 9 drives off a single Mac mini, preparing to move to 14 next year. SoftRAID is by far the most economical way to do that, as I don't need separate RAID/NAS-style boxes and their associated ecosystems; I just need capacity, which a bunch of generic enclosures and raw drives provide easily, with SoftRAID as the first-line protection mechanism. I am eagerly awaiting the next major release with dual parity, due to the scale and size of the drives I use at home.

On the SSD comments - agreed. SSDs fail in odd ways compared to spinning media. Spinning media gradually builds indicators toward failure. SSDs, in my experience, either just fail unexpectedly, or soft-fail internally with unpredictable performance patterns as a result. Soft-fails can be very hard to predict and detect: rare, but nasty when they happen.

 
Posted : 03/12/2019 11:32 pm
(@softraid-support)
Posts: 9201
Member Admin
 

Good comments.

When we implement RAID 6, it should address some of these issues.

I am not sold on hot spares, but we can talk about it in house. They add no "value" when, with RAID 6, the same disk could be storing additional parity data instead: an N-drive RAID 5 volume plus a hot spare uses the same number of drives as an (N+1)-drive RAID 6 volume, but the RAID 6 volume survives any two drive failures. This is especially true since longevity is the same for warm spares and online active disks.

We already support replacing the drives in a volume one by one, even with larger disks. (You cannot increase the volume size until all of the drives have been replaced, though.)

We are looking forward to more data on APFS reliability. HFS is a big problem with large volumes, so hopefully APFS will address some of that. (But we need to see the availability of directory/volume recovery tools.)

Be careful with "generic" enclosures. USB is not "certified" by anyone, and we see more problems with USB-triggered data corruption than should be the case. Thunderbolt 3 enclosures, as far as I am aware, must still pass stringent certification requirements.

 
Posted : 04/12/2019 4:40 am
(@robotfist)
Posts: 2
Member
 

We are looking forward to more data on APFS reliability. HFS is a big problem with large volumes, so hopefully APFS will address some of that. (But we need to see the availability of directory/volume recovery tools.)

When can we expect APFS support from SoftRAID? I just built an OWC Thunderbay 6 RAID system with 4TB HGST drives. Just wondering if I'm setting myself up for issues using 4TB drives in RAID 5 with HFS+.

 
Posted : 07/12/2019 4:26 pm
(@softraid-support)
Posts: 9201
Member Admin
 

I don't think you are "setting yourself up for problems" with HFS. While APFS appears more robust, there are zero repair tools for it, and in my testing I have been able to make Catalina APFS volumes unmountable in crash testing. That means there is no way to recover the data. That is partly why we are going slow; we also reported this issue to Apple, and it appears to be on the fix list for a future update of Catalina.

That will make SoftRAID 6 much more compelling. ;-)

We hope to have some news on SoftRAID 6 in the first quarter of the year.

 
Posted : 07/12/2019 9:46 pm