I've recently updated from 5.1.1 to 5.5, and I immediately got 'disk failure predicted' reports on three different disks. Prior to 5.5, there were no such errors. Be that as it may, I backed up the first disk and performed a verification (it passed) and a certification (that also passed.) I'll verify and certify the other two disks, but the fact that my first disk came back without errors is a little strange to me.
My questions are:
Does 5.5 perform a more detailed analysis of the SMART data than did 5.1.1, and could that explain why it's reporting errors where 5.1.1 did not?
What would cause 5.5 to report errors, but also certify and verify the same disk?
Thanks,
MattLTH
A predicted failure alert (if that is what you received) is just that: a prediction of future failure.
If you have a predicted failure on the disks, then SoftRAID will display the reallocated sector counts, etc., for those drives. That data is provided directly by the drives.
If there were disk errors, SoftRAID will also tell you to replace the drives. That is not a predicted failure, however. A disk error is reported any time a disk is unable to respond to a read or write request. Sometimes these are caused by something other than an actual disk fault.
Let us know what you actually have, and we can confirm. But if your disks are showing any reallocated sectors, this is a state that is permanent and you should prepare to replace the drives.
Here's a quick update: I backed up all three drives, then ran each one through the Certify process with the default settings. (Three passes.) All drives passed the certification. I've reformatted them and restored the original data without issue, and no errors have been reported by SoftRAID Lite 5.5.
If I experience more errors with these drives, I'll post about it here.
Thanks,
MattLTH
There are two disk error detection mechanisms being discussed here. The FIRST MECHANISM is that the disk drive itself detects bad sectors and reallocates spare sectors to replace them when needed. Each disk drive has a certain, finite number of spare sectors available when it is manufactured. If the disk drive detects another bad sector, and there are no spare sectors remaining, then the disk will fail. It will be unusable. When SoftRAID reports "disk failure predicted" it is telling you that the disk drive has already reallocated a number of spare sectors to replace bad sectors. This is predictive: SoftRAID is telling you that at some point the disk drive will run out of spare sectors, and will then fail.
The SECOND MECHANISM is SoftRAID itself. During the verification process SoftRAID reads every sector on the disk to determine if that sector can be read without error. Note that SoftRAID will not (and cannot) read sectors that the drive itself has determined are bad. Rather, SoftRAID will read the reallocated good sector that the drive has substituted for the bad sector.
So when SoftRAID completes a scan and tells you that no bad sectors were found, it means that from a software (i.e., operating system) perspective the drive is operating perfectly. However, that same drive may be in the process of failing; it just hasn't yet run out of spare sectors it can reallocate to mask bad sectors.
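To make the two mechanisms concrete, here is a minimal toy model in Python. It is only a sketch: the class and function names (ToyDrive, verify, failure_predicted) are made up for illustration, real drives and SoftRAID do not expose anything like this, and the "flag after one or more remaps" threshold is an assumption of the sketch.

```python
class ToyDrive:
    """Toy model of a drive with a fixed pool of spare sectors (hypothetical)."""

    def __init__(self, sector_count, spare_count):
        self.sector_count = sector_count
        self.spares_left = spare_count
        self.reallocated = 0      # stands in for the drive's reallocated sector count
        self.unreadable = set()   # bad sectors that could not be remapped

    def mark_bad(self, sector):
        """First mechanism: the drive finds a bad sector and tries to remap it."""
        if self.spares_left > 0:
            self.spares_left -= 1
            self.reallocated += 1            # reads now land on the good spare
        else:
            self.unreadable.add(sector)      # no spares left, the sector stays bad

    def read(self, sector):
        """Return True if the sector reads without error."""
        return sector not in self.unreadable


def verify(drive):
    """Second mechanism: read every sector; True if every read succeeds."""
    return all(drive.read(s) for s in range(drive.sector_count))


def failure_predicted(drive):
    """Assumed rule for this sketch: flag the drive once anything was remapped."""
    return drive.reallocated > 0


drive = ToyDrive(sector_count=1000, spare_count=2)
drive.mark_bad(17)
print(verify(drive), failure_predicted(drive))  # True True
```

The last line shows the situation from the original question: the scan finds nothing wrong because the bad sector has been masked by a spare, yet the remap counter is already non-zero.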
I hope this added some clarity to this issue.
This is essentially correct.
However, we predict failure on any disk with even a single reallocated sector.
Back in the days when we could purchase disk drives with a printout of all the "pre-reallocated sectors", a drive was expected to have hundreds or more bad sectors. They would all get remapped and the drive would continue working. Those days are gone. Reallocated sectors are now a RARE event, not a common one.
Drives today can sustain multiple bad sectors and still function, but several studies have found that once a drive has had a single sector reallocated, it is no longer trustworthy, as the odds of a total failure on that disk are (relatively) very high. So we recommend not storing important data on any disk with even one reallocated sector.
The Google study of 100,000 disks is worth reading. So are the many blog entries at backblaze.com, where their experiences with disks are public. They validated the Google study, which found that factors such as temperature, power-on cycles, and hours of use were not predictive of future failure. The only factors that were relevant were reallocated sector counts and unreliable sector counts. That is what SoftRAID relies on (and it displays the actual numbers in the disk tiles).
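For readers who want to see that rule written down, here is a rough Python sketch of the policy as described in this thread. It is not SoftRAID's actual code: the SmartSnapshot type, its field names, and predict_failure are hypothetical, and the sample numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SmartSnapshot:
    """Hypothetical snapshot of values a drive reports about itself."""
    reallocated_sectors: int
    unreliable_sectors: int
    temperature_c: float
    power_cycles: int
    power_on_hours: int


def predict_failure(snap: SmartSnapshot) -> bool:
    """Flag the drive if either sector counter is non-zero.

    Temperature, power cycles and hours are deliberately ignored,
    matching the description above that they were not predictive.
    """
    return snap.reallocated_sectors > 0 or snap.unreliable_sectors > 0


# Invented example values: an old, warm drive with clean counters
# versus a young, cool drive with a single reallocated sector.
old_but_clean = SmartSnapshot(0, 0, 48.0, 9000, 40000)
young_but_remapped = SmartSnapshot(1, 0, 30.0, 50, 600)

print(predict_failure(old_but_clean))       # False
print(predict_failure(young_but_remapped))  # True
```

Under this rule a drive that passes every verify and certify pass is still flagged as soon as one sector has been remapped, which matches what the original poster saw.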

