SMART is a self-diagnostic system built into disks. With flash media, SMART information is less useful for predicting imminent failure, but it is useful nevertheless. SoftRAID is also using anonymous data collected from users to try to isolate additional methods for predicting future failure on NVMe drives. It will take time, but hopefully we will find some patterns in the raw SMART data we collect (only from users who opt in to SoftRAID's anonymous data collection).
SoftRAID 8 adds SMART support for NVMe blades. SoftRAID now displays media wear indicators and Total Bytes Read (TBR) for NVMe blades.
This information can help you protect your data with more timely replacement of your blades. You can be aware of excessive wear on one or more of your blades and take action before failure.
When should I consider proactively replacing my NVMe blade that has not yet failed?
We suggest that when your blades reach a Total Bytes Read of about 2,000 times their capacity in TB, it's time to replace them.
(Example: with 1TB blades, when your blades reach 2,000TB read, it's advisable to plan to replace the drives.)
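The rule of thumb above is simple arithmetic, and a minimal Python sketch of it might look like this. The function names and the idea of passing TBR in as a plain number are illustrative assumptions; this is not SoftRAID code, and the 2,000 multiplier is just the suggestion stated above.

```python
# Hypothetical helpers illustrating the "2,000 times capacity" rule of thumb.
# These names are made up for this example and are not part of SoftRAID.

def replacement_threshold_tb(capacity_tb, multiplier=2000):
    """Return the Total Bytes Read (in TB) at which replacement is suggested."""
    return capacity_tb * multiplier


def should_plan_replacement(total_read_tb, capacity_tb, multiplier=2000):
    """True once the blade's TBR has reached the suggested threshold."""
    return total_read_tb >= replacement_threshold_tb(capacity_tb, multiplier)


# A 1TB blade reaches the suggested threshold at 2,000TB read.
print(replacement_threshold_tb(1))        # 2000
print(should_plan_replacement(1500, 1))   # False
```

The multiplier is a parameter so the threshold can be adjusted if the guidance changes.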
The wear indicators generally track the TBR: the more data read, the less wear remaining on the drives. (Writes wear the media far more than reads, but the two numbers still correlate.)
If you see one blade's wear indicator dropping faster than the others, that blade could be starting to fail.
(Example: you have a Thunderblade 4 drive enclosure. Three of the drives show 96-98% wear remaining, but one drive shows 60%. This indicates that while the drive has not failed, it is likely to fail much sooner than the other three drives.)
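If you note down the wear remaining for each blade, the comparison in the example above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the 20-point gap used to call a blade "lagging" is an assumption chosen for this example, not a SoftRAID threshold.

```python
from statistics import median

# Hypothetical helper: flag blades whose wear remaining is well below the
# typical (median) value of the set. The default gap of 20 percentage points
# is an assumption for illustration only.

def lagging_blades(wear_remaining, gap=20):
    """Return the indexes of blades more than `gap` points below the median."""
    typical = median(wear_remaining)
    return [i for i, w in enumerate(wear_remaining) if typical - w > gap]


# Four blades as in the example: three at 96-98%, one at 60%.
print(lagging_blades([98, 97, 96, 60]))  # [3]
```

Comparing against the median rather than the average keeps one badly worn blade from dragging down the baseline it is measured against.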
If you have any NVMe blades on Thunderbolt for which SoftRAID does NOT display wear indicators, please post a SoftRAID tech support file. We will also want the raw SMART data from the blades, and we will tell you how to collect it.
Feel free to comment or post additional questions about SMART over NVMe here.
I have just upgraded to v8 and immediately started receiving a failure warning for an NVMe disk mounted in the NVMe slot of my six-drive Thunderbay.
From the log:
Mar 28 17:19:56 - SoftRAID Monitor: The disk at disk14, Location: Thunderbolt, PCI bus, 931 GB is predicted to fail. This disk is 20 - 60 times more likely to fail in the next 2 - 6 months than a normal disk. This prediction is based on SMART data retrieved from the disk. This disk should be replaced soon. SMART Attributes used for failure prediction are: ID 198 (Uncorrectable Sector Count) = 0.
From what I can tell, an uncorrectable sector count of 0 is not a bad thing. Can you explain what is generating this warning?
[Mac Studio M2 Max running 14.4.1]

