Certification / disk question

10 Posts
3 Users
0 Reactions
20.7 K Views
(@iflyby77)
Posts: 18
Member
Topic starter
 

I have been experimenting with the certification function of SoftRAID. Some drives (from brand new to old enough that I would normally think about discontinuing use) run through the process just fine. Two drives have failed. I am a little confused about whether the drives are actually bad, given the conflicting factors listed below. Any ideas are greatly appreciated. The second drive is under warranty, but it passes all the tests it would need to fail in order to RMA it.

Drive 1 - WDC 3.5" Green 2TB - a few years old. Failed SoftRAID certification twice. Wrote zeros to all sectors. Passes all tests with WDC Data Lifeguard and Seagate's SeaTools.

Drive 2 - Seagate 2.5" 2TB - a few months old. Failed SoftRAID certification twice. Failed writing zeros to all sectors. Passes all tests with WDC Data Lifeguard and Seagate's SeaTools.

 
Posted : 25/03/2017 9:50 am
(@softraid-support)
Posts: 9200
Member Admin
 

Unless it failed at the very end (the very last sectors), I would say they are not reliable. Write passes are blind so a disk that cannot write is not reliable.

The manufacturer tools run very basic tests. Unless they take a very long time, meaning at least one or two full write passes and read passes, they will not catch these problems.

This is why the certify test is so good. It results in a disk that you know can be written to and read from 3 times without errors. If a disk cannot pass this strenuous but simple test, it is not reliable for important data.

 
Posted : 25/03/2017 11:26 pm
(@iflyby77)
Posts: 18
Member
Topic starter
 

Thank you for the response. Both tests looked like they should take about 5 hours, which is far less than the certify test. The Seagate one was run from DOS and is supposed to be able to "fix" problems, but it did not indicate that there were any to fix. I am including two screenshots. The shorter message occurred first. I do not know how far into the tests the errors occurred because I started the tests and let them run on their own.

 
Posted : 27/03/2017 6:36 pm
(@softraid-support)
Posts: 9200
Member Admin
 

The way to see where the error happened is to check the SoftRAID log.

Look at the offset where the error occurred. The only times an error is not a disk error are when:
There was a hardware failure (bus, cable, etc.).
The error happens on the last write to the disk. We occasionally see such errors and do not know why the drive reports them, but if the offset matches the capacity of the disk, you can ask yourself whether this was a "bug" rather than an actual error.

Disk write errors usually occur when a disk is on its last legs.
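To illustrate the offset check described above, here is a minimal sketch. It assumes you have already read the failing offset out of the SoftRAID log by hand and know the disk capacity in the same unit (bytes here); the function name and the 1 MiB "tail" window are hypothetical choices for illustration, not anything SoftRAID defines.

```python
def classify_error_offset(error_offset, disk_capacity, tail_bytes=1024 * 1024):
    """Classify a certify error by where it sits on the disk.

    error_offset  -- failing offset in bytes, read manually from the log
    disk_capacity -- total disk capacity in bytes
    tail_bytes    -- assumed size of the "very end" window (arbitrary)
    """
    if not 0 <= error_offset <= disk_capacity:
        raise ValueError("offset must lie between 0 and the disk capacity")
    # An error in the last few sectors may be the end-of-disk quirk;
    # anything earlier should be treated as a real disk or hardware error.
    if disk_capacity - error_offset <= tail_bytes:
        return "end-of-disk"
    return "mid-disk"


capacity = 4_000_000_000_000  # nominal 4 TB disk
print(classify_error_offset(capacity - 512, capacity))
print(classify_error_offset(1_000_000, capacity))
```

A "mid-disk" result still does not tell you whether the disk or the bus/cable is at fault; it only rules out the end-of-disk case.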

 
Posted : 27/03/2017 7:53 pm
(@chobochobo)
Posts: 142
Member
 

Sorry to hijack this thread. I have been certifying some 4TB Toshiba HDDs in 2 ThunderBays. The first lot completed okay, but the second lot have come back with errors at the very end:
'An error occurred certifying a disk. There was an error verifying data on this disk. The data read back from the disk was different than what was written out.
Disk with error: disk 6, SATA bus 0, ID 6 (thunderbolt), 4TB'

3 out of 4 have failed, and the fourth is finishing in 26 minutes. Is this a problem with the disks or with the enclosure (it's in a daisy chain, 2nd after the first ThunderBay, and has an Apple Cinema Display connected to it)? Help please?

 
Posted : 24/04/2017 3:39 am
(@softraid-support)
Posts: 9200
Member Admin
 

If the data was read back inaccurately, then there is a problem.

Did the failures all happen at the same time, or over time? It is possible the cause was the cable or the enclosure rather than the disks.

Recertify them, or try certifying one of the "failing" disks in the known-good enclosure, so you have ammunition if you think the enclosure is faulty.

Disks should not fail a verify pass in a certify. If multiple fail, then I would suspect other hardware, as you are doing.

 
Posted : 24/04/2017 10:48 am
(@chobochobo)
Posts: 142
Member
 

All 4 new HDDs failed certification. I did them all at the same time.

I'm currently trying again, this time each enclosure to its own thunderbolt port. Another 60 hours to go.

BTW I'm currently trying to copy data from my remaining drives in my USB enclosure to a new drive in the same enclosure. The USB is attached to an OWC thunderbolt dock that one of the Thunderbays is attached to. I'm not sure, but whilst I'm doing this the certify time for *both* the enclosures seems to have gone up. Will USB copy operations affect the certify time in TB enclosures?

 
Posted : 24/04/2017 11:03 am
(@softraid-support)
Posts: 9200
Member Admin
 

USB copies should not affect the certify times. Even if you are connecting the USB drives through a Thunderbolt dock, you would not be close to the maximum throughput over Thunderbolt.

Certify time estimates vary widely. They go up as disk performance slows further into the disk and come back down when a new pass starts over at the beginning. I don't think the change you noticed is due to the USB file copies.

 
Posted : 24/04/2017 1:57 pm
(@chobochobo)
Posts: 142
Member
 

A quick question: at the default setting, does certifying consist of three identical sets of writing and verifying? So if the process fails at the very end, is it basically in the third set of writing/verifying, or is there any 'rounding up' process after the three sets?

 
Posted : 25/04/2017 9:08 am
(@softraid-support)
Posts: 9200
Member Admin
 

The very end could be the random access test.

A certify pass writes a semi-random pattern across the disk, then reads it back.
The last pass is all zeros, so the disk is returned to "factory new" condition.

After the certify passes, a random access test is done that stresses the disk very heavily, with random read and write I/O all over the disk.
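As a rough illustration of the write-then-verify passes described above, here is a small sketch that runs against an in-memory buffer instead of a real disk. This is not SoftRAID's implementation: the function name, block size, seeding scheme and use of Python's `random` module for the "semi-random" pattern are all assumptions, and the final random access test is left out.

```python
import io
import random


def certify(device, size, passes=3, block=4096, seed=0):
    """Write and verify `passes` patterns over the first `size` bytes.

    Returns the offset of the first block that reads back wrong,
    or None if every pass verifies. The last pass writes zeros.
    """
    def pattern(rng, n, zeros):
        # Zero block on the final pass, pseudo-random bytes otherwise.
        return bytes(n) if zeros else bytes(rng.getrandbits(8) for _ in range(n))

    for p in range(passes):
        zeros = (p == passes - 1)

        # Write pass: fill the device sequentially.
        rng = random.Random(seed + p)
        device.seek(0)
        offset = 0
        while offset < size:
            n = min(block, size - offset)
            device.write(pattern(rng, n, zeros))
            offset += n

        # Verify pass: regenerate the same pattern and compare.
        rng = random.Random(seed + p)
        device.seek(0)
        offset = 0
        while offset < size:
            n = min(block, size - offset)
            if device.read(n) != pattern(rng, n, zeros):
                return offset
            offset += n
    return None


disk = io.BytesIO()          # stand-in for a real disk
print(certify(disk, 10000))  # None means all passes verified
```

Because the write pass never reads anything back, a bad write only shows up during the verify pass, which is why the failing offset is reported from the read side.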

 
Posted : 25/04/2017 1:36 pm