
Three disks failed certification within an hour of each other

28 Posts
2 Users
0 Likes
6,543 Views
(@softraid-support)
Posts: 8005
Member Admin
 

Do you want to try a possible fixed version of the app? (We think, but are not certain yet, that we fixed this issue.)

 
Posted : 24/02/2020 11:24 am
(@ericbarker)
Posts: 20
Member
Topic starter
 

Do you want to try a possible fixed version of the app? (We think, but are not certain yet, that we fixed this issue.)

Sure! I definitely would.

However, I still haven't gotten confirmation from you as to whether the Terminal program finished correctly. As I said, after a number of days, it just stopped and said nothing.

The macOS error is consistent with a new drive trying to load (or a drive that's been erased), since a certified drive is not formatted.

It may be that all my drives HAVE already been certified.

 
Posted : 24/02/2020 3:32 pm
(@softraid-support)
Posts: 8005
Member Admin
 

Here is a link to the beta. The driver is the same as 5.8.2; only the app has changed, to hopefully prevent it from losing contact with the softraidtool.

https://srforums.wpengine.com/updates/Latest_SoftRAID_5_Beta.html

Just run this and do the same certify.

I do not know whether the certify completed, as you indicated the disks ejected. I don't know whether that was after or before the command was running.

That is why I want us to add more verbosity to this command. ;-)

 
Posted : 24/02/2020 3:57 pm
(@ericbarker)
Posts: 20
Member
Topic starter
 

Okay thanks. I've got the beta running with the old driver (it gave me a warning to update the driver, but couldn't find one and quit). I think for the sake of testing, instead of doing 3 passes all at once, I'm just going to do a single pass multiple times. That may be a better strategy when working with larger drives like this. It shows a single pass taking just under 48 hours.

 
Posted : 24/02/2020 4:23 pm
(@softraid-support)
Posts: 8005
Member Admin
 

The driver is not important, just the application.

But you want to do at least 2 passes. The reason is that the last pass always writes zeros. Zeros are the default state of a disk, so a zero-only pass does not really exercise it. With 2 passes, pass 1 writes a random pattern and pass 2 writes zeros.
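The point about zeros can be shown in a few lines. This is a hypothetical model, not SoftRAID's implementation: `certify_patterns` and `bytes_changed` are illustrative names, and the only detail taken from the post is that every pass but the last is random and the last is all zeros. It counts how many bytes each pass would actually change on a fresh, all-zero disk.

```python
import os

def certify_patterns(passes: int, block_size: int) -> list:
    """Hypothetical model: random data on every pass except the last, which is all zeros."""
    if passes < 1:
        raise ValueError("passes must be >= 1")
    random_passes = [os.urandom(block_size) for _ in range(passes - 1)]
    return random_passes + [bytes(block_size)]

def bytes_changed(old: bytes, new: bytes) -> int:
    """Count byte positions where writing `new` over `old` changes the stored value."""
    return sum(a != b for a, b in zip(old, new))

fresh = bytes(4096)  # a new disk reads back as zeros

one_pass = certify_patterns(1, len(fresh))
print(bytes_changed(fresh, one_pass[0]))  # 0: writing zeros over zeros changes nothing

two_pass = certify_patterns(2, len(fresh))
print(bytes_changed(fresh, two_pass[0]))  # random pass: about 4080 of 4096 bytes on average
```

A real certify also reads the data back (the log later in this thread reports read errors during a pass); that step is left out because only the write pattern matters for this point.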

 
Posted : 24/02/2020 8:02 pm
(@ericbarker)
Posts: 20
Member
Topic starter
 

ARGGG!!!!!

About 5 days into the test (most of the way through pass 2), the system unmounted the bay, giving me 6 "This disk is unreadable" dialog warnings (which, of course, just means the disks got disconnected and reconnected). And unlike the older version of SoftRAID, when I restarted the certify, it didn't offer to continue; it just restarted from scratch.

So I guess that's another 6 days? I've been at this for almost a month now.

 
Posted : 28/02/2020 4:29 pm
(@softraid-support)
Posts: 8005
Member Admin
 

Are you sure it did not complete? Since it did not ask to resume, it seems likely that it did.

Maybe it's time to put them to use. If you want, zero the disks so they are set to all zeros.

 
Posted : 28/02/2020 6:25 pm
(@ericbarker)
Posts: 20
Member
Topic starter
 

Here are the messages the log spit out at that time:

Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,667,855,360, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk4, SN: ZL22TS59, SATA bus 0, id 5 (Thunderbolt) encountered a read error (offset 13,877,358,100,480, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk10, SN: ZL22L6HF, SATA bus 0, id 0 (Thunderbolt) encountered a read error (offset 14,374,316,015,616, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) encountered a read error (offset 12,939,041,964,032, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk7, SN: ZL21HBFM, SATA bus 0, id 2 (Thunderbolt) encountered a read error (offset 12,708,271,357,952, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk5, SN: ZL22SL1L, SATA bus 0, id 4 (Thunderbolt) encountered a read error (offset 13,950,204,772,352, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,684,632,576, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) encountered a read error (offset 12,939,058,741,248, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk7, SN: ZL21HBFM, SATA bus 0, id 2 (Thunderbolt) encountered a read error (offset 12,708,288,135,168, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk4, SN: ZL22TS59, SATA bus 0, id 5 (Thunderbolt) encountered a read error (offset 13,877,374,877,696, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk10, SN: ZL22L6HF, SATA bus 0, id 0 (Thunderbolt) encountered a read error (offset 14,374,332,792,832, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk5, SN: ZL22SL1L, SATA bus 0, id 4 (Thunderbolt) encountered a read error (offset 13,950,221,549,568, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,701,409,792, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) encountered a read error (offset 12,939,075,518,464, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk4, SN: ZL22TS59, SATA bus 0, id 5 (Thunderbolt) encountered a read error (offset 13,877,391,654,912, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk7, SN: ZL21HBFM, SATA bus 0, id 2 (Thunderbolt) encountered a read error (offset 12,708,304,912,384, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk10, SN: ZL22L6HF, SATA bus 0, id 0 (Thunderbolt) encountered a read error (offset 14,374,349,570,048, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,718,187,008, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) encountered a read error (offset 12,939,092,295,680, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk5, SN: ZL22SL1L, SATA bus 0, id 4 (Thunderbolt) encountered a read error (offset 13,950,238,326,784, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk7, SN: ZL21HBFM, SATA bus 0, id 2 (Thunderbolt) encountered a read error (offset 12,708,321,689,600, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk4, SN: ZL22TS59, SATA bus 0, id 5 (Thunderbolt) encountered a read error (offset 13,877,408,432,128, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk7, SN: ZL21HBFM, SATA bus 0, id 2 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk4, SN: ZL22TS59, SATA bus 0, id 5 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk10, SN: ZL22L6HF, SATA bus 0, id 0 (Thunderbolt) encountered a read error (offset 14,374,366,347,264, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk10, SN: ZL22L6HF, SATA bus 0, id 0 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk5, SN: ZL22SL1L, SATA bus 0, id 4 (Thunderbolt) encountered a read error (offset 13,950,255,104,000, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 11:23:50 - SoftRAID Application: The certify disk command for disk disk5, SN: ZL22SL1L, SATA bus 0, id 4 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
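With this many near-identical lines, it helps to group them by disk. Below is a minimal sketch that assumes only the line format quoted above (the regexes and the `summarize`/`contiguous` helpers are hypothetical, not a SoftRAID tool). It collects each serial number's read-error offsets, records the i/o block size, and flags disks that logged the final "failed because" message. Over this excerpt, each of the six disks shows four read errors at consecutive 16,777,216-byte blocks followed by a failure, all stamped 11:23:50, a pattern that fits all disks dropping off the bus together rather than six disks developing bad sectors independently.

```python
import re
from collections import defaultdict

# Written against the log lines quoted above; other SoftRAID versions may word them differently.
READ_ERR = re.compile(
    r"SN: (\w+),.*read error \(offset ([\d,]+), i/o block size = ([\d,]+)\)"
)
FAILED = re.compile(r"SN: (\w+),.*failed because")

def to_int(text: str) -> int:
    """Parse a comma-grouped number such as '16,777,216'."""
    return int(text.replace(",", ""))

def summarize(lines):
    """Return (offsets per serial, block size per serial, set of serials whose certify failed)."""
    offsets = defaultdict(list)
    block_size = {}
    failed = set()
    for line in lines:
        m = READ_ERR.search(line)
        if m:
            serial = m.group(1)
            offsets[serial].append(to_int(m.group(2)))
            block_size[serial] = to_int(m.group(3))
            continue
        m = FAILED.search(line)
        if m:
            failed.add(m.group(1))
    return dict(offsets), block_size, failed

def contiguous(offsets, block_size):
    """True if each error offset sits exactly one block after the previous one."""
    ordered = sorted(offsets)
    return all(b - a == block_size for a, b in zip(ordered, ordered[1:]))
```

Feed it the log text split into lines, e.g. `summarize(text.splitlines())`, then call `contiguous` on each disk's offsets.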

 
Posted : 28/02/2020 7:52 pm
(@softraid-support)
Posts: 8005
Member Admin
 

All the errors occurred at the same time, so yes, this was a hardware event; the disks were probably ejected when the Thunderbolt bus reset. It was at about 12TB on the second read pass, so that tells you where you were in the process. If you were doing 2 passes, consider it done. If 3, do one more pass.
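The offsets in the log are byte positions on each disk, so the "12TB" figure is plain unit arithmetic. In the sketch below (the `describe` helper is illustrative), the lowest error offset in the excerpt, 12,708,271,357,952 bytes, comes out at about 12.71 TB in the decimal units drive makers use, or 11.56 TiB in binary units; the highest, 14,374,366,347,264 bytes, is about 14.37 TB or 13.07 TiB. Turning that into a fraction of the pass would need each drive's capacity, which the thread does not state.

```python
TB = 10**12   # decimal terabyte, as drive vendors count capacity
TIB = 2**40   # binary tebibyte

def describe(offset_bytes: int) -> str:
    """Express a byte offset in both decimal TB and binary TiB."""
    return f"{offset_bytes / TB:.2f} TB ({offset_bytes / TIB:.2f} TiB)"

# Lowest and highest error offsets quoted in the log excerpt above.
for offset in (12_708_271_357_952, 14_374_366_347_264):
    print(describe(offset))
# 12.71 TB (11.56 TiB)
# 14.37 TB (13.07 TiB)
```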

 
Posted : 28/02/2020 9:53 pm
(@ericbarker)
Posts: 20
Member
Topic starter
 

I should have gone into a little more detail; I was flying out the door at the time. Now that I think about it, this event looks like it happened the moment I woke my iMac up from monitor sleep. So my guess is that this was almost certainly triggered by the computer, not by the drives, and probably not by the ThunderBay itself, since it coincided with a computer event. Does this sound logical?

Now, you say "this was a hardware event." What do you mean by that?

Moving forward: I had been going for 3 passes, so I will do one more. You mentioned that the final pass writes all zeros. I assume that means that if I choose "1 Pass", it will zero them out in the process? If all the disks make it, I will consider them good to go. That's another 48 hours.

My only concern is whether this "unmount on wakeup" issue could continue to plague me in day-to-day use later. I do some very long renders and encodes, so I'm crossing my fingers that this isn't a sign of OS hiccups to come.

BTW: Thank you for all your help. You've helped me out a great deal; I know I've been a bit of a squeaky wheel.

 
Posted : 29/02/2020 12:49 am
(@softraid-support)
Posts: 8005
Member Admin
 

What you figured out is the likely cause, and that is what I mean by a "hardware event". Yes, waking from sleep could have caused this. macOS 10.15.3 fixes some of this, but not all.

And yes, a single pass will zero out the drives.

Sorry this has been a hassle. Weird how this process was actually more reliable under FireWire than under Thunderbolt!

 
Posted : 29/02/2020 4:07 am
(@ericbarker)
Posts: 20
Member
Topic starter
 

All disks finally finished a final pass! Thank you so much for working me through this. I'm going to create the array now and make sure everything works.

 
Posted : 02/03/2020 2:13 am
(@softraid-support)
Posts: 8005
Member Admin
 

Can you run a test for me?
I assume that if you just leave the SoftRAID application open, it will eventually give the "The SoftRAID Application must quit" message. Is this always after the monitor has gone to sleep?
If you set the monitor to never sleep and disable the screen saver, does it still happen, take longer to happen, or never happen? I would appreciate the information. This bug is annoying but not widespread, and it is hard to reproduce in the lab.
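To answer "is it always after the monitor sleeps?" without guesswork, one option is to note when the display wakes and when the quit message appears, then tally them. This is a hypothetical Python sketch: the function names, the five-minute window, and the sample times are placeholders, not anything SoftRAID records. On macOS, the sleep/wake history can also be checked with `pmset -g log` rather than noting times by hand.

```python
from datetime import datetime, timedelta

def quits_after_wake(quits, wakes, window=timedelta(minutes=5)):
    """For each quit time, report whether a display wake preceded it within `window`."""
    return [
        (q, any(timedelta(0) <= q - w <= window for w in wakes))
        for q in quits
    ]

def summary(pairs):
    """Count quits that did and did not follow a wake."""
    after = sum(1 for _, matched in pairs if matched)
    return {"after_wake": after, "other": len(pairs) - after}

# Placeholder times for illustration only; substitute what you actually observe.
wakes = [datetime(2020, 3, 3, 9, 0)]
quits = [datetime(2020, 3, 3, 9, 2), datetime(2020, 3, 3, 14, 30)]
print(summary(quits_after_wake(quits, wakes)))  # {'after_wake': 1, 'other': 1}
```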

 
Posted : 02/03/2020 2:42 pm