
Three disks failed certification within an hour of each other

Page 2 / 2
(@softraid-support)
Member Admin

Do you want to try a possible fixed version of the app? (We think, but are not certain yet, that we fixed this issue.)

Posted : 24/02/2020 10:24 am
(@ericbarker)
Eminent Member Customer

Do you want to try a possible fixed version of the app? (We think, but are not certain yet, that we fixed this issue.)

Sure! I definitely would.

However, I still haven't gotten confirmation from you as to whether the Terminal program finished correctly. As I said, after a number of days, it just stopped and said nothing.

The MacOS error is consistent with a new drive trying to load (or a drive that's been erased), since a certified drive is not formatted.

It may be that all my drives HAVE already been certified.

Topic starter Posted : 24/02/2020 2:32 pm
(@softraid-support)
Member Admin

Here is a link to the beta. The driver is the same as 5.8.2; only the app has changed, hopefully to prevent the application from losing contact with the softraidtool.

https://srforums.wpengine.com/updates/Latest_SoftRAID_5_Beta.html

Just run this and do the same certify.

I do not know whether the certify completed, as you indicated the disks ejected. I don't know whether that was after or before the command was running.

That is why I want us to add more verbosity to this command. ;-)

Posted : 24/02/2020 2:57 pm
(@ericbarker)
Eminent Member Customer

Okay thanks. I've got the beta running with the old driver (it gave me a warning to update the driver, but couldn't find one and quit). I think for the sake of testing, instead of doing 3 passes all at once, I'm just going to do a single pass multiple times. That may be a better strategy when working with larger drives like this. It shows a single pass taking just under 48 hours.

Topic starter Posted : 24/02/2020 3:23 pm
(@softraid-support)
Member Admin

The driver is not important, just the application.

But you want to do at least a 2-pass certify. The reason is that the last pass always writes zeros. Zeros are the default state of a disk, so a single pass does not really exercise it. A 2-pass certify at least writes a random pattern on pass 1, then zeros on pass 2.
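
To make the pass logic concrete, here is a minimal sketch of a multi-pass write-and-verify loop. It is not SoftRAID's implementation: the `certify` function, the in-memory `bytearray` standing in for a disk, and the 4096-byte block size are all assumptions for illustration. It only demonstrates the rule described above, that every pass except the last writes random data and the last pass writes zeros.

```python
import os


def certify(device: bytearray, passes: int, block_size: int = 4096) -> list[str]:
    """Toy certify: write a pattern to every block, read it back, compare.

    Hypothetical helper for illustration only. The final pass writes zeros,
    earlier passes write random data.
    """
    log = []
    offsets = range(0, len(device), block_size)
    for pass_no in range(1, passes + 1):
        is_last = pass_no == passes
        written = []
        # Write phase: fill the whole device block by block.
        for offset in offsets:
            size = min(block_size, len(device) - offset)
            pattern = bytes(size) if is_last else os.urandom(size)
            device[offset:offset + size] = pattern
            written.append(pattern)
        # Read phase: verify every block against what was written.
        for pattern, offset in zip(written, offsets):
            if bytes(device[offset:offset + len(pattern)]) != pattern:
                raise IOError(f"read error at offset {offset}, pass {pass_no}")
        log.append(f"pass {pass_no}: {'zeros' if is_last else 'random'}")
    return log


if __name__ == "__main__":
    disk = bytearray(b"\xff" * 10_000)
    print(certify(disk, passes=2))
```

With `passes=1` the only pattern ever written is zeros, which is why a single pass exercises the disk less than a 2-pass run.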

Posted : 24/02/2020 7:02 pm
(@ericbarker)
Eminent Member Customer

ARGGG!!!!!

About 5 days into the test (most of the way through pass 2), the system unmounted the bay, giving me 6 dialog warnings of "This disk is unreadable" (which, of course, just means it got disconnected and reconnected). And unlike the older version of SoftRAID, when I restarted the Certify, it didn't ask me to continue, but just restarted from scratch.

So I guess that's another 6 days? I've been at this for almost a month now.

Topic starter Posted : 28/02/2020 3:29 pm
(@softraid-support)
Member Admin

Are you sure it did not complete? That seems likely based on the lack of a restart request.

Maybe it's time to put them to use. If you want, zero the disks to set them to all zeros.

Posted : 28/02/2020 5:25 pm
(@ericbarker)
Eminent Member Customer

Here are the messages the log spit out at that time:

Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,667,855,360, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk4, SN: ZL22TS59, SATA bus 0, id 5 (Thunderbolt) encountered a read error (offset 13,877,358,100,480, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk10, SN: ZL22L6HF, SATA bus 0, id 0 (Thunderbolt) encountered a read error (offset 14,374,316,015,616, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) encountered a read error (offset 12,939,041,964,032, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk7, SN: ZL21HBFM, SATA bus 0, id 2 (Thunderbolt) encountered a read error (offset 12,708,271,357,952, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk5, SN: ZL22SL1L, SATA bus 0, id 4 (Thunderbolt) encountered a read error (offset 13,950,204,772,352, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,684,632,576, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) encountered a read error (offset 12,939,058,741,248, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk7, SN: ZL21HBFM, SATA bus 0, id 2 (Thunderbolt) encountered a read error (offset 12,708,288,135,168, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk4, SN: ZL22TS59, SATA bus 0, id 5 (Thunderbolt) encountered a read error (offset 13,877,374,877,696, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk10, SN: ZL22L6HF, SATA bus 0, id 0 (Thunderbolt) encountered a read error (offset 14,374,332,792,832, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk5, SN: ZL22SL1L, SATA bus 0, id 4 (Thunderbolt) encountered a read error (offset 13,950,221,549,568, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,701,409,792, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) encountered a read error (offset 12,939,075,518,464, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk4, SN: ZL22TS59, SATA bus 0, id 5 (Thunderbolt) encountered a read error (offset 13,877,391,654,912, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk7, SN: ZL21HBFM, SATA bus 0, id 2 (Thunderbolt) encountered a read error (offset 12,708,304,912,384, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk10, SN: ZL22L6HF, SATA bus 0, id 0 (Thunderbolt) encountered a read error (offset 14,374,349,570,048, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,718,187,008, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) encountered a read error (offset 12,939,092,295,680, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk5, SN: ZL22SL1L, SATA bus 0, id 4 (Thunderbolt) encountered a read error (offset 13,950,238,326,784, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk7, SN: ZL21HBFM, SATA bus 0, id 2 (Thunderbolt) encountered a read error (offset 12,708,321,689,600, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk4, SN: ZL22TS59, SATA bus 0, id 5 (Thunderbolt) encountered a read error (offset 13,877,408,432,128, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk7, SN: ZL21HBFM, SATA bus 0, id 2 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk4, SN: ZL22TS59, SATA bus 0, id 5 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk10, SN: ZL22L6HF, SATA bus 0, id 0 (Thunderbolt) encountered a read error (offset 14,374,366,347,264, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk10, SN: ZL22L6HF, SATA bus 0, id 0 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk5, SN: ZL22SL1L, SATA bus 0, id 4 (Thunderbolt) encountered a read error (offset 13,950,255,104,000, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk5, SN: ZL22SL1L, SATA bus 0, id 4 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
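
A log like this is easier to read once it is grouped by disk. The sketch below is one way to do that, assuming the message format shown above; the regex and the helper names `read_error_offsets` and `strides` are hypothetical, not part of SoftRAID. The sample lines are copied from the log above. In that log, each disk's read-error offsets advance by exactly one i/o block (16,777,216 bytes), and each disk reports four read errors before the "unreliable sectors" failure message.

```python
import re
from collections import defaultdict

# Sample lines copied from the log above (disk6 and disk9 only).
LOG = """\
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,667,855,360, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) encountered a read error (offset 12,939,041,964,032, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,684,632,576, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk9, SN: ZL21HPWX, SATA bus 0, id 1 (Thunderbolt) encountered a read error (offset 12,939,058,741,248, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,701,409,792, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) encountered a read error (offset 13,519,718,187,008, i/o block size = 16,777,216). Error during pass number = 2. This disk should be replaced immediately.
Feb 28 1150 - SoftRAID Application: The certify disk command for disk disk6, SN: ZL20SFHQ, SATA bus 0, id 3 (Thunderbolt) failed because this disk has unreliable sectors. It should be replaced immediately (error number = 66).
"""

# Assumed pattern for the read-error messages shown above.
READ_ERROR = re.compile(
    r"disk (?P<disk>disk\d+), SN: (?P<sn>\w+), .*?"
    r"read error \(offset (?P<offset>[\d,]+), i/o block size = (?P<block>[\d,]+)\)"
)


def read_error_offsets(log_text: str) -> dict[str, list[int]]:
    """Map each disk name to the byte offsets of its read errors, in log order."""
    offsets = defaultdict(list)
    for line in log_text.splitlines():
        match = READ_ERROR.search(line)
        if match:
            offsets[match["disk"]].append(int(match["offset"].replace(",", "")))
    return dict(offsets)


def strides(values: list[int]) -> list[int]:
    """Differences between consecutive offsets."""
    return [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    for disk, offs in read_error_offsets(LOG).items():
        print(disk, len(offs), "read errors, strides:", strides(offs))
```

Errors that march forward one block at a time on every disk at once look more like the reads failing wholesale (for example, a disconnected enclosure) than like isolated bad sectors, though the log alone cannot prove that.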

Topic starter Posted : 28/02/2020 6:52 pm
(@softraid-support)
Member Admin

It was at the same time, so yes, this was a hardware event; the disks were probably ejected when the Thunderbolt bus reset. It was at 12TB on the second read pass, so that should tell you where you were in the process. If you were doing 2 passes, consider it done. If 3, do one more pass.
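
The reasoning here can be checked with a little arithmetic. The sketch below takes the first read-error entry for each disk from the log earlier in the thread (timestamp as printed there, disk name, byte offset) and summarizes it. The `summarize` helper is hypothetical, and decimal terabytes (10^12 bytes) are an assumption about the units meant.

```python
# First read-error entry per disk, copied from the log earlier in the thread.
FIRST_ERRORS = [
    ("Feb 28 1150", "disk6", 13_519_667_855_360),
    ("Feb 28 1150", "disk4", 13_877_358_100_480),
    ("Feb 28 1150", "disk10", 14_374_316_015_616),
    ("Feb 28 1150", "disk9", 12_939_041_964_032),
    ("Feb 28 1150", "disk7", 12_708_271_357_952),
    ("Feb 28 1150", "disk5", 13_950_204_772_352),
]


def summarize(records):
    """Count distinct timestamps and disks, and convert offsets to decimal TB."""
    stamps = {stamp for stamp, _, _ in records}
    disks = {disk for _, disk, _ in records}
    terabytes = [offset / 1e12 for _, _, offset in records]
    return {
        "timestamps": len(stamps),
        "disks": len(disks),
        "min_tb": round(min(terabytes), 2),
        "max_tb": round(max(terabytes), 2),
    }


if __name__ == "__main__":
    print(summarize(FIRST_ERRORS))
```

All six disks report their first error under a single timestamp, at offsets between roughly 12.7 TB and 14.4 TB, which is consistent with one shared event interrupting six otherwise healthy reads partway through the pass.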

Posted : 28/02/2020 8:53 pm
(@ericbarker)
Eminent Member Customer

I should have gone into a little more detail, but I was flying out the door at the time. Now that I think about it, this event looks like it happened the moment I woke my iMac from monitor sleep. So my guess is that this was almost surely triggered by the computer, surely not by the drives, and probably not by the Thunderbay itself, since it coincided with a computer process. Does this sound logical?

Now, you say "This was a hardware event", what do you mean by that?

Moving Forward: I had been going for 3 passes, so I will do one more. You mentioned that the final pass writes all Zeros. I assume that means that if I choose "1 Pass", it will zero them out in the process? If all the disks make it, I will consider it good to go. 48 Hours.

My only concern is whether this "unmount on wakeup" issue could continue to plague me in day-to-day use later. I do some very long renders and encodes, I'm crossing my fingers that this isn't a sign of hiccups from the OS to come later on.

BTW: Thank you for all your help. You've helped me out a great deal, and I know I've been a bit of a squeaky wheel.

Topic starter Posted : 28/02/2020 11:49 pm
(@softraid-support)
Member Admin

You figured out the likely cause yourself; that is what I mean by a "hardware event". Yes, waking from sleep could have caused this. 10.15.3 fixes some of this stuff, but not all.

And yes, a single pass will zero out the drives.

Sorry this has been a hassle. Weird how this process was actually more reliable under FireWire than under Thunderbolt!

Posted : 29/02/2020 3:07 am
(@ericbarker)
Eminent Member Customer

All disks finally finished a final pass! Thank you so much for working me through this. I'm going to create the array now and make sure everything works.

Topic starter Posted : 02/03/2020 1:13 am
(@softraid-support)
Member Admin

Can you run a test for me?
I assume that if you just leave the SoftRAID Application open, it will eventually give the "The SoftRAID Application must quit" message. Is this always after the Monitor has gone to sleep?
If you set the Monitor to Never sleep and disable the Screen saver, does it still happen, take longer to happen, or never happen? I would appreciate the information. This bug is annoying, but it is not widespread and is hard to reproduce in the lab.

Posted : 02/03/2020 1:42 pm