Is This Error Code A Disk Error Or Application Error? Need Definitive Answer For Critical Data

GuitarFlex

(@guitarflex)

Posts: 197

Estimable Member

Topic starter

My now over 4-year project continues and it seems every few months I post on here when some snag comes up certifying yet another disk. Last Friday afternoon I began certifying a new 16TB Enterprise ordered from OWC. I had one last trial version of SoftRaid to take advantage of on an old computer in a separate room, but will of course be purchasing an additional seat next. Can't lose my video and audio computers for days on end.

This afternoon, when I went to check the progress, the blue disk light was on in the Elite Pro enclosure and the disk was not spinning. After entering my password and looking at the laptop screen I found this (in the picture).

Since last time I did this on an unused PC and there was no resume certification option, I had forgotten how much more elegant the macOS version is (which is what I use to manage and monitor the ThunderBay Flex 8 on my main video computer). Anyhow, I was mentally prepared to begin disk certification all over from the start, but then remembered I was now using an extra MBP and it asked if I wanted to start all over or resume. I chose to resume but then wanted to post the pictures here.

This is for a 3rd disk to be stored off-site. Mirrored copy onsite. Yes I realize that's the bare minimum recommendation. I'm running out of money to keep pouring into this project and can't afford to keep buying endless disks but am following at least the baseline recommendations from OWC/SoftRaid for critical data.

I need a definitive answer. Is "Disk with Error: (null)" an actual Disk Error and if so, does that negate that it says "0 errors" in the progress bar upon resuming?

I'm doing the work and trying to be good like I've been told precisely in order to "do the right thing." What is the right thing now? Trust that the disk is okay since it says "0 errors" upon resuming certification, start certification all over again from the very start, or return the disk and try another?

Attachment : Forum-Question.jpg

Thanks as always,

Sean

This topic was modified 3 years ago by GuitarFlex

Posted : 04/09/2023 3:17 pm

SoftRAID Support

(@softraid-support)

Posts: 9201

Member Admin

You can attach a SoftRAID tech support file, and I can look. The problem is that unless the disk actually reports a hard error, it is difficult to know what happened. I would resume, however; there is still another pass of write/read to go.

This could happen if the enclosure hung, or if the disk hung. And with USB, we cannot get the drive's SMART data (DriveDx can, but we cannot).

Hope this helps.

Posted : 04/09/2023 4:39 pm

GuitarFlex

(@guitarflex)

Posts: 197

Estimable Member

Topic starter

Thanks. Here's the support file. Under "Show SoftRaid Log" it was telling me that there was an error and I "should replace this disk immediately." Which of course concerns me. Quite a bit. Is there any way to add additional passes to the certification process when it's already underway? I guess not but if so I would ramp it up to 4 or 5. Or maybe I'll just start all over again. But before I do either, here is the tech support file. Thanks for any insights you can provide from this.

Attachment : oldmacbook 2023-09-04 21.52.55.sr_supt

Posted : 04/09/2023 5:00 pm

SoftRAID Support

(@softraid-support)

Posts: 9201

Member Admin

@guitarflex

No, you cannot add passes to a certification that is already underway. However, 3 passes is adequate for all purposes unless you suspect the enclosure has an intermittent problem. If a drive passes a 3-pass certify, it is highly likely to pass a longer one.

Maybe get the trial of DriveDx and see whether it can read the SMART data through your enclosure. Then you will know if there are any reallocated sectors. (We do not find any correlation between heat, slow spinup, etc. and predicted failure; pretty much just reallocated sector counts.)
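For readers who prefer a command-line check of the reallocated sector count mentioned above, here is a minimal Python sketch. It assumes you can get an attribute table in the layout printed by smartmontools' `smartctl -A`; whether that tool can read SMART data through this particular USB enclosure is not confirmed anywhere in this thread. The `SAMPLE` text is made up for illustration and is not output from the disk discussed here, and the function names are hypothetical.

```python
import re

# Illustrative, made-up excerpt in the column layout printed by `smartctl -A`.
# It is NOT real output from the disk discussed in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   038   052   000    Old_age   Always       -       38
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def raw_values(report):
    """Map each SMART attribute name to the leading integer of its raw value."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                values[fields[1]] = int(match.group())
    return values


def reallocated_sectors(report):
    """Return the raw reallocated sector count, or None if it is not listed."""
    return raw_values(report).get("Reallocated_Sector_Ct")


if __name__ == "__main__":
    print("Reallocated sectors:", reallocated_sectors(SAMPLE))
```

A non-zero count here is the one signal the post above singles out; the other attributes are parsed only so the table can be inspected.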

Posted : 05/09/2023 7:25 pm

GuitarFlex

(@guitarflex)

Posts: 197

Estimable Member

Topic starter

@softraid-support I'll have to wait until the certification is completed in the next day or two before running DriveDx, and will report back then if I don't just certify it all over again from the ground up out of paranoia.

Meanwhile, since I updated to (current) SoftRaid 7 on Big Sur, I've had some bizarre errors. Known good drives that flashed "The disk you inserted is not readable by this computer. Do you want to Initialize?" upon startup (when I opened SoftRaid, somehow the volume mounted fine). A few random disk ejects. This is not the thread to chronicle those, but about 90 minutes ago, while cloning my daily work drive, a disk in my ThunderBay dismounted! SoftRaid says the disk is healthy but of course I don't want that happening again.

Posted : 05/09/2023 7:42 pm

SoftRAID Support

(@softraid-support)

Posts: 9201

Member Admin

@guitarflex

That error you see is put up by macOS. We try to suppress those when they are posted on SoftRAID disks. (macOS does this for all "new" disks.) It's possible the SoftRAID driver was not loading fast enough. All this will be fixed when you eventually upgrade to Ventura or later.

Disk ejects are a hardware issue. They mean Thunderbolt momentarily lost communication, or hiccuped, which powers off the disks instantly.

Every contact point on Thunderbolt has two controller chips, which communicate with each other. If either chip crashes or resets, it momentarily auto-disconnects from the Thunderbolt bus. Even in this short time, the disks are signaled to power down. This is essentially what causes the disk eject messages.

You can try making sure cables are tightly connected.
Move the cables to different ports (do this one at a time, so you have an idea what part of the Thunderbolt bus is causing it)
Make sure no kitty cats are moving the cables, even slightly. ;-)
You can try a different Thunderbolt cable if you have one.
Set display sleep to "never" and uncheck "Put drives to sleep"; these settings are also involved in this.

The causes of this are complicated. Even adding a Thunderbolt monitor can trigger disk ejects in some cases.

Posted : 06/09/2023 10:24 am

GuitarFlex

(@guitarflex)

Posts: 197

Estimable Member

Topic starter

Update. Need a fairly quick answer if the advice is "re-certify the disk." Certification passed but now I'm *really* confused. In the SoftRaid Log (not the tech support file, but just log), it said that during pass 2, the disk "encountered a write error." Then beneath that on another line of text it says the "certify disk command failed because the disk has unreliable sectors. The disk should be replaced immediately."

The rest of the story has already been told above. After it passed the 3rd round of certification I downloaded DriveDx but had to Initialize the disk and create a volume before DriveDx would tell me anything.

DriveDx now says that the "SMART Status" is OK (0 issues found). Picture included.

Here's the deciding question: If I had just ordered a ThunderBay Flex and it was being pre-certified, and all of the above had happened in-house while OWC was pre-certifying my disks, what would the protocol be?

Does DriveDx mean this disk is trustworthy? Or is it time to begin the 7-day process of certifying another three passes? I haven't come this far to screw it up in a rush at the end. If OWC would not re-certify, then I won't. But if they would, then I will.

I've mentioned it before and not to belabor the point but just saying one more time that I would *HAPPILY* pay a premium if I could buy pre-certified disks.

Attachment : DriveRX.jpg

This post was modified 3 years ago by GuitarFlex

Posted : 07/09/2023 3:20 pm

SoftRAID Support

(@softraid-support)

Posts: 9201

Member Admin

@guitarflex

If we encounter any errors during certify, the disk is scrapped or returned, regardless. We warranty disks that we ship for 3 years with "replacement" warranties, not "refurbished" ones. So it is in our own interest to pre-certify: we eliminate early failures (reducing our cost) and give customers a better solution.

Sometimes certify has an error for another reason, unstable mains power, for instance.

A write error is serious, but also less informative. Drives should never have write errors, because data is written out "blind", meaning it is just sent to the media. So a write error is more of a communications error.

DriveDx uses the same SMART technology we do. They add a few more predictive failure indicators than we do, which may or may not be statistically significant. Unless there is a large population study, errors that sound "bad" may or may not be. We do not have enough data to justify adding, for instance, "exceeded temperature" as a predictive failure mode.

If all the other disks passed, then I would return the drive. No sense having a drive that can not pass a certify around.

We tried selling pre-certified drives. The biggest problem is supply. First there are supply chain issues: it takes a week to certify a large drive, and that is a week we cannot turn over the inventory. Now, with the cost of carrying inventory, the cost of certifying drives is even higher.

This is definitely something we want to do. Maybe when supply chains are more stable, we can go back to it.
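As a rough sanity check on the "a week to certify a large drive" figure above, here is a back-of-the-envelope estimate in Python. It assumes each certification pass is one full write plus one full read of the disk (the thread only describes a pass as "write/read"), and it uses a hypothetical sustained throughput of 200 MB/s; real throughput varies by drive and enclosure, and SoftRAID's actual implementation may differ.

```python
def certify_days(capacity_tb, passes=3, throughput_mb_s=200.0):
    """Estimate certification time in days.

    Assumptions (not taken from SoftRAID documentation):
    - each pass writes the whole disk once and reads it back once
    - throughput_mb_s is a sustained average over the whole disk
    """
    bytes_per_traversal = capacity_tb * 1e12      # decimal terabytes
    total_bytes = bytes_per_traversal * passes * 2  # write + read per pass
    seconds = total_bytes / (throughput_mb_s * 1e6)
    return seconds / 86400                          # seconds per day


if __name__ == "__main__":
    print(f"16 TB, 3 passes: about {certify_days(16):.1f} days")
```

With these assumed numbers a 16TB disk comes out at roughly five and a half days of continuous I/O, which is in line with the week-long figure quoted in this thread.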

Posted : 07/09/2023 6:49 pm

GuitarFlex

(@guitarflex)

Posts: 197

Estimable Member

Topic starter

@softraid-support Thank you for the answer and update. It came in time, since I had not yet begun another round hoping for it to certify a second time. The information about the write error is helpful. While I wish I didn't have to go through the hassle of returning the disk, I'm really glad to have the tools to know that it seems wise to do so this time. Also, it makes me feel good to know that I was right when someone else this week said "you're hanging on too tight!" As my Maestro used to say, knowledge is nice, but if we don't use it, it's just no good! I'll make a phone call tomorrow and sort the rest out with OWC. Hopefully the next time I'm on this forum it's not to report another disk error.

(I only certified the one this time, but have probably certified at least 4-6 disks since owning SoftRaid and this was the first time I encountered an error during certification, so yes indeed, all other disks have always passed).

Posted : 07/09/2023 7:10 pm

GuitarFlex

(@guitarflex)

Posts: 197

Estimable Member

Topic starter

Started the return process today and ordered two more 16TB disks (all from OWC). Feel free to tell me if I am mistaken about any of this:

Is there any reason to pay the $100 (or is it more? I can't remember off the top of my head) for an additional seat of SoftRaid Pro when all I need is to run disk certification tasks on a spare computer? I noticed it's only $50 to buy Lite, so my plan (unless there's a better reason to spend more) is to buy Lite to run on a 2015 MBP. Obviously I realize there are many other bells and whistles missing from Lite, but Disk Certification features are the same, correct?

This post was modified 3 years ago by GuitarFlex

Posted : 08/09/2023 3:56 pm

SoftRAID Support

(@softraid-support)

Posts: 9201

Member Admin

@guitarflex

If you already own Pro, just deactivate it and activate it on this computer, just long enough to finish the certification.

Posted : 09/09/2023 3:58 pm
