
BigSur 11.2.3 + SoftRAID 6.0.1b53 + RAID 5 (4 disks) + 1 disk failure = Crash on boot / volumes won't mount

(@bldraid)
Eminent Member

Apparently one of my 4 Seagate 3TB Barracuda drives has failed in my OWC Thunderbay enclosure. macOS will not boot at all with drive attached -- it gets to the startup chime and then promptly restarts. If I unplug the Thunderbolt cable and allow the machine to fully boot first, launch the SoftRAID app and then plug in the enclosure, the volumes on the RAID 5 array briefly mount with appropriate error messages (see attachment screenshot -- aside, why the heck aren't PNG file types allowed for attachments?! That's what macOS generates by default for screenshots) but the machine kernel panics and crashes within 30 seconds.

RAID 5 is supposed to be able to survive a single disk failure, right?
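
For reference, single-disk tolerance in RAID 5 comes from XOR parity: each stripe stores one parity block, so any one missing block can be recomputed from the others. Below is a minimal Python sketch of that idea, assuming a simplified single stripe on a 4-disk array. The block contents and the `xor_blocks` helper are made up for illustration; this is not SoftRAID's actual code or on-disk layout.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-disk array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate losing disk 1, then rebuild its block from the three survivors.
lost_index = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
rebuilt = xor_blocks(survivors)

assert rebuilt == stripe[lost_index]
```

This only shows that the data is recoverable in principle. It says nothing about whether the driver or the bus can keep running while a failing disk is still attached.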

SoftRAID Console
Topic starter Posted : 08/03/2021 9:33 pm
(@softraid-support)
Member Admin

Images are not valuable for diagnosis, which is the main reason we discourage image attachments. Please attach a SoftRAID Tech Support file and I can diagnose much more quickly. I would first go to SoftRAID RAID Preferences and disable Auto Rebuild. The rebuild may be triggering the panic. Let's see if that gets you going.

Also, did you disconnect the 4th (failing) disk?

Posted : 09/03/2021 12:52 am
(@bldraid)
Eminent Member

If images are discouraged, why allow jpegs then? I would think screenshots of the SoftRAID UI are going to be a common thing to want to post.

Anyway, I already submitted a tech support file as part of Case 01121438 and don't want to upload it here in the public forum.

No, turning off "Auto Rebuild RAID volumes" (and "Automatically rebuild Out-of-Sync mirror" too just in case) did not eliminate the kernel panics.

No, I have not disconnected the 4th disk -- I was hoping you could tell me how to get the LEDs to work or some other way of pulling the disks one by one and comparing them against the serial numbers SoftRAID still reports as okay.

Topic starter Posted : 09/03/2021 1:08 am
(@softraid-support)
Member Admin

I saw your case and took it over. If you respond on it again, the reply will come to me.

Blink disk light will work, but in this case only if you insert one disk at a time, since the 4th disk may be causing the hangs.

Posted : 09/03/2021 1:24 am
(@bldraid)
Eminent Member

Just to follow up for others -- turning off auto-rebuilds did not help, but physically removing the failed disk eliminated the kernel panics. But I am wondering why the driver can't bypass a failed disk that's still plugged in -- kernel panics are never acceptable. Also, my drives are Seagate Barracudas Model SEAST3000DM001 with SMART status -- and I got no indication of impending failure before this catastrophic failure. :-(

 

Topic starter Posted : 09/03/2021 1:40 am
(@softraid-support)
Member Admin

@bldraid

There are several possible reasons. The most likely is that SATA disks can time out when trying to read a sector and then retry it. If the timeout is too long, the system will hang. This is an OS X behavior. Another is that the disk has failed electrically and is simply hanging the bus when it powers up. There is no way around such an issue.
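
To make the timeout point concrete, here is a rough Python sketch of how per-sector retries can add up to a long stall. Every number and name in it is hypothetical, chosen only for illustration; these are not actual macOS, SoftRAID, or Seagate values.

```python
def worst_case_stall_seconds(bad_sectors, retries_per_sector, timeout_per_attempt_s):
    """Upper bound on blocked time if every read attempt runs to its timeout."""
    attempts_per_sector = 1 + retries_per_sector  # first try plus the retries
    return bad_sectors * attempts_per_sector * timeout_per_attempt_s

# Hypothetical figures: 20 unreadable sectors, 4 retries each, 7 s per attempt.
stall = worst_case_stall_seconds(
    bad_sectors=20, retries_per_sector=4, timeout_per_attempt_s=7.0
)
print(f"worst-case stall: {stall:.0f} s")
```

With numbers like these, a handful of bad sectors is enough to block I/O for minutes, which from the user's side looks the same as a hang.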

Posted : 09/03/2021 7:59 am
(@bldraid)
Eminent Member

Any thoughts on why SMART gave no indications of impending doom?

Topic starter Posted : 09/03/2021 8:18 am
(@softraid-support)
Member Admin

@bldraid

SMART is a check of the drive. If the circuitry fails, there is nothing to respond to the SMART query.

As you know, electronics sometimes fail. SMART checks for mechanical issues; it does not detect electrical circuitry faults, especially when the circuit board on the drive itself is failing.

Posted : 09/03/2021 10:56 am
(@bldraid)
Eminent Member

Yes, but isn't the circuitry the least likely to fail barring some external event, such as a power surge (and there was none in this case)? Just discouraging to have sudden and catastrophic death of a disk that takes the OS down with it. I thought modern hardware and OSes were supposed to be more resilient against this kind of thing.

Topic starter Posted : 09/03/2021 11:00 am
(@softraid-support)
Member Admin

@bldraid

One would think so. Reality is not living up to the hype. Technology is under constant pressure to lower costs, and that pressure to stay competitive means devices are rarely over-engineered.

Consider the history of disk drives. Early drives were tested and actually shipped with a printout of "defects" (back when this was common). Now drives are shipped without being tested at all. The burden falls on the consumer to do QA and discover whether they received a reliable drive or not.

Posted : 09/03/2021 11:39 am