
Lost the disks, now what?

(@mark5009)
Posts: 13
Active Member
Topic starter
 

Hi.

Today I had a problem when I nudged the Thunderbolt cable from my Mac Studio M1 Max (running 12.6.1) to my OWC Thunderbay 4. The system went down, I've now lost a drive, and I'm in SoftRAID hell (whenever I run it with the drive connected, the Mac keeps rebooting).

I upgraded to SoftRAID 7.0.1 and the problem persists. All my e-stuff is on this box (half set up as a two-disk mirror, the other half as a two-disk stripe; all four are 4 TB WD drives, all new and healthy, or they were) and I'd really like to get it back.

I managed to get a support log from one of the runs and it is attached. I also have a crash log, but the forum won't accept it as an attachment, so I'll post details below.

Any help greatly appreciated!

TIA .. mark.

 

 

-- CRASH log

 
Posted : 02/12/2022 11:39 pm
(@softraid-support)
Posts: 9200
Member Admin
 

You need to use "Make Plain Text" before attaching the panic log; otherwise TextEdit saves the file as .rtf, which is not allowed here. It does not matter what you name it: the hidden extension stays .rtf until you use "Make Plain Text".

 

It is odd that this happened just from a loose cable.

I need you to create a second support file with the other disk of the RAID 0 volume connected instead of this one, so we have both disks captured.

However, nothing can happen until next week, probably Monday or Tuesday. We will look at it and hopefully can fix both volumes.

 
Posted : 03/12/2022 1:11 pm
(@mark5009)
 

@softraid-support 

Thanks for the reply. I'm really dead in the water here and anything you can do to help is greatly appreciated. Yup, this is the result of me cleaning up my desk and jagging the Thunderbolt cable. Not quite what I was expecting either, and it is making me rethink my backup and reliability strategy. Having RAID disks is nice, but if a single cable failure can corrupt everything, it is not working the way I want it to.

Okay, so I restarted the RAID box and got the log and support file. One of the disks in the dock is missing (I assume this is the one marked as dead, and it is part of the striped set), two of the others are stuck in a constant wait state shown as "getting disk info", and the fourth disk seems fine. I'm not sure what else to do here.

Thanks in advance

 

 ..mark.

 
Posted : 03/12/2022 8:02 pm
(@softraid-support)
 

@mark5009 

Power cycle the enclosure and see if the disks spin up properly.

 
Posted : 03/12/2022 8:35 pm
(@mark5009)
 
Posted by: @softraid-support

@mark5009 

Power cycle the enclosure and see if the disks spin up properly.

Okay. Here's what I'm doing:

1. Shut down Mac and RAID (connection path: USB3 -> Thunderbolt -> RAID).

2. Start RAID. Start Mac. Kernel panic and shutdown.

3. Turn off RAID. Restart Mac and get panic log (attached).

4. Turn on RAID. Get error dialog (Read Disk Error, disk 11; SoftRAID ID 090D4F021312C500). Click OK.

5. New error dialog (SoftRAID Volume Disappeared for "secure" [one of the mirror disks]). Click OK.

6. Same error dialog as in step 5 appears. Click OK.

7. Start SoftRAID 7.0.1, which now shows no disks at all and one volume (the attached Time Machine disk).

8. Power-cycle the RAID; no change in SoftRAID (no disks).

9. Restart SoftRAID, with the same result.

10. Scratch head and finish this post :-)

 

 

 
Posted : 03/12/2022 9:36 pm
(@softraid-support)
 

@mark5009 
Disconnect your disks. Run SoftRAID 7.0.

"reinstall SoftRAID driver"

"Allow" OWC if prompted.

Restart

Connect your disks.

What do you see? still getting a panic?

 
Posted : 04/12/2022 1:02 am
(@mark5009)
Posts: 13
Active Member
Topic starter
 

@softraid-support 

0. Power off raid. Disconnect cable. Reinstall SoftRAID driver. Restart. Power up raid. Reconnect cable

1. First up is a "SoftRAID Disk Read Error" (disk10, SoftRAID ID: 090D4F021312C500). Click OK

2. Start SoftRAID and I see all 4 disks with "getting disk info"

3. Dialog: "SoftRAID Volume Disappeared (volume 'secure')": a disk was removed (the read-error disk 10?). Now 3 drives are showing, all at SATA bus 0, ID 0, LUN 0 (Thunderbolt). Click OK.

4. Dialog: "SoftRAID Disk Read Error" (disk11, SoftRAID ID: 090D4EFAF5DFD300). Click OK.

5. Dialog: "SoftRAID Volume Disappeared (volume 'secure')": a disk was removed (the read-error disk 11?). Click OK.

6. Dialog: "SoftRAID Error: An error occurred reading the partition map on the disk. The disk or its cables are unreliable and should be replaced. Disk with error: (null)". Click OK.

7. Dialog: "SoftRAID Error: An error occurred reading the partition map on the disk. This disk hung during a read. The disk should be replaced. You may have to restart your Mac before this disk will respond again." (disk 10). Click OK.

8. Dialog: "SoftRAID Error: An error occurred reading the partition map on the disk. This disk hung during a read. The disk should be replaced. You may have to restart your Mac before this disk will respond again." (disk 8). Click OK.

9. Dialog: "SoftRAID Error: An error occurred reading the partition map on the disk. The disk or its cables are unreliable and should be replaced. Disk with error: (null)". Click OK.

10. Dialog: "SoftRAID Error: An error occurred reading the partition map on the disk. This disk hung during a read. The disk should be replaced. You may have to restart your Mac before this disk will respond again." (disk 7). Click OK.

End of dialogs.

SoftRAID shows 3 disks in "getting disk info" state (disk7, disk8, disk10). disk9 is MIA. There are no volumes showing.

--

Is there anything else to be done here but pull the disks and wipe them? Is this actually a disk failure, a controller failure, a software failure, or some combination of these?

TIA .. mark

 
Posted : 04/12/2022 5:18 pm
(@softraid-support)
 

@mark5009 

The disks are hanging. Whether the cause is the enclosure, the cable, or the drives is unknown.
What if you connect only one disk? Does it also hang?

Do you have a different Thunderbolt cable to test with?

 
Posted : 04/12/2022 7:09 pm
(@mark5009)
 

@softraid-support 

Thanks for your help! Making some progress (yeah!)

I pulled both disks for the striped volume and left the two for the mirror. SoftRAID recognizes both disks as good, with "no errors", but the volume as "degraded". I've started a rebuild and am now about 10 million I/O requests in without error, so it's going well so far.

This seems to indicate that the cabling is okay and the system is, at a hardware level, working correctly.

Once the mirror is working correctly (rebuild finished), I'll plug in the two disks of the striped volume and see what happens. I suspect I'll need to replace at least one drive.

More updates as they happen

Thanks again

 .. mark

 
Posted : 04/12/2022 8:30 pm
(@softraid-support)
 

@mark5009 

When you get further, you can post a support file. I can guide you based on that.

 
Posted : 04/12/2022 10:08 pm
(@mark5009)
 

@softraid-support 

Now it is just getting weird, maybe in a good way. I did the rebuild on the mirror, then rebooted the Mac, disconnected and reconnected the RAID, and the mirror came up clean, no errors. Great. I plugged in stripe-a and SoftRAID said the volume was missing a disk. Expected. I pushed in stripe-b and it immediately came up with a "Replace Disk" error, but once cleared, it shows the striped volume up and working with no errors. It also reports both disks as clean with no errors. Go figure.

Attached is the report

 
Posted : 05/12/2022 4:38 pm
(@softraid-support)
 

@mark5009 

This is the error you saw:

Dec 06 08:25:45 - SoftRAID Driver: A SoftRAID disk (disk11, SoftRAID ID: 090D4F021312C500) previously encountered a read or write error. This disk should be replaced.

It refers to one of the stripe disks, which had an error in the past. It may be failing, I cannot tell for sure, but it should not be "dropping out" like that.

This is what the log shows about that error (the disk was hanging and not responding):

Dec 03 14:02:50 - SoftRAID Driver: A disk (disk6, SoftRAID ID: 090D4F021312C500) for the SoftRAID volume "stripes" (disk8) was removed or stopped responding while the volume was mounted and in use.
Dec 03 14:03:39 - SoftRAID Driver: A disk (disk4, SoftRAID ID: 090D4F021312C500) for the SoftRAID volume "stripes" (disk6) was removed or stopped responding while the volume was mounted and in use.
Dec 03 14:03:49 - SoftRAID Driver: A disk (disk13, SoftRAID ID: 090D4F021312C500) for the SoftRAID volume "stripes" (disk17) was removed or stopped responding while the volume was mounted and in use.
Dec 03 15:09:01 - SoftRAID Driver: A disk (disk11, SoftRAID ID: 090D4F021312C500) for the SoftRAID volume "stripes" () was removed or stopped responding while the volume was mounted and in use.
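The four log lines above name four different macOS disk numbers (disk6, disk4, disk13, disk11) but the same SoftRAID ID, which is how they can all be traced to one stripe disk: macOS hands out a new diskN identifier each time a disk reappears, while the SoftRAID ID stays with the physical disk. A minimal sketch for grouping such lines by SoftRAID ID, assuming the log has been saved as plain text in the format quoted here. The regular expression and the `group_by_softraid_id` helper are hypothetical, written against these sample lines only, not against any documented SoftRAID log format.

```python
import re
from collections import defaultdict

# Hypothetical pattern based only on the sample lines quoted in this thread;
# SoftRAID's log format is not documented here, so adjust it for other lines.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) - SoftRAID Driver: "
    r".*?\((?P<disk>disk\d+), SoftRAID ID: (?P<srid>[0-9A-F]+)\)"
)

SAMPLE_LOG = """Dec 03 14:02:50 - SoftRAID Driver: A disk (disk6, SoftRAID ID: 090D4F021312C500) for the SoftRAID volume "stripes" (disk8) was removed or stopped responding while the volume was mounted and in use.
Dec 03 14:03:39 - SoftRAID Driver: A disk (disk4, SoftRAID ID: 090D4F021312C500) for the SoftRAID volume "stripes" (disk6) was removed or stopped responding while the volume was mounted and in use.
Dec 03 14:03:49 - SoftRAID Driver: A disk (disk13, SoftRAID ID: 090D4F021312C500) for the SoftRAID volume "stripes" (disk17) was removed or stopped responding while the volume was mounted and in use.
Dec 03 15:09:01 - SoftRAID Driver: A disk (disk11, SoftRAID ID: 090D4F021312C500) for the SoftRAID volume "stripes" () was removed or stopped responding while the volume was mounted and in use."""


def group_by_softraid_id(lines):
    """Map each SoftRAID ID to the (timestamp, macOS disk number) pairs it was logged under."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that don't fit the assumed format are skipped
            groups[match.group("srid")].append((match.group("stamp"), match.group("disk")))
    return dict(groups)


if __name__ == "__main__":
    for srid, hits in group_by_softraid_id(SAMPLE_LOG.splitlines()).items():
        disk_numbers = sorted({disk for _, disk in hits})
        print(f"{srid}: {len(hits)} events under {disk_numbers}")
```

A single SoftRAID ID showing up under several disk numbers points to one disk repeatedly dropping off and reattaching, rather than several disks failing at once.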

 
Posted : 05/12/2022 4:47 pm
(@mark5009)
 

@softraid-support 

Okay.

Given that everything is now up and going again, with all disks showing "no errors," is there any further action I should be taking? Or are we good to go?

Again, many thanks for your help!

 .. mark.

 
Posted : 05/12/2022 5:09 pm
(@softraid-support)
 

@mark5009 

As long as an individual disk is not dropping out, you are good to go.

 
Posted : 06/12/2022 2:56 pm
(@mark5009)
 

@softraid-support 

Thanks! Any idea why this all went so bad so quickly?

 ..m.

 
Posted : 06/12/2022 4:46 pm
Page 1 / 2