Thanks for the TLER hint. I've seen mention of it before, but will follow that trail next. The externals are already primary now due to the restart after the kernel panic, and the internals were originally set up to be primary, so I don't hold out much hope for the reboot helping, but I suspect I have to reboot anyway to get the internal drives alive. I'm looking at all possible ways to keep MacOS from invoking restarts now as well. The KP this morning was just a nice surprise :)
Another update. After getting things going, here are a few things I have observed.
Once I got the 1+0 rebuilt, verified with DW and Disk Utilities, I logged in and started a sync with Dropbox.. several hundred GB of stuff. I got a kernel panic within about 40 minutes (this I think I reported earlier) that resulted in a split mirror.
I then thought it would be a bright idea to finish the Dropbox sync with only the currently active mirror stripe (external), so I did. The entire several-hundred GB synced up and now I'm ready to re-add the missing mirror disks. Note that the Dropbox sync was a success this time.
Well, numbskull that I am, and late at night, when I was deleting the non-up-to-date mirror (INTERNAL) in order to reattach it and get the mirror sync started, I deleted the EXTERNAL (most up-to-date mirror) accidentally. Oy. Well, I only lost that DropBox sync so what the heck. I continued along and rebuilt the 1+0, set the primary mirror to the internal drives, logged in as myself, and started the DropBox sync all over again.
Another kernel panic.
This time, luckily enough, when I powered down then back up after the KP, the internal drives were somehow considered secondary and the externals primary. SoftRAID recognized this and re-synced them in just a few minutes. Whew!
I started the DropBox sync again, this time I throttled it to 10Mbps, I don't know, just to see if slowing things down made a difference. Well, it went along for a good while, so I bumped it to 20Mbps. Some time after that, another kernel panic.
Luckily (and right now it sure feels like luck), when I rebooted, the EXTERNAL mirror was still primary and SoftRAID synced the still-secondary INTERNAL mirror to it in a few minutes (recall a previous reboot sequence resulted in both mirrors being declared primary even though I had set the EXTERNAL to primary).
Trying to sync DropBox again.
No idea what's causing the KP, I don't read kernel tracebacks. What I do know from above is that the one time the DropBox sync succeeded was when I was only using one side of the mirror and the 1+0 was running degraded, without the internal mirror drives at all. Every other time with the 1+0 up and running fine, I get a KP when doing the DropBox sync. I'm waiting for the next one now, although I have choked the bandwidth down again, just because it's all I have left in the bag of tricks to try, aside from splitting the mirror intentionally..
You guys interested in the kernel traceback?
This is getting old really quick. After several more attempts at syncing DropBox and several more kernel panics, SoftRAID was holding up fine enough, recognizing the out-of-sync mirror pair after a shutdown/power-up and syncing quickly, then letting me repeat the inevitable. Until the last one. After the last KP I did the same things: log in as admin (no data on the RAID), look at the situation with SoftRAID (missing internal disks), shut down, power up. Then it got ugly on this round. The internal disks became primary mirrors and the two external disks were reported as "data disk". Huh? That's new and exciting. Not. Well, OK, let's shut down and restart again and see what SoftRAID thinks it has now. OK, this time it's a split mirror. Let's do it one more time.. Split mirror. Well, I guess 2 out of 3 ain't bad. Which seems to be about the consistency I'm getting right now with SoftRAID's ability to recover from error cases.. maybe 2 out of 3 times. But I digress.
At this point I'm on the verge of ripping out SoftRAID and finding another solution. The disk boot problem, well I can throw money at that and find a pair of disks that show up when the mac KPs or restarts or updates software and at least not tempt SoftRAID with trying to figure out the world after such shenanigans.
I need a solution. I need SoftRAID to consistently keep the external drives as primary whether or not the internals show up. This is regardless of whether I can solve the issue with the disks disappearing on a restart. I need to be able to sync DropBox on the mirror without a KP. Maybe next time it splits mirrors I'll do the sync before recreating the mirror part of the 1+0. I'm running completely out of ideas. The ONLY other variable is that I did run MacOS update after recovering the first time and it did two security updates, I believe.
What else can I provide to you folks to get some help here? I've spent a week as of today trying to get this back up and running solid again.
Send me a SoftRAID Tech support file first. (attach it)
What brand PCI card do you have?
What brand is the enclosure?
This info and support file may help me diagnose this.
@softraid-support PCI card is a Sonnet Tempo SATA Pro 6Gb 4-port P/N TSATA6-PRO-E4
Enclosure is a MediaSonic 4-bay enclosure model HF2-SU3S2
I am connected via the eSATA ports
I have attached a support file; however, at the time of collection (now) I have rebuilt everything and all is back in order, except I am hoping to get some advice before resuming the DropBox sync again. I actually need to do some things besides rebuild my disks for a change :) I wish I had gathered the TS file when things were in their funkiest of states earlier but I just wanted to get it rebuilt at the time.
I did boot in Safe Mode prior to the most recent mess, hoping any kernel cache issues might be cleared up and that could have been the cause of the crash, but obviously that didn't help.
One idea I'm toying with is moving my dropbox sync folder to another drive that is a regular Mac APFS or HFS drive and ensuring it works that way (as it did once before on the active half of the split mirror) then copying the files back into my home folder on the RAID. Assuming that all worked I'm not sure DropBox would be entirely happy with that once I moved the sync folder though.
First thing is the kernel panic. The Dropbox extension is in the middle of the panic; however, the Apple forums have this clue on the cause of the kernel panics:
The panic log indicates a fault in the drivers for the graphics processor (GPU.) Possible causes that have been reported include a failed GPU, overheating due to dust buildup or a failed cooling fan, and perhaps a bug that may be triggered by using more than one GPU in a Mac Pro (especially NVIDIA GT 120's in a 2009 Mac Pro) and/or connecting more than one display.
So try pulling the card and blowing the dust out of your machine as best you can (reverse end of a vacuum cleaner), including blowing out the fans, then reseat the card. See if that helps the kernel panics.
I think these drives are the reason they do not show up after a warm restart. I believe they have a TLER-enabled setting that does not play well with Mac Pros. It may be possible to contact WD and get a firmware app that can disable TLER (this would require booting from Linux or Windows in some way).
Another idea is this. Get an SSD. Put it in the lower DVD slot; the connector is already there. Then move all 4 drives internally. That will stop the mirror splitting, although you still need to eliminate the cause of the kernel panics. It will make your machine much faster also. You do not even need a "fast" SSD, as the SATA in that slot is limited to 300MB/s, but that is still much faster than any HDD, and APFS does not work well on HDDs anyway.
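For reference, the 300MB/s figure quoted above matches what a SATA 3 Gb/s link can carry once its 8b/10b line encoding is accounted for. The sketch below works that out; it assumes the DVD-bay connector is a 3 Gb/s link (inferred from the 300MB/s figure, not confirmed in this thread), and the function name is only illustrative.

```python
def sata_payload_mb_per_s(line_rate_gbit: float) -> float:
    """Usable SATA bandwidth in MB/s for a given line rate in Gbit/s.

    SATA uses 8b/10b encoding: every 8 data bits are sent as 10 bits
    on the wire, so only 80% of the line rate carries data.
    """
    data_bits_per_s = line_rate_gbit * 1e9 * 8 / 10
    return data_bits_per_s / 8 / 1e6  # bits -> bytes -> megabytes


if __name__ == "__main__":
    # Assumed link speed for the DVD-bay connector (3 Gb/s).
    print(sata_payload_mb_per_s(3.0))
    # A 6 Gb/s port, such as those on the Sonnet card, for comparison.
    print(sata_payload_mb_per_s(6.0))
```

Real-world throughput is lower than these ceilings because of protocol overhead, so treat the numbers as upper bounds.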
@softraid-support I have tried to leave at least two follow-up replies in this forum and do not see them when checking back in.. are you getting them?
I don't think so, just the one thread here. I would see all responses as they come in for my approval.
@softraid-support OK, here's what I thought I had entered twice now :)
First, I did the dust-blowing etc as suggested, no change in behavior, still panics in the same scenario.
After some research on my own, I am not convinced it would be a graphics-card related panic. Those tracebacks have different drivers than the tracebacks in my panics. My tracebacks have AHCI and related block device drivers, while the graphics card panics have drivers specific to the device (NVDA) or other graphics-oriented driver names.
To provide further evidence, here is some info about more experimentation I did on the issues.
First, you may recall I previously successfully synced my DropBox account to a degraded 1+0 RAID, i.e. only one slice was active, on the external drives, the mirror of that slice was not online at the time.
For reasons of my own doing, I had to re-sync DropBox again. After rebuilding the RAID array again and getting the 1+0 fully functional, I tried to sync DropBox, about 350GB worth of mostly large video files (up to 2GB in some cases).
EVERY attempt at syncing now resulted in a kernel panic within 20-30 minutes.
After multiple attempts at this along with several more incidents of split mirrors, caused by the restart issue in my Mac Pro, I decided to attempt the DropBox sync again with only one side of the mirror active again. This time, it was the internal drives making up the slice. This time, as the previous time in a similar configuration, the DropBox sync completed fully.
Afterwards I re-added the external drives as the mirror, they synced to the internal drives and all is up and running now.
What does this suggest to me?
Given:
* panics always happened with internal slice and external slice mirrors active
* Dropbox sync succeeded twice with no panics when only one side of the array was active, i.e. the mirrors were split and I was running a slice unprotected/degraded.
* those successful syncs occurred once on the internal drives, once on the external drives
* a full restore from Time Machine succeeded with the 1+0 also intact
It appears to me that:
- the panic is related to disk access based on the traceback
- the factors involved in the panic scenario are: RAID 1+0 fully in service, large sync from DropBox, maybe something related to large data transfers over a network.
- the panic ONLY occurs when the mirrored slices are active and synced
- either Dropbox or SoftRAID or the combination is creating an issue which causes a panic. I'm not saying it necessarily originates with either product's kexts, but it's a place to start. Obviously other factors could be the Sonnet Tempo card in play or some conflict between large concurrent writes to the internal drive bus and the external drives via PCIe.
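The observations above can be laid out as a small table to see which factor tracks the panics. This is only a sketch: each row collapses the repeated attempts described in this thread into one entry per configuration, and the field names and the `panic_counts` helper are made up for illustration.

```python
from collections import defaultdict

# One row per configuration reported in the thread (repeat attempts collapsed).
# "primary" is the primary side, or the only active side when degraded.
TRIALS = [
    {"workload": "dropbox_sync", "mirrors_active": 2, "primary": "internal", "panic": True},
    {"workload": "dropbox_sync", "mirrors_active": 2, "primary": "external", "panic": True},
    {"workload": "dropbox_sync", "mirrors_active": 1, "primary": "external", "panic": False},
    {"workload": "dropbox_sync", "mirrors_active": 1, "primary": "internal", "panic": False},
    # Primary side during the Time Machine restore was not stated.
    {"workload": "time_machine_restore", "mirrors_active": 2, "primary": "unknown", "panic": False},
]


def panic_counts(trials, *keys):
    """Group trials by the given keys and return {group: (panics, total)}."""
    counts = defaultdict(lambda: [0, 0])
    for trial in trials:
        group = tuple(trial[k] for k in keys)
        counts[group][0] += int(trial["panic"])
        counts[group][1] += 1
    return {group: tuple(c) for group, c in counts.items()}


if __name__ == "__main__":
    for keys in (("primary",), ("mirrors_active",), ("workload", "mirrors_active")):
        print(keys, panic_counts(TRIALS, *keys))
```

Grouped this way, which side is primary does not separate the panics from the successes, while the combination of a Dropbox sync with both mirror sides active panics in every recorded configuration. That is a correlation from a handful of runs, not proof of a cause.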
I also noted there was a similar thread on the forums with a mac pro 5,1 kernel panic that SoftRAID said was to be fixed in a 5.8 maintenance release.. I don't have the link handy as this is my third attempt at providing this data, and I only had the link handy the first time around. I think it would be worth another look by SoftRAID developers.
Thanks.
There is an Apple test tool that we run, for hundreds of hours at a time, that tests for any driver instability. We do not know of any I/O issues in 5.8.4 that are related to SoftRAID. We also run our own torture test driver tools. RAID 1+0 is pretty simple also.
Dropbox should not have low-level disk access, so I would guess it is not involved.
If you have the externals as "primary", do you get the kernel panic? (In other words, is the panic when there are 4 disks, or when you are using the externals?)
It could be the Sonnet card. I think it uses the Mac's built in driver, though. One thing you can try, is the Lycom card is pretty inexpensive and also uses the Mac's built in drivers. Might be worth a try to eliminate another possible cause.
@softraid-support To answer your question about the panic and whether the externals or internals are primary: the panics would occur with either set of drives as primary. That was one of the changes I experimented with in the configuration to try to avoid split mirrors, that is, to have the primary external. Or internal. Both failed during the DropBox sync with the same panic.
Sure, it well could be the Sonnet card. Is there any useful information in the panic logs that may point to one place or another?
Panic logs are very difficult to read, except when deep dived by an experienced engineer, as you have to be able to interpret the registers, etc. I cannot do it.
It's a very busy time period. I can try a configuration like yours, but it may be a while before I can get to it.

