Watchdog kernel panics - only with RAID'd SSDs

Page 2 / 3
(@johnd)
Active Member Customer

@softraid-support INTERESTING! I'm having the same issue, and have been chasing the root cause. I was running a 2010 Mac Pro (w/OpenCore & Big Sur 11.2.3) with SoftRAID 6.0.5 and had no issues with the machine at all. I migrated to a 2013 Mac Pro on Saturday (4 days ago), with a ThunderBay 8 (via Apple's TB3-to-TB2 adapter). I ran 11.2.3 only briefly (with no issues) before upgrading to 11.4, at which point I started having screen freezes (background tasks kept running, mouse pointer still moved); then watchdog would panic over WindowServer and reboot. I upgraded to 11.5RC2 and that didn't help. I've been avoiding running groups of apps to see if/when the system becomes stable. After reading this thread, I wonder if it's SoftRAID. I might try the 6.1 beta and see if that helps.

 

PS - I'm running 2x SATA SSD's in SoftRAID-0 for my VM's. After not having my VM's running the past 24 hours, the system seems stable. That led me to suspect Parallels (I'm running it in legacy KEXT mode to get bridged networking), but I wonder if the SSD RAID activity with SoftRAID is the issue?

 

PPS - The 2x SSD's in SoftRAID-0 are formatted APFS, if that makes a difference.

This post was modified 7 days ago 2 times by JohnD
Posted : 21/07/2021 11:36 am
(@softraid-support)
Member Admin

@johnd

Please run beta 3 to see if there are any changes. I would not expect any, as we do not get many users with frequent kernel panics during use, as you describe. The watchdog panic, as far as I know, is a timeout issue, generally, but not exclusively, at shutdown. If we can help diagnose it, we will. Let me know your experience with 6.1.

Posted : 21/07/2021 11:55 am
JohnD liked
(@jupeman)
Active Member

@johnd Note that my experience was NOT with SoftRAID. Basic Apple RAID (a RAID-0 stripe set up in Disk Utility) is what I finally suspected (after months of testing crazy things). Sonnet, the maker of my PCI-E RAID card, suggested I try SoftRAID. 6.0.5 did not help: exactly the same issue. HOWEVER...

@softraid-support ...so far running 6.1b3 works.  No KP's yet.  It has not been long, only 1.5 days, but the situations where I would see almost certain kernel panics have not triggered any yet.  You know better than me, but there is definitely something wrong in the bowels of how macOS is handling these RAID setups causing the watchdog issues. Perhaps 6.1 specifically addresses this? 

This post was modified 7 days ago by Jupeman
Topic starter Posted : 21/07/2021 12:18 pm
JohnD liked
(@johnd)
Active Member Customer

@softraid-support I'm still turning things on one-by-one every few hours to see when it panics again. In an attempt to cut to the chase, I'm running a RAID verification with SoftRAID on that 2x SATA SSD volume, and no issues so far. I still highly suspect Parallels, and SoftRAID may or may not be related. Once I fire up the VM's and experience a panic, I'll upgrade SoftRAID to 6.1b3 and see if anything changes. If not, I'll revert Parallels back to UseKextless=1 and, if that solves the problem, report the issue to them. I'll keep you posted over the next day or two.

Posted : 21/07/2021 12:39 pm
(@jupeman)
Active Member

@softraid-support I spoke too soon.  Two KP's today under beta.  The exact same panic as before.

 

This post was modified 7 days ago 2 times by Jupeman
Topic starter Posted : 21/07/2021 9:41 pm
(@softraid-support)
Member Admin

@jupeman

You bring up an interesting question. What we did was change a procedure to avoid an issue. It is possible Apple's RAID, which is now 10 years old, suffers from the same problem. I will ask around.

Posted : 22/07/2021 10:26 am
(@softraid-support)
Member Admin

@jupeman

And this was happening under Apple RAID also? The SoftRAID driver is loaded, but not in the backtrace, I notice. Seems to be crashing in the Thunderbolt chips.

Can you attach a SoftRAID tech support file?

Posted : 22/07/2021 10:49 am
(@jupeman)
Active Member

@softraid-support I am concluding that there is something fundamentally wrong with how macOS is handling external disks, or non-Apple disks. This is the second Mac Pro (and this is a brand new virgin machine from Apple that they sent to me in an attempt to fix the KPs...) that I am experiencing these issues on. My Mac just froze after 30 minutes with an external drive plugged in (also SSD; btw, I tried OWC's Envoy Express and it disconnects all the time, too, an issue I haven't even tried to take up with tech support yet). I will note what does seem to work right now: the Promise r4i. I use it with the macOS built-in PromiseSTEX kext (6.2.13). So there is a RAID5 on 4xHDDs that seems fine. Any SSD that is not internal Apple seems to not work, even when HFS+. I wonder if it is all T2 related, simply because my trusty 2015 MBP has no issues (and also doesn't have T2).

This post was modified 6 days ago 2 times by Jupeman
Topic starter Posted : 22/07/2021 10:51 am
(@softraid-support)
Member Admin

@jupeman

Are the freezes you are getting showing "Watchdog" in the panic logs?

Disk ejects are Thunderbolt related, a known problem that has been mostly ignored since 2016.

Perchance do you have dual monitors? What kind of docks/hubs? Do you have the rack mount or "desktop" version? Do you get ejects on both Thunderbolt areas?
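
One way to answer the "Watchdog" question is to search the saved panic reports for that word. The following is only a minimal sketch: it assumes panic reports are stored as `.panic` files under `/Library/Logs/DiagnosticReports` (the usual location, but verify on your own system), and the function name `find_watchdog_panics` is hypothetical.

```python
from pathlib import Path

# Assumed location of macOS panic reports; adjust if your system differs.
DEFAULT_DIR = Path("/Library/Logs/DiagnosticReports")


def find_watchdog_panics(report_dir=DEFAULT_DIR, needle="watchdog"):
    """Return sorted names of *.panic files that mention `needle` (case-insensitive)."""
    report_dir = Path(report_dir)
    hits = []
    if not report_dir.is_dir():
        return hits  # nothing to scan
    for path in sorted(report_dir.glob("*.panic")):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file (permissions, etc.): skip it
        if needle.lower() in text.lower():
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    for name in find_watchdog_panics():
        print(name)
```

An empty result does not rule out a watchdog panic; it may only mean the report was written somewhere else or under a different file extension.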

Posted : 22/07/2021 11:03 am
(@jupeman)
Active Member

@softraid-support First off, thank you for the engaged conversation.  It is awesome support.  If only I could talk with Apple engineers so freely.

 

No freezes showing up in any log.  

Crazy on the Thunderbolt ejects...

Yes, I do have dual monitors (LG 5k). No hubs. Desktop Mac Pro. I'll test the various Thunderbolt connections; I hadn't really investigated the disk eject much. It just happened when I plugged the drive in, though. Took about a minute!!

Topic starter Posted : 22/07/2021 11:32 am
(@softraid-support)
Member Admin

@jupeman

In case those are older (TB2) monitors, they had a recall to fix a disk eject triggering issue. I doubt yours are that old, but worth bringing up.

I see complaints about Thunderbolt triggering ejects occasionally, so I borrowed a couple of LG monitors, a 4K and a 5K, for testing. After 6 months, though, I never saw a disk eject on my MP. Finally returned them to product development. It's a frustrating problem to deal with.

Posted : 22/07/2021 1:04 pm
(@jupeman)
Active Member

@softraid-support Nope, TB3 monitors...  Early times, but the Envoy Express ejected very quickly and numerous times when connected to the top of the Mac Pro, fairly quickly when connected to a TB port on the "connection card" (the USB + TB card in every MP) but it has not ejected when connected to a TB port on my graphics card (6700x) so far...

Have we exhausted ideas on the Raid0 with four SSDs on the Sonnet card discussion?

Topic starter Posted : 22/07/2021 1:53 pm
(@johnd)
Active Member Customer

Just following up, and adding to the conversation. My KP's follow the same pattern: total screen freeze with the mouse cursor and background tasks still active, a WindowServer crash, and then watchdog eventually reboots the machine. Since installing 6.1b3, I have not had a KP, but it's only been 1.5 days so far (a record worth noting though). I have a ThunderBay 8 with a 4x HDD RAID-5, and 2x SSD RAID-0 (for VM's). I have not had any issues with drives ejecting. I have my primary monitor connected via HDMI, and second monitor connected via Apple TB2-to-DVI adapter. Both monitor feeds are KVM'd, so not directly attached to monitors. I've read that USB hubs can cause this issue, and I do have 4x USB3 hubs (one for each USB3 port on the 6,1). I really do have that many USB devices to plug in, but have not connected very many yet due to troubleshooting.

 

If this turns out to be an issue with external drives/RAID's, that should be able to be solved via OpenCore. I'm actually an admin in the Facebook group "OpenCore - on the Mac Pro", and could write a simple config so drives appear as internal to macOS, if that's worth testing. That's a trick we used on the 5,1 to allow recovery partitions and crash dumps on PCIe/NVMe drives.

Posted : 23/07/2021 12:05 am
(@jupeman)
Active Member

@softraid-support Another KP this morning, same thing.  6.1 beta definitely does not help this problem.  Any other thoughts?

Topic starter Posted : 23/07/2021 7:27 am
(@softraid-support)
Member Admin

@jupeman

It was worth the try. Although SoftRAID is not in the backtrace, there is some kind of timeout going on. Is the panic happening often enough that you can start eliminating some variables (other devices, drivers) so we can isolate the actual trigger for this?
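
Eliminating variables can be done by halving the list of suspects each round rather than toggling them one at a time. The sketch below only illustrates that idea and is not a SoftRAID tool. It assumes exactly one suspect triggers the panic and that each test run gives a reliable yes/no answer, which an intermittent panic may not. The names `isolate_trigger` and `panics_with` are hypothetical, and in practice `panics_with` stands for a manual observation over a long enough test period.

```python
def isolate_trigger(suspects, panics_with):
    """Halve the suspect list until a single item remains.

    Assumes exactly one suspect causes the panic, and that
    panics_with(enabled) reports whether a panic occurred while
    only the items in `enabled` were active.
    """
    candidates = list(suspects)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if panics_with(half):
            candidates = half                    # trigger is in the tested half
        else:
            candidates = candidates[len(half):]  # trigger is in the other half
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Illustrative list only; the stand-in check pretends the SSD RAID-0 is the trigger.
    suspects = ["Parallels kext", "USB hubs", "SSD RAID-0", "second monitor"]
    print(isolate_trigger(suspects, lambda enabled: "SSD RAID-0" in enabled))
```

With four suspects this takes two test rounds instead of up to four, which matters when each round means waiting hours for a panic.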

 

Posted : 23/07/2021 11:16 am