Validate RAID1+0 volume insta-fails on 6.2.1b3

(@higgins)
Posts: 23
Eminent Member
Topic starter
 

Hi there,

I've got an 8-bay OWC enclosure w/14TB spinning disks, configured as RAID1+0. Host machine is an M1 Max laptop, connected via the OWC TBT4 dock to the enclosure using OWC cables. Using SR 6.2.1b3 because I've got other RAID volumes (including one RAID5) that may really need it.

The setup works fine; I've been editing off of it for three work days and am happy. BUT. I went to validate the array today, for the first time ever—figured it's been in use a while, this is its second host computer, I've been writing tons of data to it, so I'd like to know how it's doing.

The validation instantly fails, citing "a read error while validating the volume" and suggesting "This disk should be replaced." It's pointing to the disk in the "A" position (far left in enclosure), but neither the disk nor the volume reports any I/O errors. All SMART tests (including by DriveDx) look totally normal for all disks in the array. The log lines are like this:

Dec 29 15:29:27 - SoftRAID Driver: The volume "Tetris Array CH" (disk16) has started validating. Incorrect blocks will be updated.
Dec 29 15:29:27 - SoftRAID Driver: The volume "Tetris Array CH" (disk16) failed to validate because one the disks encountered a read error. The disk (disk8, SoftRAID ID: 092A49B8410A6780) was unable to read sectors (offset 0, i/o block size 16777216, error E00002C2). This disk should be replaced.
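
For anyone reading along, the failure line above can be pulled apart with a short script. This is only a sketch: the regular expression is fitted to this one log line, the field names are my own labels, and the small code table is my reading of Apple's IOKit IOReturn.h (worth checking against the header on your own system). The bit layout used to split the error value is the standard Mach error layout (6-bit system, 12-bit subsystem, 14-bit code).

```python
import re

# Pattern fitted to the single failure line quoted above; group names are illustrative.
FAILURE_RE = re.compile(
    r"The disk \((?P<bsd>disk\d+), SoftRAID ID: (?P<sr_id>[0-9A-F]+)\) "
    r"was unable to read sectors \(offset (?P<offset>\d+), "
    r"i/o block size (?P<size>\d+), error (?P<err>[0-9A-Fa-f]{8})\)"
)

# Assumption: a tiny subset of IOKit return codes (low 14 bits), per IOReturn.h.
IOKIT_CODES = {
    0x2C2: "kIOReturnBadArgument",
    0x2CA: "kIOReturnIOError",
}

def parse_failure(line):
    """Return the fields of a validate-failure log line, or None if it does not match."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    code = err & 0x3FFF                      # low 14 bits: the error code itself
    return {
        "bsd": m["bsd"],
        "softraid_id": m["sr_id"],
        "offset": int(m["offset"]),
        "size_mib": int(m["size"]) / 2**20,  # bytes to MiB
        "system": (err >> 26) & 0x3F,        # top 6 bits: Mach error system
        "subsystem": (err >> 14) & 0xFFF,    # middle 12 bits
        "code": code,
        "name": IOKIT_CODES.get(code, "unknown"),
    }

LINE = ('Dec 29 15:29:27 - SoftRAID Driver: The volume "Tetris Array CH" (disk16) '
        'failed to validate because one the disks encountered a read error. '
        'The disk (disk8, SoftRAID ID: 092A49B8410A6780) was unable to read sectors '
        '(offset 0, i/o block size 16777216, error E00002C2). This disk should be replaced.')

info = parse_failure(LINE)
print(info)
```

If that table is right, the request that failed was a 16 MiB read at offset 0, and the code decodes to a bad-argument error rather than a media I/O error, which would fit with the disk and the volume showing no I/O errors.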

...I tried un-mounting all volumes, powering down and removing every attached drive except that array, and starting up again. Same behavior when trying to validate, though the volume "acts fine" otherwise. I have not tried validating other arrays on this software/hardware configuration, though could easily do that if it would help.

So...is this a known issue? Or do I really have a failing drive all of a sudden? Curious if there are steps to troubleshoot or perhaps it's a beta thing.

 
Posted : 29/12/2021 6:59 pm
(@softraid-support)
Posts: 9200
Member Admin
 

This could be a hardware issue. I set up 8 flash drives in a RAID 10 and validated successfully.

So this is unlikely to be a SoftRAID application bug.

 

Since this reproduces, and always on the A drive, try a "Verify Disk" on the A drive and see if it passes. Restart first to give it the best shot at passing.

If it can verify disk successfully, start another validate and then save a SoftRAID tech support file and attach it to the forum.

 
Posted : 29/12/2021 11:46 pm
(@higgins)
Posts: 23
Eminent Member
Topic starter
 

@softraid-support OK, I followed the steps you outlined, but I'm still running into something odd.

The "Verify Disk" operation on Disk A worked fine and took less time than I expected (only 14 hours or something). So, emboldened by that, I tried to Validate the array again. Same error as before. SR asks for admin password, then immediately fails out, claiming that first disk is bad.

Attached is the support file. I did try the Validate operation twice, just in case...same result.

Incidentally, for the first Validate I had unmounted the volume, as SR told me that was preferred for speed when doing a VERIFY (not a Validate). I figured maybe it would help during validation as well. BUT, of course, when I initiated the Validate, SR showed "waiting for mount" or some such status, so I used SR to mount the array. That's when it immediately failed the first time. The second attempt was right after that, while the volume was still mounted, with no other changes.

If it would help, I have an Intel machine here on the same OS version w/latest retail SR installed. I could attempt Validate w/that machine? (Could also install the SR beta if that's preferred—would mean fewer software stack changes.)

 

 
Posted : 04/01/2022 6:20 pm
(@softraid-support)
Posts: 9200
Member Admin
 

@higgins 

Try the Intel machine; you have me curious. I don't see why it is failing on validate. I may need to get engineering involved, but people are still coming back from travels and flight delays.

It is the 8A disk with the read error, I can confirm. I wonder if there is a different IO pattern in validate vs verify disk.

Application version or driver version won't make a difference here, as long as the driver is loading, which it is.

 
Posted : 05/01/2022 12:19 am
(@higgins)
Posts: 23
Eminent Member
Topic starter
 

@softraid-support OK, I have started the Validate on the Intel machine (iMac Pro). There is clearly something different here, as the process actually is running rather than failing out immediately. I'll keep you posted when it completes, or if something noteworthy happens. :)

 
Posted : 05/01/2022 2:17 pm
(@softraid-support)
Posts: 9200
Member Admin
 

@higgins 

Thanks for letting me know. I will investigate in case this is a bug in the M1 code.

 
Posted : 05/01/2022 5:54 pm
(@higgins)
Posts: 23
Eminent Member
Topic starter
 

@softraid-support Right on. BTW, the Validate completed overnight on the Intel machine. There were 93 blocks updated. Would it be useful to have a .supt file or other info?
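
In case "blocks updated" is unclear to later readers: on a mirrored volume, a validate pass conceptually compares the two copies of each block and rewrites one copy where they disagree. The toy sketch below illustrates that idea only. It is not SoftRAID's implementation; the function name, the tiny block size, and the choice to treat the first copy as authoritative are all assumptions made for illustration.

```python
def validate_mirror(primary: bytearray, secondary: bytearray, block_size: int = 4) -> int:
    """Compare two mirror halves block by block.

    Where a block differs, copy the primary's block over the secondary's
    (assumption: primary is treated as authoritative). Returns the number
    of blocks that were updated.
    """
    if len(primary) != len(secondary):
        raise ValueError("mirror halves must be the same size")
    updated = 0
    for start in range(0, len(primary), block_size):
        end = start + block_size
        if primary[start:end] != secondary[start:end]:
            secondary[start:end] = primary[start:end]  # resync the stale copy
            updated += 1
    return updated

# Two 12-byte "disks" whose middle block has drifted apart.
disk_a = bytearray(b"AAAABBBBCCCC")
disk_b = bytearray(b"AAAAXXXXCCCC")
blocks_updated = validate_mirror(disk_a, disk_b)
print(blocks_updated, disk_a == disk_b)
```

In a RAID 1+0 set the same comparison would run per mirror pair, with the striping layer sitting on top.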

 
Posted : 06/01/2022 11:14 am
(@softraid-support)
Posts: 9200
Member Admin
 

@higgins 

No, I've got this. It is a bug in the SoftRAID application, and I have it in our bug list now. RAID 4/5 volumes validate, but the bug affects other volume types.

 
Posted : 06/01/2022 12:11 pm