softraidtool isn't returning a proper error code on exit but it should

16 Posts
2 Users
2 Reactions
1,598 Views
(@wfiveash)
Posts: 49
Trusted Member
Topic starter
 

Today I started checking out softraidtool on the command line and was disappointed to find that in situations where an error occurs, softraidtool returns an exit code of 0.  In the Unix world, this is a bug.  Here is an example of what I'm talking about:

$ softraidtool volume badvolname info
SoftRAIDTool status: waiting for volume badvolname to appear (1 seconds remaining)
SoftRAIDTool error: volume badvolname did not respond
$ echo $?
0

 

This makes it harder for shell scripts that use softraidtool to catch errors and act accordingly.

 
Posted : 16/09/2024 2:08 pm
(@wfiveash)
Posts: 49
Trusted Member
Topic starter
 

Another example where a Unix command does return a non-zero code to the execution environment:

$ find badargdir -name foo
find: badargdir: No such file or directory

$ echo $?
1
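
For illustration, here is a minimal sketch of how a script normally branches on an exit status. The helper name `run_checked` is hypothetical, and the stand-in commands (`true`, `sh -c 'exit 3'`) are used only so the example runs anywhere; it says nothing about how softraidtool itself behaves.

```shell
# run_checked (hypothetical helper): run any command, discard its output,
# and report success or failure based only on the command's exit status.
run_checked() {
    "$@" >/dev/null 2>&1
    ec=$?
    if [ "$ec" -eq 0 ]; then
        echo "ok"
    else
        echo "failed with exit code $ec"
    fi
    return "$ec"
}

run_checked true                      # prints: ok
run_checked sh -c 'exit 3' || true    # prints: failed with exit code 3
```

A command that always exits 0 would always take the "ok" branch here, which is the problem described in this thread.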

 
Posted : 19/09/2024 11:33 am
(@wfiveash)
Posts: 49
Trusted Member
Topic starter
 

From Wikipedia ( https://en.wikipedia.org/wiki/Exit_status#POSIX ):

POSIX-compatible systems typically use a convention of zero for success and nonzero for error.[14] Some conventions have developed as to the relative meanings of various error codes; for example GNU recommend that codes with the high bit set be reserved for serious errors.

Are the developers aware of this issue?

 
Posted : 26/09/2024 12:32 pm
(@softraid-support)
Posts: 9200
Member Admin
 

@wfiveash 

This has been added as a feature request. I do not know any timeline, as it is a relatively low priority but also low impact change. Your feedback was welcome

 
Posted : 26/09/2024 6:13 pm
(@wfiveash)
Posts: 49
Trusted Member
Topic starter
 

Personally this feels more like a bug than a feature (note, I worked as a software engineer on IBM's AIX and Sun's Solaris Unix variants from 1990-2017).  Of course, if you can tell me how one can go about creating a Python or shell program that uses softraidtool and reliably handles error conditions then I'd be happy right now.

 
Posted : 27/09/2024 12:06 pm
(@softraid-support)
Posts: 9200
Member Admin
 

@wfiveash 

We have created several python scripts with softraidtool.

What are you trying to create? Maybe we can assist?

We do not have the debugging responses you want, sorry, maybe in the future.

 
Posted : 27/09/2024 7:26 pm
(@wfiveash)
Posts: 49
Trusted Member
Topic starter
 

Posted by: @softraid-support

@wfiveash 

We have created several python scripts with softraidtool.

What are you trying to create? Maybe we can assist?

We do not have the debugging responses you want, sorry, maybe in the future.

 

I'm not talking about debugging.  In your python scripts, how are errors from softraidtool detected?

For my use I'd like to be able to run a script via cron that regularly runs softraidtool validation on a volume.  I want that script to notify me with a summary as to whether the validation succeeded or not.

 

 
Posted : 28/09/2024 9:47 am
(@softraid-support)
Posts: 9200
Member Admin
 

@wfiveash 

The good news is this is a feature being added.

The command, you probably know, is

sudo softraidtool volume myvolume validate

Is the validate command failing, or, are you testing failure modes?

 
Posted : 29/09/2024 3:06 am
(@wfiveash)
Posts: 49
Trusted Member
Topic starter
 

Here is a ksh shell script snippet of what I want to do:

sudo softraidtool volume myvolume validate > /tmp/sr-log.txt 2>&1
ec=$?

if [[ $ec == 0 ]]
then
    # notify is a command that sends an iOS push message to my iPhone and Macs
    notify -m "Validation of myvolume succeeded without error"
else # an error was returned
    notify -m "Validation of myvolume failed with error code: $ec, see /tmp/sr-log.txt for more detail"
fi

######################################################

If softraidtool always returns 0 regardless of whether the command it ran succeeded or failed then the "if [[ $ec == 0 ]]" logic will not work.

 
Posted : 30/09/2024 11:32 am
(@softraid-support)
Posts: 9200
Member Admin
 

@wfiveash 

Why not try this, create a 200GB volume (for speed).
Validate. Look in the log to get the validate message you need.
validate, pull the plug on the enclosure, or otherwise fail the validate. Look for that message.
Build the script around those messages.
The validate command should work reliably.

I can't get you any engineering advice at present.
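
A minimal sketch of the message-matching approach suggested above, as it might look until exit codes are reliable. It assumes that a failed run prints a line starting with `SoftRAIDTool error:`, as in the output shown in the first post; the actual text of a failed validate is not shown in this thread and would need to be confirmed with the test described above. The names `check_log` and `ERROR_MARKER` are hypothetical.

```shell
# ASSUMPTION: failures log a line beginning with this prefix (taken from the
# "did not respond" example earlier in the thread; verify for validate).
ERROR_MARKER='^SoftRAIDTool error:'

# check_log LOGFILE EXIT_STATUS
# Returns 0 only if the exit status is 0 AND the log has no error marker,
# so it keeps working if the tool later starts returning non-zero codes.
check_log() {
    if [ "$2" -ne 0 ] || grep -q "$ERROR_MARKER" "$1"; then
        return 1
    fi
    return 0
}

# Demonstration with a simulated log instead of a real softraidtool run.
log=$(mktemp)
printf '%s\n' 'SoftRAIDTool error: volume badvolname did not respond' > "$log"
if check_log "$log" 0; then
    echo "succeeded"
else
    echo "failed"    # this branch runs: the marker is present
fi
rm -f "$log"
```

In a real script the log would come from something like `sudo softraidtool volume myvolume validate > /tmp/sr-log.txt 2>&1`, followed by `check_log /tmp/sr-log.txt $?`.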

 
Posted : 01/10/2024 11:04 am
(@wfiveash)
Posts: 49
Trusted Member
Topic starter
 

Posted by: @softraid-support

@wfiveash 

Why not try this, create a 200GB volume (for speed).
Validate. Look in the log to get the validate message you need.
validate, pull the plug on the enclosure, or otherwise fail the validate. Look for that message.
Build the script around those messages.
The validate command should work reliably.

I can't get you any engineering advice at present.

 

What if those messages change? Unlike actual return codes, output like that is typically not seen as a stable interface that a script can rely on. Parsing text for this purpose is also much less efficient than checking the return code.

 

 
Posted : 01/10/2024 11:59 am
(@softraid-support)
Posts: 9200
Member Admin
 

@wfiveash

Unlikely, that text has not changed in 30 years, since SoftRAID 2.0

We add things to logging, but there is no need to change what is logged. Older versions may not have had the serial number of disks, for example, but verify failed is the same since 1.0. You can rely on that.

 
Posted : 01/10/2024 5:18 pm
(@wfiveash)
Posts: 49
Trusted Member
Topic starter
 

I still think this issue is a bug, not a feature.  Here is what I found regarding command line program return codes running on macOS:

On Apple macOS, command-line programs follow the same conventions for return codes (also called exit status codes) as most Unix-like systems, including Linux. The zero/non-zero convention comes from POSIX; several of the specific values below are shell (Bash) conventions rather than POSIX requirements:

Common Exit Codes:

  1. 0 (Success): Indicates that the command or program executed successfully without any errors.
  2. Non-zero values (Error/Failure): Any non-zero return code generally indicates an error. Specific non-zero values may be used for different types of errors. For example:
    • 1: A general error (usually used when the error is unspecified or general).
    • 2: Misuse of shell built-ins (for example, when incorrect syntax is used in a command).
    • 126: Command invoked cannot be executed (e.g., permission denied).
    • 127: Command not found.
    • 128: Invalid argument to exit.
    • >128: These codes usually indicate that the process was terminated by a signal. For instance, 130 corresponds to termination by a Ctrl+C (SIGINT, signal 2), as the exit code is calculated by adding 128 to the signal number.
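
As a rough sketch of how a script could interpret these values, the hypothetical helper `describe_exit` below maps an exit code to a short description. It covers only the conventions listed above and is not an exhaustive or authoritative table.

```shell
# describe_exit CODE (hypothetical helper): print a short description of an
# exit status using the common shell conventions listed above.
describe_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "success"
    elif [ "$code" -eq 126 ]; then
        echo "command found but not executable"
    elif [ "$code" -eq 127 ]; then
        echo "command not found"
    elif [ "$code" -gt 128 ]; then
        # By convention, 128 + N means "terminated by signal N".
        echo "terminated by signal $((code - 128))"
    else
        echo "error (exit code $code)"
    fi
}

describe_exit 0      # prints: success
describe_exit 130    # prints: terminated by signal 2
```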

Anyway, I look forward to this getting fixed soon.

 
Posted : 02/10/2024 10:27 am
(@softraid-support)
Posts: 9200
Member Admin
 

@wfiveash 

I passed this on to engineering. This is not a high priority, but it may be an easy addition. No promises, but it will be looked at.

 
Posted : 04/10/2024 10:07 am
envoy510 and wfiveash reacted
(@wfiveash)
Posts: 49
Trusted Member
Topic starter
 

I see in SR 8.5 that this bug does not appear to be fixed -- when can I expect this to be fixed?

 
Posted : 08/04/2025 12:12 pm
Page 1 / 2