CommandLine::OptionParser

Welcome to OptionParser

OptionParser is designed to be a flexible command line parser with a Ruby look and feel to it. It grew out of the need for a parser that is standards compliant, yet flexible. OptionParser supports the standard command line styles of Unix, Gnu and X Toolkit, but also lets you break those rules.

OptionParser is not a port of a traditional command line parser, but it is written to meet the feature requirements of traditional command line parsers. When using it as a library, you should notice that it is expressive, supports Ruby’s blocks and lambdas, and is sprinkled with a little bit of magic.

While the library can be used by itself, it is also designed to work with the CommandLine::Application class. These tools work together to facilitate the generation of a sophisticated (batch oriented) application user interface in a matter of minutes.

If you need a refresher on the traditional option parsing schemes, see "Traditional Option Parsing Schemes" below.

Jumping Right In

OptionParser Usage

The OptionParser library consists of three classes: Option, OptionParser and OptionData. For each option an Option object is created. When you are ready to prepare for command line parsing, these options are collected into an array and fed to OptionParser. This OptionParser object controls the type of option scheme that is implemented. When it comes time to parse a command line, call the method OptionParser#parse. This will parse any array, but parses ARGV by default. The result is an OptionData object. Values can be extracted from this object, or it can be passed to another class as a fully encapsulated data object.

Getting Started

Installing

  gem install -r OptionParser

Loading the library

  require 'rubygems'
  require 'commandline/optionparser'
  include CommandLine

Using Option Parser

An option is created with the following syntax:

  opt = Option.new([options], <properties>)

The options can be :flag or :posix. :flag means that the option is a mode flag and does not take any arguments. :posix means that Option will validate the properties to ensure they are posix compliant.

An option object has six properties. Four of these properties define attributes of the object. The last two define actions that are taken when a command line is parsed.

  1. :names
  2. :arg_arity
  3. :opt_description
  4. :arg_description
  5. :opt_found
  6. :opt_not_found

It is not necessary to set values for all of these properties. Some are set automatically, as we’ll see below.

Posix

The default Option object is non-posix.

    op1  = OptionParser.new(:posix, opts)
    op2  = OptionParser.new(opts)
    op1.posix  #=> true
    op2.posix  #=> false

Mode-Flag

To create a mode flag, that is, an option that is either true or false depending on whether it is seen on the command line, we could write:

  opt_debug = Option.new(
    :names           => %w(--debug -d),       # the flag has two names
    :arg_arity       => [0,0],                # this says take no arguments
    :opt_description => "Sets debug to true",
    :arg_description => "",
    :opt_found       => true,                 # true if seen on command line
    :opt_not_found   => false                 # false if not seen on command line
  )

Now, this is a lot of work just for a common mode-flag. However, there is a shorter way:

  opt = Option.new(:flag, :names => %w(--debug -d))

When Option sees the :flag option, it makes some assignments behind the scenes and what you are left with is:

    :names           => ["--debug", "-d"]
    :arg_arity       => [0, 0]
    :opt_description => "Sets debug to true."  # debug is taken from the first name
    :arg_description => ""
    :opt_found       => true
    :opt_not_found   => false

For a common option like a mode-flag, Option will use the first option ‘word’ it finds in the :names list and use that in the automatic option text. Of course, if you don’t want any text, just set the option description to an empty string:

  :opt_description => ""

Option Arguments

If an option is not a mode flag, then it takes arguments. Most option parsers only permit a single argument per option flag. If your application needs multiple arguments, the standard method is just to repeat the option multiple times, once for each required argument. For example, if I need to pass two files to an application I would need something like:

  myapp -f file1 -f file2

But, it would be cleaner if the command line could be expressed as:

  myapp -f file1 file2

Well, no longer do you have to suffer with thirty-year-old option parser technology. OptionParser permits multiple arguments per option flag, and the number of arguments can be variable.

To define an option that takes 1 or more arguments, the following can be done:

  opt = Option.new(:names => "--file", :arg_arity => [1,-1])

Let’s say the option required at least two arguments, but not more than five. This is defined with:

  opt = Option.new(:names => "--file", :arg_arity => [2,5])
  OptionParser.new(opt).parse

  % myapp --file file1                    # exception raised
  % myapp --file file1 file2              # ok
  % myapp --file file1 file2 file3        # ok
  % myapp --file f1 f2 f3 f4 f5 f6        # f6 remains on the command line

This ability is handy on occasions where an option argument is ‘optional’.

  myapp --custom                 # no args, uses $HOME/.myapprc
  myapp --custom my_custom_file  # uses my_custom_file

This type of option is defined by:

  opt = Option.new(:names => "--custom", :arg_arity => [0,1])

If the :arg_arity is not satisfied, an exception is raised.
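
The arity rule can be pictured with a small stand-alone sketch. It does not use the library: claim_args is a hypothetical helper, and treating any token that starts with a hyphen as the start of the next option is an assumption made only for illustration.

```ruby
# Hypothetical helper, not part of CommandLine::OptionParser.
# Claims between min and max arguments from the front of argv.
# A max of -1 means "no upper limit". Assumes any token that
# starts with "-" begins the next option.
def claim_args(argv, min, max)
  available = argv.take_while { |tok| !tok.start_with?("-") }
  claimed   = max == -1 ? available : available.first(max)
  if claimed.size < min
    raise ArgumentError,
          "expected at least #{min} argument(s), got #{claimed.size}"
  end
  [claimed, argv.drop(claimed.size)]
end

args, rest = claim_args(%w(f1 f2 f3 f4 f5 f6), 2, 5)
p args  # => ["f1", "f2", "f3", "f4", "f5"]
p rest  # => ["f6"]
```

With [2,5] the sixth file is left on the command line, and a single file raises an exception, which mirrors the four myapp --file cases listed earlier.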

Actions

The option properties :opt_found and :opt_not_found are the source of the value returned for an option when it is parsed. These properties can be either an object or a proc/lambda. If they are an object, then the stored object is simply returned. If they are lambdas, then the stored value is the return value of the proc/lambda. So, the following will have the same result:

  opt_debug = Option.new(:flag,
    :names           => %w(--debug -d),
    :opt_found       => true,
    :opt_not_found   => false
  )

  opt_debug = Option.new(:flag,
    :names           => %w(--debug -d),
    :opt_found       => lambda { true },
    :opt_not_found   => lambda { false }
  )
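
The object-or-lambda rule amounts to one small decision. The sketch below is only an illustration: resolve_action is a hypothetical name, and it assumes a zero-argument lambda, which may not match how the library actually invokes its handlers.

```ruby
# Hypothetical helper, not the library's code: returns the stored
# object as-is, or the result of calling it when it is a proc/lambda.
# Assumes the lambda takes no arguments.
def resolve_action(action)
  action.respond_to?(:call) ? action.call : action
end

p resolve_action(true)             # => true
p resolve_action(lambda { true })  # => true
```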

Notice that there is no need to set an instance variable to a default value. Normally one does:

  @debug = false
  # option setup
  # ... parse the command line
  @debug = true if parse_results["--debug"]

But with OptionParser, one has the capability of doing the following:

  opt_debug = Option.new(:flag, :names => %w(--debug -d))
  # ... parse the command line
  @debug = option_data["--debug"]  # value is set without need for default

  # or

  opt_debug = Option.new(:flag,
    :names           => %w(--debug -d),
    :opt_found       => lambda { @debug = true },
    :opt_not_found   => lambda { @debug = false }
  )
  # do nothing, variable already set.

I find this much easier to manage than having to worry about setting default behaviour. Now that we know how to create options, let’s move on to the commandline parser.

OptionParser

Once the options are defined, we load them into an OptionParser and parse the command line. The syntax for creating an OptionParser object is:

  OptionParser.new(prop_flags, option)
  OptionParser.new(prop_flags, [options])
  OptionParser.new(option)
  OptionParser.new([options])

where the possible property flags are:

  :posix
  :unknown_options_action => :collect | :ignore | :raise

If you want posix parsing, you must say so explicitly. OptionParser will not assume posix mode just because all of the options are posix options. This allows you to use posix-only options without requiring the strict parsing rules.

Below are a few examples of creating an OptionParser object:

  opt = Option.new(:flag, :names => %w(-h))
  op1 = OptionParser.new(:posix, opt)
  op2 = OptionParser.new(opt)

or

  opts = []
  opts << Option.new(:flag, :names => %w(--help -h))
  opts << Option.new(:flag, :names => %w(--debug -d))

Options may be added to an OptionParser by three different methods:

  # Options added as arguments during OptionParser construction
  op = OptionParser.new(opt1, opt2)
  op = OptionParser.new([opt1, opt2])

or

  # Options added in a block constructor
  op = OptionParser.new { |o| o << opts }

or

  # Options added to an existing OptionParser object
  op  = OptionParser.new
  op << opts

Parsing the Command Line

Parsing the command line is as simple as calling #parse:

  option_data = op.parse

Printing an Option Summary

An OptionParser with a complete set of options added to it defines the human interface that your application presents to a user. Therefore, the parser should be able to provide a nicely formatted summary for the user.

An example is shown below with its corresponding output:

  require 'rubygems'
  require 'commandline/optionparser'
  include CommandLine
  puts OptionParser.new { |o|
    o << Option.new(:flag, :names => %w[--debug -d])
    o << Option.new(:flag, :names => %w[--help  -h],
              :opt_description => "Prints this page.")
    o << Option.new(:names => %w[--output -o],
              :opt_description => "Defines the output file.",
              :arg_description => "output_file")
    o << Option.new(:names => %w[--a-long-opt --with-many-names -a -A],
              :arg_arity => [2,-1],
              :opt_description => "Your really long description here.",
              :arg_description => "file1 file2 [file3 ...]")
  }.to_s

Generates the output:

  OPTIONS

      --debug,-d
          Sets debug to true.

      --help,-h
          Prints this page.

      --output,-o output_file
          Defines the output file.

      --a-long-opt,--with-many-names,-a,-A file1 file2 [file3 ...]
          Your really long description here.

Option Data

The OptionData is the return value of OptionParser#parse. The parsing results for each option are accessed with the bracket notation #[].

  opt = Option.new(:posix,
                   :names => %w(-r),
                   :opt_found => OptionParser::GET_ARGS)
  od = OptionParser.new(:posix, opt).parse(["-rubygems"])
  od["-r"] #=> "ubygems"

  od = OptionParser.new(:posix, opt).parse(["-r", "ubygems"])
  od["-r"] #=> "ubygems"

OptionData behaves similarly to a hash object in that the parsed option data is accessed with #[], where the key is the first item in the :names array of each option. Parsed values cannot be accessed through any of an option's other names.

  od = OptionParser.new { |o|
    o << Option.new(:flag, :names => %w(--valid --notvalid))
    o << Option.new(:flag, :names => %w(--first --second))
  }.parse(%w(--notvalid --second))
  od["--valid"]    #=> true
  od["--first"]    #=> true
  od["--notvalid"] #=> CommandLine::OptionData::UnknownOptionError
  od["--second"]   #=> CommandLine::OptionData::UnknownOptionError
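
The first-name rule can be mimicked with a plain Hash. This is a sketch of the lookup behaviour only, not the library's OptionData class: first_name_lookup is a hypothetical helper, and KeyError stands in for UnknownOptionError.

```ruby
# Hypothetical stand-in for the lookup rule described above.
# Values are stored under the first name only; any other key raises.
def first_name_lookup(values_by_names)
  table = Hash.new { |_, key| raise KeyError, "unknown option: #{key}" }
  values_by_names.each { |names, value| table[names.first] = value }
  table
end

od = first_name_lookup({
  %w(--valid --notvalid) => true,
  %w(--first --second)   => true
})
p od["--valid"]  # => true
p od["--first"]  # => true
# od["--notvalid"] would raise KeyError here
```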

Built-in Data Handlers

OptionParser has built-in data handlers for handling common scenarios. These lambdas can save a lot of typing.

GET_ARG_ARRAY

This is useful for options that take a variable number of arguments. It returns all the arguments in an array.

  # GET_ARG_ARRAY returns all arguments in an array, even if no
  # arguments are present. This is not to be confused with the option
  # occurring multiple times on the command line.
  opt = Option.new(:names          => %w(--file),
                   :arg_arity      => [0,-1],
                   :opt_found      => OptionParser::GET_ARG_ARRAY)
                   #:opt_found      => :collect)  # would this be better?
  od  = OptionParser.new(opt).parse(%w(--file))
  od["--file"]    #=> []
  od  = OptionParser.new(opt).parse(%w(--file=file))
  od["--file"]    #=> ["file"]
  od  = OptionParser.new(opt).parse(%w(--file=file1 --file file2))
  od["--file"]    #=> ["file2"]
  od  = OptionParser.new(opt).parse(%w(--file=file1 file2))
  od["--file"]    #=> ["file1", "file2"]
  od  = OptionParser.new(opt).parse(%w(--file file1 file2))
  od["--file"]    #=> ["file1", "file2"]

GET_ARGS

This is a ‘smart’ option getter. If no arguments are found, it returns true. If a single argument is found, it returns that argument. If more than one argument is found, it returns an array of those arguments.

  opt = Option.new(:names          => %w(--file),
                   :arg_arity      => [0,-1],
                   :opt_found      => OptionParser::GET_ARGS)
                   #:opt_found      => :smart_collect)  # would this be better?
  od  = OptionParser.new(opt).parse(%w(--file))
  od["--file"]    #=> true
  od  = OptionParser.new(opt).parse(%w(--file=file))
  od["--file"]    #=> "file"
  od  = OptionParser.new(opt).parse(%w(--file=file1 --file file2))
  od["--file"]    #=> "file2"
  od  = OptionParser.new(opt).parse(%w(--file=file1 file2))
  od["--file"]    #=> ["file1", "file2"]
  od  = OptionParser.new(opt).parse(%w(--file file1 file2))
  od["--file"]    #=> ["file1", "file2"]
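
The return rules of the two handlers can be written down as plain functions. The sketch below only mirrors the behaviour described above. The method names are hypothetical, and the signature (a list of the arguments collected for one option) is an assumption, since the real handlers' signatures are not shown here.

```ruby
# Hypothetical sketches of the two return rules, not the library's lambdas.
# Each receives the arguments collected for a single option occurrence.

# Like GET_ARG_ARRAY: always an array, even when it is empty.
def arg_array_sketch(args)
  args.dup
end

# Like GET_ARGS: true for no arguments, the bare value for one,
# an array for more than one.
def smart_args_sketch(args)
  case args.size
  when 0 then true
  when 1 then args.first
  else args.dup
  end
end

p arg_array_sketch([])                 # => []
p smart_args_sketch([])                # => true
p smart_args_sketch(%w(file))          # => "file"
p smart_args_sketch(%w(file1 file2))   # => ["file1", "file2"]
```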

And, for those oxymoronic non-optional options:

  opt = Option.new(:names => %w(--not-really-an-option),
    :opt_not_found => OptionParser::OPT_NOT_FOUND_BUT_REQUIRED
  )
  OptionParser.new(opt).parse([])   #=> OptionParser::MissingRequiredOptionError

OptionData

We have just shown that after parsing a command line, the result of each option is found from OptionData. The values that remain on the command line are assigned to args. Other attributes of OptionData are:

  od.argv             # the original command line
  od.unknown_options  # If OptionParser was told to :collect unknown options
  od.args             # arguments not claimed by any option
  od.not_parsed       # arguments following a '--' on the command line
  od.cmd              # not yet implemented - but a cvs-like command

Traditional Option Parsing Schemes

This section is a brief overview of traditional command line parsing.

Command line options traditionally occur in three flavors:

  • Unix (or POSIX.2)
  • Gnu
  • X Toolkit

Below is a summary of these schemes. (Note: I did not invent these traditional parsing conventions. Most of the information contained below was pulled from internet resources and I have quoted these resources where possible.)

Unix Style (POSIX)

The Unix style command line options are a single character preceded by a single dash (hyphen character). In general, lowercase options are preferred with their uppercase counterparts being the special case variant.

Mode Flag

If an option does not take an argument, then it is a mode-flag.

Optional Separation Between the Option Flag and Its Argument

If the option takes an argument, the argument follows it with optional white space separating the two. For example, the following forms are both valid:

  sort -k 5
  sort -k5

Grouping

A mode-flag can be grouped together with other mode-flags behind a single dash. For example:

  tar -c -v -f

is equivalent to:

  tar -cvf

If grouping is done, the last option in a group can be an option that takes an argument. For example

  sort -r -n -k 5

can be written as

  sort -rnk 5

but not

  sort -rkn 5

because the ‘5’ argument belongs to the ‘k’ option flag.
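
The grouping rule can be made concrete with a short sketch. It is not how OptionParser implements grouping: expand_group is a hypothetical helper, and the caller supplies the letters that take an argument.

```ruby
# Hypothetical helper: expands a POSIX group such as "-rnk" into
# separate flags. takes_arg lists the letters that require an argument.
# Once such a letter is seen, the rest of the group is its argument.
def expand_group(token, takes_arg)
  result = []
  chars  = token[1..-1].chars
  chars.each_with_index do |ch, i|
    result << "-#{ch}"
    next unless takes_arg.include?(ch)
    rest = chars[(i + 1)..-1].join
    result << rest unless rest.empty?
    break
  end
  result
end

p expand_group("-cvf", [])      # => ["-c", "-v", "-f"]
p expand_group("-rnk", ["k"])   # => ["-r", "-n", "-k"]
p expand_group("-rkn", ["k"])   # => ["-r", "-k", "n"]
```

The last call shows why sort -rkn 5 does not work: the ‘n’ is swallowed as the argument of ‘k’, and the ‘5’ is left over.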

Option Parsing Termination

It is convention that a double hyphen is a signal to stop option interpretation and to read the remaining statements on the command line literally. So, a command such as:

 app -- -x -y -z

will not ‘see’ the three mode-flags. Instead, they will be treated as arguments to the application:

 #args = ["-x", "-y", "-z"]
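
A minimal sketch of this convention, using a hypothetical split_at_terminator helper rather than anything from the library:

```ruby
# Hypothetical helper: splits argv at the first "--".
# Everything before it is still subject to option parsing,
# everything after it is taken literally.
def split_at_terminator(argv)
  idx = argv.index("--")
  return [argv.dup, []] if idx.nil?
  [argv[0...idx], argv[(idx + 1)..-1]]
end

to_parse, literal = split_at_terminator(%w(-- -x -y -z))
p to_parse  # => []
p literal   # => ["-x", "-y", "-z"]
```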

POSIX Summary

  1. An option is a hyphen followed by a single alphanumeric character.
  2. An option may require an argument which must follow the option with an optional space in between.
      -r ubygems
      -rubygems
      -r=ubygems   # not ok. '=' is Gnu style
    
  3. Options that do not require arguments can be grouped after a hyphen.
  4. Options can appear in any order.
  5. Options can appear multiple times.
  6. Options precede other nonoption arguments. TODO: Test for this
  7. The -- argument terminates options.
  8. The - option is used to represent the standard input stream.

References

www.mkssoftware.com/docs/man1/getopts.1.asp

Gnu Style

The Gnu style command line options provide support for option words (or keywords), yet still maintain compatibility with the Unix style options. The options in this style are sometimes referred to as long_options and the Unix style options as short_options. The compatibility is maintained by preceding the long_options with two dashes. The option word must be two or more characters.

Separation Between the Option Flag and Its Argument

Gnu style options cannot be grouped. For options that have an argument, the argument follows the option with either whitespace or an ’=’. For example, the following are equivalent:

  app --with-optimizer yes
  app --with-optimizer=yes
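
The two forms reduce to the same name and value pair. The sketch below shows one way to normalize them. split_long_option is a hypothetical helper, and passing the following token in explicitly is a simplification.

```ruby
# Hypothetical helper: returns [name, value] for a long option.
# If the token carries "=value", that value wins; otherwise the
# next token on the command line (if any) is used.
def split_long_option(token, next_token = nil)
  name, value = token.split("=", 2)
  [name, value || next_token]
end

p split_long_option("--with-optimizer", "yes")  # => ["--with-optimizer", "yes"]
p split_long_option("--with-optimizer=yes")     # => ["--with-optimizer", "yes"]
```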

Option Parsing Termination

Similar to the Unix style double-hyphen ‘--’, the Gnu style has a triple-hyphen ‘---’ to signal that option parsing be halted and to treat the remaining text as arguments (that is, read literally from the command line).

 app --- -x -y -z
 args = ["-x", "-y", "-z"]

Mixing Gnu and Unix Styles

The Gnu and the Unix option types can be mixed on the same commandline. The following are equivalent:

  app -a -b --with-c
  app -ab --with-c
  app -ba --with-c
  app --with-c -ab

X Toolkit Style

The X Toolkit style uses the single hyphen followed by a keyword option. This style is not compatible with the Unix or the Gnu option types. In most situations this is OK since these options will be filtered from the command line before passing them to an application.

’-’ and STDIN

It is convention that a bare hyphen means read from stdin.

The OptionParser Style

The CommandLine::OptionParser does not care what style you use. It is designed for maximum flexibility so it may be used within any organization to meet their standards.

Multiple Option Names

OptionParser does not place restrictions on the number of names an option may have. The only restriction is that an option name begin with a hyphen ’-’. A deliberately contrived example of this freedom is:

  :names => %w(
    --file --File --f --F -file -File -f -F
  )

Prefix Matching

Although not encouraged, some prefer the ability to truncate option words to their first unique match. For example, an application that supports this style and accepts the following two option words:

 ["--foos", "--fbars"]

will accept any of the following as valid options

  app --fo
  app --foo
  app --foos

for the "--foos" option flag since it can be determined that "--fo" will only match "--foos" and not "--fbars".
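
Prefix matching comes down to finding exactly one known name that starts with what the user typed. The following is a sketch under that reading, with a hypothetical resolve_prefix helper. It is not the library's matching code.

```ruby
# Hypothetical helper: resolves a possibly truncated option word
# against the list of known names. An exact match always wins;
# otherwise the prefix must match exactly one name.
def resolve_prefix(given, names)
  return given if names.include?(given)
  matches = names.select { |name| name.start_with?(given) }
  raise ArgumentError, "unknown option: #{given}"   if matches.empty?
  raise ArgumentError, "ambiguous option: #{given}" if matches.size > 1
  matches.first
end

names = ["--foos", "--fbars"]
p resolve_prefix("--fo", names)    # => "--foos"
p resolve_prefix("--foos", names)  # => "--foos"
# resolve_prefix("--f", names) would raise: it matches both names
```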

Repeated Arguments

A common question is how an option parser should respond when an option is specified on the command line multiple times. This is true for mode flags, but especially true for options that require an argument. For example, what should happen when the following is given:

  app -f file1 -f file2

Should the parser flag this as an error or should it accept both arguments?

OptionParser gives you the choice of whether it raises an exception when an option is seen more than once, or simply passes the data on to the user.

How the data is handled is up to the user, but it typically boils down to either Append, Replace or Raise. This is described in more detail in the usage section.
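
The three policies can be compared side by side in a short sketch. The merge_repeated helper and the :append, :replace and :raise symbols are hypothetical names chosen for illustration. They are not OptionParser settings.

```ruby
# Hypothetical helper: occurrences holds one argument list per time
# the option appeared, e.g. [["file1"], ["file2"]] for
# "app -f file1 -f file2".
def merge_repeated(occurrences, policy)
  case policy
  when :append  then occurrences.flatten(1)   # keep everything
  when :replace then occurrences.last         # last one wins
  when :raise
    raise ArgumentError, "option given more than once" if occurrences.size > 1
    occurrences.first
  else
    raise ArgumentError, "unknown policy: #{policy.inspect}"
  end
end

seen = [["file1"], ["file2"]]
p merge_repeated(seen, :append)   # => ["file1", "file2"]
p merge_repeated(seen, :replace)  # => ["file2"]
```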

CVS Mode

CVS is a common application with a unique command line structure. The cvs application commandline can be given options, but requires a command. This command can also be given options. This means that there are two sets of options, one set for the cvs application and one set for the cvs-command. Some example formats are:

  cvs [cvs-options]
  cvs [cvs-options] command [command-options-and-arguments]

  cvs -r update
  cvs -r update .
  cvs edit -p file

To handle this, the first unclaimed argument is treated as a command and the options and option-arguments that follow belong to that command. More on how this is handled in the usage section.
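
As a rough illustration of that rule, the sketch below splits a command line at the first unclaimed argument. split_command is a hypothetical helper, not the library's implementation, and the caller must say which application options take an argument.

```ruby
# Hypothetical helper: splits argv into application options, the
# command, and everything that belongs to the command.
# opts_with_arg lists application options that consume the next token.
def split_command(argv, opts_with_arg = [])
  i = 0
  while i < argv.size && argv[i].start_with?("-")
    i += opts_with_arg.include?(argv[i]) ? 2 : 1
  end
  { :app_options  => argv[0...i],
    :command      => argv[i],
    :command_args => argv[(i + 1)..-1] || [] }
end

p split_command(%w(-r update .))
p split_command(%w(edit -p file))
```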

Option Grouping

A conflict can occur when a group of single-letter Unix options spells the same string as a word option preceded by a single dash. For this reason, it is customary to use the double-dash notation for word options. Unless double-dashes are enforced for word options, OptionParser will check for possible name conflicts and raise an exception if it finds one.
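
One way such a check could work is sketched below. grouping_conflicts is a hypothetical helper, and the rule it applies (a single-dash word is ambiguous when every one of its letters is also a defined single-letter option) is my assumption about what a conflict means, not a description of the library's check.

```ruby
# Hypothetical helper: returns the single-dash word options that could
# also be read as a group of defined single-letter options.
def grouping_conflicts(names)
  letters = names.select { |n| n =~ /\A-[^-]\z/ }.map { |n| n[1] }
  words   = names.select { |n| n =~ /\A-[^-].+\z/ }
  words.select { |w| w[1..-1].chars.all? { |c| letters.include?(c) } }
end

p grouping_conflicts(%w(-c -v -f -cvf --debug))  # => ["-cvf"]
p grouping_conflicts(%w(-c -v -file --cvf))      # => []
```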