Copyright (c) Hyperion Entertainment and contributors.
AmigaOS Manual: ARexx Parsing
Parsing extracts substrings from a string and assigns them to variables. It is performed with the PARSE instruction or its variants ARG and PULL. The input to the operation is called the parse string and can come from several sources, including argument strings, expressions, and the console.
String-manipulation functions like SUBSTR() and INDEX() may be used for parsing, but the PARSE instruction is more efficient, especially when extracting many fields from a string.
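As a rough illustration of the difference, the following sketch extracts the same two fields first with INDEX() and SUBSTR() and then with a single PARSE instruction. The sample string and all variable names are hypothetical and chosen only for this example.

```rexx
/* Extract two fields, first with functions, then with PARSE. */
entry = 'width=640'

/* Function-based approach: locate the separator, then cut. */
eq   = INDEX(entry, '=')
key1 = SUBSTR(entry, 1, eq - 1)
val1 = SUBSTR(entry, eq + 1)
SAY key1 val1    /* width 640 */

/* The same result with one PARSE instruction. */
PARSE VAR entry key2 '=' val2
SAY key2 val2    /* width 640 */
```

The PARSE version needs no intermediate position variable, which becomes more noticeable as the number of fields grows.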
Templates
Parsing is controlled by a template, a group of tokens that specifies both the variables to be given values and the way to determine the value strings. The arrangement of tokens in the template determines which of the two basic template objects each token represents: a marker or a target.
- Marker
- Determines the starting and ending position in the parse string or the scan position.
- Target
- A symbol assigned a value by the parsing operation. That value is the substring determined by the marker positions.
Markers
There are three types of marker objects:
- Absolute markers
- Actual index position in the parse string.
- Relative markers
- A positive or negative offset from the current position.
- Pattern markers
- Matches the pattern against the parse string beginning at the current scan position.
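The three marker types can be sketched as follows. This is a minimal example rather than a complete treatment; the sample strings and variable names are hypothetical, and positions are assumed to count from 1 at the start of the parse string.

```rexx
/* Marker sketch: sample strings and variable names are hypothetical. */
line = 'ABCDEFGHIJ'

/* The absolute markers 1 and 4 bound "first"; the relative marker  */
/* +3 ends "second" three characters after its start at position 4. */
PARSE VAR line 1 first 4 second +3 rest
SAY first     /* ABC  */
SAY second    /* DEF  */
SAY rest      /* GHIJ */

/* Pattern markers: each string is searched for from the current    */
/* scan position, and the matched text is not part of any value.    */
entry = 'name=Amiga;type=computer'
PARSE VAR entry key '=' val ';' remainder
SAY key         /* name */
SAY val         /* Amiga */
SAY remainder   /* type=computer */
```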
Targets
Targets, like markers, can affect the scan position if value strings are being extracted by tokenization. Parsing by tokenization extracts words (tokens) from the parse string and is used whenever a target is followed immediately by another target. During tokenization the current scan position is advanced past any blanks to the start of the next word. The ending index is the position just past the end of the word, so the value string has neither leading nor trailing blanks.
Targets are specified by variable symbols. The place holder, denoted by a period (.), is a special type of target and behaves like a normal target except that it does not have an assigned value.
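A short sketch of tokenization and the place holder, using a hypothetical input line. Every named target here is followed immediately by another target, so each value is extracted as a single blank-delimited word.

```rexx
/* Tokenization sketch: the input line and names are hypothetical. */
line = '   copy   df0:file   ram:   quiet'

/* The first place holder skips "df0:file"; the last one absorbs   */
/* the rest of the string so that "dest" is also tokenized.        */
PARSE VAR line command . dest .
SAY '<' || command || '>'    /* <copy> */
SAY '<' || dest || '>'       /* <ram:> */
```

The angle brackets in the output only make it visible that the extracted words carry no leading or trailing blanks.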
Template Objects
Each template object is specified by one or more tokens:
- Symbols
- A symbol may specify a target or a marker. It is a marker if it follows an operator (+, - or =), in which case the symbol value is used as an absolute or relative position. A symbol enclosed in parentheses specifies a pattern marker, and the symbol value is used as the pattern string. If neither case applies and the symbol is a variable, it specifies a target. A fixed symbol that does not follow an operator specifies an absolute marker and must be a whole number. The only exception is the place holder (.), which is a target.
- Strings
- A string always represents a pattern marker.
- Parentheses
- A symbol enclosed in parentheses is a pattern marker and the value of the symbol is used as the pattern string. While the symbol may be either fixed or variable, it will usually be a variable. A fixed pattern could be given more simply as a string.
- Operators
- The three operators (+, - and =) are valid within a template and must be followed by a fixed or variable symbol. The value of the symbol is used as a marker and must represent a whole number. The "+" and "-" operators signify a relative marker, whose value is negated by the "-" operator. The "=" operator indicates an absolute marker and is optional if the marker is defined by a fixed symbol.
- Commas
- The comma (,) marks the end of a template. It is also used as a separator when multiple templates are provided with an instruction. The interpreter obtains a new parse string before processing each succeeding template. For some source options, the new string will be identical to the previous one. The ARG, EXTERNAL and PULL options will generally supply a different string, as will the VAR option if the variable has been modified.
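The token types described above can be combined as in the following sketch. All names and sample values are hypothetical, and the internal function "show" exists only to supply two argument strings for the multiple-template case.

```rexx
/* Token-type sketch: all names and sample values are hypothetical. */
text  = 'DH0:Work/Docs'
sep   = ':'
start = 1
width = 4

/* A variable symbol in parentheses supplies the pattern string. */
PARSE VAR text volume (sep) path
SAY volume    /* DH0 */
SAY path      /* Work/Docs */

/* "=" makes a variable symbol an absolute marker, "+" a relative one. */
PARSE VAR text =start head +width tail
SAY head      /* DH0: */
SAY tail      /* Work/Docs */

/* A comma separates templates; with ARG, each template receives   */
/* the next argument string.                                       */
CALL show 'red green', 'blue'
EXIT

show:
  PARSE ARG first second . , third .
  SAY first second third    /* red green blue */
  RETURN
```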
The ARexx interface command parser has been generalized to recognize double-delimiter sequences within a (quoted) string file. The quoting convention is convenient for short programs, but it is easy to run out of quoting levels in longer programs. Single and double-quotes within a REXX program are equivalent, but the external environment may make a distinction.
AmigaDOS uses double-quotes. Strings entered from a Shell must begin with a double-quote, especially if you wish to include semicolons. For example:
RX "SAY 'It''s possible, indeed; you ain''t seen nothin'' yet!' "
-> It's possible, indeed; you ain't seen nothin' yet!
RX "SAY '""Hello!""'"
-> "Hello!"