Copyright (c) 2012-2016 Hyperion Entertainment and contributors.
AmigaOS Manual: ARexx Parsing
Parsing extracts substrings from a string and assigns them to variables. Parsing is performed using the PARSE instruction or its variants ARG and PULL. The operation input is called the parse string and comes from several sources, including argument strings, expressions, or the console.
String-manipulation functions like SUBSTR() and INDEX() may be used for parsing, but the PARSE instruction is more efficient, especially when extracting many fields from a string.
Parsing is controlled by a template, a group of tokens that specifies both the variables to be given values and the way to determine the value strings. The way tokens are arranged in the template determines whether the token is one of two basic template objects: a marker or a target.
- Marker: determines the scan position, that is, the starting or ending position of a value string in the parse string.
- Target: a symbol that is assigned a value by the parsing operation. The value is the substring delimited by the marker positions.
There are three types of marker objects:
- Absolute markers: an actual index position in the parse string.
- Relative markers: a positive or negative offset from the current position.
- Pattern markers: a pattern that is matched against the parse string, beginning at the current scan position.
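The three marker types can be combined in a single template. The following is a minimal sketch; the sample string and the variable names (record, head, mid, tail, rest) are made up for illustration.

```rexx
/* Absolute, relative and pattern markers in one template */
record = 'ABCDEFGH=xyz'

/* 1 and 4 are absolute markers, +3 is a relative marker, */
/* and '=' is a pattern marker                            */
PARSE VAR record 1 head 4 mid +3 tail '=' rest

SAY head   /* ABC */
SAY mid    /* DEF */
SAY tail   /* GH  */
SAY rest   /* xyz */
```

Each target receives the substring between the markers on either side of it; the final target receives whatever follows the matched pattern.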
Targets, like markers, can affect the scan position if value strings are being extracted by tokenization. Parsing by tokenization extracts words (tokens) from the parse string and is used whenever a target is followed immediately by another target. During tokenization the current scan position is advanced past any blanks to the start of the next word. The ending index is the position just past the end of the word and the value string has neither leading nor trailing blanks.
Targets are specified by variable symbols. The place holder, denoted by a period (.), is a special type of target and behaves like a normal target except that it does not have an assigned value.
Each template object is specified by one or more tokens:
- Symbols: A symbol may specify a target or a marker. It is a marker if it follows an operator (+, - or =), in which case the symbol value is used as an absolute or relative position. A symbol enclosed in parentheses specifies a pattern marker, and the symbol value is used as the pattern string. If neither case applies and the symbol is a variable, it specifies a target. Fixed symbols always specify absolute markers and must be whole numbers; the only exception is the place holder (.), which is a target.
- Strings: A string always represents a pattern marker.
- Parentheses: A symbol enclosed in parentheses is a pattern marker and the value of the symbol is used as the pattern string. While the symbol may be either fixed or variable, it will usually be a variable, since a fixed pattern can be given more simply as a string.
- Operators: The three operators (+, - and =) are valid within a template and must be followed by a fixed or variable symbol. The value of the symbol is used as a marker and must represent a whole number. The "+" and "-" operators signify a relative marker, whose value is negated by the "-" operator. The "=" operator indicates an absolute marker and is optional if the marker is defined by a fixed symbol.
- Commas: The comma (,) marks the end of a template. It is also used as a separator when multiple templates are provided with an instruction. The interpreter obtains a new parse string before processing each succeeding template. For some source options, the new string will be identical to the previous one. The ARG, EXTERNAL and PULL options will generally supply a different string, as will the VAR option if the variable has been modified.
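As a brief illustration of string tokens and parenthesized symbols, the sketch below uses a literal string as one pattern marker and a variable in parentheses as another. The data and the names (pair, line, delim and the targets) are hypothetical.

```rexx
/* Pattern markers given as a string and as a symbol in parentheses */

/* A string token is always a pattern marker; it may be longer */
/* than one character                                          */
pair = 'name := value'
PARSE VAR pair key ' := ' val
SAY key     /* name  */
SAY val     /* value */

/* A symbol in parentheses supplies the pattern from a variable */
delim = ';'
line = 'red;green;blue'
PARSE VAR line first (delim) second (delim) third
SAY first   /* red   */
SAY second  /* green */
SAY third   /* blue  */
```

Holding the delimiter in a variable is convenient when the same template has to handle input with different separators.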
The ARexx interface command parser has been generalized to recognize double-delimiter sequences within a (quoted) string file. The quoting convention is convenient for short programs, but it is easy to run out of quoting levels in longer programs. Single and double-quotes within a REXX program are equivalent, but the external environment may make a distinction.
AmigaDOS uses double-quotes. Strings entered from a Shell must begin with a double-quote, especially if you wish to include semicolons. For example:
RX "SAY 'It''s possible, indeed; you ain''t seen nothin'' yet!' "
-> It's possible, indeed; you ain't seen nothin' yet!
RX "SAY '""Hello!""'"
-> "Hello!"
The Scanning Process
Scan positions are expressed as an index in the parse string and can range from 1 (the start of the string) to the length of the string plus 1 (the end).
The substring specified by two scan indices includes the characters from the starting position up to, but not including, the ending position. For example, the indices 1 and 10 specify characters 1-9 in the parse string. If the second scan index is less than or equal to the first, the remainder of the parse string is used as the substring. This means that a template specification like:
PARSE ARG 1 all 1 first second
will assign the entire parse string to the variable ALL. If the current scan index is already at the end of the parse string, the remainder is the null string.
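The same rule can be tried without command-line arguments by parsing a literal value. This is a small sketch with a made-up parse string; the trailing place holder is added here only so that second receives a single word.

```rexx
/* Rewinding the scan with the absolute marker 1 */
PARSE VALUE 'one two three' WITH 1 all 1 first second .

/* The second index (1) is not greater than the first (1), */
/* so the remainder of the string is assigned to all       */
SAY all      /* one two three */
SAY first    /* one */
SAY second   /* two */
```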
When a pattern marker is matched against the parse string, the marker position is the index of the first character of the matched pattern or the end of the string if no match was found. The pattern is removed from the string whenever a match is found. This is the only operation that modifies the parse string during the parsing process.
Templates are scanned from left to right with the initial scan index set to 1. The scan position is updated each time a marker object is encountered, according to the type and value of the marker.
Whenever a target object is found, the assigned value is determined by examining the next template object. If the next object is another target, the value string is determined by tokenizing the parse string. Otherwise, the current scan position is used as the start of the value string and the position specified by the following marker is used as the end point.
The scan continues until all of the objects in the template have been used. Every target will be assigned a value. Once the parse string has been exhausted, the null string is assigned to any remaining targets.
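Two consequences of these rules are sketched below with hypothetical data: targets that remain after the parse string is exhausted receive the null string, and a pattern that is not found behaves like a marker at the end of the string.

```rexx
/* Targets left over once the parse string is used up get the null string */
PARSE VALUE 'alpha beta' WITH a b c d
SAY a b                   /* alpha beta */
SAY LENGTH(c) LENGTH(d)   /* 0 0 */

/* A pattern that is not found places the marker at the end of the string */
PARSE VALUE 'key' WITH name '=' val
SAY name                  /* key */
SAY LENGTH(val)           /* 0 */
```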
Parsing by Tokenization
Computer programs frequently split a string into its component words or tokens. This is accomplished with a template consisting entirely of variables (targets).
/*Assume "hammer 1 each $600.00" was entered*/
PULL item qty units cost .
In this example the input line from the PULL instruction is split into words and assigned to the variables in the template. Because PULL is shorthand for PARSE UPPER PULL, the input is translated to uppercase first: the variable item receives the value "HAMMER", qty is set to "1", units is set to "EACH" and cost gets the value "$600.00". The final place holder (.) is given a null value, since there are only four words in the input. However, it forces the preceding variable cost to be given a tokenized value. If the place holder were omitted, the remainder of the parse string would be assigned to cost, which would then have a leading blank.
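The effect of the place holder is easier to see when the input has more words than the template has variables. The sketch below uses PARSE VAR on a made-up line instead of PULL, so it runs without console input and the values keep their case.

```rexx
/* Place holder versus no place holder */
line = 'hammer 1 each $600.00 plus tax'

/* The trailing place holder absorbs the extra words, */
/* so cost is a single tokenized word                 */
PARSE VAR line item qty units cost .
SAY cost           /* $600.00 */

/* Without a trailing place holder the last target */
/* receives the whole remainder of the line        */
PARSE VAR line . . . rest
SAY WORDS(rest)    /* 3 */
```

Leading place holders, as in the second template, are a convenient way to skip words that are not needed.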
answer = "Only Amiga makes it possible."
DO forever
    /*Place first word into 'first' and the rest into 'answer'.*/
    PARSE VAR answer first answer
    /*Stop if there are no more words*/
    IF first == '' THEN LEAVE
    SAY answer
END
The first word of a string is removed and the remainder is placed back in the string. The process continues until no more words are extracted. The output is:
Amiga makes it possible.
makes it possible.
it possible.
possible.
Parsing by Pattern
Pattern markers extract fields that are separated by known delimiters. In the example below the pattern is very simple, a single character, but it could be an arbitrary string of any length. This form of parsing is useful whenever delimiter characters are present in the parse string.
/*Assume an argument string "12, 35.5,1" */
ARG hours ',' rate ',' withhold
The pattern is actually removed from the parse string when a match is found. If the parse string is scanned again from the beginning, the length and structure of the string may be different than at the start of the parsing process. The original source of the string, however, is never modified.
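A short sketch of delimiter parsing with its results follows. It uses PARSE VAR on a hypothetical variable named input rather than ARG, so that the source variable can be inspected afterwards.

```rexx
/* Comma-delimited fields */
input = '12, 35.5,1'
PARSE VAR input hours ',' rate ',' withhold

SAY hours          /* 12 */
SAY STRIP(rate)    /* 35.5 */
SAY withhold       /* 1 */

/* The source variable itself is not modified by the parse */
SAY input          /* 12, 35.5,1 */
```

Fields taken between pattern markers are not tokenized, so rate keeps the blank that follows the first comma; STRIP() removes it.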
Parsing by Positional Markers
Parsing with positional markers is used whenever the fields of interest are known to be in certain positions in a string.
/* Records look like:        */
/* Start:  1-5               */
/* Length: 6-10              */
/* Name:   @ (start,length)  */
PARSE VALUE record WITH 1 start +5 length +5 =start name +length
The records being processed contain a variable length field. The starting position and length of the field are given in the first part of the record with a variable positional marker used to extract the desired field.
The "=start" sequence is an absolute marker whose value is the position placed in the variable start earlier in the scan. The "+length" sequence supplies the effective length of the field.
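For comparison, a simpler case uses only fixed positional markers to split a record with fixed-width columns. The record layout below (two 10-character name columns followed by a 3-character id) is an assumption made for this sketch.

```rexx
/* Fixed-width columns: 10 + 10 + 3 characters */

/* LEFT() pads each name with blanks to the column width */
record = LEFT('Smith', 10) || LEFT('John', 10) || '042'

PARSE VAR record 1 last +10 first +10 id +3

SAY STRIP(last)    /* Smith */
SAY STRIP(first)   /* John  */
SAY id             /* 042   */
SAY LENGTH(last)   /* 10    */
```

Values extracted by position keep their padding, which is why STRIP() is applied before display.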
Multiple Templates
More than one template can be specified with an instruction by separating the templates with a comma. The ARG instruction (or PARSE UPPER ARG) accesses the argument strings provided when the program was called. Each template accesses the succeeding argument string. For example:
/*Assume arguments are ('one two',12,sort)*/
ARG first second,amount,action,option
The first template consists of the variables first and second, which are set to the values "ONE" and "TWO", since ARG translates its input to uppercase. In the next two templates, amount gets the value "12" and action is set to "SORT". The last template consists of the variable "option", which is set to the null string, since only three arguments were available.
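The behaviour can be tried within a single program by passing several argument strings to an internal function. In this sketch the function name describe and the variable summary are hypothetical, and the routine is placed before the main code (skipped with SIGNAL) purely to keep the example in one short listing.

```rexx
/* Multiple templates with ARG: one template per argument string */
SIGNAL main                /* skip over the routine defined below */

describe:
   /* ARG translates every argument string to uppercase */
   ARG first second ., amount, action, option
   RETURN first || '/' || second || '/' || amount || '/' || action || '/' || LENGTH(option)

main:
summary = describe('one two', 12, 'sort')
SAY summary                /* ONE/TWO/12/SORT/0 */
```

The final 0 is the length of option, which is the null string because no fourth argument was supplied.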
When multiple templates are used with the EXTERNAL or PULL source options, each additional template requests an additional line of input from the user:
/*Read last, first, and middle names and ssn*/
PULL last ',' first middle,ssn
Two lines of input are read. The first input line is expected to contain three words, the first of which is followed by a comma; they are assigned to the variables "last", "first", and "middle". The entire second input line is assigned to the variable "ssn".
Multiple templates can be useful even with a source option that returns the identical parse string. If the first template included pattern markers that altered the parse string, the subsequent templates could still access the original string. Subsequent parse strings obtained from the VALUE source do not cause the expression to be re-evaluated, but only retrieve the prior result.
Command-line Argument Parsing
AmigaDOS uses spaces as command-line argument separators. To supply arguments containing spaces, such as paths, double quotation marks must be used to keep the Shell from interpreting the parts of the argument as separate arguments. It is advised to follow the same policy when supplying arguments to ARexx programs.
In ARexx you can use the following method to parse the command line:
/* Command-line parser */
PARSE ARG commandline
commandline = STRIP( commandline, 'B' )
count = 0
DO WHILE LENGTH( commandline ) > 0
    count = count + 1
    IF LEFT( commandline, 1 ) = '"' THEN DO
        /* Quoted parameter: take everything up to the closing quote */
        PARSE VAR commandline '"' parameter.count '"' commandline
    END
    ELSE DO
        PARSE VAR commandline parameter.count commandline
    END
    /* Strip again so that trailing blanks do not yield an empty parameter */
    commandline = STRIP( commandline, 'B' )
END
parameter.0 = count
DROP commandline count
The program stores the supplied command-line parameters in the stem variable parameter. The total number of parameters is stored in parameter.0. To list the parameters, just print the contents of the parameter stem:
/* Print parameters */
DO counter = 1 TO parameter.0
    SAY 'Parameter ' || counter || ': »' || parameter.counter || '»'
END
If you need the name of your ARexx script, for example for displaying error messages, you can obtain it with the PARSE SOURCE instruction:
/* Get name of your ARexx script */
PARSE SOURCE info.1 info.2 info.3 info.4 info.5 info.6 info.7
SAY 'Script name: ' || info.3
SAY 'Script name (full path): ' || info.4