2.4.5.3. - Regular Expression Functions
Regular expressions are sequences of characters, some with special meanings, that describe patterns to search for in strings.
spec implements extended regular expressions using the C library regcomp() and regexec() functions, whose implementation is somewhat platform dependent.
See the regular expression man page (man 7 regex on Linux and man re_format on OS X) for details of the regular expression syntax.
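To show what these two C library calls do, the following minimal C sketch compiles a pattern with regcomp() using the REG_EXTENDED flag (POSIX extended syntax) and tests it with regexec(). It illustrates the C library interface only and is not spec's source; the function name ere_matches() and its 1/0/-1 return convention are hypothetical choices for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper for illustration; not part of spec.
 * Returns 1 if the extended regular expression `pattern` matches
 * somewhere in `str`, 0 if it does not, and -1 if `pattern` fails
 * to compile (the library's error message is printed to stderr). */
int ere_matches(const char *pattern, const char *str)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && str != NULL);

    /* REG_EXTENDED selects POSIX extended syntax; REG_NOSUB says only a
     * yes/no answer is needed, not the match offsets. */
    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        char msg[256];
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "bad regex \"%s\": %s\n", pattern, msg);
        return -1;      /* regfree() must not be called after a failed regcomp() */
    }

    rc = regexec(&re, str, 0, NULL, 0);     /* 0 means a match was found */
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```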
The names and usage of the following spec functions resemble those of the UNIX awk (or gawk) utility.
(These functions were added in spec release 6.03.04.)
rsplit(str, arr, regex)
Similar to split() above, but the optional delimiter argument can be a regular expression. The string str is split into elements delimited by the regular expression regex, and the resulting substrings are assigned to successive elements of the array arr, starting with element 0. The delimiting characters are eliminated. Returns the number of elements assigned.

sub(regex, sub, str)
Replaces the first instance of the regular expression regex in the source string str with the substitute string sub. An & in the substitute string is replaced with the text that was matched by the regular expression. A \& (which must be typed as "\\&") produces a literal &. Returns the modified string.

gsub(regex, sub, str)
Replaces all instances of the regular expression regex in the source string str with the substitute string sub. An & in the substitute string is replaced with the text that was matched by the regular expression. A \& (which must be typed as "\\&") produces a literal &. Returns the modified string.

gensub(regex, sub, which, str)
Replaces instances of the regular expression regex in the source string str with the substitute string sub, based on the value of which. If which is a string beginning with G or g (for global), all matching instances are replaced. Otherwise, which is a positive integer indicating which match to replace; for example, 2 means replace the second match.
In addition, the substitute text may contain the sequence \N (which must be typed as "\\N"), where N is a digit from 0 to 9. That sequence is replaced with the text that matches the Nth parenthesized subexpression in regex, and \0 is replaced with the text that matches the entire regular expression. Returns the modified string.

match(str, regex [, arr])
Returns the position in the source string str where the regular expression regex first matches. The first position is 1. Returns 0 if there is no match or -1 if the regular expression is invalid. If the associative array arr is provided, its contents are cleared and new elements are assigned based on the consecutive matching parenthesized subexpressions in regex. The zeroth element, arr[0], is assigned the entire matching text, while arr[0]["start"] is assigned the starting position of the match and arr[0]["length"] is assigned the length of the match. Elements from 1 onward are assigned the matches, positions and lengths of the corresponding matching parenthesized subexpressions in regex.
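The return conventions described for match() map closely onto the regmatch_t offsets that regexec() fills in. The C sketch below, assuming POSIX regcomp()/regexec() as the underlying engine, converts those 0-based offsets into the 1-based positions and lengths described above. The name match_pos(), the parallel start/length arrays standing in for arr, and the limit of ten groups are assumptions of this example, not part of spec.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Arbitrary limit for this sketch: the whole match plus 9 subexpressions. */
#define MAX_GROUPS 10

/* Hypothetical C analogue of match(str, regex [, arr]); not spec's code.
 * Returns the 1-based position of the leftmost match of `pattern` in
 * `str`, 0 if there is no match, or -1 if the pattern does not compile.
 * If `start` and `length` (each at least MAX_GROUPS ints) are non-NULL,
 * element 0 gets the 1-based start and length of the whole match and
 * elements 1.. get those of the parenthesized subexpressions; slots with
 * no corresponding match are set to 0. */
int match_pos(const char *str, const char *pattern, int *start, int *length)
{
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    size_t i, ngroups;
    int rc;

    assert(str != NULL && pattern != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;                              /* invalid regular expression */

    /* pm[0] describes the whole match, pm[k] the k-th subexpression. */
    rc = regexec(&re, str, MAX_GROUPS, pm, 0);
    ngroups = re.re_nsub + 1;                   /* read before regfree() */
    regfree(&re);
    if (rc != 0)
        return 0;                               /* no match */

    if (start != NULL && length != NULL) {
        for (i = 0; i < (size_t)MAX_GROUPS; i++) {
            if (i < ngroups && pm[i].rm_so != -1) {
                start[i] = (int)pm[i].rm_so + 1;        /* convert to 1-based */
                length[i] = (int)(pm[i].rm_eo - pm[i].rm_so);
            } else {
                start[i] = 0;       /* group absent or did not take part */
                length[i] = 0;
            }
        }
    }
    return (int)pm[0].rm_so + 1;
}
```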
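The & and \& handling described for sub() and gsub() can be sketched in C on top of regexec(). This is a hypothetical illustration, not spec's implementation: gsub_ere() and its helpers are invented names, and the sketch finds successive leftmost matches, copies the text between them, and expands the substitute string for each match. Patterns that can match the empty string are handled by a simple rule of the sketch's own, which may differ from what spec does.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Growable output buffer used only by this hypothetical sketch. */
typedef struct {
    char *buf;
    size_t len, cap;
} strbuf;

/* Append n bytes of s to sb, keeping the buffer NUL-terminated. */
static void sb_add(strbuf *sb, const char *s, size_t n)
{
    if (sb->len + n + 1 > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 64;
        char *nb;
        while (sb->len + n + 1 > cap)
            cap *= 2;
        nb = realloc(sb->buf, cap);
        if (nb == NULL)
            abort();                    /* out of memory: give up */
        sb->buf = nb;
        sb->cap = cap;
    }
    memcpy(sb->buf + sb->len, s, n);
    sb->len += n;
    sb->buf[sb->len] = '\0';
}

/* Append the substitute string, expanding '&' to the matched text
 * (m, mlen) and "\&" to a literal '&'. */
static void sb_add_repl(strbuf *sb, const char *repl, const char *m, size_t mlen)
{
    const char *p;
    for (p = repl; *p != '\0'; p++) {
        if (p[0] == '\\' && p[1] == '&') {
            sb_add(sb, "&", 1);
            p++;                        /* skip the escaped '&' */
        } else if (*p == '&') {
            sb_add(sb, m, mlen);
        } else {
            sb_add(sb, p, 1);
        }
    }
}

/* Hypothetical gsub()-like function: replace every match of `pattern` in
 * `str` with `repl`. Returns a malloc'd string, or NULL if the pattern
 * does not compile. */
char *gsub_ere(const char *pattern, const char *repl, const char *str)
{
    regex_t re;
    regmatch_t m;
    strbuf out = { NULL, 0, 0 };
    const char *p = str;
    int eflags = 0;

    assert(pattern != NULL && repl != NULL && str != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    while (regexec(&re, p, 1, &m, eflags) == 0) {
        /* offsets in m are relative to p */
        sb_add(&out, p, (size_t)m.rm_so);               /* text before the match */
        sb_add_repl(&out, repl, p + m.rm_so, (size_t)(m.rm_eo - m.rm_so));
        if (m.rm_eo > m.rm_so) {
            p += m.rm_eo;
        } else if (p[m.rm_eo] != '\0') {
            /* empty match: copy one character so the loop makes progress */
            sb_add(&out, p + m.rm_eo, 1);
            p += m.rm_eo + 1;
        } else {
            p += m.rm_eo;               /* empty match at the end of the string */
            break;
        }
        eflags = REG_NOTBOL;            /* later searches do not start a line */
    }
    sb_add(&out, p, strlen(p));         /* remainder after the last match */
    regfree(&re);
    return out.buf;
}
```

In C source, the literal "\\&" is the two characters \&, which parallels the doubled backslash the description above requires when typing the escape.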
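The splitting behaviour described for rsplit() can likewise be sketched in C. This is an illustration under stated assumptions, not spec's implementation: rsplit_ere() is a hypothetical name, the pieces are returned as malloc'd C strings rather than spec array elements, empty fields between adjacent delimiters are kept, and splitting simply stops at the first empty match.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Copy n bytes of s into a new NUL-terminated string. */
static char *dup_n(const char *s, size_t n)
{
    char *r = malloc(n + 1);
    if (r == NULL)
        abort();                        /* out of memory: give up */
    memcpy(r, s, n);
    r[n] = '\0';
    return r;
}

/* Hypothetical rsplit()-like function; not spec's code.
 * Splits `str` at each match of the extended regex `delim`, storing up to
 * `maxn` malloc'd pieces in arr[0], arr[1], ...; the delimiter text is
 * dropped and empty fields between adjacent delimiters are kept. If maxn
 * is reached, the last element holds the unsplit remainder. Returns the
 * number of pieces stored, or -1 if `delim` does not compile. */
int rsplit_ere(const char *str, const char *delim, char **arr, int maxn)
{
    regex_t re;
    regmatch_t m;
    const char *p = str;
    int n = 0, eflags = 0;

    assert(str != NULL && delim != NULL && arr != NULL);
    if (regcomp(&re, delim, REG_EXTENDED) != 0)
        return -1;

    if (*str != '\0' && maxn > 0) {
        /* Simplification: splitting stops at the first empty match, so a
         * delimiter such as "x*" cannot cause an endless loop. */
        while (n < maxn - 1 && regexec(&re, p, 1, &m, eflags) == 0
               && m.rm_eo > m.rm_so) {
            arr[n++] = dup_n(p, (size_t)m.rm_so);  /* field before the delimiter */
            p += m.rm_eo;                          /* skip past the delimiter */
            eflags = REG_NOTBOL;
        }
        arr[n++] = dup_n(p, strlen(p));            /* final field */
    }
    regfree(&re);
    return n;
}
```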