|
|
|
@@ -22,13 +22,13 @@ matching character streams; if a byte regexp is used with a character
|
|
|
|
|
string, it matches bytes in the UTF-8 encoding of the string.
|
|
|
|
|
|
|
|
|
|
Regular expressions can be compiled into a @deftech{regexp value} for
|
|
|
|
|
repeated matches. The @scheme[regexp] and @scheme[byte-regexp]
|
|
|
|
|
repeated matches. The @racket[regexp] and @racket[byte-regexp]
|
|
|
|
|
procedures convert a string or byte string (respectively) into a
|
|
|
|
|
regexp value using one syntax of regular expressions that is most
|
|
|
|
|
compatible with @exec{egrep}. The @scheme[pregexp] and
|
|
|
|
|
@scheme[byte-pregexp] procedures produce a regexp value using a
|
|
|
|
|
compatible with @exec{egrep}. The @racket[pregexp] and
|
|
|
|
|
@racket[byte-pregexp] procedures produce a regexp value using a
|
|
|
|
|
slightly different syntax of regular expressions that is more
|
|
|
|
|
compatible with Perl. In addition, Scheme constants written with
|
|
|
|
|
compatible with Perl. In addition, Racket constants written with
|
|
|
|
|
@litchar{#rx} or @litchar{#px} (see @secref["reader"]) produce
|
|
|
|
|
compiled regexp values.
|
|
|
|
|
|
|
|
|
@@ -43,22 +43,22 @@ The following syntax specifications describe the content of a string
|
|
|
|
|
that represents a regular expression. The syntax of the corresponding
|
|
|
|
|
string may involve extra escape characters. For example, the regular
|
|
|
|
|
expression @litchar{(.*)\1} can be represented with the string
|
|
|
|
|
@scheme["(.*)\\1"] or the regexp constant @scheme[#rx"(.*)\\1"]; the
|
|
|
|
|
@racket["(.*)\\1"] or the regexp constant @racket[#rx"(.*)\\1"]; the
|
|
|
|
|
@litchar{\} in the regular expression must be escaped to include it
|
|
|
|
|
in a string or regexp constant.
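
As a small illustration of the escaping rule, the sketch below builds the regular expression @litchar{(.*)\1} from a string and from a regexp constant. The names @racket[from-string], @racket[from-constant], @racket[source-length], and @racket[doubled] are made up for this example.

```racket
#lang racket/base
;; The regular expression (.*)\1, written as a string and as a regexp
;; constant. In both forms the backslash is doubled in the source text.
(define from-string (regexp "(.*)\\1"))
(define from-constant #rx"(.*)\\1")

;; The source string holds a single backslash: ( . * ) \ 1 is six characters.
(define source-length (string-length (object-name from-string)))

;; The backreference \1 must repeat whatever group 1 matched,
;; so "abab" matches as "ab" followed by "ab".
(define doubled (regexp-match from-constant "abab"))
```

Here @racket[doubled] is the list @racket['("abab" "ab")]: the whole match followed by the text captured by the parenthesized group.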
|
|
|
|
|
|
|
|
|
|
The @scheme[regexp] and @scheme[pregexp] syntaxes share a common core:
|
|
|
|
|
The @racket[regexp] and @racket[pregexp] syntaxes share a common core:
|
|
|
|
|
|
|
|
|
|
@common-table
|
|
|
|
|
|
|
|
|
|
The following completes the grammar for @scheme[regexp], which treats
|
|
|
|
|
The following completes the grammar for @racket[regexp], which treats
|
|
|
|
|
@litchar["{"] and @litchar["}"] as literals, @litchar{\} as a
|
|
|
|
|
literal within ranges, and @litchar{\} as a literal producer
|
|
|
|
|
outside of ranges.
|
|
|
|
|
|
|
|
|
|
@rx-table
|
|
|
|
|
|
|
|
|
|
The following completes the grammar for @scheme[pregexp], which uses
|
|
|
|
|
The following completes the grammar for @racket[pregexp], which uses
|
|
|
|
|
@litchar["{"] and @litchar["}"] for bounded repetition and uses
|
|
|
|
|
@litchar{\} for meta-characters both inside and outside of ranges.
|
|
|
|
|
|
|
|
|
@@ -101,26 +101,26 @@ arbitrarily large sequence).
|
|
|
|
|
|
|
|
|
|
@defproc[(regexp? [v any/c]) boolean?]{
|
|
|
|
|
|
|
|
|
|
Returns @scheme[#t] if @scheme[v] is a @tech{regexp value} created by
|
|
|
|
|
@scheme[regexp] or @scheme[pregexp], @scheme[#f] otherwise.}
|
|
|
|
|
Returns @racket[#t] if @racket[v] is a @tech{regexp value} created by
|
|
|
|
|
@racket[regexp] or @racket[pregexp], @racket[#f] otherwise.}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@defproc[(pregexp? [v any/c]) boolean?]{
|
|
|
|
|
|
|
|
|
|
Returns @scheme[#t] if @scheme[v] is a @tech{regexp value} created by
|
|
|
|
|
@scheme[pregexp] (not @scheme[regexp]), @scheme[#f] otherwise.}
|
|
|
|
|
Returns @racket[#t] if @racket[v] is a @tech{regexp value} created by
|
|
|
|
|
@racket[pregexp] (not @racket[regexp]), @racket[#f] otherwise.}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@defproc[(byte-regexp? [v any/c]) boolean?]{
|
|
|
|
|
|
|
|
|
|
Returns @scheme[#t] if @scheme[v] is a @tech{regexp value} created by
|
|
|
|
|
@scheme[byte-regexp] or @scheme[byte-pregexp], @scheme[#f] otherwise.}
|
|
|
|
|
Returns @racket[#t] if @racket[v] is a @tech{regexp value} created by
|
|
|
|
|
@racket[byte-regexp] or @racket[byte-pregexp], @racket[#f] otherwise.}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@defproc[(byte-pregexp? [v any/c]) boolean?]{
|
|
|
|
|
|
|
|
|
|
Returns @scheme[#t] if @scheme[v] is a @tech{regexp value} created by
|
|
|
|
|
@scheme[byte-pregexp] (not @scheme[byte-regexp]), @scheme[#f]
|
|
|
|
|
Returns @racket[#t] if @racket[v] is a @tech{regexp value} created by
|
|
|
|
|
@racket[byte-pregexp] (not @racket[byte-regexp]), @racket[#f]
|
|
|
|
|
otherwise.}
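
The four predicates can be compared side by side. This is a minimal sketch; the name @racket[results] is hypothetical, and the expected answers in the comments follow from the descriptions above.

```racket
#lang racket/base
;; Each entry applies one predicate to a regexp constant.
(define results
  (list (regexp? #rx"a+")        ; #t: character regexp
        (regexp? #px"a+")        ; #t: a pregexp is also a regexp
        (pregexp? #rx"a+")       ; #f: created with regexp syntax
        (pregexp? #px"a+")       ; #t
        (regexp? #rx#"a+")       ; #f: byte regexps are a separate kind
        (byte-regexp? #rx#"a+")  ; #t
        (byte-regexp? #px#"a+")  ; #t: a byte pregexp is also a byte regexp
        (byte-pregexp? #rx#"a+") ; #f
        (regexp? "a+")))         ; #f: a plain string is not compiled
```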
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@@ -134,7 +134,7 @@ is used multiple times, it is faster to compile the string once to a
|
|
|
|
|
@tech{regexp value} and use it for repeated matches instead of using the
|
|
|
|
|
string each time.
|
|
|
|
|
|
|
|
|
|
The @scheme[object-name] procedure returns
|
|
|
|
|
The @racket[object-name] procedure returns
|
|
|
|
|
the source string for a @tech{regexp value}.
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
@@ -144,10 +144,10 @@ the source string for a @tech{regexp value}.
|
|
|
|
|
|
|
|
|
|
@defproc[(pregexp [string string?]) pregexp?]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp], except that it uses a slightly different syntax
|
|
|
|
|
Like @racket[regexp], except that it uses a slightly different syntax
|
|
|
|
|
(see @secref["regexp-syntax"]). The result can be used with
|
|
|
|
|
@scheme[regexp-match], etc., just like the result from
|
|
|
|
|
@scheme[regexp].
|
|
|
|
|
@racket[regexp-match], etc., just like the result from
|
|
|
|
|
@racket[regexp].
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(pregexp "ap*le")
|
|
|
|
@@ -160,7 +160,7 @@ Takes a byte-string representation of a regular expression (using the
|
|
|
|
|
syntax in @secref["regexp-syntax"]) and compiles it into a
|
|
|
|
|
byte-@tech{regexp value}.
|
|
|
|
|
|
|
|
|
|
The @scheme[object-name] procedure
|
|
|
|
|
The @racket[object-name] procedure
|
|
|
|
|
returns the source byte string for a @tech{regexp value}.
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
@@ -171,10 +171,10 @@ returns the source byte string for a @tech{regexp value}.
|
|
|
|
|
|
|
|
|
|
@defproc[(byte-pregexp [bstr bytes?]) byte-pregexp?]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[byte-regexp], except that it uses a slightly different
|
|
|
|
|
Like @racket[byte-regexp], except that it uses a slightly different
|
|
|
|
|
syntax (see @secref["regexp-syntax"]). The result can be used with
|
|
|
|
|
@scheme[regexp-match], etc., just like the result from
|
|
|
|
|
@scheme[byte-regexp].
|
|
|
|
|
@racket[regexp-match], etc., just like the result from
|
|
|
|
|
@racket[byte-regexp].
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(byte-pregexp #"ap*le")
|
|
|
|
@@ -183,11 +183,11 @@ syntax (see @secref["regexp-syntax"]). The result can be used with
|
|
|
|
|
@defproc*[([(regexp-quote [str string?] [case-sensitive? any/c #t]) string?]
|
|
|
|
|
[(regexp-quote [bstr bytes?] [case-sensitive? any/c #t]) bytes?])]{
|
|
|
|
|
|
|
|
|
|
Produces a string or byte string suitable for use with @scheme[regexp]
|
|
|
|
|
to match the literal sequence of characters in @scheme[str] or
|
|
|
|
|
sequence of bytes in @scheme[bstr]. If @scheme[case-sensitive?] is
|
|
|
|
|
true, the resulting regexp matches letters in @scheme[str] or
|
|
|
|
|
@scheme[bstr] case-sensitively, otherwise it matches
Produces a string or byte string suitable for use with @racket[regexp]
to match the literal sequence of characters in @racket[str] or
sequence of bytes in @racket[bstr]. If @racket[case-sensitive?] is
true, the resulting regexp matches letters in @racket[str] or
@racket[bstr] case-sensitively, otherwise it matches
case-insensitively.
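
The following sketch shows why quoting matters when the text to match contains regexp operators, and what passing @racket[#f] as the second argument does. All defined names are hypothetical.

```racket
#lang racket/base
;; Quoting turns "1+1" into a pattern that matches those three characters.
(define literal (regexp (regexp-quote "1+1")))
(define quoted-hit (regexp-match? literal "1+1=2"))

;; Unquoted, + means "one or more", so the pattern needs at least "11".
(define unquoted-hit (regexp-match? #rx"1+1" "1+1=2"))

;; With #f for case-sensitive?, letters match in either case,
;; while the quoted "." still matches only a literal period.
(define any-case (regexp (regexp-quote "a.b" #f)))
(define any-case-hit (regexp-match? any-case "A.B"))
(define any-case-miss (regexp-match? any-case "AxB"))
```

In this sketch @racket[quoted-hit] and @racket[any-case-hit] are @racket[#t], while @racket[unquoted-hit] and @racket[any-case-miss] are @racket[#f].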
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
@@ -198,7 +198,7 @@ case-sensitively.
|
|
|
|
|
@defproc[(regexp-max-lookbehind [pattern (or/c regexp? byte-regexp?)])
|
|
|
|
|
exact-nonnegative-integer?]{
|
|
|
|
|
|
|
|
|
|
Returns the maximum number of bytes that @scheme[pattern] may consult
|
|
|
|
|
Returns the maximum number of bytes that @racket[pattern] may consult
|
|
|
|
|
before the starting position of a match to determine the match. For
|
|
|
|
|
example, the pattern @litchar{(?<=abc)d} consults three bytes
|
|
|
|
|
preceding a matching @litchar{d}, while @litchar{e(?<=a..)d} consults
|
|
|
|
@@ -220,93 +220,93 @@ the start of the input or of a line.}
|
|
|
|
|
(or/c #f (cons/c string? (listof (or/c string? #f))))
|
|
|
|
|
(or/c #f (cons/c bytes? (listof (or/c bytes? #f)))))]{
|
|
|
|
|
|
|
|
|
|
Attempts to match @scheme[pattern] (a string, byte string, @tech{regexp
|
|
|
|
|
value}, or byte-@tech{regexp value}) once to a portion of @scheme[input]. The
|
|
|
|
|
matcher finds a portion of @scheme[input] that matches and is closest
|
|
|
|
|
to the start of the input (after @scheme[start-pos]).
|
|
|
|
|
Attempts to match @racket[pattern] (a string, byte string, @tech{regexp
|
|
|
|
|
value}, or byte-@tech{regexp value}) once to a portion of @racket[input]. The
|
|
|
|
|
matcher finds a portion of @racket[input] that matches and is closest
|
|
|
|
|
to the start of the input (after @racket[start-pos]).
|
|
|
|
|
|
|
|
|
|
The optional @scheme[start-pos] and @scheme[end-pos] arguments select
|
|
|
|
|
a portion of @scheme[input] for matching; the default is the entire
|
|
|
|
|
string or the stream up to an end-of-file. When @scheme[input] is a
|
|
|
|
|
string, @scheme[start-pos] is a character position; when
|
|
|
|
|
@scheme[input] is a byte string, then @scheme[start-pos] is a byte
|
|
|
|
|
position; and when @scheme[input] is an input port, @scheme[start-pos]
|
|
|
|
|
The optional @racket[start-pos] and @racket[end-pos] arguments select
|
|
|
|
|
a portion of @racket[input] for matching; the default is the entire
|
|
|
|
|
string or the stream up to an end-of-file. When @racket[input] is a
|
|
|
|
|
string, @racket[start-pos] is a character position; when
|
|
|
|
|
@racket[input] is a byte string, then @racket[start-pos] is a byte
|
|
|
|
|
position; and when @racket[input] is an input port, @racket[start-pos]
|
|
|
|
|
is the number of bytes to skip before starting to match. The
|
|
|
|
|
@scheme[end-pos] argument can be @scheme[#f], which corresponds to the
|
|
|
|
|
@racket[end-pos] argument can be @racket[#f], which corresponds to the
|
|
|
|
|
end of the string or the end-of-file in the stream; otherwise, it is a
|
|
|
|
|
character or byte position, like @scheme[start-pos]. If @scheme[input]
|
|
|
|
|
character or byte position, like @racket[start-pos]. If @racket[input]
|
|
|
|
|
is an input port, and if the end-of-file is reached before
|
|
|
|
|
@scheme[start-pos] bytes are skipped, then the match fails.
|
|
|
|
|
@racket[start-pos] bytes are skipped, then the match fails.
|
|
|
|
|
|
|
|
|
|
In @scheme[pattern], a start-of-string @litchar{^} refers to the first
|
|
|
|
|
position of @scheme[input] after @scheme[start-pos], assuming that
|
|
|
|
|
@scheme[input-prefix] is @scheme[#""]. The end-of-input @litchar{$}
|
|
|
|
|
refers to the @scheme[end-pos]th position or (in the case of an input
|
|
|
|
|
In @racket[pattern], a start-of-string @litchar{^} refers to the first
|
|
|
|
|
position of @racket[input] after @racket[start-pos], assuming that
|
|
|
|
|
@racket[input-prefix] is @racket[#""]. The end-of-input @litchar{$}
|
|
|
|
|
refers to the @racket[end-pos]th position or (in the case of an input
|
|
|
|
|
port) the end of file, whichever comes first, assuming that
|
|
|
|
|
@scheme[output-prefix] is @scheme[#f].
|
|
|
|
|
@racket[output-prefix] is @racket[#f].
|
|
|
|
|
|
|
|
|
|
The @scheme[input-prefix] specifies bytes that effectively precede
|
|
|
|
|
@scheme[input] for the purposes of @litchar{^} and other look-behind
|
|
|
|
|
matching. For example, a @scheme[#""] prefix means that @litchar{^}
|
|
|
|
|
matches at the beginning of the stream, while a @scheme[#"\n"]
|
|
|
|
|
@scheme[input-prefix] means that a start-of-line @litchar{^} can match
|
|
|
|
|
The @racket[input-prefix] specifies bytes that effectively precede
|
|
|
|
|
@racket[input] for the purposes of @litchar{^} and other look-behind
|
|
|
|
|
matching. For example, a @racket[#""] prefix means that @litchar{^}
|
|
|
|
|
matches at the beginning of the stream, while a @racket[#"\n"]
|
|
|
|
|
@racket[input-prefix] means that a start-of-line @litchar{^} can match
|
|
|
|
|
the beginning of the input, while a start-of-file @litchar{^} cannot.
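
The effect of the prefix can be sketched as follows, assuming the optional arguments are passed positionally in the order @racket[start-pos], @racket[end-pos], @racket[output-port], @racket[input-prefix]. The defined names are hypothetical, and the expected results in the comments are what the description above implies.

```racket
#lang racket/base
;; Empty prefix: ^ matches at the start of the input.
(define no-prefix
  (regexp-match #rx"^a" "abc" 0 #f #f #""))          ; '("a")

;; A newline prefix: the input no longer begins the stream,
;; so a start-of-string ^ cannot match there.
(define string-start
  (regexp-match #rx"^a" "abc" 0 #f #f #"\n"))        ; #f

;; In multi-line mode (?m:...), ^ also matches after a newline,
;; and the prefix supplies that newline.
(define line-start
  (regexp-match #rx"(?m:^a)" "abc" 0 #f #f #"\n"))   ; '("a")

;; Lookbehind patterns can consult the prefix as well.
(define lookbehind
  (regexp-match #rx"(?<=x)a" "abc" 0 #f #f #"x"))    ; '("a")
```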
|
|
|
|
|
|
|
|
|
|
If the match fails, @scheme[#f] is returned. If the match succeeds, a
|
|
|
|
|
list containing strings or byte strings, and possibly @scheme[#f], is
|
|
|
|
|
returned. The list contains strings only if @scheme[input] is a string
|
|
|
|
|
and @scheme[pattern] is not a byte regexp. Otherwise, the list
|
|
|
|
|
If the match fails, @racket[#f] is returned. If the match succeeds, a
|
|
|
|
|
list containing strings or byte strings, and possibly @racket[#f], is
|
|
|
|
|
returned. The list contains strings only if @racket[input] is a string
|
|
|
|
|
and @racket[pattern] is not a byte regexp. Otherwise, the list
|
|
|
|
|
contains byte strings (substrings of the UTF-8 encoding of
|
|
|
|
|
@scheme[input], if @scheme[input] is a string).
|
|
|
|
|
@racket[input], if @racket[input] is a string).
|
|
|
|
|
|
|
|
|
|
The first [byte] string in a result list is the portion of
|
|
|
|
|
@scheme[input] that matched @scheme[pattern]. If two portions of
|
|
|
|
|
@scheme[input] can match @scheme[pattern], then the match that starts
|
|
|
|
|
@racket[input] that matched @racket[pattern]. If two portions of
|
|
|
|
|
@racket[input] can match @racket[pattern], then the match that starts
|
|
|
|
|
earliest is found.
|
|
|
|
|
|
|
|
|
|
Additional [byte] strings are returned in the list if @scheme[pattern]
|
|
|
|
|
Additional [byte] strings are returned in the list if @racket[pattern]
|
|
|
|
|
contains parenthesized sub-expressions (but not when the open
|
|
|
|
|
parenthesis is followed by @litchar{?:}). Matches for the
|
|
|
|
|
sub-expressions are provided in the order of the opening parentheses
|
|
|
|
|
in @scheme[pattern]. When sub-expressions occur in branches of an
|
|
|
|
|
in @racket[pattern]. When sub-expressions occur in branches of an
|
|
|
|
|
@litchar{|} ``or'' pattern, in a @litchar{*} ``zero or more''
|
|
|
|
|
pattern, or other places where the overall pattern can succeed without
|
|
|
|
|
a match for the sub-expression, then a @scheme[#f] is returned for the
|
|
|
|
|
a match for the sub-expression, then a @racket[#f] is returned for the
|
|
|
|
|
sub-expression if it did not contribute to the final match. When a
|
|
|
|
|
single sub-expression occurs within a @litchar{*} ``zero or more''
|
|
|
|
|
pattern or other multiple-match positions, then the rightmost match
|
|
|
|
|
associated with the sub-expression is returned in the list.
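
The rules for sub-expression results can be seen in a few small matches. This is an illustrative sketch; the defined names are hypothetical.

```racket
#lang racket/base
;; One result per opening parenthesis, in order; the branch that
;; did not take part in the match is reported as #f.
(define alternation (regexp-match #rx"(a)|(b)" "b"))     ; '("b" #f "b")

;; A (?:...) group does not add a result.
(define non-capturing (regexp-match #rx"(?:a)(b)" "ab")) ; '("ab" "b")

;; A group under * that matched zero times yields #f.
(define zero-times (regexp-match #rx"x(y)*" "x"))        ; '("x" #f)

;; A group under * that matched several times reports its
;; rightmost match.
(define rightmost (regexp-match #rx"(a|b)*" "ab"))       ; '("ab" "b")
```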
|
|
|
|
|
|
|
|
|
|
If the optional @scheme[output-port] is provided as an output port,
|
|
|
|
|
the part of @scheme[input] from its beginning (not @scheme[start-pos])
|
|
|
|
|
that precedes the match is written to the port. All of @scheme[input]
|
|
|
|
|
up to @scheme[end-pos] is written to the port if no match is
|
|
|
|
|
found. This functionality is most useful when @scheme[input] is an
|
|
|
|
|
If the optional @racket[output-port] is provided as an output port,
|
|
|
|
|
the part of @racket[input] from its beginning (not @racket[start-pos])
|
|
|
|
|
that precedes the match is written to the port. All of @racket[input]
|
|
|
|
|
up to @racket[end-pos] is written to the port if no match is
|
|
|
|
|
found. This functionality is most useful when @racket[input] is an
|
|
|
|
|
input port.
|
|
|
|
|
|
|
|
|
|
When matching an input port, a match failure reads up to
|
|
|
|
|
@scheme[end-pos] bytes (or end-of-file), even if @scheme[pattern]
|
|
|
|
|
@racket[end-pos] bytes (or end-of-file), even if @racket[pattern]
|
|
|
|
|
begins with a start-of-string @litchar{^}; see also
|
|
|
|
|
@scheme[regexp-try-match]. On success, all bytes up to and including
|
|
|
|
|
@racket[regexp-try-match]. On success, all bytes up to and including
|
|
|
|
|
the match are eventually read from the port, but matching proceeds by
|
|
|
|
|
first peeking bytes from the port (using @scheme[peek-bytes-avail!]),
|
|
|
|
|
first peeking bytes from the port (using @racket[peek-bytes-avail!]),
|
|
|
|
|
and then (re-)reading matching bytes to discard them after the match
|
|
|
|
|
result is determined. Non-matching bytes may be read and discarded
|
|
|
|
|
before the match is determined. The matcher peeks in blocking mode
|
|
|
|
|
only as far as necessary to determine a match, but it may peek extra
|
|
|
|
|
bytes to fill an internal buffer if immediately available (i.e.,
|
|
|
|
|
without blocking). Greedy repeat operators in @scheme[pattern], such
|
|
|
|
|
without blocking). Greedy repeat operators in @racket[pattern], such
|
|
|
|
|
as @litchar{*} or @litchar{+}, tend to force reading the entire
|
|
|
|
|
content of the port (up to @scheme[end-pos]) to determine a match.
|
|
|
|
|
content of the port (up to @racket[end-pos]) to determine a match.
|
|
|
|
|
|
|
|
|
|
If the input port is read simultaneously by another thread, or if the
|
|
|
|
|
port is a custom port with inconsistent reading and peeking procedures
|
|
|
|
|
(see @secref["customport"]), then the bytes that are peeked and
|
|
|
|
|
used for matching may differ from the bytes read and discarded
|
|
|
|
|
after the match completes; the matcher inspects only the peeked
|
|
|
|
|
bytes. To avoid such interleaving, use @scheme[regexp-match-peek]
|
|
|
|
|
(with a @scheme[progress-evt] argument) followed by
|
|
|
|
|
@scheme[port-commit-peeked].
|
|
|
|
|
bytes. To avoid such interleaving, use @racket[regexp-match-peek]
|
|
|
|
|
(with a @racket[progress-evt] argument) followed by
|
|
|
|
|
@racket[port-commit-peeked].
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(regexp-match #rx"x." "12x4x6")
|
|
|
|
@@ -329,27 +329,27 @@ bytes. To avoid such interleaving, use @scheme[regexp-match-peek]
|
|
|
|
|
(listof string?)
|
|
|
|
|
(listof bytes?))]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match], but the result is a list of strings or
|
|
|
|
|
Like @racket[regexp-match], but the result is a list of strings or
|
|
|
|
|
byte strings corresponding to a sequence of matches of
|
|
|
|
|
@scheme[pattern] in @scheme[input]. (Unlike @scheme[regexp-match],
|
|
|
|
|
results for parenthesized sub-patterns in @scheme[pattern] are not
|
|
|
|
|
@racket[pattern] in @racket[input]. (Unlike @racket[regexp-match],
|
|
|
|
|
results for parenthesized sub-patterns in @racket[pattern] are not
|
|
|
|
|
returned.)
|
|
|
|
|
|
|
|
|
|
The @scheme[pattern] is used in order to find matches, where each
|
|
|
|
|
The @racket[pattern] is used in order to find matches, where each
|
|
|
|
|
match attempt starts at the end of the last match, and @litchar{^} is
|
|
|
|
|
allowed to match the beginning of the input (if @scheme[input-prefix]
|
|
|
|
|
is @scheme[#""]) only for the first match. Empty matches are handled
|
|
|
|
|
allowed to match the beginning of the input (if @racket[input-prefix]
|
|
|
|
|
is @racket[#""]) only for the first match. Empty matches are handled
|
|
|
|
|
like other matches, returning a zero-length string or byte sequence
|
|
|
|
|
(they are more useful in the complementing @scheme[regexp-split]
|
|
|
|
|
function), but @scheme[pattern] is restricted from matching an empty
|
|
|
|
|
(they are more useful in the complementing @racket[regexp-split]
|
|
|
|
|
function), but @racket[pattern] is restricted from matching an empty
|
|
|
|
|
sequence immediately after an empty match.
|
|
|
|
|
|
|
|
|
|
If @scheme[input] contains no matches (in the range @scheme[start-pos]
|
|
|
|
|
to @scheme[end-pos]), @scheme[null] is returned. Otherwise, each item
|
|
|
|
|
If @racket[input] contains no matches (in the range @racket[start-pos]
|
|
|
|
|
to @racket[end-pos]), @racket[null] is returned. Otherwise, each item
|
|
|
|
|
in the resulting list is a distinct substring or byte sequence from
|
|
|
|
|
@scheme[input] that matches @scheme[pattern]. The @scheme[end-pos]
|
|
|
|
|
argument can be @scheme[#f] to match to the end of @scheme[input]
|
|
|
|
|
(which corresponds to an end-of-file if @scheme[input] is an input
|
|
|
|
|
@racket[input] that matches @racket[pattern]. The @racket[end-pos]
|
|
|
|
|
argument can be @racket[#f] to match to the end of @racket[input]
|
|
|
|
|
(which corresponds to an end-of-file if @racket[input] is an input
|
|
|
|
|
port).
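
A short sketch of the behavior described above, using hypothetical names for the results:

```racket
#lang racket/base
;; Every match is returned, each attempt starting where the last ended.
(define all-hits (regexp-match* #rx"x." "12x4x6"))     ; '("x4" "x6")

;; Parenthesized sub-patterns do not add entries to the result.
(define with-group (regexp-match* #rx"x(.)" "12x4x6")) ; '("x4" "x6")

;; No matches at all gives the empty list, not #f.
(define none (regexp-match* #rx"z" "12x4x6"))          ; '()

;; start-pos narrows the searched portion of the input.
(define from-3 (regexp-match* #rx"x." "12x4x6" 3))     ; '("x6")
```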
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
@@ -369,12 +369,12 @@ port).
|
|
|
|
|
(or/c #f (cons/c string? (listof (or/c string? #f))))
|
|
|
|
|
(or/c #f (cons/c bytes? (listof (or/c bytes? #f)))))]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match] on input ports, except that if the match
|
|
|
|
|
fails, no characters are read and discarded from @scheme[in].
|
|
|
|
|
Like @racket[regexp-match] on input ports, except that if the match
|
|
|
|
|
fails, no characters are read and discarded from @racket[in].
|
|
|
|
|
|
|
|
|
|
This procedure is especially useful with a @scheme[pattern] that
|
|
|
|
|
begins with a start-of-string @litchar{^} or with a non-@scheme[#f]
|
|
|
|
|
@scheme[end-pos], since each limits the amount of peeking into the
|
|
|
|
|
This procedure is especially useful with a @racket[pattern] that
|
|
|
|
|
begins with a start-of-string @litchar{^} or with a non-@racket[#f]
|
|
|
|
|
@racket[end-pos], since each limits the amount of peeking into the
|
|
|
|
|
port. Otherwise, beware that a large portion of the stream may be
|
|
|
|
|
peeked (and therefore pulled into memory) before the match succeeds or
|
|
|
|
|
fails.}
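
The difference from @racket[regexp-match] on a failed match can be sketched with two string ports. The port and result names are hypothetical; the comments give the results that follow from the descriptions of the two procedures.

```racket
#lang racket/base
;; A failed regexp-try-match leaves the port untouched.
(define in1 (open-input-string "hello world"))
(define failed-try (regexp-try-match #rx"^xyz" in1))   ; #f
(define after-try (read-char in1))                     ; #\h

;; A failed regexp-match consumes the port up to end-of-file,
;; even though the pattern starts with ^.
(define in2 (open-input-string "hello world"))
(define failed-match (regexp-match #rx"^xyz" in2))     ; #f
(define after-match (read-char in2))                   ; end-of-file

;; On success, regexp-try-match consumes the matched bytes as usual.
(define in3 (open-input-string "hello world"))
(define ok-try (regexp-try-match #rx"^hel" in3))       ; '(#"hel")
(define after-ok (read-char in3))                      ; #\l
```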
|
|
|
|
@@ -393,19 +393,19 @@ fails.}
|
|
|
|
|
#f)))
|
|
|
|
|
#f)]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match], but returns a list of number pairs (and
|
|
|
|
|
@scheme[#f]) instead of a list of strings. Each pair of numbers refers
|
|
|
|
|
to a range of characters or bytes in @scheme[input]. If the result for
|
|
|
|
|
the same arguments with @scheme[regexp-match] would be a list of byte
|
|
|
|
|
Like @racket[regexp-match], but returns a list of number pairs (and
|
|
|
|
|
@racket[#f]) instead of a list of strings. Each pair of numbers refers
|
|
|
|
|
to a range of characters or bytes in @racket[input]. If the result for
|
|
|
|
|
the same arguments with @racket[regexp-match] would be a list of byte
|
|
|
|
|
strings, the resulting ranges correspond to byte ranges; in that case,
|
|
|
|
|
if @scheme[input] is a character string, the byte ranges correspond to
|
|
|
|
|
if @racket[input] is a character string, the byte ranges correspond to
|
|
|
|
|
bytes in the UTF-8 encoding of the string.
|
|
|
|
|
|
|
|
|
|
Range results are returned in a @scheme[substring]- and
|
|
|
|
|
@scheme[subbytes]-compatible manner, independent of
|
|
|
|
|
@scheme[start-pos]. In the case of an input port, the returned
|
|
|
|
|
Range results are returned in a @racket[substring]- and
|
|
|
|
|
@racket[subbytes]-compatible manner, independent of
|
|
|
|
|
@racket[start-pos]. In the case of an input port, the returned
|
|
|
|
|
positions indicate the number of bytes that were read, including
|
|
|
|
|
@scheme[start-pos], before the first matching byte.
|
|
|
|
|
@racket[start-pos], before the first matching byte.
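
The sketch below illustrates both points: positions are relative to the whole input regardless of the starting position, and a byte regexp applied to a string reports byte offsets into the UTF-8 encoding. The defined names are hypothetical.

```racket
#lang racket/base
(define s "12x4x6")

;; The first match, as a (start . end) pair.
(define first-hit (regexp-match-positions #rx"x." s))    ; '((2 . 4))

;; Starting at 3 still reports positions relative to all of s,
;; so the pair can be handed directly to substring.
(define later-hit (regexp-match-positions #rx"x." s 3))  ; '((4 . 6))
(define later-text
  (substring s (caar later-hit) (cdar later-hit)))       ; "x6"

;; U+03BB occupies one character but two bytes in UTF-8.
(define char-hit (regexp-match-positions #rx"b" "\u03BBb"))  ; '((1 . 2))
(define byte-hit (regexp-match-positions #rx#"b" "\u03BBb")) ; '((2 . 3))
```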
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(regexp-match-positions #rx"x." "12x4x6")
|
|
|
|
@@ -421,8 +421,8 @@ positions indicate the number of bytes that were read, including
|
|
|
|
|
(listof (cons/c exact-nonnegative-integer?
|
|
|
|
|
exact-nonnegative-integer?))]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match-positions], but returns multiple matches
|
|
|
|
|
like @scheme[regexp-match*].
|
|
|
|
|
Like @racket[regexp-match-positions], but returns multiple matches
|
|
|
|
|
like @racket[regexp-match*].
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(regexp-match-positions* #rx"x." "12x4x6")
|
|
|
|
@@ -437,8 +437,8 @@ like @scheme[regexp-match*].
|
|
|
|
|
[input-prefix bytes? #""])
|
|
|
|
|
boolean?]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match], but returns merely @scheme[#t] when the
|
|
|
|
|
match succeeds, @scheme[#f] otherwise.
|
|
|
|
|
Like @racket[regexp-match], but returns merely @racket[#t] when the
|
|
|
|
|
match succeeds, @racket[#f] otherwise.
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(regexp-match? #rx"x." "12x4x6")
|
|
|
|
@@ -450,8 +450,8 @@ match succeeds, @scheme[#f] otherwise.
|
|
|
|
|
[input (or/c string? bytes? input-port?)])
|
|
|
|
|
boolean?]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match?], but @scheme[#t] is only returned when the
|
|
|
|
|
entire content of @scheme[input] matches @scheme[pattern].
|
|
|
|
|
Like @racket[regexp-match?], but @racket[#t] is only returned when the
|
|
|
|
|
entire content of @racket[input] matches @racket[pattern].
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(regexp-match-exact? #rx"x." "12x4x6")
|
|
|
|
@@ -468,15 +468,15 @@ entire content of @scheme[input] matches @scheme[pattern].
|
|
|
|
|
(or/c (cons/c bytes? (listof (or/c bytes? #f)))
|
|
|
|
|
#f)]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match] on input ports, but only peeks bytes from
|
|
|
|
|
@scheme[input] instead of reading them. Furthermore, instead of
|
|
|
|
|
Like @racket[regexp-match] on input ports, but only peeks bytes from
|
|
|
|
|
@racket[input] instead of reading them. Furthermore, instead of
|
|
|
|
|
an output port, the last optional argument is a progress event for
|
|
|
|
|
@scheme[input] (see @scheme[port-progress-evt]). If @scheme[progress]
|
|
|
|
|
becomes ready, then the match stops peeking from @scheme[input]
|
|
|
|
|
and returns @scheme[#f]. The @scheme[progress] argument can be
|
|
|
|
|
@scheme[#f], in which case the peek may continue with inconsistent
|
|
|
|
|
@racket[input] (see @racket[port-progress-evt]). If @racket[progress]
|
|
|
|
|
becomes ready, then the match stops peeking from @racket[input]
|
|
|
|
|
and returns @racket[#f]. The @racket[progress] argument can be
|
|
|
|
|
@racket[#f], in which case the peek may continue with inconsistent
|
|
|
|
|
information if another process meanwhile reads from
|
|
|
|
|
@scheme[input].
|
|
|
|
|
@racket[input].
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(define p (open-input-string "a abcd"))
|
|
|
|
@@ -502,9 +502,9 @@ information if another process meanwhile reads from
|
|
|
|
|
#f)))
|
|
|
|
|
#f)]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match-positions] on input ports, but only peeks
|
|
|
|
|
bytes from @scheme[input] instead of reading them, and with a
|
|
|
|
|
@scheme[progress] argument like @scheme[regexp-match-peek].}
|
|
|
|
|
Like @racket[regexp-match-positions] on input ports, but only peeks
|
|
|
|
|
bytes from @racket[input] instead of reading them, and with a
|
|
|
|
|
@racket[progress] argument like @racket[regexp-match-peek].}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@defproc[(regexp-match-peek-immediate [pattern (or/c string? bytes? regexp? byte-regexp?)]
|
|
|
|
@@ -516,10 +516,10 @@ bytes from @scheme[input] instead of reading them, and with a
|
|
|
|
|
(or/c (cons/c bytes? (listof (or/c bytes? #f)))
|
|
|
|
|
#f)]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match-peek], but it attempts to match only bytes
|
|
|
|
|
that are available from @scheme[input] without blocking. The
|
|
|
|
|
Like @racket[regexp-match-peek], but it attempts to match only bytes
|
|
|
|
|
that are available from @racket[input] without blocking. The
|
|
|
|
|
match fails if not-yet-available characters might be used to match
|
|
|
|
|
@scheme[pattern].}
|
|
|
|
|
@racket[pattern].}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@defproc[(regexp-match-peek-positions-immediate [pattern (or/c string? bytes? regexp? byte-regexp?)]
|
|
|
|
@@ -535,10 +535,10 @@ match fails if not-yet-available characters might be used to match
|
|
|
|
|
#f)))
|
|
|
|
|
#f)]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match-peek-positions], but it attempts to match
|
|
|
|
|
only bytes that are available from @scheme[input] without
|
|
|
|
|
Like @racket[regexp-match-peek-positions], but it attempts to match
|
|
|
|
|
only bytes that are available from @racket[input] without
|
|
|
|
|
blocking. The match fails if not-yet-available characters might be
|
|
|
|
|
used to match @scheme[pattern].}
|
|
|
|
|
used to match @racket[pattern].}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@defproc[(regexp-match-peek-positions* [pattern (or/c string? bytes? regexp? byte-regexp?)]
|
|
|
|
@@ -549,8 +549,8 @@ used to match @scheme[pattern].}
|
|
|
|
|
(listof (cons/c exact-nonnegative-integer?
|
|
|
|
|
exact-nonnegative-integer?))]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match-peek-positions], but returns multiple matches like
|
|
|
|
|
@scheme[regexp-match*].}
|
|
|
|
|
Like @racket[regexp-match-peek-positions], but returns multiple matches like
|
|
|
|
|
@racket[regexp-match*].}
|
|
|
|
|
|
|
|
|
|
@defproc[(regexp-match/end [pattern (or/c string? bytes? regexp? byte-regexp?)]
|
|
|
|
|
[input (or/c string? bytes? input-port?)]
|
|
|
|
@@ -566,15 +566,15 @@ Like @scheme[regexp-match-peek-positions], but returns multiple matches like
|
|
|
|
|
(or/c #f (cons/c bytes? (listof (or/c bytes? #f)))))
|
|
|
|
|
(or/c #f bytes?))]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match], but with a second result: a byte
|
|
|
|
|
string of up to @scheme[count] bytes that correspond to the input
|
|
|
|
|
(possibly including the @scheme[input-prefix]) leading to the end of
|
|
|
|
|
the match; the second result is @scheme[#f] if no match is found.
|
|
|
|
|
Like @racket[regexp-match], but with a second result: a byte
|
|
|
|
|
string of up to @racket[count] bytes that correspond to the input
|
|
|
|
|
(possibly including the @racket[input-prefix]) leading to the end of
|
|
|
|
|
the match; the second result is @racket[#f] if no match is found.
|
|
|
|
|
|
|
|
|
|
The second result can be useful as an @scheme[input-prefix] for
|
|
|
|
|
attempting a second match on @scheme[input] starting from the end of
|
|
|
|
|
the first match. In that case, use @scheme[regexp-max-lookbehind]
|
|
|
|
|
to determine an appropriate value for @scheme[count].}
|
|
|
|
|
The second result can be useful as an @racket[input-prefix] for
|
|
|
|
|
attempting a second match on @racket[input] starting from the end of
|
|
|
|
|
the first match. In that case, use @racket[regexp-max-lookbehind]
|
|
|
|
|
to determine an appropriate value for @racket[count].}
|
|
|
|
|
|
|
|
|
|
@deftogether[(
|
|
|
|
|
@defproc[(regexp-match-positions/end [pattern (or/c string? bytes? regexp? byte-regexp?)]
|
|
|
|
@@ -618,8 +618,8 @@ to determine an appropriate value for @scheme[count].}
|
|
|
|
|
(or/c #f bytes?))]
|
|
|
|
|
)]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-match-positions], etc., but with a second result
|
|
|
|
|
like @scheme[regexp-match/end].}
|
|
|
|
|
Like @racket[regexp-match-positions], etc., but with a second result
|
|
|
|
|
like @racket[regexp-match/end].}
|
|
|
|
|
|
|
|
|
|
@;------------------------------------------------------------------------
|
|
|
|
|
@section{Regexp Splitting}
|
|
|
|
@@ -634,24 +634,24 @@ like @scheme[regexp-match/end].}
|
|
|
|
|
(cons/c string? (listof string?))
|
|
|
|
|
(cons/c bytes? (listof bytes?)))]{
|
|
|
|
|
|
|
|
|
|
The complement of @scheme[regexp-match*]: the result is a list of
|
|
|
|
|
strings (if @scheme[pattern] is a string or character regexp and
|
|
|
|
|
@scheme[input] is a string) or byte strings (otherwise) from
|
|
|
|
|
@scheme[input] that are separated by matches to
|
|
|
|
|
@scheme[pattern]. Adjacent matches are separated with @scheme[""] or
|
|
|
|
|
@scheme[#""]. Zero-length matches are treated the same as for
|
|
|
|
|
@scheme[regexp-match*].
|
|
|
|
|
The complement of @racket[regexp-match*]: the result is a list of
|
|
|
|
|
strings (if @racket[pattern] is a string or character regexp and
|
|
|
|
|
@racket[input] is a string) or byte strings (otherwise) from
|
|
|
|
|
@racket[input] that are separated by matches to
|
|
|
|
|
@racket[pattern]. Adjacent matches are separated with @racket[""] or
|
|
|
|
|
@racket[#""]. Zero-length matches are treated the same as for
|
|
|
|
|
@racket[regexp-match*].
|
|
|
|
|
|
|
|
|
|
If @scheme[input] contains no matches (in the range @scheme[start-pos]
|
|
|
|
|
to @scheme[end-pos]), the result is a list containing @scheme[input]'s
|
|
|
|
|
content (from @scheme[start-pos] to @scheme[end-pos]) as a single
|
|
|
|
|
element. If a match occurs at the beginning of @scheme[input] (at
|
|
|
|
|
@scheme[start-pos]), the resulting list will start with an empty
|
|
|
|
|
If @racket[input] contains no matches (in the range @racket[start-pos]
|
|
|
|
|
to @racket[end-pos]), the result is a list containing @racket[input]'s
|
|
|
|
|
content (from @racket[start-pos] to @racket[end-pos]) as a single
|
|
|
|
|
element. If a match occurs at the beginning of @racket[input] (at
|
|
|
|
|
@racket[start-pos]), the resulting list will start with an empty
|
|
|
|
|
string or byte string, and if a match occurs at the end (at
|
|
|
|
|
@scheme[end-pos]), the list will end with an empty string or byte
|
|
|
|
|
string. The @scheme[end-pos] argument can be @scheme[#f], in which
|
|
|
|
|
case splitting goes to the end of @scheme[input] (which corresponds to
|
|
|
|
|
an end-of-file if @scheme[input] is an input port).
|
|
|
|
|
@racket[end-pos]), the list will end with an empty string or byte
|
|
|
|
|
string. The @racket[end-pos] argument can be @racket[#f], in which
|
|
|
|
|
case splitting goes to the end of @racket[input] (which corresponds to
|
|
|
|
|
an end-of-file if @racket[input] is an input port).
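The edge cases described above can be sketched as follows. This is a minimal illustration, assuming the default @racket[start-pos] and @racket[end-pos]; the sample inputs are made up for the sketch.

```racket
#lang racket/base

;; No match: the whole input comes back as a single element.
(regexp-split #rx"," "abc")      ; => '("abc")

;; Matches at the very start and very end produce empty strings
;; at both ends of the result list.
(regexp-split #rx"," ",a,b,")    ; => '("" "a" "b" "")

;; Adjacent matches are separated by "".
(regexp-split #rx"," "a,,b")     ; => '("a" "" "b")

;; A byte-string input produces byte strings, even with a
;; character regexp.
(regexp-split #rx"," #"a,b")     ; => '(#"a" #"b")
```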
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(regexp-split #rx" +" "12 34")
|
|
|
|
@ -675,52 +675,52 @@ an end-of-file if @scheme[input] is an input port).
|
|
|
|
|
string?
|
|
|
|
|
bytes?)]{
|
|
|
|
|
|
|
|
|
|
Performs a match using @scheme[pattern] on @scheme[input], and then
|
|
|
|
|
Performs a match using @racket[pattern] on @racket[input], and then
|
|
|
|
|
returns a string or byte string in which the matching portion of
|
|
|
|
|
@scheme[input] is replaced with @scheme[insert]. If @scheme[pattern]
|
|
|
|
|
matches no part of @scheme[input], then @scheme[input] is returned
|
|
|
|
|
@racket[input] is replaced with @racket[insert]. If @racket[pattern]
|
|
|
|
|
matches no part of @racket[input], then @racket[input] is returned
|
|
|
|
|
unmodified.
|
|
|
|
|
|
|
|
|
|
The @scheme[insert] argument can be either a (byte) string, or a
|
|
|
|
|
The @racket[insert] argument can be either a (byte) string, or a
|
|
|
|
|
function that returns a (byte) string. In the latter case, the
|
|
|
|
|
function is applied on the list of values that @scheme[regexp-match]
|
|
|
|
|
function is applied on the list of values that @racket[regexp-match]
|
|
|
|
|
would return (i.e., the first argument is the complete match, and then
|
|
|
|
|
one argument for each parenthesized sub-expression) to obtain a
|
|
|
|
|
replacement (byte) string.
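As a small sketch of the procedure form, the function below receives the complete match followed by one argument per parenthesized sub-expression. The argument names @racket[all], @racket[first-word], and @racket[second-word] are chosen only for this illustration.

```racket
#lang racket/base

;; The procedure is called with the whole match, then each group.
;; Here it swaps the two captured words.
(regexp-replace #rx"([a-z]+) ([a-z]+)"
                "hello world"
                (lambda (all first-word second-word)
                  (string-append second-word " " first-word)))
; => "world hello"
```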
|
|
|
|
|
|
|
|
|
|
If @scheme[pattern] is a string or character regexp and @scheme[input]
|
|
|
|
|
is a string, then @scheme[insert] must be a string or a procedure that
|
|
|
|
|
accepts strings, and the result is a string. If @scheme[pattern] is a
|
|
|
|
|
byte string or byte regexp, or if @scheme[input] is a byte string,
|
|
|
|
|
then @scheme[insert] as a string is converted to a byte string,
|
|
|
|
|
@scheme[insert] as a procedure is called with a byte string, and the
|
|
|
|
|
If @racket[pattern] is a string or character regexp and @racket[input]
|
|
|
|
|
is a string, then @racket[insert] must be a string or a procedure that
|
|
|
|
|
accepts strings, and the result is a string. If @racket[pattern] is a
|
|
|
|
|
byte string or byte regexp, or if @racket[input] is a byte string,
|
|
|
|
|
then @racket[insert] as a string is converted to a byte string,
|
|
|
|
|
@racket[insert] as a procedure is called with a byte string, and the
|
|
|
|
|
result is a byte string.
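The string and byte-string cases can be compared side by side. This is a sketch with made-up inputs; it relies only on the conversion rules stated in the paragraph above.

```racket
#lang racket/base

;; Character regexp and string input: the result is a string.
(regexp-replace #rx"a" "banana" "o")     ; => "bonana"

;; Byte regexp with a string input: the string insert is converted
;; to bytes, and the result is a byte string.
(regexp-replace #rx#"a" "banana" "o")    ; => #"bonana"

;; Byte-string input: a procedure insert receives byte strings.
(regexp-replace #rx"a" #"banana"
                (lambda (m) (bytes-append m m)))
; => #"baanana"
```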
|
|
|
|
|
|
|
|
|
|
If @scheme[insert] contains @litchar{&}, then @litchar{&}
|
|
|
|
|
is replaced with the matching portion of @scheme[input] before it is
|
|
|
|
|
substituted into the match's place. If @scheme[insert] contains
|
|
|
|
|
If @racket[insert] contains @litchar{&}, then @litchar{&}
|
|
|
|
|
is replaced with the matching portion of @racket[input] before it is
|
|
|
|
|
substituted into the match's place. If @racket[insert] contains
|
|
|
|
|
@litchar{\}@nonterm{n} for some integer @nonterm{n}, then it is
|
|
|
|
|
replaced with the @nonterm{n}th matching sub-expression from
|
|
|
|
|
@scheme[input]. A @litchar{&} and @litchar{\0} are synonymous. If
|
|
|
|
|
@racket[input]. A @litchar{&} and @litchar{\0} are synonymous. If
|
|
|
|
|
the @nonterm{n}th sub-expression was not used in the match, or if
|
|
|
|
|
@nonterm{n} is greater than the number of sub-expressions in
|
|
|
|
|
@scheme[pattern], then @litchar{\}@nonterm{n} is replaced with the
|
|
|
|
|
@racket[pattern], then @litchar{\}@nonterm{n} is replaced with the
|
|
|
|
|
empty string.
|
|
|
|
|
|
|
|
|
|
To substitute a literal @litchar{&} or @litchar{\}, use
|
|
|
|
|
@litchar{\&} and @litchar{\\}, respectively, in
|
|
|
|
|
@scheme[insert]. A @litchar{\$} in @scheme[insert] is
|
|
|
|
|
@racket[insert]. A @litchar{\$} in @racket[insert] is
|
|
|
|
|
equivalent to an empty sequence; this can be used to terminate a
|
|
|
|
|
number @nonterm{n} following @litchar{\}. If a @litchar{\} in
|
|
|
|
|
@scheme[insert] is followed by anything other than a digit,
|
|
|
|
|
@racket[insert] is followed by anything other than a digit,
|
|
|
|
|
@litchar{&}, @litchar{\}, or @litchar{$}, then the @litchar{\}
|
|
|
|
|
by itself is treated as @litchar{\0}.
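The substitution rules for @litchar{&}, @litchar{\}@nonterm{n}, and @litchar{\$} can be illustrated with a few short calls. This is a sketch with made-up inputs; each @litchar{\} in the replacement is doubled because it is written inside a Racket string literal.

```racket
#lang racket/base

;; & stands for the whole match.
(regexp-replace #rx"b+" "aabbcc" "[&]")            ; => "aa[bb]cc"

;; \1 is the first group; \2 did not take part in the match,
;; so it is replaced with the empty string.
(regexp-replace #rx"(a)(b)?" "ac" "<\\1|\\2>")     ; => "<a|>c"

;; \& inserts a literal ampersand.
(regexp-replace #rx"x" "axb" "\\&")                ; => "a&b"

;; \\ inserts a single literal backslash.
(regexp-replace #rx"x" "axb" "\\\\")               ; => "a\\b"

;; \$ is empty and ends the group number, so the 0 that follows
;; is an ordinary character rather than part of the number.
(regexp-replace #rx"(x)" "axb" "\\1\\$0")          ; => "ax0b"
```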
|
|
|
|
|
|
|
|
|
|
Note that the @litchar{\} described in the previous paragraphs is a
|
|
|
|
|
character or byte of @scheme[insert]. To write such an @scheme[insert]
|
|
|
|
|
as a Scheme string literal, an escaping @litchar{\} is needed
|
|
|
|
|
before the @litchar{\}. For example, the Scheme constant
|
|
|
|
|
@scheme["\\1"] is @litchar{\1}.
|
|
|
|
|
character or byte of @racket[insert]. To write such an @racket[insert]
|
|
|
|
|
as a Racket string literal, an escaping @litchar{\} is needed
|
|
|
|
|
before the @litchar{\}. For example, the Racket constant
|
|
|
|
|
@racket["\\1"] is @litchar{\1}.
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(regexp-replace "mi" "mi casa" "su")
|
|
|
|
@ -740,13 +740,13 @@ before the @litchar{\}. For example, the Scheme constant
|
|
|
|
|
[input-prefix bytes? #""])
|
|
|
|
|
(or/c string? bytes?)]{
|
|
|
|
|
|
|
|
|
|
Like @scheme[regexp-replace], except that every instance of
|
|
|
|
|
@scheme[pattern] in @scheme[input] is replaced with @scheme[insert],
|
|
|
|
|
Like @racket[regexp-replace], except that every instance of
|
|
|
|
|
@racket[pattern] in @racket[input] is replaced with @racket[insert],
|
|
|
|
|
instead of just the first match. Only non-overlapping instances of
|
|
|
|
|
@scheme[pattern] in @scheme[input] are replaced, so instances of
|
|
|
|
|
@scheme[pattern] within inserted strings are @italic{not} replaced
|
|
|
|
|
@racket[pattern] in @racket[input] are replaced, so instances of
|
|
|
|
|
@racket[pattern] within inserted strings are @italic{not} replaced
|
|
|
|
|
recursively. Zero-length matches are treated the same as in
|
|
|
|
|
@scheme[regexp-match*].
|
|
|
|
|
@racket[regexp-match*].
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(regexp-replace* "([Mm])i ([a-zA-Z]*)" "mi cerveza Mi Mi Mi"
|
|
|
|
@ -762,10 +762,10 @@ recursively. Zero-length matches are treated the same as in
|
|
|
|
|
[(regexp-replace-quote [bstr bytes?]) bytes?])]{
|
|
|
|
|
|
|
|
|
|
Produces a string suitable for use as the third argument to
|
|
|
|
|
@scheme[regexp-replace] to insert the literal sequence of characters
|
|
|
|
|
in @scheme[str] or bytes in @scheme[bstr] as a replacement.
|
|
|
|
|
Concretely, every @litchar{\} and @litchar{&} in @scheme[str] or
|
|
|
|
|
@scheme[bstr] is protected by a quoting @litchar{\}.
|
|
|
|
|
@racket[regexp-replace] to insert the literal sequence of characters
|
|
|
|
|
in @racket[str] or bytes in @racket[bstr] as a replacement.
|
|
|
|
|
Concretely, every @litchar{\} and @litchar{&} in @racket[str] or
|
|
|
|
|
@racket[bstr] is protected by a quoting @litchar{\}.
|
|
|
|
|
|
|
|
|
|
@examples[
|
|
|
|
|
(regexp-replace "UT" "Go UT!" "A&M")
|
|
|
|
|