man-pages/man3/Str.3o.html
2021-03-31 01:06:50 +01:00

1094 lines
15 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of Str</TITLE>
</HEAD><BODY>
<H1>Str</H1>
Section: OCaml library (3o)<BR>Updated: 2020-01-30<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>
Str - Regular expressions and high-level string processing
<A NAME="lbAC">&nbsp;</A>
<H2>Module</H2>
Module Str
<A NAME="lbAD">&nbsp;</A>
<H2>Documentation</H2>
<P>
Module
<B>Str</B>
<BR>&nbsp;:&nbsp;
<B>sig end</B>
<P>
<P>
Regular expressions and high-level string processing
<P>
<P>
<P>
<P>
<P>
<P>
<P>
<A NAME="lbAE">&nbsp;</A>
<H3>Regular expressions</H3>
<P>
<P>
<I>type regexp </I>
<P>
<P>
The type of compiled regular expressions.
<P>
<P>
<P>
<I>val regexp </I>
:
<B>string -&gt; regexp</B>
<P>
Compile a regular expression. The following constructs are
recognized:
<P>
-
<B>. </B>
Matches any character except newline.
<P>
-
<B>* </B>
(postfix) Matches the preceding expression zero, one or
several times
<P>
-
<B>+ </B>
(postfix) Matches the preceding expression one or
several times
<P>
-
<B>? </B>
(postfix) Matches the preceding expression once or
not at all
<P>
-
<B>[..] </B>
Character set. Ranges are denoted with
<B>-</B>
, as in
<B>[a-z]</B>
.
An initial
<B>^</B>
, as in
<B>[^0-9]</B>
, complements the set.
To include a
<B>]</B>
character in a set, make it the first
character of the set. To include a
<B>-</B>
character in a set,
make it the first or the last character of the set.
<P>
-
<B>^ </B>
Matches at beginning of line: either at the beginning of
the matched string, or just after a '\n' character.
<P>
-
<B>$ </B>
Matches at end of line: either at the end of the matched
string, or just before a '\n' character.
<P>
-
<B>\| </B>
(infix) Alternative between two expressions.
<P>
-
<B>\(..\)</B>
Grouping and naming of the enclosed expression.
<P>
-
<B>\1 </B>
The text matched by the first
<B>\(...\)</B>
expression
(
<B>\2</B>
for the second expression, and so on up to
<B>\9</B>
).
<P>
-
<B>\b </B>
Matches word boundaries.
<P>
-
<B>\ </B>
Quotes special characters. The special characters
are
<B>$^\.*+?[]</B>
.
<P>
Note: the argument to
<B>regexp</B>
is usually a string literal. In this
case, any backslash character in the regular expression must be
doubled to make it past the OCaml string parser. For example, the
following expression:
<B>let r = Str.regexp hello \\([A-Za-z]+\\) in</B>
<B>Str.replace_first r \\1 hello world </B>
returns the string
<B>world</B>
.
<P>
In particular, if you want a regular expression that matches a single
backslash character, you need to quote it in the argument to
<B>regexp</B>
(according to the last item of the list above) by adding a second
backslash. Then you need to quote both backslashes (according to the
syntax of string constants in OCaml) by doubling them again, so you
need to write four backslash characters:
<B>Str.regexp \\\\</B>
.
<P>
<P>
<P>
<I>val regexp_case_fold </I>
:
<B>string -&gt; regexp</B>
<P>
Same as
<B>regexp</B>
, but the compiled expression will match text
in a case-insensitive way: uppercase and lowercase letters will
be considered equivalent.
<P>
<P>
<P>
<I>val quote </I>
:
<B>string -&gt; string</B>
<P>
<P>
<B>Str.quote s</B>
returns a regexp string that matches exactly
<B>s</B>
and nothing else.
<P>
<P>
<P>
<I>val regexp_string </I>
:
<B>string -&gt; regexp</B>
<P>
<P>
<B>Str.regexp_string s</B>
returns a regular expression
that matches exactly
<B>s</B>
and nothing else.
<P>
<P>
<P>
<I>val regexp_string_case_fold </I>
:
<B>string -&gt; regexp</B>
<P>
<P>
<B>Str.regexp_string_case_fold</B>
is similar to
<B>Str.regexp_string</B>
,
but the regexp matches in a case-insensitive way.
<P>
<P>
<P>
<P>
<A NAME="lbAF">&nbsp;</A>
<H3>String matching and searching</H3>
<P>
<P>
<P>
<I>val string_match </I>
:
<B>regexp -&gt; string -&gt; int -&gt; bool</B>
<P>
<P>
<B>string_match r s start</B>
tests whether a substring of
<B>s</B>
that
starts at position
<B>start</B>
matches the regular expression
<B>r</B>
.
The first character of a string has position
<B>0</B>
, as usual.
<P>
<P>
<P>
<I>val search_forward </I>
:
<B>regexp -&gt; string -&gt; int -&gt; int</B>
<P>
<P>
<B>search_forward r s start</B>
searches the string
<B>s</B>
for a substring
matching the regular expression
<B>r</B>
. The search starts at position
<B>start</B>
and proceeds towards the end of the string.
Return the position of the first character of the matched
substring.
<P>
<P>
<B>Raises Not_found</B>
if no substring matches.
<P>
<P>
<P>
<I>val search_backward </I>
:
<B>regexp -&gt; string -&gt; int -&gt; int</B>
<P>
<P>
<B>search_backward r s last</B>
searches the string
<B>s</B>
for a
substring matching the regular expression
<B>r</B>
. The search first
considers substrings that start at position
<B>last</B>
and proceeds
towards the beginning of string. Return the position of the first
character of the matched substring.
<P>
<P>
<B>Raises Not_found</B>
if no substring matches.
<P>
<P>
<P>
<I>val string_partial_match </I>
:
<B>regexp -&gt; string -&gt; int -&gt; bool</B>
<P>
Similar to
<B>Str.string_match</B>
, but also returns true if
the argument string is a prefix of a string that matches.
This includes the case of a true complete match.
<P>
<P>
<P>
<I>val matched_string </I>
:
<B>string -&gt; string</B>
<P>
<P>
<B>matched_string s</B>
returns the substring of
<B>s</B>
that was matched
by the last call to one of the following matching or searching
functions:
<P>
-
<B>Str.string_match</B>
<P>
<P>
-
<B>Str.search_forward</B>
<P>
<P>
-
<B>Str.search_backward</B>
<P>
<P>
-
<B>Str.string_partial_match</B>
<P>
<P>
-
<B>Str.global_substitute</B>
<P>
<P>
-
<B>Str.substitute_first</B>
<P>
provided that none of the following functions was called in between:
<P>
-
<B>Str.global_replace</B>
<P>
<P>
-
<B>Str.replace_first</B>
<P>
<P>
-
<B>Str.split</B>
<P>
<P>
-
<B>Str.bounded_split</B>
<P>
<P>
-
<B>Str.split_delim</B>
<P>
<P>
-
<B>Str.bounded_split_delim</B>
<P>
<P>
-
<B>Str.full_split</B>
<P>
<P>
-
<B>Str.bounded_full_split</B>
<P>
Note: in the case of
<B>global_substitute</B>
and
<B>substitute_first</B>
,
a call to
<B>matched_string</B>
is only valid within the
<B>subst</B>
argument,
not after
<B>global_substitute</B>
or
<B>substitute_first</B>
returns.
<P>
The user must make sure that the parameter
<B>s</B>
is the same string
that was passed to the matching or searching function.
<P>
<P>
<P>
<I>val match_beginning </I>
:
<B>unit -&gt; int</B>
<P>
<P>
<B>match_beginning()</B>
returns the position of the first character
of the substring that was matched by the last call to a matching
or searching function (see
<B>Str.matched_string</B>
for details).
<P>
<P>
<P>
<I>val match_end </I>
:
<B>unit -&gt; int</B>
<P>
<P>
<B>match_end()</B>
returns the position of the character following the
last character of the substring that was matched by the last call
to a matching or searching function (see
<B>Str.matched_string</B>
for
details).
<P>
<P>
<P>
<I>val matched_group </I>
:
<B>int -&gt; string -&gt; string</B>
<P>
<P>
<B>matched_group n s</B>
returns the substring of
<B>s</B>
that was matched
by the
<B>n</B>
th group
<B>\(...\)</B>
of the regular expression that was
matched by the last call to a matching or searching function (see
<B>Str.matched_string</B>
for details).
The user must make sure that the parameter
<B>s</B>
is the same string
that was passed to the matching or searching function.
<P>
<P>
<B>Raises Not_found</B>
if the
<B>n</B>
th group
of the regular expression was not matched. This can happen
with groups inside alternatives
<B>\|</B>
, options
<B>?</B>
or repetitions
<B>*</B>
. For instance, the empty string will match
<B>\(a\)*</B>
, but
<B>matched_group 1 </B>
will raise
<B>Not_found</B>
because the first group itself was not matched.
<P>
<P>
<P>
<I>val group_beginning </I>
:
<B>int -&gt; int</B>
<P>
<P>
<B>group_beginning n</B>
returns the position of the first character
of the substring that was matched by the
<B>n</B>
th group of
the regular expression that was matched by the last call to a
matching or searching function (see
<B>Str.matched_string</B>
for details).
<P>
<P>
<B>Raises Not_found</B>
if the
<B>n</B>
th group of the regular expression
was not matched.
<P>
<P>
<B>Raises Invalid_argument</B>
if there are fewer than
<B>n</B>
groups in
the regular expression.
<P>
<P>
<P>
<I>val group_end </I>
:
<B>int -&gt; int</B>
<P>
<P>
<B>group_end n</B>
returns
the position of the character following the last character of
substring that was matched by the
<B>n</B>
th group of the regular
expression that was matched by the last call to a matching or
searching function (see
<B>Str.matched_string</B>
for details).
<P>
<P>
<B>Raises Not_found</B>
if the
<B>n</B>
th group of the regular expression
was not matched.
<P>
<P>
<B>Raises Invalid_argument</B>
if there are fewer than
<B>n</B>
groups in
the regular expression.
<P>
<P>
<P>
<P>
<A NAME="lbAG">&nbsp;</A>
<H3>Replacement</H3>
<P>
<P>
<P>
<I>val global_replace </I>
:
<B>regexp -&gt; string -&gt; string -&gt; string</B>
<P>
<P>
<B>global_replace regexp templ s</B>
returns a string identical to
<B>s</B>
,
except that all substrings of
<B>s</B>
that match
<B>regexp</B>
have been
replaced by
<B>templ</B>
. The replacement template
<B>templ</B>
can contain
<B>\1</B>
,
<B>\2</B>
, etc; these sequences will be replaced by the text
matched by the corresponding group in the regular expression.
<B>\0</B>
stands for the text matched by the whole regular expression.
<P>
<P>
<P>
<I>val replace_first </I>
:
<B>regexp -&gt; string -&gt; string -&gt; string</B>
<P>
Same as
<B>Str.global_replace</B>
, except that only the first substring
matching the regular expression is replaced.
<P>
<P>
<P>
<I>val global_substitute </I>
:
<B>regexp -&gt; (string -&gt; string) -&gt; string -&gt; string</B>
<P>
<P>
<B>global_substitute regexp subst s</B>
returns a string identical
to
<B>s</B>
, except that all substrings of
<B>s</B>
that match
<B>regexp</B>
have been replaced by the result of function
<B>subst</B>
. The
function
<B>subst</B>
is called once for each matching substring,
and receives
<B>s</B>
(the whole text) as argument.
<P>
<P>
<P>
<I>val substitute_first </I>
:
<B>regexp -&gt; (string -&gt; string) -&gt; string -&gt; string</B>
<P>
Same as
<B>Str.global_substitute</B>
, except that only the first substring
matching the regular expression is replaced.
<P>
<P>
<P>
<I>val replace_matched </I>
:
<B>string -&gt; string -&gt; string</B>
<P>
<P>
<B>replace_matched repl s</B>
returns the replacement text
<B>repl</B>
in which
<B>\1</B>
,
<B>\2</B>
, etc. have been replaced by the text
matched by the corresponding groups in the regular expression
that was matched by the last call to a matching or searching
function (see
<B>Str.matched_string</B>
for details).
<B>s</B>
must be the same string that was passed to the matching or
searching function.
<P>
<P>
<P>
<P>
<A NAME="lbAH">&nbsp;</A>
<H3>Splitting</H3>
<P>
<P>
<P>
<I>val split </I>
:
<B>regexp -&gt; string -&gt; string list</B>
<P>
<P>
<B>split r s</B>
splits
<B>s</B>
into substrings, taking as delimiters
the substrings that match
<B>r</B>
, and returns the list of substrings.
For instance,
<B>split (regexp [ \t]+) s</B>
splits
<B>s</B>
into
blank-separated words. An occurrence of the delimiter at the
beginning or at the end of the string is ignored.
<P>
<P>
<P>
<I>val bounded_split </I>
:
<B>regexp -&gt; string -&gt; int -&gt; string list</B>
<P>
Same as
<B>Str.split</B>
, but splits into at most
<B>n</B>
substrings,
where
<B>n</B>
is the extra integer parameter.
<P>
<P>
<P>
<I>val split_delim </I>
:
<B>regexp -&gt; string -&gt; string list</B>
<P>
Same as
<B>Str.split</B>
but occurrences of the
delimiter at the beginning and at the end of the string are
recognized and returned as empty strings in the result.
For instance,
<B>split_delim (regexp ) abc </B>
returns
<B>[; abc; ]</B>
, while
<B>split</B>
with the same
arguments returns
<B>[abc]</B>
.
<P>
<P>
<P>
<I>val bounded_split_delim </I>
:
<B>regexp -&gt; string -&gt; int -&gt; string list</B>
<P>
Same as
<B>Str.bounded_split</B>
, but occurrences of the
delimiter at the beginning and at the end of the string are
recognized and returned as empty strings in the result.
<P>
<P>
<I>type split_result </I>
=
<BR>&nbsp;|&nbsp;Text
<B>of </B>
<B>string</B>
<BR>&nbsp;|&nbsp;Delim
<B>of </B>
<B>string</B>
<BR>&nbsp;
<P>
<P>
<P>
<P>
<I>val full_split </I>
:
<B>regexp -&gt; string -&gt; split_result list</B>
<P>
Same as
<B>Str.split_delim</B>
, but returns
the delimiters as well as the substrings contained between
delimiters. The former are tagged
<B>Delim</B>
in the result list;
the latter are tagged
<B>Text</B>
. For instance,
<B>full_split (regexp [{}]) {ab}</B>
returns
<B>[Delim {; Text ab; Delim }]</B>
.
<P>
<P>
<P>
<I>val bounded_full_split </I>
:
<B>regexp -&gt; string -&gt; int -&gt; split_result list</B>
<P>
Same as
<B>Str.bounded_split_delim</B>
, but returns
the delimiters as well as the substrings contained between
delimiters. The former are tagged
<B>Delim</B>
in the result list;
the latter are tagged
<B>Text</B>
.
<P>
<P>
<P>
<P>
<A NAME="lbAI">&nbsp;</A>
<H3>Extracting substrings</H3>
<P>
<P>
<P>
<I>val string_before </I>
:
<B>string -&gt; int -&gt; string</B>
<P>
<P>
<B>string_before s n</B>
returns the substring of all characters of
<B>s</B>
that precede position
<B>n</B>
(excluding the character at
position
<B>n</B>
).
<P>
<P>
<P>
<I>val string_after </I>
:
<B>string -&gt; int -&gt; string</B>
<P>
<P>
<B>string_after s n</B>
returns the substring of all characters of
<B>s</B>
that follow position
<B>n</B>
(including the character at
position
<B>n</B>
).
<P>
<P>
<P>
<I>val first_chars </I>
:
<B>string -&gt; int -&gt; string</B>
<P>
<P>
<B>first_chars s n</B>
returns the first
<B>n</B>
characters of
<B>s</B>
.
This is the same function as
<B>Str.string_before</B>
.
<P>
<P>
<P>
<I>val last_chars </I>
:
<B>string -&gt; int -&gt; string</B>
<P>
<P>
<B>last_chars s n</B>
returns the last
<B>n</B>
characters of
<B>s</B>
.
<P>
<P>
<P>
<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="1"><A HREF="#lbAB">NAME</A><DD>
<DT id="2"><A HREF="#lbAC">Module</A><DD>
<DT id="3"><A HREF="#lbAD">Documentation</A><DD>
<DL>
<DT id="4"><A HREF="#lbAE">Regular expressions</A><DD>
<DT id="5"><A HREF="#lbAF">String matching and searching</A><DD>
<DT id="6"><A HREF="#lbAG">Replacement</A><DD>
<DT id="7"><A HREF="#lbAH">Splitting</A><DD>
<DT id="8"><A HREF="#lbAI">Extracting substrings</A><DD>
</DL>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:58 GMT, March 31, 2021
</BODY>
</HTML>