man-pages/man3/Str.3o.html


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of Str</TITLE>
</HEAD><BODY>
<H1>Str</H1>
Section: OCaml library (3o)<BR>Updated: 2020-01-30<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>

<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>

Str - Regular expressions and high-level string processing
<A NAME="lbAC">&nbsp;</A>
<H2>Module</H2>

Module   Str
<A NAME="lbAD">&nbsp;</A>
<H2>Documentation</H2>

<P>
Module
<B>Str</B>

<BR>&nbsp;:&nbsp;
<B>sig  end</B>

<P>
<P>
Regular expressions and high-level string processing
<P>
<P>
<P>
<P>
<P>
<P>
<P>

<A NAME="lbAE">&nbsp;</A>
<H3>Regular expressions</H3>

<P>
<P>

<I>type regexp </I>

<P>
<P>
The type of compiled regular expressions.
<P>
<P>
<P>
<I>val regexp </I>

:
<B>string -&gt; regexp</B>

<P>
Compile a regular expression. The following constructs are
recognized:
<P>
-
<B>.     </B>

Matches any character except newline.
<P>
-
<B>*     </B>

(postfix) Matches the preceding expression zero, one or
several times
<P>
-
<B>+     </B>

(postfix) Matches the preceding expression one or
several times
<P>
-
<B>?     </B>

(postfix) Matches the preceding expression once or
not at all
<P>
-
<B>[..]  </B>

Character set. Ranges are denoted with
<B>-</B>

, as in
<B>[a-z]</B>

.
An initial
<B>^</B>

, as in
<B>[^0-9]</B>

, complements the set.
To include a
<B>]</B>

character in a set, make it the first
character of the set. To include a
<B>-</B>

character in a set,
make it the first or the last character of the set.
<P>
-
<B>^     </B>

Matches at beginning of line: either at the beginning of
the matched string, or just after a '\n' character.
<P>
-
<B>$     </B>

Matches at end of line: either at the end of the matched
string, or just before a '\n' character.
<P>
-
<B>\|    </B>

(infix) Alternative between two expressions.
<P>
-
<B>\(..\)</B>

Grouping and naming of the enclosed expression.
<P>
-
<B>\1    </B>

The text matched by the first
<B>\(...\)</B>

expression
(
<B>\2</B>

for the second expression, and so on up to
<B>\9</B>

).
<P>
-
<B>\b    </B>

Matches word boundaries.
<P>
-
<B>\     </B>

Quotes special characters.  The special characters
are
<B>$^\.*+?[]</B>

.
<P>
Note: the argument to
<B>regexp</B>

is usually a string literal. In this
case, any backslash character in the regular expression must be
doubled to make it past the OCaml string parser. For example, the
following expression:
<B>let r = Str.regexp hello \\([A-Za-z]+\\) in</B>

<B>Str.replace_first r \\1 hello world </B>

returns the string
<B>world</B>

.
<P>
In particular, if you want a regular expression that matches a single
backslash character, you need to quote it in the argument to
<B>regexp</B>

(according to the last item of the list above) by adding a second
backslash. Then you need to quote both backslashes (according to the
syntax of string constants in OCaml) by doubling them again, so you
need to write four backslash characters:
<B>Str.regexp \\\\</B>

.
<P>
<P>
<P>
<I>val regexp_case_fold </I>

:
<B>string -&gt; regexp</B>

<P>
Same as
<B>regexp</B>

, but the compiled expression will match text
in a case-insensitive way: uppercase and lowercase letters will
be considered equivalent.
<P>
<P>
<P>
<I>val quote </I>

:
<B>string -&gt; string</B>

<P>
<P>
<B>Str.quote s</B>

returns a regexp string that matches exactly
<B>s</B>

and nothing else.
<P>
<P>
<P>
<I>val regexp_string </I>

:
<B>string -&gt; regexp</B>

<P>
<P>
<B>Str.regexp_string s</B>

returns a regular expression
that matches exactly
<B>s</B>

and nothing else.
<P>
<P>
<P>
<I>val regexp_string_case_fold </I>

:
<B>string -&gt; regexp</B>

<P>
<P>
<B>Str.regexp_string_case_fold</B>

is similar to
<B>Str.regexp_string</B>

,
but the regexp matches in a case-insensitive way.
<P>
<P>
<P>
<P>

<A NAME="lbAF">&nbsp;</A>
<H3>String matching and searching</H3>

<P>
<P>

<P>
<I>val string_match </I>

:
<B>regexp -&gt; string -&gt; int -&gt; bool</B>

<P>
<P>
<B>string_match r s start</B>

tests whether a substring of
<B>s</B>

that
starts at position
<B>start</B>

matches the regular expression
<B>r</B>

.
The first character of a string has position
<B>0</B>

, as usual.
<P>
<P>
<P>
<I>val search_forward </I>

:
<B>regexp -&gt; string -&gt; int -&gt; int</B>

<P>
<P>
<B>search_forward r s start</B>

searches the string
<B>s</B>

for a substring
matching the regular expression
<B>r</B>

. The search starts at position
<B>start</B>

and proceeds towards the end of the string.
Return the position of the first character of the matched
substring.
<P>
<P>
<B>Raises Not_found</B>

if no substring matches.
<P>
<P>
<P>
<I>val search_backward </I>

:
<B>regexp -&gt; string -&gt; int -&gt; int</B>

<P>
<P>
<B>search_backward r s last</B>

searches the string
<B>s</B>

for a
substring matching the regular expression
<B>r</B>

. The search first
considers substrings that start at position
<B>last</B>

and proceeds
towards the beginning of string. Return the position of the first
character of the matched substring.
<P>
<P>
<B>Raises Not_found</B>

if no substring matches.
<P>
<P>
<P>
<I>val string_partial_match </I>

:
<B>regexp -&gt; string -&gt; int -&gt; bool</B>

<P>
Similar to
<B>Str.string_match</B>

, but also returns true if
the argument string is a prefix of a string that matches.
This includes the case of a true complete match.
<P>
<P>
<P>
<I>val matched_string </I>

:
<B>string -&gt; string</B>

<P>
<P>
<B>matched_string s</B>

returns the substring of
<B>s</B>

that was matched
by the last call to one of the following matching or searching
functions:
<P>
-
<B>Str.string_match</B>

<P>
<P>
-
<B>Str.search_forward</B>

<P>
<P>
-
<B>Str.search_backward</B>

<P>
<P>
-
<B>Str.string_partial_match</B>

<P>
<P>
-
<B>Str.global_substitute</B>

<P>
<P>
-
<B>Str.substitute_first</B>

<P>
provided that none of the following functions was called in between:
<P>
-
<B>Str.global_replace</B>

<P>
<P>
-
<B>Str.replace_first</B>

<P>
<P>
-
<B>Str.split</B>

<P>
<P>
-
<B>Str.bounded_split</B>

<P>
<P>
-
<B>Str.split_delim</B>

<P>
<P>
-
<B>Str.bounded_split_delim</B>

<P>
<P>
-
<B>Str.full_split</B>

<P>
<P>
-
<B>Str.bounded_full_split</B>

<P>
Note: in the case of
<B>global_substitute</B>

and
<B>substitute_first</B>

,
a call to
<B>matched_string</B>

is only valid within the
<B>subst</B>

argument,
not after
<B>global_substitute</B>

or
<B>substitute_first</B>

returns.
<P>
The user must make sure that the parameter
<B>s</B>

is the same string
that was passed to the matching or searching function.
<P>
<P>
<P>
<I>val match_beginning </I>

:
<B>unit -&gt; int</B>

<P>
<P>
<B>match_beginning()</B>

returns the position of the first character
of the substring that was matched by the last call to a matching
or searching function (see
<B>Str.matched_string</B>

for details).
<P>
<P>
<P>
<I>val match_end </I>

:
<B>unit -&gt; int</B>

<P>
<P>
<B>match_end()</B>

returns the position of the character following the
last character of the substring that was matched by the last call
to a matching or searching function (see
<B>Str.matched_string</B>

for
details).
<P>
<P>
<P>
<I>val matched_group </I>

:
<B>int -&gt; string -&gt; string</B>

<P>
<P>
<B>matched_group n s</B>

returns the substring of
<B>s</B>

that was matched
by the
<B>n</B>

th group
<B>\(...\)</B>

of the regular expression that was
matched by the last call to a matching or searching function (see
<B>Str.matched_string</B>

for details).
The user must make sure that the parameter
<B>s</B>

is the same string
that was passed to the matching or searching function.
<P>
<P>
<B>Raises Not_found</B>

if the
<B>n</B>

th group
of the regular expression was not matched.  This can happen
with groups inside alternatives
<B>\|</B>

, options
<B>?</B>

or repetitions
<B>*</B>

.  For instance, the empty string will match
<B>\(a\)*</B>

, but
<B>matched_group 1 </B>

will raise
<B>Not_found</B>

because the first group itself was not matched.
<P>
<P>
<P>
<I>val group_beginning </I>

:
<B>int -&gt; int</B>

<P>
<P>
<B>group_beginning n</B>

returns the position of the first character
of the substring that was matched by the
<B>n</B>

th group of
the regular expression that was matched by the last call to a
matching or searching function (see
<B>Str.matched_string</B>

for details).
<P>
<P>
<B>Raises Not_found</B>

if the
<B>n</B>

th group of the regular expression
was not matched.
<P>
<P>
<B>Raises Invalid_argument</B>

if there are fewer than
<B>n</B>

groups in
the regular expression.
<P>
<P>
<P>
<I>val group_end </I>

:
<B>int -&gt; int</B>

<P>
<P>
<B>group_end n</B>

returns
the position of the character following the last character of
substring that was matched by the
<B>n</B>

th group of the regular
expression that was matched by the last call to a matching or
searching function (see
<B>Str.matched_string</B>

for details).
<P>
<P>
<B>Raises Not_found</B>

if the
<B>n</B>

th group of the regular expression
was not matched.
<P>
<P>
<B>Raises Invalid_argument</B>

if there are fewer than
<B>n</B>

groups in
the regular expression.
<P>
<P>
<P>
<P>

<A NAME="lbAG">&nbsp;</A>
<H3>Replacement</H3>

<P>
<P>

<P>
<I>val global_replace </I>

:
<B>regexp -&gt; string -&gt; string -&gt; string</B>

<P>
<P>
<B>global_replace regexp templ s</B>

returns a string identical to
<B>s</B>

,
except that all substrings of
<B>s</B>

that match
<B>regexp</B>

have been
replaced by
<B>templ</B>

. The replacement template
<B>templ</B>

can contain
<B>\1</B>

,
<B>\2</B>

, etc; these sequences will be replaced by the text
matched by the corresponding group in the regular expression.
<B>\0</B>

stands for the text matched by the whole regular expression.
<P>
<P>
<P>
<I>val replace_first </I>

:
<B>regexp -&gt; string -&gt; string -&gt; string</B>

<P>
Same as
<B>Str.global_replace</B>

, except that only the first substring
matching the regular expression is replaced.
<P>
<P>
<P>
<I>val global_substitute </I>

:
<B>regexp -&gt; (string -&gt; string) -&gt; string -&gt; string</B>

<P>
<P>
<B>global_substitute regexp subst s</B>

returns a string identical
to
<B>s</B>

, except that all substrings of
<B>s</B>

that match
<B>regexp</B>

have been replaced by the result of function
<B>subst</B>

. The
function
<B>subst</B>

is called once for each matching substring,
and receives
<B>s</B>

(the whole text) as argument.
<P>
<P>
<P>
<I>val substitute_first </I>

:
<B>regexp -&gt; (string -&gt; string) -&gt; string -&gt; string</B>

<P>
Same as
<B>Str.global_substitute</B>

, except that only the first substring
matching the regular expression is replaced.
<P>
<P>
<P>
<I>val replace_matched </I>

:
<B>string -&gt; string -&gt; string</B>

<P>
<P>
<B>replace_matched repl s</B>

returns the replacement text
<B>repl</B>

in which
<B>\1</B>

,
<B>\2</B>

, etc. have been replaced by the text
matched by the corresponding groups in the regular expression
that was matched by the last call to a matching or searching
function (see
<B>Str.matched_string</B>

for details).
<B>s</B>

must be the same string that was passed to the matching or
searching function.
<P>
<P>
<P>
<P>

<A NAME="lbAH">&nbsp;</A>
<H3>Splitting</H3>

<P>
<P>

<P>
<I>val split </I>

:
<B>regexp -&gt; string -&gt; string list</B>

<P>
<P>
<B>split r s</B>

splits
<B>s</B>

into substrings, taking as delimiters
the substrings that match
<B>r</B>

, and returns the list of substrings.
For instance,
<B>split (regexp [ \t]+) s</B>

splits
<B>s</B>

into
blank-separated words.  An occurrence of the delimiter at the
beginning or at the end of the string is ignored.
<P>
<P>
<P>
<I>val bounded_split </I>

:
<B>regexp -&gt; string -&gt; int -&gt; string list</B>

<P>
Same as
<B>Str.split</B>

, but splits into at most
<B>n</B>

substrings,
where
<B>n</B>

is the extra integer parameter.
<P>
<P>
<P>
<I>val split_delim </I>

:
<B>regexp -&gt; string -&gt; string list</B>

<P>
Same as
<B>Str.split</B>

but occurrences of the
delimiter at the beginning and at the end of the string are
recognized and returned as empty strings in the result.
For instance,
<B>split_delim (regexp  )  abc </B>

returns
<B>[; abc; ]</B>

, while
<B>split</B>

with the same
arguments returns
<B>[abc]</B>

.
<P>
<P>
<P>
<I>val bounded_split_delim </I>

:
<B>regexp -&gt; string -&gt; int -&gt; string list</B>

<P>
Same as
<B>Str.bounded_split</B>

, but occurrences of the
delimiter at the beginning and at the end of the string are
recognized and returned as empty strings in the result.
<P>
<P>
<I>type split_result </I>

=
<BR>&nbsp;|&nbsp;Text
<B>of </B>

<B>string</B>

<BR>&nbsp;|&nbsp;Delim
<B>of </B>

<B>string</B>

<BR>&nbsp;
<P>
<P>
<P>
<P>
<I>val full_split </I>

:
<B>regexp -&gt; string -&gt; split_result list</B>

<P>
Same as
<B>Str.split_delim</B>

, but returns
the delimiters as well as the substrings contained between
delimiters.  The former are tagged
<B>Delim</B>

in the result list;
the latter are tagged
<B>Text</B>

.  For instance,
<B>full_split (regexp [{}]) {ab}</B>

returns
<B>[Delim {; Text ab; Delim }]</B>

.
<P>
<P>
<P>
<I>val bounded_full_split </I>

:
<B>regexp -&gt; string -&gt; int -&gt; split_result list</B>

<P>
Same as
<B>Str.bounded_split_delim</B>

, but returns
the delimiters as well as the substrings contained between
delimiters.  The former are tagged
<B>Delim</B>

in the result list;
the latter are tagged
<B>Text</B>

.
<P>
<P>
<P>
<P>

<A NAME="lbAI">&nbsp;</A>
<H3>Extracting substrings</H3>

<P>
<P>

<P>
<I>val string_before </I>

:
<B>string -&gt; int -&gt; string</B>

<P>
<P>
<B>string_before s n</B>

returns the substring of all characters of
<B>s</B>

that precede position
<B>n</B>

(excluding the character at
position
<B>n</B>

).
<P>
<P>
<P>
<I>val string_after </I>

:
<B>string -&gt; int -&gt; string</B>

<P>
<P>
<B>string_after s n</B>

returns the substring of all characters of
<B>s</B>

that follow position
<B>n</B>

(including the character at
position
<B>n</B>

).
<P>
<P>
<P>
<I>val first_chars </I>

:
<B>string -&gt; int -&gt; string</B>

<P>
<P>
<B>first_chars s n</B>

returns the first
<B>n</B>

characters of
<B>s</B>

.
This is the same function as
<B>Str.string_before</B>

.
<P>
<P>
<P>
<I>val last_chars </I>

:
<B>string -&gt; int -&gt; string</B>

<P>
<P>
<B>last_chars s n</B>

returns the last
<B>n</B>

characters of
<B>s</B>

.
<P>
<P>
<P>

<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="1"><A HREF="#lbAB">NAME</A><DD>
<DT id="2"><A HREF="#lbAC">Module</A><DD>
<DT id="3"><A HREF="#lbAD">Documentation</A><DD>
<DL>
<DT id="4"><A HREF="#lbAE">Regular expressions</A><DD>
<DT id="5"><A HREF="#lbAF">String matching and searching</A><DD>
<DT id="6"><A HREF="#lbAG">Replacement</A><DD>
<DT id="7"><A HREF="#lbAH">Splitting</A><DD>
<DT id="8"><A HREF="#lbAI">Extracting substrings</A><DD>
</DL>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:58 GMT, March 31, 2021
</BODY>
</HTML>