man-pages/man3/regex.3.html
2021-03-31 01:06:50 +01:00

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of REGEX</TITLE>
</HEAD><BODY>
<H1>REGEX</H1>
Section: Linux Programmer's Manual (3)<BR>Updated: 2019-10-10<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>
regcomp, regexec, regerror, regfree - POSIX regex functions
<A NAME="lbAC">&nbsp;</A>
<H2>SYNOPSIS</H2>
<PRE>
<B>#include &lt;<A HREF="file:///usr/include/sys/types.h">sys/types.h</A>&gt;</B>
<B>#include &lt;<A HREF="file:///usr/include/regex.h">regex.h</A>&gt;</B>
<B>int regcomp(regex_t *</B><I>preg</I><B>, const char *</B><I>regex</I><B>, int </B><I>cflags</I><B>);</B>
<B>int regexec(const regex_t *</B><I>preg</I><B>, const char *</B><I>string</I><B>, size_t </B><I>nmatch</I><B>,</B>
<B> regmatch_t </B><I>pmatch[]</I><B>, int </B><I>eflags</I><B>);</B>
<B>size_t regerror(int </B><I>errcode</I><B>, const regex_t *</B><I>preg</I><B>, char *</B><I>errbuf</I><B>,</B>
<B> size_t </B><I>errbuf_size</I><B>);</B>
<B>void regfree(regex_t *</B><I>preg</I><B>);</B>
</PRE>
<A NAME="lbAD">&nbsp;</A>
<H2>DESCRIPTION</H2>
<A NAME="lbAE">&nbsp;</A>
<H3>POSIX regex compiling</H3>
<B>regcomp</B>()
is used to compile a regular expression into a form that is suitable
for subsequent
<B>regexec</B>()
searches.
<P>
<B>regcomp</B>()
is supplied with
<I>preg</I>,
a pointer to a pattern buffer storage area;
<I>regex</I>,
a pointer to the null-terminated string containing the regular expression; and
<I>cflags</I>,
flags used to determine the type of compilation.
<P>
All regular expression searching must be done via a compiled pattern
buffer, thus
<B>regexec</B>()
must always be supplied with the address of a
<B>regcomp</B>()
initialized pattern buffer.
<P>
<I>cflags</I>
may be the
bitwise-<B>or</B>
of zero or more of the following:
<DL COMPACT>
<DT id="1"><B>REG_EXTENDED</B>
<DD>
Use
<B>POSIX</B>
Extended Regular Expression syntax when interpreting
<I>regex</I>.
If not set,
<B>POSIX</B>
Basic Regular Expression syntax is used.
<DT id="2"><B>REG_ICASE</B>
<DD>
Do not differentiate case.
Subsequent
<B>regexec</B>()
searches using this pattern buffer will be case insensitive.
<DT id="3"><B>REG_NOSUB</B>
<DD>
Do not report position of matches.
The
<I>nmatch</I>
and
<I>pmatch</I>
arguments to
<B>regexec</B>()
are ignored if the pattern buffer supplied was compiled with this flag set.
<DT id="4"><B>REG_NEWLINE</B>
<DD>
Match-any-character operators don't match a newline.
<DT id="5"><DD>
A nonmatching list
(<B>[^...]</B>)
not containing a newline does not match a newline.
<DT id="6"><DD>
Match-beginning-of-line operator
(<B>^</B>)
matches the empty string immediately after a newline, regardless of
whether
<I>eflags</I>,
the execution flags of
<B>regexec</B>(),
contains
<B>REG_NOTBOL</B>.
<DT id="7"><DD>
Match-end-of-line operator
(<B>$</B>)
matches the empty string immediately before a newline, regardless of
whether
<I>eflags</I>
contains
<B>REG_NOTEOL</B>.
</DL>
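<P>
As a minimal sketch of how the compilation flags combine, the helper below
compiles a pattern, runs a single search, and frees the pattern buffer.
The name
<I>matches</I>
and its return convention are assumptions made for this illustration; they are not part of the regex API.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of the regex API): compile `pattern`
 * with the caller's cflags plus REG_NOSUB, search `text` once, and
 * release the pattern buffer.
 * Returns 1 on a match, 0 otherwise, -1 if compilation failed. */
int matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    /* With REG_NOSUB the nmatch and pmatch arguments are ignored,
     * so 0 and NULL are passed here. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    /* Any nonzero regexec() result is treated as "no match" in this sketch. */
    return rc == 0 ? 1 : 0;
}
```

<P>
For example, a call such as
<I>matches("^b", "a\nb", REG_NEWLINE)</I>
should report a match, because
<B>^</B>
then matches immediately after the newline; with
<I>cflags</I>
set to 0 the same call should not.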
<A NAME="lbAF">&nbsp;</A>
<H3>POSIX regex matching</H3>
<B>regexec</B>()
is used to match a null-terminated string
against the precompiled pattern buffer,
<I>preg</I>.
<I>nmatch</I>
and
<I>pmatch</I>
are used to provide information regarding the location of any matches.
<I>eflags</I>
may be the
bitwise-<B>or</B>
of zero or more of
<B>REG_NOTBOL</B>,
<B>REG_NOTEOL</B>,
and
<B>REG_STARTEND</B>,
which change the matching behavior as described below.
<DL COMPACT>
<DT id="8"><B>REG_NOTBOL</B>
<DD>
The match-beginning-of-line operator always fails to match (but see the
compilation flag
<B>REG_NEWLINE</B>
above).
This flag may be used when different portions of a string are passed to
<B>regexec</B>()
and the beginning of the string should not be interpreted as the
beginning of the line.
<DT id="9"><B>REG_NOTEOL</B>
<DD>
The match-end-of-line operator always fails to match (but see the
compilation flag
<B>REG_NEWLINE</B>
above).
<DT id="10"><B>REG_STARTEND</B>
<DD>
Match only the portion of the input string that starts at byte
<I>pmatch[0].rm_so</I>
and ends before byte
<I>pmatch[0].rm_eo</I>.
This allows matching embedded NUL bytes
and avoids a
<B><A HREF="/cgi-bin/man/man2html?3+strlen">strlen</A></B>(3)
on large strings.
It does not use
<I>nmatch</I>
on input, and does not change
<B>REG_NOTBOL</B>
or
<B>REG_NEWLINE</B>
processing.
This flag is a BSD extension, not present in POSIX.
</DL>
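<P>
The use of
<B>REG_NOTBOL</B>
when passing successive portions of a string can be sketched as follows.
The function name
<I>count_matches</I>
and the rule for stepping past an empty match are assumptions of this example, not behavior defined by the interface.

```c
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: count non-overlapping matches of an extended
 * regular expression in `text`. Returns -1 if compilation failed. */
int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    int count = 0;
    int eflags = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, text, 1, m, eflags) == 0) {
        count++;
        if (m[0].rm_eo > 0)
            text += m[0].rm_eo;     /* continue after the match */
        else if (*text != '\0')
            text++;                 /* empty match at offset 0: step one byte */
        else
            break;                  /* empty match at end of string */

        /* The remaining text no longer starts at the beginning of the
         * line, so '^' must not match at its first byte. */
        eflags = REG_NOTBOL;
    }

    regfree(&re);
    return count;
}
```

<P>
Without the switch to
<B>REG_NOTBOL</B>,
a pattern such as
<B>^a</B>
would be counted once per leading
<I>a</I>
in the string rather than once.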
<A NAME="lbAG">&nbsp;</A>
<H3>Byte offsets</H3>
Unless
<B>REG_NOSUB</B>
was set for the compilation of the pattern buffer, it is possible to
obtain the byte offsets of matches.
<I>pmatch</I>
must be dimensioned to have at least
<I>nmatch</I>
elements.
These are filled in by
<B>regexec</B>()
with the byte offsets of substring matches.
The offsets of the subexpression starting at the
<I>i</I>th
open parenthesis are stored in
<I>pmatch[i]</I>.
The offsets of the match for the entire regular expression are stored in
<I>pmatch[0]</I>.
(Note that to return the offsets of
<I>N</I>
subexpression matches,
<I>nmatch</I>
must be at least
<I>N+1</I>.)
Any unused structure elements will contain the value -1.
<P>
The
<I>regmatch_t</I>
structure which is the type of
<I>pmatch</I>
is defined in
<I>&lt;<A HREF="file:///usr/include/regex.h">regex.h</A>&gt;</I>.
<P>
<PRE>
typedef struct {
    regoff_t rm_so;
    regoff_t rm_eo;
} regmatch_t;
</PRE>
<P>
Each
<I>rm_so</I>
element that is not -1 indicates the start offset of the next largest
substring match within the string.
The corresponding
<I>rm_eo</I>
element indicates the end offset of the match,
which is the offset of the first character after the matching text.
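<P>
A short sketch of reading subexpression offsets: the pattern has two
parenthesized groups, so
<I>nmatch</I>
is 3.
The function name
<I>split_pair</I>,
the fixed pattern, and the output buffers are assumptions chosen for this illustration.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: find the first "name=digits" pair in `text` and
 * copy both parts into caller-supplied buffers (truncating if needed).
 * Returns 0 on success, -1 on no match or compilation failure. */
int split_pair(const char *text, char *key, size_t keysz,
               char *val, size_t valsz)
{
    regex_t re;
    regmatch_t m[3];    /* m[0]: whole match; m[1], m[2]: the two groups */
    int rc;

    if (regcomp(&re, "([a-z]+)=([0-9]+)", REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, text, 3, m, 0);
    if (rc == 0) {
        /* rm_eo is the offset of the first byte after the match, so
         * rm_eo - rm_so is the length of the matched text. */
        snprintf(key, keysz, "%.*s",
                 (int)(m[1].rm_eo - m[1].rm_so), text + m[1].rm_so);
        snprintf(val, valsz, "%.*s",
                 (int)(m[2].rm_eo - m[2].rm_so), text + m[2].rm_so);
    }

    regfree(&re);
    return rc == 0 ? 0 : -1;
}
```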
<A NAME="lbAH">&nbsp;</A>
<H3>POSIX error reporting</H3>
<B>regerror</B>()
is used to turn the error codes that can be returned by both
<B>regcomp</B>()
and
<B>regexec</B>()
into error message strings.
<P>
<B>regerror</B>()
is passed the error code,
<I>errcode</I>,
the pattern buffer,
<I>preg</I>,
a pointer to a character string buffer,
<I>errbuf</I>,
and the size of the string buffer,
<I>errbuf_size</I>.
It returns the size of the
<I>errbuf</I>
required to contain the null-terminated error message string.
If both
<I>errbuf</I>
and
<I>errbuf_size</I>
are nonzero,
<I>errbuf</I>
is filled in with the first
<I>errbuf_size - 1</I>
characters of the error message and a terminating null byte ('\0').
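<P>
Because
<B>regerror</B>()
returns the required buffer size, it can be called twice: once to learn the
size and once to fill a buffer of that size.
The sketch below assumes a caller that wants a heap-allocated message; the name
<I>regex_strerror</I>
is hypothetical.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc()ed copy of the message for
 * `errcode`, or NULL if allocation fails. The caller frees the result. */
char *regex_strerror(int errcode, const regex_t *preg)
{
    /* First call: a zero-sized buffer only asks for the required size,
     * which includes the terminating null byte. */
    size_t need = regerror(errcode, preg, NULL, 0);
    char *buf = malloc(need);

    /* Second call: fill the buffer with the complete message. */
    if (buf != NULL)
        regerror(errcode, preg, buf, need);
    return buf;
}
```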
<A NAME="lbAI">&nbsp;</A>
<H3>POSIX pattern buffer freeing</H3>
Supplying
<B>regfree</B>()
with a precompiled pattern buffer,
<I>preg</I>,
will free the memory allocated to the pattern buffer by the compiling
process,
<B>regcomp</B>().
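<P>
A typical lifetime for a pattern buffer is to compile once, search many
times, and free once.
The following sketch illustrates that ordering; the function name
<I>count_matching</I>
and its parameters are assumptions of the example.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the `n` strings in `items`
 * match the extended regular expression `pattern`.
 * Returns -1 if compilation failed. */
int count_matching(const char *pattern, const char *const *items, size_t n)
{
    regex_t re;
    int hits = 0;
    size_t i;

    /* This sketch assumes the buffer holds nothing the caller must
     * release when regcomp() fails, so regfree() is skipped here. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* The same compiled buffer is reused for every search. */
    for (i = 0; i < n; i++)
        if (regexec(&re, items[i], 0, NULL, 0) == 0)
            hits++;

    /* Release the memory allocated by regcomp() exactly once. */
    regfree(&re);
    return hits;
}
```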
<A NAME="lbAJ">&nbsp;</A>
<H2>RETURN VALUE</H2>
<B>regcomp</B>()
returns zero for a successful compilation or an error code for failure.
<P>
<B>regexec</B>()
returns zero for a successful match or
<B>REG_NOMATCH</B>
for failure.
<A NAME="lbAK">&nbsp;</A>
<H2>ERRORS</H2>
The following errors can be returned by
<B>regcomp</B>():
<DL COMPACT>
<DT id="11"><B>REG_BADBR</B>
<DD>
Invalid content of a brace interval operator; for example, a repetition
count that is not a number or a minimum that exceeds the maximum.
<DT id="12"><B>REG_BADPAT</B>
<DD>
Invalid use of pattern operators such as group or list.
<DT id="13"><B>REG_BADRPT</B>
<DD>
Invalid use of repetition operators such as using '*'
as the first character.
<DT id="14"><B>REG_EBRACE</B>
<DD>
Unmatched brace interval operators.
<DT id="15"><B>REG_EBRACK</B>
<DD>
Unmatched bracket list operators.
<DT id="16"><B>REG_ECOLLATE</B>
<DD>
Invalid collating element.
<DT id="17"><B>REG_ECTYPE</B>
<DD>
Unknown character class name.
<DT id="18"><B>REG_EEND</B>
<DD>
Nonspecific error.
This is not defined by POSIX.2.
<DT id="19"><B>REG_EESCAPE</B>
<DD>
Trailing backslash.
<DT id="20"><B>REG_EPAREN</B>
<DD>
Unmatched parenthesis group operators.
<DT id="21"><B>REG_ERANGE</B>
<DD>
Invalid use of the range operator; for example, the ending point of the range
occurs prior to the starting point.
<DT id="22"><B>REG_ESIZE</B>
<DD>
Compiled regular expression requires a pattern buffer larger than 64&nbsp;kB.
This is not defined by POSIX.2.
<DT id="23"><B>REG_ESPACE</B>
<DD>
The regex routines ran out of memory.
<DT id="24"><B>REG_ESUBREG</B>
<DD>
Invalid back reference to a subexpression.
</DL>
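<P>
When logging failures it can help to report the symbolic name of an error
code alongside the
<B>regerror</B>()
text.
The sketch below covers only the codes defined by POSIX; the function names
<I>regcomp_error_name</I>
and
<I>try_compile</I>
are hypothetical, and which code a given malformed pattern produces may differ between implementations.

```c
#include <sys/types.h>
#include <regex.h>

/* Hypothetical helper: map a regcomp()/regexec() result to its
 * symbolic name. Codes outside POSIX (such as REG_EEND and REG_ESIZE)
 * are deliberately reported as "unknown" in this sketch. */
const char *regcomp_error_name(int code)
{
    switch (code) {
    case 0:            return "success";
    case REG_NOMATCH:  return "REG_NOMATCH";
    case REG_BADBR:    return "REG_BADBR";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_BADRPT:   return "REG_BADRPT";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    default:           return "unknown";
    }
}

/* Hypothetical helper: compile `pattern` as an extended regular
 * expression and return the regcomp() result, freeing the buffer
 * again if compilation succeeded. */
int try_compile(const char *pattern)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc == 0)
        regfree(&re);
    return rc;
}
```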
<A NAME="lbAL">&nbsp;</A>
<H2>ATTRIBUTES</H2>
For an explanation of the terms used in this section, see
<B><A HREF="/cgi-bin/man/man2html?7+attributes">attributes</A></B>(7).
<TABLE BORDER>
<TR VALIGN=top><TD><B>Interface</B></TD><TD><B>Attribute</B></TD><TD><B>Value</B><BR></TD></TR>
<TR VALIGN=top><TD>
<B>regcomp</B>(),
<B>regexec</B>()
</TD><TD>Thread safety</TD><TD>MT-Safe locale<BR></TD></TR>
<TR VALIGN=top><TD>
<B>regerror</B>()
</TD><TD>Thread safety</TD><TD>MT-Safe env<BR></TD></TR>
<TR VALIGN=top><TD>
<B>regfree</B>()
</TD><TD>Thread safety</TD><TD>MT-Safe<BR></TD></TR>
</TABLE>
<A NAME="lbAM">&nbsp;</A>
<H2>CONFORMING TO</H2>
POSIX.1-2001, POSIX.1-2008.
<A NAME="lbAN">&nbsp;</A>
<H2>SEE ALSO</H2>
<B><A HREF="/cgi-bin/man/man2html?1+grep">grep</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?7+regex">regex</A></B>(7)
<P>
The glibc manual section,
<I>Regular Expressions</I>
<A NAME="lbAO">&nbsp;</A>
<H2>COLOPHON</H2>
This page is part of release 5.05 of the Linux
<I>man-pages</I>
project.
A description of the project,
information about reporting bugs,
and the latest version of this page,
can be found at
<A HREF="https://www.kernel.org/doc/man-pages/">https://www.kernel.org/doc/man-pages/</A>.
<P>
<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="25"><A HREF="#lbAB">NAME</A><DD>
<DT id="26"><A HREF="#lbAC">SYNOPSIS</A><DD>
<DT id="27"><A HREF="#lbAD">DESCRIPTION</A><DD>
<DL>
<DT id="28"><A HREF="#lbAE">POSIX regex compiling</A><DD>
<DT id="29"><A HREF="#lbAF">POSIX regex matching</A><DD>
<DT id="30"><A HREF="#lbAG">Byte offsets</A><DD>
<DT id="31"><A HREF="#lbAH">POSIX error reporting</A><DD>
<DT id="32"><A HREF="#lbAI">POSIX pattern buffer freeing</A><DD>
</DL>
<DT id="33"><A HREF="#lbAJ">RETURN VALUE</A><DD>
<DT id="34"><A HREF="#lbAK">ERRORS</A><DD>
<DT id="35"><A HREF="#lbAL">ATTRIBUTES</A><DD>
<DT id="36"><A HREF="#lbAM">CONFORMING TO</A><DD>
<DT id="37"><A HREF="#lbAN">SEE ALSO</A><DD>
<DT id="38"><A HREF="#lbAO">COLOPHON</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:54 GMT, March 31, 2021
</BODY>
</HTML>