516 lines
10 KiB
HTML
516 lines
10 KiB
HTML
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<HTML><HEAD><TITLE>Man page of REGEX</TITLE>
|
|
</HEAD><BODY>
|
|
<H1>REGEX</H1>
|
|
Section: Linux Programmer's Manual (3)<BR>Updated: 2019-10-10<BR><A HREF="#index">Index</A>
|
|
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
|
|
|
|
<A NAME="lbAB"> </A>
|
|
<H2>NAME</H2>
|
|
|
|
regcomp, regexec, regerror, regfree - POSIX regex functions
|
|
<A NAME="lbAC"> </A>
|
|
<H2>SYNOPSIS</H2>
|
|
|
|
<PRE>
|
|
<B>#include <<A HREF="file:///usr/include/sys/types.h">sys/types.h</A>></B>
|
|
<B>#include <<A HREF="file:///usr/include/regex.h">regex.h</A>></B>
|
|
|
|
<B>int regcomp(regex_t *</B><I>preg</I><B>, const char *</B><I>regex</I><B>, int </B><I>cflags</I><B>);</B>
|
|
|
|
<B>int regexec(const regex_t *</B><I>preg</I><B>, const char *</B><I>string</I><B>, size_t </B><I>nmatch</I><B>,</B>
|
|
<B> regmatch_t </B><I>pmatch[]</I><B>, int </B><I>eflags</I><B>);</B>
|
|
|
|
<B>size_t regerror(int </B><I>errcode</I><B>, const regex_t *</B><I>preg</I><B>, char *</B><I>errbuf</I><B>,</B>
|
|
<B> size_t </B><I>errbuf_size</I><B>);</B>
|
|
|
|
<B>void regfree(regex_t *</B><I>preg</I><B>);</B>
|
|
</PRE>
|
|
|
|
<A NAME="lbAD"> </A>
|
|
<H2>DESCRIPTION</H2>
|
|
|
|
<A NAME="lbAE"> </A>
|
|
<H3>POSIX regex compiling</H3>
|
|
|
|
<B>regcomp</B>()
|
|
|
|
is used to compile a regular expression into a form that is suitable
|
|
for subsequent
|
|
<B>regexec</B>()
|
|
|
|
searches.
|
|
<P>
|
|
|
|
<B>regcomp</B>()
|
|
|
|
is supplied with
|
|
<I>preg</I>,
|
|
|
|
a pointer to a pattern buffer storage area;
|
|
<I>regex</I>,
|
|
|
|
a pointer to the null-terminated string and
|
|
<I>cflags</I>,
|
|
|
|
flags used to determine the type of compilation.
|
|
<P>
|
|
|
|
All regular expression searching must be done via a compiled pattern
|
|
buffer, thus
|
|
<B>regexec</B>()
|
|
|
|
must always be supplied with the address of a
|
|
<B>regcomp</B>()
|
|
|
|
initialized pattern buffer.
|
|
<P>
|
|
|
|
<I>cflags</I>
|
|
|
|
may be the
|
|
bitwise-<B>or</B>
|
|
|
|
of zero or more of the following:
|
|
<DL COMPACT>
|
|
<DT id="1"><B>REG_EXTENDED</B>
|
|
|
|
<DD>
|
|
Use
|
|
<B>POSIX</B>
|
|
|
|
Extended Regular Expression syntax when interpreting
|
|
<I>regex</I>.
|
|
|
|
If not set,
|
|
<B>POSIX</B>
|
|
|
|
Basic Regular Expression syntax is used.
|
|
<DT id="2"><B>REG_ICASE</B>
|
|
|
|
<DD>
|
|
Do not differentiate case.
|
|
Subsequent
|
|
<B>regexec</B>()
|
|
|
|
searches using this pattern buffer will be case insensitive.
|
|
<DT id="3"><B>REG_NOSUB</B>
|
|
|
|
<DD>
|
|
Do not report position of matches.
|
|
The
|
|
<I>nmatch</I>
|
|
|
|
and
|
|
<I>pmatch</I>
|
|
|
|
arguments to
|
|
<B>regexec</B>()
|
|
|
|
are ignored if the pattern buffer supplied was compiled with this flag set.
|
|
<DT id="4"><B>REG_NEWLINE</B>
|
|
|
|
<DD>
|
|
Match-any-character operators don't match a newline.
|
|
<DT id="5"><DD>
|
|
A nonmatching list
|
|
(<B>[^...]</B>)
|
|
|
|
not containing a newline does not match a newline.
|
|
<DT id="6"><DD>
|
|
Match-beginning-of-line operator
|
|
(<B>^</B>)
|
|
|
|
matches the empty string immediately after a newline, regardless of
|
|
whether
|
|
<I>eflags</I>,
|
|
|
|
the execution flags of
|
|
<B>regexec</B>(),
|
|
|
|
contains
|
|
<B>REG_NOTBOL</B>.
|
|
|
|
<DT id="7"><DD>
|
|
Match-end-of-line operator
|
|
(<B>$</B>)
|
|
|
|
matches the empty string immediately before a newline, regardless of
|
|
whether
|
|
<I>eflags</I>
|
|
|
|
contains
|
|
<B>REG_NOTEOL</B>.
|
|
|
|
</DL>
|
|
<A NAME="lbAF"> </A>
|
|
<H3>POSIX regex matching</H3>
|
|
|
|
<B>regexec</B>()
|
|
|
|
is used to match a null-terminated string
|
|
against the precompiled pattern buffer,
|
|
<I>preg</I>.
|
|
|
|
<I>nmatch</I>
|
|
|
|
and
|
|
<I>pmatch</I>
|
|
|
|
are used to provide information regarding the location of any matches.
|
|
<I>eflags</I>
|
|
|
|
may be the
|
|
bitwise-<B>or</B>
|
|
|
|
of one or both of
|
|
<B>REG_NOTBOL</B>
|
|
|
|
and
|
|
<B>REG_NOTEOL</B>
|
|
|
|
which cause changes in matching behavior described below.
|
|
<DL COMPACT>
|
|
<DT id="8"><B>REG_NOTBOL</B>
|
|
|
|
<DD>
|
|
The match-beginning-of-line operator always fails to match (but see the
|
|
compilation flag
|
|
<B>REG_NEWLINE</B>
|
|
|
|
above).
|
|
This flag may be used when different portions of a string are passed to
|
|
<B>regexec</B>()
|
|
|
|
and the beginning of the string should not be interpreted as the
|
|
beginning of the line.
|
|
<DT id="9"><B>REG_NOTEOL</B>
|
|
|
|
<DD>
|
|
The match-end-of-line operator always fails to match (but see the
|
|
compilation flag
|
|
<B>REG_NEWLINE</B>
|
|
|
|
above).
|
|
<DT id="10"><B>REG_STARTEND</B>
|
|
|
|
<DD>
|
|
Use
|
|
<I>pmatch[0]</I>
|
|
|
|
on the input string, starting at byte
|
|
<I>pmatch[0].rm_so</I>
|
|
|
|
and ending before byte
|
|
<I>pmatch[0].rm_eo</I>.
|
|
|
|
This allows matching embedded NUL bytes
|
|
and avoids a
|
|
<B><A HREF="/cgi-bin/man/man2html?3+strlen">strlen</A></B>(3)
|
|
|
|
on large strings.
|
|
It does not use
|
|
<I>nmatch</I>
|
|
|
|
on input, and does not change
|
|
<B>REG_NOTBOL</B>
|
|
|
|
or
|
|
<B>REG_NEWLINE</B>
|
|
|
|
processing.
|
|
This flag is a BSD extension, not present in POSIX.
|
|
</DL>
|
|
<A NAME="lbAG"> </A>
|
|
<H3>Byte offsets</H3>
|
|
|
|
Unless
|
|
<B>REG_NOSUB</B>
|
|
|
|
was set for the compilation of the pattern buffer, it is possible to
|
|
obtain match addressing information.
|
|
<I>pmatch</I>
|
|
|
|
must be dimensioned to have at least
|
|
<I>nmatch</I>
|
|
|
|
elements.
|
|
These are filled in by
|
|
<B>regexec</B>()
|
|
|
|
with substring match addresses.
|
|
The offsets of the subexpression starting at the
|
|
<I>i</I>th
|
|
|
|
open parenthesis are stored in
|
|
<I>pmatch[i]</I>.
|
|
|
|
The entire regular expression's match addresses are stored in
|
|
<I>pmatch[0]</I>.
|
|
|
|
(Note that to return the offsets of
|
|
<I>N</I>
|
|
|
|
subexpression matches,
|
|
<I>nmatch</I>
|
|
|
|
must be at least
|
|
<I>N+1</I>.)
|
|
|
|
Any unused structure elements will contain the value -1.
|
|
<P>
|
|
|
|
The
|
|
<I>regmatch_t</I>
|
|
|
|
structure which is the type of
|
|
<I>pmatch</I>
|
|
|
|
is defined in
|
|
<I><<A HREF="file:///usr/include/regex.h">regex.h</A>></I>.
|
|
|
|
<P>
|
|
|
|
|
|
|
|
typedef struct {
|
|
<BR> regoff_t rm_so;
|
|
<BR> regoff_t rm_eo;
|
|
} regmatch_t;
|
|
|
|
|
|
<P>
|
|
|
|
Each
|
|
<I>rm_so</I>
|
|
|
|
element that is not -1 indicates the start offset of the next largest
|
|
substring match within the string.
|
|
The relative
|
|
<I>rm_eo</I>
|
|
|
|
element indicates the end offset of the match,
|
|
which is the offset of the first character after the matching text.
|
|
<A NAME="lbAH"> </A>
|
|
<H3>POSIX error reporting</H3>
|
|
|
|
<B>regerror</B>()
|
|
|
|
is used to turn the error codes that can be returned by both
|
|
<B>regcomp</B>()
|
|
|
|
and
|
|
<B>regexec</B>()
|
|
|
|
into error message strings.
|
|
<P>
|
|
|
|
<B>regerror</B>()
|
|
|
|
is passed the error code,
|
|
<I>errcode</I>,
|
|
|
|
the pattern buffer,
|
|
<I>preg</I>,
|
|
|
|
a pointer to a character string buffer,
|
|
<I>errbuf</I>,
|
|
|
|
and the size of the string buffer,
|
|
<I>errbuf_size</I>.
|
|
|
|
It returns the size of the
|
|
<I>errbuf</I>
|
|
|
|
required to contain the null-terminated error message string.
|
|
If both
|
|
<I>errbuf</I>
|
|
|
|
and
|
|
<I>errbuf_size</I>
|
|
|
|
are nonzero,
|
|
<I>errbuf</I>
|
|
|
|
is filled in with the first
|
|
<I>errbuf_size - 1</I>
|
|
|
|
characters of the error message and a terminating null byte ('\0').
|
|
<A NAME="lbAI"> </A>
|
|
<H3>POSIX pattern buffer freeing</H3>
|
|
|
|
Supplying
|
|
<B>regfree</B>()
|
|
|
|
with a precompiled pattern buffer,
|
|
<I>preg</I>
|
|
|
|
will free the memory allocated to the pattern buffer by the compiling
|
|
process,
|
|
<B>regcomp</B>().
|
|
|
|
<A NAME="lbAJ"> </A>
|
|
<H2>RETURN VALUE</H2>
|
|
|
|
<B>regcomp</B>()
|
|
|
|
returns zero for a successful compilation or an error code for failure.
|
|
<P>
|
|
|
|
<B>regexec</B>()
|
|
|
|
returns zero for a successful match or
|
|
<B>REG_NOMATCH</B>
|
|
|
|
for failure.
|
|
<A NAME="lbAK"> </A>
|
|
<H2>ERRORS</H2>
|
|
|
|
The following errors can be returned by
|
|
<B>regcomp</B>():
|
|
|
|
<DL COMPACT>
|
|
<DT id="11"><B>REG_BADBR</B>
|
|
|
|
<DD>
|
|
Invalid use of back reference operator.
|
|
<DT id="12"><B>REG_BADPAT</B>
|
|
|
|
<DD>
|
|
Invalid use of pattern operators such as group or list.
|
|
<DT id="13"><B>REG_BADRPT</B>
|
|
|
|
<DD>
|
|
Invalid use of repetition operators such as using '*'
|
|
as the first character.
|
|
<DT id="14"><B>REG_EBRACE</B>
|
|
|
|
<DD>
|
|
Un-matched brace interval operators.
|
|
<DT id="15"><B>REG_EBRACK</B>
|
|
|
|
<DD>
|
|
Un-matched bracket list operators.
|
|
<DT id="16"><B>REG_ECOLLATE</B>
|
|
|
|
<DD>
|
|
Invalid collating element.
|
|
<DT id="17"><B>REG_ECTYPE</B>
|
|
|
|
<DD>
|
|
Unknown character class name.
|
|
<DT id="18"><B>REG_EEND</B>
|
|
|
|
<DD>
|
|
Nonspecific error.
|
|
This is not defined by POSIX.2.
|
|
<DT id="19"><B>REG_EESCAPE</B>
|
|
|
|
<DD>
|
|
Trailing backslash.
|
|
<DT id="20"><B>REG_EPAREN</B>
|
|
|
|
<DD>
|
|
Un-matched parenthesis group operators.
|
|
<DT id="21"><B>REG_ERANGE</B>
|
|
|
|
<DD>
|
|
Invalid use of the range operator; for example, the ending point of the range
|
|
occurs prior to the starting point.
|
|
<DT id="22"><B>REG_ESIZE</B>
|
|
|
|
<DD>
|
|
Compiled regular expression requires a pattern buffer larger than 64 kB.
|
|
This is not defined by POSIX.2.
|
|
<DT id="23"><B>REG_ESPACE</B>
|
|
|
|
<DD>
|
|
The regex routines ran out of memory.
|
|
<DT id="24"><B>REG_ESUBREG</B>
|
|
|
|
<DD>
|
|
Invalid back reference to a subexpression.
|
|
</DL>
|
|
<A NAME="lbAL"> </A>
|
|
<H2>ATTRIBUTES</H2>
|
|
|
|
For an explanation of the terms used in this section, see
|
|
<B><A HREF="/cgi-bin/man/man2html?7+attributes">attributes</A></B>(7).
|
|
|
|
<TABLE BORDER>
|
|
<TR VALIGN=top><TD><B>Interface</B></TD><TD><B>Attribute</B></TD><TD><B>Value</B><BR></TD></TR>
|
|
<TR VALIGN=top><TD>
|
|
<B>regcomp</B>(),
|
|
|
|
<B>regexec</B>()
|
|
|
|
</TD><TD>Thread safety</TD><TD>MT-Safe locale<BR></TD></TR>
|
|
<TR VALIGN=top><TD>
|
|
<B>regerror</B>()
|
|
|
|
</TD><TD>Thread safety</TD><TD>MT-Safe env<BR></TD></TR>
|
|
<TR VALIGN=top><TD>
|
|
<B>regfree</B>()
|
|
|
|
</TD><TD>Thread safety</TD><TD>MT-Safe<BR></TD></TR>
|
|
</TABLE>
|
|
|
|
<A NAME="lbAM"> </A>
|
|
<H2>CONFORMING TO</H2>
|
|
|
|
POSIX.1-2001, POSIX.1-2008.
|
|
<A NAME="lbAN"> </A>
|
|
<H2>SEE ALSO</H2>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?1+grep">grep</A></B>(1),
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?7+regex">regex</A></B>(7)
|
|
|
|
<P>
|
|
|
|
The glibc manual section,
|
|
<I>Regular Expressions</I>
|
|
|
|
<A NAME="lbAO"> </A>
|
|
<H2>COLOPHON</H2>
|
|
|
|
This page is part of release 5.05 of the Linux
|
|
<I>man-pages</I>
|
|
|
|
project.
|
|
A description of the project,
|
|
information about reporting bugs,
|
|
and the latest version of this page,
|
|
can be found at
|
|
<A HREF="https://www.kernel.org/doc/man-pages/.">https://www.kernel.org/doc/man-pages/.</A>
|
|
<P>
|
|
|
|
<HR>
|
|
<A NAME="index"> </A><H2>Index</H2>
|
|
<DL>
|
|
<DT id="25"><A HREF="#lbAB">NAME</A><DD>
|
|
<DT id="26"><A HREF="#lbAC">SYNOPSIS</A><DD>
|
|
<DT id="27"><A HREF="#lbAD">DESCRIPTION</A><DD>
|
|
<DL>
|
|
<DT id="28"><A HREF="#lbAE">POSIX regex compiling</A><DD>
|
|
<DT id="29"><A HREF="#lbAF">POSIX regex matching</A><DD>
|
|
<DT id="30"><A HREF="#lbAG">Byte offsets</A><DD>
|
|
<DT id="31"><A HREF="#lbAH">POSIX error reporting</A><DD>
|
|
<DT id="32"><A HREF="#lbAI">POSIX pattern buffer freeing</A><DD>
|
|
</DL>
|
|
<DT id="33"><A HREF="#lbAJ">RETURN VALUE</A><DD>
|
|
<DT id="34"><A HREF="#lbAK">ERRORS</A><DD>
|
|
<DT id="35"><A HREF="#lbAL">ATTRIBUTES</A><DD>
|
|
<DT id="36"><A HREF="#lbAM">CONFORMING TO</A><DD>
|
|
<DT id="37"><A HREF="#lbAN">SEE ALSO</A><DD>
|
|
<DT id="38"><A HREF="#lbAO">COLOPHON</A><DD>
|
|
</DL>
|
|
<HR>
|
|
This document was created by
|
|
<A HREF="/cgi-bin/man/man2html">man2html</A>,
|
|
using the manual pages.<BR>
|
|
Time: 00:05:54 GMT, March 31, 2021
|
|
</BODY>
|
|
</HTML>
|