<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of GLOB</TITLE>
</HEAD><BODY>
<H1>GLOB</H1>
Section: Linux Programmer's Manual (7)<BR>Updated: 2016-10-08<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>

<A NAME="lbAB"> </A>
<H2>NAME</H2>

glob - globbing pathnames
<A NAME="lbAC"> </A>
<H2>DESCRIPTION</H2>

Long ago, in UNIX V6, there was a program
<I>/etc/glob</I>
that would expand wildcard patterns.
Soon afterward this became a shell built-in.
<P>

These days there is also a library routine
<B><A HREF="/cgi-bin/man/man2html?3+glob">glob</A></B>(3)
that will perform this function for a user program.
<P>

The rules are as follows (POSIX.2, 3.13).
<A NAME="lbAD"> </A>
<H3>Wildcard matching</H3>

A string is a wildcard pattern if it contains one of the
characters '?', '*' or '['.
Globbing is the operation
that expands a wildcard pattern into the list of pathnames
matching the pattern.
Matching is defined by:
<P>

A '?' (not between brackets) matches any single character.
<P>

A '*' (not between brackets) matches any string,
including the empty string.
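<P>

The two rules above can be tried without touching the filesystem, because
the shell's <I>case</I> statement uses the same wildcard notation.
The following is only a sketch; the helper name <I>matches</I> is made up
for this illustration.
<PRE>
```shell
# Hypothetical helper: succeeds if string $2 matches wildcard pattern $1.
# $1 is expanded unquoted in the case arm, so it is treated as a pattern.
matches() {
    case "$2" in
        $1) return 0 ;;
    esac
    return 1
}

one_char=no; two_chars=no; empty=no; long=no
if matches 'a?c' 'abc';   then one_char=yes;  fi  # '?' matches exactly one character
if matches 'a?c' 'abbc';  then two_chars=yes; fi  # two characters: no match
if matches 'a*c' 'ac';    then empty=yes;     fi  # '*' matches the empty string
if matches 'a*c' 'abbbc'; then long=yes;      fi  # ... and any longer string
echo "$one_char $two_chars $empty $long"          # prints: yes no yes yes
```
</PRE>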
<P>

<B>Character classes</B>

<P>

An expression "<I>[...]</I>" where the first character after the
leading '[' is not an '!' matches a single character,
namely any of the characters enclosed by the brackets.
The string enclosed by the brackets cannot be empty;
therefore ']' can be allowed between the brackets, provided
that it is the first character.
(Thus, "<I>[][!]</I>" matches the
three characters '[', ']' and '!'.)
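<P>

As a rough check of the "<I>[][!]</I>" example, the sketch below tests single
characters against the pattern with a <I>case</I> statement.
The helper <I>matches</I> is a hypothetical name, not part of any standard tool.
<PRE>
```shell
# Hypothetical helper: succeeds if string $2 matches wildcard pattern $1.
matches() {
    case "$2" in
        $1) return 0 ;;
    esac
    return 1
}

open_br=no; close_br=no; bang=no; other=no
if matches '[][!]' '['; then open_br=yes;  fi  # '[' is in the set
if matches '[][!]' ']'; then close_br=yes; fi  # ']' is literal because it comes first
if matches '[][!]' '!'; then bang=yes;     fi  # '!' is literal because it does not come first
if matches '[][!]' 'a'; then other=yes;    fi  # 'a' is not in the set
echo "$open_br $close_br $bang $other"         # prints: yes yes yes no
```
</PRE>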
<P>

<B>Ranges</B>

<P>

There is one special convention:
two characters separated by '-' denote a range.
(Thus, "<I>[A-Fa-f0-9]</I>"
is equivalent to "<I>[ABCDEFabcdef0123456789]</I>".)
One may include '-' in its literal meaning by making it the
first or last character between the brackets.
(Thus, "<I>[]-]</I>" matches just the two characters ']' and '-',
and "<I>[--0]</I>" matches only the
three characters '-', '.' and '0': the range also spans '/',
but '/' cannot be matched.)
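<P>

The range examples can be sketched the same way.
The locale is forced to C here on the assumption that ranges should follow
ASCII order; <I>matches</I> is again a made-up helper.
Note that a <I>case</I> statement has no notion of pathnames, so the '/'
exclusion is not exercised by this sketch.
<PRE>
```shell
LC_ALL=C; export LC_ALL     # assume ASCII ordering for ranges

# Hypothetical helper: succeeds if string $2 matches wildcard pattern $1.
matches() {
    case "$2" in
        $1) return 0 ;;
    esac
    return 1
}

hex_c=no; hex_g=no; lit_dash=no; range_dot=no; range_comma=no
if matches '[A-Fa-f0-9]' 'c'; then hex_c=yes;       fi  # inside a-f
if matches '[A-Fa-f0-9]' 'g'; then hex_g=yes;       fi  # outside every range
if matches '[]-]' '-';        then lit_dash=yes;    fi  # trailing '-' is literal
if matches '[--0]' '.';       then range_dot=yes;   fi  # '.' lies between '-' and '0'
if matches '[--0]' ',';       then range_comma=yes; fi  # ',' sorts before '-'
echo "$hex_c $hex_g $lit_dash $range_dot $range_comma"  # prints: yes no yes yes no
```
</PRE>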
<P>

<B>Complementation</B>

<P>

An expression "<I>[!...]</I>" matches a single character, namely
any character that is not matched by the expression obtained
by removing the first '!' from it.
(Thus, "<I>[!]a-]</I>" matches any
single character except ']', 'a' and '-'.)
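<P>

A small sketch of complementation, using the "<I>[!]a-]</I>" example from the
text and the same hypothetical <I>matches</I> helper:
<PRE>
```shell
# Hypothetical helper: succeeds if string $2 matches wildcard pattern $1.
matches() {
    case "$2" in
        $1) return 0 ;;
    esac
    return 1
}

ch_b=no; ch_a=no; ch_bracket=no; ch_dash=no
if matches '[!]a-]' 'b'; then ch_b=yes;       fi  # not excluded, so it matches
if matches '[!]a-]' 'a'; then ch_a=yes;       fi  # excluded
if matches '[!]a-]' ']'; then ch_bracket=yes; fi  # excluded
if matches '[!]a-]' '-'; then ch_dash=yes;    fi  # excluded
echo "$ch_b $ch_a $ch_bracket $ch_dash"           # prints: yes no no no
```
</PRE>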
<P>

One can remove the special meaning of '?', '*' and '[' by
preceding them by a backslash, or, in case this is part of
a shell command line, enclosing them in quotes.
Between brackets these characters stand for themselves.
Thus, "<I>[[?*\]</I>" matches the
four characters '[', '?', '*' and '\'.
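<P>

The effect of a backslash or quotes on a shell command line can be sketched
with a few scratch files.
The file names and the temporary directory are assumptions made for this
illustration, and <I>mktemp</I> is assumed to be available.
<PRE>
```shell
start=$PWD
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'a?c' 'a*c' 'abc'        # three files, two with wildcard characters in their names

set -- a?c;   wild=$#          # '?' acts as a wildcard: all three files match
set -- a\?c;  escaped=$#       # backslash: only the file literally named a?c
escaped_name=$1
set -- 'a?c'; quoted=$#        # quotes: the word is passed through unexpanded
echo "$wild $escaped $quoted $escaped_name"    # prints: 3 1 1 a?c

cd "$start" || exit 1
rm -rf "$dir"
```
</PRE>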
<A NAME="lbAE"> </A>
<H3>Pathnames</H3>

Globbing is applied to each component of a pathname
separately.
A '/' in a pathname cannot be matched by a '?' or '*'
wildcard, or by a range like "<I>[.-0]</I>".
A range containing an explicit '/' character is syntactically incorrect.
(POSIX requires that syntactically incorrect patterns are left unchanged.)
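<P>

The component-by-component rule can be observed with a small scratch tree.
The directory and file names below are invented for the example, and
<I>mktemp</I> is assumed to be available.
<PRE>
```shell
start=$PWD
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir -p top/sub
touch top/a.c top/sub/b.c

set -- top/*.c;   same_dir=$1   # top/a.c only: '*' does not descend into sub/
set -- top/*/*.c; one_down=$1   # top/sub/b.c: one pattern per component
set -- top*b.c;   across=$1     # no match, so the pattern is left unchanged
echo "$same_dir $one_down $across"

cd "$start" || exit 1
rm -rf "$dir"
```
</PRE>
The last pattern stays as it is because '*' would have to match the '/'
characters in top/sub/b.c, which it cannot do.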
<P>

If a filename starts with a '.',
this character must be matched explicitly.
(Thus, <I>rm *</I> will not remove .profile, and <I>tar c *</I> will not
archive all your files; <I>tar c .</I> is better.)
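<P>

A sketch of the leading-dot rule, run in a scratch directory with two
made-up files (<I>mktemp</I> is assumed to be available):
<PRE>
```shell
start=$PWD
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch .profile notes.txt

set -- *;        plain="$*"      # the leading '.' is not matched by '*'
set -- .p*;      explicit="$*"   # an explicit '.' in the pattern matches it
set -- * .[!.]*; both="$*"       # common idiom that also skips . and ..
echo "$plain | $explicit | $both"
# prints: notes.txt | .profile | notes.txt .profile

cd "$start" || exit 1
rm -rf "$dir"
```
</PRE>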
<A NAME="lbAF"> </A>
<H3>Empty lists</H3>

The nice and simple rule given above: "expand a wildcard pattern
into the list of matching pathnames" was the original UNIX
definition.
It allowed one to have patterns that expand into
an empty list, as in
<P>

<PRE>
    xv -wait 0 *.gif *.jpg
</PRE>

<P>

where perhaps no *.gif files are present (and this is not
an error).
However, POSIX requires that a wildcard pattern is left
unchanged when it is syntactically incorrect, or the list of
matching pathnames is empty.
With
<I>bash</I>
one can force the classical behavior using this command:
<P>

<BR> shopt -s nullglob
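<P>

In a shell without <I>nullglob</I>, the POSIX behavior can be observed and
worked around as sketched below.
The file names are invented, <I>mktemp</I> is assumed to be available, and
the existence test is one common guard rather than the only possible one.
<PRE>
```shell
start=$PWD
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch one.jpg two.jpg           # note: no *.gif files

set -- *.gif
unmatched=$1                    # the pattern comes back unchanged

count=0
for f in *.gif *.jpg; do
    [ -e "$f" ] || continue     # skip the unexpanded pattern
    count=$((count + 1))
done
echo "$unmatched $count"        # prints: *.gif 2

cd "$start" || exit 1
rm -rf "$dir"
```
</PRE>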
<P>

(Similar problems occur elsewhere.
For example, where old scripts have
<P>

<PRE>
    rm `find . -name "*~"`
</PRE>

<P>

new scripts require
<P>

<PRE>
    rm -f nosuchfile `find . -name "*~"`
</PRE>

<P>

to avoid error messages from
<I>rm</I>
called with an empty argument list.)
<A NAME="lbAG"> </A>
<H2>NOTES</H2>

<A NAME="lbAH"> </A>
<H3>Regular expressions</H3>

Note that wildcard patterns are not regular expressions,
although they are a bit similar.
First of all, they match
filenames, rather than text, and secondly, the conventions
are not the same: for example, in a regular expression '*' means zero or
more copies of the preceding thing.
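<P>

The difference in the meaning of '*' can be sketched by giving the same
text "<I>a*</I>" once to a <I>case</I> statement (wildcard pattern) and once
to <I>grep -x</I> (regular expression that must match the whole line):
<PRE>
```shell
glob_abc=no; regex_abc=no; regex_aaa=no

case abc in
    a*) glob_abc=yes ;;          # wildcard: 'a' followed by any string
esac

# Regular expression: zero or more copies of 'a', and nothing else.
if printf '%s\n' abc | grep -qx 'a*'; then regex_abc=yes; fi
if printf '%s\n' aaa | grep -qx 'a*'; then regex_aaa=yes; fi

echo "$glob_abc $regex_abc $regex_aaa"   # prints: yes no yes
```
</PRE>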
<P>

Now that regular expressions have bracket expressions where
the negation is indicated by a '^', POSIX has declared the
effect of a wildcard pattern "<I>[^...]</I>" to be undefined.
<A NAME="lbAI"> </A>
<H3>Character classes and internationalization</H3>

Of course ranges were originally meant to be ASCII ranges,
so that "<I>[ -%]</I>" stands for "<I>[ !"#$%]</I>" and "<I>[a-z]</I>" stands
for "any lowercase letter".
Some UNIX implementations generalized this so that a range X-Y
stands for the set of characters with code between the codes for
X and for Y.
However, this requires the user to know the
character coding in use on the local system, and moreover, is
not convenient if the collating sequence for the local alphabet
differs from the ordering of the character codes.
Therefore, POSIX extended the bracket notation greatly,
both for wildcard patterns and for regular expressions.
In the above we saw three types of items that can occur in a bracket
expression: namely (i) the negation, (ii) explicit single characters,
and (iii) ranges.
POSIX specifies ranges in an internationally
more useful way and adds three more types:
<P>

(iii) Ranges X-Y comprise all characters that fall between X
and Y (inclusive) in the current collating sequence as defined
by the
<B>LC_COLLATE</B>
category in the current locale.
<P>

(iv) Named character classes, like
<P>

<PRE>
[:alnum:]  [:alpha:]  [:blank:]  [:cntrl:]
[:digit:]  [:graph:]  [:lower:]  [:print:]
[:punct:]  [:space:]  [:upper:]  [:xdigit:]
</PRE>

<P>

so that one can say "<I>[[:lower:]]</I>" instead of "<I>[a-z]</I>", and have
things work in Denmark, too, where there are three letters past 'z'
in the alphabet.
These character classes are defined by the
<B>LC_CTYPE</B>
category
in the current locale.
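<P>

A short sketch of named classes inside brackets follows.
The C locale is set explicitly so the result does not depend on the
caller's environment, and <I>matches</I> is a hypothetical helper.
<PRE>
```shell
LC_ALL=C; export LC_ALL     # fix LC_CTYPE so the classes are predictable

# Hypothetical helper: succeeds if string $2 matches wildcard pattern $1.
matches() {
    case "$2" in
        $1) return 0 ;;
    esac
    return 1
}

lower=no; upper=no; digits=no; space=no
if matches '[[:lower:]]*' 'glob';         then lower=yes;  fi  # starts with a lowercase letter
if matches '[[:upper:]]*' 'glob';         then upper=yes;  fi  # does not start with uppercase
if matches '[[:digit:]][[:digit:]]' '42'; then digits=yes; fi  # exactly two digits
if matches '*[[:space:]]*' 'a b';         then space=yes;  fi  # contains white space
echo "$lower $upper $digits $space"       # prints: yes no yes yes
```
</PRE>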
<P>

(v) Collating symbols, like "<I>[.ch.]</I>" or "<I>[.a-acute.]</I>",
where the string between "<I>[.</I>" and "<I>.]</I>" is a collating
element defined for the current locale.
Note that this may
be a multicharacter element.
<P>

(vi) Equivalence class expressions, like "<I>[=a=]</I>",
where the string between "<I>[=</I>" and "<I>=]</I>" is any collating
element from its equivalence class, as defined for the
current locale.
For example, "<I>[[=a=]]</I>" might be equivalent
to "<I>[aáàäâ]</I>", that is,
to "<I>[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]</I>".
<A NAME="lbAJ"> </A>
<H2>SEE ALSO</H2>

<B><A HREF="/cgi-bin/man/man2html?1+sh">sh</A></B>(1),
<B><A HREF="/cgi-bin/man/man2html?3+fnmatch">fnmatch</A></B>(3),
<B><A HREF="/cgi-bin/man/man2html?3+glob">glob</A></B>(3),
<B><A HREF="/cgi-bin/man/man2html?7+locale">locale</A></B>(7),
<B><A HREF="/cgi-bin/man/man2html?7+regex">regex</A></B>(7)
<A NAME="lbAK"> </A>
<H2>COLOPHON</H2>

This page is part of release 5.05 of the Linux
<I>man-pages</I>
project.
A description of the project,
information about reporting bugs,
and the latest version of this page,
can be found at
<A HREF="https://www.kernel.org/doc/man-pages/">https://www.kernel.org/doc/man-pages/</A>.
<P>

<HR>
<A NAME="index"> </A><H2>Index</H2>
<DL>
<DT id="1"><A HREF="#lbAB">NAME</A><DD>
<DT id="2"><A HREF="#lbAC">DESCRIPTION</A><DD>
<DL>
<DT id="3"><A HREF="#lbAD">Wildcard matching</A><DD>
<DT id="4"><A HREF="#lbAE">Pathnames</A><DD>
<DT id="5"><A HREF="#lbAF">Empty lists</A><DD>
</DL>
<DT id="6"><A HREF="#lbAG">NOTES</A><DD>
<DL>
<DT id="7"><A HREF="#lbAH">Regular expressions</A><DD>
<DT id="8"><A HREF="#lbAI">Character classes and internationalization</A><DD>
</DL>
<DT id="9"><A HREF="#lbAJ">SEE ALSO</A><DD>
<DT id="10"><A HREF="#lbAK">COLOPHON</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:06:08 GMT, March 31, 2021
</BODY>
</HTML>