<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<HTML><HEAD><TITLE>Man page of GAWK</TITLE>
|
|
</HEAD><BODY>
|
|
<H1>GAWK</H1>
|
|
Section: Utility Commands (1)<BR>Updated: May 22 2019<BR><A HREF="#index">Index</A>
|
|
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
|
|
|
|
<A NAME="lbAB"> </A>
|
|
<H2>NAME</H2>
|
|
|
|
gawk - pattern scanning and processing language
|
|
<A NAME="lbAC"> </A>
|
|
<H2>SYNOPSIS</H2>
|
|
|
|
<B>gawk</B>
|
|
|
|
[ <FONT SIZE="-1">POSIX</FONT> or <FONT SIZE="-1">GNU</FONT> style options ]
|
|
<B>-f</B>
|
|
|
|
<I>program-file</I>
|
|
|
|
[
|
|
<B>--</B>
|
|
|
|
] file ...
|
|
<BR>
|
|
|
|
<B>gawk</B>
|
|
|
|
[ <FONT SIZE="-1">POSIX</FONT> or <FONT SIZE="-1">GNU</FONT> style options ]
|
|
[
|
|
<B>--</B>
|
|
|
|
]
|
|
<I>program-text</I>
|
|
|
|
file ...
|
|
<A NAME="lbAD"> </A>
|
|
<H2>DESCRIPTION</H2>
|
|
|
|
<I>Gawk</I>
|
|
|
|
is the <FONT SIZE="-1">GNU</FONT> Project's implementation of the <FONT SIZE="-1">AWK</FONT> programming language.
|
|
It conforms to the definition of the language in
|
|
the <FONT SIZE="-1">POSIX</FONT> 1003.1 standard.
|
|
This version in turn is based on the description in
|
|
<I>The AWK Programming Language</I>,
|
|
|
|
by Aho, Kernighan, and Weinberger.
|
|
<I>Gawk</I>
|
|
|
|
provides the additional features found in the current version
|
|
of Brian Kernighan's
|
|
<I>awk</I>
|
|
|
|
and numerous <FONT SIZE="-1">GNU</FONT>-specific extensions.
|
|
<P>
|
|
|
|
The command line consists of options to
|
|
<I>gawk</I>
|
|
|
|
itself, the <FONT SIZE="-1">AWK</FONT> program text (if not supplied via the
|
|
<B>-f</B>
|
|
|
|
or
|
|
<B>--include</B>
|
|
|
|
options), and values to be made
|
|
available in the
|
|
<B>ARGC</B>
|
|
|
|
and
|
|
<B>ARGV</B>
|
|
|
|
pre-defined <FONT SIZE="-1">AWK</FONT> variables.
|
|
<P>
|
|
|
|
When
|
|
<I>gawk</I>
|
|
|
|
is invoked with the
|
|
<B>--profile</B>
|
|
|
|
option, it starts gathering profiling statistics
|
|
from the execution of the program.
|
|
<I>Gawk</I>
|
|
|
|
runs more slowly in this mode, and automatically produces an execution
|
|
profile in the file
|
|
<B>awkprof.out</B>
|
|
|
|
when done.
|
|
See the
|
|
<B>--profile</B>
|
|
|
|
option, below.
|
|
<P>
|
|
|
|
<I>Gawk</I>
|
|
|
|
also has an integrated debugger. An interactive debugging session can
|
|
be started by supplying the
|
|
<B>--debug</B>
|
|
|
|
option to the command line. In this mode of execution,
|
|
<I>gawk</I>
|
|
|
|
loads the
|
|
AWK source code and then prompts for debugging commands.
|
|
<I>Gawk</I>
|
|
|
|
can only debug AWK program source provided with the
|
|
<B>-f</B>
|
|
|
|
and
|
|
<B>--include</B>
|
|
|
|
options.
|
|
The debugger is documented in <I>GAWK: Effective AWK Programming</I>.
|
|
<A NAME="lbAE"> </A>
|
|
<H2>OPTION FORMAT</H2>
|
|
|
|
<P>
|
|
|
|
<I>Gawk</I>
|
|
|
|
options may be either traditional <FONT SIZE="-1">POSIX</FONT>-style one-letter options,
|
|
or <FONT SIZE="-1">GNU</FONT>-style long options. <FONT SIZE="-1">POSIX</FONT> options start with a single ``-'',
|
|
while long options start with ``--''.
|
|
Long options are provided for both <FONT SIZE="-1">GNU</FONT>-specific features and
|
|
for <FONT SIZE="-1">POSIX</FONT>-mandated features.
|
|
<P>
|
|
|
|
<I>Gawk</I>-specific
|
|
|
|
options are typically used in long-option form.
|
|
Arguments to long options are either joined with the option
|
|
by an
|
|
<B>=</B>
|
|
|
|
sign, with no intervening spaces, or they may be provided in the
|
|
next command line argument.
|
|
Long options may be abbreviated, as long as the abbreviation
|
|
remains unique.
|
|
<P>
|
|
|
|
Additionally, every long option has a corresponding short
|
|
option, so that the option's functionality may be used from
|
|
within
|
|
<B>#!</B>
|
|
|
|
executable scripts.
|
|
<A NAME="lbAF"> </A>
|
|
<H2>OPTIONS</H2>
|
|
|
|
<P>
|
|
|
|
<I>Gawk</I>
|
|
|
|
accepts the following options.
|
|
Standard options are listed first, followed by options for
|
|
<I>gawk</I>
|
|
|
|
extensions, listed alphabetically by short option.
|
|
<DL COMPACT>
|
|
<DT id="1">
|
|
<DD>
|
|
<B>-f</B><I> program-file</I>
|
|
|
|
<DT id="2">
|
|
<DD>
|
|
<B>--file</B><I> program-file</I>
|
|
|
|
Read the <FONT SIZE="-1">AWK</FONT> program source from the file
|
|
<I>program-file</I>,
|
|
|
|
instead of from the first command line argument.
|
|
Multiple
|
|
<B>-f</B>
|
|
|
|
(or
|
|
<B>--file</B>)
|
|
|
|
options may be used.
|
|
Files read with
|
|
<B>-f</B>
|
|
|
|
are treated as if they begin with an implicit <B>@namespace "awk"</B> statement.
|
|
<DT id="3">
|
|
<DD>
|
|
<B>-F</B><I> fs</I>
|
|
|
|
<DT id="4">
|
|
<DD>
|
|
<B>--field-separator</B><I> fs</I>
|
|
|
|
Use
|
|
<I>fs</I>
|
|
|
|
for the input field separator (the value of the
|
|
<B>FS</B>
|
|
|
|
predefined
|
|
variable).
|
|
<DT id="5">
|
|
<DD>
|
|
<B>-v</B><I> var</I><B>=</B><I>val</I>
|
|
<DT id="6">
|
|
<DD>
|
|
<B>--assign </B><I>var</I><B>=</B><I>val</I>
|
|
Assign the value
|
|
<I>val</I>
|
|
|
|
to the variable
|
|
<I>var</I>,
|
|
|
|
before execution of the program begins.
|
|
Such variable values are available to the
|
|
<B>BEGIN</B>
|
|
|
|
rule of an <FONT SIZE="-1">AWK</FONT> program.
|
|
<DT id="7">
|
|
<DD>
|
|
<B>-b</B>
|
|
|
|
<DT id="8">
|
|
<DD>
|
|
<B>--characters-as-bytes</B>
|
|
|
|
Treat all input data as single-byte characters. In other words,
|
|
don't pay any attention to the locale information when attempting to
|
|
process strings as multibyte characters.
|
|
The
|
|
<B>--posix</B>
|
|
|
|
option overrides this one.
|
|
|
|
<DT id="9">
|
|
<DD>
|
|
<B>-c</B>
|
|
|
|
<DT id="10">
|
|
<DD>
|
|
<B>--traditional</B>
|
|
|
|
Run in
|
|
<I>compatibility</I>
|
|
|
|
mode. In compatibility mode,
|
|
<I>gawk</I>
|
|
|
|
behaves identically to Brian Kernighan's
|
|
<I>awk</I>;
|
|
|
|
none of the <FONT SIZE="-1">GNU</FONT>-specific extensions are recognized.
|
|
|
|
|
|
|
|
See
|
|
<B>GNU EXTENSIONS</B>,
|
|
|
|
below, for more information.
|
|
<DT id="11">
|
|
<DD>
|
|
<B>-C</B>
|
|
|
|
<DT id="12">
|
|
<DD>
|
|
<B>--copyright</B>
|
|
|
|
Print the short version of the <FONT SIZE="-1">GNU</FONT> copyright information message on
|
|
the standard output and exit successfully.
|
|
<DT id="13">
|
|
<DD>
|
|
<B>-d</B>[<I>file</I>]
|
|
<DT id="14">
|
|
<DD>
|
|
<B>--dump-variables</B>[<B>=</B><I>file</I>]
|
|
Print a sorted list of global variables, their types and final values to
|
|
<I>file</I>.
|
|
|
|
If no
|
|
<I>file</I>
|
|
|
|
is provided,
|
|
<I>gawk</I>
|
|
|
|
uses a file named
|
|
<B>awkvars.out</B>
|
|
|
|
in the current directory.
|
|
<P>
|
|
Having a list of all the global variables is a good way to look for
|
|
typographical errors in your programs.
|
|
You would also use this option if you have a large program with a lot of
|
|
functions, and you want to be sure that your functions don't
|
|
inadvertently use global variables that you meant to be local.
|
|
(This is a particularly easy mistake to make with simple variable
|
|
names like
|
|
<B>i</B>,
|
|
|
|
<B>j</B>,
|
|
|
|
and so on.)
|
|
<DT id="15">
|
|
<DD>
|
|
<B>-D</B>[<I>file</I>]
|
|
<DT id="16">
|
|
<DD>
|
|
<B>--debug</B>[<B>=</B><I>file</I>]
|
|
Enable debugging of <FONT SIZE="-1">AWK</FONT> programs.
|
|
By default, the debugger reads commands interactively from the keyboard
|
|
(standard input).
|
|
The optional
|
|
<I>file</I>
|
|
|
|
argument specifies a file with a list
|
|
of commands for the debugger to execute non-interactively.
|
|
<DT id="17">
|
|
<DD>
|
|
<B>-e </B><I>program-text</I>
|
|
|
|
<DT id="18">
|
|
<DD>
|
|
<B>--source</B><I> program-text</I>
|
|
|
|
Use
|
|
<I>program-text</I>
|
|
|
|
as <FONT SIZE="-1">AWK</FONT> program source code.
|
|
This option allows the easy intermixing of library functions (used via the
|
|
<B>-f</B>
|
|
|
|
and
|
|
<B>--include</B>
|
|
|
|
options) with source code entered on the command line.
|
|
It is intended primarily for medium to large <FONT SIZE="-1">AWK</FONT> programs used
|
|
in shell scripts.
|
|
Each argument supplied via
|
|
<B>-e</B>
|
|
|
|
is treated as if it begins with an implicit <B>@namespace "awk"</B> statement.
|
|
<DT id="19">
|
|
<DD>
|
|
<B>-E </B><I>file</I>
|
|
|
|
<DT id="20">
|
|
<DD>
|
|
<B>--exec</B><I> file</I>
|
|
|
|
Similar to
<B>-f</B>;
however, this option is the last one processed.
|
|
This should be used with
|
|
<B>#!</B>
|
|
|
|
scripts, particularly for CGI applications, to avoid
|
|
passing in options or source code (!) on the command line
|
|
from a URL.
|
|
This option disables command-line variable assignments.
|
|
<DT id="21">
|
|
<DD>
|
|
<B>-g</B>
|
|
|
|
<DT id="22">
|
|
<DD>
|
|
<B>--gen-pot</B>
|
|
|
|
Scan and parse the <FONT SIZE="-1">AWK</FONT> program, and generate a <FONT SIZE="-1">GNU</FONT>
|
|
<B>.pot</B>
|
|
|
|
(Portable Object Template)
|
|
format file on standard output with entries for all localizable
|
|
strings in the program. The program itself is not executed.
|
|
See the <FONT SIZE="-1">GNU</FONT>
|
|
<I>gettext</I>
|
|
|
|
distribution for more information on
|
|
<B>.pot</B>
|
|
|
|
files.
|
|
<DT id="23">
|
|
<DD>
|
|
<B>-h</B>
|
|
|
|
<DT id="24">
|
|
<DD>
|
|
<B>--help</B>
|
|
|
|
Print a relatively short summary of the available options on
|
|
the standard output.
|
|
(Per the
|
|
<I>GNU Coding Standards</I>,
|
|
|
|
these options cause an immediate, successful exit.)
|
|
<DT id="25">
|
|
<DD>
|
|
<B>-i </B><I>include-file</I>
|
|
|
|
<DT id="26">
|
|
<DD>
|
|
<B>--include</B><I> include-file</I>
|
|
|
|
Load an awk source library.
|
|
This searches for the library using the
|
|
<B>AWKPATH</B>
|
|
|
|
environment variable. If the initial search fails, another attempt will
|
|
be made after appending the
|
|
<B>.awk</B>
|
|
|
|
suffix. The file will be loaded only
|
|
once (i.e., duplicates are eliminated), and the code does not constitute
|
|
the main program source.
|
|
Files read with
|
|
<B>--include</B>
|
|
|
|
are treated as if they begin with an implicit <B>@namespace "awk"</B> statement.
|
|
<DT id="27">
|
|
<DD>
|
|
<B>-l </B><I>lib</I>
|
|
|
|
<DT id="28">
|
|
<DD>
|
|
<B>--load</B><I> lib</I>
|
|
|
|
Load a
|
|
<I>gawk</I>
|
|
|
|
extension from the shared library
|
|
<I>lib</I>.
|
|
|
|
This searches for the library using the
|
|
<B>AWKLIBPATH</B>
|
|
|
|
environment variable. If the initial search fails, another attempt will
|
|
be made after appending the default shared library suffix for the platform.
|
|
The library initialization routine is expected to be named
|
|
<B>dl_load()</B>.
|
|
|
|
<DT id="29">
|
|
<DD>
|
|
<B>-L</B>[<I>value</I>]
|
|
|
|
<DT id="30">
|
|
<DD>
|
|
<B>--lint</B>[<B>=</B><I>value</I>]
|
|
|
|
Provide warnings about constructs that are
|
|
dubious or non-portable to other <FONT SIZE="-1">AWK</FONT> implementations.
|
|
With an optional argument of
|
|
<B>fatal</B>,
|
|
|
|
lint warnings become fatal errors.
|
|
This may be drastic, but its use will certainly encourage the
|
|
development of cleaner <FONT SIZE="-1">AWK</FONT> programs.
|
|
With an optional argument of
|
|
<B>invalid</B>,
|
|
|
|
only warnings about things that are
|
|
actually invalid are issued. (This is not fully implemented yet.)
|
|
With an optional argument of
|
|
<B>no-ext</B>,
|
|
|
|
warnings about
|
|
<I>gawk</I>
|
|
|
|
extensions are disabled.
|
|
<DT id="31">
|
|
<DD>
|
|
<B>-M</B>
|
|
|
|
<DT id="32">
|
|
<DD>
|
|
<B>--bignum</B>
|
|
|
|
Force arbitrary precision arithmetic on numbers. This option has
|
|
no effect if
|
|
<I>gawk</I>
|
|
|
|
is not compiled to use the GNU MPFR and GMP libraries.
|
|
(In such a case,
|
|
<I>gawk</I>
|
|
|
|
issues a warning.)
|
|
<DT id="33">
|
|
<DD>
|
|
<B>-n</B>
|
|
|
|
<DT id="34">
|
|
<DD>
|
|
<B>--non-decimal-data</B>
|
|
|
|
Recognize octal and hexadecimal values in input data.
|
|
<I>Use this option with great caution!</I>
|
|
|
|
<DT id="35">
|
|
<DD>
|
|
<B>-N</B>
|
|
|
|
<DT id="36">
|
|
<DD>
|
|
<B>--use-lc-numeric</B>
|
|
|
|
Force
|
|
<I>gawk</I>
|
|
|
|
to use the locale's decimal point character when parsing input data.
|
|
Although the POSIX standard requires this behavior, and
|
|
<I>gawk</I>
|
|
|
|
does so when
|
|
<B>--posix</B>
|
|
|
|
is in effect, the default is to follow traditional behavior and use a
|
|
period as the decimal point, even in locales where the period is not the
|
|
decimal point character. This option overrides the default behavior,
|
|
without the full draconian strictness of the
|
|
<B>--posix</B>
|
|
|
|
option.
|
|
|
|
<DT id="37">
|
|
<DD>
|
|
<B>-o</B>[<I>file</I>]
|
|
<DT id="38">
|
|
<DD>
|
|
<B>--pretty-print</B>[<B>=</B><I>file</I>]
|
|
Output a pretty-printed version of the program to
|
|
<I>file</I>.
|
|
|
|
If no
|
|
<I>file</I>
|
|
|
|
is provided,
|
|
<I>gawk</I>
|
|
|
|
uses a file named
|
|
<B>awkprof.out</B>
|
|
|
|
in the current directory.
|
|
This option implies
|
|
<B>--no-optimize</B>.
|
|
|
|
<DT id="39">
|
|
<DD>
|
|
<B>-O</B>
|
|
|
|
<DT id="40">
|
|
<DD>
|
|
<B>--optimize</B>
|
|
|
|
Enable
|
|
<I>gawk</I>'s
|
|
|
|
default optimizations upon the internal representation of the program.
|
|
Currently, this just includes simple constant folding.
|
|
This option is on by default.
|
|
<DT id="41">
|
|
<DD>
|
|
<B>-p</B>[<I>prof-file</I>]
|
|
<DT id="42">
|
|
<DD>
|
|
<B>--profile</B>[<B>=</B><I>prof-file</I>]
|
|
Start a profiling session, and send the profiling data to
|
|
<I>prof-file</I>.
|
|
|
|
The default is
|
|
<B>awkprof.out</B>.
|
|
|
|
The profile contains execution counts of each statement in the program
|
|
in the left margin and function call counts for each user-defined function.
|
|
This option implies
|
|
<B>--no-optimize</B>.
|
|
|
|
<DT id="43">
|
|
<DD>
|
|
<B>-P</B>
|
|
|
|
<DT id="44">
|
|
<DD>
|
|
<B>--posix</B>
|
|
|
|
This turns on
|
|
<I>compatibility</I>
|
|
|
|
mode, with the following additional restrictions:
|
|
<DL COMPACT><DT id="45"><DD>
|
|
<DL COMPACT>
|
|
<DT id="46">•<DD>
|
|
<B>\x</B>
|
|
|
|
escape sequences are not recognized.
|
|
<DT id="47">•<DD>
|
|
You cannot continue lines after
|
|
<B>?</B>
|
|
|
|
and
|
|
<B>:</B>.
|
|
|
|
<DT id="48">•<DD>
|
|
The synonym
|
|
<B>func</B>
|
|
|
|
for the keyword
|
|
<B>function</B>
|
|
|
|
is not recognized.
|
|
<DT id="49">•<DD>
|
|
The operators
|
|
<B>**</B>
|
|
|
|
and
|
|
<B>**=</B>
|
|
|
|
cannot be used in place of
|
|
<B>^</B>
|
|
|
|
and
|
|
<B>^=</B>.
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<DT id="50">
|
|
<DD>
|
|
<B>-r</B>
|
|
|
|
<DT id="51">
|
|
<DD>
|
|
<B>--re-interval</B>
|
|
|
|
Enable the use of
|
|
<I>interval expressions</I>
|
|
|
|
in regular expression matching
|
|
(see
|
|
<B>Regular Expressions</B>,
|
|
|
|
below).
|
|
Interval expressions were not traditionally available in the
|
|
<FONT SIZE="-1">AWK</FONT> language. The <FONT SIZE="-1">POSIX</FONT> standard added them, to make
|
|
<I>awk</I>
|
|
|
|
and
|
|
<I>egrep</I>
|
|
|
|
consistent with each other.
|
|
They are enabled by default, but this option remains for use together with
|
|
<B>--traditional</B>.
|
|
|
|
<DT id="52">
|
|
<DD>
|
|
<B>-s</B>
|
|
|
|
<DT id="53">
|
|
<DD>
|
|
<B>--no-optimize</B>
|
|
|
|
Disable
|
|
<I>gawk</I>'s
|
|
|
|
default optimizations upon the internal representation of the program.
|
|
<DT id="54">
|
|
<DD>
|
|
<B>-S</B>
|
|
|
|
<DT id="55">
|
|
<DD>
|
|
<B>--sandbox</B>
|
|
|
|
Run
|
|
<I>gawk</I>
|
|
|
|
in sandbox mode, disabling the
|
|
<B>system()</B>
|
|
|
|
function, input redirection with
|
|
<B>getline</B>,
|
|
|
|
output redirection with
|
|
<B>print</B> and <B>printf</B>,
|
|
|
|
and loading dynamic extensions.
|
|
Command execution (through pipelines) is also disabled.
|
|
This effectively blocks a script from accessing local resources,
|
|
except for the files specified on the command line.
|
|
<DT id="56">
|
|
<DD>
|
|
<B>-t</B>
|
|
|
|
<DT id="57">
|
|
<DD>
|
|
<B>--lint-old</B>
|
|
|
|
Provide warnings about constructs that are
|
|
not portable to the original version of <FONT SIZE="-1">UNIX</FONT>
|
|
<I>awk</I>.
|
|
|
|
<DT id="58">
|
|
<DD>
|
|
<B>-V</B>
|
|
|
|
<DT id="59">
|
|
<DD>
|
|
<B>--version</B>
|
|
|
|
Print version information for this particular copy of
|
|
<I>gawk</I>
|
|
|
|
on the standard output.
|
|
This is useful mainly for knowing if the current copy of
|
|
<I>gawk</I>
|
|
|
|
on your system
|
|
is up to date with respect to whatever the Free Software Foundation
|
|
is distributing.
|
|
This is also useful when reporting bugs.
|
|
(Per the
|
|
<I>GNU Coding Standards</I>,
|
|
|
|
these options cause an immediate, successful exit.)
|
|
<DT id="60"><B>--</B>
|
|
|
|
<DD>
|
|
Signal the end of options. This is useful to allow further arguments to the
|
|
<FONT SIZE="-1">AWK</FONT> program itself to start with a ``-''.
|
|
This provides consistency with the argument parsing convention used
|
|
by most other <FONT SIZE="-1">POSIX</FONT> programs.
|
|
</DL>
|
|
<P>
|
|
|
|
In compatibility mode,
|
|
any other options are flagged as invalid, but are otherwise ignored.
|
|
In normal operation, as long as program text has been supplied, unknown
|
|
options are passed on to the <FONT SIZE="-1">AWK</FONT> program in the
|
|
<B>ARGV</B>
|
|
|
|
array for processing. This is particularly useful for running <FONT SIZE="-1">AWK</FONT>
|
|
programs via the
|
|
<B>#!</B>
|
|
|
|
executable interpreter mechanism.
|
|
<P>
|
|
|
|
For <FONT SIZE="-1">POSIX</FONT> compatibility, the
|
|
<B>-W</B>
|
|
|
|
option may be used, followed by the name of a long option.
|
|
<A NAME="lbAG"> </A>
|
|
<H2>AWK PROGRAM EXECUTION</H2>
|
|
|
|
<P>
|
|
|
|
An <FONT SIZE="-1">AWK</FONT> program consists of a sequence of
|
|
optional directives,
|
|
pattern-action statements,
|
|
and optional function definitions.
|
|
<DL COMPACT><DT id="61"><DD>
|
|
<P>
|
|
|
|
<B>@include "</B><I>filename</I><B>"
|
|
<BR>
|
|
|
|
@load "</B><I>filename</I><B>"
|
|
<BR>
|
|
|
|
@namespace "</B><I>name</I><B>"
|
|
<BR>
|
|
|
|
</B><I>pattern</I><B><TT> </TT>{ </B><I>action statements</I><B> }</B><BR>
|
|
<BR>
|
|
|
|
<B>function </B><I>name</I><B>(</B><I>parameter list</I><B>) { </B><I>statements</I><B> }</B>
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
<I>Gawk</I>
|
|
|
|
first reads the program source from the
|
|
<I>program-file</I>(s)
|
|
|
|
if specified,
|
|
from arguments to
|
|
<B>--source</B>,
|
|
|
|
or from the first non-option argument on the command line.
|
|
The
|
|
<B>-f</B>
|
|
|
|
and
|
|
<B>--source</B>
|
|
|
|
options may be used multiple times on the command line.
|
|
<I>Gawk</I>
|
|
|
|
reads the program text as if all the
|
|
<I>program-file</I>s
|
|
|
|
and command line source texts
|
|
had been concatenated together. This is useful for building libraries
|
|
of <FONT SIZE="-1">AWK</FONT> functions, without having to include them in each new <FONT SIZE="-1">AWK</FONT>
|
|
program that uses them. It also provides the ability to mix library
|
|
functions with command line programs.
|
|
<P>
|
|
|
|
In addition, lines beginning with
|
|
<B>@include</B>
|
|
|
|
may be used to include other source files into your program,
|
|
making library use even easier. This is equivalent
|
|
to using the
|
|
<B>--include</B>
|
|
|
|
option.
|
|
<P>
|
|
|
|
Lines beginning with
|
|
<B>@load</B>
|
|
|
|
may be used to load extension functions into your program. This is equivalent
|
|
to using the
|
|
<B>--load</B>
|
|
|
|
option.
|
|
<P>
|
|
|
|
The environment variable
|
|
<B>AWKPATH</B>
|
|
|
|
specifies a search path to use when finding source files named with
|
|
the
|
|
<B>-f</B>
|
|
|
|
and
|
|
<B>--include</B>
|
|
|
|
options. If this variable does not exist, the default path is
|
|
<B>".:/usr/local/share/awk"</B>.
|
|
(The actual directory may vary, depending upon how
|
|
<I>gawk</I>
|
|
|
|
was built and installed.)
|
|
If a file name given to the
|
|
<B>-f</B>
|
|
|
|
option contains a ``/'' character, no path search is performed.
|
|
<P>
|
|
|
|
The environment variable
|
|
<B>AWKLIBPATH</B>
|
|
|
|
specifies a search path to use when finding shared library extensions named with
|
|
the
|
|
<B>--load</B>
|
|
|
|
option. If this variable does not exist, the default path is
|
|
<B>"/usr/local/lib/gawk"</B>.
|
|
(The actual directory may vary, depending upon how
|
|
<I>gawk</I>
|
|
|
|
was built and installed.)
|
|
<P>
|
|
|
|
<I>Gawk</I>
|
|
|
|
executes <FONT SIZE="-1">AWK</FONT> programs in the following order.
|
|
First,
|
|
all variable assignments specified via the
|
|
<B>-v</B>
|
|
|
|
option are performed.
|
|
Next,
|
|
<I>gawk</I>
|
|
|
|
compiles the program into an internal form.
|
|
Then,
|
|
<I>gawk</I>
|
|
|
|
executes the code in the
|
|
<B>BEGIN</B>
|
|
|
|
rule(s) (if any),
|
|
and then proceeds to read
|
|
each file named in the
|
|
<B>ARGV</B>
|
|
|
|
array (up to
|
|
<B>ARGV[ARGC-1]</B>).
|
|
|
|
If there are no files named on the command line,
|
|
<I>gawk</I>
|
|
|
|
reads the standard input.
|
|
<P>
|
|
|
|
If a filename on the command line has the form
|
|
<I>var</I><B>=</B><I>val</I>,
it is treated as a variable assignment. The variable
|
|
<I>var</I>
|
|
|
|
will be assigned the value
|
|
<I>val</I>.
|
|
|
|
(This happens after any
|
|
<B>BEGIN</B>
|
|
|
|
rule(s) have been run.)
|
|
Command line variable assignment
|
|
is most useful for dynamically assigning values to the variables
|
|
<FONT SIZE="-1">AWK</FONT> uses to control how input is broken into fields and records.
|
|
It is also useful for controlling state if multiple passes are needed over
|
|
a single data file.
|
|
<P>
|
|
|
|
If the value of a particular element of
|
|
<B>ARGV</B>
|
|
|
|
is empty (<B>""</B>),
|
|
<I>gawk</I>
|
|
|
|
skips over it.
|
|
<P>
|
|
|
|
For each input file,
|
|
if a
|
|
<B>BEGINFILE</B>
|
|
|
|
rule exists,
|
|
<I>gawk</I>
|
|
|
|
executes the associated code
|
|
before processing the contents of the file. Similarly,
|
|
<I>gawk</I>
|
|
|
|
executes
|
|
the code associated with
|
|
<B>ENDFILE</B>
|
|
|
|
after processing the file.
|
|
<P>
|
|
|
|
For each record in the input,
|
|
<I>gawk</I>
|
|
|
|
tests to see if it matches any
|
|
<I>pattern</I>
|
|
|
|
in the <FONT SIZE="-1">AWK</FONT> program.
|
|
For each pattern that the record matches,
|
|
<I>gawk</I>
|
|
|
|
executes the associated
|
|
<I>action</I>.
|
|
|
|
The patterns are tested in the order they occur in the program.
|
|
<P>
|
|
|
|
Finally, after all the input is exhausted,
|
|
<I>gawk</I>
|
|
|
|
executes the code in the
|
|
<B>END</B>
|
|
|
|
rule(s) (if any).
|
|
<A NAME="lbAH"> </A>
|
|
<H3>Command Line Directories</H3>
|
|
|
|
<P>
|
|
|
|
According to POSIX, files named on the
|
|
<I>awk</I>
|
|
|
|
command line must be
|
|
text files. The behavior is ``undefined'' if they are not. Most versions
|
|
of
|
|
<I>awk</I>
|
|
|
|
treat a directory on the command line as a fatal error.
|
|
<P>
|
|
|
|
Starting with version 4.0 of
|
|
<I>gawk</I>,
|
|
|
|
a directory on the command line
|
|
produces a warning, but is otherwise skipped. If either of the
|
|
<B>--posix</B>
|
|
|
|
or
|
|
<B>--traditional</B>
|
|
|
|
options is given, then
|
|
<I>gawk</I>
|
|
|
|
reverts to
|
|
treating directories on the command line as a fatal error.
|
|
<A NAME="lbAI"> </A>
|
|
<H2>VARIABLES, RECORDS AND FIELDS</H2>
|
|
|
|
<FONT SIZE="-1">AWK</FONT> variables are dynamic; they come into existence when they are
|
|
first used. Their values are either floating-point numbers or strings,
|
|
or both,
|
|
depending upon how they are used.
|
|
Additionally,
|
|
<I>gawk</I>
|
|
|
|
allows variables to have regular-expression type.
|
|
<FONT SIZE="-1">AWK</FONT> also has one-dimensional
|
|
arrays; arrays with multiple dimensions may be simulated.
|
|
<I>Gawk</I>
|
|
|
|
provides true arrays of arrays; see
|
|
<B>Arrays</B>,
|
|
|
|
below.
|
|
Several pre-defined variables are set as a program
|
|
runs; these are described as needed and summarized below.
|
|
<A NAME="lbAJ"> </A>
|
|
<H3>Records</H3>
|
|
|
|
Normally, records are separated by newline characters. You can control how
|
|
records are separated by assigning values to the built-in variable
|
|
<B>RS</B>.
|
|
|
|
If
|
|
<B>RS</B>
|
|
|
|
is any single character, that character separates records.
|
|
Otherwise,
|
|
<B>RS</B>
|
|
|
|
is a regular expression. Text in the input that matches this
|
|
regular expression separates records.
|
|
However, in compatibility mode,
|
|
only the first character of its string
|
|
value is used for separating records.
|
|
If
|
|
<B>RS</B>
|
|
|
|
is set to the null string, then records are separated by
|
|
empty lines.
|
|
When
|
|
<B>RS</B>
|
|
|
|
is set to the null string, the newline character always acts as
|
|
a field separator, in addition to whatever value
|
|
<B>FS</B>
|
|
|
|
may have.
|
|
<A NAME="lbAK"> </A>
|
|
<H3>Fields</H3>
|
|
|
|
<P>
|
|
|
|
As each input record is read,
|
|
<I>gawk</I>
|
|
|
|
splits the record into
|
|
<I>fields</I>,
|
|
|
|
using the value of the
|
|
<B>FS</B>
|
|
|
|
variable as the field separator.
|
|
If
|
|
<B>FS</B>
|
|
|
|
is a single character, fields are separated by that character.
|
|
If
|
|
<B>FS</B>
|
|
|
|
is the null string, then each individual character becomes a
|
|
separate field.
|
|
Otherwise,
|
|
<B>FS</B>
|
|
|
|
is expected to be a full regular expression.
|
|
In the special case that
|
|
<B>FS</B>
|
|
|
|
is a single space, fields are separated
|
|
by runs of spaces and/or tabs and/or newlines.
|
|
<B>NOTE</B>:
|
|
|
|
The value of
|
|
<B>IGNORECASE</B>
|
|
|
|
(see below) also affects how fields are split when
|
|
<B>FS</B>
|
|
|
|
is a regular expression, and how records are separated when
|
|
<B>RS</B>
|
|
|
|
is a regular expression.
|
|
<P>
|
|
|
|
If the
|
|
<B>FIELDWIDTHS</B>
|
|
|
|
variable is set to a space-separated list of numbers, each field is
|
|
expected to have fixed width, and
|
|
<I>gawk</I>
|
|
|
|
splits up the record using the specified widths.
|
|
Each field width may optionally be preceded by a colon-separated
|
|
value specifying the number of characters to skip before the field starts.
|
|
The value of
|
|
<B>FS</B>
|
|
|
|
is ignored.
|
|
Assigning a new value to
|
|
<B>FS</B>
|
|
|
|
or
|
|
<B>FPAT</B>
|
|
|
|
overrides the use of
|
|
<B>FIELDWIDTHS</B>.
|
|
|
|
<P>
|
|
|
|
Similarly, if the
|
|
<B>FPAT</B>
|
|
|
|
variable is set to a string representing a regular expression,
|
|
each field is made up of text that matches that regular expression. In
|
|
this case, the regular expression describes the fields themselves,
|
|
instead of the text that separates the fields.
|
|
Assigning a new value to
|
|
<B>FS</B>
|
|
|
|
or
|
|
<B>FIELDWIDTHS</B>
|
|
|
|
overrides the use of
|
|
<B>FPAT</B>.
|
|
|
|
<P>
|
|
|
|
Each field in the input record may be referenced by its position:
|
|
<B>$1</B>,
|
|
|
|
<B>$2</B>,
|
|
|
|
and so on.
|
|
<B>$0</B>
|
|
|
|
is the whole record,
|
|
including leading and trailing whitespace.
|
|
Fields need not be referenced by constants:
|
|
<DL COMPACT><DT id="62"><DD>
|
|
<P>
|
|
|
|
<B>
|
|
n = 5
|
|
<BR>
|
|
|
|
print $n
|
|
</B>
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
prints the fifth field in the input record.
|
|
<P>
|
|
|
|
The variable
|
|
<B>NF</B>
|
|
|
|
is set to the total number of fields in the input record.
|
|
<P>
|
|
|
|
References to non-existent fields (i.e., fields after
|
|
<B>$NF</B>)
|
|
|
|
produce the null string. However, assigning to a non-existent field
|
|
(e.g.,
|
|
<B>$(NF+2) = 5</B>)
|
|
|
|
increases the value of
|
|
<B>NF</B>,
|
|
|
|
creates any intervening fields with the null string as their values, and
|
|
causes the value of
|
|
<B>$0</B>
|
|
|
|
to be recomputed, with the fields being separated by the value of
|
|
<B>OFS</B>.
|
|
|
|
References to negative-numbered fields cause a fatal error.
|
|
Decrementing
|
|
<B>NF</B>
|
|
|
|
causes the values of fields past the new value to be lost, and the value of
|
|
<B>$0</B>
|
|
|
|
to be recomputed, with the fields being separated by the value of
|
|
<B>OFS</B>.
|
|
|
|
<P>
|
|
|
|
Assigning a value to an existing field
|
|
causes the whole record to be rebuilt when
|
|
<B>$0</B>
|
|
|
|
is referenced.
|
|
Similarly, assigning a value to
|
|
<B>$0</B>
|
|
|
|
causes the record to be resplit, creating new
|
|
values for the fields.
|
|
<A NAME="lbAL"> </A>
|
|
<H3>Built-in Variables</H3>
|
|
|
|
<P>
|
|
|
|
<I>Gawk</I>'s
|
|
|
|
built-in variables are:
|
|
<P>
|
|
|
|
<DL COMPACT>
|
|
<DT id="63"><B>ARGC</B>
|
|
|
|
<DD>
|
|
The number of command line arguments (does not include options to
|
|
<I>gawk</I>,
|
|
|
|
or the program source).
|
|
<DT id="64"><B>ARGIND</B>
|
|
|
|
<DD>
|
|
The index in
|
|
<B>ARGV</B>
|
|
|
|
of the current file being processed.
|
|
<DT id="65"><B>ARGV</B>
|
|
|
|
<DD>
|
|
Array of command line arguments. The array is indexed from
|
|
0 to
|
|
<B>ARGC</B>
|
|
|
|
- 1.
|
|
Dynamically changing the contents of
|
|
<B>ARGV</B>
|
|
|
|
can control the files used for data.
|
|
<DT id="66"><B>BINMODE</B>
|
|
|
|
<DD>
|
|
On non-POSIX systems, specifies use of ``binary'' mode for all file I/O.
|
|
Numeric values of 1, 2, or 3 specify that input files, output files, or
all files, respectively, should use binary I/O.
String values of <B>"r"</B> or <B>"w"</B> specify that input files or output files,
|
|
respectively, should use binary I/O.
|
|
String values of <B>"rw"</B> or <B>"wr"</B> specify that all files
|
|
should use binary I/O.
|
|
Any other string value is treated as <B>"rw"</B>, but generates a warning message.
|
|
<DT id="67"><B>CONVFMT</B>
|
|
|
|
<DD>
|
|
The conversion format for numbers, <B>"%.6g"</B>, by default.
|
|
<DT id="68"><B>ENVIRON</B>
|
|
|
|
<DD>
|
|
An array containing the values of the current environment.
|
|
The array is indexed by the environment variables, each element being
|
|
the value of that variable (e.g., <B>ENVIRON["HOME"]</B> might be
|
|
<B>"/home/arnold"</B>).
|
|
<P>
|
|
In POSIX mode,
|
|
changing this array does not affect the environment seen by programs which
|
|
<I>gawk</I>
|
|
|
|
spawns via redirection or the
|
|
<B>system()</B>
|
|
|
|
function.
|
|
Otherwise,
|
|
<I>gawk</I>
|
|
|
|
updates its real environment so that programs it spawns see
|
|
the changes.
|
|
<DT id="69"><B>ERRNO</B>
|
|
|
|
<DD>
|
|
If a system error occurs either doing a redirection for
|
|
<B>getline</B>,
|
|
|
|
during a read for
|
|
<B>getline</B>,
|
|
|
|
or during a
|
|
<B>close()</B>,
|
|
|
|
then
|
|
<B>ERRNO</B>
|
|
|
|
is set to
|
|
a string describing the error.
|
|
The value is subject to translation in non-English locales.
|
|
If the string in
|
|
<B>ERRNO</B>
|
|
|
|
corresponds to a system error in the
|
|
<I><A HREF="/cgi-bin/man/man2html?3+errno">errno</A></I>(3)
|
|
|
|
variable, then the numeric value can be found in
|
|
<B>PROCINFO["errno"]</B>.
For non-system errors,
<B>PROCINFO["errno"]</B>
|
|
|
|
will be zero.
|
|
<DT id="70"><B>FIELDWIDTHS</B>
|
|
|
|
<DD>
|
|
A whitespace-separated list of field widths. When set,
|
|
<I>gawk</I>
|
|
|
|
parses the input into fields of fixed width, instead of using the
|
|
value of the
|
|
<B>FS</B>
|
|
|
|
variable as the field separator.
|
|
Each field width may optionally be preceded by a colon-separated
|
|
value specifying the number of characters to skip before the field starts.
|
|
See
|
|
<B>Fields</B>,
|
|
|
|
above.
|
|
<DT id="71"><B>FILENAME</B>
|
|
|
|
<DD>
|
|
The name of the current input file.
|
|
If no files are specified on the command line, the value of
|
|
<B>FILENAME</B>
|
|
|
|
is ``-''.
|
|
However,
|
|
<B>FILENAME</B>
|
|
|
|
is undefined inside the
|
|
<B>BEGIN</B>
|
|
|
|
rule
|
|
(unless set by
|
|
<B>getline</B>).
|
|
|
|
<DT id="72"><B>FNR</B>
|
|
|
|
<DD>
|
|
The input record number in the current input file.
|
|
<DT id="73"><B>FPAT</B>
|
|
|
|
<DD>
|
|
A regular expression describing the contents of the
|
|
fields in a record.
|
|
When set,
|
|
<I>gawk</I>
|
|
|
|
parses the input into fields, where the fields match the
|
|
regular expression, instead of using the
|
|
value of
|
|
<B>FS</B>
|
|
|
|
as the field separator.
|
|
See
|
|
<B>Fields</B>,
|
|
|
|
above.
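<P>
A common use, sketched here assuming a POSIX shell and <I>gawk</I> 5.0 or later (the sample data is made up), is comma-separated data in which a quoted field may itself contain commas:
<PRE>
out=$(echo 'Robbins,"1234 Main St, NE",MyTown' | gawk '
BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }   # a field is unquoted text or a quoted string
{ print NF ":" $2 }')
echo "$out"    # 3:"1234 Main St, NE"
</PRE>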
|
|
<DT id="74"><B>FS</B>
|
|
|
|
<DD>
|
|
The input field separator, a space by default. See
|
|
<B>Fields</B>,
|
|
|
|
above.
|
|
<DT id="75"><B>FUNCTAB</B>
|
|
|
|
<DD>
|
|
An array whose indices and corresponding values
|
|
are the names of all the user-defined
|
|
or extension functions in the program.
|
|
<B>NOTE</B>:
|
|
|
|
You may not use the
|
|
<B>delete</B>
|
|
|
|
statement with the
|
|
<B>FUNCTAB</B>
|
|
|
|
array.
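<P>
Combined with <I>gawk</I>'s indirect function calls (<B>@</B><I>name</I><B>()</B>), <B>FUNCTAB</B> lets a program check that a function exists before calling it by name. A minimal sketch, assuming a POSIX shell and <I>gawk</I> 5.0 or later; <B>greet</B> is a hypothetical function:
<PRE>
out=$(gawk '
# greet is a hypothetical example function
function greet(who) { return "hello, " who }
BEGIN {
    name = "greet"
    if (name in FUNCTAB)
        print @name("world")    # indirect call through a variable
}')
echo "$out"    # hello, world
</PRE>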
|
|
<DT id="76"><B>IGNORECASE</B>
|
|
|
|
<DD>
|
|
Controls the case-sensitivity of all regular expression
|
|
and string operations. If
|
|
<B>IGNORECASE</B>
|
|
|
|
has a non-zero value, then string comparisons and
|
|
pattern matching in rules,
|
|
field splitting with
|
|
<B>FS</B>
|
|
|
|
and
|
|
<B>FPAT</B>,
|
|
|
|
record separating with
|
|
<B>RS</B>,
|
|
|
|
regular expression
|
|
matching with
|
|
<B>~</B>
|
|
|
|
and
|
|
<B>!~</B>,
|
|
|
|
and the
|
|
<B>gensub()</B>,
|
|
|
|
<B>gsub()</B>,
|
|
|
|
<B>index()</B>,
|
|
|
|
<B>match()</B>,
|
|
|
|
<B>patsplit()</B>,
|
|
|
|
<B>split()</B>,
|
|
|
|
and
|
|
<B>sub()</B>
|
|
|
|
built-in functions all ignore case when doing regular expression
|
|
operations.
|
|
<B>NOTE</B>:
|
|
|
|
Array subscripting is
|
|
<I>not</I>
|
|
|
|
affected.
|
|
However, the
|
|
<B>asort()</B>
|
|
|
|
and
|
|
<B>asorti()</B>
|
|
|
|
functions are affected.
|
|
<P>
|
|
Thus, if
|
|
<B>IGNORECASE</B>
|
|
|
|
is not equal to zero,
|
|
<B>/aB/</B>
|
|
|
|
matches all of the strings <B>"ab"</B>, <B>"aB"</B>, <B>"Ab"</B>,
|
|
and <B>"AB"</B>.
|
|
As with all <FONT SIZE="-1">AWK</FONT> variables, the initial value of
|
|
<B>IGNORECASE</B>
|
|
|
|
is zero, so all regular expression and string
|
|
operations are normally case-sensitive.
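<P>
A short sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) of which operations honor <B>IGNORECASE</B>:
<PRE>
out=$(gawk 'BEGIN {
    IGNORECASE = 1
    arr["key"] = 1
    m = ("AB" ~ /aB/)    # regexp matching ignores case
    # index() ignores case too; array subscripting does not
    print m, index("Hello", "LL"), ("KEY" in arr)
}')
echo "$out"    # 1 3 0
</PRE>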
|
|
<DT id="77"><B>LINT</B>
|
|
|
|
<DD>
|
|
Provides dynamic control of the
|
|
<B>--lint</B>
|
|
|
|
option from within an <FONT SIZE="-1">AWK</FONT> program.
|
|
When true,
|
|
<I>gawk</I>
|
|
|
|
prints lint warnings. When false, it does not.
|
|
When assigned the string value <B>"fatal"</B>,
|
|
lint warnings become fatal errors, exactly like
|
|
<B>--lint=fatal</B>.
|
|
|
|
Any other true value just prints warnings.
|
|
<DT id="78"><B>NF</B>
|
|
|
|
<DD>
|
|
The number of fields in the current input record.
|
|
<DT id="79"><B>NR</B>
|
|
|
|
<DD>
|
|
The total number of input records seen so far.
|
|
<DT id="80"><B>OFMT</B>
|
|
|
|
<DD>
|
|
The output format for numbers, <B>"%.6g"</B>, by default.
|
|
<DT id="81"><B>OFS</B>
|
|
|
|
<DD>
|
|
The output field separator, a space by default.
|
|
<DT id="82"><B>ORS</B>
|
|
|
|
<DD>
|
|
The output record separator, by default a newline.
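<P>
A sketch tying <B>FS</B> and <B>OFS</B> together (assuming a POSIX shell and <I>gawk</I> 5.0 or later): assigning to any field makes <I>gawk</I> rebuild <B>$0</B> with <B>OFS</B> between the fields.
<PRE>
out=$(echo "a:b:c" | gawk 'BEGIN { FS = ":"; OFS = "-" }
{
    $1 = $1    # touching a field rebuilds $0 using OFS
    print
}')
echo "$out"    # a-b-c
</PRE>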
|
|
<DT id="83"><B>PREC</B>
|
|
|
|
<DD>
|
|
The working precision of arbitrary precision floating-point
|
|
numbers, 53 by default.
|
|
<DT id="84"><B>PROCINFO</B>
|
|
|
|
<DD>
|
|
The elements of this array provide access to information about the
|
|
running <FONT SIZE="-1">AWK</FONT> program.
|
|
On some systems,
|
|
there may be elements in the array, <B>"group1"</B> through
|
|
<B>"group</B><I>n</I><B>"</B> for some
|
|
<I>n</I>,
|
|
|
|
which is the number of supplementary groups that the process has.
|
|
Use the
|
|
<B>in</B>
|
|
|
|
operator to test for these elements.
|
|
The following elements are guaranteed to be available:
|
|
<DL COMPACT><DT id="85"><DD>
|
|
<DL COMPACT>
|
|
<DT id="86"><B>PROCINFO["argv"]</B><DD>
|
|
The command line arguments as received by
|
|
<I>gawk</I>
|
|
|
|
at the C-language level.
|
|
The subscripts start from zero.
|
|
<DT id="87"><B>PROCINFO["egid"]</B><DD>
|
|
The value of the
|
|
<I><A HREF="/cgi-bin/man/man2html?2+getegid">getegid</A></I>(2)
|
|
|
|
system call.
|
|
<DT id="88"><B>PROCINFO["errno"]</B><DD>
|
|
The value of
|
|
<I><A HREF="/cgi-bin/man/man2html?3+errno">errno</A></I>(3)
|
|
|
|
when
|
|
<B>ERRNO</B>
|
|
|
|
is set to the associated error message.
|
|
<DT id="89"><B>PROCINFO["euid"]</B><DD>
|
|
The value of the
|
|
<I><A HREF="/cgi-bin/man/man2html?2+geteuid">geteuid</A></I>(2)
|
|
|
|
system call.
|
|
<DT id="90"><B>PROCINFO["FS"]</B><DD>
|
|
<B>"FS"</B> if field splitting with
|
|
<B>FS</B>
|
|
|
|
is in effect,
|
|
<B>"FPAT"</B> if field splitting with
|
|
<B>FPAT</B>
|
|
|
|
is in effect,
|
|
<B>"FIELDWIDTHS"</B> if field splitting with
|
|
<B>FIELDWIDTHS</B>
|
|
|
|
is in effect,
|
|
or <B>"API"</B> if API input parser field splitting
|
|
is in effect.
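<P>
For instance (a sketch, assuming a POSIX shell and <I>gawk</I> 5.0 or later), assigning to <B>FPAT</B> changes the value reported here:
<PRE>
out=$(gawk 'BEGIN {
    before = PROCINFO["FS"]
    FPAT = "[^,]+"    # switch to FPAT-based field splitting
    print before, PROCINFO["FS"]
}')
echo "$out"    # FS FPAT
</PRE>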
|
|
<DT id="91"><B>PROCINFO["gid"]</B><DD>
|
|
The value of the
|
|
<I><A HREF="/cgi-bin/man/man2html?2+getgid">getgid</A></I>(2)
|
|
|
|
system call.
|
|
<DT id="92"><B>PROCINFO["identifiers"]</B><DD>
|
|
A subarray, indexed by the names of all identifiers used in the
|
|
text of the AWK program.
|
|
The values indicate what
|
|
<I>gawk</I>
|
|
|
|
knows about the identifiers after it has finished parsing the program; they are
|
|
<I>not</I>
|
|
|
|
updated while the program runs.
|
|
For each identifier, the value of the element is one of the following:
|
|
<DL COMPACT><DT id="93"><DD>
|
|
<DL COMPACT>
|
|
<DT id="94"><B>"array"</B><DD>
|
|
The identifier is an array.
|
|
<DT id="95"><B>"builtin"</B><DD>
|
|
The identifier is a built-in function.
|
|
<DT id="96"><B>"extension"</B><DD>
|
|
The identifier is an extension function loaded via
|
|
<B>@load</B>
|
|
|
|
or
|
|
<B>--load</B>.
|
|
|
|
<DT id="97"><B>"scalar"</B><DD>
|
|
The identifier is a scalar.
|
|
<DT id="98"><B>"untyped"</B><DD>
|
|
The identifier is untyped (could be used as a scalar or array,
|
|
<I>gawk</I>
|
|
|
|
doesn't know yet).
|
|
<DT id="99"><B>"user"</B><DD>
|
|
The identifier is a user-defined function.
|
|
</DL>
|
|
</DL>
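<P>
For example (assuming a POSIX shell and <I>gawk</I> 5.0 or later; <B>twice</B> is a hypothetical function), a user-defined function is reported as <B>"user"</B>:
<PRE>
out=$(gawk '
# twice is a hypothetical example function
function twice(x) { return 2 * x }
BEGIN { print PROCINFO["identifiers"]["twice"] }')
echo "$out"    # user
</PRE>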
|
|
|
|
<DT id="100"><B>PROCINFO["pgrpid"]</B><DD>
|
|
The value of the
|
|
<I><A HREF="/cgi-bin/man/man2html?2+getpgrp">getpgrp</A></I>(2)
|
|
|
|
system call.
|
|
<DT id="101"><B>PROCINFO["pid"]</B><DD>
|
|
The value of the
|
|
<I><A HREF="/cgi-bin/man/man2html?2+getpid">getpid</A></I>(2)
|
|
|
|
system call.
|
|
<DT id="102"><B>PROCINFO["platform"]</B><DD>
|
|
A string indicating the platform for which
|
|
<I>gawk</I>
|
|
|
|
was compiled. It is one of:
|
|
<DL COMPACT><DT id="103"><DD>
|
|
<DL COMPACT>
|
|
<DT id="104"><B>"djgpp"</B>, <B>"mingw"</B><DD>
|
|
Microsoft Windows, using either DJGPP or MinGW, respectively.
|
|
<DT id="105"><B>"os2"</B><DD>
|
|
OS/2.
|
|
<DT id="106"><B>"posix"</B><DD>
|
|
GNU/Linux, Cygwin, Mac OS X, and legacy Unix systems.
|
|
<DT id="107"><B>"vms"</B><DD>
|
|
OpenVMS or Vax/VMS.
|
|
</DL>
|
|
</DL>
|
|
|
|
<DT id="108"><B>PROCINFO["ppid"]</B><DD>
|
|
The value of the
|
|
<I><A HREF="/cgi-bin/man/man2html?2+getppid">getppid</A></I>(2)
|
|
|
|
system call.
|
|
<DT id="109"><B>PROCINFO["strftime"]</B><DD>
|
|
The default time format string for
|
|
<B>strftime()</B>.
|
|
|
|
Changing its value affects how
|
|
<B>strftime()</B>
|
|
|
|
formats time values when called with no arguments.
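<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) that changes the default format so that <B>strftime()</B> with no arguments prints only the year:
<PRE>
out=$(gawk 'BEGIN {
    PROCINFO["strftime"] = "%Y"    # new default format
    print strftime()               # no arguments: uses that default
}')
echo "$out"    # the current year
</PRE>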
|
|
<DT id="110"><B>PROCINFO["uid"]</B><DD>
|
|
The value of the
|
|
<I><A HREF="/cgi-bin/man/man2html?2+getuid">getuid</A></I>(2)
|
|
|
|
system call.
|
|
<DT id="111"><B>PROCINFO["version"]</B><DD>
|
|
The version of
|
|
<I>gawk</I>.
|
|
|
|
</DL>
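<P>
The following sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) prints the platform string and counts the supplementary group elements, using <B>in</B> as described above; the count depends on the user running it:
<PRE>
out=$(gawk 'BEGIN {
    # "group1" through "groupN" exist only on some systems
    n = 0
    while (("group" (n + 1)) in PROCINFO)
        n++
    print PROCINFO["platform"], n
}')
echo "$out"    # on GNU/Linux: posix, then the group count
</PRE>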
|
|
<P>
|
|
|
|
The following elements are present if loading dynamic
|
|
extensions is available:
|
|
<DL COMPACT>
|
|
<DT id="112"><B>PROCINFO["api_major"]</B><DD>
|
|
The major version of the extension API.
|
|
<DT id="113"><B>PROCINFO["api_minor"]</B><DD>
|
|
The minor version of the extension API.
|
|
</DL>
|
|
<P>
|
|
|
|
The following elements are available if MPFR support is
|
|
compiled into
|
|
<I>gawk</I>:
|
|
|
|
<DL COMPACT>
|
|
<DT id="114"><B>PROCINFO["gmp_version"]</B><DD>
|
|
The version of the GNU GMP library used for arbitrary precision
|
|
number support in
|
|
<I>gawk</I>.
|
|
|
|
<DT id="115"><B>PROCINFO["mpfr_version"]</B><DD>
|
|
The version of the GNU MPFR library used for arbitrary precision
|
|
number support in
|
|
<I>gawk</I>.
|
|
|
|
<DT id="116"><B>PROCINFO["prec_max"]</B><DD>
|
|
The maximum precision supported by the GNU MPFR library for
|
|
arbitrary precision floating-point numbers.
|
|
<DT id="117"><B>PROCINFO["prec_min"]</B><DD>
|
|
The minimum precision allowed by the GNU MPFR library for
|
|
arbitrary precision floating-point numbers.
|
|
</DL>
|
|
<P>
|
|
|
|
The following elements may be set by a program to
|
|
change
|
|
<I>gawk</I>'s
|
|
|
|
behavior:
|
|
<DL COMPACT>
|
|
<DT id="118"><B>PROCINFO["NONFATAL"]</B><DD>
|
|
If this exists, then I/O errors for all redirections become nonfatal.
|
|
<DT id="119"><B>PROCINFO["</B><I>name</I><B>", "NONFATAL"]</B><DD>
|
|
Make I/O errors for
|
|
<I>name</I>
|
|
|
|
be nonfatal.
|
|
<DT id="120"><B>PROCINFO["</B><I>command</I><B>", "pty"]</B><DD>
|
|
Use a pseudo-tty for two-way communication with
|
|
<I>command</I>
|
|
|
|
instead of setting up two one-way pipes.
|
|
<DT id="121"><B>PROCINFO["</B><I>input</I><B>", "READ_TIMEOUT"]</B><DD>
|
|
The timeout in milliseconds for reading data from
|
|
<I>input</I>,
|
|
|
|
where
|
|
<I>input</I>
|
|
|
|
is a redirection string or a filename. A value of zero or
|
|
less than zero means no timeout.
|
|
<DT id="122"><B>PROCINFO["</B><I>input</I><B>", "RETRY"]</B><DD>
|
|
If an I/O error that may be retried occurs when reading data from
|
|
<I>input</I>,
|
|
|
|
and this array entry exists, then
|
|
<B>getline</B>
|
|
|
|
returns -2 instead of following the default behavior of returning -1
|
|
and configuring
|
|
<I>input</I>
|
|
|
|
to return no further data.
|
|
An I/O error that may be retried is one where
|
|
<I><A HREF="/cgi-bin/man/man2html?3+errno">errno</A></I>(3)
|
|
|
|
has the value EAGAIN, EWOULDBLOCK, EINTR, or ETIMEDOUT.
|
|
This may be useful in conjunction with
|
|
<B>PROCINFO["</B><I>input</I><B>", "READ_TIMEOUT"]</B>
|
|
or in situations where a file descriptor has been configured to behave in a
|
|
non-blocking fashion.
|
|
<DT id="123"><B>PROCINFO["sorted_in"]</B><DD>
|
|
If this element exists in
|
|
<B>PROCINFO</B>,
|
|
|
|
then its value controls the order in which array elements
|
|
are traversed in
|
|
<B>for</B>
|
|
|
|
loops.
|
|
Supported values are
|
|
<B>"@ind_str_asc"</B>,
|
|
<B>"@ind_num_asc"</B>,
|
|
<B>"@val_type_asc"</B>,
|
|
<B>"@val_str_asc"</B>,
|
|
<B>"@val_num_asc"</B>,
|
|
<B>"@ind_str_desc"</B>,
|
|
<B>"@ind_num_desc"</B>,
|
|
<B>"@val_type_desc"</B>,
|
|
<B>"@val_str_desc"</B>,
|
|
<B>"@val_num_desc"</B>,
|
|
and
|
|
<B>"@unsorted"</B>.
|
|
The value can also be the name (as a
|
|
<I>string</I>)
|
|
|
|
of any comparison function defined
|
|
as follows:
|
|
<P>
|
|
|
|
<B>function cmp_func(i1, v1, i2, v2)</B>
|
|
|
|
<P>
|
|
where
|
|
<I>i1</I>
|
|
|
|
and
|
|
<I>i2</I>
|
|
|
|
are the indices, and
|
|
<I>v1</I>
|
|
|
|
and
|
|
<I>v2</I>
|
|
|
|
are the
|
|
corresponding values of the two elements being compared.
|
|
It should return a number less than, equal to, or greater than 0,
|
|
depending on how the elements of the array are to be ordered.
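<P>
A minimal sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later; <B>by_val_desc</B> is a hypothetical comparison function) showing a predefined ordering and a user-defined one:
<PRE>
out=$(gawk '
# by_val_desc is a hypothetical comparison function:
# a negative result puts element 1 first (largest value first).
function by_val_desc(i1, v1, i2, v2) { return v2 - v1 }
BEGIN {
    a["b"] = 2; a["c"] = 3; a["a"] = 1
    PROCINFO["sorted_in"] = "@ind_str_asc"    # indices as strings, ascending
    for (k in a) asc = asc k
    PROCINFO["sorted_in"] = "by_val_desc"     # user-defined order
    for (k in a) desc = desc k
    print asc, desc
}')
echo "$out"    # abc cba
</PRE>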
|
|
</DL>
|
|
</DL>
|
|
|
|
<DT id="124"><B>ROUNDMODE</B>
|
|
|
|
<DD>
|
|
The rounding mode to use for arbitrary precision arithmetic on
|
|
numbers, by default <B>"N"</B> (IEEE-754 roundTiesToEven mode).
|
|
The accepted values are:
|
|
<DL COMPACT><DT id="125"><DD>
|
|
<DL COMPACT>
|
|
<DT id="126"><B>"A"</B> or <B>"a"</B><DD>
|
|
for rounding away from zero.
|
|
These are only available if your version of
|
|
the GNU MPFR library supports rounding away from zero.
|
|
<DT id="127"><B>"D"</B> or <B>"d"</B><DD>
|
|
for roundTowardNegative.
|
|
<DT id="128"><B>"N"</B> or <B>"n"</B><DD>
|
|
for roundTiesToEven.
|
|
<DT id="129"><B>"U"</B> or <B>"u"</B><DD>
|
|
for roundTowardPositive.
|
|
<DT id="130"><B>"Z"</B> or <B>"z"</B><DD>
|
|
for roundTowardZero.
|
|
</DL>
|
|
</DL>
|
|
|
|
<DT id="131"><B>RS</B>
|
|
|
|
<DD>
|
|
The input record separator, by default a newline.
|
|
<DT id="132"><B>RT</B>
|
|
|
|
<DD>
|
|
The record terminator.
|
|
<I>Gawk</I>
|
|
|
|
sets
|
|
<B>RT</B>
|
|
|
|
to the input text that matched the character or regular expression
|
|
specified by
|
|
<B>RS</B>.
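<P>
For example (a sketch, assuming a POSIX shell and <I>gawk</I> 5.0 or later), with a regular expression <B>RS</B>, <B>RT</B> records which separator actually ended each record; it is empty for a final record with no terminator:
<PRE>
out=$(printf 'a;b,c' | gawk 'BEGIN { RS = "[;,]" }
{ s = s $0 "(" RT ")" }    # show each record with its terminator
END { print s }')
echo "$out"    # a(;)b(,)c()
</PRE>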
|
|
|
|
<DT id="133"><B>RSTART</B>
|
|
|
|
<DD>
|
|
The index of the first character matched by
|
|
<B>match()</B>;
|
|
|
|
0 if no match.
|
|
(This implies that character indices start at one.)
|
|
<DT id="134"><B>RLENGTH</B>
|
|
|
|
<DD>
|
|
The length of the string matched by
|
|
<B>match()</B>;
|
|
|
|
-1 if no match.
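<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) showing both a successful and a failed match:
<PRE>
out=$(gawk 'BEGIN {
    match("foobar", /ob+/)    # "ob" starts at index 3
    hit = RSTART " " RLENGTH
    match("xyz", /q/)         # no match
    print hit, RSTART, RLENGTH
}')
echo "$out"    # 3 2 0 -1
</PRE>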
|
|
<DT id="135"><B>SUBSEP</B>
|
|
|
|
<DD>
|
|
The string used to separate multiple subscripts in array
|
|
elements, by default <B>"\034"</B>.
|
|
<DT id="136"><B>SYMTAB</B>
|
|
|
|
<DD>
|
|
An array whose indices are the names of all currently defined
|
|
global variables and arrays in the program. The array may be used
|
|
for indirect access to read or write the value of a variable:
|
|
<P>
|
|
<B>
|
|
</B><PRE>
|
|
foo = 5
|
|
SYMTAB["foo"] = 4
|
|
print foo # prints 4
|
|
</PRE>
|
|
|
|
|
|
|
|
<P>
|
|
The
|
|
<B>typeof()</B>
|
|
|
|
function may be used to test if an element in
|
|
<B>SYMTAB</B>
|
|
|
|
is an array.
|
|
You may not use the
|
|
<B>delete</B>
|
|
|
|
statement with the
|
|
<B>SYMTAB</B>
|
|
|
|
array, nor assign to elements with an index that is
|
|
not a variable name.
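<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) combining <B>typeof()</B> with an indirect write through <B>SYMTAB</B>:
<PRE>
out=$(gawk 'BEGIN {
    arr[1] = 1; x = 1
    SYMTAB["x"] = 42    # indirect write to x
    print typeof(SYMTAB["arr"]), x
}')
echo "$out"    # array 42
</PRE>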
|
|
<DT id="137"><B>TEXTDOMAIN</B>
|
|
|
|
<DD>
|
|
The text domain of the <FONT SIZE="-1">AWK</FONT> program; used to find the localized
|
|
translations for the program's strings.
|
|
</DL>
|
|
<A NAME="lbAM"> </A>
|
|
<H3>Arrays</H3>
|
|
|
|
<P>
|
|
|
|
Arrays are subscripted with an expression between square brackets
|
|
(<B>[</B> and <B>]</B>).
|
|
|
|
If the expression is an expression list
|
|
(<I>expr</I>, <I>expr</I> ...)
|
|
|
|
then the array subscript is a string consisting of the
|
|
concatenation of the (string) value of each expression,
|
|
separated by the value of the
|
|
<B>SUBSEP</B>
|
|
|
|
variable.
|
|
This facility is used to simulate multiply dimensioned
|
|
arrays. For example:
|
|
<P>
|
|
|
|
<DL COMPACT><DT id="138"><DD>
|
|
<B>
|
|
i = "A"; j = "B"; k = "C"
|
|
<BR>
|
|
|
|
x[i, j, k] = "hello, world\n"
|
|
</B>
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
assigns the string <B>"hello, world\n"</B> to the element of the array
|
|
<B>x</B>
|
|
|
|
which is indexed by the string <B>"A\034B\034C"</B>. All arrays in <FONT SIZE="-1">AWK</FONT>
|
|
are associative, i.e., indexed by string values.
|
|
<P>
|
|
|
|
The special operator
|
|
<B>in</B>
|
|
|
|
may be used to test if an array has an index consisting of a particular
|
|
value:
|
|
<P>
|
|
|
|
<DL COMPACT><DT id="139"><DD>
|
|
<B>
|
|
</B><PRE>
|
|
if (val in array)
|
|
print array[val]
|
|
</PRE>
|
|
|
|
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
If the array has multiple subscripts, use
|
|
<B>(i, j) in array</B>.
|
|
|
|
<P>
|
|
|
|
The
|
|
<B>in</B>
|
|
|
|
construct may also be used in a
|
|
<B>for</B>
|
|
|
|
loop to iterate over all the elements of an array.
|
|
However, the
|
|
<B>(i, j) in array</B>
|
|
|
|
construct only works in tests, not in
|
|
<B>for</B>
|
|
|
|
loops.
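<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later): the test form checks for the combined index, while the loop sees the single joined string, which <B>split()</B> with <B>SUBSEP</B> can take apart again:
<PRE>
out=$(gawk 'BEGIN {
    x["A", "B"] = 1    # index is "A" SUBSEP "B"
    if (("A", "B") in x) found = "yes"
    for (k in x) {     # k is the joined string
        split(k, parts, SUBSEP)
        print found, parts[1], parts[2]
    }
}')
echo "$out"    # yes A B
</PRE>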
|
|
<P>
|
|
|
|
An element may be deleted from an array using the
|
|
<B>delete</B>
|
|
|
|
statement.
|
|
The
|
|
<B>delete</B>
|
|
|
|
statement may also be used to delete the entire contents of an array,
|
|
just by specifying the array name without a subscript.
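<P>
For example (a sketch, assuming a POSIX shell and <I>gawk</I> 5.0 or later):
<PRE>
out=$(gawk 'BEGIN {
    a[1]; a[2]        # referencing an element creates it
    delete a[1]       # remove one element
    n = length(a)
    delete a          # remove every element
    print n, length(a)
}')
echo "$out"    # 1 0
</PRE>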
|
|
<P>
|
|
|
|
<I>gawk</I>
|
|
|
|
supports true multidimensional arrays. It does not require that
|
|
such arrays be ``rectangular'' as in C or C++.
|
|
For example:
|
|
<P>
|
|
<DL COMPACT><DT id="140"><DD>
|
|
<B>
|
|
</B><PRE>
|
|
a[1] = 5
|
|
a[2][1] = 6
|
|
a[2][2] = 7
|
|
</PRE>
|
|
|
|
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
<B>NOTE</B>:
|
|
|
|
You may need to tell
|
|
<I>gawk</I>
|
|
|
|
that an array element is really a subarray in order to use it where
|
|
<I>gawk</I>
|
|
|
|
expects an array (such as in the second argument to
|
|
<B>split()</B>).
|
|
|
|
You can do this by creating an element in the subarray and then
|
|
deleting it with the
|
|
<B>delete</B>
|
|
|
|
statement.
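<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) of both points: a mix of scalar and subarray elements, and an element turned into an empty subarray so that <B>split()</B> can fill it:
<PRE>
out=$(gawk 'BEGIN {
    a[1] = 5; a[2][1] = 6; a[2][2] = 7
    a[3][1]; delete a[3][1]       # make a[3] an (empty) subarray
    n = split("x y z", a[3])      # now usable as an array argument
    print typeof(a[1]), typeof(a[2]), n, a[3][3]
}')
echo "$out"    # number array 3 z
</PRE>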
|
|
<A NAME="lbAN"> </A>
|
|
<H3>Namespaces</H3>
|
|
|
|
<I>Gawk</I>
|
|
|
|
provides a simple
|
|
<I>namespace</I>
|
|
|
|
facility to help work around the fact that all variables in
|
|
AWK are global.
|
|
<P>
|
|
|
|
A
|
|
<I>qualified name</I>
|
|
|
|
consists of two simple identifiers joined by a double colon
|
|
(<B>::</B>).
|
|
|
|
The left-hand identifier represents the namespace and the right-hand
|
|
identifier names the variable or function within it.
|
|
All simple (non-qualified) names are considered to be in the
|
|
``current'' namespace; the default namespace is
|
|
<B>awk</B>.
|
|
|
|
However, simple identifiers consisting solely of uppercase
|
|
letters are forced into the
|
|
<B>awk</B>
|
|
|
|
namespace, even if the current namespace is different.
|
|
<P>
|
|
|
|
You change the current namespace with an
|
|
<B>@namespace "</B><I>name</I><B>"</B>
|
|
directive.
|
|
<P>
|
|
|
|
The standard predefined builtin function names may not be used as
|
|
namespace names. The names of additional functions provided by
|
|
<I>gawk</I>
|
|
|
|
may be used as namespace names or as simple identifiers in other
|
|
namespaces.
|
|
For more details, see <I>GAWK: Effective AWK Programming</I>.
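<P>
A minimal sketch, assuming a POSIX shell and <I>gawk</I> 5.0 or later (the namespace <B>util</B> and the function <B>shout</B> are hypothetical):
<PRE>
out=$(gawk '
@namespace "util"
# util and shout are hypothetical names
function shout(s) { return toupper(s) }    # defines util::shout
@namespace "awk"
BEGIN { print util::shout("hi") }')
echo "$out"    # HI
</PRE>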
|
|
<A NAME="lbAO"> </A>
|
|
<H3>Variable Typing And Conversion</H3>
|
|
|
|
<P>
|
|
|
|
Variables and fields
|
|
may be (floating point) numbers, or strings, or both.
|
|
They may also be regular expressions. How the
|
|
value of a variable is interpreted depends upon its context. If used in
|
|
a numeric expression, it will be treated as a number; if used as a string
|
|
it will be treated as a string.
|
|
<P>
|
|
|
|
To force a variable to be treated as a number, add zero to it; to force it
|
|
to be treated as a string, concatenate it with the null string.
|
|
<P>
|
|
|
|
Uninitialized variables have the numeric value zero and the string value ""
|
|
(the null, or empty, string).
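<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) of both conversions, and of an uninitialized variable comparing equal to both zero and the null string:
<PRE>
out=$(gawk 'BEGIN {
    s = "3.0"
    # s + 0 forces a number; s "" forces a string.
    # x is never assigned, so it is both 0 and "".
    print s + 0, s "", (x == 0), (x == "")
}')
echo "$out"    # 3 3.0 1 1
</PRE>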
|
|
<P>
|
|
|
|
When a string must be converted to a number, the conversion is accomplished
|
|
using
|
|
<I><A HREF="/cgi-bin/man/man2html?3+strtod">strtod</A></I>(3).
|
|
|
|
A number is converted to a string by using the value of
|
|
<B>CONVFMT</B>
|
|
|
|
as a format string for
|
|
<I><A HREF="/cgi-bin/man/man2html?3+sprintf">sprintf</A></I>(3),
|
|
|
|
with the numeric value of the variable as the argument.
|
|
However, even though all numbers in <FONT SIZE="-1">AWK</FONT> are floating-point,
|
|
integral values are
|
|
<I>always</I>
|
|
|
|
converted as integers. Thus, given
|
|
<P>
|
|
|
|
<DL COMPACT><DT id="141"><DD>
|
|
<B>
|
|
</B><PRE>
|
|
CONVFMT = "%2.2f"
|
|
a = 12
|
|
b = a ""
|
|
</PRE>
|
|
|
|
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
the variable
|
|
<B>b</B>
|
|
|
|
has a string value of <B>"12"</B> and not <B>"12.00"</B>.
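<P>
The same example as a complete command (a sketch, assuming a POSIX shell and <I>gawk</I> 5.0 or later), with a non-integral value added for contrast; note that <B>print</B> uses <B>OFMT</B>, not <B>CONVFMT</B>:
<PRE>
out=$(gawk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12; b = a ""        # integral: converted as an integer
    c = 12.5; d = c ""      # non-integral: uses CONVFMT
    print b, d, c           # print itself uses OFMT ("%.6g")
}')
echo "$out"    # 12 12.50 12.5
</PRE>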
|
|
<P>
|
|
|
|
<B>NOTE</B>:
|
|
|
|
When operating in POSIX mode (such as with the
|
|
<B>--posix</B>
|
|
|
|
option),
|
|
beware that locale settings may interfere with the way
|
|
decimal numbers are treated: the decimal separator of the numbers you
|
|
are feeding to
|
|
<I>gawk</I>
|
|
|
|
must conform to what your locale would expect, be it
|
|
a comma (,) or a period (.).
|
|
<P>
|
|
|
|
<I>Gawk</I>
|
|
|
|
performs comparisons as follows:
|
|
If two variables are numeric, they are compared numerically.
|
|
If one value is numeric and the other has a string value that is a
|
|
``numeric string,'' then comparisons are also done numerically.
|
|
Otherwise, the numeric value is converted to a string and a string
|
|
comparison is performed.
|
|
Two strings are compared, of course, as strings.
|
|
<P>
|
|
|
|
Note that string constants, such as <B>"57"</B>, are
|
|
<I>not</I>
|
|
|
|
numeric strings; they are string constants.
|
|
The idea of ``numeric string''
|
|
only applies to fields,
|
|
<B>getline</B>
|
|
|
|
input,
|
|
<B>FILENAME</B>,
|
|
|
|
<B>ARGV</B>
|
|
|
|
elements,
|
|
<B>ENVIRON</B>
|
|
|
|
elements and the elements of an array created by
|
|
<B>split()</B>
|
|
|
|
or
|
|
<B>patsplit()</B>
|
|
|
|
that are numeric strings.
|
|
The basic idea is that
|
|
<I>user input</I>,
|
|
|
|
and only user input, that looks numeric,
|
|
should be treated that way.
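<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) contrasting a numeric-string field with a string constant:
<PRE>
out=$(echo "10.0" | gawk '{
    f = ($1 == 10)        # $1 is a numeric string: numeric comparison
    s = ("10.0" == 10)    # string constant: 10 becomes "10" first
    print f, s
}')
echo "$out"    # 1 0
</PRE>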
|
|
<A NAME="lbAP"> </A>
|
|
<H3>Octal and Hexadecimal Constants</H3>
|
|
|
|
You may use C-style octal and hexadecimal constants in your AWK
|
|
program source code.
|
|
For example, the octal value
|
|
<B>011</B>
|
|
|
|
is equal to decimal
|
|
<B>9</B>,
|
|
|
|
and the hexadecimal value
|
|
<B>0x11</B>
|
|
|
|
is equal to decimal 17.
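<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later, run without <B>--non-decimal-data</B>) showing that this applies to program text only; input data that looks octal is still read as decimal:
<PRE>
out=$(echo "011" | gawk '{
    # 011 and 0x11 in the program are 9 and 17;
    # the input field "011" converts to 11
    print 011, 0x11, $1 + 0
}')
echo "$out"    # 9 17 11
</PRE>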
|
|
<A NAME="lbAQ"> </A>
|
|
<H3>String Constants</H3>
|
|
|
|
<P>
|
|
|
|
String constants in <FONT SIZE="-1">AWK</FONT> are sequences of characters enclosed
|
|
between double quotes (like <B>"value"</B>). Within strings, certain
|
|
<I>escape sequences</I>
|
|
|
|
are recognized, as in C. These are:
|
|
<P>
|
|
|
|
<DL COMPACT>
|
|
<DT id="142"><B>\\</B>
|
|
|
|
<DD>
|
|
A literal backslash.
|
|
<DT id="143"><B>\a</B>
|
|
|
|
<DD>
|
|
The ``alert'' character; usually the <FONT SIZE="-1">ASCII</FONT> <FONT SIZE="-1">BEL</FONT> character.
|
|
<DT id="144"><B>\b</B>
|
|
|
|
<DD>
|
|
Backspace.
|
|
<DT id="145"><B>\f</B>
|
|
|
|
<DD>
|
|
Form-feed.
|
|
<DT id="146"><B>\n</B>
|
|
|
|
<DD>
|
|
Newline.
|
|
<DT id="147"><B>\r</B>
|
|
|
|
<DD>
|
|
Carriage return.
|
|
<DT id="148"><B>\t</B>
|
|
|
|
<DD>
|
|
Horizontal tab.
|
|
<DT id="149"><B>\v</B>
|
|
|
|
<DD>
|
|
Vertical tab.
|
|
<DT id="150"><B>\x</B><I>hex digits</I>
|
|
|
|
<DD>
|
|
The character represented by the string of hexadecimal digits following
|
|
the
|
|
<B>\x</B>.
|
|
|
|
Up to two
|
|
following hexadecimal digits are considered part of
|
|
the escape sequence.
|
|
E.g., <B>"\x1B"</B> is the <FONT SIZE="-1">ASCII</FONT> <FONT SIZE="-1">ESC</FONT> (escape) character.
|
|
<DT id="151"><B>\</B><I>ddd</I>
|
|
|
|
<DD>
|
|
The character represented by the 1-, 2-, or 3-digit sequence of octal
|
|
digits.
|
|
E.g., <B>"\033"</B> is the <FONT SIZE="-1">ASCII</FONT> <FONT SIZE="-1">ESC</FONT> (escape) character.
|
|
<DT id="152"><B>\</B><I>c</I>
|
|
|
|
<DD>
|
|
The literal character
|
|
<I>c</I>.
|
|
|
|
</DL>
|
|
<P>
|
|
|
|
In compatibility mode, the characters represented by octal and
|
|
hexadecimal escape sequences are treated literally when used in
|
|
regular expression constants. Thus,
|
|
<B>/a\52b/</B>
|
|
|
|
is equivalent to
|
|
<B>/a\*b/</B>.
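<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) of the escape sequences, and of the compatibility-mode difference using the <B>--traditional</B> option:
<PRE>
a=$(gawk 'BEGIN {
    # \x41 is "A", \102 is "B"; \52 is "*", a metacharacter here
    print length("\x41\102\t"), ("\x1B" == "\033"), ("aab" ~ /a\52b/)
}')
# in compatibility mode \52 is a literal "*", so "aab" does not match
b=$(gawk --traditional 'BEGIN { print ("aab" ~ /a\52b/) }')
echo "$a / $b"    # 3 1 1 / 0
</PRE>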
|
|
|
|
<A NAME="lbAR"> </A>
|
|
<H3>Regexp Constants</H3>
|
|
|
|
A regular expression constant is a sequence of characters enclosed
|
|
between forward slashes (like
|
|
<B>/value/</B>).
|
|
|
|
Regular expression matching is described more fully below; see
|
|
<B>Regular Expressions</B>.
|
|
|
|
<P>
|
|
|
|
The escape sequences described earlier may also be used inside
|
|
constant regular expressions
|
|
(e.g.,
|
|
<B>/[ \t\f\n\r\v]/</B>
|
|
|
|
matches whitespace characters).
|
|
<P>
|
|
|
|
<I>Gawk</I>
|
|
|
|
provides
|
|
<I>strongly typed</I>
|
|
|
|
regular expression constants. These are written with a leading
|
|
<B>@</B>
|
|
|
|
symbol (like so:
|
|
<B>@/value/</B>).
|
|
|
|
Such constants may be assigned to scalars (variables, array elements)
|
|
and passed to user-defined functions. Variables that have been so
|
|
assigned have regular expression type.
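<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later; <B>matches</B> is a hypothetical function) in which a typed regexp constant is stored in a variable and passed to a function:
<PRE>
out=$(gawk '
# matches is a hypothetical helper; re arrives as a regexp
function matches(str, re) { return str ~ re }
BEGIN {
    re = @/fo+/
    print typeof(re), matches("xfoo", re), matches("bar", re)
}')
echo "$out"    # regexp 1 0
</PRE>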
|
|
<A NAME="lbAS"> </A>
|
|
<H2>PATTERNS AND ACTIONS</H2>
|
|
|
|
<FONT SIZE="-1">AWK</FONT> is a line-oriented language. The pattern comes first, and then the
|
|
action. Action statements are enclosed in
|
|
<B>{</B>
|
|
|
|
and
|
|
<B>}</B>.
|
|
|
|
Either the pattern may be missing, or the action may be missing, but,
|
|
of course, not both. If the pattern is missing, the action
|
|
executes for every single record of input.
|
|
A missing action is equivalent to
|
|
<DL COMPACT><DT id="153"><DD>
|
|
<P>
|
|
|
|
<B>{ print }</B>
|
|
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
which prints the entire record.
|
|
<P>
|
|
|
|
Comments begin with the
|
|
<B>#</B>
|
|
|
|
character, and continue until the
|
|
end of the line.
|
|
Empty lines may be used to separate statements.
|
|
Normally, a statement ends with a newline; however, this is not the
|
|
case for lines ending in
|
|
a comma,
|
|
<B>{</B>,
|
|
|
|
<B>?</B>,
|
|
|
|
<B>:</B>,
|
|
|
|
<B>&&</B>,
|
|
|
|
or
|
|
<B>||</B>.
|
|
|
|
Lines ending in
|
|
<B>do</B>
|
|
|
|
or
|
|
<B>else</B>
|
|
|
|
also have their statements automatically continued on the following line.
|
|
In other cases, a line can be continued by ending it with a ``\'',
|
|
in which case the newline is ignored. However, a ``\'' after a
|
|
<B>#</B>
|
|
|
|
is not special.
|
|
<P>
|
|
|
|
Multiple statements may
|
|
be put on one line by separating them with a ``;''.
|
|
This applies to both the statements within the action part of a
|
|
pattern-action pair (the usual case),
|
|
and to the pattern-action statements themselves.
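<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) of both continuation styles, and of a ``;'' separating two pattern-action statements:
<PRE>
out=$(gawk 'BEGIN {
    x = 1 ||
        0              # a line ending in || continues the statement
    y = 2 + \
        3              # a trailing backslash continues it too
}; END { print x, y }' /dev/null)
echo "$out"    # 1 5
</PRE>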
|
|
<A NAME="lbAT"> </A>
|
|
<H3>Patterns</H3>
|
|
|
|
<FONT SIZE="-1">AWK</FONT> patterns may be one of the following:
|
|
<P>
|
|
|
|
<DL COMPACT><DT id="154"><DD>
|
|
<PRE>
|
|
<B>BEGIN</B>
|
|
<B>END</B>
|
|
<B>BEGINFILE</B>
|
|
<B>ENDFILE</B>
|
|
<B>/</B><I>regular expression</I><B>/</B>
|
|
<I>relational expression</I>
|
|
<I>pattern</I><B> && </B><I>pattern</I>
|
|
<I>pattern</I><B> || </B><I>pattern</I>
|
|
<I>pattern</I><B> ? </B><I>pattern</I><B> : </B><I>pattern</I>
|
|
<B>(</B><I>pattern</I><B>)</B>
|
|
<B>!</B><I> pattern</I>
|
|
<I>pattern1</I><B>, </B><I>pattern2</I>
|
|
</PRE>
|
|
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
<B>BEGIN</B>
|
|
|
|
and
|
|
<B>END</B>
|
|
|
|
are two special kinds of patterns which are not tested against
|
|
the input.
|
|
The action parts of all
|
|
<B>BEGIN</B>
|
|
|
|
patterns are merged as if all the statements had
|
|
been written in a single
|
|
<B>BEGIN</B>
|
|
|
|
rule. They are executed before any
|
|
of the input is read. Similarly, all the
|
|
<B>END</B>
|
|
|
|
rules are merged,
|
|
and executed when all the input is exhausted (or when an
|
|
<B>exit</B>
|
|
|
|
statement is executed).
|
|
<B>BEGIN</B>
|
|
|
|
and
|
|
<B>END</B>
|
|
|
|
patterns cannot be combined with other patterns in pattern expressions.
|
|
<B>BEGIN</B>
|
|
|
|
and
|
|
<B>END</B>
|
|
|
|
patterns cannot have missing action parts.
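<P>
For example (a sketch, assuming a POSIX shell and <I>gawk</I> 5.0 or later):
<PRE>
out=$(gawk 'BEGIN { s = "a" }
BEGIN { s = s "b" }    # runs after the first BEGIN, as if merged
END   { print s "c" }' /dev/null)
echo "$out"    # abc
</PRE>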
|
|
<P>
|
|
|
|
<B>BEGINFILE</B>
|
|
|
|
and
|
|
<B>ENDFILE</B>
|
|
|
|
are additional special patterns whose actions are executed
|
|
before reading the first record of each command-line input file
|
|
and after reading the last record of each file.
|
|
Inside the
|
|
<B>BEGINFILE</B>
|
|
|
|
rule, the value of
|
|
<B>ERRNO</B>
|
|
|
|
is the empty string if the file was opened successfully.
|
|
Otherwise, there is some problem with the file and the code should
|
|
use
|
|
<B>nextfile</B>
|
|
|
|
to skip it. If that is not done,
|
|
<I>gawk</I>
|
|
|
|
produces its usual fatal error for files that cannot be opened.
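<P>
A sketch (assuming a POSIX shell, <I>gawk</I> 5.0 or later, and a hypothetical missing file <I>/nonexistent/file.txt</I>): standard input, named <B>-</B>, is read normally and the unreadable file is counted and skipped:
<PRE>
# /nonexistent/file.txt is a hypothetical, missing file.
out=$(printf 'x\ny\n' | gawk '
BEGINFILE {
    if (ERRNO != "") {    # the open failed
        skipped++
        nextfile          # avoid the fatal error
    }
}
END { print NR, skipped }' - /nonexistent/file.txt)
echo "$out"    # 2 1
</PRE>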
|
|
<P>
|
|
|
|
For
|
|
<B>/</B><I>regular expression</I><B>/</B>
|
|
|
|
patterns, the associated statement is executed for each input record that matches
|
|
the regular expression.
|
|
Regular expressions are the same as those in
|
|
<I><A HREF="/cgi-bin/man/man2html?1+egrep">egrep</A></I>(1),
|
|
|
|
and are summarized below.
|
|
<P>
|
|
|
|
A
|
|
<I>relational expression</I>
|
|
|
|
may use any of the operators defined below in the section on actions.
|
|
These generally test whether certain fields match certain regular expressions.
|
|
<P>
|
|
|
|
The
|
|
<B>&&</B>,
|
|
|
|
<B>||</B>,
|
|
|
|
and
|
|
<B>!</B>
|
|
|
|
operators are logical AND, logical OR, and logical NOT, respectively, as in C.
|
|
They do short-circuit evaluation, also as in C, and are used for combining
|
|
more primitive pattern expressions. As in most languages, parentheses
|
|
may be used to change the order of evaluation.
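<P>
For example (a sketch, assuming a POSIX shell and <I>gawk</I> 5.0 or later; the data is made up), this prints the first field of records whose second field is not 5 and that do not match <B>/cherry/</B>:
<PRE>
out=$(printf 'apple 3\nbanana 5\ncherry 7\n' |
    gawk '!($2 == 5 || /cherry/) { print $1 }')
echo "$out"    # apple
</PRE>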
|
|
<P>
|
|
|
|
The
|
|
<B>?:</B>
|
|
|
|
operator is like the same operator in C. If the first pattern is true
|
|
then the pattern used for testing is the second pattern, otherwise it is
|
|
the third. Only one of the second and third patterns is evaluated.
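<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) in which the first record is tested against <B>/a/</B> and all later ones against <B>/b/</B>:
<PRE>
out=$(printf 'a\na\nb\n' | gawk '
NR == 1 ? /a/ : /b/ { s = s NR }    # collect matching record numbers
END { print s }')
echo "$out"    # 13
</PRE>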
|
|
<P>
|
|
|
|
The
|
|
<I>pattern1</I><B>, </B><I>pattern2</I>
|
|
|
|
form of an expression is called a
|
|
<I>range pattern</I>.
|
|
|
|
It matches all input records starting with a record that matches
|
|
<I>pattern1</I>,
|
|
|
|
and continuing until a record that matches
|
|
<I>pattern2</I>,
|
|
|
|
inclusive. It does not combine with any other sort of pattern expression.
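<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later):
<PRE>
out=$(printf '1\nstart\n2\nend\n3\n' | gawk '
/start/, /end/ { s = s (s == "" ? "" : ",") $0 }    # both end records included
END { print s }')
echo "$out"    # start,2,end
</PRE>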
|
|
<A NAME="lbAU"> </A>
|
|
<H3>Regular Expressions</H3>
|
|
|
|
Regular expressions are the extended kind found in
|
|
<I>egrep</I>.
|
|
|
|
They are composed of characters as follows:
|
|
<DL COMPACT>
|
|
<DT id="155"><I>c</I>
|
|
|
|
<DD>
|
|
Matches the non-metacharacter
|
|
<I>c</I>.
|
|
|
|
<DT id="156"><I>\c</I>
|
|
|
|
<DD>
|
|
Matches the literal character
|
|
<I>c</I>.
|
|
|
|
<DT id="157"><B>.</B>
|
|
|
|
<DD>
|
|
Matches any character
|
|
<I>including</I>
|
|
|
|
newline.
|
|
<DT id="158"><B>^</B>
|
|
|
|
<DD>
|
|
Matches the beginning of a string.
|
|
<DT id="159"><B>$</B>
|
|
|
|
<DD>
|
|
Matches the end of a string.
|
|
<DT id="160"><B>[</B><I>abc...</I><B>]</B>
|
|
|
|
<DD>
|
|
A character list: matches any of the characters
|
|
<I>abc...</I>.
|
|
|
|
You may include a range of characters by separating them with a dash.
|
|
To include a literal dash in the list, put it first or last.
|
|
<DT id="161"><B>[^</B><I>abc...</I><B>]</B><DD>
|
|
A negated character list: matches any character except
|
|
<I>abc...</I>.
|
|
|
|
<DT id="162"><I>r1</I><B>|</B><I>r2</I>
|
|
|
|
<DD>
|
|
Alternation: matches either
|
|
<I>r1</I>
|
|
|
|
or
|
|
<I>r2</I>.
|
|
|
|
<DT id="163"><I>r1r2</I>
|
|
|
|
<DD>
|
|
Concatenation: matches
|
|
<I>r1</I>,
|
|
|
|
and then
|
|
<I>r2</I>.
|
|
|
|
<DT id="164"><I>r</I><B>+</B>
|
|
|
|
<DD>
|
|
Matches one or more
|
|
<I>r</I>'s.
|
|
|
|
<DT id="165"><I>r</I><B>*</B>
|
|
|
|
<DD>
|
|
Matches zero or more
|
|
<I>r</I>'s.
|
|
|
|
<DT id="166"><I>r</I><B>?</B>
|
|
|
|
<DD>
|
|
Matches zero or one
|
|
<I>r</I>'s.
|
|
|
|
<DT id="167"><B>(</B><I>r</I><B>)</B>
|
|
|
|
<DD>
|
|
Grouping: matches
|
|
<I>r</I>.
|
|
|
|
<DT id="168">
|
|
<DD>
|
|
<I>r</I><B>{</B><I>n</I><B>}</B>
|
|
|
|
<DT id="169">
|
|
<DD>
|
|
<I>r</I><B>{</B><I>n</I><B>,}</B>
|
|
|
|
<DT id="170">
|
|
<DD>
|
|
<I>r</I><B>{</B><I>n</I><B>,</B><I>m</I><B>}</B>
|
|
|
|
One or two numbers inside braces denote an
|
|
<I>interval expression</I>.
|
|
|
|
If there is one number in the braces, the preceding regular expression
|
|
<I>r</I>
|
|
|
|
is repeated
|
|
<I>n</I>
|
|
|
|
times. If there are two numbers separated by a comma,
|
|
<I>r</I>
|
|
|
|
is repeated
|
|
<I>n</I>
|
|
|
|
to
|
|
<I>m</I>
|
|
|
|
times.
|
|
If there is one number followed by a comma, then
|
|
<I>r</I>
|
|
|
|
is repeated at least
|
|
<I>n</I>
|
|
|
|
times.
|
|
<DT id="171"><B>\y</B>
|
|
|
|
<DD>
|
|
Matches the empty string at either the beginning or the
|
|
end of a word.
|
|
<DT id="172"><B>\B</B>
|
|
|
|
<DD>
|
|
Matches the empty string within a word.
|
|
<DT id="173"><B>\<</B>
|
|
|
|
<DD>
|
|
Matches the empty string at the beginning of a word.
|
|
<DT id="174"><B>\></B>
|
|
|
|
<DD>
|
|
Matches the empty string at the end of a word.
|
|
<DT id="175"><B>\s</B>
|
|
|
|
<DD>
|
|
Matches any whitespace character.
|
|
<DT id="176"><B>\S</B>
|
|
|
|
<DD>
|
|
Matches any nonwhitespace character.
|
|
<DT id="177"><B>\w</B>
|
|
|
|
<DD>
|
|
Matches any word-constituent character (letter, digit, or underscore).
|
|
<DT id="178"><B>\W</B>
|
|
|
|
<DD>
|
|
Matches any character that is not word-constituent.
|
|
<DT id="179"><B>\`</B>
|
|
|
|
<DD>
|
|
Matches the empty string at the beginning of a buffer (string).
|
|
<DT id="180"><B>\'</B>
|
|
|
|
<DD>
|
|
Matches the empty string at the end of a buffer.
|
|
</DL>
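<P>
A sketch (assuming a POSIX shell and <I>gawk</I> 5.0 or later) exercising interval expressions, <B>\y</B>, and <B>.</B> against a newline:
<PRE>
out=$(gawk 'BEGIN {
    r1 = "aaa" ~ /^a{2,3}$/       # two or three a characters
    r2 = "a" ~ /^a{2,}$/          # at least two a characters
    r3 = "the cat" ~ /\ycat\y/    # word boundary on both sides
    r4 = "concat" ~ /\ycat\y/     # no boundary before "cat"
    r5 = "a\nb" ~ /a.b/           # . also matches newline
    print r1, r2, r3, r4, r5
}')
echo "$out"    # 1 0 1 0 1
</PRE>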
|
|
<P>
|
|
|
|
The escape sequences that are valid in string constants (see
|
|
<B>String Constants</B>)
|
|
|
|
are also valid in regular expressions.
|
|
<P>
|
|
|
|
<I>Character classes</I>
|
|
|
|
are a feature introduced in the <FONT SIZE="-1">POSIX</FONT> standard.
|
|
A character class is a special notation for describing
|
|
lists of characters that have a specific attribute, but where the
|
|
actual characters themselves can vary from country to country and/or
|
|
from character set to character set. For example, the notion of what
|
|
is an alphabetic character differs in the USA and in France.
|
|
<P>
|
|
|
|
A character class is only valid in a regular expression
|
|
<I>inside</I>
|
|
|
|
the brackets of a character list. Character classes consist of
|
|
<B>[:</B>,
|
|
|
|
a keyword denoting the class, and
|
|
<B>:]</B>.
|
|
|
|
The character
|
|
classes defined by the <FONT SIZE="-1">POSIX</FONT> standard are:
|
|
<DL COMPACT>
|
|
<DT id="181"><B>[:alnum:]</B>
|
|
|
|
<DD>
|
|
Alphanumeric characters.
|
|
<DT id="182"><B>[:alpha:]</B>
|
|
|
|
<DD>
|
|
Alphabetic characters.
|
|
<DT id="183"><B>[:blank:]</B>
|
|
|
|
<DD>
|
|
Space or tab characters.
|
|
<DT id="184"><B>[:cntrl:]</B>
|
|
|
|
<DD>
|
|
Control characters.
|
|
<DT id="185"><B>[:digit:]</B>
|
|
|
|
<DD>
|
|
Numeric characters.
|
|
<DT id="186"><B>[:graph:]</B>
|
|
|
|
<DD>
|
|
Characters that are both printable and visible.
|
|
(A space is printable, but not visible, while an
|
|
<B>a</B>
|
|
|
|
is both.)
|
|
<DT id="187"><B>[:lower:]</B>
|
|
|
|
<DD>
|
|
Lowercase alphabetic characters.
|
|
<DT id="188"><B>[:print:]</B>
|
|
|
|
<DD>
|
|
Printable characters (characters that are not control characters).
|
|
<DT id="189"><B>[:punct:]</B>
|
|
|
|
<DD>
|
|
Punctuation characters (characters that are not letters, digits,
|
|
control characters, or space characters).
|
|
<DT id="190"><B>[:space:]</B>
|
|
|
|
<DD>
|
|
Space characters (such as space, tab, and formfeed, to name a few).
|
|
<DT id="191"><B>[:upper:]</B>
|
|
|
|
<DD>
|
|
Uppercase alphabetic characters.
|
|
<DT id="192"><B>[:xdigit:]</B>
|
|
|
|
<DD>
|
|
Characters that are hexadecimal digits.
|
|
</DL>
|
|
<P>
|
|
|
|
For example, before the <FONT SIZE="-1">POSIX</FONT> standard, to match alphanumeric
|
|
characters, you would have had to write
|
|
<B>/[A-Za-z0-9]/</B>.
|
|
|
|
If your character set had other alphabetic characters in it, this would not
|
|
match them, and if your character set collated differently from
|
|
<FONT SIZE="-1">ASCII</FONT>, this might not even match the
|
|
<FONT SIZE="-1">ASCII</FONT> alphanumeric characters.
|
|
With the <FONT SIZE="-1">POSIX</FONT> character classes, you can write
|
|
<B>/[[:alnum:]]/</B>,
|
|
|
|
and this matches
|
|
the alphabetic and numeric characters in your character set,
|
|
no matter what it is.
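<P>
A minimal sketch, assuming a POSIX shell with
<I>gawk</I>
on the search path. It runs in the C locale (an assumption chosen so that the classes have their plain ASCII meaning). It uses
<B>gsub()</B>
with a replacement of
<B>&</B>,
which replaces each match with itself and returns the number of matches. This makes it a simple way to count class members.
<PRE>
echo 'abc-123_!' | LC_ALL=C gawk '{
    print gsub(/[[:alnum:]]/, "&")   # 6: a b c 1 2 3
    print gsub(/[[:punct:]]/, "&")   # 3: - _ !
}'
</PRE>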
|
|
<P>
|
|
|
|
Two additional special sequences can appear in character lists.
|
|
These apply to non-<FONT SIZE="-1">ASCII</FONT> character sets, which can have single symbols
|
|
(called
|
|
<I>collating elements</I>)
|
|
|
|
that are represented with more than one
|
|
character, as well as several characters that are equivalent for
|
|
<I>collating</I>,
|
|
|
|
or sorting, purposes. (E.g., in French, a plain ``e''
|
|
and a grave-accented ``è'' are equivalent.)
|
|
<DL COMPACT>
|
|
<DT id="193">Collating Symbols<DD>
|
|
A collating symbol is a multi-character collating element enclosed in
|
|
<B>[.</B>
|
|
|
|
and
|
|
<B>.]</B>.
|
|
|
|
For example, if
|
|
<B>ch</B>
|
|
|
|
is a collating element, then
|
|
<B>[[.ch.]]</B>
|
|
|
|
is a regular expression that matches this collating element, while
|
|
<B>[ch]</B>
|
|
|
|
is a regular expression that matches either
|
|
<B>c</B>
|
|
|
|
or
|
|
<B>h</B>.
|
|
|
|
<DT id="194">Equivalence Classes<DD>
|
|
An equivalence class is a locale-specific name for a list of
|
|
characters that are equivalent. The name is enclosed in
|
|
<B>[=</B>
|
|
|
|
and
|
|
<B>=]</B>.
|
|
|
|
For example, the name
|
|
<B>e</B>
|
|
|
|
might be used to represent all of
|
|
``e'', ``é'', and ``è''.
In this case,
<B>[[=e=]]</B>
is a regular expression
that matches any of
<B>e</B>,
<B>é</B>,
or
<B>è</B>.
|
|
|
|
</DL>
|
|
<P>
|
|
|
|
These features are very valuable in non-English-speaking locales.
|
|
The library functions that
|
|
<I>gawk</I>
|
|
|
|
uses for regular expression matching
|
|
currently only recognize <FONT SIZE="-1">POSIX</FONT> character classes; they do not recognize
|
|
collating symbols or equivalence classes.
|
|
<P>
|
|
|
|
The
|
|
<B>\y</B>,
|
|
|
|
<B>\B</B>,
|
|
|
|
<B>\<</B>,
|
|
|
|
<B>\></B>,
|
|
|
|
<B>\s</B>,
|
|
|
|
<B>\S</B>,
|
|
|
|
<B>\w</B>,
|
|
|
|
<B>\W</B>,
|
|
|
|
<B>\`</B>,
|
|
|
|
and
|
|
<B>\'</B>
|
|
|
|
operators are specific to
|
|
<I>gawk</I>;
|
|
|
|
they are extensions based on facilities in the <FONT SIZE="-1">GNU</FONT> regular expression libraries.
|
|
<P>
|
|
|
|
The various command line options
|
|
control how
|
|
<I>gawk</I>
|
|
|
|
interprets characters in regular expressions.
|
|
<DL COMPACT>
|
|
<DT id="195">No options<DD>
|
|
In the default case,
|
|
<I>gawk</I>
|
|
|
|
provides all the facilities of
|
|
<FONT SIZE="-1">POSIX</FONT> regular expressions and the <FONT SIZE="-1">GNU</FONT> regular expression operators described above.
|
|
<DT id="196"><B>--posix</B>
|
|
|
|
<DD>
|
|
Only <FONT SIZE="-1">POSIX</FONT> regular expressions are supported; the <FONT SIZE="-1">GNU</FONT> operators are not special.
|
|
(E.g.,
|
|
<B>\w</B>
|
|
|
|
matches a literal
|
|
<B>w</B>).
|
|
|
|
<DT id="197"><B>--traditional</B>
|
|
|
|
<DD>
|
|
Traditional <FONT SIZE="-1">UNIX</FONT>
|
|
<I>awk</I>
|
|
|
|
regular expressions are matched. The <FONT SIZE="-1">GNU</FONT> operators
|
|
are not special, and interval expressions are not available.
|
|
Characters described by octal and hexadecimal escape sequences are
|
|
treated literally, even if they represent regular expression metacharacters.
|
|
<DT id="198"><B>--re-interval</B>
|
|
|
|
<DD>
|
|
Allow interval expressions in regular expressions, even if
|
|
<B>--traditional</B>
|
|
|
|
has been provided.
|
|
</DL>
|
|
<A NAME="lbAV"> </A>
|
|
<H3>Actions</H3>
|
|
|
|
Action statements are enclosed in braces,
|
|
<B>{</B>
|
|
|
|
and
|
|
<B>}</B>.
|
|
|
|
Action statements consist of the usual assignment, conditional, and looping
|
|
statements found in most languages. The operators, control statements,
|
|
and input/output statements
|
|
available are patterned after those in C.
|
|
<A NAME="lbAW"> </A>
|
|
<H3>Operators</H3>
|
|
|
|
<P>
|
|
|
|
The operators in <FONT SIZE="-1">AWK</FONT>, in order of decreasing precedence, are:
|
|
<P>
|
|
|
|
<DL COMPACT>
|
|
<DT id="199"><B>(</B>...<B>)</B>
|
|
|
|
<DD>
|
|
Grouping.
|
|
<DT id="200"><B>$</B>
|
|
|
|
<DD>
|
|
Field reference.
|
|
<DT id="201"><B>++ --</B>
|
|
|
|
<DD>
|
|
Increment and decrement, both prefix and postfix.
|
|
<DT id="202"><B>^</B>
|
|
|
|
<DD>
|
|
Exponentiation (<B>**</B> may also be used, and <B>**=</B> for
|
|
the assignment operator).
|
|
<DT id="203"><B>+ - !</B>
|
|
|
|
<DD>
|
|
Unary plus, unary minus, and logical negation.
|
|
<DT id="204"><B>* / %</B>
|
|
|
|
<DD>
|
|
Multiplication, division, and modulus.
|
|
<DT id="205"><B>+ -</B>
|
|
|
|
<DD>
|
|
Addition and subtraction.
|
|
<DT id="206"><I>space</I>
|
|
|
|
<DD>
|
|
String concatenation.
|
|
<DT id="207"><B>| |&</B>
|
|
|
|
<DD>
|
|
Piped I/O for
|
|
<B>getline</B>,
|
|
|
|
<B>print</B>,
|
|
|
|
and
|
|
<B>printf</B>.
|
|
|
|
<DT id="208"><B>< > <= >= == !=</B>
|
|
|
|
<DD>
|
|
The regular relational operators.
|
|
<DT id="209"><B>~ !~</B>
|
|
|
|
<DD>
|
|
Regular expression match, negated match.
|
|
<B>NOTE</B>:
|
|
|
|
Do not use a constant regular expression
|
|
(<B>/foo/</B>)
|
|
|
|
on the left-hand side of a
|
|
<B>~</B>
|
|
|
|
or
|
|
<B>!~</B>.
|
|
|
|
Only use one on the right-hand side. The expression
|
|
<B>/foo/ ~ </B><I>exp</I>
|
|
|
|
has the same meaning as <B>(($0 ~ /foo/) ~ </B><I>exp</I><B>)</B>.
|
|
This is usually
|
|
<I>not</I>
|
|
|
|
what you want.
|
|
<DT id="210"><B>in</B>
|
|
|
|
<DD>
|
|
Array membership.
|
|
<DT id="211"><B>&&</B>
|
|
|
|
<DD>
|
|
Logical AND.
|
|
<DT id="212"><B>||</B>
|
|
|
|
<DD>
|
|
Logical OR.
|
|
<DT id="213"><B>?:</B>
|
|
|
|
<DD>
|
|
The C conditional expression. This has the form
|
|
<I>expr1</I><B> ? </B><I>expr2</I><B> : </B><I>expr3</I>.
|
|
If
|
|
<I>expr1</I>
|
|
|
|
is true, the value of the expression is
|
|
<I>expr2</I>,
|
|
|
|
otherwise it is
|
|
<I>expr3</I>.
|
|
|
|
Only one of
|
|
<I>expr2</I>
|
|
|
|
and
|
|
<I>expr3</I>
|
|
|
|
is evaluated.
|
|
<DT id="214"><B>= += -= *= /= %= ^=</B>
|
|
|
|
<DD>
|
|
Assignment. Both absolute assignment
|
|
<B>(</B><I>var</I><B> = </B><I>value</I><B>)</B>
|
|
|
|
and operator-assignment (the other forms) are supported.
|
|
</DL>
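<P>
A short sketch of how the precedence table above plays out, assuming a POSIX shell with
<I>gawk</I>
on the search path. Note that
<B>^</B>
groups from right to left, which is standard AWK behavior.
<PRE>
gawk 'BEGIN {
    print 2 ^ 3 ^ 2                  # 512: 2 ^ (3 ^ 2)
    print -2 ^ 2                     # -4: ^ binds tighter than unary minus
    print 1 " " 2 + 3                # 1 5: + binds tighter than concatenation
    x = 7
    print (x % 2 ? "odd" : "even")   # odd
}'
</PRE>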
|
|
<A NAME="lbAX"> </A>
|
|
<H3>Control Statements</H3>
|
|
|
|
<P>
|
|
|
|
The control statements are
|
|
as follows:
|
|
<P>
|
|
|
|
<DL COMPACT><DT id="215"><DD>
|
|
<PRE>
|
|
<B>if (</B><I>condition</I><B>) </B><I>statement</I> [ <B>else</B><I> statement </I>]
|
|
<B>while (</B><I>condition</I><B>) </B><I>statement </I>
|
|
<B>do </B><I>statement </I><B>while (</B><I>condition</I><B>)</B>
|
|
<B>for (</B><I>expr1</I><B>; </B><I>expr2</I><B>; </B><I>expr3</I><B>) </B><I>statement</I>
|
|
<B>for (</B><I>var </I><B>in</B><I> array</I><B>) </B><I>statement</I>
|
|
<B>break</B>
|
|
<B>continue</B>
|
|
<B>delete </B><I>array</I><B>[</B><I>index</I><B>]</B>
|
|
<B>delete </B><I>array</I>
|
|
<B>exit</B> [ <I>expression</I> ]
|
|
<B>{ </B><I>statements </I><B>}</B>
|
|
<B>switch (</B><I>expression</I><B>) {
|
|
case </B><I>value</I><B>|</B><I>regex</I><B> : </B><I>statement
|
|
...
|
|
</I>[ <B>default: </B><I>statement </I>]
|
|
<B>}</B>
|
|
</PRE>
|
|
|
|
</DL>
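<P>
A minimal sketch, assuming a POSIX shell with
<I>gawk</I>
on the search path. It uses the
<B>switch</B>
statement, a
<I>gawk</I>
extension, with both a regular expression case and a constant case. A
<B>break</B>
inside a
<B>switch</B>
leaves only the
<B>switch</B>,
not the enclosing
<B>for</B>
loop.
<PRE>
gawk 'BEGIN {
    n = split("apple 7 banana", items, " ")
    for (i = 1; i <= n; i++) {
        switch (items[i]) {
        case /^[0-9]+$/:                 # a regular expression case
            print items[i], "is a number"
            break
        case "apple":                    # a constant-value case
            print items[i], "is a fruit"
            break
        default:
            print items[i], "is something else"
        }
    }
    delete items                         # remove every element at once
    print length(items)                  # 0
}'
</PRE>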
|
|
|
|
<A NAME="lbAY"> </A>
|
|
<H3>I/O Statements</H3>
|
|
|
|
<P>
|
|
|
|
The input/output statements are as follows:
|
|
<P>
|
|
|
|
<DL COMPACT>
|
|
<DT id="216"><B>close(</B><I>file </I>[<B>, </B><I>how</I>]<B>)</B><DD>
|
|
Close file, pipe or coprocess.
|
|
The optional
|
|
<I>how</I>
|
|
|
|
should only be used when closing one end of a
|
|
two-way pipe to a coprocess.
|
|
It must be a string value, either
|
|
<B>"to"</B> or <B>"from"</B>.
|
|
<DT id="217"><B>getline</B>
|
|
|
|
<DD>
|
|
Set
|
|
<B>$0</B>
|
|
|
|
from the next input record; set
|
|
<B>NF</B>,
|
|
|
|
<B>NR</B>,
|
|
|
|
<B>FNR</B>,
|
|
|
|
and
<B>RT</B>.
|
|
|
|
<DT id="218"><B>getline <</B><I>file</I>
|
|
|
|
<DD>
|
|
Set
|
|
<B>$0</B>
|
|
|
|
from the next record of
|
|
<I>file</I>;
|
|
|
|
set
|
|
<B>NF</B>
and
<B>RT</B>.
|
|
|
|
<DT id="219"><B>getline</B><I> var</I>
|
|
|
|
<DD>
|
|
Set
|
|
<I>var</I>
|
|
|
|
from the next input record; set
|
|
<B>NR</B>,
|
|
|
|
<B>FNR</B>,
|
|
|
|
and
<B>RT</B>.
|
|
|
|
<DT id="220"><B>getline</B><I> var</I><B> <</B><I>file</I>
|
|
|
|
<DD>
|
|
Set
|
|
<I>var</I>
|
|
|
|
from the next record of
|
|
<I>file</I>;
|
|
|
|
set
|
|
<B>RT</B>.
|
|
|
|
<DT id="221"><I>command</I><B> | getline </B>[<I>var</I>]<DD>
|
|
Run
|
|
<I>command</I>,
|
|
|
|
piping the output either into
|
|
<B>$0</B>
|
|
|
|
or
|
|
<I>var</I>,
|
|
|
|
as above, and set
|
|
<B>RT</B>.
|
|
|
|
<DT id="222"><I>command</I><B> |& getline </B>[<I>var</I>]<DD>
|
|
Run
|
|
<I>command</I>
|
|
|
|
as a coprocess,
|
|
piping the output either into
|
|
<B>$0</B>
|
|
|
|
or
|
|
<I>var</I>,
|
|
|
|
as above, and set
|
|
<B>RT</B>.
|
|
|
|
Coprocesses are a
|
|
<I>gawk</I>
|
|
|
|
extension.
|
|
(The <I>command</I>
|
|
|
|
can also be a socket. See the subsection
|
|
<B>Special File Names</B>,
|
|
|
|
below.)
|
|
<DT id="223"><B>next</B>
|
|
|
|
<DD>
|
|
Stop processing the current input record.
|
|
Read the next input record
|
|
and start processing over with the first pattern in the
|
|
<FONT SIZE="-1">AWK</FONT> program.
|
|
Upon reaching the end of the input data,
|
|
execute any
|
|
<B>END</B>
|
|
|
|
rule(s).
|
|
<DT id="224"><B>nextfile</B>
|
|
|
|
<DD>
|
|
Stop processing the current input file. The next input record read
|
|
comes from the next input file.
|
|
Update
|
|
<B>FILENAME</B>
|
|
|
|
and
|
|
<B>ARGIND</B>,
|
|
|
|
reset
|
|
<B>FNR</B>
|
|
|
|
to 1, and start processing over with the first pattern in the
|
|
<FONT SIZE="-1">AWK</FONT> program.
|
|
Upon reaching the end of the input data,
|
|
execute any
|
|
<B>ENDFILE</B>
|
|
|
|
and
|
|
<B>END</B>
|
|
|
|
rule(s).
|
|
<DT id="225"><B>print</B>
|
|
|
|
<DD>
|
|
Print the current record.
|
|
The output record is terminated with the value of
|
|
<B>ORS</B>.
|
|
|
|
<DT id="226"><B>print</B><I> expr-list</I>
|
|
|
|
<DD>
|
|
Print expressions.
|
|
Each expression is separated by the value of
|
|
<B>OFS</B>.
|
|
|
|
The output record is terminated with the value of
|
|
<B>ORS</B>.
|
|
|
|
<DT id="227"><B>print</B><I> expr-list</I><B> ></B><I>file</I>
|
|
|
|
<DD>
|
|
Print expressions on
|
|
<I>file</I>.
|
|
|
|
Each expression is separated by the value of
|
|
<B>OFS</B>.
|
|
|
|
The output record is terminated with the value of
|
|
<B>ORS</B>.
|
|
|
|
<DT id="228"><B>printf</B><I> fmt, expr-list</I>
|
|
|
|
<DD>
|
|
Format and print.
|
|
See <B>The </B><I>printf </I><B>Statement</B>, below.
|
|
<DT id="229"><B>printf</B><I> fmt, expr-list</I><B> ></B><I>file</I>
|
|
|
|
<DD>
|
|
Format and print on
|
|
<I>file</I>.
|
|
|
|
<DT id="230"><B>system(</B><I>cmd-line</I><B>)</B>
|
|
|
|
<DD>
|
|
Execute the command
|
|
<I>cmd-line</I>,
|
|
|
|
and return the exit status.
|
|
(This may not be available on non-<FONT SIZE="-1">POSIX</FONT> systems.)
|
|
See <I>GAWK: Effective AWK Programming</I> for the full details on the exit status.
|
|
<DT id="231"><B>fflush(</B>[<I>file</I>]<B>)</B><DD>
|
|
Flush any buffers associated with the open output file or pipe
|
|
<I>file</I>.
|
|
|
|
If
|
|
<I>file</I>
|
|
|
|
is missing or if it
|
|
is the null string,
|
|
then flush all open output files and pipes.
|
|
</DL>
|
|
<P>
|
|
|
|
Additional output redirections are allowed for
|
|
<B>print</B>
|
|
|
|
and
|
|
<B>printf</B>.
|
|
|
|
<DL COMPACT>
|
|
<DT id="232"><B>print ... >></B><I> file</I>
|
|
|
|
<DD>
|
|
Append output to the
|
|
<I>file</I>.
|
|
|
|
<DT id="233"><B>print ... |</B><I> command</I>
|
|
|
|
<DD>
|
|
Write on a pipe.
|
|
<DT id="234"><B>print ... |&</B><I> command</I>
|
|
|
|
<DD>
|
|
Send data to a coprocess or socket.
|
|
(See also the subsection
|
|
<B>Special File Names</B>,
|
|
|
|
below.)
|
|
</DL>
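<P>
A minimal sketch of writing to a pipe, assuming a POSIX shell with
<I>gawk</I>
and
<B>sort</B>
on the search path. Every record goes to a single
<B>sort</B>
process, because the same command string always names the same open pipe. Calling
<B>close()</B>
in the
<B>END</B>
rule lets
<B>sort</B>
finish writing before the final line is printed.
<PRE>
printf 'b\na\nc\n' | gawk '
{ print | "sort" }               # every record goes to the same sort process
END {
    close("sort")                # sort sees end of file, writes its output and exits
    print "sorted", NR, "lines"
}'
</PRE>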
|
|
<P>
|
|
|
|
The
|
|
<B>getline</B>
|
|
|
|
command returns 1 on success, zero on end of file, and -1 on an error.
|
|
If the
|
|
<I><A HREF="/cgi-bin/man/man2html?3+errno">errno</A></I>(3)
|
|
|
|
value indicates that the I/O operation may be retried,
|
|
and <B>PROCINFO["</B><I>input</I><B>", "RETRY"]</B>
|
|
is set, then -2 is returned instead of -1, and further calls to
|
|
<B>getline</B>
|
|
|
|
may be attempted.
|
|
Upon an error,
|
|
<B>ERRNO</B>
|
|
|
|
is set to a string describing the problem.
|
|
<P>
|
|
|
|
<B>NOTE</B>:
|
|
|
|
Failure in opening a two-way socket results in a non-fatal error being
|
|
returned to the calling function. If using a pipe, coprocess, or socket to
|
|
<B>getline</B>,
|
|
|
|
or from
|
|
<B>print</B>
|
|
|
|
or
|
|
<B>printf</B>
|
|
|
|
within a loop, you
|
|
<I>must</I>
|
|
|
|
use
|
|
<B>close()</B>
|
|
|
|
to create new instances of the command or socket.
|
|
<FONT SIZE="-1">AWK</FONT> does not automatically close pipes, sockets, or coprocesses when
|
|
they return EOF.
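<P>
A minimal sketch of reading a command's output in a loop, assuming a POSIX shell with
<I>gawk</I>
on the search path. Testing for a result greater than zero stops the loop at end of file and also on errors. The
<B>close()</B>
call is what allows the second pass to run the command again.
<PRE>
gawk 'BEGIN {
    cmd = "echo one; echo two"
    for (pass = 1; pass <= 2; pass++) {
        n = 0
        # "> 0" stops on end of file (0) as well as on errors (-1 or -2)
        while ((cmd | getline line) > 0)
            n++
        close(cmd)               # without this, the second pass would read nothing
        print "pass", pass, "read", n, "lines"
    }
}'
</PRE>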
|
|
<A NAME="lbAZ"> </A>
|
|
<H3>The <I>printf</I> Statement</H3>
|
|
|
|
<P>
|
|
|
|
The <FONT SIZE="-1">AWK</FONT> versions of the
|
|
<B>printf</B>
|
|
|
|
statement and
|
|
<B>sprintf()</B>
|
|
|
|
function
|
|
(see below)
|
|
accept the following conversion specification formats:
|
|
<DL COMPACT>
|
|
<DT id="235"><B>%a</B>,<B> %A</B>
|
|
|
|
<DD>
|
|
A floating point number of the form
|
|
[<B>-</B>]<B>0x</B><I>h</I><B>.</B><I>hhhh</I><B>p+-</B><I>dd</I>
|
|
(C99 hexadecimal floating point format).
|
|
For
|
|
<B>%A</B>,
|
|
|
|
uppercase letters are used instead of lowercase ones.
|
|
<DT id="236"><B>%c</B>
|
|
|
|
<DD>
|
|
A single character.
|
|
If the argument used for
|
|
<B>%c</B>
|
|
|
|
is numeric, it is treated as a character and printed.
|
|
Otherwise, the argument is assumed to be a string, and only the first
|
|
character of that string is printed.
|
|
<DT id="237"><B>%d</B>,<B> %i</B>
|
|
|
|
<DD>
|
|
A decimal number (the integer part).
|
|
<DT id="238"><B>%e</B>,<B> %E</B>
|
|
|
|
<DD>
|
|
A floating point number of the form
|
|
[<B>-</B>]<I>d</I><B>.</B><I>dddddd</I><B>e</B>[<B>+-</B>]<I>dd</I>.
|
|
The
|
|
<B>%E</B>
|
|
|
|
format uses
|
|
<B>E</B>
|
|
|
|
instead of
|
|
<B>e</B>.
|
|
|
|
<DT id="239"><B>%f</B>,<B> %F</B>
|
|
|
|
<DD>
|
|
A floating point number of the form
|
|
[<B>-</B>]<I>ddd</I><B>.</B><I>dddddd</I>.
|
|
If the system library supports it,
|
|
<B>%F</B>
|
|
|
|
is available as well. This is like
|
|
<B>%f</B>,
|
|
|
|
but uses capital letters for special ``not a number''
|
|
and ``infinity'' values. If
|
|
<B>%F</B>
|
|
|
|
is not available,
|
|
<I>gawk</I>
|
|
|
|
uses
|
|
<B>%f</B>.
|
|
|
|
<DT id="240"><B>%g</B>,<B> %G</B>
|
|
|
|
<DD>
|
|
Use
|
|
<B>%e</B>
|
|
|
|
or
|
|
<B>%f</B>
|
|
|
|
conversion, whichever is shorter, with nonsignificant zeros suppressed.
|
|
The
|
|
<B>%G</B>
|
|
|
|
format uses
|
|
<B>%E</B>
|
|
|
|
instead of
|
|
<B>%e</B>.
|
|
|
|
<DT id="241"><B>%o</B>
|
|
|
|
<DD>
|
|
An unsigned octal number (also an integer).
|
|
<DT id="242"><B>%u</B>
<DD>
|
|
|
|
An unsigned decimal number (again, an integer).
|
|
<DT id="243"><B>%s</B>
|
|
|
|
<DD>
|
|
A character string.
|
|
<DT id="244"><B>%x</B>,<B> %X</B>
|
|
|
|
<DD>
|
|
An unsigned hexadecimal number (an integer).
|
|
The
|
|
<B>%X</B>
|
|
|
|
format uses
|
|
<B>ABCDEF</B>
|
|
|
|
instead of
|
|
<B>abcdef</B>.
|
|
|
|
<DT id="245"><B>%%</B>
|
|
|
|
<DD>
|
|
A single
|
|
<B>%</B>
|
|
|
|
character; no argument is converted.
|
|
</DL>
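<P>
A short sketch of several conversion letters, assuming a POSIX shell with
<I>gawk</I>
on the search path. The expected output is shown in the comments.
<PRE>
gawk 'BEGIN {
    printf "%c%c\n", 65, "hello"           # Ah: 65 is a character code; "hello" gives h
    printf "%e|%g|%x|%o\n", 1234.5, 0.0001, 255, 8
                                           # 1.234500e+03|0.0001|ff|10
    printf "%d%%\n", 99.9                  # 99%: only the integer part is printed
}'
</PRE>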
|
|
<P>
|
|
|
|
Optional, additional parameters may lie between the
|
|
<B>%</B>
|
|
|
|
and the control letter:
|
|
<DL COMPACT>
|
|
<DT id="246"><I>count</I><B>$</B>
|
|
|
|
<DD>
|
|
Use the
|
|
<I>count</I>'th
|
|
|
|
argument at this point in the formatting.
|
|
This is called a
|
|
<I>positional specifier</I>
|
|
|
|
and
|
|
is intended primarily for use in translated versions of
|
|
format strings, not in the original text of an AWK program.
|
|
It is a
|
|
<I>gawk</I>
|
|
|
|
extension.
|
|
<DT id="247"><B>-</B>
|
|
|
|
<DD>
|
|
The expression should be left-justified within its field.
|
|
<DT id="248"><I>space</I>
|
|
|
|
<DD>
|
|
For numeric conversions, prefix positive values with a space, and
|
|
negative values with a minus sign.
|
|
<DT id="249"><B>+</B>
|
|
|
|
<DD>
|
|
The plus sign, used before the width modifier (see below),
|
|
says to always supply a sign for numeric conversions, even if the data
|
|
to be formatted is positive. The
|
|
<B>+</B>
|
|
|
|
overrides the space modifier.
|
|
<DT id="250"><B>#</B>
|
|
|
|
<DD>
|
|
Use an ``alternate form'' for certain control letters.
|
|
For
|
|
<B>%o</B>,
|
|
|
|
supply a leading zero.
|
|
For
|
|
<B>%x</B>
|
|
|
|
and
|
|
<B>%X</B>,
|
|
|
|
supply a leading
|
|
<B>0x</B>
|
|
|
|
or
|
|
<B>0X</B>
|
|
|
|
for
|
|
a nonzero result.
|
|
For
|
|
<B>%e</B>,
|
|
|
|
<B>%E</B>,
|
|
|
|
<B>%f</B>
|
|
|
|
and
|
|
<B>%F</B>,
|
|
|
|
the result always contains a
|
|
decimal point.
|
|
For
|
|
<B>%g</B>
|
|
|
|
and
|
|
<B>%G</B>,
|
|
|
|
trailing zeros are not removed from the result.
|
|
<DT id="251"><B>0</B>
|
|
|
|
<DD>
|
|
A leading
|
|
<B>0</B>
|
|
|
|
(zero) acts as a flag, indicating that output should be
|
|
padded with zeroes instead of spaces.
|
|
This applies only to the numeric output formats.
|
|
This flag only has an effect when the field width is wider than the
|
|
value to be printed.
|
|
<DT id="252"><B>'</B>
|
|
|
|
<DD>
|
|
A single quote character instructs
|
|
<I>gawk</I>
|
|
|
|
to insert the locale's thousands-separator character
|
|
into decimal numbers, and to also use the locale's
|
|
decimal point character with floating point formats.
|
|
This requires correct locale support in the C library
|
|
and in the definition of the current locale.
|
|
<DT id="253"><I>width</I>
|
|
|
|
<DD>
|
|
The field should be padded to this width. The field is normally padded
|
|
with spaces. With the
|
|
<B>0</B>
|
|
|
|
flag, it is padded with zeroes.
|
|
<DT id="254"><B>.</B><I>prec</I>
|
|
|
|
<DD>
|
|
A number that specifies the precision to use when printing.
|
|
For the
|
|
<B>%e</B>,
|
|
|
|
<B>%E</B>,
|
|
|
|
<B>%f</B>
|
|
|
|
and
|
|
<B>%F</B>,
|
|
|
|
formats, this specifies the
|
|
number of digits you want printed to the right of the decimal point.
|
|
For the
|
|
<B>%g</B>
|
|
|
|
and
|
|
<B>%G</B>
|
|
|
|
formats, it specifies the maximum number
|
|
of significant digits. For the
|
|
<B>%d</B>,
|
|
|
|
<B>%i</B>,
|
|
|
|
<B>%o</B>,
|
|
|
|
<B>%u</B>,
|
|
|
|
<B>%x</B>,
|
|
|
|
and
|
|
<B>%X</B>
|
|
|
|
formats, it specifies the minimum number of
|
|
digits to print. For the
|
|
<B>%s</B>
|
|
|
|
format,
|
|
it specifies the maximum number of
|
|
characters from the string that should be printed.
|
|
</DL>
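<P>
A short sketch of the flags, width, and precision described above, assuming a POSIX shell with
<I>gawk</I>
on the search path. The brackets show where each field begins and ends.
<PRE>
gawk 'BEGIN {
    printf "[%5d][%-5d][%05d]\n", 42, 42, 42      # [   42][42   ][00042]
    printf "[%+d][% d]\n", 42, 42                 # [+42][ 42]
    printf "[%#o][%#x][%.2f]\n", 8, 255, 3.14159  # [010][0xff][3.14]
    printf "[%.3s][%.4d]\n", "abcdef", 7          # [abc][0007]
}'
</PRE>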
|
|
<P>
|
|
|
|
The dynamic
|
|
<I>width</I>
|
|
|
|
and
|
|
<I>prec</I>
|
|
|
|
capabilities of the ISO C
|
|
<B>printf()</B>
|
|
|
|
routines are supported.
|
|
A
|
|
<B>*</B>
|
|
|
|
in place of either the
|
|
<I>width</I>
|
|
|
|
or
|
|
<I>prec</I>
|
|
|
|
specifications causes their values to be taken from
|
|
the argument list to
|
|
<B>printf</B>
|
|
|
|
or
|
|
<B>sprintf()</B>.
|
|
|
|
To use a positional specifier with a dynamic width or precision,
|
|
supply the
|
|
<I>count</I><B>$</B>
|
|
|
|
after the
|
|
<B>*</B>
|
|
|
|
in the format string.
|
|
For example, <B>"%3$*2$.*1$s"</B>.
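<P>
A minimal sketch of dynamic width and precision, and of positional specifiers, assuming a POSIX shell with
<I>gawk</I>
on the search path.
<PRE>
gawk 'BEGIN {
    printf "[%*d]\n", 6, 42                 # [    42]: width taken from the argument list
    printf "[%.*f]\n", 1, 3.14159           # [3.1]: precision taken from the argument list
    printf "%2$s %1$s\n", "world", "hello"  # hello world: positional specifiers
}'
</PRE>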
|
|
<A NAME="lbBA"> </A>
|
|
<H3>Special File Names</H3>
|
|
|
|
<P>
|
|
|
|
When doing I/O redirection from either
|
|
<B>print</B>
|
|
|
|
or
|
|
<B>printf</B>
|
|
|
|
into a file,
|
|
or via
|
|
<B>getline</B>
|
|
|
|
from a file,
|
|
<I>gawk</I>
|
|
|
|
recognizes certain special filenames internally. These filenames
|
|
allow access to open file descriptors inherited from
|
|
<I>gawk</I>'s
|
|
|
|
parent process (usually the shell).
|
|
These filenames may also be used on the command line to name data files.
|
|
The filenames are:
|
|
<DL COMPACT>
|
|
<DT id="255"><B>-</B>
|
|
|
|
<DD>
|
|
The standard input.
|
|
<DT id="256"><B>/dev/stdin</B>
|
|
|
|
<DD>
|
|
The standard input.
|
|
<DT id="257"><B>/dev/stdout</B>
|
|
|
|
<DD>
|
|
The standard output.
|
|
<DT id="258"><B>/dev/stderr</B>
|
|
|
|
<DD>
|
|
The standard error output.
|
|
<DT id="259"><B>/dev/fd/</B><I>n</I>
|
|
|
|
<DD>
|
|
The file associated with the open file descriptor
|
|
<I>n</I>.
|
|
|
|
</DL>
|
|
<P>
|
|
|
|
These are particularly useful for error messages. For example:
|
|
<P>
|
|
|
|
<DL COMPACT><DT id="260"><DD>
|
|
<B>
|
|
print "You blew it!" > "/dev/stderr"
|
|
</B>
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
whereas you would otherwise have to use
|
|
<P>
|
|
|
|
<DL COMPACT><DT id="261"><DD>
|
|
<B>
|
|
print "You blew it!" | "cat 1>&2"
|
|
</B>
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
The following special filenames may be used with the
|
|
<B>|&</B>
|
|
|
|
coprocess operator for creating TCP/IP network connections:
|
|
<DL COMPACT>
|
|
<DT id="262"><B>/inet/tcp/</B><I>lport</I><B>/</B><I>rhost</I><B>/</B><I>rport</I>
<DT id="263"><B>/inet4/tcp/</B><I>lport</I><B>/</B><I>rhost</I><B>/</B><I>rport</I>
<DT id="264"><B>/inet6/tcp/</B><I>lport</I><B>/</B><I>rhost</I><B>/</B><I>rport</I>
<DD>
|
|
|
|
Files for a TCP/IP connection on local port
|
|
<I>lport</I>
|
|
|
|
to
|
|
remote host
|
|
<I>rhost</I>
|
|
|
|
on remote port
|
|
<I>rport</I>.
|
|
|
|
Use a port of
|
|
<B>0</B>
|
|
|
|
to have the system pick a port.
|
|
Use
|
|
<B>/inet4</B>
|
|
|
|
to force an IPv4 connection,
|
|
and
|
|
<B>/inet6</B>
|
|
|
|
to force an IPv6 connection.
|
|
Plain
|
|
<B>/inet</B>
|
|
|
|
uses the system default (most likely IPv4).
|
|
Usable only with the
|
|
<B>|&</B>
|
|
|
|
two-way I/O operator.
|
|
<DT id="265"><B>/inet/udp/</B><I>lport</I><B>/</B><I>rhost</I><B>/</B><I>rport</I>
<DT id="266"><B>/inet4/udp/</B><I>lport</I><B>/</B><I>rhost</I><B>/</B><I>rport</I>
<DT id="267"><B>/inet6/udp/</B><I>lport</I><B>/</B><I>rhost</I><B>/</B><I>rport</I>
<DD>
|
|
|
|
Similar, but use UDP/IP instead of TCP/IP.
|
|
</DL>
|
|
<A NAME="lbBB"> </A>
|
|
<H3>Numeric Functions</H3>
|
|
|
|
<P>
|
|
|
|
<FONT SIZE="-1">AWK</FONT> has the following built-in arithmetic functions:
|
|
<P>
|
|
|
|
<DL COMPACT>
|
|
<DT id="268"><B>atan2(</B><I>y</I><B>,</B><I> x</I><B>)</B>
|
|
|
|
<DD>
|
|
Return the arctangent of
|
|
<I>y/x</I>
|
|
|
|
in radians.
|
|
<DT id="269"><B>cos(</B><I>expr</I><B>)</B>
|
|
|
|
<DD>
|
|
Return the cosine of
|
|
<I>expr</I>,
|
|
|
|
which is in radians.
|
|
<DT id="270"><B>exp(</B><I>expr</I><B>)</B>
|
|
|
|
<DD>
|
|
The exponential function.
|
|
<DT id="271"><B>int(</B><I>expr</I><B>)</B>
|
|
|
|
<DD>
|
|
Truncate to integer.
|
|
|
|
<DT id="272"><B>log(</B><I>expr</I><B>)</B>
|
|
|
|
<DD>
|
|
The natural logarithm function.
|
|
<DT id="273"><B>rand()</B>
|
|
|
|
<DD>
|
|
Return a random number
|
|
<I>N</I>,
|
|
|
|
between zero and one,
|
|
such that 0 ≤ <I>N</I> < 1.
|
|
<DT id="274"><B>sin(</B><I>expr</I><B>)</B>
|
|
|
|
<DD>
|
|
Return the sine of
|
|
<I>expr</I>,
|
|
|
|
which is in radians.
|
|
<DT id="275"><B>sqrt(</B><I>expr</I><B>)</B>
|
|
|
|
<DD>
|
|
Return the square root of
|
|
<I>expr</I>.
|
|
|
|
<DT id="276"><B>srand(</B>[<I>expr</I>]<B>)</B><DD>
|
|
Use
|
|
<I>expr</I>
|
|
|
|
as the new seed for the random number generator. If no
|
|
<I>expr</I>
|
|
|
|
is provided, use the time of day.
|
|
Return the previous seed for the random
|
|
number generator.
|
|
</DL>
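<P>
A short sketch of the numeric functions, assuming a POSIX shell with
<I>gawk</I>
on the search path. Seeding with the same value reproduces the same random sequence. Calling
<B>srand()</B>
with no argument reseeds the generator from the time of day, so the last line varies from run to run.
<PRE>
gawk 'BEGIN {
    print int(3.9), int(-3.9)   # 3 -3: truncation toward zero
    print atan2(0, -1)          # 3.14159: pi, printed with the default OFMT
    srand(42); a = rand()
    srand(42); b = rand()
    print (a == b)              # 1: the same seed gives the same sequence
    srand()                     # reseed from the time of day
    print int(rand() * 6) + 1   # a pseudo-random integer from 1 to 6
}'
</PRE>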
|
|
<A NAME="lbBC"> </A>
|
|
<H3>String Functions</H3>
|
|
|
|
<P>
|
|
|
|
<I>Gawk</I>
|
|
|
|
has the following built-in string functions:
|
|
<P>
|
|
|
|
<DL COMPACT>
|
|
<DT id="277"><B>asort(</B><I>s </I>[<B>, </B><I>d</I> [<B>, </B><I>how</I>] ]<B>)</B><DD>
|
|
Return the number of elements in the source
|
|
array
|
|
<I>s</I>.
|
|
|
|
Sort
|
|
the contents of
|
|
<I>s</I>
|
|
|
|
using
|
|
<I>gawk</I>'s
|
|
|
|
normal rules for
|
|
comparing values, and replace the indices of the
|
|
sorted values
|
|
<I>s</I>
|
|
|
|
with sequential
|
|
integers starting with 1. If the optional
|
|
destination array
|
|
<I>d</I>
|
|
|
|
is specified,
|
|
first duplicate
|
|
<I>s</I>
|
|
|
|
into
|
|
<I>d</I>,
|
|
|
|
and then sort
|
|
<I>d</I>,
|
|
|
|
leaving the indices of the
|
|
source array
|
|
<I>s</I>
|
|
|
|
unchanged. The optional string
|
|
<I>how</I>
|
|
|
|
controls the direction and the comparison mode.
|
|
Valid values for
|
|
<I>how</I>
|
|
|
|
are
|
|
any of the strings valid for
|
|
<B>PROCINFO["sorted_in"]</B>.
|
|
It can also be the name of a user-defined
|
|
comparison function as described in
|
|
<B>PROCINFO["sorted_in"]</B>.
|
|
<DT id="278"><B>asorti(</B><I>s </I>[<B>, </B><I>d</I> [<B>, </B><I>how</I>] ]<B>)</B><DD>
|
|
Return the number of elements in the source
|
|
array
|
|
<I>s</I>.
|
|
|
|
The behavior is the same as that of
|
|
<B>asort()</B>,
|
|
|
|
except that the array
|
|
<I>indices</I>
|
|
|
|
are used for sorting, not the array values.
|
|
When done, the array is indexed numerically, and
|
|
the values are those of the original indices.
|
|
The original values are lost; thus provide
|
|
a second array if you wish to preserve the original.
|
|
The purpose of the optional string
|
|
<I>how</I>
|
|
|
|
is the same as described
|
|
previously for
|
|
<B>asort()</B>.
|
|
|
|
<DT id="279"><B>gensub(</B><I>r</I><B>, </B><I>s</I><B>, </B><I>h </I>[<B>, </B><I>t</I>]<B>)</B><DD>
|
|
Search the target string
|
|
<I>t</I>
|
|
|
|
for matches of the regular expression
|
|
<I>r</I>.
|
|
|
|
If
|
|
<I>h</I>
|
|
|
|
is a string beginning with
|
|
<B>g</B>
|
|
|
|
or
|
|
<B>G</B>,
|
|
|
|
then replace all matches of
|
|
<I>r</I>
|
|
|
|
with
|
|
<I>s</I>.
|
|
|
|
Otherwise,
|
|
<I>h</I>
|
|
|
|
is a number indicating which match of
|
|
<I>r</I>
|
|
|
|
to replace.
|
|
If
|
|
<I>t</I>
|
|
|
|
is not supplied, use
|
|
<B>$0</B>
|
|
|
|
instead.
|
|
Within the replacement text
|
|
<I>s</I>,
|
|
|
|
the sequence
|
|
<B>\</B><I>n</I>,
|
|
|
|
where
|
|
<I>n</I>
|
|
|
|
is a digit from 1 to 9, may be used to indicate just the text that
|
|
matched the
|
|
<I>n</I>'th
|
|
|
|
parenthesized subexpression. The sequence
|
|
<B>\0</B>
|
|
|
|
represents the entire matched text, as does the character
|
|
<B>&</B>.
|
|
|
|
Unlike
|
|
<B>sub()</B>
|
|
|
|
and
|
|
<B>gsub()</B>,
|
|
|
|
the modified string is returned as the result of the function,
|
|
and the original target string is
|
|
<I>not</I>
|
|
|
|
changed.
|
|
<DT id="280"><B>gsub(</B><I>r</I><B>, </B><I>s </I>[<B>, </B><I>t</I>]<B>)</B><DD>
|
|
For each substring matching the regular expression
|
|
<I>r</I>
|
|
|
|
in the string
|
|
<I>t</I>,
|
|
|
|
substitute the string
|
|
<I>s</I>,
|
|
|
|
and return the number of substitutions.
|
|
If
|
|
<I>t</I>
|
|
|
|
is not supplied, use
|
|
<B>$0</B>.
|
|
|
|
An
|
|
<B>&</B>
|
|
|
|
in the replacement text is replaced with the text that was actually matched.
|
|
Use
|
|
<B>\&</B>
|
|
|
|
to get a literal
|
|
<B>&</B>.
|
|
|
|
(This must be typed as <B>"\\&"</B>;
|
|
see <I>GAWK: Effective AWK Programming</I>
|
|
for a fuller discussion of the rules for ampersands
|
|
and backslashes in the replacement text of
|
|
<B>sub()</B>,
|
|
|
|
<B>gsub()</B>,
|
|
|
|
and
|
|
<B>gensub()</B>.)
|
|
|
|
<DT id="281"><B>index(</B><I>s</I><B>,</B><I> t</I><B>)</B>
|
|
|
|
<DD>
|
|
Return the index of the string
|
|
<I>t</I>
|
|
|
|
in the string
|
|
<I>s</I>,
|
|
|
|
or zero if
|
|
<I>t</I>
|
|
|
|
is not present.
|
|
(This implies that character indices start at one.)
|
|
It is a fatal error to use a regexp constant for
|
|
<I>t</I>.
|
|
|
|
<DT id="282"><B>length(</B>[<I>s</I>]<B>)</B><DD>
Return the length of the string
<I>s</I>,
|
|
|
|
or the length of
|
|
<B>$0</B>
|
|
|
|
if
|
|
<I>s</I>
|
|
|
|
is not supplied.
|
|
As a non-standard extension, with an array argument,
|
|
<B>length()</B>
|
|
|
|
returns the number of elements in the array.
|
|
<DT id="283"><B>match(</B><I>s</I><B>, </B><I>r </I>[<B>, </B><I>a</I>]<B>)</B><DD>
|
|
Return the position in
|
|
<I>s</I>
|
|
|
|
where the regular expression
|
|
<I>r</I>
|
|
|
|
occurs, or zero if
|
|
<I>r</I>
|
|
|
|
is not present, and set the values of
|
|
<B>RSTART</B>
|
|
|
|
and
|
|
<B>RLENGTH</B>.
|
|
|
|
Note that the argument order is the same as for the
|
|
<B>~</B>
|
|
|
|
operator:
|
|
<I>str</I><B> ~</B>
|
|
|
|
<I>re</I>.
|
|
|
|
|
|
If array
|
|
<I>a</I>
|
|
|
|
is provided,
|
|
<I>a</I>
|
|
|
|
is cleared and then elements 1 through
|
|
<I>n</I>
|
|
|
|
are filled with the portions of
|
|
<I>s</I>
|
|
|
|
that match the corresponding parenthesized
|
|
subexpression in
|
|
<I>r</I>.
|
|
|
|
The zero'th element of
|
|
<I>a</I>
|
|
|
|
contains the portion
|
|
of
|
|
<I>s</I>
|
|
|
|
matched by the entire regular expression
|
|
<I>r</I>.
|
|
|
|
Subscripts
|
|
<B>a[</B><I>n</I><B>, "start"]</B>
and
<B>a[</B><I>n</I><B>, "length"]</B>
provide the starting index in the string and the length,
respectively, of each matching substring.
|
|
<DT id="284"><B>patsplit(</B><I>s</I><B>, </B><I>a </I>[<B>, </B><I>r</I> [<B>, </B><I>seps</I>] ]<B>)</B><DD>
|
|
Split the string
|
|
<I>s</I>
|
|
|
|
into the array
|
|
<I>a</I>
|
|
|
|
and the separators array
|
|
<I>seps</I>
|
|
|
|
according to the regular expression
|
|
<I>r</I>,
|
|
|
|
and return the number of fields.
|
|
Element values are the portions of
|
|
<I>s</I>
|
|
|
|
that matched
|
|
<I>r</I>.
|
|
|
|
The value of
|
|
<B>seps[</B><I>i</I><B>]</B>
|
|
|
|
is the possibly null separator that appeared after
|
|
<B>a[</B><I>i</I><B>]</B>.
|
|
|
|
The value of
|
|
<B>seps[0]</B>
|
|
|
|
is the possibly null leading separator.
|
|
If
|
|
<I>r</I>
|
|
|
|
is omitted,
|
|
<B>FPAT</B>
|
|
|
|
is used instead.
|
|
The arrays
|
|
<I>a</I>
|
|
|
|
and
|
|
<I>seps</I>
|
|
|
|
are cleared first.
|
|
Splitting behaves identically to field splitting with
|
|
<B>FPAT</B>,
|
|
|
|
described above.
|
|
<DT id="285"><B>split(</B><I>s</I><B>, </B><I>a </I>[<B>, </B><I>r</I> [<B>, </B><I>seps</I>] ]<B>)</B><DD>
|
|
Split the string
|
|
<I>s</I>
|
|
|
|
into the array
|
|
<I>a</I>
|
|
|
|
and the separators array
|
|
<I>seps</I>
|
|
|
|
on the regular expression
|
|
<I>r</I>,
|
|
|
|
and return the number of fields. If
|
|
<I>r</I>
|
|
|
|
is omitted,
|
|
<B>FS</B>
|
|
|
|
is used instead.
|
|
The arrays
|
|
<I>a</I>
|
|
|
|
and
|
|
<I>seps</I>
|
|
|
|
are cleared first.
|
|
<B>seps[</B><I>i</I><B>]</B>
|
|
|
|
is the field separator matched by
|
|
<I>r</I>
|
|
|
|
between
|
|
<B>a[</B><I>i</I><B>]</B>
|
|
|
|
and
|
|
<B>a[</B><I>i</I><B>+1]</B>.
|
|
|
|
If
|
|
<I>r</I>
|
|
|
|
is a single space, then leading whitespace in
|
|
<I>s</I>
|
|
|
|
goes into the extra array element
|
|
<B>seps[0]</B>
|
|
|
|
and trailing whitespace goes into the extra array element
|
|
<B>seps[</B><I>n</I><B>]</B>,
|
|
|
|
where
|
|
<I>n</I>
|
|
|
|
is the return value of
|
|
<B>split(</B><I>s</I><B>, </B><I>a</I><B>, </B><I>r</I><B>, </B><I>seps</I><B>)</B>.
|
|
|
|
Splitting behaves identically to field splitting, described above.
|
|
In particular, if
|
|
<I>r</I>
|
|
|
|
is a single-character string, that string acts as the separator,
|
|
even if it happens to be a regular expression metacharacter.
|
|
<DT id="286"><B>sprintf(</B><I>fmt</I><B>,</B><I> expr-list</I><B>)</B>
|
|
|
|
<DD>
|
|
Print
|
|
<I>expr-list</I>
|
|
|
|
according to
|
|
<I>fmt</I>,
|
|
|
|
and return the resulting string.
|
|
<DT id="287"><B>strtonum(</B><I>str</I><B>)</B>
|
|
|
|
<DD>
|
|
Examine
|
|
<I>str</I>,
|
|
|
|
and return its numeric value.
|
|
If
|
|
<I>str</I>
|
|
|
|
begins
|
|
with a leading
|
|
<B>0</B>,
|
|
|
|
treat it
|
|
as an octal number.
|
|
If
|
|
<I>str</I>
|
|
|
|
begins
|
|
with a leading
|
|
<B>0x</B>
|
|
|
|
or
|
|
<B>0X</B>,
|
|
|
|
treat it
|
|
as a hexadecimal number.
|
|
Otherwise, assume it is a decimal number.
|
|
<DT id="288"><B>sub(</B><I>r</I><B>, </B><I>s </I>[<B>, </B><I>t</I>]<B>)</B><DD>
|
|
Just like
|
|
<B>gsub()</B>,
|
|
|
|
but replace only the first matching substring.
|
|
Return either zero or one.
|
|
<DT id="289"><B>substr(</B><I>s</I><B>, </B><I>i </I>[<B>, </B><I>n</I>]<B>)</B><DD>
|
|
Return the at most
|
|
<I>n</I>-character
|
|
|
|
substring of
|
|
<I>s</I>
|
|
|
|
starting at
|
|
<I>i</I>.
|
|
|
|
If
|
|
<I>n</I>
|
|
|
|
is omitted, use the rest of
|
|
<I>s</I>.
|
|
|
|
<DT id="290"><B>tolower(</B><I>str</I><B>)</B>
|
|
|
|
<DD>
|
|
Return a copy of the string
|
|
<I>str</I>,
|
|
|
|
with all the uppercase characters in
|
|
<I>str</I>
|
|
|
|
translated to their corresponding lowercase counterparts.
|
|
Non-alphabetic characters are left unchanged.
|
|
<DT id="291"><B>toupper(</B><I>str</I><B>)</B>
|
|
|
|
<DD>
|
|
Return a copy of the string
|
|
<I>str</I>,
|
|
|
|
with all the lowercase characters in
|
|
<I>str</I>
|
|
|
|
translated to their corresponding uppercase counterparts.
|
|
Non-alphabetic characters are left unchanged.
|
|
</DL>
|
|
<P>
|
|
|
|
<I>Gawk</I>
|
|
|
|
is multibyte aware. This means that
|
|
<B>index()</B>,
|
|
|
|
<B>length()</B>,
|
|
|
|
<B>substr()</B>
|
|
|
|
and
|
|
<B>match()</B>
|
|
|
|
all work in terms of characters, not bytes.
|
|
<A NAME="lbBD"> </A>
|
|
<H3>Time Functions</H3>
|
|
|
|
Since one of the primary uses of <FONT SIZE="-1">AWK</FONT> programs is processing log files
|
|
that contain time stamp information,
|
|
<I>gawk</I>
|
|
|
|
provides the following functions for obtaining time stamps and
|
|
formatting them.
|
|
<P>
|
|
|
|
<DL COMPACT>
|
|
<DT id="292"><B>mktime(</B><I>datespec</I> [<B>, </B><I>utc-flag</I>]<B>)</B><DD>
|
|
Turn
|
|
<I>datespec</I>
|
|
|
|
into a time stamp of the same form as returned by
|
|
<B>systime()</B>,
|
|
|
|
and return the result.
|
|
The
|
|
<I>datespec</I>
|
|
|
|
is a string of the form
|
|
<I>YYYY MM DD HH MM SS[ DST]</I>.
|
|
|
|
The contents of the string are six or seven numbers representing respectively
|
|
the full year including century,
|
|
the month from 1 to 12,
|
|
the day of the month from 1 to 31,
|
|
the hour of the day from 0 to 23,
|
|
the minute from 0 to 59,
|
|
the second from 0 to 60,
|
|
and an optional daylight saving flag.
|
|
The values of these numbers need not be within the ranges specified;
|
|
for example, an hour of -1 means 1 hour before midnight.
|
|
The origin-zero Gregorian calendar is assumed,
|
|
with year 0 preceding year 1 and year -1 preceding year 0.
|
|
If
|
|
<I>utc-flag</I>
|
|
|
|
is present and is non-zero or non-null, the time is assumed to be in
|
|
the UTC time zone; otherwise, the
|
|
time is assumed to be in the local time zone.
|
|
If the
|
|
<I>DST</I>
|
|
|
|
daylight saving flag is positive,
|
|
the time is assumed to be daylight saving time;
|
|
if zero, the time is assumed to be standard time;
|
|
and if negative (the default),
|
|
<B>mktime()</B>
|
|
|
|
attempts to determine whether daylight saving time is in effect
|
|
for the specified time.
|
|
If
|
|
<I>datespec</I>
|
|
|
|
does not contain enough elements or if the resulting time
|
|
is out of range,
|
|
<B>mktime()</B>
|
|
|
|
returns -1.
|
|
<DT id="293"><B>strftime(</B>[<I>format </I>[<B>, </B><I>timestamp</I>[<B>, </B><I>utc-flag</I>]]]<B>)</B><DD>
|
|
Format
|
|
<I>timestamp</I>
|
|
|
|
according to the specification in
|
|
<I>format</I>.
|
|
|
|
If
|
|
<I>utc-flag</I>
|
|
|
|
is present and is non-zero or non-null, the result
|
|
is in UTC, otherwise the result is in local time.
|
|
The
|
|
<I>timestamp</I>
|
|
|
|
should be of the same form as returned by
|
|
<B>systime()</B>.
|
|
|
|
If
|
|
<I>timestamp</I>
|
|
|
|
is missing, the current time of day is used.
|
|
If
|
|
<I>format</I>
|
|
|
|
is missing, a default format equivalent to the output of
|
|
<I><A HREF="/cgi-bin/man/man2html?1+date">date</A></I>(1)
|
|
|
|
is used.
|
|
The default format is available in
|
|
<B>PROCINFO["strftime"]</B>.
|
|
|
|
See the specification for the
|
|
<B>strftime()</B>
|
|
|
|
function in ISO C for the format conversions that are
|
|
guaranteed to be available.
|
|
<DT id="294"><B>systime()</B>
|
|
|
|
<DD>
|
|
Return the current time of day as the number of seconds since the Epoch
|
|
(1970-01-01 00:00:00 UTC on <FONT SIZE="-1">POSIX</FONT> systems).
|
|
</DL>
|
|
<A NAME="lbBE"> </A>
|
|
<H3>Bit Manipulation Functions</H3>
|
|
|
|
<I>Gawk</I>
|
|
|
|
supplies the following bit manipulation functions.
|
|
They work by converting double-precision floating-point
|
|
values to
|
|
<B>uintmax_t</B>
|
|
|
|
integers, doing the operation, and then converting the
|
|
result back to floating point.
|
|
<P>
|
|
|
|
<B>NOTE</B>:
|
|
|
|
Passing negative operands to any of these functions causes
|
|
a fatal error.
|
|
<P>
|
|
|
|
The functions are:
|
|
<DL COMPACT>
|
|
<DT id="295"><B>and(</B><I>v1</I><B>, </B><I>v2 </I>[, ...]<B>)</B><DD>
|
|
Return the bitwise AND of the values provided in the argument list.
|
|
There must be at least two.
|
|
<DT id="296"><B>compl(</B><I>val</I><B>)</B><DD>
|
|
Return the bitwise complement of
|
|
<I>val</I>.
|
|
|
|
<DT id="297"><B>lshift(</B><I>val</I><B>, </B><I>count</I><B>)</B><DD>
|
|
Return the value of
|
|
<I>val</I>,
|
|
|
|
shifted left by
|
|
<I>count</I>
|
|
|
|
bits.
|
|
<DT id="298"><B>or(</B><I>v1</I><B>, </B><I>v2 </I>[, ...]<B>)</B><DD>
|
|
Return the bitwise OR of the values provided in the argument list.
|
|
There must be at least two.
|
|
<DT id="299"><B>rshift(</B><I>val</I><B>, </B><I>count</I><B>)</B><DD>
|
|
Return the value of
|
|
<I>val</I>,
|
|
|
|
shifted right by
|
|
<I>count</I>
|
|
|
|
bits.
|
|
<DT id="300"><B>xor(</B><I>v1</I><B>, </B><I>v2 </I>[, ...]<B>)</B><DD>
|
|
Return the bitwise XOR of the values provided in the argument list.
|
|
There must be at least two.
|
|
</DL>
|
|
<P>
|
|
|
|
<A NAME="lbBF"> </A>
|
|
<H3>Type Functions</H3>
|
|
|
|
The following functions provide type-related information about
|
|
their arguments.
|
|
<DL COMPACT>
|
|
<DT id="301"><B>isarray(</B><I>x</I><B>)</B><DD>
|
|
Return true if
|
|
<I>x</I>
|
|
|
|
is an array, false otherwise.
|
|
This function is mainly for use with the elements of multidimensional arrays
|
|
and with function parameters.
|
|
<DT id="302"><B>typeof(</B><I>x</I><B>)</B><DD>
|
|
Return a string indicating the type of
|
|
<I>x</I>.
|
|
|
|
The string will be one of
|
|
<B>"array"</B>,
|
|
<B>"number"</B>,
|
|
<B>"regexp"</B>,
|
|
<B>"string"</B>,
|
|
<B>"strnum"</B>,
|
|
<B>"unassigned"</B>,
|
|
or
|
|
<B>"undefined"</B>.
|
|
</DL>
|
|
<A NAME="lbBG"> </A>
|
|
<H3>Internationalization Functions</H3>
|
|
|
|
The following functions may be used from within your AWK program for
|
|
translating strings at run-time.
|
|
For full details, see <I>GAWK: Effective AWK Programming</I>.
|
|
<DL COMPACT>
|
|
<DT id="303"><B>bindtextdomain(</B><I>directory </I>[<B>, </B><I>domain</I>]<B>)</B><DD>
|
|
Specify the directory where
|
|
<I>gawk</I>
|
|
|
|
looks for the
|
|
<B>.gmo</B>
|
|
|
|
files, in case they
|
|
will not or cannot be placed in the ``standard'' locations
|
|
(e.g., during testing).
|
|
It returns the directory where
|
|
<I>domain</I>
|
|
|
|
is ``bound.''
|
|
<P>
|
|
The default
|
|
<I>domain</I>
|
|
|
|
is the value of
|
|
<B>TEXTDOMAIN</B>.
|
|
|
|
If
|
|
<I>directory</I>
|
|
|
|
is the null string (<B>""</B>), then
|
|
<B>bindtextdomain()</B>
|
|
|
|
returns the current binding for the
|
|
given
|
|
<I>domain</I>.
|
|
|
|
<DT id="304"><B>dcgettext(</B><I>string </I>[<B>, </B><I>domain </I>[<B>, </B><I>category</I>]]<B>)</B><DD>
|
|
Return the translation of
|
|
<I>string</I>
|
|
|
|
in text domain
|
|
<I>domain</I>
|
|
|
|
for locale category
|
|
<I>category</I>.
|
|
|
|
The default value for
|
|
<I>domain</I>
|
|
|
|
is the current value of
|
|
<B>TEXTDOMAIN</B>.
|
|
|
|
The default value for
|
|
<I>category</I>
|
|
|
|
is <B>"LC_MESSAGES"</B>.
|
|
<P>
|
|
If you supply a value for
|
|
<I>category</I>,
|
|
|
|
it must be a string equal to
|
|
one of the known locale categories described
|
|
in <I>GAWK: Effective AWK Programming</I>.
|
|
You must also supply a text domain. Use
|
|
<B>TEXTDOMAIN</B>
|
|
|
|
if you want to use the current domain.
|
|
<DT id="305"><B>dcngettext(</B><I>string1</I><B>, </B><I>string2</I><B>, </B><I>number </I>[<B>, </B><I>domain </I>[<B>, </B><I>category</I>]]<B>)</B><DD>
|
|
Return the plural form used for
|
|
<I>number</I>
|
|
|
|
of the translation of
|
|
<I>string1</I>
|
|
|
|
and
|
|
<I>string2</I>
|
|
|
|
in
|
|
text domain
|
|
<I>domain</I>
|
|
|
|
for locale category
|
|
<I>category</I>.
|
|
|
|
The default value for
|
|
<I>domain</I>
|
|
|
|
is the current value of
|
|
<B>TEXTDOMAIN</B>.
|
|
|
|
The default value for
|
|
<I>category</I>
|
|
|
|
is <B>"LC_MESSAGES"</B>.
|
|
<P>
|
|
If you supply a value for
|
|
<I>category</I>,
|
|
|
|
it must be a string equal to
|
|
one of the known locale categories described
|
|
in <I>GAWK: Effective AWK Programming</I>.
|
|
You must also supply a text domain. Use
|
|
<B>TEXTDOMAIN</B>
|
|
|
|
if you want to use the current domain.
|
|
</DL>
|
|
<A NAME="lbBH"> </A>
|
|
<H2>USER-DEFINED FUNCTIONS</H2>
|
|
|
|
Functions in <FONT SIZE="-1">AWK</FONT> are defined as follows:
|
|
<P>
|
|
|
|
<DL COMPACT><DT id="306"><DD>
|
|
<B>function </B><I>name</I><B>(</B><I>parameter list</I><B>) { </B><I>statements </I><B>}</B>
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
Functions execute when they are called from within expressions
|
|
in either patterns or actions. Actual parameters supplied in the function
|
|
call are used to instantiate the formal parameters declared in the function.
|
|
Arrays are passed by reference; other variables are passed by value.
|
|
<P>
|
|
|
|
Since functions were not originally part of the <FONT SIZE="-1">AWK</FONT> language, the provision
|
|
for local variables is rather clumsy: They are declared as extra parameters
|
|
in the parameter list. The convention is to separate local variables from
|
|
real parameters by extra spaces in the parameter list. For example:
|
|
<P>
|
|
|
|
<DL COMPACT><DT id="307"><DD>
|
|
<B>
|
|
</B><PRE>
|
|
function f(p, q, a, b) # a and b are local
|
|
{
|
|
...
|
|
}
|
|
|
|
/abc/ { ... ; f(1, 2) ; ... }
|
|
</PRE>
|
|
|
|
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
The left parenthesis in a function call is required
|
|
to immediately follow the function name,
|
|
without any intervening whitespace.
|
|
This avoids a syntactic ambiguity with the concatenation operator.
|
|
This restriction does not apply to the built-in functions listed above.
|
|
<P>
|
|
|
|
Functions may call each other and may be recursive.
|
|
Function parameters used as local variables are initialized
|
|
to the null string and the number zero upon function invocation.
|
|
<P>
|
|
|
|
Use
|
|
<B>return</B><I> expr</I>
|
|
|
|
to return a value from a function. The return value is undefined if no
|
|
value is provided, or if the function returns by ``falling off'' the
|
|
end.
|
|
<P>
|
|
|
|
As a
|
|
<I>gawk</I>
|
|
|
|
extension, functions may be called indirectly. To do this, assign
|
|
the name of the function to be called, as a string, to a variable.
|
|
Then use the variable as if it were the name of a function, prefixed with an
|
|
<B>@</B>
|
|
|
|
sign, like so:
|
|
<DL COMPACT><DT id="308"><DD>
|
|
<B>
|
|
</B><PRE>
|
|
function myfunc()
|
|
{
|
|
print "myfunc called"
|
|
...
|
|
}
|
|
|
|
{ ...
|
|
the_func = "myfunc"
|
|
@the_func() # call through the_func to myfunc
|
|
...
|
|
}
|
|
</PRE>
|
|
|
|
|
|
</DL>
|
|
|
|
As of version 4.1.2, this works with user-defined functions,
|
|
built-in functions, and extension functions.
|
|
<P>
|
|
|
|
If
|
|
<B>--lint</B>
|
|
|
|
has been provided,
|
|
<I>gawk</I>
|
|
|
|
warns about calls to undefined functions at parse time,
|
|
instead of at run time.
|
|
Calling an undefined function at run time is a fatal error.
|
|
<P>
|
|
|
|
The word
|
|
<B>func</B>
|
|
|
|
may be used in place of
|
|
<B>function</B>,
|
|
|
|
although this is deprecated.
|
|
<A NAME="lbBI"> </A>
|
|
<H2>DYNAMICALLY LOADING NEW FUNCTIONS</H2>
|
|
|
|
You can dynamically add new functions written in C or C++ to the running
|
|
<I>gawk</I>
|
|
|
|
interpreter with the
|
|
<B>@load</B>
|
|
|
|
statement.
|
|
The full details are beyond the scope of this manual page;
|
|
see <I>GAWK: Effective AWK Programming</I>.
|
|
<A NAME="lbBJ"> </A>
|
|
<H2>SIGNALS</H2>
|
|
|
|
The
|
|
<I>gawk</I>
|
|
|
|
profiler accepts two signals.
|
|
<B>SIGUSR1</B>
|
|
|
|
causes it to dump a profile and function call stack to the
|
|
profile file, which is either
|
|
<B>awkprof.out</B>,
|
|
|
|
or whatever file was named with the
|
|
<B>--profile</B>
|
|
|
|
option. It then continues to run.
|
|
<B>SIGHUP</B>
|
|
|
|
causes
|
|
<I>gawk</I>
|
|
|
|
to dump the profile and function call stack and then exit.
|
|
<A NAME="lbBK"> </A>
|
|
<H2>INTERNATIONALIZATION</H2>
|
|
|
|
<P>
|
|
|
|
String constants are sequences of characters enclosed in double
|
|
quotes. In non-English-speaking environments, it is possible to mark
|
|
strings in the <FONT SIZE="-1">AWK</FONT> program as requiring translation to the local
|
|
natural language. Such strings are marked in the <FONT SIZE="-1">AWK</FONT> program with
|
|
a leading underscore (``_''). For example,
|
|
<P>
|
|
<DL COMPACT><DT id="309"><DD>
|
|
<B>gawk 'BEGIN { print "hello, world" }'</B>
</DL>
<P>
|
|
always prints
|
|
<B>hello, world</B>.
|
|
|
|
But,
|
|
<P>
|
|
<DL COMPACT><DT id="310"><DD>
|
|
<B>gawk 'BEGIN { print _"hello, world" }'</B>
</DL>
<P>
|
|
might print
|
|
<B>bonjour, monde</B>
|
|
|
|
in France.
|
|
<P>
|
|
|
|
There are several steps involved in producing and running a localizable
|
|
<FONT SIZE="-1">AWK</FONT> program.
|
|
<DL COMPACT>
|
|
<DT id="311">1.<DD>
|
|
Add a
|
|
<B>BEGIN</B>
|
|
|
|
action to assign a value to the
|
|
<B>TEXTDOMAIN</B>
|
|
|
|
variable to set the text domain to a name associated with your program:
|
|
<P>
|
|
|
|
<B>
|
|
BEGIN { TEXTDOMAIN = "myprog" }
|
|
</B>
|
|
|
|
<P>
|
|
This allows
|
|
<I>gawk</I>
|
|
|
|
to find the
|
|
<B>.gmo</B>
|
|
|
|
file associated with your program.
|
|
Without this step,
|
|
<I>gawk</I>
|
|
|
|
uses the
|
|
<B>messages</B>
|
|
|
|
text domain,
|
|
which likely does not contain translations for your program.
|
|
<DT id="312">2.<DD>
|
|
Mark all strings that should be translated with leading underscores.
|
|
<DT id="313">3.<DD>
|
|
If necessary, use the
|
|
<B>dcgettext()</B>
|
|
|
|
and/or
|
|
<B>bindtextdomain()</B>
|
|
|
|
functions in your program, as appropriate.
|
|
<DT id="314">4.<DD>
|
|
Run
|
|
<B>gawk --gen-pot -f myprog.awk > myprog.pot</B>
|
|
|
|
to generate a
|
|
<B>.pot</B>
|
|
|
|
file for your program.
|
|
<DT id="315">5.<DD>
|
|
Provide appropriate translations, and build and install the corresponding
|
|
<B>.gmo</B>
|
|
|
|
files.
|
|
</DL>
|
|
<P>
|
|
|
|
The internationalization features are described in full detail in <I>GAWK: Effective AWK Programming</I>.
|
|
<A NAME="lbBL"> </A>
|
|
<H2>POSIX COMPATIBILITY</H2>
|
|
|
|
A primary goal for
|
|
<I>gawk</I>
|
|
|
|
is compatibility with the <FONT SIZE="-1">POSIX</FONT> standard, as well as with the
|
|
latest version of Brian Kernighan's
|
|
<I>awk</I>.
|
|
|
|
To this end,
|
|
<I>gawk</I>
|
|
|
|
incorporates the following user-visible
|
|
features which are not described in the <FONT SIZE="-1">AWK</FONT> book,
|
|
but are part of Brian Kernighan's version of
|
|
<I>awk</I>,
|
|
|
|
and are in the <FONT SIZE="-1">POSIX</FONT> standard.
|
|
<P>
|
|
|
|
The book indicates that command line variable assignment happens when
|
|
<I>awk</I>
|
|
|
|
would otherwise open the argument as a file, which is after the
|
|
<B>BEGIN</B>
|
|
|
|
rule is executed. However, in earlier implementations, when such an
|
|
assignment appeared before any file names, the assignment would happen
|
|
<I>before</I>
|
|
|
|
the
|
|
<B>BEGIN</B>
|
|
|
|
rule was run. Applications came to depend on this ``feature.''
|
|
When
|
|
<I>awk</I>
|
|
|
|
was changed to match its documentation, the
|
|
<B>-v</B>
|
|
|
|
option for assigning variables before program execution was added to
|
|
accommodate applications that depended upon the old behavior.
|
|
(This feature was agreed upon by both the Bell Laboratories developers
|
|
and the <FONT SIZE="-1">GNU</FONT> developers.)
|
|
<P>
|
|
|
|
When processing arguments,
|
|
<I>gawk</I>
|
|
|
|
uses the special option ``--'' to signal the end of
|
|
arguments.
|
|
In compatibility mode, it warns about but otherwise ignores
|
|
undefined options.
|
|
In normal operation, such arguments are passed on to the <FONT SIZE="-1">AWK</FONT> program for
|
|
it to process.
|
|
<P>
|
|
|
|
The <FONT SIZE="-1">AWK</FONT> book does not define the return value of
|
|
<B>srand()</B>.
|
|
|
|
The <FONT SIZE="-1">POSIX</FONT> standard
|
|
has it return the seed it was using, to allow keeping track
|
|
of random number sequences. Therefore
|
|
<B>srand()</B>
|
|
|
|
in
|
|
<I>gawk</I>
|
|
|
|
also returns its current seed.
|
|
<P>
|
|
|
|
Other features are:
|
|
The use of multiple
|
|
<B>-f</B>
|
|
|
|
options (from MKS
|
|
<I>awk</I>);
|
|
|
|
the
|
|
<B>ENVIRON</B>
|
|
|
|
array; the
|
|
<B>\a</B>
|
|
|
|
and
|
|
<B>\v</B>
|
|
|
|
escape sequences (done originally in
|
|
<I>gawk</I>
|
|
|
|
and fed back into the Bell Laboratories version); the
|
|
<B>tolower()</B>
|
|
|
|
and
|
|
<B>toupper()</B>
|
|
|
|
built-in functions (from the Bell Laboratories version); and the ISO C conversion specifications in
|
|
<B>printf</B>
|
|
|
|
(done first in the Bell Laboratories version).
|
|
<A NAME="lbBM"> </A>
|
|
<H2>HISTORICAL FEATURES</H2>
|
|
|
|
There is one feature of historical <FONT SIZE="-1">AWK</FONT> implementations that
|
|
<I>gawk</I>
|
|
|
|
supports:
|
|
It is possible to call the
|
|
<B>length()</B>
|
|
|
|
built-in function not only with no argument, but even without parentheses!
|
|
Thus,
|
|
<DL COMPACT><DT id="316"><DD>
|
|
<P>
|
|
|
|
<B>
|
|
a = length<TT> </TT># Holy Algol 60, Batman!<BR>
|
|
</B>
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
is the same as either of
|
|
<DL COMPACT><DT id="317"><DD>
|
|
<P>
|
|
|
|
<B>
|
|
a = length()
|
|
<BR>
|
|
|
|
a = length($0)
|
|
</B>
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
Using this feature is poor practice, and
|
|
<I>gawk</I>
|
|
|
|
issues a warning about its use if
|
|
<B>--lint</B>
|
|
|
|
is specified on the command line.
|
|
<A NAME="lbBN"> </A>
|
|
<H2>GNU EXTENSIONS</H2>
|
|
|
|
<I>Gawk</I>
|
|
|
|
has a too-large number of extensions to <FONT SIZE="-1">POSIX</FONT>
|
|
<I>awk</I>.
|
|
|
|
They are described in this section. All the extensions described here
|
|
can be disabled by
|
|
invoking
|
|
<I>gawk</I>
|
|
|
|
with the
|
|
<B>--traditional</B>
|
|
|
|
or
|
|
<B>--posix</B>
|
|
|
|
options.
|
|
<P>
|
|
|
|
The following features of
|
|
<I>gawk</I>
|
|
|
|
are not available in
|
|
<FONT SIZE="-1">POSIX</FONT>
|
|
<I>awk</I>.
|
|
|
|
|
|
<DL COMPACT>
|
|
<DT id="318">•<DD>
|
|
No path search is performed for files named via the
|
|
<B>-f</B>
|
|
|
|
option. Therefore the
|
|
<B>AWKPATH</B>
|
|
|
|
environment variable is not special.
|
|
|
|
<DT id="319">•<DD>
|
|
There is no facility for doing file inclusion
|
|
(<I>gawk</I>'s
|
|
|
|
<B>@include</B>
|
|
|
|
mechanism).
|
|
<DT id="320">•<DD>
|
|
There is no facility for dynamically adding new functions
|
|
written in C
|
|
(<I>gawk</I>'s
|
|
|
|
<B>@load</B>
|
|
|
|
mechanism).
|
|
<DT id="321">•<DD>
|
|
The
|
|
<B>\x</B>
|
|
|
|
escape sequence.
|
|
<DT id="322">•<DD>
|
|
The ability to continue lines after
|
|
<B>?</B>
|
|
|
|
and
|
|
<B>:</B>.
|
|
|
|
<DT id="323">•<DD>
|
|
Octal and hexadecimal constants in AWK programs.
|
|
|
|
<DT id="324">•<DD>
|
|
The
|
|
<B>ARGIND</B>,
|
|
|
|
<B>BINMODE</B>,
|
|
|
|
<B>ERRNO</B>,
|
|
|
|
<B>LINT</B>,
|
|
|
|
<B>PREC</B>,
|
|
|
|
<B>ROUNDMODE</B>,
|
|
|
|
<B>RT</B>
|
|
|
|
and
|
|
<B>TEXTDOMAIN</B>
|
|
|
|
variables are not special.
|
|
<DT id="325">•<DD>
|
|
The
|
|
<B>IGNORECASE</B>
|
|
|
|
variable and its side effects are not available.
|
|
<DT id="326">•<DD>
|
|
The
|
|
<B>FIELDWIDTHS</B>
|
|
|
|
variable and fixed-width field splitting.
|
|
<DT id="327">•<DD>
|
|
The
|
|
<B>FPAT</B>
|
|
|
|
variable and field splitting based on field values.
|
|
<DT id="328">•<DD>
|
|
The
|
|
<B>FUNCTAB</B>,
|
|
|
|
<B>SYMTAB</B>,
|
|
|
|
and
|
|
<B>PROCINFO</B>
|
|
|
|
arrays are not available.
|
|
|
|
<DT id="329">•<DD>
|
|
The use of
|
|
<B>RS</B>
|
|
|
|
as a regular expression.
|
|
<DT id="330">•<DD>
|
|
The special file names available for I/O redirection are not recognized.
|
|
<DT id="331">•<DD>
|
|
The
|
|
<B>|&</B>
|
|
|
|
operator for creating coprocesses.
|
|
<DT id="332">•<DD>
|
|
The
|
|
<B>BEGINFILE</B>
|
|
|
|
and
|
|
<B>ENDFILE</B>
|
|
|
|
special patterns are not available.
|
|
|
|
<DT id="333">•<DD>
|
|
The ability to split out individual characters using the null string
|
|
as the value of
|
|
<B>FS</B>,
|
|
|
|
and as the third argument to
|
|
<B>split()</B>.
|
|
|
|
<DT id="334">•<DD>
|
|
An optional fourth argument to
|
|
<B>split()</B>
|
|
|
|
to receive the separator texts.
|
|
<DT id="335">•<DD>
|
|
The optional second argument to the
|
|
<B>close()</B>
|
|
|
|
function.
|
|
<DT id="336">•<DD>
|
|
The optional third argument to the
|
|
<B>match()</B>
|
|
|
|
function.
|
|
<DT id="337">•<DD>
|
|
The ability to use positional specifiers with
|
|
<B>printf</B>
|
|
|
|
and
|
|
<B>sprintf()</B>.
|
|
|
|
<DT id="338">•<DD>
|
|
The ability to pass an array to
|
|
<B>length()</B>.
<DT id="339">•<DD>
|
|
The
|
|
<B>and()</B>,
|
|
|
|
<B>asort()</B>,
|
|
|
|
<B>asorti()</B>,
|
|
|
|
<B>bindtextdomain()</B>,
|
|
|
|
<B>compl()</B>,
|
|
|
|
<B>dcgettext()</B>,
|
|
|
|
<B>dcngettext()</B>,
|
|
|
|
<B>gensub()</B>,
|
|
|
|
<B>lshift()</B>,
|
|
|
|
<B>mktime()</B>,
|
|
|
|
<B>or()</B>,
|
|
|
|
<B>patsplit()</B>,
|
|
|
|
<B>rshift()</B>,
|
|
|
|
<B>strftime()</B>,
|
|
|
|
<B>strtonum()</B>,
|
|
|
|
<B>systime()</B>
|
|
|
|
and
|
|
<B>xor()</B>
|
|
|
|
functions.
|
|
|
|
<DT id="340">•<DD>
|
|
Localizable strings.
|
|
<DT id="341">•<DD>
|
|
Non-fatal I/O.
|
|
<DT id="342">•<DD>
|
|
Retryable I/O.
|
|
</DL>
|
|
<P>
|
|
|
|
The <FONT SIZE="-1">AWK</FONT> book does not define the return value of the
|
|
<B>close()</B>
|
|
|
|
function.
|
|
<I>Gawk</I>'s
|
|
|
|
<B>close()</B>
|
|
|
|
returns the value from
|
|
<I><A HREF="/cgi-bin/man/man2html?3+fclose">fclose</A></I>(3)
|
|
|
|
or
|
|
<I><A HREF="/cgi-bin/man/man2html?3+pclose">pclose</A></I>(3),
|
|
|
|
when closing an output file or pipe, respectively.
|
|
It returns the process's exit status when closing an input pipe.
|
|
The return value is -1 if the named file, pipe
|
|
or coprocess was not opened with a redirection.
|
|
<P>
|
|
|
|
When
|
|
<I>gawk</I>
|
|
|
|
is invoked with the
|
|
<B>--traditional</B>
|
|
|
|
option,
|
|
if the
|
|
<I>fs</I>
|
|
|
|
argument to the
|
|
<B>-F</B>
|
|
|
|
option is ``t'', then
|
|
<B>FS</B>
|
|
|
|
is set to the tab character.
|
|
Note that typing
|
|
<B>gawk -F\t ...</B>
|
|
|
|
simply causes the shell to quote the ``t,'' and does not pass
|
|
``\t'' to the
|
|
<B>-F</B>
|
|
|
|
option.
|
|
Since this is a rather ugly special case, it is not the default behavior.
|
|
This behavior also does not occur if
|
|
<B>--posix</B>
|
|
|
|
has been specified.
|
|
To really get a tab character as the field separator, it is best to use
|
|
single quotes:
|
|
<B>gawk -F'\t' ...</B>.
|
|
|
|
|
|
<A NAME="lbBO"> </A>
|
|
<H2>ENVIRONMENT VARIABLES</H2>
|
|
|
|
The
|
|
<B>AWKPATH</B>
|
|
|
|
environment variable can be used to provide a list of directories that
|
|
<I>gawk</I>
|
|
|
|
searches when looking for files named via the
|
|
<B>-f</B>,
|
|
|
|
<B>--file</B>,
|
|
|
|
<B>-i</B>
|
|
|
|
and
|
|
<B>--include</B>
|
|
|
|
options, and the
|
|
<B>@include</B>
|
|
|
|
directive. If the initial search fails, the path is searched again after
|
|
appending
|
|
<B>.awk</B>
|
|
|
|
to the filename.
|
|
<P>
|
|
|
|
The
|
|
<B>AWKLIBPATH</B>
|
|
|
|
environment variable can be used to provide a list of directories that
|
|
<I>gawk</I>
|
|
|
|
searches when looking for files named via the
|
|
<B>-l</B>
|
|
|
|
and
|
|
<B>--load</B>
|
|
|
|
options.
|
|
<P>
|
|
|
|
The
|
|
<B>GAWK_READ_TIMEOUT</B>
|
|
|
|
environment variable can be used to specify a timeout
|
|
in milliseconds for reading input from a terminal, pipe
|
|
or two-way communication including sockets.
|
|
<P>
|
|
|
|
For connection to a remote host via socket,
|
|
<B>GAWK_SOCK_RETRIES</B>
|
|
|
|
controls the number of retries, and
|
|
<B>GAWK_MSEC_SLEEP</B>
|
|
|
|
the interval between retries.
|
|
The interval is in milliseconds. On systems that do not support
|
|
<I><A HREF="/cgi-bin/man/man2html?3+usleep">usleep</A></I>(3),
|
|
|
|
the value is rounded up to an integral number of seconds.
|
|
<P>
|
|
|
|
If
|
|
<B>POSIXLY_CORRECT</B>
|
|
|
|
exists in the environment, then
|
|
<I>gawk</I>
|
|
|
|
behaves exactly as if
|
|
<B>--posix</B>
|
|
|
|
had been specified on the command line.
|
|
If
|
|
<B>--lint</B>
|
|
|
|
has been specified,
|
|
<I>gawk</I>
|
|
|
|
issues a warning message to this effect.
|
|
<A NAME="lbBP"> </A>
|
|
<H2>EXIT STATUS</H2>
|
|
|
|
If the
|
|
<B>exit</B>
|
|
|
|
statement is used with a value,
|
|
then
|
|
<I>gawk</I>
|
|
|
|
exits with
|
|
the numeric value given to it.
|
|
<P>
|
|
|
|
Otherwise, if there were no problems during execution,
|
|
<I>gawk</I>
|
|
|
|
exits with the value of the C constant
|
|
<B>EXIT_SUCCESS</B>.
|
|
|
|
This is usually zero.
|
|
<P>
|
|
|
|
If an error occurs,
|
|
<I>gawk</I>
|
|
|
|
exits with the value of
|
|
the C constant
|
|
<B>EXIT_FAILURE</B>.
|
|
|
|
This is usually one.
|
|
<P>
|
|
|
|
If
|
|
<I>gawk</I>
|
|
|
|
exits because of a fatal error, the exit
|
|
status is 2. On non-POSIX systems, this value may be mapped to
|
|
<B>EXIT_FAILURE</B>.
|
|
|
|
<A NAME="lbBQ"> </A>
|
|
<H2>VERSION INFORMATION</H2>
|
|
|
|
This man page documents
|
|
<I>gawk</I>,
|
|
|
|
version 5.0.
|
|
<A NAME="lbBR"> </A>
|
|
<H2>AUTHORS</H2>
|
|
|
|
The original version of <FONT SIZE="-1">UNIX</FONT>
|
|
<I>awk</I>
|
|
|
|
was designed and implemented by Alfred Aho,
|
|
Peter Weinberger, and Brian Kernighan of Bell Laboratories. Brian Kernighan
|
|
continues to maintain and enhance it.
|
|
<P>
|
|
|
|
Paul Rubin and Jay Fenlason,
|
|
of the Free Software Foundation, wrote
|
|
<I>gawk</I>,
|
|
|
|
to be compatible with the original version of
|
|
<I>awk</I>
|
|
|
|
distributed in Seventh Edition <FONT SIZE="-1">UNIX</FONT>.
|
|
John Woods contributed a number of bug fixes.
|
|
David Trueman, with contributions
|
|
from Arnold Robbins, made
|
|
<I>gawk</I>
|
|
|
|
compatible with the new version of <FONT SIZE="-1">UNIX</FONT>
|
|
<I>awk</I>.
|
|
|
|
Arnold Robbins is the current maintainer.
|
|
<P>
|
|
|
|
See <I>GAWK: Effective AWK Programming</I> for a full list of the contributors to
|
|
<I>gawk</I>
|
|
|
|
and its documentation.
|
|
<P>
|
|
|
|
See the
|
|
<B>README</B>
|
|
|
|
file in the
|
|
<I>gawk</I>
|
|
|
|
distribution for up-to-date information about maintainers
|
|
and which ports are currently supported.
|
|
<A NAME="lbBS"> </A>
|
|
<H2>BUG REPORTS</H2>
|
|
|
|
If you find a bug in
|
|
<I>gawk</I>,
|
|
|
|
please send electronic mail to
|
|
<B><A HREF="mailto:bug-gawk@gnu.org">bug-gawk@gnu.org</A></B>.
|
|
|
|
Please include your operating system and its revision, the version of
|
|
<I>gawk</I>
|
|
|
|
(from
|
|
<B>gawk --version</B>),
|
|
|
|
which C compiler you used to compile it, and a test program
|
|
and data that are as small as possible for reproducing the problem.
|
|
<P>
|
|
|
|
Before sending a bug report, please do the following things. First, verify that
|
|
you have the latest version of
|
|
<I>gawk</I>.
|
|
|
|
Many bugs (usually subtle ones) are fixed at each release, and if
|
|
yours is out of date, the problem may already have been solved.
|
|
Second, please see if setting the environment variable
|
|
<B>LC_ALL</B>
|
|
|
|
to
|
|
<B>C</B>
|
|
|
|
causes things to behave as you expect. If so, it's a locale issue,
|
|
and may or may not really be a bug.
|
|
Finally, please read this man page and the reference manual carefully to
|
|
be sure that what you think is a bug really is, instead of just a quirk
|
|
in the language.
|
|
<P>
|
|
|
|
Whatever you do, do
|
|
<B>NOT</B>
|
|
|
|
post a bug report in
|
|
<B>comp.lang.awk</B>.
|
|
|
|
While the
|
|
<I>gawk</I>
|
|
|
|
developers occasionally read this newsgroup, posting bug reports there
|
|
is an unreliable way to report bugs.
|
|
Similarly, do
|
|
<B>NOT</B>
|
|
|
|
use a web forum (such as Stack Overflow) for reporting bugs.
|
|
Instead, please use the electronic mail
|
|
addresses given above.
|
|
Really.
|
|
<P>
|
|
|
|
If you're using a GNU/Linux or BSD-based system,
|
|
you may wish to submit a bug report to the vendor of your distribution.
|
|
That's fine, but please send a copy to the official email address as well,
|
|
since there's no guarantee that the bug report will be forwarded to the
|
|
<I>gawk</I>
|
|
|
|
maintainer.
|
|
<A NAME="lbBT"> </A>
|
|
<H2>BUGS</H2>
|
|
|
|
The
|
|
<B>-F</B>
|
|
|
|
option is not necessary given the command line variable assignment feature;
|
|
it remains only for backwards compatibility.
<A NAME="lbBU"> </A>
<H2>SEE ALSO</H2>
<I><A HREF="/cgi-bin/man/man2html?1+egrep">egrep</A></I>(1),
<I><A HREF="/cgi-bin/man/man2html?1+sed">sed</A></I>(1),
<I><A HREF="/cgi-bin/man/man2html?2+getpid">getpid</A></I>(2),
<I><A HREF="/cgi-bin/man/man2html?2+getppid">getppid</A></I>(2),
<I><A HREF="/cgi-bin/man/man2html?2+getpgrp">getpgrp</A></I>(2),
<I><A HREF="/cgi-bin/man/man2html?2+getuid">getuid</A></I>(2),
<I><A HREF="/cgi-bin/man/man2html?2+geteuid">geteuid</A></I>(2),
<I><A HREF="/cgi-bin/man/man2html?2+getgid">getgid</A></I>(2),
<I><A HREF="/cgi-bin/man/man2html?2+getegid">getegid</A></I>(2),
<I><A HREF="/cgi-bin/man/man2html?2+getgroups">getgroups</A></I>(2),
<I><A HREF="/cgi-bin/man/man2html?3+printf">printf</A></I>(3),
<I><A HREF="/cgi-bin/man/man2html?3+strftime">strftime</A></I>(3),
<I><A HREF="/cgi-bin/man/man2html?3+usleep">usleep</A></I>(3)
<P>
<I>The AWK Programming Language</I>,
Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
Addison-Wesley, 1988. ISBN 0-201-07981-X.
<P>
<I>GAWK: Effective AWK Programming</I>,
Edition 5.0, shipped with the
<I>gawk</I>
source.
The current version of this document is available online at
<B><A HREF="https://www.gnu.org/software/gawk/manual">https://www.gnu.org/software/gawk/manual</A></B>.
<P>
The GNU
<B>gettext</B>
documentation, available online at
<B><A HREF="https://www.gnu.org/software/gettext">https://www.gnu.org/software/gettext</A></B>.
<A NAME="lbBV"> </A>
<H2>EXAMPLES</H2>
<PRE>
Print and sort the login names of all users:

<B>     BEGIN   { FS = ":" }
             { print $1 | "sort" }

</B>Count lines in a file:

<B>             { nlines++ }
     END     { print nlines + 0 }   # "+ 0" prints 0, not "", for empty input

</B>Precede each line by its number in the file:

<B>     { print FNR, $0 }

</B>Concatenate and line number (a variation on a theme):

<B>     { print NR, $0 }

</B>Run an external command for particular lines of data:

<B>     tail -f access_log |
     awk '/myhome\.html/ { system("nmap " $1 " >> logdir/myhome.html") }'
</B></PRE>
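<P>
The
<B>FNR</B>
and
<B>NR</B>
examples above differ only when more than one input file is given.
The following sketch (assuming a POSIX shell, an
<I>awk</I>
such as
<I>gawk</I>,
and the widely available but non-POSIX
<B>mktemp</B>
utility; the file contents are made up) reads the same two-line file twice:
<B>FNR</B>
restarts at 1 for each file, while
<B>NR</B>
keeps counting.
<PRE>
#!/bin/sh
# Sketch: FNR is the record number within the current file;
# NR is the total number of records read so far.
f=$(mktemp)                 # hypothetical scratch file
printf 'a\nb\n' > "$f"

# Read the same two-line file twice.
out=$(awk '{ print FNR, NR, $0 }' "$f" "$f")
rm -f "$f"

printf '%s\n' "$out"
# 1 1 a
# 2 2 b
# 1 3 a
# 2 4 b
</PRE>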
<A NAME="lbBW"> </A>
<H2>ACKNOWLEDGEMENTS</H2>
Brian Kernighan
provided valuable assistance during testing and debugging.
We thank him.
<A NAME="lbBX"> </A>
<H2>COPYING PERMISSIONS</H2>
Copyright © 1989, 1991, 1992, 1993, 1994, 1995, 1996,
1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005, 2007, 2009,
2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019,
Free Software Foundation, Inc.
<P>
Permission is granted to make and distribute verbatim copies of
this manual page provided the copyright notice and this permission
notice are preserved on all copies.
<P>
Permission is granted to copy and distribute modified versions of this
manual page under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
<P>
Permission is granted to copy and distribute translations of this
manual page into another language, under the above conditions for
modified versions, except that this permission notice may be stated in
a translation approved by the Foundation.
<P>
<HR>
<A NAME="index"> </A><H2>Index</H2>
<DL>
<DT id="343"><A HREF="#lbAB">NAME</A><DD>
<DT id="344"><A HREF="#lbAC">SYNOPSIS</A><DD>
<DT id="345"><A HREF="#lbAD">DESCRIPTION</A><DD>
<DT id="346"><A HREF="#lbAE">OPTION FORMAT</A><DD>
<DT id="347"><A HREF="#lbAF">OPTIONS</A><DD>
<DT id="348"><A HREF="#lbAG">AWK PROGRAM EXECUTION</A><DD>
<DL>
<DT id="349"><A HREF="#lbAH">Command Line Directories</A><DD>
</DL>
<DT id="350"><A HREF="#lbAI">VARIABLES, RECORDS AND FIELDS</A><DD>
<DL>
<DT id="351"><A HREF="#lbAJ">Records</A><DD>
<DT id="352"><A HREF="#lbAK">Fields</A><DD>
<DT id="353"><A HREF="#lbAL">Built-in Variables</A><DD>
<DT id="354"><A HREF="#lbAM">Arrays</A><DD>
<DT id="355"><A HREF="#lbAN">Namespaces</A><DD>
<DT id="356"><A HREF="#lbAO">Variable Typing And Conversion</A><DD>
<DT id="357"><A HREF="#lbAP">Octal and Hexadecimal Constants</A><DD>
<DT id="358"><A HREF="#lbAQ">String Constants</A><DD>
<DT id="359"><A HREF="#lbAR">Regexp Constants</A><DD>
</DL>
<DT id="360"><A HREF="#lbAS">PATTERNS AND ACTIONS</A><DD>
<DL>
<DT id="361"><A HREF="#lbAT">Patterns</A><DD>
<DT id="362"><A HREF="#lbAU">Regular Expressions</A><DD>
<DT id="363"><A HREF="#lbAV">Actions</A><DD>
<DT id="364"><A HREF="#lbAW">Operators</A><DD>
<DT id="365"><A HREF="#lbAX">Control Statements</A><DD>
<DT id="366"><A HREF="#lbAY">I/O Statements</A><DD>
<DT id="367"><A HREF="#lbAZ">The <I>printf</I> Statement</A><DD>
<DT id="368"><A HREF="#lbBA">Special File Names</A><DD>
<DT id="369"><A HREF="#lbBB">Numeric Functions</A><DD>
<DT id="370"><A HREF="#lbBC">String Functions</A><DD>
<DT id="371"><A HREF="#lbBD">Time Functions</A><DD>
<DT id="372"><A HREF="#lbBE">Bit Manipulations Functions</A><DD>
<DT id="373"><A HREF="#lbBF">Type Functions</A><DD>
<DT id="374"><A HREF="#lbBG">Internationalization Functions</A><DD>
</DL>
<DT id="375"><A HREF="#lbBH">USER-DEFINED FUNCTIONS</A><DD>
<DT id="376"><A HREF="#lbBI">DYNAMICALLY LOADING NEW FUNCTIONS</A><DD>
<DT id="377"><A HREF="#lbBJ">SIGNALS</A><DD>
<DT id="378"><A HREF="#lbBK">INTERNATIONALIZATION</A><DD>
<DT id="379"><A HREF="#lbBL">POSIX COMPATIBILITY</A><DD>
<DT id="380"><A HREF="#lbBM">HISTORICAL FEATURES</A><DD>
<DT id="381"><A HREF="#lbBN">GNU EXTENSIONS</A><DD>
<DT id="382"><A HREF="#lbBO">ENVIRONMENT VARIABLES</A><DD>
<DT id="383"><A HREF="#lbBP">EXIT STATUS</A><DD>
<DT id="384"><A HREF="#lbBQ">VERSION INFORMATION</A><DD>
<DT id="385"><A HREF="#lbBR">AUTHORS</A><DD>
<DT id="386"><A HREF="#lbBS">BUG REPORTS</A><DD>
<DT id="387"><A HREF="#lbBT">BUGS</A><DD>
<DT id="388"><A HREF="#lbBU">SEE ALSO</A><DD>
<DT id="389"><A HREF="#lbBV">EXAMPLES</A><DD>
<DT id="390"><A HREF="#lbBW">ACKNOWLEDGEMENTS</A><DD>
<DT id="391"><A HREF="#lbBX">COPYING PERMISSIONS</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:12 GMT, March 31, 2021
</BODY>
</HTML>