<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<HTML><HEAD><TITLE>Man page of MAWK</TITLE>
|
|
</HEAD><BODY>
|
|
<H1>MAWK</H1>
|
|
Section: USER COMMANDS (1)<BR>Updated: 2019-12-31<BR><A HREF="#index">Index</A>
|
|
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
<A NAME="lbAB"> </A>
|
|
<H2>NAME</H2>
|
|
|
|
mawk - pattern scanning and text processing language
|
|
|
|
<A NAME="lbAC"> </A>
|
|
<H2>SYNOPSIS</H2>
|
|
|
|
<B>mawk</B>
|
|
[-<B>W
|
|
</B><I>option</I>]
|
|
|
|
[-<B>F
|
|
</B><I>value</I>]
|
|
|
|
[-<B>v
|
|
</B><I>var=value</I>]
|
|
|
|
[--] 'program text' [file ...]
|
|
<BR>
|
|
|
|
<B>mawk</B>
|
|
[-<B>W
|
|
</B><I>option</I>]
|
|
|
|
[-<B>F
|
|
</B><I>value</I>]
|
|
|
|
[-<B>v
|
|
</B><I>var=value</I>]
|
|
|
|
[-<B>f
|
|
</B><I>program-file</I>]
|
|
|
|
[--] [file ...]
|
|
|
|
<A NAME="lbAD"> </A>
|
|
<H2>DESCRIPTION</H2>
|
|
|
|
<B>mawk</B>
|
|
is an interpreter for the AWK Programming Language.
|
|
The AWK language
|
|
is useful for manipulation of data files,
|
|
text retrieval and processing,
|
|
and for prototyping and experimenting with algorithms.
|
|
<B>mawk</B>
|
|
is a <I>new awk</I> meaning it implements the AWK language as
|
|
defined in Aho, Kernighan and Weinberger,
|
|
<I>The AWK Programming Language,</I>
|
|
|
|
Addison-Wesley Publishing, 1988 (hereafter referred to as
|
|
the AWK book.)
|
|
<B>mawk</B>
|
|
conforms to the POSIX 1003.2
|
|
(draft 11.3)
|
|
definition of the AWK language
|
|
which contains a few features not described in the AWK book,
|
|
and <B>mawk</B> provides a small number of extensions.
|
|
<P>
|
|
|
|
An AWK program is a sequence of <I>pattern {action}</I> pairs and
|
|
function definitions.
|
|
Short programs are entered on the command line
|
|
usually enclosed in ' ' to avoid shell
|
|
interpretation.
|
|
Longer programs can be read in from a
|
|
file with the -f option.
|
|
Data input is read from the list of files on
|
|
the command line or from standard input when the list is empty.
|
|
The input is broken into records as determined by the
|
|
record separator variable, <B>RS</B>.
|
|
Initially,
|
|
<B>RS</B>
|
|
|
|
= ``\n'' and records are synonymous with lines.
|
|
Each record is compared against each
|
|
<I>pattern</I>
|
|
|
|
and if it matches, the program text for
|
|
<I>{action}</I>
|
|
|
|
is executed.
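<P>
As a minimal sketch of the pattern-action model described above, the following one-liner reads three made-up records from standard input and runs its action only for records whose first field exceeds 10. It assumes that the <B>awk</B> command on the system is <B>mawk</B> or another POSIX awk.

```shell
# Each input line is one record because RS is initially a newline.
# The pattern is $1 > 10; the action prints the second field.
out=$(printf '5 a\n12 b\n30 c\n' | awk '$1 > 10 { print $2 }')
printf '%s\n' "$out"
```

Only the second and third records match, so their second fields are printed.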
|
|
|
|
<A NAME="lbAE"> </A>
|
|
<H2>OPTIONS</H2>
|
|
|
|
<DL COMPACT>
|
|
<DT id="1">-<B>F </B><I>value</I><DD>
|
|
sets the field separator, <B>FS</B>, to
|
|
<I>value</I>.
|
|
|
|
<DT id="2">-<B>f </B><I>file</I><DD>
Program text is read from <I>file</I> instead of from the
|
|
command line.
|
|
Multiple
|
|
<B>-f</B>
|
|
|
|
options are allowed.
|
|
<DT id="3">-<B>v </B><I>var=value</I><DD>
|
|
assigns
|
|
<I>value</I>
|
|
|
|
to program variable
|
|
<I>var</I>.
|
|
|
|
<DT id="4">--<DD>
|
|
indicates the unambiguous end of options.
|
|
</DL>
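<P>
The POSIX options can be combined on one command line. The sketch below uses <B>-F</B> to split on colons and <B>-v</B> to pass a threshold into the program. The variable name <I>min</I> and the sample records are illustrative assumptions, and <B>awk</B> is assumed to be <B>mawk</B> or another POSIX awk.

```shell
# -F: sets FS to ":" and -v min=1000 assigns the program variable min
# before the program starts.
out=$(printf 'root:x:0\nuser:x:1000\n' | awk -F: -v min=1000 '$3 >= min { print $1 }')
printf '%s\n' "$out"
```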
|
|
<P>
|
|
|
|
The above options will be available with any POSIX compatible
|
|
implementation of AWK.
|
|
Implementation specific options are prefaced with
|
|
<B>-W</B>.
|
|
|
|
<B>mawk</B>
|
|
provides these:
|
|
<DL COMPACT>
|
|
<DT id="5">-<B>W </B>dump<DD>
|
|
writes an assembler like listing of the internal
|
|
representation of the program to stdout and exits 0
|
|
(on successful compilation).
|
|
<DT id="6">-<B>W </B>exec <I>file</I><DD>
Program text is read from
<I>file</I>
|
|
|
|
and this is the last option.
|
|
<DT id="7"><DD>
|
|
This is a useful alternative to -<B>f</B> on systems that support the
|
|
<B>#!</B>
|
|
|
|
``magic number'' convention for executable scripts.
|
|
Those implicitly pass the pathname of the script itself as the final
|
|
parameter, and expect no more than one ``-'' option on the <B>#!</B> line.
|
|
Because <B>mawk</B> can combine multiple -<B>W</B> options separated by
|
|
commas, you can use this option when an additional -<B>W</B> option is needed.
|
|
<DT id="8">-<B>W </B>help<DD>
|
|
prints a usage message to stderr and exits (same as ``-<B>W </B>usage'').
|
|
<DT id="9">-<B>W </B>interactive<DD>
|
|
sets unbuffered writes to stdout and line buffered reads from stdin.
|
|
Records from stdin are lines regardless of the value of
|
|
<B>RS</B>.
|
|
|
|
<DT id="10">-<B>W </B>posix_space<DD>
|
|
forces
|
|
<B>mawk</B>
|
|
not to consider '\n' to be space.
|
|
<DT id="11">-<B>W </B>random=<I>num</I><DD>
|
|
calls <B>srand</B> with the given parameter
|
|
(and overrides the auto-seeding behavior).
|
|
<DT id="12">-<B>W </B>sprintf=<I>num</I><DD>
|
|
adjusts the size of
|
|
<B>mawk</B>'s
|
|
internal sprintf buffer to
|
|
<I>num</I>
|
|
|
|
bytes.
|
|
More than rare use of this option indicates
|
|
<B>mawk</B>
|
|
should be recompiled.
|
|
<DT id="13">-<B>W </B>usage<DD>
|
|
prints a usage message to stderr and exits (same as ``-<B>W </B>help'').
|
|
<DT id="14">-<B>W </B>version<DD>
|
|
<B>mawk</B>
|
|
writes its version and copyright
|
|
to stdout and compiled limits to
|
|
stderr and exits 0.
|
|
</DL>
|
|
<P>
|
|
|
|
<B>mawk</B> accepts abbreviations for any of these options, e.g.,
|
|
``-<B>W </B>v'' and ``-<B>W</B>v''
|
|
both tell <B>mawk</B> to show its version.
|
|
<P>
|
|
|
|
<B>mawk</B>
|
|
allows multiple <B>-W</B> options to be combined by separating the options
|
|
with commas, e.g., -Wsprint=2000,posix.
|
|
This is useful for executable
|
|
<B>#!</B>
|
|
|
|
``magic number'' invocations in which only one argument is supported,
|
|
e.g., -<B>Winteractive,exec</B>.
|
|
|
|
<A NAME="lbAF"> </A>
|
|
<H2>THE AWK LANGUAGE</H2>
|
|
|
|
<A NAME="lbAG"> </A>
|
|
<H3>1. Program structure</H3>
|
|
|
|
An AWK program is a sequence of
|
|
</B><I>pattern {action}</I>
|
|
|
|
pairs and user
|
|
function definitions.
|
|
<P>
|
|
|
|
A pattern can be:
|
|
<PRE>
|
|
<DL COMPACT><DT id="15"><DD><B>BEGIN</B>
|
|
<B>END</B>
|
|
expression
|
|
expression , expression
|
|
|
|
</DL>
|
|
</PRE>
|
|
|
|
One, but not both,
|
|
of <I>pattern {action}</I> can be omitted.
|
|
If
|
|
<I>{action}</I>
|
|
|
|
is omitted it is implicitly { print }.
|
|
If
|
|
<I>pattern</I>
|
|
|
|
is omitted, then it is implicitly matched.
|
|
<B>BEGIN</B>
|
|
|
|
and
|
|
<B>END</B>
|
|
|
|
patterns require an action.
|
|
<P>
|
|
|
|
Statements are terminated by newlines, semi-colons or both.
|
|
Groups of statements such as
|
|
actions or loop bodies are blocked via { ... } as in C.
|
|
The last statement in a block doesn't need a terminator.
|
|
Blank lines have no meaning; an empty statement is terminated with a
|
|
semi-colon.
|
|
Long statements can be continued with a backslash, \.
|
|
A statement can be broken
|
|
without a backslash after a comma, left brace, &&, ||,
|
|
<B>do</B>,
|
|
|
|
<B>else</B>,
|
|
|
|
the right parenthesis of an
|
|
<B>if</B>,
|
|
|
|
<B>while</B>
|
|
|
|
or
|
|
<B>for</B>
|
|
|
|
statement, and the
|
|
right parenthesis of a function definition.
|
|
A comment starts with # and extends to, but does not include
|
|
the end of line.
|
|
<P>
|
|
|
|
The following statements control program flow inside blocks.
|
|
<DL COMPACT><DT id="16"><DD>
|
|
<P>
|
|
|
|
<B>if</B>
|
|
|
|
( <I>expr</I> )
|
|
<I>statement</I>
|
|
|
|
<P>
|
|
|
|
<B>if</B>
|
|
|
|
( <I>expr</I> )
|
|
<I>statement</I>
|
|
|
|
<B>else</B>
|
|
|
|
<I>statement</I>
|
|
|
|
<P>
|
|
|
|
<B>while</B>
|
|
|
|
( <I>expr</I> )
|
|
<I>statement</I>
|
|
|
|
<P>
|
|
|
|
<B>do</B>
|
|
|
|
<I>statement</I>
|
|
|
|
<B>while</B>
|
|
|
|
( <I>expr</I> )
|
|
<P>
|
|
|
|
<B>for</B>
|
|
|
|
(
|
|
<I>opt_expr</I> ;
|
|
<I>opt_expr</I> ;
|
|
<I>opt_expr</I>
|
|
)
|
|
<I>statement</I>
|
|
|
|
<P>
|
|
|
|
<B>for</B>
|
|
|
|
( <I>var </I><B>in </B><I>array</I> )
|
|
<I>statement</I>
|
|
|
|
<P>
|
|
|
|
<B>continue</B>
|
|
|
|
<P>
|
|
|
|
<B>break</B>
|
|
|
|
</DL>
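<P>
A short sketch of these flow-control statements working together follows. The loop bounds and variable names are arbitrary choices made for illustration, and <B>awk</B> is assumed to be <B>mawk</B> or another POSIX awk.

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 5; i++) {
        if (i % 2 == 0) continue   # skip even values of i
        if (i > 3) break           # leave the loop at the first odd i above 3
        s = s i                    # append i to the string s
    }
    print s
}')
printf '%s\n' "$out"
```

Only 1 and 3 are appended before the loop ends, so the program prints 13.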
|
|
|
|
|
|
<A NAME="lbAH"> </A>
|
|
<H3>2. Data types, conversion and comparison</H3>
|
|
|
|
There are two basic data types, numeric and string.
|
|
Numeric constants can be integer like -2,
|
|
decimal like 1.08, or in scientific notation like -1.1e4 or .28E-3.
|
|
All numbers are represented internally and all
|
|
computations are done in floating point arithmetic.
|
|
So for example, the expression
|
|
0.2e2 == 20
|
|
is true and true is represented as 1.0.
|
|
<P>
|
|
|
|
String constants are enclosed in double quotes.
|
|
<P>
|
|
<CENTER>
|
|
"This is a string with a newline at the end.\n"<BR>
|
|
</CENTER>
|
|
|
|
<P>
|
|
Strings can be continued across a line by escaping (\) the newline.
|
|
The following escape sequences are recognized.
|
|
</B><PRE>
|
|
|
|
\\ \
|
|
\" "
|
|
\a alert, ascii 7
|
|
\b backspace, ascii 8
|
|
\t tab, ascii 9
|
|
\n newline, ascii 10
|
|
\v vertical tab, ascii 11
|
|
\f formfeed, ascii 12
|
|
\r carriage return, ascii 13
|
|
\ddd 1, 2 or 3 octal digits for ascii ddd
|
|
\xhh 1 or 2 hex digits for ascii hh
|
|
|
|
</PRE>
|
|
|
|
If you escape any other character \c, you get \c, i.e.,
|
|
<B>mawk</B>
|
|
ignores the escape.
|
|
<P>
|
|
|
|
There are really three basic data types; the third is
|
|
<I>number and string</I>
|
|
|
|
which has both a numeric value and a string value
|
|
at the same time.
|
|
User defined variables come into existence when first referenced
|
|
and are initialized to
|
|
<I>null</I>,
|
|
|
|
a number and string value which has numeric value 0 and string value
|
|
"".
|
|
Non-trivial number and string typed data come from input
|
|
and are typically stored in fields.
|
|
(See section 4).
|
|
<P>
|
|
|
|
The type of an expression is determined by its context and automatic
|
|
type conversion occurs if needed.
|
|
For example, consider the statements
|
|
<PRE>
|
|
|
|
y = x + 2 ; z = x "hello"
|
|
|
|
</PRE>
|
|
|
|
The value stored in variable y will be typed numeric.
|
|
If x is not numeric,
|
|
the value read from x is converted to numeric before it is added to
|
|
2 and stored in y.
|
|
The value stored in variable z will be typed
|
|
string, and the value of x will be converted to string if necessary
|
|
and concatenated with "hello".
|
|
(Of course, the value and type stored in x is not changed by any conversions.)
|
|
A string expression is converted to numeric using its longest
|
|
numeric prefix as with
|
|
<B><A HREF="/cgi-bin/man/man2html?3+atof">atof</A></B>(3).
|
|
A numeric expression is converted to string by replacing
|
|
<I>expr</I>
|
|
|
|
with
|
|
<B>sprintf(CONVFMT</B>,
|
|
|
|
<I>expr</I>),
|
|
|
|
unless
|
|
<I>expr</I>
|
|
|
|
can be represented on the host machine as an exact integer, in which case
it is converted with <B>sprintf</B>("%d", <I>expr</I>).
|
|
<B>Sprintf()</B>
|
|
|
|
is an AWK built-in that duplicates the functionality of
|
|
<B><A HREF="/cgi-bin/man/man2html?3+sprintf">sprintf</A></B>(3),
|
|
and
|
|
<B>CONVFMT</B>
|
|
is a built-in variable used for internal conversion
|
|
from number to string and initialized to "%.6g".
|
|
Explicit type conversions can be forced,
|
|
<I>expr</I> ""
|
|
is string and
|
|
<I>expr</I>+0
|
|
|
|
is numeric.
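<P>
The conversion rules above can be observed directly. This is a sketch with made-up values, assuming <B>awk</B> is <B>mawk</B> or another POSIX awk; changing <B>CONVFMT</B> to "%.2f" is done here only to make its effect visible.

```shell
out=$(awk 'BEGIN {
    x = "3.0abc"
    print x + 2        # the numeric prefix 3.0 is used, giving 5
    CONVFMT = "%.2f"
    y = 3.14159
    z = y ""           # non-integer value: converted with CONVFMT
    print z
    n = 17
    print n ""         # exact integer: converted as with %d
}')
printf '%s\n' "$out"
```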
|
|
<P>
|
|
|
|
To evaluate,
|
|
<I>expr</I>1 <B>rel-op</B> <I>expr</I>2,
if both operands are numeric or number and string then the comparison
is numeric; if both operands are string the comparison is string;
if one operand is string, the non-string operand is converted and
the comparison is string.
The result is numeric, 1 or 0.
<P>
In boolean contexts such as,
<B>if</B> ( <I>expr</I> ) <I>statement</I>,
|
|
a string expression evaluates true if and only if it is not the
|
|
empty string "";
|
|
numeric values if and only if not numerically zero.
|
|
|
|
<A NAME="lbAI"> </A>
|
|
<H3>3. Regular expressions</H3>
|
|
|
|
In the AWK language, records, fields and strings are often
|
|
tested for matching a
|
|
</B><I>regular expression</I>.
|
|
|
|
Regular expressions are enclosed in slashes, and
|
|
<PRE>
|
|
|
|
<I>expr</I> ~ /<I>r</I>/
|
|
|
|
</PRE>
|
|
|
|
is an AWK expression that evaluates to 1 if <I>expr</I> ``matches''
|
|
<I>r</I>,
|
|
|
|
which means a substring of <I>expr</I> is in the set of strings
|
|
defined by
|
|
<I>r</I>.
|
|
|
|
With no match the expression evaluates to 0; replacing
|
|
~ with the ``not match'' operator, !~ , reverses the meaning.
|
|
As pattern-action pairs,
|
|
<PRE>
|
|
|
|
/<I>r</I>/ { <I>action</I> } and <B>$0</B> ~ /<I>r</I>/ { <I>action</I> }
|
|
|
|
</PRE>
|
|
|
|
are the same,
|
|
and for each input record that matches
|
|
<I>r</I>,
|
|
|
|
<I>action</I>
|
|
|
|
is executed.
|
|
In fact, /<I>r</I>/ is an AWK expression that is
|
|
equivalent to (<B>$0</B> ~ /<I>r</I>/) anywhere except when on the
|
|
right side of a match operator or passed as an argument to
|
|
a built-in function that expects a regular expression
|
|
argument.
|
|
<P>
|
|
|
|
AWK uses extended regular expressions as with
|
|
the <B>-E</B> option of <B><A HREF="/cgi-bin/man/man2html?1+grep">grep</A></B>(1).
|
|
The regular expression metacharacters, i.e., those with special
|
|
meaning in regular expressions are
|
|
<PRE>
|
|
|
|
\ ^ $ . [ ] | ( ) * + ?
|
|
|
|
</PRE>
|
|
|
|
Regular expressions are built up from characters as follows:
|
|
<DL COMPACT><DT id="17"><DD>
|
|
<DL COMPACT>
|
|
<DT id="18"><I>c</I><DD>
|
|
matches any non-metacharacter
|
|
<I>c</I>.
|
|
|
|
<DT id="19">\<I>c</I><DD>
|
|
matches a character defined by the same escape sequences used
|
|
in string constants or the literal
|
|
character
|
|
<I>c</I>
|
|
|
|
if
|
|
\<I>c</I>
|
|
is not an escape sequence.
|
|
<DT id="20">.<DD>
|
|
matches any character (including newline).
|
|
<DT id="21">^<DD>
|
|
matches the front of a string.
|
|
<DT id="22">$<DD>
|
|
matches the back of a string.
|
|
<DT id="23">[c1c2c3...]<DD>
|
|
matches any character in the class
|
|
c1c2c3... .
|
|
An interval of characters is denoted
|
|
c1-c2 inside a class [...].
|
|
<DT id="24">[^c1c2c3...]<DD>
|
|
matches any character not in the class
|
|
c1c2c3...
|
|
</DL>
|
|
</DL>
|
|
|
|
<P>
|
|
Regular expressions are built up from other regular expressions
|
|
as follows:
|
|
<DL COMPACT><DT id="25"><DD>
|
|
<DL COMPACT>
|
|
<DT id="26"><I>r</I>1<I>r</I>2<DD>
|
|
matches
|
|
<I>r</I>1
|
|
followed immediately by
|
|
<I>r</I>2
|
|
(concatenation).
|
|
<DT id="27"><I>r</I>1 | <I>r</I>2<DD>
|
|
matches
|
|
<I>r</I>1 or
|
|
<I>r</I>2
|
|
(alternation).
|
|
<DT id="28"><I>r</I>*<DD>
|
|
matches <I>r</I> repeated zero or more times.
|
|
<DT id="29"><I>r</I>+<DD>
|
|
matches <I>r</I> repeated one or more times.
|
|
<DT id="30"><I>r</I>?<DD>
|
|
matches <I>r</I> zero or once.
|
|
<DT id="31">(<I>r</I>)<DD>
|
|
matches <I>r</I>, providing grouping.
|
|
</DL>
|
|
</DL>
|
|
|
|
<P>
|
|
The increasing precedence of operators is alternation,
|
|
concatenation and
|
|
unary (*, + or ?).
|
|
<P>
|
|
|
|
For example,
|
|
<PRE>
|
|
|
|
/^[_a-zA-Z][_a-zA-Z0-9]*$/ and
|
|
/^[-+]?([0-9]+\.?|\.[0-9])[0-9]*([eE][-+]?[0-9]+)?$/
|
|
|
|
</PRE>
|
|
|
|
are matched by AWK identifiers and AWK numeric constants
|
|
respectively.
|
|
Note that ``.'' has to be escaped to be
|
|
recognized as a decimal point, and that metacharacters are not
|
|
special inside character classes.
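<P>
The two expressions above can be tried against a few sample strings. The sketch below classifies each input line; the sample inputs and the output labels are assumptions chosen for illustration, and <B>awk</B> is assumed to be <B>mawk</B> or another POSIX awk.

```shell
out=$(printf 'foo_1\n9lives\n-1.5e3\n.5\nabc.\n' | awk '
/^[_a-zA-Z][_a-zA-Z0-9]*$/ { print $0 ": identifier"; next }
/^[-+]?([0-9]+\.?|\.[0-9])[0-9]*([eE][-+]?[0-9]+)?$/ { print $0 ": number"; next }
{ print $0 ": neither" }')
printf '%s\n' "$out"
```

A line such as 9lives fails both tests: it starts with a digit, so it is not an identifier, and the trailing letters prevent the numeric expression from reaching the end of the string.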
|
|
<P>
|
|
|
|
Any expression can be used on the right hand side of the ~ or !~
|
|
operators or
|
|
passed to a built-in that expects
|
|
a regular expression.
|
|
If needed, it is converted to string, and then interpreted
|
|
as a regular expression.
|
|
For example,
|
|
<PRE>
|
|
|
|
BEGIN { identifier = "[_a-zA-Z][_a-zA-Z0-9]*" }
|
|
|
|
$0 ~ "^" identifier
|
|
|
|
</PRE>
|
|
|
|
prints all lines that start with an AWK identifier.
|
|
<P>
|
|
|
|
<B>mawk</B>
|
|
recognizes the empty regular expression, //, which matches the
|
|
empty string and hence is matched by any string at the front,
|
|
back and between every character.
|
|
For example,
|
|
<PRE>
|
|
|
|
echo abc | mawk '{ gsub(//, "X") ; print }'
|
|
XaXbXcX
|
|
|
|
</PRE>
|
|
|
|
|
|
<A NAME="lbAJ"> </A>
|
|
<H3>4. Records and fields</H3>
|
|
|
|
Records are read in one at a time, and stored in the
|
|
</B><I>field</I>
|
|
|
|
variable
|
|
<B>$0</B>.
|
|
|
|
The record is split into
|
|
<I>fields</I>
|
|
|
|
which are stored in
|
|
<B>$1</B>,
|
|
|
|
<B>$2</B>, ...,
|
|
|
|
<B>$NF</B>.
|
|
|
|
The built-in variable
|
|
<B>NF</B>
|
|
|
|
is set to the number of fields,
|
|
and
|
|
<B>NR</B>
|
|
|
|
and
|
|
<B>FNR</B>
|
|
|
|
are incremented by 1.
|
|
Fields above
|
|
<B>$NF</B>
|
|
|
|
are set to "".
|
|
<P>
|
|
|
|
Assignment to
|
|
<B>$0</B>
|
|
|
|
causes the fields and
|
|
<B>NF</B>
|
|
|
|
to be recomputed.
|
|
Assignment to
|
|
<B>NF</B>
|
|
|
|
or to a field
|
|
causes
|
|
<B>$0</B>
|
|
|
|
to be reconstructed by
|
|
concatenating the
|
|
<B>$i's</B>
|
|
|
|
separated by
|
|
<B>OFS</B>.
|
|
|
|
Assignment to a field with index greater than
|
|
<B>NF</B>,
|
|
|
|
increases
|
|
<B>NF</B>
|
|
|
|
and causes
|
|
<B>$0</B>
|
|
|
|
to be reconstructed.
|
|
<P>
|
|
|
|
Data input stored in fields
|
|
is string, unless the entire field has numeric
|
|
form and then the type is number and string.
|
|
For example,
|
|
<P>
|
|
<PRE>
|
|
echo 24 24E |
|
|
mawk '{ print($1>100, $1>"100", $2>100, $2>"100") }'
|
|
0 1 1 1
|
|
</PRE>
|
|
|
|
<P>
|
|
<B>$0</B>
|
|
|
|
and
|
|
<B>$2</B>
|
|
|
|
are string and
|
|
<B>$1</B>
|
|
|
|
is number and string.
|
|
The first comparison is numeric, the second is string, the third is string
|
|
(100 is converted to "100"),
|
|
and the last is string.
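<P>
The reconstruction of <B>$0</B> after a field assignment, described earlier in this section, can be sketched as follows. The input record and the choice of "-" for <B>OFS</B> are illustrative assumptions, and <B>awk</B> is assumed to be <B>mawk</B> or another POSIX awk.

```shell
out=$(echo 'a b c' | awk '{
    OFS = "-"
    $2 = "X"      # assigning to a field rebuilds $0 using OFS
    print
    $5 = "e"      # index above NF: NF grows and $4 becomes ""
    print NF
    print
}')
printf '%s\n' "$out"
```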
|
|
|
|
<A NAME="lbAK"> </A>
|
|
<H3>5. Expressions and operators</H3>
|
|
|
|
<P>
|
|
|
|
The expression syntax is similar to C.
|
|
Primary expressions are numeric constants,
|
|
string constants, variables, fields, arrays and function calls.
|
|
The identifier
|
|
for a variable, array or function can be a sequence of
|
|
letters, digits and underscores, that does
|
|
not start with a digit.
|
|
Variables are not declared; they exist when first referenced and
|
|
are initialized to
|
|
</B><I>null</I>.
|
|
|
|
<P>
|
|
|
|
New
|
|
expressions are composed with the following operators in
|
|
order of increasing precedence.
|
|
<P>
|
|
|
|
<DL COMPACT><DT id="32"><DD>
|
|
<PRE>
|
|
<I>assignment</I> = += -= *= /= %= ^=
|
|
<I>conditional</I> ? :
|
|
<I>logical or</I> ||
|
|
<I>logical and</I> &&
|
|
<I>array membership</I> <B>in
|
|
</B><I>matching</I> ~ !~
|
|
<I>relational</I> < > <= >= == !=
|
|
<I>concatenation</I> (no explicit operator)
|
|
<I>add ops</I> + -
|
|
<I>mul ops</I> * / %
|
|
<I>unary</I> + -
|
|
<I>logical not</I> !
|
|
<I>exponentiation</I> ^
|
|
<I>inc and dec</I> ++ -- (both post and pre)
|
|
<I>field</I> $
|
|
</DL>
|
|
|
|
</PRE>
|
|
|
|
Assignment, conditional and exponentiation associate right to
|
|
left; the other operators associate left to right.
|
|
Any expression can be parenthesized.
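<P>
A few precedence and associativity rules from the table can be checked with a short sketch. The expressions are arbitrary examples, and <B>awk</B> is assumed to be <B>mawk</B> or another POSIX awk.

```shell
out=$(awk 'BEGIN {
    print 2 ^ 3 ^ 2      # right associative: 2 ^ (3 ^ 2)
    print -2 ^ 2         # ^ binds tighter than unary minus: -(2 ^ 2)
    print 1 " " 2 + 3    # + binds tighter than concatenation: 1 " " (2 + 3)
}')
printf '%s\n' "$out"
```

When in doubt, parentheses make the intended grouping explicit.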
|
|
|
|
<A NAME="lbAL"> </A>
|
|
<H3>6. Arrays</H3>
|
|
|
|
|
|
Awk provides one-dimensional arrays.
|
|
Array elements are expressed
|
|
as </B><I>array</I>[<I>expr</I>].
|
|
<I>Expr</I>
|
|
|
|
is internally converted to string type, so, for example,
|
|
A[1] and A["1"] are the same element and the actual
|
|
index is "1".
|
|
Arrays indexed by strings are called associative arrays.
|
|
Initially an array is empty; elements exist when first accessed.
|
|
An expression,
|
|
<I>expr</I><B> in</B><I> array</I>
|
|
evaluates to 1 if
|
|
</B><I>array</I>[<I>expr</I>]
|
|
exists, else to 0.
|
|
<P>
|
|
|
|
There is a form of the
|
|
<B>for</B>
|
|
|
|
statement that loops over each index of an array.
|
|
<PRE>
|
|
|
|
<B>for</B> ( <I>var</I><B> in </B><I>array </I>) <I>statement</I>
|
|
|
|
</PRE>
|
|
|
|
sets
|
|
<I>var</I>
|
|
|
|
to each index of
|
|
<I>array</I>
|
|
|
|
and executes
|
|
<I>statement</I>.
|
|
|
|
The order that
|
|
<I>var</I>
|
|
|
|
traverses the indices of
|
|
<I>array</I>
|
|
|
|
is not defined.
|
|
<P>
|
|
|
|
The statement,
|
|
<B>delete</B>
|
|
|
|
</B><I>array</I>[<I>expr</I>],
|
|
causes
|
|
</B><I>array</I>[<I>expr</I>]
|
|
not to exist.
|
|
<B>mawk</B>
|
|
supports an extension,
|
|
<B>delete</B>
|
|
|
|
<I>array</I>,
|
|
|
|
which deletes all elements of
|
|
<I>array</I>.
|
|
|
|
<P>
|
|
|
|
Multidimensional arrays are synthesized with concatenation using
|
|
the built-in variable
|
|
<B>SUBSEP</B>.
|
|
|
|
<I>array</I>[<I>expr</I>1,<I>expr</I>2]
|
|
is equivalent to
|
|
<I>array</I>[<I>expr</I>1 <B>SUBSEP </B><I>expr</I>2].
|
|
Testing for a multidimensional element uses a parenthesized index,
|
|
such as
|
|
<P>
|
|
<PRE>
|
|
if ( (i, j) in A ) print A[i, j]
|
|
</PRE>
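<P>
The array rules in this section can be sketched in one small program. The array name, indices and result variables are illustrative assumptions, and <B>awk</B> is assumed to be <B>mawk</B> or another POSIX awk.

```shell
out=$(awk 'BEGIN {
    A[1] = "one"
    r1 = ("1" in A)               # 1: the index 1 is stored as "1"
    A["x", "y"] = 42
    r2 = (("x", "y") in A)        # 1: parenthesized multidimensional test
    r3 = (("x" SUBSEP "y") in A)  # 1: same element, index spelled out
    delete A[1]
    r4 = (1 in A)                 # 0: the element no longer exists
    n = 0
    for (k in A) n++              # counts the one remaining index
    print r1, r2, r3, r4, n
}')
printf '%s\n' "$out"
```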
|
|
|
|
<P>
|
|
|
|
<A NAME="lbAM"> </A>
|
|
<H3><B>7. Built-in variables</B></H3>
|
|
|
|
<P>
|
|
|
|
The following variables are built-in and initialized before program
|
|
execution.
|
|
<DL COMPACT><DT id="33"><DD>
|
|
<DL COMPACT>
|
|
<DT id="34"><B>ARGC</B>
|
|
|
|
<DD>
|
|
number of command line arguments.
|
|
<DT id="35"><B>ARGV</B>
|
|
|
|
<DD>
|
|
array of command line arguments, 0..ARGC-1.
|
|
<DT id="36"><B>CONVFMT</B>
|
|
|
|
<DD>
|
|
format for internal conversion of numbers to string,
|
|
initially = "%.6g".
|
|
<DT id="37"><B>ENVIRON</B>
|
|
|
|
<DD>
|
|
array indexed by environment variables.
|
|
An environment string,
|
|
<I>var=value</I> is stored as
|
|
<B>ENVIRON</B>[<I>var</I>] =
|
|
<I>value</I>.
|
|
|
|
<DT id="38"><B>FILENAME</B>
|
|
|
|
<DD>
|
|
name of the current input file.
|
|
<DT id="39"><B>FNR</B>
|
|
|
|
<DD>
|
|
current record number in
|
|
<B>FILENAME</B>.
|
|
|
|
<DT id="40"><B>FS</B>
|
|
|
|
<DD>
|
|
splits records into fields as a regular expression.
|
|
<DT id="41"><B>NF</B>
|
|
|
|
<DD>
|
|
number of fields in the current record.
|
|
<DT id="42"><B>NR</B>
|
|
|
|
<DD>
|
|
current record number in the total input stream.
|
|
<DT id="43"><B>OFMT</B>
|
|
|
|
<DD>
|
|
format for printing numbers; initially = "%.6g".
|
|
<DT id="44"><B>OFS</B>
|
|
|
|
<DD>
|
|
inserted between fields on output, initially = " ".
|
|
<DT id="45"><B>ORS</B>
|
|
|
|
<DD>
|
|
terminates each record on output, initially = "\n".
|
|
<DT id="46"><B>RLENGTH</B>
|
|
|
|
<DD>
|
|
length set by the last call to the built-in function,
|
|
<B>match()</B>.
|
|
|
|
<DT id="47"><B>RS</B>
|
|
|
|
<DD>
|
|
input record separator, initially = "\n".
|
|
<DT id="48"><B>RSTART</B>
|
|
|
|
<DD>
|
|
index set by the last call to
|
|
<B>match()</B>.
|
|
|
|
<DT id="49"><B>SUBSEP</B>
|
|
|
|
<DD>
|
|
used to build multiple array subscripts, initially = "\034".
|
|
</DL>
|
|
</DL>
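<P>
As a brief sketch of several of these variables in use, the program below prints the record number, the field count and the last field of each record, with non-default output separators. The separators and input are made-up values, and <B>awk</B> is assumed to be <B>mawk</B> or another POSIX awk.

```shell
# OFS is inserted between the printed expressions, ORS after each record.
out=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = ":"; ORS = ";" } { print NR, NF, $NF }')
printf '%s\n' "$out"
```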
|
|
|
|
|
|
<A NAME="lbAN"> </A>
|
|
<H3>8. Built-in functions</H3>
|
|
|
|
String functions
|
|
<DL COMPACT><DT id="50"><DD>
|
|
<DL COMPACT>
|
|
<DT id="51">gsub(</B><I>r,s,t</I>) gsub(<I>r,s</I>)<DD>
|
|
Global substitution, every match of regular expression
|
|
<I>r</I>
|
|
|
|
in variable
|
|
<I>t</I>
|
|
|
|
is replaced by string
|
|
<I>s</I>.
|
|
|
|
The number of replacements is returned.
|
|
If
|
|
<I>t</I>
|
|
|
|
is omitted,
|
|
<B>$0</B>
|
|
|
|
is used.
|
|
An & in the replacement string
|
|
<I>s</I>
|
|
|
|
is replaced by the matched substring of
|
|
<I>t</I>.
|
|
|
|
\& and \\ put literal & and \, respectively,
|
|
in the replacement string.
|
|
<DT id="52">index(<I>s,t</I>)<DD>
|
|
If
|
|
<I>t</I>
|
|
|
|
is a substring of
|
|
<I>s</I>,
|
|
|
|
then the position where
|
|
<I>t</I>
|
|
|
|
starts is returned, else 0 is returned.
|
|
The first character of
|
|
<I>s</I>
|
|
|
|
is in position 1.
|
|
<DT id="53">length(<I>s</I>)<DD>
|
|
Returns the length of string or array
|
|
<I>s</I>.
|
|
|
|
<DT id="54">match(<I>s,r</I>)<DD>
|
|
Returns the index of the first longest match of regular expression
|
|
<I>r</I>
|
|
|
|
in string
|
|
<I>s</I>.
|
|
|
|
Returns 0 if no match.
|
|
As a side effect,
|
|
<B>RSTART</B>
|
|
|
|
is set to the return value.
|
|
<B>RLENGTH</B>
|
|
|
|
is set to the length of the match or -1 if no match.
|
|
If the empty string is matched,
|
|
<B>RLENGTH</B>
|
|
|
|
is set to 0, and 1 is returned if the match is at the front, and
|
|
length(<I>s</I>)+1 is returned if the match is at the back.
|
|
<DT id="55">split(<I>s,A,r</I>) split(<I>s,A</I>)<DD>
|
|
String
|
|
<I>s</I>
|
|
|
|
is split into fields by regular expression
|
|
<I>r</I>
|
|
|
|
and the fields are loaded into array
|
|
<I>A</I>.
|
|
|
|
The number of fields
|
|
is returned.
|
|
See section 11 below for more detail.
|
|
If
|
|
<I>r</I>
|
|
|
|
is omitted,
|
|
<B>FS</B>
|
|
|
|
is used.
|
|
<DT id="56">sprintf(<I>format,expr-list</I>)<DD>
|
|
Returns a string constructed from
|
|
<I>expr-list</I>
|
|
|
|
according to
|
|
<I>format</I>.
|
|
|
|
See the description of printf() below.
|
|
<DT id="57">sub(<I>r,s,t</I>) sub(<I>r,s</I>)<DD>
|
|
Single substitution, same as gsub() except at most one substitution.
|
|
<DT id="58">substr(<I>s,i,n</I>) substr(<I>s,i</I>)<DD>
|
|
Returns the substring of string
|
|
<I>s</I>,
|
|
|
|
starting at index
|
|
<I>i</I>,
|
|
|
|
of length
|
|
<I>n</I>.
|
|
|
|
If
|
|
<I>n</I>
|
|
|
|
is omitted, the suffix of
|
|
<I>s</I>,
|
|
|
|
starting at
|
|
<I>i</I>
|
|
|
|
is returned.
|
|
<DT id="59">tolower(<I>s</I>)<DD>
|
|
Returns a copy of
|
|
<I>s</I>
|
|
|
|
with all upper case characters converted to lower case.
|
|
<DT id="60">toupper(<I>s</I>)<DD>
|
|
Returns a copy of
|
|
<I>s</I>
|
|
|
|
with all lower case characters converted to upper case.
|
|
</DL>
|
|
</DL>
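<P>
The string functions above can be exercised together in a short sketch. All strings and variable names are illustrative assumptions, and <B>awk</B> is assumed to be <B>mawk</B> or another POSIX awk.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "[&]", s)          # & stands for the matched text
    print n, s
    t = "foo=bar"
    p = index(t, "=")                # position of the first "="
    print p, substr(t, 1, p - 1), substr(t, p + 1)
    m = match("xx123y", /[0-9]+/)    # also sets RSTART and RLENGTH
    print m, RSTART, RLENGTH
    k = split("a:b:c", parts, ":")   # parts[1..3] receive the fields
    print k, parts[3], toupper(parts[1])
}')
printf '%s\n' "$out"
```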
|
|
|
|
<P>
|
|
|
|
Time functions
|
|
<P>
|
|
|
|
These are available on systems which support the corresponding C
|
|
<B>mktime</B> and <B>strftime</B> functions:
|
|
<DL COMPACT><DT id="61"><DD>
|
|
<DL COMPACT>
|
|
<DT id="62">mktime(<I>specification</I>)<DD>
|
|
converts a date specification to a timestamp
|
|
with the same units as <B>systime</B>.
|
|
The date specification is a string containing the components of the
|
|
date as decimal integers:
|
|
<DL COMPACT><DT id="63"><DD>
|
|
<DL COMPACT>
|
|
<DT id="64">YYYY<DD>
|
|
the year, e.g., 2012
|
|
<DT id="65">MM<DD>
|
|
the month of the year starting at 1
|
|
<DT id="66">DD<DD>
|
|
the day of the month starting at 1
|
|
<DT id="67">HH<DD>
|
|
hour (0-23)
|
|
<DT id="68">MM<DD>
|
|
minute (0-59)
|
|
<DT id="69">SS<DD>
|
|
seconds (0-59)
|
|
<DT id="70">DST<DD>
|
|
tells how to treat timezone versus daylight saving time:
|
|
<DL COMPACT><DT id="71"><DD>
|
|
<DL COMPACT>
|
|
<DT id="72">positive<DD>
|
|
DST is in effect
|
|
<DT id="73">zero (default)<DD>
|
|
DST is not in effect
|
|
<DT id="74">negative<DD>
|
|
mktime()
|
|
should (use timezone information and system databases to) attempt to
|
|
determine whether DST is in effect at the specified time.
|
|
</DL>
|
|
</DL>
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<DT id="75">strftime([<I>format</I> [, <I>timestamp</I> [, <I>utc</I> ]]])<DD>
|
|
formats the given timestamp using the format (passed to the C <B>strftime</B>
|
|
function):
|
|
<DL COMPACT><DT id="76"><DD>
|
|
|
|
<BR>&bull;
|
|
|
|
|
|
If the <I>format</I> parameter is missing, "%c" is used.
|
|
|
|
<BR>&bull;
|
|
|
|
|
|
If the <I>timestamp</I> parameter is missing, the current value from
|
|
<B>systime</B> is used.
|
|
|
|
<BR>&bull;
|
|
|
|
|
|
If the <I>utc</I> parameter is present and nonzero,
|
|
the result is in UTC.
|
|
Otherwise local time is used.
|
|
</DL>
|
|
|
|
<DT id="77">systime()<DD>
|
|
returns the current time of day as the number of seconds
|
|
since the Epoch (1970-01-01 00:00:00 UTC on POSIX systems).
|
|
</DL>
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
Arithmetic functions
|
|
<DL COMPACT><DT id="78"><DD>
|
|
<P>
|
|
|
|
<PRE>
|
|
atan2(<I>y,x</I>) Arctan of <I>y</I>/<I>x</I> between -pi and pi.
|
|
|
|
cos(<I>x</I>) Cosine function, <I>x</I> in radians.
|
|
|
|
exp(<I>x</I>) Exponential function.
|
|
|
|
int(<I>x</I>) Returns <I>x</I> truncated towards zero.
|
|
|
|
log(<I>x</I>) Natural logarithm.
|
|
|
|
rand() Returns a random number between zero and one.
|
|
|
|
sin(<I>x</I>) Sine function, <I>x</I> in radians.
|
|
|
|
sqrt(<I>x</I>) Returns square root of <I>x</I>.
|
|
</PRE>
|
|
|
|
<DL COMPACT>
|
|
<DT id="79">srand(<I>expr</I>) srand()<DD>
|
|
Seeds the random number generator,
|
|
using the clock if <I>expr</I> is omitted,
|
|
and returns the value of the previous seed.
|
|
Srand(<I>expr</I>) is useful for repeating pseudo random sequences.
|
|
<DT id="80"><DD>
|
|
Note:
|
|
<B>mawk</B>
|
|
is normally configured to seed the random number generator from the clock
|
|
at startup, making it unnecessary to call srand().
|
|
This feature can be suppressed via conditional compile,
|
|
or overridden using the <B>-Wrandom</B> option.
|
|
</DL>
|
|
</DL>
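<P>
The sketch below illustrates repeating a pseudo random sequence with <B>srand</B>(<I>expr</I>) and the truncation done by <B>int</B>. The seed 42 and the variable names are arbitrary assumptions, and <B>awk</B> is assumed to be <B>mawk</B> or another POSIX awk.

```shell
out=$(awk 'BEGIN {
    srand(42); a = rand()
    srand(42); b = rand()            # same seed, so the same first value
    same = (a == b)
    inrange = (a >= 0 && a < 1)
    print same, inrange, int(3.7), int(-3.7)
}')
printf '%s\n' "$out"
```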
|
|
|
|
|
|
<A NAME="lbAO"> </A>
|
|
<H3>9. Input and output</H3>
|
|
|
|
There are two output statements,
|
|
<B>print</B>
|
|
|
|
and
|
|
<B>printf</B>.
|
|
|
|
<DL COMPACT><DT id="81"><DD>
|
|
<DL COMPACT>
|
|
<DT id="82">print<DD>
|
|
writes
|
|
<B>$0 ORS</B>
|
|
|
|
to standard output.
|
|
<DT id="83">print <I>expr</I>1, <I>expr</I>2, ..., <I>expr</I>n<DD>
|
|
writes
|
|
<I>expr</I>1 <B>OFS</B> <I>expr</I>2 <B>OFS</B> ... <I>expr</I>n
|
|
<B>ORS</B>
|
|
|
|
to standard output.
|
|
Numeric expressions are converted to string with
|
|
<B>OFMT</B>.
|
|
|
|
<DT id="84">printf <I>format, expr-list</I><DD>
|
|
duplicates the printf C library function writing to standard output.
|
|
The complete ANSI C format specifications are recognized with
|
|
conversions %c, %d, %e, %E, %f, %g, %G,
|
|
%i, %o, %s, %u, %x, %X and %%,
|
|
and conversion qualifiers h and l.
|
|
</DL>
|
|
</DL>
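<P>
A small sketch of <B>printf</B> conversions and of <B>OFMT</B> follows. The format string and values are made up for illustration, and <B>awk</B> is assumed to be <B>mawk</B> or another POSIX awk.

```shell
out=$(awk 'BEGIN {
    # width/precision, left justification, zero padding, character, hex
    printf "%5.2f|%-4s|%03d|%c|%x\n", 3.14159, "ab", 7, 65, 255
    OFMT = "%.2f"
    print 3.14159, 17    # OFMT formats 3.14159; the exact integer 17 is unaffected
}')
printf '%s\n' "$out"
```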
|
|
|
|
<P>
|
|
|
|
The argument list to print or printf can optionally be enclosed in
|
|
parentheses.
|
|
Print formats numbers using
|
|
<B>OFMT</B>
|
|
|
|
or "%d" for exact integers.
|
|
"%c" with a numeric argument prints the corresponding 8 bit
|
|
character, with a string argument it prints the first character of
|
|
the string.
|
|
The output of print and printf can be redirected to a file or
|
|
command by appending >
|
|
<I>file</I>,
|
|
|
|
>>
|
|
<I>file</I>
|
|
|
|
or |
<I>command</I>
|
|
|
|
to the end of the print statement.
|
|
Redirection opens
|
|
<I>file</I>
|
|
|
|
or
|
|
<I>command</I>
|
|
|
|
only once, subsequent redirections append to the already open stream.
|
|
By convention,
|
|
<B>mawk</B>
|
|
associates the filename
|
|
<DL COMPACT><DT id="85"><DD>
|
|
|
|
<BR>&bull;
|
|
|
|
|
|
"/dev/stderr" with stderr,
|
|
|
|
<BR>&bull;
|
|
|
|
|
|
"/dev/stdout" with stdout,
|
|
|
|
<BR>&bull;
|
|
|
|
|
|
"-" and "/dev/stdin" with stdin.
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
The association with stderr is especially useful because it allows
|
|
print and printf to be redirected to stderr.
|
|
These names can also be passed to functions.
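<P>
As a sketch of redirecting output to stderr by name, the program below writes one line to each stream. The shell discards stderr so that only the stdout line is captured. The messages are illustrative, and <B>awk</B> is assumed to be <B>mawk</B> or another awk that accepts "/dev/stderr" in this way.

```shell
out=$(awk 'BEGIN {
    print "to stdout"
    print "to stderr" > "/dev/stderr"   # diagnostic goes to stderr
}' 2>/dev/null)
printf '%s\n' "$out"
```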
|
|
<P>
|
|
|
|
The input function
|
|
<B>getline</B>
|
|
|
|
has the following variations.
|
|
<DL COMPACT><DT id="86"><DD>
|
|
<DL COMPACT>
|
|
<DT id="87">getline<DD>
|
|
reads into
|
|
<B>$0</B>,
|
|
|
|
updates the fields,
|
|
<B>NF</B>,
|
|
|
|
<B>NR</B>
|
|
|
|
and
|
|
<B>FNR</B>.
|
|
|
|
<DT id="88">getline < <I>file</I><DD>
|
|
reads into
|
|
<B>$0</B>
|
|
|
|
from <I>file</I>,
|
|
updates the fields and
|
|
<B>NF</B>.
|
|
|
|
<DT id="89">getline <I>var</I><DD>
reads the next record into
<I>var</I>,
|
|
|
|
updates
|
|
<B>NR</B>
|
|
|
|
and
|
|
<B>FNR</B>.
|
|
|
|
<DT id="90">getline <I>var</I> < <I>file</I><DD>
reads the next record of
<I>file</I>
|
|
|
|
into
|
|
<I>var</I>.
|
|
|
|
<DT id="91"><I>command</I> | getline<DD>
|
|
pipes a record from
|
|
<I>command</I>
|
|
|
|
into
|
|
<B>$0</B>
|
|
|
|
and updates the fields and
|
|
<B>NF</B>.
|
|
|
|
<DT id="92"><I>command</I> | getline <I>var</I><DD>
pipes a record from
<I>command</I>
|
|
|
|
into
|
|
<I>var</I>.
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
Getline returns 0 on end-of-file, -1 on error, otherwise 1.
|
|
<P>
|
|
|
|
Commands on the end of pipes are executed by /bin/sh.
|
|
<P>
|
|
|
|
The function <B>close</B>(<I>expr</I>) closes the file or pipe
|
|
associated with
|
|
<I>expr</I>.
|
|
|
|
Close returns 0 if
|
|
<I>expr</I>
|
|
|
|
is an open file,
|
|
the exit status if
|
|
<I>expr</I>
|
|
|
|
is a piped command, and -1 otherwise.
|
|
Close is used to reread a file or command, make sure the other
|
|
end of an output pipe is finished or conserve file resources.
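<P>
The following sketch reads every record from a piped command with <B>getline</B> and then closes the pipe. The command string and variable names are illustrative assumptions, and <B>awk</B> is assumed to be <B>mawk</B> or another POSIX awk.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)   # 1 per record read, 0 at end of input
        n++
    close(cmd)                         # lets cmd be run again later
    print n, line
}')
printf '%s\n' "$out"
```

Comparing the return value with 0 matters: a bare <B>while</B> (cmd | getline line) would loop forever on an error, because -1 is true in a boolean context.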
|
|
<P>
|
|
|
|
The function <B>fflush</B>(<I>expr</I>) flushes the output file or pipe
|
|
associated with
|
|
<I>expr</I>.
|
|
|
|
Fflush returns 0 if
|
|
<I>expr</I>
|
|
|
|
is an open output stream, else -1.
|
|
Fflush without an argument flushes stdout.
|
|
Fflush with an empty argument ("") flushes all open output.
|
|
<P>
|
|
|
|
The function
|
|
<B>system</B>(<I>expr</I>)
|
|
uses the C runtime <B>system</B> call to execute
|
|
<I>expr</I>
|
|
|
|
and returns the corresponding wait status of the command as follows:
|
|
|
|
<BR>•
|
|
|
|
|
|
if the <B>system</B> call failed, setting the status to -1,
|
|
<I>mawk</I> returns that value.
|
|
|
|
<BR>•
|
|
|
|
|
|
if the command exited normally,
|
|
<I>mawk</I> returns its exit-status.
|
|
|
|
<BR>•
|
|
|
|
|
|
if the command exited due to a signal such as <B>SIGHUP</B>,
|
|
<I>mawk</I> returns the signal number plus 256.
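<P>
The three cases can be told apart as in this sketch, which assumes the
return values described above; the command string is a placeholder.
<PRE>
BEGIN {
    status = system("exit 3")           # placeholder command
    if (status == -1)
        print "could not run the command"
    else if (status >= 256)
        print "killed by signal " (status - 256)
    else
        print "exit status " status
}
</PRE>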
|
|
<P>
|
|
|
|
Changes made to the
|
|
<B>ENVIRON</B>
|
|
|
|
array are not passed to commands executed with
|
|
<B>system</B>
|
|
|
|
or pipes.
|
|
<A NAME="lbAP"> </A>
|
|
<H3>10. User defined functions</H3>
|
|
|
|
The syntax for a user defined function is
|
|
</B><PRE>
|
|
|
|
<B>function</B> name( <I>args</I> ) { <I>statements</I> }
|
|
|
|
</PRE>
|
|
|
|
The function body can contain a return statement
|
|
<PRE>
|
|
|
|
<B>return</B><I> opt_expr</I>
|
|
|
|
</PRE>
|
|
|
|
A return statement is not required.
|
|
Function calls may be nested or recursive.
|
|
Functions are passed expressions by value
|
|
and arrays by reference.
|
|
Extra arguments serve as local variables
|
|
and are initialized to
|
|
<I>null</I>.
|
|
|
|
For example, csplit(<I>s,A</I>) puts each character of
|
|
<I>s</I>
|
|
|
|
into array
|
|
<I>A</I>
|
|
|
|
and returns the length of
|
|
<I>s</I>.
|
|
|
|
<PRE>
|
|
|
|
function csplit(s, A, n, i)
|
|
{
|
|
n = length(s)
|
|
for( i = 1 ; i <= n ; i++ ) A[i] = substr(s, i, 1)
|
|
return n
|
|
}
|
|
|
|
</PRE>
|
|
|
|
Putting extra space between passed arguments and local
|
|
variables is conventional.
|
|
Functions can be referenced before they are defined, but the
|
|
function name and the '(' of the arguments must touch to
|
|
avoid confusion with concatenation.
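<P>
The following sketch, using hypothetical names, shows both calling conventions:
the scalar <I>x</I> is copied, the array <I>A</I> is shared with the caller,
and <I>k</I> is a local variable declared as an extra argument.
<PRE>
function bump(x, A,    k)
{
    x++                      # changes only the local copy of x
    for (k in A) A[k]++      # changes the caller's array
}

BEGIN {
    n = 1 ; V["a"] = 1
    bump(n, V)               # function name and '(' touch
    print n, V["a"]          # prints: 1 2
}
</PRE>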
|
|
<P>
|
|
A function parameter is normally a scalar value (number or string).
|
|
If there is a forward reference to a function using an array as a parameter,
|
|
the function's corresponding parameter will be treated as an array.
|
|
|
|
<A NAME="lbAQ"> </A>
|
|
<H3>11. Splitting strings, records and files</H3>
|
|
|
|
Awk programs use the same algorithm to
|
|
split strings into arrays with split(), and records into fields
|
|
on
|
|
<B>FS</B>.
|
|
|
|
<B>mawk</B>
|
|
uses essentially the same algorithm to split files into
|
|
records on
|
|
<B>RS</B>.
|
|
|
|
<P>
|
|
|
|
Split(<I>expr,A,sep</I>) works as follows:
|
|
<DL COMPACT><DT id="93"><DD>
|
|
<DL COMPACT>
|
|
<DT id="94">(1)<DD>
|
|
If
|
|
<I>sep</I>
|
|
|
|
is omitted, it is replaced by
|
|
<B>FS</B>.
|
|
|
|
<I>Sep</I>
|
|
|
|
can be an expression or regular expression.
|
|
If it is an expression of non-string type, it is converted to string.
|
|
<DT id="95">(2)<DD>
|
|
If
|
|
<I>sep</I>
|
|
|
|
= " " (a single space),
|
|
then <SPACE> is trimmed from the front and back of
|
|
<I>expr</I>,
|
|
|
|
and
|
|
<I>sep</I>
|
|
|
|
becomes <SPACE>.
|
|
<B>mawk</B>
|
|
defines <SPACE> as the regular expression
|
|
/[ \t\n]+/.
|
|
Otherwise
|
|
<I>sep</I>
|
|
|
|
is treated as a regular expression, except that meta-characters
|
|
are ignored for a string of length 1,
|
|
e.g.,
|
|
split(x, A, "*") and split(x, A, /\*/) are the same.
|
|
<DT id="96">(3)<DD>
|
|
If <I>expr</I> is not a string, it is converted to a string.
|
|
If <I>expr</I> is then the empty string "", split() returns 0
|
|
and
|
|
<I>A</I>
|
|
|
|
is set empty.
|
|
Otherwise,
|
|
all non-overlapping, non-null and longest matches of
|
|
<I>sep</I>
|
|
|
|
in
|
|
<I>expr</I>,
|
|
|
|
separate
|
|
<I>expr</I>
|
|
|
|
into fields which are loaded into
|
|
<I>A</I>.
|
|
|
|
The fields are placed in
|
|
A[1], A[2], ..., A[n] and split() returns n, the number
|
|
of fields which is the number
|
|
of matches plus one.
|
|
Data placed in
|
|
<I>A</I>
|
|
|
|
that looks numeric is typed number and string.
|
|
</DL>
|
|
</DL>
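<P>
A short sketch of rules (1) to (3); the strings are made up for illustration,
and the counts in the comments are what those rules predict.
<PRE>
BEGIN {
    n = split("a::b:", A, ":+")      # regular expression separator
    print n                          # 3: "a", "b" and ""
    n = split("  x  y ", B, " ")     # single space: leading and trailing space trimmed
    print n                          # 2: "x" and "y"
    n = split("a*b*c", C, "*")       # length 1 string: "*" is taken literally
    print n                          # 3: "a", "b" and "c"
}
</PRE>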
|
|
|
|
<P>
|
|
|
|
Splitting records into fields works the same except the
|
|
pieces are loaded into
|
|
<B>$1</B>,
|
|
|
|
<B>$2</B>,...,
|
|
<B>$NF</B>.
|
|
|
|
If
|
|
<B>$0</B>
|
|
|
|
is empty,
|
|
<B>NF</B>
|
|
|
|
is set to 0 and all
|
|
<B>$i</B>
|
|
|
|
to "".
|
|
<P>
|
|
|
|
<B>mawk</B>
|
|
splits files into records by the same algorithm, but with the
|
|
slight difference that
|
|
<B>RS</B>
|
|
|
|
is really a terminator instead of a separator.
|
|
(<B>ORS</B> is really a terminator too).
|
|
<DL COMPACT><DT id="97"><DD>
|
|
<P>
|
|
|
|
E.g., if
|
|
<B>FS</B>
|
|
|
|
= ``:+'' and
|
|
<B>$0</B>
|
|
|
|
= ``a::b:'' , then
|
|
<B>NF</B>
|
|
|
|
= 3 and
|
|
<B>$1</B>
|
|
|
|
= ``a'',
|
|
<B>$2</B>
|
|
|
|
= ``b'' and
|
|
<B>$3</B>
|
|
|
|
= "", but
|
|
if ``a::b:'' is the contents of an input file and
|
|
<B>RS</B>
|
|
|
|
= ``:+'', then
|
|
there are two records ``a'' and ``b''.
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
<B>RS</B>
|
|
|
|
= " " is not special.
|
|
<P>
|
|
|
|
If
|
|
<B>FS</B>
|
|
|
|
= "", then
|
|
<B>mawk</B>
|
|
breaks the record into individual characters, and, similarly,
|
|
split(<I>s,A,</I>"") places the individual characters of
|
|
<I>s</I>
|
|
|
|
into
|
|
<I>A</I>.
|
|
|
|
|
|
<A NAME="lbAR"> </A>
|
|
<H3>12. Multi-line records</H3>
|
|
|
|
Since
|
|
<B>mawk</B>
|
|
interprets
|
|
<B>RS</B>
|
|
|
|
as a regular expression, multi-line
|
|
records are easy.
|
|
Setting
|
|
<B>RS</B>
|
|
|
|
= "\n\n+", makes one or more blank
|
|
lines separate records.
|
|
If
|
|
<B>FS</B>
|
|
|
|
= " " (the default), then single
|
|
newlines, by the rules for <SPACE> above, become space and
|
|
single newlines are field separators.
|
|
<DL COMPACT><DT id="98"><DD>
|
|
<P>
|
|
|
|
For example, if
|
|
|
|
<BR>•
|
|
|
|
|
|
a file is "a b\nc\n\n",
|
|
|
|
<BR>•
|
|
|
|
|
|
<B>RS</B> = "\n\n+" and
|
|
|
|
<BR>•
|
|
|
|
|
|
<B>FS</B> = " ",
|
|
<P>
|
|
|
|
then there is one record ``a b\nc'' with three
|
|
fields ``a'', ``b'' and ``c'':
|
|
|
|
<BR>•
|
|
|
|
|
|
Changing
|
|
<B>FS</B>
|
|
|
|
= ``\n'', gives two
|
|
fields ``a b'' and ``c'';
|
|
|
|
<BR>•
|
|
|
|
|
|
changing
|
|
<B>FS</B>
|
|
|
|
= ``'', gives one field
|
|
identical to the record.
|
|
</DL>
|
|
|
|
<P>
|
|
|
|
If you want lines with spaces or tabs to be considered blank,
|
|
set
|
|
<B>RS</B>
|
|
|
|
= ``\n([ \t]*\n)+''.
|
|
For compatibility with other awks, setting
|
|
<B>RS</B>
|
|
|
|
= "" has the same
|
|
effect as if blank lines are stripped from the
|
|
front and back of files and then records are determined as if
|
|
<B>RS</B>
|
|
|
|
= ``\n\n+''.
|
|
POSIX requires that ``\n'' always separates records when
|
|
<B>RS</B>
|
|
|
|
= "" regardless of the value of
|
|
<B>FS</B>.
|
|
|
|
<B>mawk</B>
|
|
does not support this convention, because defining
|
|
``\n'' as <SPACE> makes it unnecessary.
|
|
|
|
<P>
|
|
|
|
Most of the time when you change
|
|
<B>RS</B>
|
|
|
|
for multi-line records, you
|
|
will also want to change
|
|
<B>ORS</B>
|
|
|
|
to ``\n\n'' so the record spacing is preserved on output.
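<P>
As an illustration, the sketch below copies only those blank-line separated
records whose first line starts with "Name:".
The record layout is hypothetical; only the settings of
<B>RS</B>, <B>FS</B> and <B>ORS</B> come from the text above.
<PRE>
BEGIN { RS = "\n\n+" ; FS = "\n" ; ORS = "\n\n" }

$1 ~ /^Name:/ { print }      # $1 is the first line of the record
</PRE>
A final record may keep its trailing newline when the file does not end
with a blank line, since a single newline does not match this
<B>RS</B>.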
|
|
|
|
<A NAME="lbAS"> </A>
|
|
<H3>13. Program execution</H3>
|
|
|
|
This section describes the order of program execution.
|
|
First
|
|
<B>ARGC</B>
|
|
|
|
is set to the total number of command line arguments passed to
|
|
the execution phase of the program.
|
|
<B>ARGV[0]</B>
|
|
|
|
is set to the name of the AWK interpreter and
|
|
<B>ARGV[1]</B> ...
|
|
<B>ARGV[ARGC-1]</B>
|
|
|
|
hold the remaining command line arguments exclusive of
|
|
options and program source.
|
|
For example with
|
|
<PRE>
|
|
|
|
mawk -f prog v=1 A t=hello B
|
|
|
|
</PRE>
|
|
|
|
<B>ARGC</B>
|
|
|
|
= 5 with
|
|
<B>ARGV[0]</B>
|
|
|
|
= "mawk",
|
|
<B>ARGV[1]</B>
|
|
|
|
= "v=1",
|
|
<B>ARGV[2]</B>
|
|
|
|
= "A",
|
|
<B>ARGV[3]</B>
|
|
|
|
= "t=hello" and
|
|
<B>ARGV[4]</B>
|
|
|
|
= "B".
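<P>
If the file prog in that command holds a sketch like the following,
the five values are printed one per line.
Because the program consists entirely of a
<B>BEGIN</B>
block, the files A and B are never opened.
<PRE>
BEGIN {
    for (i = 0 ; i < ARGC ; i++)
        printf "ARGV[%d] = \"%s\"\n", i, ARGV[i]
}
</PRE>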
|
|
<P>
|
|
|
|
Next, each
|
|
<B>BEGIN</B>
|
|
|
|
block is executed in order.
|
|
If the program consists
|
|
entirely of
|
|
<B>BEGIN</B>
|
|
|
|
blocks, then execution terminates, else
|
|
an input stream is opened and execution continues.
|
|
If
|
|
<B>ARGC</B>
|
|
|
|
equals 1,
|
|
the input stream is set to stdin,
|
|
else the command line arguments
|
|
<B>ARGV[1]</B> ...
|
|
|
|
<B>ARGV[ARGC-1]</B>
|
|
|
|
are examined for a file argument.
|
|
<P>
|
|
|
|
The command line arguments divide into three sets:
|
|
file arguments, assignment arguments and empty strings "".
|
|
An assignment has the form
|
|
<I>var</I>=<I>string</I>.
|
|
When an
|
|
<B>ARGV[i]</B>
|
|
|
|
is examined as a possible file argument,
|
|
if it is empty it is skipped;
|
|
if it is an assignment argument, the assignment to
|
|
<I>var</I>
|
|
|
|
takes place and
|
|
<B>i</B>
|
|
|
|
skips to the next argument;
|
|
else
|
|
<B>ARGV[i]</B>
|
|
|
|
is opened for input.
|
|
If it fails to open, execution terminates with exit code 2.
|
|
If no command line argument is a file argument, then input
|
|
comes from stdin.
|
|
Getline in a
|
|
<B>BEGIN</B>
|
|
|
|
action opens input.
|
|
``-'' as a file argument denotes stdin.
|
|
<P>
|
|
|
|
Once an input stream is open, each input record is tested
|
|
against each
|
|
<I>pattern</I>,
|
|
|
|
and if it matches, the associated
|
|
<I>action</I>
|
|
|
|
is executed.
|
|
An expression pattern matches if it is boolean true (see
|
|
the end of section 2).
|
|
A
|
|
<B>BEGIN</B>
|
|
|
|
pattern matches before any input has been read, and
|
|
an
|
|
<B>END</B>
|
|
|
|
pattern matches after all input has been read.
|
|
A range pattern,
|
|
<I>expr</I>1,<I>expr</I>2,
matches every record between the match of
<I>expr</I>1
and the match of
<I>expr</I>2
|
|
|
|
inclusively.
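<P>
For instance, a range pattern can print each block of lines delimited by
marker lines, including the markers themselves.
The markers START and STOP are hypothetical.
<PRE>
/^START$/, /^STOP$/ { print FILENAME ":" FNR ": " $0 }
</PRE>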
|
|
<P>
|
|
|
|
When end of file occurs on the input stream, the remaining
|
|
command line arguments are examined for a file argument, and
|
|
if there is one it is opened, else the
|
|
<B>END</B>
|
|
|
|
<I>pattern</I>
|
|
|
|
is considered matched
|
|
and all
|
|
<B>END</B>
|
|
|
|
<I>actions</I>
|
|
|
|
are executed.
|
|
<P>
|
|
|
|
In the example, the assignment
|
|
v=1
|
|
takes place after the
|
|
<B>BEGIN</B>
|
|
|
|
<I>actions</I>
|
|
|
|
are executed, and
|
|
the data placed in
|
|
v
|
|
is typed number and string.
|
|
Input is then read from file A.
|
|
On end of file A,
|
|
t
|
|
is set to the string "hello",
|
|
and B is opened for input.
|
|
On end of file B, the
|
|
<B>END</B>
|
|
|
|
<I>actions</I>
|
|
|
|
are executed.
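<P>
The timing can be observed with a sketch such as the following in the file prog,
assuming that A and B are non-empty files.
<PRE>
BEGIN    { printf "BEGIN: v=[%s] t=[%s]\n", v, t }
FNR == 1 { printf "%s: v=[%s] t=[%s]\n", FILENAME, v, t }
END      { printf "END: v=[%s] t=[%s]\n", v, t }
</PRE>
Following the order described above, v and t should both be empty in the
<B>BEGIN</B>
action, v should be 1 while A is read, and t should be "hello" only
from the first record of B onward.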
|
|
<P>
|
|
|
|
Program flow at the
|
|
<I>pattern</I>
|
|
|
|
<I>{action}</I>
|
|
|
|
level can be changed with the
|
|
<PRE>
|
|
|
|
<B>next
|
|
nextfile
|
|
exit </B><I>opt_expr</I>
|
|
|
|
</PRE>
|
|
|
|
statements:
|
|
|
|
<BR>•
|
|
|
|
|
|
A
|
|
<B>next</B>
|
|
|
|
statement
|
|
causes the next input record to be read and pattern testing
|
|
to restart with the first
|
|
<I>pattern {action}</I>
|
|
|
|
pair in the program.
|
|
|
|
<BR>•
|
|
|
|
|
|
A
|
|
<B>nextfile</B>
|
|
|
|
statement tells <B>mawk</B> to stop processing the current input file.
|
|
It then updates FILENAME to the next file listed on the command line,
|
|
and resets FNR to 1.
|
|
|
|
<BR>•
|
|
|
|
|
|
An
|
|
<B>exit</B>
|
|
|
|
statement
|
|
causes immediate execution of the
|
|
<B>END</B>
|
|
|
|
actions or program termination if there are none or
|
|
if the
|
|
<B>exit</B>
|
|
|
|
occurs in an
|
|
<B>END</B>
|
|
|
|
action.
|
|
The
|
|
<I>opt_expr</I>
|
|
|
|
sets the exit value of the program unless overridden by
|
|
a later
|
|
<B>exit</B>
|
|
|
|
or subsequent error.
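<P>
A small sketch combining these statements; the three-field input format and
the variable names are assumptions made for the example.
<PRE>
/^#/     { next }                  # skip comment lines
NF != 3  { bad = 1 ; exit }        # malformed record: go to the END actions
         { total += $3 }

END      { if (bad) exit 1         # exit in an END action ends the program
           print total + 0
         }
</PRE>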
|
|
|
|
<A NAME="lbAT"> </A>
|
|
<H2>EXAMPLES</H2>
|
|
|
|
<PRE>
|
|
1. emulate cat.
|
|
|
|
{ print }
|
|
|
|
2. emulate wc.
|
|
|
|
{ chars += length($0) + 1 # add one for the \n
|
|
words += NF
|
|
}
|
|
|
|
END{ print NR, words, chars }
|
|
|
|
3. count the number of unique ``real words''.
|
|
|
|
BEGIN { FS = "[^A-Za-z]+" }
|
|
|
|
{ for(i = 1 ; i <= NF ; i++) word[$i] = "" }
|
|
|
|
END { delete word[""]
|
|
for ( i in word ) cnt++
|
|
print cnt
|
|
}
|
|
|
|
</PRE>
|
|
|
|
4. sum the second field of
|
|
every record based on the first field.
|
|
<PRE>
|
|
|
|
$1 ~ /credit|gain/ { sum += $2 }
|
|
$1 ~ /debit|loss/ { sum -= $2 }
|
|
|
|
END { print sum }
|
|
|
|
5. sort a file, comparing as string
|
|
|
|
{ line[NR] = $0 "" } # make sure of comparison type
|
|
# in case some lines look numeric
|
|
|
|
END { isort(line, NR)
|
|
for(i = 1 ; i <= NR ; i++) print line[i]
|
|
}
|
|
|
|
#insertion sort of A[1..n]
|
|
function isort( A, n, i, j, hold)
|
|
{
|
|
for( i = 2 ; i <= n ; i++)
|
|
{
|
|
hold = A[j = i]
|
|
while ( A[j-1] > hold )
|
|
{ j-- ; A[j+1] = A[j] }
|
|
A[j] = hold
|
|
}
|
|
# sentinel A[0] = "" will be created if needed
|
|
}
|
|
|
|
</PRE>
|
|
|
|
|
|
<A NAME="lbAU"> </A>
|
|
<H2>COMPATIBILITY ISSUES</H2>
|
|
|
|
<A NAME="lbAV"> </A>
|
|
<H3>MAWK 1.3.3 versus POSIX 1003.2 Draft 11.3</H3>
|
|
|
|
The POSIX 1003.2(draft 11.3) definition of the AWK language
|
|
is AWK as described in the AWK book with a few extensions
|
|
that appeared in SystemVR4 nawk.
|
|
The extensions are:
|
|
<DL COMPACT><DT id="99"><DD>
|
|
|
|
<BR>•
|
|
|
|
|
|
New functions: toupper() and tolower().
|
|
|
|
<BR>•
|
|
|
|
|
|
New variables: ENVIRON[] and CONVFMT.
|
|
|
|
<BR>•
|
|
|
|
|
|
ANSI C conversion specifications for printf() and sprintf().
|
|
|
|
<BR>•
|
|
|
|
|
|
New command options: -v var=value, multiple -f options and
|
|
implementation options as arguments to -W.
|
|
|
|
<BR>•
|
|
|
|
|
|
For systems (MS-DOS or Windows) which provide a <I>setmode</I> function,
|
|
an environment variable MAWKBINMODE and a built-in variable BINMODE.
|
|
The bits of the BINMODE value tell <I>mawk</I> how to modify the
|
|
<B>RS</B> and <B>ORS</B> variables:
|
|
<DL COMPACT><DT id="100"><DD>
|
|
<DL COMPACT>
|
|
<DT id="101">0<DD>
|
|
set standard input to binary mode,
|
|
and if BIT-2 is unset, set <B>RS</B> to "\r\n" (CR/LF) rather than "\n" (LF).
|
|
<DT id="102">1<DD>
|
|
set standard output to binary mode,
|
|
and if BIT-2 is unset, set <B>ORS</B> to "\r\n" (CR/LF) rather than "\n" (LF).
|
|
<DT id="103">2<DD>
|
|
suppress the assignment to <B>RS</B> and <B>ORS</B> of CR/LF,
|
|
making it possible to run scripts and generate output compatible
|
|
with Unix line-endings.
|
|
</DL>
|
|
</DL>
|
|
|
|
</DL>
|
|
|
|
<P>
|
|
POSIX AWK is oriented to operate on files a line at
|
|
a time.
|
|
<B>RS</B>
|
|
|
|
can be changed from "\n" to another single character,
|
|
but it is hard to find any use for this; there are no
examples in the AWK book.
|
|
By convention, <B>RS</B> = "", makes one or more blank lines
|
|
separate records, allowing multi-line records.
|
|
When <B>RS</B> = "", "\n" is always a field separator
|
|
regardless of the value in
|
|
<B>FS</B>.
|
|
|
|
<P>
|
|
|
|
<B>mawk</B>,
|
|
|
|
on the other hand,
|
|
allows
|
|
<B>RS</B>
|
|
|
|
to be a regular expression.
|
|
When "\n" appears in records, it is treated as space, and
|
|
<B>FS</B>
|
|
|
|
always determines fields.
|
|
<P>
|
|
|
|
Removing the line-at-a-time paradigm can make some programs
|
|
simpler and can
|
|
often improve performance.
|
|
For example, redoing example 3 from above,
|
|
<PRE>
|
|
|
|
BEGIN { RS = "[^A-Za-z]+" }
|
|
|
|
{ word[ $0 ] = "" }
|
|
|
|
END { delete word[ "" ]
|
|
for( i in word ) cnt++
|
|
print cnt
|
|
}
|
|
|
|
</PRE>
|
|
|
|
counts the number of unique words by making each word a record.
|
|
On moderate size files,
|
|
<B>mawk</B>
|
|
executes twice as fast, because of the simplified inner loop.
|
|
<P>
|
|
|
|
The following program replaces each comment by a single space in
|
|
a C program file,
|
|
<PRE>
|
|
|
|
BEGIN {
|
|
RS = "/\*([^*]|\*+[^/*])*\*+/"
|
|
# comment is record separator
|
|
ORS = " "
|
|
getline hold
|
|
}
|
|
|
|
{ print hold ; hold = $0 }
|
|
|
|
END { printf "%s" , hold }
|
|
|
|
</PRE>
|
|
|
|
Buffering one record is needed to avoid terminating the last
|
|
record with a space.
|
|
<P>
|
|
|
|
With
|
|
<B>mawk</B>,
|
|
|
|
the following are all equivalent,
|
|
<PRE>
|
|
|
|
x ~ /a\+b/ x ~ "a\+b" x ~ "a\\+b"
|
|
|
|
</PRE>
|
|
|
|
The strings get scanned twice, once as string and once as
|
|
regular expression.
|
|
On the string scan,
|
|
<B>mawk</B> ignores the escape on non-escape characters while the AWK
|
|
book advocates
|
|
<I>\c</I>
|
|
|
|
be recognized as
|
|
<I>c</I>
|
|
|
|
which necessitates the double escaping of meta-characters in
|
|
strings.
|
|
POSIX explicitly declines to define the behavior, which passively
forces programs that must run under a variety of awks to use
the more portable but less readable double escape.
|
|
<P>
|
|
|
|
POSIX AWK does not recognize "/dev/std{in,out,err}".
|
|
Some systems provide an actual device for this,
|
|
allowing AWKs which do not implement the feature directly to support it.
|
|
<P>
|
|
|
|
POSIX AWK does not recognize \x hex escape
|
|
sequences in strings.
|
|
Unlike ANSI C,
|
|
<B>mawk</B> limits the number of digits that follow \x to two as the current
|
|
implementation only supports 8 bit characters.
|
|
The built-in
|
|
<B>fflush</B>
|
|
|
|
first appeared in a recent (1993) AT&T awk released to netlib, and is
|
|
not part of the POSIX standard.
|
|
Aggregate deletion with
|
|
<B>delete</B>
|
|
|
|
<I>array</I>
|
|
|
|
is not part of the POSIX standard.
|
|
<P>
|
|
|
|
POSIX explicitly leaves the behavior of
|
|
<B>FS</B>
|
|
|
|
= "" undefined, and mentions splitting the record into characters as
|
|
a possible interpretation, but currently this use is not portable
|
|
across implementations.
|
|
<A NAME="lbAW"> </A>
|
|
<H3>Random numbers</H3>
|
|
|
|
<P>
|
|
|
|
POSIX does not prescribe a method for initializing random numbers at startup.
|
|
<P>
|
|
|
|
In practice, most implementations do nothing special,
|
|
which makes <B>srand</B> and <B>rand</B> follow the C runtime library,
|
|
making the initial seed value 1.
|
|
Some implementations (Solaris XPG4 and Tru64)
|
|
return 0 from the first call to <B>srand</B>,
|
|
although the results from <B>rand</B> behave as if the initial seed is 1.
|
|
Other implementations return 1.
|
|
<P>
|
|
|
|
While
|
|
<B>mawk</B>
|
|
can call <B>srand</B> at startup with no parameter
|
|
(initializing random numbers from the clock),
|
|
this feature may be suppressed using conditional compilation.
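<P>
A program that needs a repeatable sequence can therefore seed the generator
itself rather than rely on the startup behavior.
The seed value 42 in this sketch is arbitrary.
<PRE>
BEGIN {
    srand(42)                # fixed seed
    x = rand()
    srand(42)                # same seed again
    y = rand()
    print (x == y) ? "repeatable" : "not repeatable"
}
</PRE>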
|
|
<A NAME="lbAX"> </A>
|
|
<H3>Extensions added for compatibility for GAWK and BWK</H3>
|
|
|
|
<P>
|
|
|
|
<B>Nextfile</B>
|
|
|
|
is a <B>gawk</B> extension (also implemented by BWK awk),
|
|
and is not yet part of the POSIX standard (as of October 2012),
|
|
although it has been accepted for the next revision of the standard.
|
|
<P>
|
|
|
|
<B>Mktime</B>,
|
|
|
|
<B>strftime</B> and
|
|
|
|
<B>systime</B>
|
|
|
|
are <B>gawk</B> extensions.
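<P>
A brief sketch of the three functions, assuming they take the same arguments
as their <B>gawk</B> counterparts; the date passed to mktime is arbitrary.
<PRE>
BEGIN {
    now = systime()                            # seconds since the epoch
    print strftime("%Y-%m-%d %H:%M:%S", now)
    t = mktime("2020 01 31 12 00 00")          # "YYYY MM DD HH MM SS"
    print strftime("%A", t)                    # name of that weekday
}
</PRE>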
|
|
<P>
|
|
|
|
The "/dev/stdin" feature was added to <B>mawk</B> after 1.3.4,
|
|
for compatibility with <B>gawk</B> and BWK awk.
|
|
The corresponding "-" (alias for /dev/stdin) was present in mawk 1.3.3.
|
|
<A NAME="lbAY"> </A>
|
|
<H3>Subtle Differences not in POSIX or the AWK Book</H3>
|
|
|
|
<P>
|
|
|
|
Finally, here is how
|
|
<B>mawk</B>
|
|
handles exceptional cases not discussed in the
|
|
AWK book or the POSIX draft.
|
|
It is unsafe to assume
|
|
consistency across awks and safe to skip to
|
|
the next section.
|
|
<P>
|
|
|
|
<DL COMPACT><DT id="104"><DD>
|
|
|
|
<BR>•
|
|
|
|
|
|
substr(s, i, n) returns the characters of s in the intersection
|
|
of the closed interval [1, length(s)] and the half-open interval [i, i+n).
|
|
When this intersection is empty, the empty string is
|
|
returned; so substr("ABC", 1, 0) = "" and
|
|
substr("ABC", -4, 6) = "A".
|
|
|
|
<BR>•
|
|
|
|
|
|
Every string, including the empty string, matches the empty string
|
|
at the
|
|
front so, s ~ // and s ~ "", are always 1 as is match(s, //) and
|
|
match(s, "").
|
|
The last two set
|
|
<B>RLENGTH</B>
|
|
|
|
to 0.
|
|
|
|
<BR>•
|
|
|
|
|
|
index(s, t) is always the same as match(s, t1) where t1 is the
|
|
same as t with metacharacters escaped.
|
|
Hence consistency
|
|
with match requires that
|
|
index(s, "") always returns 1.
|
|
Also the condition, index(s,t) != 0 if and only if t is a substring
|
|
of s, requires index("","") = 1.
|
|
|
|
<BR>•
|
|
|
|
|
|
If getline encounters end of file, getline var, leaves var
|
|
unchanged.
|
|
Similarly, on entry to the
|
|
<B>END</B>
|
|
|
|
actions,
|
|
<B>$0</B>,
|
|
|
|
the fields and
|
|
<B>NF</B>
|
|
|
|
have their value unaltered from the last record.
|
|
|
|
</DL>
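<P>
The cases above can be checked with a sketch like this one.
The values in the comments are those stated for
<B>mawk</B>
in the list above; other awks may print something else.
<PRE>
BEGIN {
    print "[" substr("ABC", 1, 0) "]"      # []
    print "[" substr("ABC", -4, 6) "]"     # [A]
    print index("ABC", "")                 # 1
    print match("ABC", ""), RLENGTH        # 1 0
}
</PRE>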
|
|
<A NAME="lbAZ"> </A>
|
|
<H2>ENVIRONMENT VARIABLES</H2>
|
|
|
|
<B>Mawk</B> recognizes these variables:
|
|
<DL COMPACT><DT id="105"><DD>
|
|
<DL COMPACT>
|
|
<DT id="106">MAWKBINMODE<DD>
|
|
(see <B>COMPATIBILITY ISSUES</B>)
|
|
<DT id="107">MAWK_LONG_OPTIONS<DD>
|
|
If this is set, <B>mawk</B> uses its value to decide what to do with
|
|
GNU-style long options:
|
|
<DL COMPACT><DT id="108"><DD>
|
|
<DL COMPACT>
|
|
<DT id="109">allow<DD>
|
|
<B>Mawk</B> allows the option to be checked against the (small) set of
|
|
long options it recognizes.
|
|
<DT id="110">error<DD>
|
|
<B>Mawk</B> prints an error message and exits.
|
|
This is the default.
|
|
<DT id="111">ignore<DD>
|
|
<B>Mawk</B> ignores the option.
|
|
<DT id="112">warn<DD>
|
|
<B>Mawk</B> prints a warning message and otherwise ignores the option.
|
|
</DL>
|
|
</DL>
|
|
|
|
<DT id="113"><DD>
|
|
If the variable is unset, <B>mawk</B> prints an error message and exits.
|
|
<DT id="114">WHINY_USERS<DD>
|
|
This is an undocumented <B>gawk</B> feature.
|
|
It tells <B>mawk</B> to sort array indices before it starts to iterate
|
|
over the elements of an array.
|
|
</DL>
|
|
</DL>
|
|
|
|
|
|
<A NAME="lbBA"> </A>
|
|
<H2>SEE ALSO</H2>
|
|
|
|
<P>
|
|
|
|
<B><A HREF="/cgi-bin/man/man2html?1+grep">grep</A></B>(1)
|
|
<P>
|
|
|
|
Aho, Kernighan and Weinberger,
|
|
<I>The AWK Programming Language</I>,
|
|
|
|
Addison-Wesley Publishing, 1988, (the AWK book),
|
|
defines the language, opening with a tutorial
|
|
and advancing to many interesting programs that delve into
|
|
issues of software design and analysis relevant to programming
|
|
in any language.
|
|
<P>
|
|
|
|
<I>The GAWK Manual</I>,
|
|
|
|
The Free Software Foundation, 1991, is a tutorial
|
|
and language reference
|
|
that does not attempt the depth of the AWK book
|
|
and assumes the reader may be a novice programmer.
|
|
The section on AWK arrays is excellent.
|
|
It also discusses POSIX requirements for AWK.
|
|
|
|
<A NAME="lbBB"> </A>
|
|
<H2>BUGS</H2>
|
|
|
|
<P>
|
|
|
|
<B>mawk</B>
|
|
implements printf() and sprintf() using the C library functions,
|
|
printf and sprintf, so full ANSI compatibility requires an ANSI
|
|
C library.
|
|
In practice this means the h conversion qualifier may not be available.
|
|
Also <B>mawk</B> inherits any bugs or limitations of the library functions.
|
|
<P>
|
|
|
|
Implementors of the AWK language have shown a consistent lack
|
|
of imagination when naming their programs.
|
|
|
|
<A NAME="lbBC"> </A>
|
|
<H2>AUTHOR</H2>
|
|
|
|
Mike Brennan (<A HREF="mailto:brennan@whidbey.com">brennan@whidbey.com</A>).
|
|
<BR>
|
|
|
|
Thomas E. Dickey <<A HREF="mailto:dickey@invisible-island.net">dickey@invisible-island.net</A>>.
|
|
<P>
|
|
|
|
<HR>
|
|
<A NAME="index"> </A><H2>Index</H2>
|
|
<DL>
|
|
<DT id="115"><A HREF="#lbAB">NAME</A><DD>
|
|
<DT id="116"><A HREF="#lbAC">SYNOPSIS</A><DD>
|
|
<DT id="117"><A HREF="#lbAD">DESCRIPTION</A><DD>
|
|
<DT id="118"><A HREF="#lbAE">OPTIONS</A><DD>
|
|
<DT id="119"><A HREF="#lbAF">THE AWK LANGUAGE</A><DD>
|
|
<DL>
|
|
<DT id="120"><A HREF="#lbAG"><B>1. Program structure</B></A><DD>
<DT id="121"><A HREF="#lbAH"><B>2. Data types, conversion and comparison</B></A><DD>
<DT id="122"><A HREF="#lbAI"><B>3. Regular expressions</B></A><DD>
<DT id="123"><A HREF="#lbAJ"><B>4. Records and fields</B></A><DD>
<DT id="124"><A HREF="#lbAK"><B>5. Expressions and operators</B></A><DD>
<DT id="125"><A HREF="#lbAL"><B>6. Arrays</B></A><DD>
<DT id="126"><A HREF="#lbAM"><B>7. Builtin-variables</B></A><DD>
<DT id="127"><A HREF="#lbAN"><B>8. Built-in functions</B></A><DD>
<DT id="128"><A HREF="#lbAO"><B>9. Input and output</B></A><DD>
<DT id="129"><A HREF="#lbAP"><B>10. User defined functions</B></A><DD>
<DT id="130"><A HREF="#lbAQ"><B>11. Splitting strings, records and files</B></A><DD>
<DT id="131"><A HREF="#lbAR"><B>12. Multi-line records</B></A><DD>
<DT id="132"><A HREF="#lbAS"><B>13. Program execution</B></A><DD>
|
|
</DL>
|
|
<DT id="133"><A HREF="#lbAT">EXAMPLES</A><DD>
|
|
<DT id="134"><A HREF="#lbAU">COMPATIBILITY ISSUES</A><DD>
|
|
<DL>
|
|
<DT id="135"><A HREF="#lbAV">MAWK 1.3.3 versus POSIX 1003.2 Draft 11.3</A><DD>
|
|
<DT id="136"><A HREF="#lbAW">Random numbers</A><DD>
|
|
<DT id="137"><A HREF="#lbAX">Extensions added for compatibility for GAWK and BWK</A><DD>
|
|
<DT id="138"><A HREF="#lbAY">Subtle Differences not in POSIX or the AWK Book</A><DD>
|
|
</DL>
|
|
<DT id="139"><A HREF="#lbAZ">ENVIRONMENT VARIABLES</A><DD>
|
|
<DT id="140"><A HREF="#lbBA">SEE ALSO</A><DD>
|
|
<DT id="141"><A HREF="#lbBB">BUGS</A><DD>
|
|
<DT id="142"><A HREF="#lbBC">AUTHOR</A><DD>
|
|
</DL>
|
|
<HR>
|
|
This document was created by
|
|
<A HREF="/cgi-bin/man/man2html">man2html</A>,
|
|
using the manual pages.<BR>
|
|
Time: 00:05:19 GMT, March 31, 2021
|
|
</BODY>
|
|
</HTML>
|