#lang scribble/doc
@(require scribble/manual
          (for-label scheme/base
                     scheme/contract
                     scheme/port
                     preprocessor/mztext))

@title[#:tag "mztext"]{@exec{mztext}}

@exec{mztext} is another Scheme-based preprocessing language. It can
be used as a preprocessor in a similar way to @exec{mzpp} since it
also uses @schememodname[preprocessor/pp-run] functionality. However,
@exec{mztext} uses a completely different processing principle: it is
similar to TeX rather than the simple interleaving of text and Scheme
code done by @exec{mzpp}.

Text is read from the input file(s) and, by default, copied to the
standard output. However, there are some magic sequences that trigger
handlers that can take over this process---these handlers gain
complete control over what is being read and what is printed, and at
some point they hand control back to the main loop. From a high-level
point of view, this is similar to ``programming'' in TeX, where macros
accept as input the current input stream. The basic mechanism that
makes this programming possible is a @deftech{composite input port},
which is a prependable input port---so handlers are not limited to
processing input and printing output; they can also push their output
back onto the current input, where it will be reprocessed.

The bottom line of all this is that @exec{mztext} can perform more
powerful preprocessing than @exec{mzpp}, since you can define your own
language as the file is processed.

@section{Invoking mztext}

Use the @Flag{h} flag to get the available flags. See above for an
explanation of the @DFlag{run} flag.

@section{mztext processing: the standard command dispatcher}

@exec{mztext} can use arbitrary magic sequences, but for convenience,
there is a default built-in dispatcher that connects Scheme code with
the preprocessed text---by default, it is triggered by @litchar["@"].
When file processing encounters this marker, control is transferred to
the command dispatcher. In its turn, the command dispatcher reads a
Scheme expression (using @scheme[read]), evaluates it, and decides
what to do next. In the case of a simple Scheme value, it is converted to
a string and pushed back on the preprocessed input. For example, the
following text:

@verbatim[#:indent 2]|{
foo
@"bar"
@(+ 1 2)
@"@(* 3 4)"
@(/ (read) 3)12
}|

generates this output:

@verbatim[#:indent 2]|{
foo
bar
3
12
4
}|

An explanation of a few lines:

@itemize[

@item{@litchar|{@"bar"}|, @litchar|{@(+ 1 2)}|---the Scheme object
  that is read is evaluated, and its value is displayed back on the
  input port, from which it is then printed.}

@item{@litchar|{@"@(* 3 4)"}| --- demonstrates that the results
  are ``printed'' back on the input: the string in this case
  contains another use of @litchar["@"], which will then get read back
  in, evaluated, and displayed.}

@item{@litchar|{@(/ (read) 3)12}| --- demonstrates that the Scheme
  code can do anything with the current input: here it reads the
  @litchar{12} that follows the expression.}

]

The complete behavior of the command dispatcher follows:

@itemize[

@item{If the marker sequence is followed by itself, then it is simply
  displayed; with the default marker, @litchar["@@"] outputs a
  @litchar["@"].}

@item{Otherwise a Scheme expression is read and evaluated, and the result is
  processed as follows:

  @itemize[

  @item{If the result consists of multiple values, each one is processed,}

  @item{If it is @|void-const| or @scheme[#f], nothing is done,}

  @item{If it is a structure of pairs, this structure is processed
    recursively,}

  @item{If it is a promise, it is forced and its value is used instead,}

  @item{Strings, bytes, and paths are pushed back on the input stream,}

  @item{Symbols, numbers, and characters are converted to strings and pushed
    back on the input,}

  @item{An input port is prepended to the input, and both are processed as a
    single input,}

  @item{Procedures of arity zero or one are treated in a special way (see
    below); other procedures cause an error,}

  @item{All other values are ignored.}

  ]
}

@item{When this processing is done, and printable results have been re-added
  to the input port, control is returned to the main processing loop.}

]
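
As a rough illustration of these rules, each line of the following
input exercises one of the cases listed above. The input is made up
for this sketch and has not been checked against an actual run, so no
output is shown: the first line yields a structure of pairs holding a
string, a symbol, and a number; the second yields multiple values; the
third yields @|void-const|; and the last is the doubled marker.

@verbatim[#:indent 2]|{
@(list "a" 'b 3)
@(values "x" "y")
@(void)
@@
}|
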

A built-in convenience is that if the evaluation of the Scheme
expression returns a @|void-const| or @scheme[#f] value (or multiple values that are
all @|void-const| or @scheme[#f]), then the next newline is swallowed using
@scheme[swallow-newline] (see below) if there is only whitespace before it.

During evaluation, printed output is displayed as is, without
re-processing. Re-processing it would not be hard, but it is a little
expensive, so the choice is not to do it. (A nice thing to do is to redesign this
so each evaluation is taken as a real filter, which is done in its own
thread, so when a Scheme expression is about to be evaluated, it is done in
a new thread, and the current input is wired to that thread's output.
However, this is much too heavy for a ``simple'' preprocessor...)

So far, we get a language that is roughly the same as we get from @exec{mzpp}
(with the added benefit of reprocessing generated text, which could be
done in a better way using macros). The special treatment of procedure
values is what allows more powerful constructs. These are handled by
their arity (preferring the nullary treatment over the unary one):

@itemize[

@item{A procedure of arity 0 is simply invoked, and its resulting value is
  used. The procedure can freely use the input stream to retrieve
  arguments. For example, here is how to define a standard C function
  header for use in a Racket extension file:

@verbatim[#:indent 2]|{
@(define (cfunc)
   (format
    "Scheme_Object *~a(int argc, Scheme_Object *argv[])\n"
    (read-line)))
@cfunc foo
@cfunc bar

==>

Scheme_Object * foo(int argc, Scheme_Object *argv[])
Scheme_Object * bar(int argc, Scheme_Object *argv[])
}|

  Note how @scheme[read-line] is used to retrieve an argument, and how this
  results in an extra space in the actual argument value. Replacing
  this with @scheme[read] will work slightly better, except that the input will
  have to be a Scheme token (in addition, this will not consume the
  final newline, so the extra one in the format string should be
  removed). The @scheme[get-arg] function can be used to retrieve arguments
  more easily---by default, it will return any text enclosed by
  parentheses, brackets, braces, or angle brackets (see below). For
  example:

@verbatim[#:indent 2]|{
@(define (tt)
   (format "<tt>~a</tt>" (get-arg)))
@(define (ref)
   (format "<a href=~s>~a</a>" (get-arg) (get-arg)))
@(define (ttref)
   (format "<a href=~s>@tt{~a}</a>" (get-arg) (get-arg)))
@(define (reftt)
   (format "<a href=~s>~a</a>" (get-arg) (tt)))
@ttref{racket-lang.org}{Racket}
@reftt{racket-lang.org}{Racket}

==>

<a href="racket-lang.org"><tt>Racket</tt></a>
<a href="racket-lang.org"><tt>Racket</tt></a>
}|

  Note that in @scheme[reftt] we use @scheme[tt] without arguments since it will
  retrieve its own arguments. This makes @scheme[ttref]'s approach more
  natural, except that ``calling'' @scheme[tt] through a Scheme string doesn't
  seem natural. For this there is a @scheme[defcommand] command (see below)
  that can be used to define such functions without using Scheme code:

@verbatim[#:indent 2]|{
@defcommand{tt}{X}{<tt>X</tt>}
@defcommand{ref}{url text}{<a href="url">text</a>}
@defcommand{ttref}{url text}{<a href="url">@tt{text}</a>}
@ttref{racket-lang.org}{Racket}

==>

<a href="racket-lang.org"><tt>Racket</tt></a>
}|}

@item{A procedure of arity 1 is invoked differently---it is applied to a
  thunk that holds the ``processing continuation''. This application is
  not expected to return; instead, the procedure can decide to hand
  control back to the main loop by using this thunk. This is a powerful
  facility that is rarely needed, much as @scheme[call/cc]
  is rarely needed in Scheme.}

]
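
A minimal sketch of the unary case follows. It assumes that calling
the thunk with no arguments resumes the main loop, in the same way as
the continuation thunk that is passed to a dispatcher handler; the
name @scheme[shout] and the sample input are made up for this
illustration, and the result has not been checked against an actual
run.

@verbatim[#:indent 2]|{
@(define (shout resume)
   ;; read one argument, push it back on the input in upper case,
   ;; and then hand control back to the main loop
   (add-to-input (string-upcase (get-arg)))
   (resume))
@shout{hello}
}|
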

Remember that when procedures are used, output that is printed during
their evaluation is not reprocessed, just as when evaluating other
expressions.

@section[#:tag "mztext-lib"]{Provided bindings}

@defmodule[preprocessor/mztext]

Similarly to @exec{mzpp}, @schememodname[preprocessor/mztext] contains
both the implementation and the user-visible bindings.

Dispatching-related bindings:

@defproc*[([(command-marker) string?]
           [(command-marker [str string?]) void?])]{

  A parameter-like procedure that holds the command marker string and
  can be used to set a different one. Defaults to @litchar["@"]. It can
  also be set to @scheme[#f], which will disable the command dispatcher
  altogether. Note that this is a procedure---it cannot be used with
  @scheme[parameterize].}
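
For instance, a file that contains many literal @litchar["@"]
characters might switch to a different marker. This is only a sketch:
the marker @litchar["%"] is an arbitrary choice, and it assumes that
the new marker takes effect for the text that follows the call.

@verbatim[#:indent 2]|{
@(command-marker "%")
user@example.org
%(+ 1 2)
}|
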

@defproc*[([(dispatchers) (listof list?)]
           [(dispatchers [disps (listof list?)]) void?])]{

  A parameter-like procedure (same as @scheme[command-marker]) holding a list
  of lists---each one a dispatcher regexp and a handler function. The
  regexp should not have any parenthesized subgroups; use @scheme["(?:...)"] for
  grouping. The handler function is invoked whenever the regexp is seen
  on the input stream: it is invoked on two arguments---the matched
  string and a continuation thunk. It is then responsible for the rest
  of the processing, usually invoking the continuation thunk to resume
  the default preprocessing. For example:

@verbatim[#:indent 2]|{
@(define (foo-handler str cont)
   (add-to-input (list->string
                  (reverse (string->list (get-arg)))))
   (cont))
@(dispatchers (cons (list "foo" foo-handler) (dispatchers)))
foo{>Foo<oof}

==>

Foo
}|

  Note that the standard command dispatcher uses the same facility, and
  it is added by default to the dispatcher list unless @scheme[command-marker]
  is set to @scheme[#f].}

@defproc[(make-composite-input [v any/c] ...) input-port?]{

  Creates a composite input port, initialized by the given values
  (input ports, strings, etc.). The resulting port will read data from
  each of the values in sequence, appending them together to form a
  single input port. This is very similar to
  @scheme[input-port-append], but it is extended to allow prepending
  additional values to the beginning of the port using
  @scheme[add-to-input]. The @exec{mztext} executable relies on this
  functionality to be able to push text back on the input when it is
  supposed to be reprocessed, so use only such ports for the current
  input port.}

@defproc[(add-to-input [v any/c] ...) void?]{

  This should be used to ``output'' a string (or an input port) back
  on the current composite input port. As a special case, thunks can
  be added to the input too---they will be executed when the ``read
  head'' goes past them, and their output will be added back
  instead. This is used to plant handlers that run when reading
  goes beyond a specific point (for example, this is how the directory is
  changed to that of the processed file to allow relative includes). Other
  simple values are converted to strings using @scheme[format], but
  this might change.}
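
The two functions above can also be tried from plain Scheme code. The
following is a minimal sketch, not taken from the implementation: it
assumes that @scheme[add-to-input] operates on whatever composite port
is installed as the current input port, in which case the text given
to @scheme[add-to-input] should be read before the text the port was
created with.

@schemeblock[
(require preprocessor/mztext scheme/port)
(code:comment "install a composite port as the current input port,")
(code:comment "prepend some text to it, and read everything back")
(parameterize ([current-input-port (make-composite-input "world")])
  (add-to-input "hello, ")
  (port->string))
]
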

@defparam[paren-pairs pairs (listof (list/c string? string?))]{

  This is a parameter holding a list of lists, each one holding two
  strings which are matching open/close tokens for @scheme[get-arg].}

@defboolparam[get-arg-reads-word? on?]{

  A parameter that holds a boolean value defaulting to @scheme[#f]. If true,
  then @scheme[get-arg] will read a whole word (a non-whitespace string delimited
  by whitespace) for arguments that are not parenthesized with a pair
  in @scheme[paren-pairs].}

@defproc[(get-arg) (or/c string? eof-object?)]{

  This function will retrieve a text argument surrounded by a paren
  pair specified by @scheme[paren-pairs]. First, an open-pattern is
  searched for, and then text is assembled making sure that open-close
  patterns are respected, until a matching close-pattern is found.
  When this scan is performed, other parens are ignored, so if the
  input stream has @litchar|{{[(}}|, the return value will be
  @scheme["[("]. It is possible for both tokens to be the same, in
  which case no nesting is possible. If no open-pattern is found, the
  first non-whitespace character is used, and if that is also not
  found before the end of the input, an @scheme[eof] value is
  returned. For example (using @scheme[defcommand], which uses
  @scheme[get-arg]):

@verbatim[#:indent 2]|{
@(paren-pairs (cons (list "|" "|") (paren-pairs)))
@defcommand{verb}{X}{<tt>X</tt>}
@verb abc
@(get-arg-reads-word? #t)
@verb abc
@verb |FOO|
@verb

==>

<tt>a</tt>bc
<tt>abc</tt>
<tt>FOO</tt>
verb: expecting an argument for `X'
}|

}

@defproc[(get-arg*) (or/c string? eof-object?)]{

  Similar to @scheme[get-arg], except that the resulting text is first
  processed. Since arguments are usually text strings,
  ``programming'' can be considered as lazy evaluation, which
  sometimes can be too inefficient (TeX suffers from the same
  problem). The @scheme[get-arg*] function can be used to reduce some
  inputs immediately after they have been read.}

@defproc[(swallow-newline) void?]{

  This is a simple command that does just this:

  @schemeblock[
    (regexp-try-match #rx"^[ \t]*\r?\n" (stdin))
  ]

  The result is that a newline will be swallowed if there is only
  whitespace from the current location to the end of the line. Note
  that as a general principle @scheme[regexp-try-match] should be
  preferred over @scheme[regexp-match] for @exec{mztext}'s
  preprocessing.

}

@defproc[(defcommand [name any/c][args list?][text string?]) void?]{

  This is a command that can be used to define simple template
  commands. It should be used as a command, not from Scheme code
  directly, and it should receive three arguments:

  @itemize[

  @item{The name for the new command (the contents of this argument are
    converted to a string),}

  @item{The list of arguments (the contents of this argument are turned
    into a list of identifiers),}

  @item{Arbitrary text, with @bold{textual} instances of the argument
    names marking the places where they are used.}

  ]

  For example, the sample code above:

@verbatim[#:indent 2]|{
@defcommand{ttref}{url text}{<a href="url">@tt{text}</a>}
}|

  is translated to the following definition expression:

  @schemeblock[
    (define (ttref)
      (let ((url (get-arg)) (text (get-arg)))
        (list "<a href=\"" url "\">@tt{" text "}</a>")))
  ]

  which is then evaluated. Note that the arguments play a role as both
  Scheme identifiers and textual markers.

}

@defproc[(include [file path-string?] ...) void?]{

  This will add all of the given inputs to the composite port and run
  the preprocessor loop. In addition to the given inputs, some thunks
  are added to the input port (see @scheme[add-to-input] above) to change
  the current directory so that relative includes work.

  If it is called with no arguments, it will use @scheme[get-arg] to get an
  input filename, therefore making it possible to use this as a
  dispatcher command as well.}

@defproc[(preprocess [in (or/c path-string? input-port?)]) void?]{

  This is the main entry point to the preprocessor: it creates a new
  composite port, sets internal parameters, and then calls @scheme[include] to
  start the preprocessing.}
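
A minimal sketch of calling the preprocessor from Scheme code follows.
The file names are hypothetical, and the sketch assumes that the
processed text is written to the current output port, so that it can
be redirected to a file in the usual way.

@schemeblock[
(require preprocessor/mztext)
(code:comment "process the (hypothetical) template \"page.mzt\" and")
(code:comment "collect the generated text in \"page.html\"")
(with-output-to-file "page.html"
  (lambda () (preprocess "page.mzt"))
  #:exists 'replace)
]
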

@deftogether[(
@defthing[stdin parameter?]
@defthing[stdout parameter?]
@defthing[stderr parameter?]
@defthing[cd parameter?]
)]{

  These are shorter names for the corresponding port parameters and
  @scheme[current-directory].}

@defparam[current-file path path-string?]{

  This is a parameter that holds the name of the currently processed
  file, or @scheme[#f] if none.}