718 lines
31 KiB
Racket
718 lines
31 KiB
Racket
#reader(lib "docreader.ss" "scribble")
|
|
@require["mz.ss"]
|
|
@require[(lib "bnf.ss" "scribble")]
|
|
@require["reader-example.ss"]
|
|
@begin[
|
|
(define (ilitchar s)
|
|
(litchar s))
|
|
(define (nunterm s)
|
|
(nonterm s (subscript "n")))
|
|
(define (sub n) (subscript n))
|
|
(define (nonalpha)
|
|
@elem{; the next character must not be @schemelink[char-alphabetic?]{alphabetic}.})
|
|
]
|
|
@define[(graph-tag) @kleenerange[1 8]{@nonterm{digit@sub{10}}}]
|
|
@define[(graph-defn) @elem{@litchar{#}@graph-tag[]@litchar{=}}]
|
|
@define[(graph-ref) @elem{@litchar{#}@graph-tag[]@litchar{#}}]
|
|
|
|
@title{Syntax}
|
|
|
|
The syntax of a Scheme program is defined by
|
|
|
|
@itemize{
|
|
|
|
@item{a @defterm{read} phase that processes a character stream into
|
|
an S-expression, and}
|
|
|
|
@item{an @defterm{expand} phase that processes the S-expression based
|
|
on bindings in the lexical environment, where some parsing
|
|
steps can introduce new bindings for further parsing steps.}
|
|
|
|
}
|
|
|
|
Note that parsing is defined in terms of Unicode characters; see
|
|
@secref["unicode"] for information on how a byte stream is converted
|
|
to a character stream.
|
|
|
|
@section[#:tag "reader"]{Reading Data}
|
|
|
|
Scheme's reader is a recursive-descent parser that can be configured
|
|
through a @seclink["readtable"]{readtable} and various other
|
|
@seclink["parameters"]{parameters}. This section describes the reader's
|
|
parsing when using the default readtable.
|
|
|
|
Reading from a stream produces one datum. If the result datum is a
|
|
compound value, then reading the datum typically requires the reader
|
|
to call itself recursively to read the component data.
|
|
|
|
The reader can be invoked in either of two modes: @scheme[read] mode,
|
|
or @scheme[read-syntax] mode. In @scheme[read-syntax] mode, the result
|
|
is always a @seclink["stxobj"]{syntax object} that includes
|
|
source-location information wrapped around the sort of datum that
|
|
@scheme[read] mode would produce. In the case of pairs, vectors, and
|
|
boxes, morever, the content is also wrapped recursively as a syntax
|
|
object. Unless specified otherwise, this section describes the
|
|
reader's behavior in @scheme[read] mode, and @scheme[read-syntax] mode
|
|
does the same modulo wrapping the final result.
|
|
|
|
@subsection{Delimiters and Dispatch}
|
|
|
|
Along with @schemelink[char-whitespace?]{whitespace}, the following
|
|
characters are @defterm{delimiters}:
|
|
|
|
@t{
|
|
@hspace[2] @ilitchar{(} @ilitchar{)} @ilitchar{[} @ilitchar{]}
|
|
@ilitchar["["] @ilitchar["]"]
|
|
@ilitchar{"} @ilitchar{,} @ilitchar{'} @ilitchar{`}
|
|
@ilitchar{;}
|
|
}
|
|
|
|
A delimited sequence that starts with any other character is typically
|
|
parsed as either a symbol or number, but a few non-delimiter
|
|
characters play special roles:
|
|
|
|
@itemize{
|
|
|
|
@item{@litchar{#} has a special meaning as an initial character in a
|
|
delimited sequence; its meaning depends on the characters that
|
|
follow; see below.}
|
|
|
|
@item{@as-index{@litchar["|"]} starts a subsequence of characters to
|
|
be included verbatim in the delimited sequence (i.e,. they are
|
|
never treated as delimiters, and they are not case-folded when
|
|
case-insensitivity is enabled); the subsequence is terminated
|
|
by another @litchar["|"], and neither the initial nor
|
|
terminating @litchar["|"] is part of the subsequence.}
|
|
|
|
@item{@as-index{@litchar["\\"]} outside of a @litchar["|"] pair causes
|
|
the folowing character to be included verbatim in a delimited
|
|
sequence.}
|
|
|
|
}
|
|
|
|
More precisely, after skipping whitespace, the reader dispatches based
|
|
on the next character or characters in the input stream as follows:
|
|
|
|
@dispatch-table[
|
|
|
|
@dispatch[@litchar{(}]{starts a pair or list; see @secref["parse-pair"]}
|
|
@dispatch[@litchar{[}]{starts a pair or list; see @secref["parse-pair"]}
|
|
@dispatch[@litchar["{"]]{starts a pair or list; see @secref["parse-pair"]}
|
|
|
|
@dispatch[@litchar{)}]{matches @litchar{(} or raises @Exn{exn:fail:read}}
|
|
@dispatch[@litchar{]}]{matches @litchar{[} or raises @Exn{exn:fail:read}}
|
|
@dispatch[@litchar["}"]]{matches @litchar["{"] or raises @Exn{exn:fail:read}}
|
|
|
|
@dispatch[@litchar{"}]{starts a string; see @secref["parse-string"]}
|
|
@dispatch[@litchar{,}]{starts a quote; see @secref["parse-quote"]}
|
|
@dispatch[@litchar{`}]{starts a quasiquote; see @secref["parse-quote"]}
|
|
@dispatch[@litchar{,}]{starts an unquote or splicing unquote; see @secref["parse-quote"]}
|
|
|
|
@dispatch[@litchar{;}]{starts a line comment; see @secref["parse-comment"]}
|
|
|
|
@dispatch[@cilitchar{#t}]{true; see @secref["parse-boolean"]}
|
|
@dispatch[@cilitchar{#f}]{false; see @secref["parse-boolean"]}
|
|
|
|
@dispatch[@litchar{#(}]{starts a vector; see @secref["parse-vector"]}
|
|
@dispatch[@litchar{#[}]{starts a vector; see @secref["parse-vector"]}
|
|
@dispatch[@litchar["#{"]]{starts a vector; see @secref["parse-vector"]}
|
|
|
|
@dispatch[@litchar["#\\"]]{starts a character; see @secref["parse-character"]}
|
|
|
|
@dispatch[@litchar{#"}]{starts a byte string; see @secref["parse-string"]}
|
|
@dispatch[@litchar{#%}]{starts a symbol; see @secref["parse-symbol"]}
|
|
@dispatch[@litchar{#:}]{starts a keyword; see @secref["parse-keyword"]}
|
|
@dispatch[@litchar{#&}]{starts a box; see @secref["parse-box"]}
|
|
|
|
@dispatch[@litchar["#|"]]{starts a block comment; see @secref["parse-comment"]}
|
|
@dispatch[@litchar["#;"]]{starts an S-expression comment; see @secref["parse-comment"]}
|
|
@dispatch[@litchar{#,}]{starts a syntax quote; see @secref["parse-quote"]}
|
|
@dispatch[@litchar{#`}]{starts a syntax quasiquote; see @secref["parse-quote"]}
|
|
@dispatch[@litchar{#,}]{starts an syntax unquote or splicing unquote; see @secref["parse-quote"]}
|
|
@dispatch[@litchar["#~"]]{starts compiled code; see @secref["compilation"]}
|
|
|
|
@dispatch[@cilitchar{#i}]{starts a number; see @secref["parse-number"]}
|
|
@dispatch[@cilitchar{#e}]{starts a number; see @secref["parse-number"]}
|
|
@dispatch[@cilitchar{#x}]{starts a number; see @secref["parse-number"]}
|
|
@dispatch[@cilitchar{#o}]{starts a number; see @secref["parse-number"]}
|
|
@dispatch[@cilitchar{#d}]{starts a number; see @secref["parse-number"]}
|
|
@dispatch[@cilitchar{#b}]{starts a number; see @secref["parse-number"]}
|
|
|
|
@dispatch[@cilitchar["#<<"]]{starts a string; see @secref["parse-string"]}
|
|
|
|
@dispatch[@litchar{#rx}]{starts a regular expression; see @secref["parse-regexp"]}
|
|
@dispatch[@litchar{#px}]{starts a regular expression; see @secref["parse-regexp"]}
|
|
|
|
@dispatch[@cilitchar{#ci}]{switches case sensitivity; see @secref["parse-symbol"]}
|
|
@dispatch[@cilitchar{#cs}]{switches case sensitivity; see @secref["parse-symbol"]}
|
|
|
|
@dispatch[@cilitchar["#sx"]]{starts a Scheme expression; see @secref["parse-honu"]}
|
|
|
|
@dispatch[@litchar["#hx"]]{starts a Honu expression; see @secref["parse-honu"]}
|
|
@dispatch[@litchar["#honu"]]{starts a Honu module; see @secref["parse-honu"]}
|
|
|
|
@dispatch[@litchar["#hash"]]{starts a hash table; see @secref["parse-hashtable"]}
|
|
|
|
@dispatch[@litchar["#reader"]]{starts a reader extension use; see @secref["parse-reader"]}
|
|
|
|
@dispatch[@elem{@litchar{#}@kleeneplus{@nonterm{digit@sub{10}}}@litchar{(}}]{starts a vector; see @secref["parse-vector"]}
|
|
@dispatch[@elem{@litchar{#}@kleeneplus{@nonterm{digit@sub{10}}}@litchar{[}}]{starts a vector; see @secref["parse-vector"]}
|
|
@dispatch[@elem{@litchar{#}@kleeneplus{@nonterm{digit@sub{10}}}@litchar["{"]}]{starts a vector; see @secref["parse-vector"]}
|
|
@dispatch[@graph-defn[]]{binds a graph tag; see @secref["parse-graph"]}
|
|
@dispatch[@graph-ref[]]{uses a graph tag; see @secref["parse-graph"]}
|
|
|
|
@dispatch[@italic{otherwise}]{starts a symbol; see @secref["parse-symbol"]}
|
|
|
|
]
|
|
|
|
|
|
@subsection[#:tag "parse-symbol"]{Reading Symbols}
|
|
|
|
A sequence that does not start with a delimiter or @litchar{#} is
|
|
parsed as either a symbol or a number (see @secref["parse-number"]),
|
|
except that @litchar{.} by itself is never parsed as a symbol or
|
|
character. A @as-index{@litchar{#%}} also starts a symbol. A successful
|
|
number parse takes precedence over a symbol parse.
|
|
|
|
When the @scheme[read-case-sensitive] parameter is set to @scheme[#f],
|
|
characters in the sequence that are not quoted by @litchar["|"] or
|
|
@litchar["\\"] are first case-normalized. If the reader encounters
|
|
@as-index{@litchar{#ci}}, @litchar{#CI}, @litchar{#Ci}, or @litchar{#cI},
|
|
then it recursively reads the following datum in
|
|
case-insensitive mode. If the reader encounters @as-index{@litchar{#cs}},
|
|
@litchar{#CS}, @litchar{#Cs}, or @litchar{#cS}, then recursively reads
|
|
the following datum in case-sensitive mode.
|
|
|
|
@reader-examples[
|
|
"Apple"
|
|
"Ap#ple"
|
|
"Ap ple"
|
|
"Ap| |ple"
|
|
"Ap\\ ple"
|
|
"#ci Apple"
|
|
"#ci |A|pple"
|
|
"#ci \\Apple"
|
|
"#ci#cs Apple"
|
|
"#%Apple"
|
|
]
|
|
|
|
@subsection[#:tag "parse-number"]{Reading Numbers}
|
|
|
|
@index['("numbers" "parsing")]{A} sequence that does not start with a
|
|
delimiter is parsed as a number when it matches the following grammar
|
|
case-insenstively for @nonterm{number@sub{10}} (decimal), where
|
|
@metavar{n} is a meta-meta-variable in the grammar.
|
|
|
|
A number is optionally prefixed by an exactness specifier,
|
|
@as-index{@litchar{#e}} (exact) or @as-index{@litchar{#i}} (inexact),
|
|
which specifies its parsing as an exact or inexact number; see
|
|
@secref["numbers"] for information on number exactness. As the
|
|
non-terminal names suggest, a number that has no exactness specifier
|
|
and matches only @nunterm{inexact-number} is normally parsed as an
|
|
inexact number, otherwise it is parsed as an excat number. If the
|
|
@scheme[read-decimal-as-inexact] parameter is set to @scheme[#f], then
|
|
all numbers without an exactness specifier are instead parsed as
|
|
exact.
|
|
|
|
If the reader encounters @as-index{@litchar{#b}} (binary),
|
|
@as-index{@litchar{#o}} (octal), @as-index{@litchar{#d}} (decimal), or
|
|
@as-index{@litchar{#x}} (hexadecimal), it must be followed by a
|
|
sequence that is terminated by a delimiter or end-of-file, and that
|
|
matches the @nonterm{general-number@sub{2}},
|
|
@nonterm{general-number@sub{8}}, @nonterm{general-number@sub{10}}, or
|
|
@nonterm{general-number@sub{16}} grammar, respectively.
|
|
|
|
An @nunterm{exponent-mark} in an inexact number serves both to specify
|
|
an exponent and specify a numerical precision. If single-precision
|
|
IEEE floating point is supported (see @secref["number"]), the marks
|
|
@litchar{f} and @litchar{s} specifies single-precision. Otherwise, or
|
|
with any other mark, double-precision IEEE floating point is used.
|
|
|
|
@BNF[(list @nunterm{number} @BNF-alt[@nunterm{exact-number}
|
|
@nunterm{inexact-number}])
|
|
(list @nunterm{exact-number} @BNF-alt[@nunterm{exact-integer}
|
|
@nunterm{exact-rational}
|
|
@nunterm{exact-complex}])
|
|
(list @nunterm{exact-integer} @BNF-seq[@optional{@nonterm{sign}} @nunterm{unsigned-exact-integer}])
|
|
(list @nunterm{unsigned-exact-integer} @kleeneplus{@nunterm{digit}})
|
|
(list @nunterm{exact-rational} @BNF-seq[@nunterm{exact-integer} @litchar{/} @nunterm{unsigned-exact-integer}])
|
|
(list @nunterm{exact-complex} @BNF-seq[@nunterm{exact-rational} @nonterm{sign} @nunterm{exact-rational} @litchar{i}])
|
|
(list @nunterm{inexact-number} @BNF-alt[@nunterm{inexact-real}
|
|
@nunterm{inexact-complex}])
|
|
(list @nunterm{inexact-real} @BNF-seq[@optional{@nonterm{sign}} @nunterm{inexact-normal-real}]
|
|
@BNF-seq[@nonterm{sign} @nunterm{inexact-special-real}])
|
|
(list @nunterm{inexact-unsigned-real} @BNF-alt[@nunterm{inexact-normal-real}
|
|
@nunterm{inexact-special-real}])
|
|
(list @nunterm{inexact-normal-real} @BNF-seq[@nunterm{inexact-simple-real} @optional{@nunterm{exponent-mark}
|
|
@optional[@nonterm{sign}] @nunterm{inexact-base}}])
|
|
(list @nunterm{inexact-simple-real} @BNF-seq[@nunterm{inexact-base} @optional{@litchar{.}} @kleenestar{@litchar{#}}]
|
|
@BNF-seq[@optional{@nunterm{exact-integer}} @litchar{.} @nunterm{inexact-base}]
|
|
@BNF-seq[@nunterm{inexact-base} @litchar{/} @nunterm{inexact-base}])
|
|
(list @nunterm{inexact-special-real} @BNF-alt[@litchar{inf.0} @litchar{nan.0}])
|
|
(list @nunterm{inexact-base} @BNF-seq[@kleeneplus{@nunterm{digit}} @kleenestar{@litchar{#}}])
|
|
(list @nunterm{inexact-complex} @BNF-seq[@optional{@nunterm{inexact-real}} @nonterm{sign} @nunterm{inexact-unsigned-real} @litchar{i}]
|
|
@BNF-seq[@nunterm{inexact-real} @litchar["@"] @nunterm{inexact-real}])
|
|
|
|
|
|
(list @nonterm{sign} @BNF-alt[@litchar{+}
|
|
@litchar{-}])
|
|
(list @nonterm{digit@sub{16}} @BNF-alt[@nonterm{digit@sub{10}} @litchar{a} @litchar{b} @litchar{c} @litchar{d}
|
|
@litchar{e} @litchar{f}])
|
|
(list @nonterm{digit@sub{10}} @BNF-alt[@nonterm{digit@sub{8}} @litchar{8} @litchar{9}])
|
|
(list @nonterm{digit@sub{8}} @BNF-alt[@nonterm{digit@sub{2}} @litchar{2} @litchar{3}
|
|
@litchar{4} @litchar{5} @litchar{6} @litchar{7}])
|
|
(list @nonterm{digit@sub{2}} @BNF-alt[@litchar{0} @litchar{1}])
|
|
(list @nonterm{exponent-mark@sub{16}} @BNF-alt[@litchar{s} @litchar{d} @litchar{l}])
|
|
(list @nonterm{exponent-mark@sub{10}} @BNF-alt[@nonterm{exponent-mark@sub{16}} @litchar{e} @litchar{f}])
|
|
(list @nonterm{exponent-mark@sub{8}} @nonterm{exponent-mark@sub{10}})
|
|
(list @nonterm{exponent-mark@sub{2}} @nonterm{exponent-mark@sub{10}})
|
|
(list @nunterm{general-number} @BNF-seq[@optional{@nonterm{exactness}} @nunterm{number}])
|
|
(list @nunterm{exactness-number} @BNF-alt[@litchar{#e} @litchar{#i}])
|
|
]
|
|
|
|
@reader-examples[
|
|
"-1"
|
|
"1/2"
|
|
"1.0"
|
|
"1+2i"
|
|
"1/2+3/4i"
|
|
"1.0+3.0e7i"
|
|
"2e5"
|
|
"#i5"
|
|
"#e2e5"
|
|
"#x2e5"
|
|
"#b101"
|
|
]
|
|
|
|
@subsection[#:tag "parse-boolean"]{Reading Booleans}
|
|
|
|
A @as-index{@litchar{#t}} or @as-index{@litchar{#T}} is the complete
|
|
input syntax for the boolean constant true, and
|
|
@as-index{@litchar{#f}} or @as-index{@litchar{#F}} is the complete
|
|
input syntax for the boolean constant false.
|
|
|
|
@subsection[#:tag "parse-pair"]{Reading Pairs and Lists}
|
|
|
|
When the reader encounters a @as-index{@litchar{(}},
|
|
@as-index{@litchar["["]}, or @as-index{@litchar["{"]}, it starts
|
|
parsing a pair or list; see @secref["pairs"] for information on pairs
|
|
and lists.
|
|
|
|
To parse the pair or list, the reader recursively reads data
|
|
until a matching @as-index{@litchar{)}}, @as-index{@litchar{]}}, or
|
|
@as-index{@litchar["}"]} (respectively) is found, and it specially handles
|
|
a delimited @litchar{.}. Pairs @litchar{()}, @litchar{[]}, and
|
|
@litchar["{}"] are treated the same way, so the remainder of this
|
|
section simply uses ``parentheses'' to mean any of these pair.
|
|
|
|
If the reader finds no delimited @as-index{@litchar{.}} among the elements
|
|
between parentheses, then it produces a list containing the results of
|
|
the recursive reads.
|
|
|
|
If the reader finds two data between the matching parentheses
|
|
that are separated by a delimited @litchar{.}, then it creates a
|
|
pair. More generally, if it finds two or more data where the
|
|
last is preceeded by a delimited @litchar{.}, then it constructs
|
|
nested pairs: the next-to-last element is paired with the last, then
|
|
the third-to-last is paired with that pair, and so on.
|
|
|
|
If the reader finds three or more data between the matching
|
|
parentheses, and if a pair of delimited @litchar{.}s surrounds any
|
|
oter than the first and last elements, the result is a list
|
|
countaining the element surrounded by @litchar{.}s as the first
|
|
element, followed by the others in ther read order. This convention
|
|
supports a kind of @index["infix"]{infix} notation at the reader
|
|
level.
|
|
|
|
In @scheme[read-syntax] mode, the recursive reads for the pair/list
|
|
elements are themselves in @scheme[read-syntax] mode, so that the
|
|
result is list or pair of syntax objects that it itself wrapped as a
|
|
syntax object. If the reader constructs nested pairs because the input
|
|
included a single delimited @litchar{.}, then only the innermost pair
|
|
and outtermost pair are wrapped as syntax objects. Whether wrapping a
|
|
pair or list, if the pair or list was formed with @litchar{[} and
|
|
@litchar{]}, then a @scheme['paren-shape] property is attached to the
|
|
result with the value @scheme[#\[];if the list or pair was formed with
|
|
@litchar["{"] and @litchar["}"], then a @scheme['paren-shape] property
|
|
is attached to the result with the value @scheme[#\{].
|
|
|
|
If a delimited @litchar{.} appears in any other configuration, then
|
|
the @exnraise[exn:fail:read]. Similarly, if the reader encounters a
|
|
@litchar{)}, @litchar["]"], or @litchar["}"] that does not end a list
|
|
being parsed, then the @exnraise[exn:fail:read].
|
|
|
|
@reader-examples[
|
|
"()"
|
|
"(1 2 3)"
|
|
"{1 2 3}"
|
|
"[1 2 3]"
|
|
"(1 (2) 3)"
|
|
"(1 . 3)"
|
|
"(1 . (3))"
|
|
"(1 . 2 . 3)"
|
|
]
|
|
|
|
If the @scheme[read-square-bracket-as-paren] parameter is set to
|
|
@scheme[#f], then when then reader encounters @litchar{[} and
|
|
@litchar{]}, the @exnraise{exn:fail:read}. Similarly, If the
|
|
@scheme[read-curly-brace-as-paren] parameter is set to @scheme[#f],
|
|
then when then reader encounters @litchar["{"] and @litchar["}"], the
|
|
@exnraise{exn:fail:read}.
|
|
|
|
@subsection[#:tag "parse-string"]{Reading Strings}
|
|
|
|
@index['("strings" "parsing")]{When} the reader encouters
|
|
@as-index{@litchar{"}}, it begins parsing characters to form a string. The
|
|
string continues until it is terminated by another @litchar{"} (that
|
|
is not escaped by @litchar["\\"]).
|
|
|
|
Within a string sequence, the following escape sequences are
|
|
recognized:
|
|
|
|
@itemize{
|
|
|
|
@item{@as-index{@litchar["\\a"]}: alarm (ASCII 7)}
|
|
@item{@as-index{@litchar["\\b"]}: backspace (ASCII 8)}
|
|
@item{@as-index{@litchar["\\t"]}: tab (ASCII 9)}
|
|
@item{@as-index{@litchar["\\n"]}: linefeed (ASCII 10)}
|
|
@item{@as-index{@litchar["\\v"]}: vertical tab (ASCII 11)}
|
|
@item{@as-index{@litchar["\\f"]}: formfeed (ASCII 12)}
|
|
@item{@as-index{@litchar["\\r"]}: return (ASCII 13)}
|
|
@item{@as-index{@litchar["\\e"]}: escape (ASCII 27)}
|
|
|
|
@item{@as-index{@litchar["\\\""]}: double-quotes (without terminating the string)}
|
|
@item{@as-index{@litchar["\\'"]}: quote (i.e., the backslash has no effect)}
|
|
@item{@as-index{@litchar["\\\\"]}: backslash (i.e., the second is not an escaping backslash)}
|
|
|
|
@item{@as-index{@litchar["\\"]@kleenerange[1 3]{@nonterm{digit@sub{8}}}}:
|
|
Unicode for the octal number specified by @kleenerange[1
|
|
3]{digit@sub{8}} (i.e., 1 to 3 @nonterm{digit@sub{8}}s) where
|
|
each @nonterm{digit@sub{8}} is @litchar{0}, @litchar{1},
|
|
@litchar{2}, @litchar{3}, @litchar{4}, @litchar{5},
|
|
@litchar{6}, or @litchar{7}. A longer form takes precedence
|
|
over a shorter form, and the resulting octal number must be
|
|
between 0 and 255 decimal, otherwise the
|
|
@exnraise[exn:fail:read].}
|
|
|
|
@item{@as-index{@litchar["\\x"]@kleenerange[1
|
|
2]{@nonterm{digit@sub{16}}}}: Unicode for the hexadecimal
|
|
number specified by @kleenerange[1 2]{@nonterm{digit@sub{16}}},
|
|
where each @nonterm{digit@sub{16}} is @litchar{0}, @litchar{1},
|
|
@litchar{2}, @litchar{3}, @litchar{4}, @litchar{5},
|
|
@litchar{6}, @litchar{7}, @litchar{8}, @litchar{9},
|
|
@litchar{a}, @litchar{b}, @litchar{c}, @litchar{d},
|
|
@litchar{e}, or @litchar{f} (case-insensitive). The longer form
|
|
takes precedence over the shorter form.}
|
|
|
|
@item{@as-index{@litchar["\\u"]@kleenerange[1
|
|
4]{@nonterm{digit@sub{16}}}}: like @litchar["\\x"], but with up
|
|
to four hexadecimal digits (longer sequences take precedence).
|
|
The resulting hexadecimal number must be a valid argument to
|
|
@scheme[integer->char], otherwise the
|
|
@exnraise[exn:fail:read].}
|
|
|
|
@item{@as-index{@litchar["\\U"]@kleenerange[1
|
|
8]{@nonterm{digit@sub{16}}}}: like @litchar["\\x"], but with up
|
|
to eight hexadecimal digits (longer sequences take precedence).
|
|
The resulting hexadecimal number must be a valid argument to
|
|
@scheme[integer->char], otherwise the
|
|
@exnraise[exn:fail:read].}
|
|
|
|
@item{@as-index{@litchar["\\"]@nonterm{newline}}: elided, where
|
|
@nonterm{newline} is either a linefeed, carriage return, or
|
|
carriage return--linefeed combination. This convetion allows
|
|
single-line strings to span multiple lines in the source.}
|
|
|
|
}
|
|
|
|
If the reader encounteres any other use of a backslashe in a string
|
|
constant, the @exnraise[exn:fail:read].
|
|
|
|
@index['("byte strings" "parsing")]{A} string constant preceded by
|
|
@litchar{#} is parsed as a byte-string. (That is, @as-index{@litchar{#"}} starts
|
|
a byte-string literal.) See @secref["byte-strings"] for
|
|
information on byte strings. Byte string constants support the same
|
|
escape sequences as character strings, except @litchar["\\u"] and
|
|
@litchar["\\U"].
|
|
|
|
When the reader encounters @as-index{@litchar{#<<}}, it starts parsing a
|
|
@pidefterm{here string}. The characters following @litchar{#<<} until
|
|
a newline character define a terminator for the string. The content of
|
|
the string includes all characters between the @litchar{#<<} line and
|
|
a line whose only content is the specified terminator. More precisely,
|
|
the content of the string starts after a newline following
|
|
@litchar{#<<}, and it ends before a newline that is followed by the
|
|
terminator, where the terminator is itself followed by either a
|
|
newline or end-of-file. No escape sequences are recognized between the
|
|
starting and terminating lines; all characters are included in the
|
|
string (and terminator) literally. A return character is not treated
|
|
as a line separator in this context. If no characters appear between
|
|
@litchar{#<<} and a newline or end-of-file, or if an end-of-file is
|
|
encountered before a terminating line, the @exnraise[exn:fail:read].
|
|
|
|
@reader-examples[
|
|
"\"Apple\""
|
|
"\"\\x41pple\""
|
|
"\"\\\"Apple\\\"\""
|
|
"\"\\\\\""
|
|
"#\"Apple\""
|
|
]
|
|
|
|
@subsection[#:tag "parse-quote"]{Reading Quotes}
|
|
|
|
When the reader enounters @as-index{@litchar{'}}, then it recursively
|
|
reads one datum, and it forms a new list containing the symbol
|
|
@scheme['quote] and the following datum. This convention is mainly
|
|
useful for reading Scheme code, where @scheme['s] can be used as a
|
|
shorthand for @scheme[(code:quote s)].
|
|
|
|
Several other sequences are recognized and transformed in a similar
|
|
way. Longer prefixes take precedence over short ones:
|
|
|
|
@read-quote-table[(list @litchar{'} @scheme[quote])
|
|
(list @as-index{@litchar{`}} @scheme[quasiquote])
|
|
(list @as-index{@litchar{,}} @scheme[unquote])
|
|
(list @as-index{@litchar[",@"]} @scheme[unquote-splicing])
|
|
(list @as-index{@litchar{#'}} @scheme[syntax])
|
|
(list @as-index{@litchar{#`}} @scheme[quasisyntax])
|
|
(list @as-index{@litchar{#,}} @scheme[unsyntax])
|
|
(list @as-index{@litchar["#,@"]} @scheme[unsyntax-splicing])]
|
|
|
|
@reader-examples
|
|
[
|
|
"'apple"
|
|
"`(1 ,(+ 2 3))"
|
|
]
|
|
|
|
@subsection[#:tag "parse-comment"]{Reading Comments}
|
|
|
|
A @as-index{@litchar{;}} starts a line comment. When the reader
|
|
encounters @litchar{;}, then it skips past all characters until the
|
|
next linefeed or carriage return.
|
|
|
|
A @litchar["#|"] starts a nestable block comment. When the reader
|
|
encounters @litchar["#|"], then it skips past all characters until a
|
|
closing @litchar["|#"]. Pairs of matching @litchar["#|"] and
|
|
@litchar["|#"] can be nested.
|
|
|
|
A @litchar{#;} starts an S-expression comment. Then the reader
|
|
encounters @litchar{#;}, it recursively reads one datum, and then
|
|
discards the datum (continuing on to the next datum for the read
|
|
result).
|
|
|
|
@reader-examples
|
|
[
|
|
"; comment"
|
|
"#| a |# 1"
|
|
"#| #| a |# 1 |# 2"
|
|
"#;1 2"
|
|
]
|
|
|
|
@subsection[#:tag "parse-vector"]{Reading Vectors}
|
|
|
|
When the reader encounters a @litchar{#(}, @litchar{#[}, or
|
|
@litchar["#{"], it starts parsing a vector; see @secref["vectors"] for
|
|
information on vectors.
|
|
|
|
The elements of the vector are recursively read until a matching
|
|
@litchar{)}, @litchar{]}, or @litchar["}"] is found, just as for
|
|
lists (see @secref["parse-pair"]). A delimited @litchar{.} is not
|
|
allowed among the vector elements.
|
|
|
|
An optional vector length can be specified between the @litchar{#} and
|
|
@litchar["("], @litchar["["], or @litchar["{"]. The size is specified
|
|
using a sequence of decimal digits, and the number of elements
|
|
provided for the vector must be no more than the specified size. If
|
|
fewer elements are provided, the last provided element is used for the
|
|
remaining vector slots; if no elements are provided, then @scheme[0]
|
|
is used for all slots.
|
|
|
|
In @scheme[read-syntax] mode, each recursive read for the vector
|
|
elements is also in @scheme[read-syntax] mode, so that the wrapped
|
|
vector's elements are also wraped as syntax objects.
|
|
|
|
@reader-examples
|
|
[
|
|
"#(1 apple 3)"
|
|
"#3(\"apple\" \"banana\")"
|
|
"#3()"
|
|
]
|
|
|
|
@subsection[#:tag "parse-hashtable"]{Reading Hash Tables}
|
|
|
|
A @litchar{#hash} starts an immutable hash-table constant with key
|
|
matching based on @scheme[equal?]. The characters after @litchar{hash}
|
|
must parse as a list of pairs (see @secref["parse-pair"]) with a
|
|
specific use of delimited @litchar{.}: it must appear between the
|
|
elements of each pair in the list, and nowhere in the sequence of list
|
|
elements. The first element of each pair is used as the key for a
|
|
table entry, and the second element of each pair is the associated
|
|
value.
|
|
|
|
A @litchar{#hasheq} starts a hash table like @litchar{#hash}, except
|
|
that it constructs a hash table based on @scheme[eq?] instead of
|
|
@scheme[equal?].
|
|
|
|
In either case, the table is constructed by adding each mapping to the
|
|
hash table from left to right, so later mappings can hide earlier
|
|
mappings if the keys are equivalent.
|
|
|
|
@reader-examples
|
|
[
|
|
"#hash()"
|
|
"#hasheq()"
|
|
"#hash((\"a\" . 5))"
|
|
"#hasheq((a . 5) (b . 7))"
|
|
"#hasheq((a . 5) (a . 7))"
|
|
]
|
|
|
|
@subsection[#:tag "parse-box"]{Reading Boxes}
|
|
|
|
When the reader encounters a @litchar{#&}, it starts parsing a box;
|
|
see @secref["boxes"] for information on boxes. The content of the box
|
|
is determined by recursively reading the next datum.
|
|
|
|
In @scheme[read-syntax] mode, the recursive read for the box content
|
|
is also in @scheme[read-syntax] mode, so that the wrapped box's
|
|
content is also wraped as a syntax object.
|
|
|
|
@reader-examples
|
|
[
|
|
"#&17"
|
|
]
|
|
|
|
@subsection[#:tag "parse-character"]{Reading Characters}
|
|
|
|
A @litchar["#\\"] starts a character constant, which has one of the
|
|
following forms:
|
|
|
|
@itemize{
|
|
|
|
@item{ @litchar["#\\nul"] or @litchar["#\null"]: NUL (ASCII 0)@nonalpha[]}
|
|
@item{ @litchar["#\\backspace"]: backspace (ASCII 8)@nonalpha[]}
|
|
@item{ @litchar["#\\tab"]: tab (ASCII 9)@nonalpha[]}
|
|
@item{ @litchar["#\\newline"] or @litchar["#\\linefeed"]: linefeed (ASCII 10)@nonalpha[]}
|
|
@item{ @litchar["#\\vtab"]: vertical tab (ASCII 11)@nonalpha[]}
|
|
@item{ @litchar["#\\page"]: page break (ASCII 12)@nonalpha[]}
|
|
@item{ @litchar["#\\return"]: carriage return (ASCII 13)@nonalpha[]}
|
|
@item{ @litchar["#\\space"]: space (ASCII 32)@nonalpha[]}
|
|
@item{ @litchar["#\\rubout"]: delete (ASCII 127)@nonalpha[]}
|
|
|
|
@item{@litchar["#\\"]@kleenerange[1 3]{@nonterm{digit@sub{8}}}:
|
|
Unicode for the octal number specified by @kleenerange[1
|
|
3]{@nonterm{digit@sub{8}}}, as in string escapes (see
|
|
@secref["parse-string"]).}
|
|
|
|
@item{@litchar["#\\x"]@kleenerange[1 2]{@nonterm{digit@sub{16}}}:
|
|
Unicode for the hexadecimal number specified by @kleenerange[1
|
|
2]{@nonterm{digit@sub{16}}}, as in string escapes (see
|
|
@secref["parse-string"]).}
|
|
|
|
@item{@litchar["#\\u"]@kleenerange[1 4]{@nonterm{digit@sub{16}}}:
|
|
like @litchar["#\\x"], but with up to four hexadecimal digits.}
|
|
|
|
@item{@litchar["#\\U"]@kleenerange[1 6]{@nonterm{digit@sub{16}}}:
|
|
like @litchar["#\\x"], but with up to six hexadecimal digits.}
|
|
|
|
@item{@litchar["#\\"]@nonterm{c}: the character @nonterm{c}, as long
|
|
as @litchar["#\\"]@nonterm{c} and the characters following it
|
|
do not match any of the previous cases, and as long as the
|
|
character after @nonterm{c} is not
|
|
@schemelink[char-alphabetic?]{alphabetic}.}
|
|
|
|
}
|
|
|
|
@reader-examples
|
|
[
|
|
"#\\newline"
|
|
"#\\n"
|
|
"#\\u3BB"
|
|
"#\\\u3BB"
|
|
]
|
|
|
|
@subsection[#:tag "parse-keyword"]{Reading Keywords}
|
|
|
|
A @litchar{#:} starts a keyword. The parsing of a keyword after the
|
|
@litchar{#:} is the same as for a symbol, including case-folding in
|
|
case-insensitive mode, except that the part after @litchar{#:} is
|
|
never parsed as a number.
|
|
|
|
@reader-examples
|
|
[
|
|
"#:Apple"
|
|
"#:1"
|
|
]
|
|
|
|
@subsection[#:tag "parse-regexp"]{Reading Regular Expressions}
|
|
|
|
A @litchar{#rx} or @litchar{#px} starts a regular expression. The
|
|
characters immediately after @litchar{#rx} or @litchar{#px} must parse
|
|
as a string or byte string (see @secref["parse-string"]). A
|
|
@litchar{#rx} prefix starts a regular expression as would be
|
|
constructed by @scheme[regexp], @litchar{#px} as
|
|
constructed by @scheme[pregexp], @litchar{#rx#} as
|
|
constructed by @scheme[byte-regexp], and @litchar{#px#} as
|
|
constructed by @scheme[byte-pregexp].
|
|
|
|
@reader-examples
|
|
[
|
|
"#rx\".*\""
|
|
"#px\"[\\\\s]*\""
|
|
"#rx#\".*\""
|
|
"#px#\"[\\\\s]*\""
|
|
]
|
|
|
|
@subsection[#:tag "parse-graph"]{Reading Graph Structure}
|
|
|
|
A @graph-defn[] tags the following datum for reference via
|
|
@graph-ref[], which allows the reader to produce a datum that
|
|
have graph structure.
|
|
|
|
For a specific @graph-tag in a single read result, each @graph-ref[]
|
|
reference is replaced by the datum read for the corresponding
|
|
@graph-defn[]; the definition @graph-defn[] also produces just the
|
|
datum after it. A @graph-defn[] definition can appear at most
|
|
once, and a @graph-defn[] definition must appear before a @graph-ref[]
|
|
reference appears.
|
|
|
|
Although a comment parsed via @litchar{#;} discards the datum
|
|
afterward, @graph-defn[] definitions in the discarded datum
|
|
still can be referenced by other parts of the reader input, as long as
|
|
both the comment and the reference are grouped together by some other
|
|
form (i.e., some recursive read); a top-level @litchar{#;} comment
|
|
neither defines nor uses graph tags for other top-level forms.
|
|
|
|
@reader-examples
|
|
[
|
|
"(#1=100 #1# #1#)"
|
|
"#0=(1 . #0#)"
|
|
]
|
|
|
|
@subsection[#:tag "parse-reader"]{Reading via an External Reader}
|
|
|
|
When the reader encounters @litchar{#reader}, then it loads an
|
|
external reader procedure and applies it to the current input stream.
|
|
|
|
The reader recursively reads the next datum after @litchar{#reader},
|
|
and passes it to the procedure that is the value of the
|
|
@scheme[current-reader-guard] parameter; the result is used as a
|
|
module path. The module path is passed to @scheme[dynamic-require]
|
|
with either @scheme['read] or @scheme['read-syntax] (depending on
|
|
whether the reader is in @scheme[read] or @scheme[read-syntax]
|
|
mode).
|
|
|
|
The resulting procedure should accept the same arguments as
|
|
@scheme[read] or @scheme[read-syntax] in the case thar all optional
|
|
arguments are provided. The procedure is given the port whose stream
|
|
contained @litchar{#reader}, and it should produce a datum result. If
|
|
the result is a syntax object in @scheme[read] mode, then it is
|
|
converted to a datum using @scheme[syntax-object->datum]; if the
|
|
result is not a syntax object in @scheme[read-syntax] mode, then it is
|
|
converted to one using @scheme[datum->syntax-object]. See also
|
|
@secref["special-comments"] and @secref["recursive-reads"] for
|
|
information on special-comment results and recursive reads.
|
|
|
|
If the @scheme[read-accept-reader] parameter is set to @scheme[#f],
|
|
then if the reader encounters @litchar{#reader}, the
|
|
@exnraise[exn:fail:read].
|