889 lines
38 KiB
Racket
889 lines
38 KiB
Racket
#lang scribble/doc
|
|
@(require "mz.ss"
|
|
scribble/bnf
|
|
"reader-example.ss")
|
|
@(begin
|
|
(define (ilitchar s)
|
|
(litchar s))
|
|
(define (nunterm s)
|
|
(nonterm s (subscript "n")))
|
|
(define (sub n) (subscript n))
|
|
(define (nonalpha)
|
|
@elem{; the next character must not be @racketlink[char-alphabetic?]{alphabetic}.})
|
|
(define (graph-tag) @kleenerange[1 8]{@nonterm{digit@sub{10}}})
|
|
(define (graph-defn) @elem{@litchar{#}@graph-tag[]@litchar{=}})
|
|
(define (graph-ref) @elem{@litchar{#}@graph-tag[]@litchar{#}}))
|
|
|
|
@title[#:tag "reader" #:style 'quiet]{The Reader}
|
|
|
|
Racket's reader is a recursive-descent parser that can be configured
|
|
through a @seclink["readtables"]{readtable} and various other
|
|
@tech{parameters}. This section describes the reader's parsing when
|
|
using the default readtable.
|
|
|
|
Reading from a stream produces one @deftech{datum}. If the result
|
|
datum is a compound value, then reading the datum typically requires
|
|
the reader to call itself recursively to read the component data.
|
|
|
|
The reader can be invoked in either of two modes: @racket[read] mode,
|
|
or @racket[read-syntax] mode. In @racket[read-syntax] mode, the result
|
|
is always a @techlink{syntax object} that includes
|
|
source-location and (initially empty) lexical information wrapped
|
|
around the sort of datum that @racket[read] mode would produce. In the
|
|
case of pairs, vectors, and boxes, the content is also
|
|
wrapped recursively as a syntax object. Unless specified otherwise,
|
|
this section describes the reader's behavior in @racket[read] mode,
|
|
and @racket[read-syntax] mode does the same modulo wrapping of the final
|
|
result.
|
|
|
|
Reading is defined in terms of Unicode characters; see
|
|
@secref["ports"] for information on how a byte stream is converted
|
|
to a character stream.
|
|
|
|
@;------------------------------------------------------------------------
|
|
@section[#:tag "default-readtable-dispatch"]{Delimiters and Dispatch}
|
|
|
|
Along with @racketlink[char-whitespace?]{whitespace}, the following
|
|
characters are @defterm{delimiters}:
|
|
|
|
@t{
|
|
@hspace[2] @ilitchar{(} @ilitchar{)} @ilitchar{[} @ilitchar{]}
|
|
@ilitchar["{"] @ilitchar["}"]
|
|
@ilitchar{"} @ilitchar{,} @ilitchar{'} @ilitchar{`}
|
|
@ilitchar{;}
|
|
}
|
|
|
|
A delimited sequence that starts with any other character is typically
|
|
parsed as either a symbol or number, but a few non-delimiter
|
|
characters play special roles:
|
|
|
|
@itemize[
|
|
|
|
@item{@litchar{#} has a special meaning as an initial character in a
|
|
delimited sequence; its meaning depends on the characters that
|
|
follow; see below.}
|
|
|
|
@item{@as-index{@litchar{|}} starts a subsequence of characters to
|
|
be included verbatim in the delimited sequence (i.e., they are
|
|
never treated as delimiters, and they are not case-folded when
|
|
case-insensitivity is enabled); the subsequence is terminated
|
|
by another @litchar{|}, and neither the initial nor
|
|
terminating @litchar{|} is part of the subsequence.}
|
|
|
|
@item{@as-index{@litchar{\}} outside of a @litchar{|} pair causes
|
|
the following character to be included verbatim in a delimited
|
|
sequence.}
|
|
|
|
]
|
|
|
|
More precisely, after skipping whitespace, the reader dispatches based
|
|
on the next character or characters in the input stream as follows:
|
|
|
|
@dispatch-table[
|
|
|
|
@dispatch[@litchar{(}]{starts a pair or list; see @secref["parse-pair"]}
|
|
@dispatch[@litchar{[}]{starts a pair or list; see @secref["parse-pair"]}
|
|
@dispatch[@litchar["{"]]{starts a pair or list; see @secref["parse-pair"]}
|
|
|
|
@dispatch[@litchar{)}]{matches @litchar{(} or raises @Exn[exn:fail:read]}
|
|
@dispatch[@litchar{]}]{matches @litchar{[} or raises @Exn[exn:fail:read]}
|
|
@dispatch[@litchar["}"]]{matches @litchar["{"] or raises @Exn[exn:fail:read]}
|
|
|
|
@dispatch[@litchar{"}]{starts a string; see @secref["parse-string"]}
|
|
@dispatch[@litchar{'}]{starts a quote; see @secref["parse-quote"]}
|
|
@dispatch[@litchar{`}]{starts a quasiquote; see @secref["parse-quote"]}
|
|
@dispatch[@litchar{,}]{starts a [splicing] unquote; see @secref["parse-quote"]}
|
|
|
|
@dispatch[@litchar{;}]{starts a line comment; see @secref["parse-comment"]}
|
|
|
|
@dispatch[@cilitchar{#t}]{true; see @secref["parse-boolean"]}
|
|
@dispatch[@cilitchar{#f}]{false; see @secref["parse-boolean"]}
|
|
|
|
@dispatch[@litchar{#(}]{starts a vector; see @secref["parse-vector"]}
|
|
@dispatch[@litchar{#[}]{starts a vector; see @secref["parse-vector"]}
|
|
@dispatch[@litchar["#{"]]{starts a vector; see @secref["parse-vector"]}
|
|
|
|
@dispatch[@litchar{#s(}]{starts a structure literal; see @secref["parse-structure"]}
|
|
@dispatch[@litchar{#s[}]{starts a structure literal; see @secref["parse-structure"]}
|
|
@dispatch[@litchar["#s{"]]{starts a structure literal; see @secref["parse-structure"]}
|
|
|
|
@dispatch[@litchar{#\}]{starts a character; see @secref["parse-character"]}
|
|
|
|
@dispatch[@litchar{#"}]{starts a byte string; see @secref["parse-string"]}
|
|
@dispatch[@litchar{#%}]{starts a symbol; see @secref["parse-symbol"]}
|
|
@dispatch[@litchar{#:}]{starts a keyword; see @secref["parse-keyword"]}
|
|
@dispatch[@litchar{#&}]{starts a box; see @secref["parse-box"]}
|
|
|
|
@dispatch[@litchar{#|}]{starts a block comment; see @secref["parse-comment"]}
|
|
@dispatch[@litchar{#;}]{starts an S-expression comment; see @secref["parse-comment"]}
|
|
@dispatch[@litchar{#'}]{starts a syntax quote; see @secref["parse-quote"]}
|
|
@dispatch[@litchar{#! }]{starts a line comment; see @secref["parse-comment"]}
|
|
@dispatch[@litchar{#!/}]{starts a line comment; see @secref["parse-comment"]}
|
|
@dispatch[@litchar{#!}]{may start a reader extension; see @secref["parse-reader"]}
|
|
@dispatch[@litchar{#`}]{starts a syntax quasiquote; see @secref["parse-quote"]}
|
|
@dispatch[@litchar{#,}]{starts a syntax [splicing] unquote; see @secref["parse-quote"]}
|
|
@dispatch[@litchar{#~}]{starts compiled code; see @secref["print-compiled"]}
|
|
|
|
@dispatch[@cilitchar{#i}]{starts a number; see @secref["parse-number"]}
|
|
@dispatch[@cilitchar{#e}]{starts a number; see @secref["parse-number"]}
|
|
@dispatch[@cilitchar{#x}]{starts a number; see @secref["parse-number"]}
|
|
@dispatch[@cilitchar{#o}]{starts a number; see @secref["parse-number"]}
|
|
@dispatch[@cilitchar{#d}]{starts a number; see @secref["parse-number"]}
|
|
@dispatch[@cilitchar{#b}]{starts a number; see @secref["parse-number"]}
|
|
|
|
@dispatch[@cilitchar["#<<"]]{starts a string; see @secref["parse-string"]}
|
|
|
|
@dispatch[@litchar{#rx}]{starts a regular expression; see @secref["parse-regexp"]}
|
|
@dispatch[@litchar{#px}]{starts a regular expression; see @secref["parse-regexp"]}
|
|
|
|
@dispatch[@cilitchar{#ci}]{switches case sensitivity; see @secref["parse-symbol"]}
|
|
@dispatch[@cilitchar{#cs}]{switches case sensitivity; see @secref["parse-symbol"]}
|
|
|
|
@dispatch[@cilitchar["#sx"]]{starts a Racket expression; see @secref["parse-honu"]}
|
|
|
|
@dispatch[@litchar{#hx}]{starts a Honu expression; see @secref["parse-honu"]}
|
|
|
|
@dispatch[@litchar{#hash}]{starts a hash table; see @secref["parse-hashtable"]}
|
|
|
|
@dispatch[@litchar{#reader}]{starts a reader extension use; see @secref["parse-reader"]}
|
|
@dispatch[@litchar{#lang}]{starts a reader extension use; see @secref["parse-reader"]}
|
|
|
|
@dispatch[@elem{@litchar{#}@kleeneplus{@nonterm{digit@sub{10}}}@litchar{(}}]{starts a vector; see @secref["parse-vector"]}
|
|
@dispatch[@elem{@litchar{#}@kleeneplus{@nonterm{digit@sub{10}}}@litchar{[}}]{starts a vector; see @secref["parse-vector"]}
|
|
@dispatch[@elem{@litchar{#}@kleeneplus{@nonterm{digit@sub{10}}}@litchar["{"]}]{starts a vector; see @secref["parse-vector"]}
|
|
@dispatch[@graph-defn[]]{binds a graph tag; see @secref["parse-graph"]}
|
|
@dispatch[@graph-ref[]]{uses a graph tag; see @secref["parse-graph"]}
|
|
|
|
@dispatch[@italic{otherwise}]{starts a symbol; see @secref["parse-symbol"]}
|
|
|
|
]
|
|
|
|
|
|
@section[#:tag "parse-symbol"]{Reading Symbols}
|
|
|
|
@guideintro["symbols"]{the syntax of symbols}
|
|
|
|
A sequence that does not start with a delimiter or @litchar{#} is
|
|
parsed as either a symbol or a number (see
|
|
@secref["parse-number"]), except that @litchar{.} by itself is
|
|
never parsed as a symbol or character (unless the
|
|
@racket[read-accept-dot] parameter is set to @racket[#f]). A
|
|
@as-index{@litchar{#%}} also starts a symbol. A successful number
|
|
parse takes precedence over a symbol parse.
|
|
|
|
@index["case-sensitivity"]{@index["case-insensitive"]{When}} the
|
|
@racket[read-case-sensitive] @tech{parameter} is set to @racket[#f],
|
|
characters in the sequence that are not quoted by @litchar{|} or
|
|
@litchar{\} are first case-normalized. If the reader encounters
|
|
@as-index{@litchar{#ci}}, @litchar{#CI}, @litchar{#Ci}, or
|
|
@litchar{#cI}, then it recursively reads the following datum in
|
|
case-insensitive mode. If the reader encounters
|
|
@as-index{@litchar{#cs}}, @litchar{#CS}, @litchar{#Cs}, or
|
|
@litchar{#cS}, then it recursively reads the following datum in
|
|
case-sensitive mode.
|
|
|
|
@reader-examples[#:symbols? #f
|
|
"Apple"
|
|
"Ap#ple"
|
|
"Ap ple"
|
|
"Ap| |ple"
|
|
"Ap\\ ple"
|
|
"#ci Apple"
|
|
"#ci |A|pple"
|
|
"#ci \\Apple"
|
|
"#ci#cs Apple"
|
|
"#%Apple"
|
|
]
|
|
|
|
@section[#:tag "parse-number"]{Reading Numbers}
|
|
|
|
@guideintro["numbers"]{the syntax of numbers}
|
|
|
|
@section-index["numbers" "parsing"]
|
|
|
|
A sequence that does not start with a delimiter is parsed as a number
|
|
when it matches the following grammar case-insenstively for
|
|
@nonterm{number@sub{10}} (decimal), where @metavar{n} is a
|
|
meta-meta-variable in the grammar.
|
|
|
|
A number is optionally prefixed by an exactness specifier,
|
|
@as-index{@litchar{#e}} (exact) or @as-index{@litchar{#i}} (inexact),
|
|
which specifies its parsing as an exact or inexact number; see
|
|
@secref["numbers"] for information on number exactness. As the
|
|
non-terminal names suggest, a number that has no exactness specifier
|
|
and matches only @nunterm{inexact-number} is normally parsed as an
|
|
inexact number, otherwise it is parsed as an exact number. If the
|
|
@racket[read-decimal-as-inexact] @tech{parameter} is set to @racket[#f], then
|
|
all numbers without an exactness specifier are instead parsed as
|
|
exact.
|
|
|
|
If the reader encounters @as-index{@litchar{#b}} (binary),
|
|
@as-index{@litchar{#o}} (octal), @as-index{@litchar{#d}} (decimal), or
|
|
@as-index{@litchar{#x}} (hexadecimal), it must be followed by a
|
|
sequence that is terminated by a delimiter or end-of-file, and that
|
|
matches the @nonterm{general-number@sub{2}},
|
|
@nonterm{general-number@sub{8}}, @nonterm{general-number@sub{10}}, or
|
|
@nonterm{general-number@sub{16}} grammar, respectively.
|
|
|
|
An @nunterm{exponent-mark} in an inexact number serves both to specify
|
|
an exponent and to specify a numerical precision. If single-precision
|
|
IEEE floating point is supported (see @secref["numbers"]), the marks
|
|
@litchar{f} and @litchar{s} specify single-precision. Otherwise, or
|
|
with any other mark, double-precision IEEE floating point is used.
|
|
In addition, single- and double-precision specials are distinct;
|
|
specials with the @litchar{.0} suffix, like @racket[-nan.0] are
|
|
double-precision, whereas specials with the @litchar{.f} suffix are
|
|
single-precision.
|
|
|
|
@BNF[(list @nunterm{number} @BNF-alt[@nunterm{exact}
|
|
@nunterm{inexact}])
|
|
(list @nunterm{exact} @BNF-alt[@nunterm{exact-integer}
|
|
@nunterm{exact-rational}]
|
|
@nunterm{exact-complex})
|
|
(list @nunterm{exact-integer} @BNF-seq[@optional{@nonterm{sign}} @nunterm{digits}])
|
|
(list @nunterm{digits} @kleeneplus{@nunterm{digit}})
|
|
(list @nunterm{exact-rational} @BNF-seq[@nunterm{exact-integer} @litchar{/} @nunterm{unsigned-integer}])
|
|
(list @nunterm{exact-complex} @BNF-seq[@nunterm{exact-rational} @nonterm{sign} @nunterm{exact-rational} @litchar{i}])
|
|
(list @nunterm{inexact} @BNF-alt[@nunterm{inexact-real}
|
|
@nunterm{inexact-complex}])
|
|
(list @nunterm{inexact-real} @BNF-seq[@optional{@nonterm{sign}} @nunterm{inexact-normal}]
|
|
@BNF-seq[@nonterm{sign} @nunterm{inexact-special}])
|
|
(list @nunterm{inexact-unsigned} @BNF-alt[@nunterm{inexact-normal}
|
|
@nunterm{inexact-special}])
|
|
(list @nunterm{inexact-normal} @BNF-seq[@nunterm{inexact-simple} @optional{@nunterm{exp-mark}
|
|
@optional[@nonterm{sign}] @nunterm{digits#}}])
|
|
(list @nunterm{inexact-simple} @BNF-seq[@nunterm{digits#} @optional{@litchar{.}} @kleenestar{@litchar{#}}]
|
|
@BNF-seq[@optional{@nunterm{exact-integer}} @litchar{.} @nunterm{digits#}]
|
|
@BNF-seq[@nunterm{digits#} @litchar{/} @nunterm{digits#}])
|
|
(list @nunterm{inexact-special} @BNF-alt[@litchar{inf.0} @litchar{nan.0} @litchar{inf.f} @litchar{nan.f}])
|
|
(list @nunterm{digits#} @BNF-seq[@kleeneplus{@nunterm{digit}} @kleenestar{@litchar{#}}])
|
|
(list @nunterm{inexact-complex} @BNF-seq[@optional{@nunterm{inexact-real}} @nonterm{sign} @nunterm{inexact-unsigned} @litchar{i}]
|
|
@BNF-seq[@nunterm{inexact-real} @litchar["@"] @nunterm{inexact-real}])
|
|
|
|
|
|
(list @nonterm{sign} @BNF-alt[@litchar{+}
|
|
@litchar{-}])
|
|
(list @nonterm{digit@sub{16}} @BNF-alt[@nonterm{digit@sub{10}} @litchar{a} @litchar{b} @litchar{c} @litchar{d}
|
|
@litchar{e} @litchar{f}])
|
|
(list @nonterm{digit@sub{10}} @BNF-alt[@nonterm{digit@sub{8}} @litchar{8} @litchar{9}])
|
|
(list @nonterm{digit@sub{8}} @BNF-alt[@nonterm{digit@sub{2}} @litchar{2} @litchar{3}
|
|
@litchar{4} @litchar{5} @litchar{6} @litchar{7}])
|
|
(list @nonterm{digit@sub{2}} @BNF-alt[@litchar{0} @litchar{1}])
|
|
(list @nonterm{exp-mark@sub{16}} @BNF-alt[@litchar{s} @litchar{d} @litchar{l}])
|
|
(list @nonterm{exp-mark@sub{10}} @BNF-alt[@nonterm{exp-mark@sub{16}} @litchar{e} @litchar{f}])
|
|
(list @nonterm{exp-mark@sub{8}} @nonterm{exp-mark@sub{10}})
|
|
(list @nonterm{exp-mark@sub{2}} @nonterm{exp-mark@sub{10}})
|
|
(list @nunterm{general-number} @BNF-seq[@optional{@nonterm{exactness}} @nunterm{number}])
|
|
(list @nonterm{exactness} @BNF-alt[@litchar{#e} @litchar{#i}])
|
|
]
|
|
|
|
@reader-examples[
|
|
"-1"
|
|
"1/2"
|
|
"1.0"
|
|
"1+2i"
|
|
"1/2+3/4i"
|
|
"1.0+3.0e7i"
|
|
"2e5"
|
|
"#i5"
|
|
"#e2e5"
|
|
"#x2e5"
|
|
"#b101"
|
|
]
|
|
|
|
@section[#:tag "parse-boolean"]{Reading Booleans}
|
|
|
|
A @as-index{@litchar{#true}}, @as-index{@litchar{#t}},
|
|
@as-index{@litchar{#T}} followed by a delimiter is the input syntax
|
|
for the boolean constant ``true,'' and @as-index{@litchar{#false}},
|
|
@as-index{@litchar{#f}}, or @as-index{@litchar{#F}} followed by a
|
|
delimiter is the complete input syntax for the boolean constant
|
|
``false.''
|
|
|
|
|
|
@section[#:tag "parse-pair"]{Reading Pairs and Lists}
|
|
|
|
When the reader encounters a @as-index{@litchar{(}},
|
|
@as-index{@litchar{[}}, or @as-index{@litchar["{"]}, it starts
|
|
parsing a pair or list; see @secref["pairs"] for information on pairs
|
|
and lists.
|
|
|
|
To parse the pair or list, the reader recursively reads data
|
|
until a matching @as-index{@litchar{)}}, @as-index{@litchar{]}}, or
|
|
@as-index{@litchar["}"]} (respectively) is found, and it specially handles
|
|
a delimited @litchar{.}. Pairs @litchar{()}, @litchar{[]}, and
|
|
@litchar{{}} are treated the same way, so the remainder of this
|
|
section simply uses ``parentheses'' to mean any of these pair.
|
|
|
|
If the reader finds no delimited @as-index{@litchar{.}} among the elements
|
|
between parentheses, then it produces a list containing the results of
|
|
the recursive reads.
|
|
|
|
If the reader finds two data between the matching parentheses
|
|
that are separated by a delimited @litchar{.}, then it creates a
|
|
pair. More generally, if it finds two or more data where the
|
|
last datum is preceded by a delimited @litchar{.}, then it constructs
|
|
nested pairs: the next-to-last element is paired with the last, then
|
|
the third-to-last datum is paired with that pair, and so on.
|
|
|
|
If the reader finds three or more data between the matching
|
|
parentheses, and if a pair of delimited @litchar{.}s surrounds any
|
|
other than the first and last elements, the result is a list
|
|
containing the element surrounded by @litchar{.}s as the first
|
|
element, followed by the others in the read order. This convention
|
|
supports a kind of @as-index{infix} notation at the reader level.
|
|
|
|
In @racket[read-syntax] mode, the recursive reads for the pair/list
|
|
elements are themselves in @racket[read-syntax] mode, so that the
|
|
result is a list or pair of syntax objects that is itself wrapped as a
|
|
syntax object. If the reader constructs nested pairs because the input
|
|
included a single delimited @litchar{.}, then only the innermost pair
|
|
and outtermost pair are wrapped as syntax objects. Whether wrapping a
|
|
pair or list, if the pair or list was formed with @litchar{[} and
|
|
@litchar{]}, then a @indexed-racket['paren-shape] property is attached
|
|
to the result with the value @racket[#\[]; if the list or pair was
|
|
formed with @litchar["{"] and @litchar["}"], then a
|
|
@racket['paren-shape] property is attached to the result with the
|
|
value @racket[#\{].
|
|
|
|
If a delimited @litchar{.} appears in any other configuration, then
|
|
the @exnraise[exn:fail:read]. Similarly, if the reader encounters a
|
|
@litchar{)}, @litchar{]}, or @litchar["}"] that does not end a list
|
|
being parsed, then the @exnraise[exn:fail:read].
|
|
|
|
@reader-examples[
|
|
"()"
|
|
"(1 2 3)"
|
|
"{1 2 3}"
|
|
"[1 2 3]"
|
|
"(1 (2) 3)"
|
|
"(1 . 3)"
|
|
"(1 . (3))"
|
|
"(1 . 2 . 3)"
|
|
]
|
|
|
|
If the @racket[read-square-bracket-as-paren] @tech{parameter} is set to
|
|
@racket[#f], then when the reader encounters @litchar{[} and
|
|
@litchar{]}, the @exnraise{exn:fail:read}. Similarly, if the
|
|
@racket[read-curly-brace-as-paren] @tech{parameter} is set to @racket[#f],
|
|
then when the reader encounters @litchar["{"] and @litchar["}"], the
|
|
@exnraise{exn:fail:read}.
|
|
|
|
If the @racket[read-accept-dot] @tech{parameter} is set to
|
|
@racket[#f], then a delimited @litchar{.} triggers an
|
|
@racket[exn:fail:read] exception. If the
|
|
@racket[read-accept-infix-dot] @tech{parameter} is set to @racket[#f],
|
|
then multiple delimited @litchar{.}s trigger an @racket[exn:fail:read]
|
|
exception, instead of the infix conversion.
|
|
|
|
@section[#:tag "parse-string"]{Reading Strings}
|
|
|
|
@guideintro["strings"]{the syntax of strings}
|
|
|
|
@section-index["strings" "parsing"]
|
|
|
|
When the reader encounters @as-index{@litchar{"}}, it begins parsing
|
|
characters to form a string. The string continues until it is
|
|
terminated by another @litchar{"} (that is not escaped by
|
|
@litchar{\}).
|
|
|
|
Within a string sequence, the following escape sequences are
|
|
recognized:
|
|
|
|
@itemize[
|
|
|
|
@item{@as-index{@litchar{\a}}: alarm (ASCII 7)}
|
|
@item{@as-index{@litchar{\b}}: backspace (ASCII 8)}
|
|
@item{@as-index{@litchar{\t}}: tab (ASCII 9)}
|
|
@item{@as-index{@litchar{\n}}: linefeed (ASCII 10)}
|
|
@item{@as-index{@litchar{\v}}: vertical tab (ASCII 11)}
|
|
@item{@as-index{@litchar{\f}}: formfeed (ASCII 12)}
|
|
@item{@as-index{@litchar{\r}}: return (ASCII 13)}
|
|
@item{@as-index{@litchar{\e}}: escape (ASCII 27)}
|
|
|
|
@item{@as-index{@litchar{\"}}: double-quotes (without terminating the string)}
|
|
@item{@as-index{@litchar{\'}}: quote (i.e., the backslash has no effect)}
|
|
@item{@as-index{@litchar{\\}}: backslash (i.e., the second is not an escaping backslash)}
|
|
|
|
@item{@as-index{@litchar{\}@kleenerange[1 3]{@nonterm{digit@sub{8}}}}:
|
|
Unicode for the octal number specified by @kleenerange[1
|
|
3]{digit@sub{8}} (i.e., 1 to 3 @nonterm{digit@sub{8}}s), where
|
|
each @nonterm{digit@sub{8}} is @litchar{0}, @litchar{1},
|
|
@litchar{2}, @litchar{3}, @litchar{4}, @litchar{5},
|
|
@litchar{6}, or @litchar{7}. A longer form takes precedence
|
|
over a shorter form, and the resulting octal number must be
|
|
between 0 and 255 decimal, otherwise the
|
|
@exnraise[exn:fail:read].}
|
|
|
|
@item{@as-index{@litchar{\x}@kleenerange[1
|
|
2]{@nonterm{digit@sub{16}}}}: Unicode for the hexadecimal
|
|
number specified by @kleenerange[1 2]{@nonterm{digit@sub{16}}},
|
|
where each @nonterm{digit@sub{16}} is @litchar{0}, @litchar{1},
|
|
@litchar{2}, @litchar{3}, @litchar{4}, @litchar{5},
|
|
@litchar{6}, @litchar{7}, @litchar{8}, @litchar{9},
|
|
@litchar{a}, @litchar{b}, @litchar{c}, @litchar{d},
|
|
@litchar{e}, or @litchar{f} (case-insensitive). The longer form
|
|
takes precedence over the shorter form.}
|
|
|
|
@item{@as-index{@litchar{\u}@kleenerange[1
|
|
4]{@nonterm{digit@sub{16}}}}: like @litchar{\x}, but with up
|
|
to four hexadecimal digits (longer sequences take precedence).
|
|
The resulting hexadecimal number must be a valid argument to
|
|
@racket[integer->char], otherwise the
|
|
@exnraise[exn:fail:read].}
|
|
|
|
@item{@as-index{@litchar{\U}@kleenerange[1
|
|
8]{@nonterm{digit@sub{16}}}}: like @litchar{\x}, but with up
|
|
to eight hexadecimal digits (longer sequences take precedence).
|
|
The resulting hexadecimal number must be a valid argument to
|
|
@racket[integer->char], otherwise the
|
|
@exnraise[exn:fail:read].}
|
|
|
|
@item{@as-index{@litchar{\}@nonterm{newline}}: elided, where
|
|
@nonterm{newline} is either a linefeed, carriage return, or
|
|
carriage return--linefeed combination. This convention allows
|
|
single-line strings to span multiple lines in the source.}
|
|
|
|
]
|
|
|
|
If the reader encounteres any other use of a backslash in a string
|
|
constant, the @exnraise[exn:fail:read].
|
|
|
|
@guideintro["bytestrings"]{the syntax of byte strings}
|
|
|
|
@section-index["byte strings" "parsing"]
|
|
|
|
A string constant preceded by @litchar{#} is parsed as a
|
|
byte string. (That is, @as-index{@litchar{#"}} starts a byte-string
|
|
literal.) See @secref["bytestrings"] for information on byte
|
|
strings. Byte-string constants support the same escape sequences as
|
|
character strings, except @litchar{\u} and @litchar{\U}.
|
|
|
|
When the reader encounters @as-index{@litchar{#<<}}, it starts parsing a
|
|
@pidefterm{here string}. The characters following @litchar{#<<} until
|
|
a newline character define a terminator for the string. The content of
|
|
the string includes all characters between the @litchar{#<<} line and
|
|
a line whose only content is the specified terminator. More precisely,
|
|
the content of the string starts after a newline following
|
|
@litchar{#<<}, and it ends before a newline that is followed by the
|
|
terminator, where the terminator is itself followed by either a
|
|
newline or end-of-file. No escape sequences are recognized between the
|
|
starting and terminating lines; all characters are included in the
|
|
string (and terminator) literally. A return character is not treated
|
|
as a line separator in this context. If no characters appear between
|
|
@litchar{#<<} and a newline or end-of-file, or if an end-of-file is
|
|
encountered before a terminating line, the @exnraise[exn:fail:read].
|
|
|
|
@reader-examples[
|
|
"\"Apple\""
|
|
"\"\\x41pple\""
|
|
"\"\\\"Apple\\\"\""
|
|
"\"\\\\\""
|
|
"#\"Apple\""
|
|
]
|
|
|
|
@section[#:tag "parse-quote"]{Reading Quotes}
|
|
|
|
When the reader enounters @as-index{@litchar{'}}, it recursively
|
|
reads one datum and forms a new list containing the symbol
|
|
@racket['quote] and the following datum. This convention is mainly
|
|
useful for reading Racket code, where @racket['s] can be used as a
|
|
shorthand for @racket[(code:quote s)].
|
|
|
|
Several other sequences are recognized and transformed in a similar
|
|
way. Longer prefixes take precedence over short ones:
|
|
|
|
@read-quote-table[(list @litchar{'} @racket[quote])
|
|
(list @as-index{@litchar{`}} @racket[quasiquote])
|
|
(list @as-index{@litchar{,}} @racket[unquote])
|
|
(list @as-index{@litchar[",@"]} @racket[unquote-splicing])
|
|
(list @as-index{@litchar{#'}} @racket[syntax])
|
|
(list @as-index{@litchar{#`}} @racket[quasisyntax])
|
|
(list @as-index{@litchar{#,}} @racket[unsyntax])
|
|
(list @as-index{@litchar["#,@"]} @racket[unsyntax-splicing])]
|
|
|
|
@reader-examples[
|
|
"'apple"
|
|
"`(1 ,2)"
|
|
]
|
|
|
|
The @litchar{`}, @litchar{,}, and @litchar[",@"] forms are disabled when
|
|
the @racket[read-accept-quasiquote] @tech{parameter} is set to
|
|
@racket[#f], in which case the @exnraise[exn:fail:read] instead.
|
|
|
|
@section[#:tag "parse-comment"]{Reading Comments}
|
|
|
|
A @as-index{@litchar{;}} starts a line comment. When the reader
|
|
encounters @litchar{;}, it skips past all characters until the
|
|
next linefeed (ASCII 10), carriage return (ASCII 13), next-line
|
|
(Unicode @racket[#x0085]), line-separator (Unicode @racket[#x2028]),
|
|
or paragraph-separator (Unicode @racket[#x2029]) character.
|
|
|
|
A @as-index{@litchar{#|}} starts a nestable block comment. When the
|
|
reader encounters @litchar{#|}, it skips past all characters
|
|
until a closing @litchar{|#}. Pairs of matching @litchar{#|} and
|
|
@litchar{|#} can be nested.
|
|
|
|
A @as-index{@litchar{#;}} starts an S-expression comment. When the
|
|
reader encounters @litchar{#;}, it recursively reads one datum, and
|
|
then discards it (continuing on to the next datum for the read
|
|
result).
|
|
|
|
A @as-index{@litchar{#! }} (which is @litchar{#!} followed by a space)
|
|
or @as-index{@litchar{#!/}} starts a line comment that can be
|
|
continued to the next line by ending a line with @litchar{\}. This
|
|
form of comment normally appears at the beginning of a Unix script
|
|
file.
|
|
|
|
@reader-examples[
|
|
"; comment"
|
|
"#| a |# 1"
|
|
"#| #| a |# 1 |# 2"
|
|
"#;1 2"
|
|
"#!/bin/sh"
|
|
"#! /bin/sh"
|
|
]
|
|
|
|
@section[#:tag "parse-vector"]{Reading Vectors}
|
|
|
|
When the reader encounters a @litchar{#(}, @litchar{#[}, or
|
|
@litchar["#{"], it starts parsing a vector; see @secref["vectors"] for
|
|
information on vectors. The @litchar{#[} and @litchar["#{"] forms can
|
|
be disabled through the @racket[read-square-bracket-as-paren] and
|
|
@racket[read-curly-brace-as-paren] @tech{parameters}.
|
|
|
|
The elements of the vector are recursively read until a matching
|
|
@litchar{)}, @litchar{]}, or @litchar["}"] is found, just as for
|
|
lists (see @secref["parse-pair"]). A delimited @litchar{.} is not
|
|
allowed among the vector elements.
|
|
|
|
An optional vector length can be specified between the @litchar{#} and
|
|
@litchar{(}, @litchar{[}, or @litchar["{"]. The size is specified
|
|
using a sequence of decimal digits, and the number of elements
|
|
provided for the vector must be no more than the specified size. If
|
|
fewer elements are provided, the last provided element is used for the
|
|
remaining vector slots; if no elements are provided, then @racket[0]
|
|
is used for all slots.
|
|
|
|
In @racket[read-syntax] mode, each recursive read for the vector
|
|
elements is also in @racket[read-syntax] mode, so that the wrapped
|
|
vector's elements are also wrapped as syntax objects, and the vector is
|
|
immutable.
|
|
|
|
@reader-examples[
|
|
"#(1 apple 3)"
|
|
"#3(\"apple\" \"banana\")"
|
|
"#3()"
|
|
]
|
|
|
|
|
|
@section[#:tag "parse-structure"]{Reading Structures}
|
|
|
|
When the reader encounters a @litchar{#s(}, @litchar{#s[}, or
|
|
@litchar["#s{"], it starts parsing an instance of a @tech{prefab}
|
|
@tech{structure type}; see @secref["structures"] for information on
|
|
@tech{structure types}. The @litchar{#s[} and @litchar["#s{"] forms
|
|
can be disabled through the @racket[read-square-bracket-as-paren] and
|
|
@racket[read-curly-brace-as-paren] @tech{parameters}.
|
|
|
|
The elements of the structure are recursively read until a matching
|
|
@litchar{)}, @litchar{]}, or @litchar["}"] is found, just as for lists
|
|
(see @secref["parse-pair"]). A single delimited @litchar{.} is not
|
|
allowed among the elements, but two @litchar{.}s can be used as in a
|
|
list for an infix conversion.
|
|
|
|
The first element is used as the structure descriptor, and it must
|
|
have the form (when quoted) of a possible argument to
|
|
@racket[make-prefab-struct]; in the simplest case, it can be a
|
|
symbol. The remaining elements correspond to field values within the
|
|
structure.
|
|
|
|
In @racket[read-syntax] mode, the structure type must not have any
|
|
mutable fields. The structure's elements are read in
|
|
@racket[read-syntax] mode, so that the wrapped structure's elements
|
|
are also wrapped as syntax objects.
|
|
|
|
If the first structure element is not a valid @tech{prefab} structure
|
|
type key, or if the number of provided fields is inconsistent with the
|
|
indicated @tech{prefab} structure type, the @exnraise[exn:fail:read].
|
|
|
|
|
|
@section[#:tag "parse-hashtable"]{Reading Hash Tables}
|
|
|
|
A @as-index{@litchar{#hash}} starts an immutable hash-table constant
|
|
with key matching based on @racket[equal?]. The characters after
|
|
@litchar{hash} must parse as a list of pairs (see
|
|
@secref["parse-pair"]) with a specific use of delimited @litchar{.}:
|
|
it must appear between the elements of each pair in the list and
|
|
nowhere in the sequence of list elements. The first element of each
|
|
pair is used as the key for a table entry, and the second element of
|
|
each pair is the associated value.
|
|
|
|
A @as-index{@litchar{#hasheq}} starts a hash table like
|
|
@litchar{#hash}, except that it constructs a hash table based on
|
|
@racket[eq?] instead of @racket[equal?].
|
|
|
|
A @as-index{@litchar{#hasheqv}} starts a hash table like
|
|
@litchar{#hash}, except that it constructs a hash table based on
|
|
@racket[eqv?] instead of @racket[equal?].
|
|
|
|
In all cases, the table is constructed by adding each mapping to the
|
|
hash table from left to right, so later mappings can hide earlier
|
|
mappings if the keys are equivalent.
|
|
|
|
@reader-examples[
|
|
#:example-note @elem{, where @racket[make-...] stands for @racket[make-immutable-hash]}
|
|
"#hash()"
|
|
"#hasheq()"
|
|
"#hash((\"a\" . 5))"
|
|
"#hasheq((a . 5) (b . 7))"
|
|
"#hasheq((a . 5) (a . 7))"
|
|
]
|
|
|
|
@section[#:tag "parse-box"]{Reading Boxes}
|
|
|
|
When the reader encounters a @as-index{@litchar{#&}}, it starts
|
|
parsing a box; see @secref["boxes"] for information on boxes. The
|
|
content of the box is determined by recursively reading the next
|
|
datum.
|
|
|
|
In @racket[read-syntax] mode, the recursive read for the box content
|
|
is also in @racket[read-syntax] mode, so that the wrapped box's
|
|
content is also wrapped as a syntax object, and the box is immutable.
|
|
|
|
@reader-examples[
|
|
"#&17"
|
|
]
|
|
|
|
@section[#:tag "parse-character"]{Reading Characters}
|
|
|
|
@guideintro["characters"]{the syntax of characters}
|
|
|
|
A @as-index{@litchar{#\}} starts a character constant, which has
|
|
one of the following forms:
|
|
|
|
@itemize[
|
|
|
|
@item{ @litchar{#\nul} or @litchar{#\null}: NUL (ASCII 0)@nonalpha[]}
|
|
@item{ @litchar{#\backspace}: backspace (ASCII 8)@nonalpha[]}
|
|
@item{ @litchar{#\tab}: tab (ASCII 9)@nonalpha[]}
|
|
@item{ @litchar{#\newline} or @litchar{#\linefeed}: linefeed (ASCII 10)@nonalpha[]}
|
|
@item{ @litchar{#\vtab}: vertical tab (ASCII 11)@nonalpha[]}
|
|
@item{ @litchar{#\page}: page break (ASCII 12)@nonalpha[]}
|
|
@item{ @litchar{#\return}: carriage return (ASCII 13)@nonalpha[]}
|
|
@item{ @litchar{#\space}: space (ASCII 32)@nonalpha[]}
|
|
@item{ @litchar{#\rubout}: delete (ASCII 127)@nonalpha[]}
|
|
|
|
@item{@litchar{#\}@kleenerange[1 3]{@nonterm{digit@sub{8}}}:
|
|
Unicode for the octal number specified by @kleenerange[1
|
|
3]{@nonterm{digit@sub{8}}}, as in string escapes (see
|
|
@secref["parse-string"]).}
|
|
|
|
@item{@litchar{#\x}@kleenerange[1 2]{@nonterm{digit@sub{16}}}:
|
|
Unicode for the hexadecimal number specified by @kleenerange[1
|
|
2]{@nonterm{digit@sub{16}}}, as in string escapes (see
|
|
@secref["parse-string"]).}
|
|
|
|
@item{@litchar{#\u}@kleenerange[1 4]{@nonterm{digit@sub{16}}}:
|
|
like @litchar{#\x}, but with up to four hexadecimal digits.}
|
|
|
|
@item{@litchar{#\U}@kleenerange[1 6]{@nonterm{digit@sub{16}}}:
|
|
like @litchar{#\x}, but with up to six hexadecimal digits.}
|
|
|
|
@item{@litchar{#\}@nonterm{c}: the character @nonterm{c}, as long
|
|
as @litchar{#\}@nonterm{c} and the characters following it
|
|
do not match any of the previous cases, and as long as the
|
|
character after @nonterm{c} is not
|
|
@racketlink[char-alphabetic?]{alphabetic}.}
|
|
|
|
]
|
|
|
|
@reader-examples[
|
|
"#\\newline"
|
|
"#\\n"
|
|
"#\\u3BB"
|
|
"#\\\u3BB"
|
|
]
|
|
|
|
@section[#:tag "parse-keyword"]{Reading Keywords}
|
|
|
|
A @as-index{@litchar{#:}} starts a keyword. The parsing of a keyword
|
|
after the @litchar{#:} is the same as for a symbol, including
|
|
case-folding in case-insensitive mode, except that the part after
|
|
@litchar{#:} is never parsed as a number.
|
|
|
|
@reader-examples[
|
|
"#:Apple"
|
|
"#:1"
|
|
]
|
|
|
|
@section[#:tag "parse-regexp"]{Reading Regular Expressions}
|
|
|
|
A @as-index{@litchar{#rx}} or @as-index{@litchar{#px}} starts a
|
|
regular expression. The characters immediately after @litchar{#rx} or
|
|
@litchar{#px} must parse as a string or byte string (see
|
|
@secref["parse-string"]). A @litchar{#rx} prefix starts a regular
|
|
expression as would be constructed by @racket[regexp], @litchar{#px}
|
|
as constructed by @racket[pregexp], @litchar{#rx#} as constructed by
|
|
@racket[byte-regexp], and @litchar{#px#} as constructed by
|
|
@racket[byte-pregexp].
|
|
|
|
@reader-examples[
|
|
"#rx\".*\""
|
|
"#px\"[\\\\s]*\""
|
|
"#rx#\".*\""
|
|
"#px#\"[\\\\s]*\""
|
|
]
|
|
|
|
@section[#:tag "parse-graph"]{Reading Graph Structure}
|
|
|
|
@section-index["#0="]
|
|
@section-index["#0#"]
|
|
|
|
A @graph-defn[] tags the following datum for reference via
|
|
@graph-ref[], which allows the reader to produce a datum that
|
|
has graph structure.
|
|
|
|
For a specific @graph-tag[] in a single read result, each @graph-ref[]
|
|
reference is replaced by the datum read for the corresponding
|
|
@graph-defn[]; the definition @graph-defn[] also produces just the
|
|
datum after it. A @graph-defn[] definition can appear at most once,
|
|
and a @graph-defn[] definition must appear before a @graph-ref[]
|
|
reference appears, otherwise the @exnraise[exn:fail:read]. If the
|
|
@racket[read-accept-graph] parameter is set to @racket[#f], then
|
|
@graph-defn[] or @graph-ref[] triggers a @racket[exn:fail:read]
|
|
exception.
|
|
|
|
Although a comment parsed via @litchar{#;} discards the datum
|
|
afterward, @graph-defn[] definitions in the discarded datum
|
|
still can be referenced by other parts of the reader input, as long as
|
|
both the comment and the reference are grouped together by some other
|
|
form (i.e., some recursive read); a top-level @litchar{#;} comment
|
|
neither defines nor uses graph tags for other top-level forms.
|
|
|
|
@reader-examples[
|
|
"(#1=100 #1# #1#)"
|
|
"#0=(1 . #0#)"
|
|
]
|
|
|
|
@local-table-of-contents[]
|
|
|
|
@section[#:tag "parse-reader"]{Reading via an Extension}
|
|
|
|
@guideintro["hash-reader"]{reader extension}
|
|
|
|
When the reader encounters @as-index{@litchar{#reader}}, it loads
|
|
an external reader procedure and applies it to the current input
|
|
stream.
|
|
|
|
The reader recursively reads the next datum after @litchar{#reader},
|
|
and passes it to the procedure that is the value of the
|
|
@racket[current-reader-guard] @tech{parameter}; the result is used as a
|
|
module path. The module path is passed to @racket[dynamic-require]
|
|
with either @racket['read] or @racket['read-syntax] (depending on
|
|
whether the reader is in @racket[read] or @racket[read-syntax]
|
|
mode).
|
|
|
|
The arity of the resulting procedure determines whether it accepts
|
|
extra source-location information: a @racketidfont{read} procedure
|
|
accepts either one argument (an input port) or five, and a
|
|
@racketidfont{read-syntax} procedure accepts either two arguments (a
|
|
name value and an input port) or six. In either case, the four
|
|
optional arguments are the reader's module path (as a syntax object in
|
|
@racket[read-syntax] mode) followed by the line (positive exact
|
|
integer or @racket[#f]), column (non-negative exact integer or
|
|
@racket[#f]), and position (positive exact integer or @racket[#f]) of
|
|
the start of the @litchar{#reader} form. The input port is the one
|
|
whose stream contained @litchar{#reader}, where the stream position is
|
|
immediately after the recursively read module path.
|
|
|
|
The procedure should produce a datum result. If the result is a
|
|
syntax object in @racket[read] mode, then it is converted to a datum
|
|
using @racket[syntax->datum]; if the result is not a syntax object in
|
|
@racket[read-syntax] mode, then it is converted to one using
|
|
@racket[datum->syntax]. See also @secref["reader-procs"] for
|
|
information on the procedure's results.
|
|
|
|
If the @racket[read-accept-reader] @tech{parameter} is set to
|
|
@racket[#f], then if the reader encounters @litchar{#reader}, the
|
|
@exnraise[exn:fail:read].
|
|
|
|
@guideintro["hash-lang"]{@racketmodfont["#lang"]}
|
|
|
|
The @as-index{@litchar{#lang}} reader form is similar to
|
|
@litchar{#reader}, but more constrained: the @litchar{#lang} must be
|
|
followed by a single space (ASCII 32), and then a non-empty sequence
|
|
of alphanumeric ASCII, @litchar{+}, @litchar{-}, @litchar{_}, and/or
|
|
@litchar{/} characters terminated by
|
|
@racketlink[char-whitespace?]{whitespace} or an end-of-file. The
|
|
sequence must not start or end with @litchar{/}. A sequence
|
|
@litchar{#lang }@nonterm{name} is equivalent to
|
|
@litchar{#reader }@nonterm{name}@litchar{/lang/reader}. Note
|
|
that the terminating whitespace (if any) is not consumed before the
|
|
external reading procedure is called.
|
|
|
|
@guideintro["hash-languages"]{the creation languages for @hash-lang[]}
|
|
|
|
Finally, @as-index{@litchar{#!}} is an alias for @litchar{#lang}
|
|
followed by a space when @litchar{#!} is followed by alphanumeric
|
|
ASCII, @litchar{+}, @litchar{-}, or @litchar{_}. Use of this alias
|
|
is discouraged except as needed to construct programs that conform to
|
|
certain grammars, such as that of R@superscript{6}RS
|
|
@cite["Sperber07"].
|
|
|
|
@margin-note{The @racketmodname[syntax/module-reader] library provides a
|
|
domain-specific language for writing language readers.}
|
|
|
|
By convention, @litchar{#lang} normally appears at the beginning of a
|
|
file, possibly after comment forms, to specify the syntax of a module.
|
|
|
|
If the @racket[read-accept-reader] or @racket[read-accept-lang]
|
|
@tech{parameter} is set to @racket[#f], then if the reader encounters
|
|
@litchar{#lang} or equivalent @litchar{#!}, the @exnraise[exn:fail:read].
|
|
|
|
@subsection{S-Expression Reader Language}
|
|
|
|
@defmodulelang[s-exp]
|
|
|
|
@guideintro["s-exp"]{the @racketmodname[s-exp] meta-language}
|
|
|
|
The @racket[s-exp] ``language'' is a kind of meta-language. It
|
|
@racket[read]s the S-expression that follows @litchar{#lang s-exp} and
|
|
uses it as the language of a @racket[module] form. It also reads all
|
|
remaining S-expressions until an end-of-file, using them for the body
|
|
of the generated @racket[module].
|
|
|
|
That is,
|
|
|
|
@racketmod[
|
|
s-exp _module-path
|
|
_form ...
|
|
]
|
|
|
|
is equivalent to
|
|
|
|
@racketblock[
|
|
(module _name _module-path
|
|
_form ...)
|
|
]
|
|
|
|
@subsection{Chaining Reader Language}
|
|
|
|
@defmodulelang[reader]
|
|
|
|
@guideintro["hash-lang reader"]{the @racketmodname[reader] meta-language}
|
|
|
|
The @racket[reader] ``language'' is a kind of meta-language. It
|
|
@racket[read]s the S-expression that follows @litchar{#lang reader}
|
|
and uses it as a module path (relative to the module being read) that
|
|
effectively takes the place of @racketmodname[reader]. In other words,
|
|
the @racketmodname[reader] meta-language generalizes the syntax of the
|
|
module specified after @hash-lang[] to be a module path, and without
|
|
the implicit addition of @litchar{/lang/reader} to the path.
|
|
|
|
|
|
@section[#:tag "parse-honu"]{Honu Parsing}
|
|
|
|
See @|HonuManual| for information on @litchar{#hx} and
|
|
@litchar{#sx}.
|