racket/collects/scribblings/guide/reader-extension.scrbl
2010-07-24 20:43:17 -05:00


#lang scribble/doc
@(require scribble/manual
scribble/bnf
scribble/eval
(for-label racket/match
syntax/readerr)
"guide-utils.ss"
"modfile.rkt")
@title[#:tag "hash-reader"]{Reader Extensions}
The @tech{reader} layer of the Racket language can be extended through
the @racketmetafont{#reader} form. A reader extension is implemented
as a module that is named after @racketmetafont{#reader}. The module
exports functions that parse raw characters into a form to be consumed
by the @tech{expander} layer.
The syntax of @racketmetafont{#reader} is
@racketblock[@#,(BNF-seq @litchar{#reader} @nonterm{module-path} @nonterm{reader-specific})]
where @nonterm{module-path} names a module that provides
@racketidfont{read} and @racketidfont{read-syntax} functions. The
@nonterm{reader-specific} part is a sequence of characters that is
parsed as determined by the @racketidfont{read} and
@racketidfont{read-syntax} functions from @nonterm{module-path}.
For example, suppose that file @filepath{five.rkt} contains
@racketmodfile["five.rkt"]
Then, the program
@racketmod[
racket/base
'(1 @#,(elem @racketmetafont{#reader} @racket["five.rkt"] @tt{23456} @racket[7]) 8)
]
is equivalent to
@racketmod[
racket/base
'(1 ("23456") 7 8)
]
because the @racketidfont{read} and @racketidfont{read-syntax}
functions of @filepath{five.rkt} both read five characters from the
input stream and put them into a string and then a list. The reader
functions from @filepath{five.rkt} are not obliged to follow Racket
lexical conventions and treat the continuous sequence @litchar{234567}
as a single number. Since only the @litchar{23456} part is consumed by
@racketidfont{read} or @racketidfont{read-syntax}, the @litchar{7}
remains to be parsed in the usual Racket way. Similarly, the reader
functions from @filepath{five.rkt} are not obliged to ignore
whitespace, and
@racketmod[
racket/base
'(1 @#,(elem @racketmetafont{#reader} @racket["five.rkt"] @hspace[1] @tt{2345} @racket[67]) 8)
]
is equivalent to
@racketmod[
racket/base
'(1 (" 2345") 67 8)
]
since the first character immediately after @racket["five.rkt"] is a
space.
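The character counting described above can be checked without a
separate file. The sketch below assumes that the @racketidfont{read}
function of @filepath{five.rkt} just reads five characters with
@racket[read-string]. The hypothetical @racket[read-five] stands in for
it, and each string port holds the characters that follow
@racket["five.rkt"] in the corresponding program:

@codeblock|{
#lang racket/base
;; Hypothetical stand-in for the read function of "five.rkt":
;; consume exactly five characters and wrap them in a list.
(define (read-five in)
  (list (read-string 5 in)))

;; Characters after #reader"five.rkt" in '(1 #reader"five.rkt"234567 8)
(define in1 (open-input-string "234567"))
(define part1 (read-five in1)) ; consumes only "23456"
(define rest1 (read in1))      ; ordinary reading resumes at "7"

;; Characters after #reader"five.rkt" in '(1 #reader"five.rkt" 234567 8)
(define in2 (open-input-string " 234567"))
(define part2 (read-five in2)) ; the leading space is one of the five
(define rest2 (read in2))      ; ordinary reading resumes at "67"
}|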
A @racketmetafont{#reader} form can be used in the @tech{REPL}, too:
@interaction[
(eval:alts '@#,(elem @racketmetafont{#reader}@racket["five.rkt"]@tt{abcde}) '#reader"five.rkt"abcde)
]
@; ----------------------------------------------------------------------
@section{Source Locations}
The difference between @racketidfont{read} and
@racketidfont{read-syntax} is that @racketidfont{read} is meant to be
used for data while @racketidfont{read-syntax} is meant to be used to
parse programs. More precisely, the extension's @racketidfont{read}
function is used when the enclosing stream is being parsed by Racket's
@racket[read] function, and its @racketidfont{read-syntax} function is
used when the enclosing stream is being parsed by Racket's
@racket[read-syntax] function. Nothing requires @racketidfont{read} and
@racketidfont{read-syntax} to parse input in the same way, but making
them different would confuse programmers and tools.
The @racketidfont{read-syntax} function can return the same kind of
value as @racketidfont{read}, but it should normally return a
@tech{syntax object} that connects the parsed expression with source
locations. Unlike the @filepath{five.rkt} example, the
@racketidfont{read-syntax} function is typically implemented directly
to produce @tech{syntax objects}, and then @racketidfont{read} can use
@racketidfont{read-syntax} and strip away @tech{syntax object}
wrappers to produce a raw result.
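Here is a minimal sketch of that arrangement. It uses a hypothetical
reader that turns the rest of the current line into a string.
@racket[line-read-syntax] builds a @tech{syntax object} that carries a
source location, and @racket[line-read] reuses it and strips the
wrapper. Both names are illustrative. As a reader module would, the
sketch exports them as @racketidfont{read} and
@racketidfont{read-syntax}:

@codeblock|{
#lang racket/base
;; Hypothetical reader: the rest of the current line becomes a string.
(provide (rename-out [line-read read]
                     [line-read-syntax read-syntax]))

;; read-syntax does the real work and records a source location.
(define (line-read-syntax src in)
  (define-values (line col pos) (port-next-location in))
  (define text (read-line in))
  (define-values (end-line end-col end-pos) (port-next-location in))
  (if (eof-object? text)
      text
      ;; srcloc: source, line, column, position, span (chars consumed,
      ;; including the newline)
      (datum->syntax #f text (list src line col pos (- end-pos pos)))))

;; read reuses read-syntax and strips away the syntax wrapper.
(define (line-read in)
  (define v (line-read-syntax (object-name in) in))
  (if (eof-object? v) v (syntax->datum v)))
}|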
The following @filepath{arith.rkt} module implements a reader to
parse simple infix arithmetic expressions into Racket forms. For
example, @litchar{1*2+3} parses into the Racket form @racket[(+ (* 1
2) 3)]. The supported operators are @litchar{+}, @litchar{-},
@litchar{*}, and @litchar{/}, while operands can be unsigned integers
or single-letter variables. The implementation uses
@racket[port-next-location] to obtain the current source location, and
it uses @racket[datum->syntax] to turn raw values into @tech{syntax
objects}.
@racketmodfile["arith.rkt"]
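The location bookkeeping that @filepath{arith.rkt} relies on can be
tried in isolation. The following sketch uses an arbitrary input
string and source name. It shows what @racket[port-next-location]
reports with and without @racket[port-count-lines!], and how those
numbers plus a span form the source-location list accepted by
@racket[datum->syntax]:

@codeblock|{
#lang racket/base
;; Without line counting, only the position is tracked. The
;; port-count-lines-enabled parameter is set to #f explicitly, since it
;; decides whether newly created ports start with line counting on.
(define plain
  (parameterize ([port-count-lines-enabled #f])
    (open-input-string "ab\ncd")))
(define-values (line0 col0 pos0) (port-next-location plain))

;; With line counting: lines and positions start at 1, columns at 0.
(define counted (open-input-string "ab\ncd"))
(port-count-lines! counted)
(void (read-string 3 counted)) ; consume "ab" and the newline
(define-values (line1 col1 pos1) (port-next-location counted))

;; The same numbers, plus a span of 2 for "cd", give a syntax location.
(define stx (datum->syntax #f 'cd (list 'example line1 col1 pos1 2)))
}|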
If the @filepath{arith.rkt} reader is used in an expression position,
then its parse result will be treated as a Racket expression. If it is
used in a quoted form, however, then it just produces a number or a
list:
@interaction[
(eval:alts @#,(elem @racketmetafont{#reader}@racket["arith.rkt"]@hspace[1]@tt{1*2+3}) #reader"arith.rkt" 1*2+3 )
(eval:alts '@#,(elem @racketmetafont{#reader}@racket["arith.rkt"]@hspace[1]@tt{1*2+3}) '#reader"arith.rkt" 1*2+3 )
]
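The difference is decided by the @tech{expander}, not by the reader.
In both interactions the reader produces the same form. The sketch
below assumes, as described above, that @litchar{1*2+3} parses to
@racket[(+ (* 1 2) 3)]. It evaluates that datum both directly and under
@racket[quote], in a fresh namespace from @racket[make-base-namespace]:

@codeblock|{
#lang racket/base
;; The form that 1*2+3 parses into, according to the text above.
(define parsed '(+ (* 1 2) 3))

(define ns (make-base-namespace))

;; In expression position, the form is expanded and run as code.
(define as-expression (eval parsed ns))

;; Under quote, the same form is just a list.
(define as-quoted (eval (list 'quote parsed) ns))
}|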
The @filepath{arith.rkt} reader could also be used in positions that
make no sense. Since the @racketidfont{read-syntax} implementation
tracks source locations, syntax errors can at least refer to parts of
the input in terms of their original locations (at the beginning of
the error message):
@interaction[
(eval:alts (let @#,(elem @racketmetafont{#reader}@racket["arith.rkt"]@hspace[1]@tt{1*2+3}) 8)
(eval (parameterize ([read-accept-reader #t])
(read-syntax 'repl (let ([p @open-input-string{(let #reader"arith.rkt" 1*2+3 8)}])
(port-count-lines! p)
p)))))
]
@; ----------------------------------------------------------------------
@section[#:tag "readtable"]{Readtables}
A reader extension's ability to parse input characters in an arbitrary
way can be powerful, but many cases of lexical extension call for a
less general but more composable approach. In much the same way that
the @tech{expander} level of Racket syntax can be extended through
@tech{macros}, the @tech{reader} level of Racket syntax can be
composably extended through a @deftech{readtable}.
The Racket reader is a recursive-descent parser, and the
@tech{readtable} maps characters to parsing handlers. For example, the
default readtable maps @litchar{(} to a handler that recursively
parses subforms until it finds a @litchar{)}. The
@racket[current-readtable] @tech{parameter} determines the
@tech{readtable} that is used by @racket[read] or
@racket[read-syntax]. Rather than parsing raw characters directly, a
reader extension can install an extended @tech{readtable} and then
chain to @racket[read] or @racket[read-syntax].
@guideother{See @secref["parameterize"] for an introduction to
@tech{parameters}.}
The @racket[make-readtable] function constructs a new @tech{readtable}
as an extension of an existing one. It accepts a sequence of
specifications in terms of a character, a type of mapping for the
character, and (for certain types of mappings) a parsing
procedure. For example, to extend the readtable so that @litchar{$}
can be used to start and end infix expressions, implement a
@racket[read-dollar] function and use:
@racketblock[
(make-readtable (current-readtable)
#\$ 'terminating-macro read-dollar)
]
The protocol for @racket[read-dollar] requires the function to accept
different numbers of arguments depending on whether it is being used
in @racket[read] or @racket[read-syntax] mode. In @racket[read] mode,
the parser function is given two arguments: the character that
triggered the parser function and the input port that is being
read. In @racket[read-syntax] mode, the function must accept four
additional arguments that provide the source location of the
character: the source name, line, column, and position.
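One way to meet both conventions with a single procedure is
@racket[case-lambda]. The sketch below is a hypothetical handler,
unrelated to @racket[read-dollar], that makes @litchar{%b} read as
@racket[(percent b)]. It reads the following datum with
@racket[read/recursive] or @racket[read-syntax/recursive], the variants
intended for calls made from inside a readtable procedure. The names
@racket[read-percent] and @racket[percent-readtable] are illustrative
only:

@codeblock|{
#lang racket/base
;; Hypothetical handler: %datum reads as (percent datum).
(define read-percent
  (case-lambda
    ;; read mode: the triggering character and the input port
    [(ch in)
     (list 'percent (read/recursive in))]
    ;; read-syntax mode: four more arguments give the source name,
    ;; line, column, and position of the triggering character
    [(ch in src line col pos)
     (datum->syntax #f
                    (list 'percent (read-syntax/recursive src in))
                    (list src line col pos #f))]))

;; As a terminating macro, % also ends a preceding symbol, so a%b splits.
(define percent-readtable
  (make-readtable (current-readtable) #\% 'terminating-macro read-percent))

(define (read-with-percent str)
  (parameterize ([current-readtable percent-readtable])
    (read (open-input-string str))))

(define (read-syntax-with-percent str)
  (parameterize ([current-readtable percent-readtable])
    (read-syntax 'example (open-input-string str))))
}|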
The following @filepath{dollar.rkt} module defines a
@racket[read-dollar] function in terms of the @racketidfont{read} and
@racketidfont{read-syntax} functions provided by @filepath{arith.rkt},
and it puts @racket[read-dollar] together with new @racketidfont{read} and
@racketidfont{read-syntax} functions that install the readtable and
chain to Racket's @racket[read] or @racket[read-syntax]:
@racketmodfile["dollar.rkt"]
With this reader extension, a single @racketmetafont{#reader} can be
used at the beginning of an expression to enable multiple uses of
@litchar{$} that switch to infix arithmetic:
@interaction[
(eval:alts @#,(elem @racketmetafont{#reader}@racket["dollar.rkt"]@hspace[1]
@racket[(let ([a @#,tt{$1*2+3$}] [b @#,tt{$5/6$}]) $a+b$)])
#reader"dollar.rkt" (let ([a $1*2+3$] [b $5/6$]) $a+b$))
]
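The install-and-chain pattern does not depend on infix parsing. As a
smaller, hypothetical sketch, the module below uses the character form
of @racket[make-readtable] to make a comma behave the way a space does
in the default readtable (indicated by @racket[#f]). It then exports
@racketidfont{read} and @racketidfont{read-syntax} functions that
install that readtable and chain to Racket's @racket[read] and
@racket[read-syntax]:

@codeblock|{
#lang racket/base
;; Hypothetical reader module: commas are read as whitespace.
(provide (rename-out [comma-read read]
                     [comma-read-syntax read-syntax]))

;; Map #\, to behave like #\space does in the default readtable (#f).
(define comma-readtable
  (make-readtable (current-readtable) #\, #\space #f))

;; Install the readtable, then chain to Racket's own reader.
(define (comma-read in)
  (parameterize ([current-readtable comma-readtable])
    (read in)))

(define (comma-read-syntax src in)
  (parameterize ([current-readtable comma-readtable])
    (read-syntax src in)))
}|

Under this readtable, input read through these functions can no longer
use @litchar{,} as shorthand for @racket[unquote].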