[honu] checkpoint for docs

Jon Rafkind 2011-09-20 17:37:37 -06:00
parent 13e16d2b12
commit 34689f1711
2 changed files with 160 additions and 216 deletions


@ -463,8 +463,7 @@ Then, in the pattern above for 'if', 'then' would be bound to the following synt
(syntax->datum unparsed))
;; if parsed is #f then we don't want to expand to anything that will print
;; so use an empty form, begin, `parsed' could be #f because there was no expression
;; in the input such as parsing just ";". hygiene should ensure that this variable
;; will not collide with anything else
;; in the input such as parsing just ";".
(with-syntax ([parsed (if (not parsed) #'(begin) parsed)]
[(unparsed ...) unparsed])
(if (null? (syntax->datum #'(unparsed ...)))


@ -1,236 +1,181 @@
#lang scribble/doc
@(require scribble/manual
scribble/bnf
(for-label scheme))
honu/core/read
(for-label honu/core/read))
@(define lcomma (litchar ", "))
@title{Honu}
@defterm{Honu} is a family of languages built on top of Racket. Honu
syntax resembles Java. Like Racket, however, Honu has no fixed syntax,
because Honu supports extensibility through macros and a base syntax
of @as-index{H-expressions}, which are analogous to S-expressions.
The Honu language currently exists only as an undocumented
prototype. Racket's parsing and printing of H-expressions is
independent of the Honu language, however, so it is documented here.
@defterm{Honu} is a language with Java-like syntax built on top of Racket.
Honu's main goal is to support syntactic abstraction mechanisms similar to
Racket's. Currently, Honu is a prototype and may change without notice.
@table-of-contents[]
@; ----------------------------------------------------------------------
@section{H-expressions}
@defmodulelang[honu]
The Racket reader incorporates an H-expression reader, and Racket's
printer also supports printing values in Honu syntax. The reader can
be put into H-expression mode either by including @litchar{#hx} in the
input stream, or by calling @racket[read-honu] or
@racket[read-honu-syntax] instead of @racket[read] or
@racket[read-syntax]. Similarly, @racket[print] (or, more precisely,
the default print handler) produces Honu output when the
@racket[print-honu] parameter is set to @racket[#t].
@section{Get started}
To use Honu in a module, write the following line at the top of the file.
When the reader encounters @litchar{#hx}, it reads a single
H-expression, and it produces an S-expression that encodes the
H-expression. Except for atomic H-expressions, evaluating this
S-expression as Racket is unlikely to succeed. In other words,
H-expressions are not intended as a replacement for S-expressions to
represent Racket code.
@racketmod[honu]
Honu syntax is normally used via @litchar{#lang honu}, which reads
H-expressions repeatedly until an end-of-file is encountered, and
processes the result as a module in the Honu language.
Ignoring whitespace, an H-expression is either
@itemize[
@item{a number (see @secref["honu:numbers"]);}
@item{an identifier (see @secref["honu:identifiers"]);}
@item{a string (see @secref["honu:strings"]);}
@item{a character (see @secref["honu:chars"]);}
@item{a sequence of H-expressions between parentheses (see @secref["honu:parens"]);}
@item{a sequence of H-expressions between square brackets (see @secref["honu:parens"]);}
@item{a sequence of H-expressions between curly braces (see @secref["honu:parens"]);}
@item{a comment followed by an H-expression (see @secref["honu:comments"]);}
@item{@litchar{#;} followed by two H-expressions (see @secref["honu:comments"]);}
@item{@litchar{#hx} followed by an H-expression;}
@item{@litchar{#sx} followed by an S-expression (see @secref[#:doc
'(lib "scribblings/reference/reference.scrbl") "reader"]).}
]
Within a sequence of H-expressions, a sub-sequence between angle
brackets is represented specially (see @secref["honu:parens"]).
Whitespace for H-expressions is as in Racket: any character for which
@racket[char-whitespace?] returns true counts as whitespace.
@; ----------------------------------------------------------------------
@subsection[#:tag "honu:numbers"]{Numbers}
The syntax for Honu numbers is the same as for Java. The S-expression
encoding of a particular H-expression number is the obvious Racket
number.
@; ----------------------------------------------------------------------
@subsection[#:tag "honu:identifiers"]{Identifiers}
The syntax for Honu identifiers is the union of Java identifiers plus
@litchar{;}, @litchar{,}, and a set of operator identifiers. An
@defterm{operator identifier} is any combination of the following
characters:
@t{
@hspace[2] @litchar{+} @litchar{-} @litchar{=} @litchar{?}
@litchar{:} @litchar{<} @litchar{>} @litchar{.} @litchar{!} @litchar{%}
@litchar{^} @litchar{&} @litchar{*} @litchar{/} @litchar{~} @litchar{|}
You can use Honu at the REPL on the command line by invoking racket like so:
@verbatim{
racket -Iq honu
}
The S-expression encoding of an H-expression identifier is the obvious
Racket symbol.
@section{Reader}
Input is parsed to form maximally long identifiers. For example, the
input @litchar{int->int;} is parsed as four H-expressions represented
by symbols: @racket['int], @racket['->], @racket['int], and
@racket['|;|].
@; ----------------------------------------------------------------------
@subsection[#:tag "honu:strings"]{Strings}
The syntax for an H-expression string is exactly the same as for an
S-expression string, and an H-expression string is represented by the
obvious Racket string.
@; ----------------------------------------------------------------------
@subsection[#:tag "honu:chars"]{Characters}
The syntax for an H-expression character is the same as for an
H-expression string that has a single content character, except that a
@litchar{'} surrounds the character instead of @litchar{"}. The
S-expression representation of an H-expression character is the
obvious Racket character.
@; ----------------------------------------------------------------------
@subsection[#:tag "honu:parens"]{Parentheses, Brackets, and Braces}
An H-expression between @litchar{(} and @litchar{)}, @litchar{[} and
@litchar{]}, or @litchar["{"] and @litchar["}"] is represented by a
Racket list. The first element of the list is @racket['#%parens] for a
@litchar{(}...@litchar{)} sequence, @racket['#%brackets] for a
@litchar{[}...@litchar{]} sequence, or @racket['#%braces] for a
@litchar["{"]...@litchar["}"] sequence. The remaining elements are the
Racket representations for the grouped H-expressions in order.
In an H-expression sequence, when a @litchar{<} is followed by a
@litchar{>}, and when nothing between the @litchar{<} and @litchar{>}
is an immediate symbol containing a @litchar{=}, @litchar{&}, or
@litchar{|}, then the sub-sequence is represented by a Racket list
that starts with @racket['#%angles] and continues with the elements of
the sub-sequence between the @litchar{<} and @litchar{>}
(exclusive). This representation is applied recursively, so that angle
brackets can be nested.
An angle-bracketed sequence by itself is not a single H-expression,
since the @litchar{<} by itself is a single H-expression; the
angle-bracket conversion is performed only when representing sequences
of H-expressions.
Symbols with a @litchar{=}, @litchar{&}, or @litchar{|} prevent
angle-bracket formation because they correspond to operators that
normally have lower or equal precedence compared to less-than and
greater-than operators.
@; ----------------------------------------------------------------------
@subsection[#:tag "honu:comments"]{Comments}
An H-expression comment starts with either @litchar{//} or
@litchar{/*}. In the former case, the comment runs until a linefeed or
return. In the second case, the comment runs until @litchar{*/}, but
@litchar{/*}...@litchar{*/} comments can be nested. Comments are
treated like whitespace.
A @litchar{#;} starts an H-expression comment, as in S-expressions. It
is followed by an H-expression to be treated as whitespace. Note that
@litchar{#;} is equivalent to @litchar{#sx#;#hx}.
@; ----------------------------------------------------------------------
@subsection{Honu Output Printing}
Some Racket values have a standard H-expression representation. For
values with no H-expression representation but with a
@racket[read]able S-expression form, the Racket printer produces an
S-expression prefixed with @litchar{#sx}. For values with neither an
H-expression form nor a @racket[read]able S-expression form, the
printer produces output of the form @litchar{#<}...@litchar{>}, as in
Racket mode. The @racket[print-honu] parameter controls whether
Racket's printer produces Racket or Honu output.
The values with H-expression forms are as follows:
@subsection{Tokens}
The Honu reader, @racket[honu-read], tokenizes the input stream according to
the following regular expressions.
@itemize[
@item{Every real number has an H-expression form, although the
representation for an exact, non-integer rational number is
actually three H-expressions, where the middle H-expression is
@racket[/].}
@item{Every character string is represented the same in H-expression
form as its S-expression form.}
@item{Every character is represented like a single-character string,
but (1) using a @litchar{'} as the delimiter instead of
@litchar{"}, and (2) protecting a @litchar{'} character content
with a @litchar{\} instead of protecting @litchar{"} character
content.}
@item{A list is represented with the H-expression sequence
@litchar{list(}@nonterm{v}@|lcomma|...@litchar{)},
where each @nonterm{v} is the representation of each element of
the list.}
@item{A pair that is not a list is represented with the H-expression
sequence
@litchar{cons(}@nonterm{v1}@|lcomma|@nonterm{v2}@litchar{)},
where @nonterm{v1} and @nonterm{v2} are the representations of
the pair elements.}
@item{A vector's representation depends on the value of the
@racket[print-vector-length] parameter. If it is @racket[#f],
the vector is represented with the H-expression sequence
@litchar{vectorN(}@nonterm{v}@|lcomma|...@litchar{)}, where
each @nonterm{v} is the representation of each element of the
vector. If @racket[print-vector-length] is set to @racket[#t],
the vector is represented with the H-expression sequence
@litchar{vectorN(}@nonterm{n}@|lcomma|@nonterm{v}@|lcomma|...@litchar{)},
where @nonterm{n} is the length of the vector and each
@nonterm{v} is the representation of each element of the
vector, and multiple instances of the same value at the end of
the vector are represented by a single @nonterm{v}.}
@item{The empty list is represented as the H-expression
@litchar{null}.}
@item{True is represented as the H-expression @litchar{true}.}
@item{False is represented as the H-expression @litchar{false}.}
@item{Identifiers are [a-zA-Z_?][a-zA-Z_?0-9]*}
@item{Strings are "[^"]*"}
@item{Numbers are \d+(\.\d+)?}
@item{And the following tokens + = * / - ^ || | && <= >= <- < > !
:: := : ; ` ' . , ( ) { } [ ]}
]
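The token rules above can be sketched as a small regular-expression tokenizer. This is a minimal illustration in Python, not the Honu lexer itself: the names @tt{tokenize}, @tt{TOKEN_RE}, and @tt{OPERATORS} are hypothetical, whitespace skipping is an assumption, and multi-character operators are tried before their single-character prefixes so that input such as <= stays one token.

```python
import re

# Operator and punctuation tokens from the list above. Longer tokens come
# first so that "<=" is not split into "<" followed by "=".
OPERATORS = ["||", "&&", "<=", ">=", "<-", "::", ":=",
             "+", "=", "*", "/", "-", "^", "|", "<", ">", "!",
             ":", ";", "`", "'", ".", ",", "(", ")", "{", "}", "[", "]"]

TOKEN_RE = re.compile("|".join([
    r"(?P<number>\d+(?:\.\d+)?)",
    r"(?P<string>\"[^\"]*\")",
    r"(?P<identifier>[a-zA-Z_?][a-zA-Z_?0-9]*)",
    "(?P<operator>" + "|".join(re.escape(op) for op in OPERATORS) + ")",
    r"(?P<space>\s+)",  # assumption: whitespace separates tokens and is dropped
]))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs for the given input string."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, calling the hypothetical @tt{tokenize} on the input x ( 5 + 2 ) yields an identifier, an opening parenthesis, the number 5, the operator +, the number 2, and a closing parenthesis, in that order.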
@subsection{Structure}
After tokenization, a Honu program is converted into a tree with minimal
structure. Enclosing tokens are grouped into a single object represented as
an s-expression. Enclosing tokens are pairs of (), {}, and [].
Consider the following stream of tokens:
@codeblock|{
x ( 5 + 2 )
}|
This will be converted into
@codeblock|{
(x (#%parens 5 + 2))
}|
{} will be converted to (#%braces ...) and [] will be converted to (#%brackets
...).
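The grouping step can be sketched with an explicit stack. The following Python fragment is only an illustration under assumptions: tokens are plain strings, nested Python lists stand in for s-expressions, and the name @tt{group} is hypothetical rather than part of Honu.

```python
# Map each opening delimiter to its closer and to the tag that heads the group.
CLOSERS = {"(": ")", "{": "}", "[": "]"}
TAGS = {"(": "#%parens", "{": "#%braces", "[": "#%brackets"}


def group(tokens):
    """Nest a flat list of token strings by their enclosing delimiters."""
    stack = [[]]    # partially built groups; stack[0] is the top level
    expected = []   # closing delimiters we are still waiting for
    for tok in tokens:
        if tok in CLOSERS:
            # Start a new group headed by its tag.
            stack.append([TAGS[tok]])
            expected.append(CLOSERS[tok])
        elif tok in CLOSERS.values():
            if not expected or expected.pop() != tok:
                raise ValueError(f"unbalanced {tok!r}")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if expected:
        raise ValueError(f"missing {expected[-1]!r}")
    return stack[0]
```

Applied to the tokens x ( 5 + 2 ), this sketch produces a two-element list: the token x followed by a nested list that starts with #%parens and contains 5, +, and 2.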
@defproc[(honu-read (port port?)) any]{
Read an s-expression from the given port.
}
@defproc[(honu-read-syntax (name any) (port port?)) any]{
Read a syntax object from the given port.
}
@defproc[(honu-lexer (port port?)) (listof position-token?)]{
Tokenize a port into a stream of honu tokens.
}
@section{Parsing}
Honu is parsed using an algorithm based primarily on operator precedence. The
main focus of the operator precedence algorithm is to support infix operators.
In short, the algorithm operates in the following way:
@itemlist[
@item{1. Parse an @tech{expression}.}
@item{2. Check for a binary operator. If one is found, continue to step 3;
otherwise, return the expression from step 1 immediately.}
@item{3. Parse another @tech{expression}.}
@item{4. Check for a binary operator. If one is found and its precedence is
higher than that of the operator found in step 2, continue parsing from
step 3. If its precedence is lower, or no operator is found, build an
infix expression from the left hand expression from step 1, the binary operator
from step 2, and the right hand expression from step 3.}
]
Parsing will maintain the following registers:
@itemlist[
@item{@bold{left} - a function that takes the right hand side of an expression and
returns the infix expression by combining the left hand side and the
operator.}
@item{@bold{current} - the current right hand side}
@item{@bold{precedence} - represents the current precedence level}
@item{@bold{stream} - stream of tokens to parse}
]
This algorithm is illustrated with the following example. Consider the raw
stream of tokens:
@codeblock|{ 1 + 2 * 3 - 9 }|
@tabular[
@list[
@list["left" (hspace 1) "current" (hspace 1) "precedence" (hspace 1) "stream"]
@list[@racket[(lambda (x) x)] (hspace 1)
@racket[#f] (hspace 1)
@racket[0] (hspace 1)
@codeblock|{1 + 2 * 3 - 9}|]
@list[@racket[(lambda (x) x)] (hspace 1)
@racket[1] (hspace 1)
@racket[0] (hspace 1)
@codeblock|{+ 2 * 3 - 9}|]
@list[@racket[(lambda (x) #'(+ 1 x))] (hspace 1)
@racket[#f] (hspace 1)
@racket[1] (hspace 1)
@codeblock|{2 * 3 - 9}|]
@list[@racket[(lambda (x) #'(+ 1 x))] (hspace 1)
@racket[2] (hspace 1)
@racket[1] (hspace 1)
@codeblock|{* 3 - 9}|]
@list[@racket[(lambda (x) (left #'(* 2 x)))] (hspace 1)
@racket[#f] (hspace 1)
@racket[2] (hspace 1)
@codeblock|{3 - 9}|]
@list[@racket[(lambda (x) (left #'(* 2 x)))] (hspace 1)
@racket[3] (hspace 1)
@racket[2] (hspace 1)
@codeblock|{- 9}|]
@list[@racket[(lambda (x) #'(- (+ 1 (* 2 3)) x))] (hspace 1)
@racket[#f] (hspace 1)
@racket[1] (hspace 1)
@codeblock|{9}|]
@list[@racket[(lambda (x) #'(- (+ 1 (* 2 3)) x))] (hspace 1)
@racket[9] (hspace 1)
@racket[1] (hspace 1)
@codeblock|{}|]
]
]
When the stream of tokens is empty, the @bold{current} register is passed as an
argument to the @bold{left} function, which ultimately produces the expression
@codeblock|{(- (+ 1 (* 2 3)) 9)}|
In this example, @racket[+] and @racket[-] both have a precedence of 1, while
@racket[*] has a precedence of 2. Currently, precedences can be any number that
can be compared with @racket[<=].
The example takes some liberties with respect to how the actual implementation
works. In particular, the binary operators are syntax transformers that accept
the left and right hand expressions as parameters and return new syntax objects.
Also, when the @racket[*] operator is parsed, the @bold{left} function for
@racket[+] is nested inside the new function for @racket[*].
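As a rough model of the operator precedence algorithm, the following Python sketch parses a flat token list with recursive precedence climbing. It is not Honu's implementation: the names @tt{parse_expression}, @tt{to_sexp}, and @tt{PRECEDENCE} are hypothetical, the recursion's call stack stands in for the @bold{left} and @bold{precedence} registers, only datums and binary operators are handled, and operators of equal precedence are assumed to associate to the left.

```python
# Precedence levels taken from the worked example: + and - are 1, * is 2.
PRECEDENCE = {"+": 1, "-": 1, "*": 2}


def parse_expression(stream, precedence=0):
    """Parse tokens from the front of `stream` (a list that is consumed).

    Returns a nested tuple of the form (operator, left, right).
    """
    current = stream.pop(0)  # step 1: parse an expression (here, a datum)
    # Steps 2 and 4: keep going while the next operator binds more tightly
    # than the level we were called with.
    while stream and PRECEDENCE.get(stream[0], -1) > precedence:
        operator = stream.pop(0)
        # Step 3: parse the right hand side at the operator's own level, so
        # higher-precedence operators to its right are absorbed first.
        right = parse_expression(stream, PRECEDENCE[operator])
        current = (operator, current, right)
    return current


def to_sexp(tree):
    """Render a nested tuple as a prefix expression string."""
    if isinstance(tree, tuple):
        return "(" + " ".join(to_sexp(part) for part in tree) + ")"
    return str(tree)


print(to_sexp(parse_expression([1, "+", 2, "*", 3, "-", 9])))
# prints (- (+ 1 (* 2 3)) 9)
```

The printed result matches the expression produced by the register trace above. A token that is not in the hypothetical @tt{PRECEDENCE} table simply ends the current expression in this sketch, which loosely mirrors the role of stop symbols.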
An @deftech{expression} can be one of the following:
@itemlist[
@item{@bold{datum} - number, string, or symbol. @codeblock|{5}|}
@item{@bold{macro} - a symbol bound to a syntax transformer.
@codeblock|{cond x = 5: true, else: false}|}
@item{@bold{stop} - a symbol which immediately ends the current expression.
Currently these are , ; :}
@item{@bold{lambda expression} - an identifier followed by @racket[(id ...)]
followed by a block of code in braces. @codeblock|{add(x, y){ x + y }}|}
@item{@bold{function application} - an expression followed by @racket[(arg
...)]. @codeblock|{f(2, 2)}|}
@item{@bold{list comprehension} - @codeblock|{[x + 1: x <- [1, 2, 3]]}|}
@item{@bold{block of code} - a series of expressions wrapped in braces.}
@item{@bold{expression grouping} - any expression inside a set of parentheses
@codeblock|{(1 + 1) * 2}|}
]
@section{Macros}
@section{Language}
@section{Examples}