[honu] checkpoint for docs

2011-09-20 17:37:37 -06:00 · 2011-09-20 17:37:37 -06:00 · 34689f1711
commit 34689f1711
parent 13e16d2b12
2 changed files with 160 additions and 216 deletions
--- a/collects/honu/core/private/honu-typed-scheme.rkt
+++ b/collects/honu/core/private/honu-typed-scheme.rkt
@@ -463,8 +463,7 @@ Then, in the pattern above for 'if', 'then' would be bound to the following synt
            (syntax->datum unparsed))
     ;; if parsed is #f then we don't want to expand to anything that will print
     ;; so use an empty form, begin, `parsed' could be #f becuase there was no expression
-     ;; in the input such as parsing just ";". hygiene should ensure that this variable
-     ;; will not collide with anything else
+     ;; in the input such as parsing just ";".
     (with-syntax ([parsed (if (not parsed) #'(begin) parsed)]
                   [(unparsed ...) unparsed])
       (if (null? (syntax->datum #'(unparsed ...)))
--- a/collects/scribblings/honu/honu.scrbl
+++ b/collects/scribblings/honu/honu.scrbl
@@ -1,236 +1,181 @@
 #lang scribble/doc
@(require scribble/manual
          scribble/bnf
-          (for-label scheme))
+          honu/core/read
+          (for-label honu/core/read))

@(define lcomma (litchar ", "))

@title{Honu}

-@defterm{Honu} is a family of languages built on top of Racket. Honu
-syntax resembles Java. Like Racket, however, Honu has no fixed syntax,
-because Honu supports extensibility through macros and a base syntax
-of @as-index{H-expressions}, which are analogous to S-expressions.
-
-The Honu language currently exists only as a undocumented
-prototype. Racket's parsing and printing of H-expressions is
-independent of the Honu language, however, so it is documented here.
+@defterm{Honu} is a language with Java-like syntax built on top of Racket.
+Honu's main goal is to support syntactic abstraction mechanisms similar to
+Racket's. Currently, Honu is a prototype and may change without notice.

@table-of-contents[]

@; ----------------------------------------------------------------------

-@section{H-expressions}
+@defmodulelang[honu]

-The Racket reader incorporates an H-expression reader, and Racket's
-printer also supports printing values in Honu syntax. The reader can
-be put into H-expression mode either by including @litchar{#hx} in the
-input stream, or by calling @racket[read-honu] or
-@racket[read-honu-syntax] instead of @racket[read] or
-@racket[read-syntax]. Similarly, @racket[print] (or, more precisely,
-the default print handler) produces Honu output when the
-@racket[print-honu] parameter is set to @racket[#t].
+@section{Get started}
+To use Honu in a module, write the following line at the top of the file.

-When the reader encounters @litchar{#hx}, it reads a single
-H-expression, and it produces an S-expression that encodes the
-H-expression. Except for atomic H-expressions, evaluating this
-S-expression as Racket is unlikely to succeed. In other words,
-H-expressions are not intended as a replacement for S-expressions to
-represent Racket code.
+@racketmod[honu]

-Honu syntax is normally used via @litchar{#lang honu}, which reads
-H-expressions repeatedly until an end-of-file is encountered, and
-processes the result as a module in the Honu language.
-
-Ignoring whitespace, an H-expression is either
-
-@itemize[
-
- @item{a number (see @secref["honu:numbers"]);}
-
- @item{an identifier (see @secref["honu:identifiers"]);}
-
- @item{a string (see @secref["honu:strings"]);}
-
- @item{a character (see @secref["honu:chars"]);}
-
- @item{a sequence of H-expressions between parentheses (see @secref["honu:parens"]);}
-
- @item{a sequence of H-expressions between square brackets (see @secref["honu:parens"]);}
-
- @item{a sequence of H-expressions between curly braces (see @secref["honu:parens"]);}
-
- @item{a comment followed by an H-expression (see @secref["honu:comments"]);}
-
- @item{@litchar{#;} followed by two H-expressions (see @secref["honu:comments"]);}
-
- @item{@litchar{#hx} followed by an H-expression;}
-
- @item{@litchar{#sx} followed by an S-expression (see @secref[#:doc
-'(lib "scribblings/reference/reference.scrbl") "reader"]).}
-
-]
-
-Within a sequence of H-expressions, a sub-sequence between angle
-brackets is represented specially (see @secref["honu:parens"]).
-
-Whitespace for H-expressions is as in Racket: any character for which
-@racket[char-whitespace?] returns true counts as a whitespace.
-
-@; ----------------------------------------------------------------------
-
-@subsection[#:tag "honu:numbers"]{Numbers}
-
-The syntax for Honu numbers is the same as for Java. The S-expression
-encoding of a particular H-expression number is the obvious Racket
-number.
-
-@; ----------------------------------------------------------------------
-
-@subsection[#:tag "honu:identifiers"]{Identifiers}
-
-The syntax for Honu identifiers is the union of Java identifiers plus
-@litchar{;}, @litchar{,}, and a set of operator identifiers. An
-@defterm{operator identifier} is any combination of the following
-characters:
-
-@t{
-  @hspace[2] @litchar{+} @litchar{-} @litchar{=} @litchar{?} 
-  @litchar{:} @litchar{<} @litchar{>} @litchar{.} @litchar{!} @litchar{%}
-  @litchar{^} @litchar{&} @litchar{*} @litchar{/} @litchar{~} @litchar{|}
+You can use Honu at the command-line REPL by invoking racket as follows:
+@verbatim{
+racket -Iq honu
 }

-The S-expression encoding of an H-expression identifier is the obvious
-Racket symbol.
+@section{Reader}

-Input is parsed to form maximally long identifiers. For example, the
-input @litchar{int->int;} is parsed as four H-expressions represented
-by symbols: @racket['int], @racket['->], @racket['int], and
-@racket['|;|].
-
-@; ----------------------------------------------------------------------
-
-@subsection[#:tag "honu:strings"]{Strings}
-
-The syntax for an H-expression string is exactly the same as for an
-S-expression string, and an H-expression string is represented by the
-obvious Racket string.
-
-@; ----------------------------------------------------------------------
-
-@subsection[#:tag "honu:chars"]{Characters}
-
-The syntax for an H-expression character is the same as for an
-H-expression string that has a single content character, except that a
-@litchar{'} surrounds the character instead of @litchar{"}. The
-S-expression representation of an H-expression character is the
-obvious Racket character.
-
-@; ----------------------------------------------------------------------
-
-@subsection[#:tag "honu:parens"]{Parentheses, Brackets, and Braces}
-
-A H-expression between @litchar{(} and @litchar{)}, @litchar{[} and
-@litchar{]}, or @litchar["{"] and @litchar["}"] is represented by a
-Racket list. The first element of the list is @racket['#%parens] for a
-@litchar{(}...@litchar{)} sequence, @racket['#%brackets] for a
-@litchar{[}...@litchar{]} sequence, or @racket['#%braces] for a
-@litchar["{"]...@litchar["}"] sequence. The remaining elements are the
-Racket representations for the grouped H-expressions in order.
-
-In an H-expression sequence, when a @litchar{<} is followed by a
-@litchar{>}, and when nothing between the @litchar{<} and @litchar{>}
-is an immediate symbol containing a @litchar{=}, @litchar{&}, or
-@litchar{|}, then the sub-sequence is represented by a Racket list
-that starts with @racket['#%angles] and continues with the elements of
-the sub-sequence between the @litchar{<} and @litchar{>}
-(exclusive). This representation is applied recursively, so that angle
-brackets can be nested.
-
-An angle-bracketed sequence by itself is not a single H-expression,
-since the @litchar{<} by itself is a single H-expression; the
-angle-bracket conversion is performed only when representing sequences
-of H-expressions.
-
-Symbols with a @litchar{=}, @litchar{&}, or @litchar{|} prevent
-angle-bracket formation because they correspond to operators that
-normally have lower or equal precedence compared to less-than and
-greater-than operators.
-
-@; ----------------------------------------------------------------------
-
-@subsection[#:tag "honu:comments"]{Comments}
-
-An H-expression comment starts with either @litchar{//} or
-@litchar{/*}. In the former case, the comment runs until a linefeed or
-return. In the second case, the comment runs until @litchar{*/}, but
-@litchar{/*}...@litchar{*/} comments can be nested. Comments are
-treated like whitespace.
-
-A @litchar{#;} starts an H-expression comment, as in S-expressions. It
-is followed by an H-expression to be treated as whitespace. Note that
-@litchar{#;} is equivalent to @litchar{#sx#;#hx}.
-
-@; ----------------------------------------------------------------------
-
-@subsection{Honu Output Printing}
-
-Some Racket values have a standard H-expression representation. For
-values with no H-expression representation but with a
-@racket[read]able S-expression form, the Racket printer produces an
-S-expression prefixed with @litchar{#sx}. For values with neither an
-H-expression form nor a @racket[read]able S-expression form, then
-printer produces output of the form @litchar{#<}...@litchar{>}, as in
-Racket mode. The @racket[print-honu] parameter controls whether
-Racket's printer produces Racket or Honu output.
-
-The values with H-expression forms are as follows:
+@subsection{Tokens}
+The Honu reader, @racket[honu-read], will tokenize the input stream according to
+the following regular expressions.

@itemize[
-
- @item{Every real number has an H-expression form, although the
-       representation for an exact, non-integer rational number is
-       actually three H-expressions, where the middle H-expression is
-       @racket[/].}
-
- @item{Every character string is represented the same in H-expression
-       form as its S-expression form.}
-
- @item{Every character is represented like a single-character string,
-       but (1) using a @litchar{'} as the delimiter instead of
-       @litchar{"}, and (2) protecting a @litchar{'} character content
-       with a @litchar{\} instead of protecting @litchar{"} character
-       content.}
-
- @item{A list is represented with the H-expression sequence
-       @litchar{list(}@nonterm{v}@|lcomma|...@litchar{)},
-       where each @nonterm{v} is the representation of each element of
-       the list.}
-
- @item{A pair that is not a list is represented with the H-expression
-       sequence
-       @litchar{cons(}@nonterm{v1}@|lcomma|@nonterm{v2}@litchar{)},
-       where @nonterm{v1} and @nonterm{v2} are the representations of
-       the pair elements.}
-
- @item{A vector's representation depends on the value of the
-       @racket[print-vector-length] parameter. If it is @racket[#f],
-       the vector is represented with the H-expression sequence
-       @litchar{vectorN(}@nonterm{v}@|lcomma|...@litchar{)}, where
-       each @nonterm{v} is the representation of each element of the
-       vector. If @racket[print-vector-length] is set to @racket[#t],
-       the vector is represented with the H-expression sequence
-       @litchar{vectorN(}@nonterm{n}@|lcomma|@nonterm{v}@|lcomma|...@litchar{)},
-       where @nonterm{n} is the length of the vector and each
-       @nonterm{v} is the representation of each element of the
-       vector, and multiple instances of the same value at the end of
-       the vector are represented by a single @nonterm{v}.}
-
- @item{The empty list is represented as the H-expression
-       @litchar{null}.}
-
- @item{True is represented as the H-expression @litchar{true}.}
-
- @item{False is represented as the H-expression @litchar{false}.}
-
+  @item{Identifiers are [a-zA-Z_?][a-zA-Z_?0-9]*}
+  @item{Strings are "[^"]*"}
+  @item{Numbers are \d+(\.\d+)?}
+  @item{And the following tokens + = * / - ^ || | && <= >= <- < > !
+  :: := : ; ` ' . ,  ( ) { } [ ]}
 ]
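As a rough illustration of the token patterns above, the following sketch classifies a single lexeme with Racket regexps. `classify-lexeme` is a hypothetical helper written for this note, not part of `honu/core/read`, and it lumps every operator or punctuation token under `'other` instead of distinguishing them.

```racket
#lang racket

;; Hypothetical helper (not part of honu/core/read): classify one lexeme
;; using the identifier, string, and number patterns listed above.
(define (classify-lexeme str)
  (cond
    ;; Identifiers: [a-zA-Z_?][a-zA-Z_?0-9]*
    [(regexp-match-exact? #px"[a-zA-Z_?][a-zA-Z_?0-9]*" str) 'identifier]
    ;; Strings: "[^"]*"
    [(regexp-match-exact? #px"\"[^\"]*\"" str) 'string]
    ;; Numbers: \d+(\.\d+)?
    [(regexp-match-exact? #px"\\d+(\\.\\d+)?" str) 'number]
    ;; Anything else, including operators and punctuation.
    [else 'other]))
```

Under these patterns `(classify-lexeme "3.14")` is `'number`, while `"3."` matches none of the three patterns and falls through to `'other`.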
+
+@subsection{Structure}
+
+After tokenization a Honu program will be converted into a tree with minimal
+structure. Enclosing tokens will be grouped into a single object represented as
+an s-expression. Enclosing tokens are pairs of (), {}, and [].
+
+Consider the following stream of tokens
+
+@codeblock|{
+x ( 5 + 2 )
+}|
+
+This will be converted into
+@codeblock|{
+(x (#%parens 5 + 2))
+}|
+
+{} will be converted to (#%braces ...) and [] will be converted to (#%brackets
+...).
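The grouping step can be sketched as follows. This is only an illustration under stated assumptions: tokens are plain Racket symbols and numbers in a list, and `group` and `tokens->tree` are hypothetical names, not functions exported by the Honu reader.

```racket
#lang racket

;; Map each opening token to its tag, and each closing token to its opener.
(define openers (hash '|(| '#%parens '|{| '#%braces '|[| '#%brackets))
(define closers (hash '|)| '|(| '|}| '|{| '|]| '|[|))

;; group : (listof token) -> (values tree (listof token))
;; Collects items until the stream ends or a closing token is seen.
;; The second result is the unconsumed rest of the stream.
(define (group tokens)
  (let loop ([tokens tokens] [acc '()])
    (cond
      [(null? tokens) (values (reverse acc) '())]
      [(hash-ref closers (car tokens) #f) (values (reverse acc) tokens)]
      [(hash-ref openers (car tokens) #f)
       => (lambda (tag)
            (define-values (inner remaining) (group (cdr tokens)))
            ;; The next token must be the closer that matches this opener.
            (unless (and (pair? remaining)
                         (eq? (hash-ref closers (car remaining) #f)
                              (car tokens)))
              (error 'group "unbalanced ~a" (car tokens)))
            (loop (cdr remaining) (cons (cons tag inner) acc)))]
      [else (loop (cdr tokens) (cons (car tokens) acc))])))

;; tokens->tree : (listof token) -> tree
(define (tokens->tree tokens)
  (define-values (tree remaining) (group tokens))
  (unless (null? remaining)
    (error 'tokens->tree "unexpected ~a" (car remaining)))
  tree)
```

With this sketch, `(tokens->tree '(x |(| 5 + 2 |)|))` evaluates to `'(x (#%parens 5 + 2))`, matching the example above.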
+
+@defproc[(honu-read (port port?)) any]{
+  Read an s-expression from the given port.
+}
+
+@defproc[(honu-read-syntax (name any) (port port?)) any]{
+  Read a syntax object from the given port.
+}
+
+@defproc[(honu-lexer (port port?)) (listof position-token?)]{
+  Tokenize a port into a stream of Honu tokens.
+}
+
+@section{Parsing}
+
+Honu is parsed using an algorithm based primarily on operator precedence. The
+main focus of the operator precedence algorithm is to support infix operators.
+In short, the algorithm operates in the following way:
+
+@itemlist[
+@item{1. Parse an @tech{expression}.}
+@item{2. Check for a binary operator. If one is found, continue to step 3;
+otherwise return the expression from step 1 immediately.}
+@item{3. Parse another @tech{expression}.}
+@item{4. Check for a binary operator. If one is found and its precedence is
+higher than that of the operator found in step 2, continue parsing from
+step 3. If its precedence is not higher, or no operator is found, build an
+infix expression from the left hand expression from step 1, the binary operator
+from step 2, and the right hand expression from step 3.}
+]
+
+Parsing will maintain the following registers
+@itemlist[
+  @item{@bold{left} - a function that takes the right hand side of an expression and
+  returns the infix expression by combining the left hand side and the
+  operator.}
+  @item{@bold{current} - the current right hand side}
+  @item{@bold{precedence} - represents the current precedence level}
+  @item{@bold{stream} - stream of tokens to parse}
+]
+
+This algorithm is illustrated with the following example. Consider the raw
+stream of tokens
+
+@codeblock|{ 1 + 2 * 3 - 9 }|
+
+@tabular[
+  @list[
+    @list["left" (hspace 1) "current" (hspace 1) "precedence" (hspace 1) "stream"]
+    @list[@racket[(lambda (x) x)] (hspace 1)
+          @racket[#f] (hspace 1)
+          @racket[0] (hspace 1)
+          @codeblock|{1 + 2 * 3 - 9}|]
+    @list[@racket[(lambda (x) x)] (hspace 1)
+          @racket[1] (hspace 1)
+          @racket[0] (hspace 1)
+          @codeblock|{+ 2 * 3 - 9}|]
+    @list[@racket[(lambda (x) #'(+ 1 x))] (hspace 1)
+          @racket[#f] (hspace 1)
+          @racket[1] (hspace 1)
+          @codeblock|{2 * 3 - 9}|]
+    @list[@racket[(lambda (x) #'(+ 1 x))] (hspace 1)
+          @racket[2] (hspace 1)
+          @racket[1] (hspace 1)
+          @codeblock|{* 3 - 9}|]
+    @list[@racket[(lambda (x) (left #'(* 2 x)))] (hspace 1)
+          @racket[2] (hspace 1)
+          @racket[2] (hspace 1)
+          @codeblock|{3 - 9}|]
+    @list[@racket[(lambda (x) (left #'(* 2 x)))] (hspace 1)
+          @racket[3] (hspace 1)
+          @racket[2] (hspace 1)
+          @codeblock|{- 9}|]
+    @list[@racket[(lambda (x) #'(- (+ 1 (* 2 3)) x))] (hspace 1)
+          @racket[#f] (hspace 1)
+          @racket[1] (hspace 1)
+          @codeblock|{9}|]
+    @list[@racket[(lambda (x) #'(- (+ 1 (* 2 3)) x))] (hspace 1)
+          @racket[9] (hspace 1)
+          @racket[1] (hspace 1)
+          @codeblock|{}|]
+  ]
+]
+
+When the stream of tokens is empty, the @bold{current} register is passed as an
+argument to the @bold{left} function, which ultimately produces the expression
+@codeblock|{(- (+ 1 (* 2 3)) 9)}|
+
+In this example @racket[+] and @racket[-] both have a precedence of 1 while
+@racket[*] has a precedence of 2. Currently, precedences can be any number that
+can be compared with @racket[<=].
+
+The example takes some liberties with respect to how the actual implementation
+works. In particular, the binary operators are syntax transformers that accept
+the left and right hand expressions as parameters and return new syntax objects.
+Also, when the @racket[*] operator is parsed, the @bold{left} function for
+@racket[+] is nested inside the new function for @racket[*].
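For readers who want to experiment, here is a minimal sketch of operator-precedence parsing over a flat list of numbers and operator symbols. It is not Honu's implementation: it uses a recursive formulation (often called precedence climbing) in which the call stack plays the role of the @bold{left} register, it produces plain s-expressions instead of syntax objects, and it treats every operator as left-associative. The names `precedences`, `parse-infix`, and `parse` are hypothetical, and the precedence values are the ones assumed in the example.

```racket
#lang racket

;; Assumed precedence table, matching the example: + and - are 1, * is 2.
(define precedences (hash '+ 1 '- 1 '* 2))

;; parse-infix : (listof token) number -> (values expr (listof token))
;; Reads one operand, then keeps absorbing operators whose precedence is
;; strictly higher than min-precedence. Assumes a non-empty, well-formed
;; stream in which operands and binary operators alternate.
(define (parse-infix stream min-precedence)
  (let loop ([current (car stream)]
             [stream (cdr stream)])
    (define op (and (pair? stream) (car stream)))
    (define prec (and op (hash-ref precedences op #f)))
    (cond
      [(and prec (> prec min-precedence))
       ;; Parse the right hand side, letting tighter operators bind first.
       (define-values (right remaining) (parse-infix (cdr stream) prec))
       (loop (list op current right) remaining)]
      ;; No operator, or one that does not bind tighter: hand back to caller.
      [else (values current stream)])))

;; parse : (listof token) -> expr
(define (parse tokens)
  (define-values (expr remaining) (parse-infix tokens 0))
  expr)
```

Here `(parse '(1 + 2 * 3 - 9))` evaluates to `'(- (+ 1 (* 2 3)) 9)`, the same tree as in the walkthrough above.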
+
+An @deftech{expression} can be one of the following:
+@itemlist[
+  @item{@bold{datum} - number, string, or symbol. @codeblock|{5}|}
+  @item{@bold{macro} - a symbol bound to a syntax transformer.
+  @codeblock|{cond x = 5: true, else: false}|}
+  @item{@bold{stop} - a symbol which immediately ends the current expression.
+  These are currently @litchar{,} @litchar{;} and @litchar{:}.}
+  @item{@bold{lambda expression} - an identifier followed by @racket[(id ...)]
+  followed by a block of code in braces. @codeblock|{add(x, y){ x + y }}|}
+  @item{@bold{function application} - an expression followed by @racket[(arg
+  ...)]. @codeblock|{f(2, 2)}|}
+  @item{@bold{list comprehension} - @codeblock|{[x + 1: x <- [1, 2, 3]]}|}
+  @item{@bold{block of code} - a series of expressions wrapped in braces.}
+  @item{@bold{expression grouping} - any expression inside a set of parentheses.
+  @codeblock|{(1 + 1) * 2}|}
+]
+
+@section{Macros}
+@section{Language}
+@section{Examples}