From 34689f171195bb76d7c47375bcd2e51c7fc8ea62 Mon Sep 17 00:00:00 2001 From: Jon Rafkind Date: Tue, 20 Sep 2011 17:37:37 -0600 Subject: [PATCH] [honu] checkpoint for docs --- .../honu/core/private/honu-typed-scheme.rkt | 3 +- collects/scribblings/honu/honu.scrbl | 373 ++++++++---------- 2 files changed, 160 insertions(+), 216 deletions(-) diff --git a/collects/honu/core/private/honu-typed-scheme.rkt b/collects/honu/core/private/honu-typed-scheme.rkt index 173ff2515c..9a6574c4aa 100644 --- a/collects/honu/core/private/honu-typed-scheme.rkt +++ b/collects/honu/core/private/honu-typed-scheme.rkt @@ -463,8 +463,7 @@ Then, in the pattern above for 'if', 'then' would be bound to the following synt (syntax->datum unparsed)) ;; if parsed is #f then we don't want to expand to anything that will print ;; so use an empty form, begin, `parsed' could be #f becuase there was no expression - ;; in the input such as parsing just ";". hygiene should ensure that this variable - ;; will not collide with anything else + ;; in the input such as parsing just ";". (with-syntax ([parsed (if (not parsed) #'(begin) parsed)] [(unparsed ...) unparsed]) (if (null? (syntax->datum #'(unparsed ...))) diff --git a/collects/scribblings/honu/honu.scrbl b/collects/scribblings/honu/honu.scrbl index 5822bd0664..b47176a332 100644 --- a/collects/scribblings/honu/honu.scrbl +++ b/collects/scribblings/honu/honu.scrbl @@ -1,236 +1,181 @@ #lang scribble/doc @(require scribble/manual scribble/bnf - (for-label scheme)) + honu/core/read + (for-label honu/core/read)) @(define lcomma (litchar ", ")) @title{Honu} -@defterm{Honu} is a family of languages built on top of Racket. Honu -syntax resembles Java. Like Racket, however, Honu has no fixed syntax, -because Honu supports extensibility through macros and a base syntax -of @as-index{H-expressions}, which are analogous to S-expressions. - -The Honu language currently exists only as a undocumented -prototype. 
Racket's parsing and printing of H-expressions is -independent of the Honu language, however, so it is documented here. +@defterm{Honu} is a language with Java-like syntax built on top of Racket. +Honu's main goal is to support syntactic abstraction mechanisms similar to those of +Racket. Currently, Honu is a prototype and may change without notice. @table-of-contents[] @; ---------------------------------------------------------------------- -@section{H-expressions} +@defmodulelang[honu] -The Racket reader incorporates an H-expression reader, and Racket's -printer also supports printing values in Honu syntax. The reader can -be put into H-expression mode either by including @litchar{#hx} in the -input stream, or by calling @racket[read-honu] or -@racket[read-honu-syntax] instead of @racket[read] or -@racket[read-syntax]. Similarly, @racket[print] (or, more precisely, -the default print handler) produces Honu output when the -@racket[print-honu] parameter is set to @racket[#t]. +@section{Getting Started} +To use Honu in a module, write the following line at the top of the file: -When the reader encounters @litchar{#hx}, it reads a single -H-expression, and it produces an S-expression that encodes the -H-expression. Except for atomic H-expressions, evaluating this -S-expression as Racket is unlikely to succeed. In other words, -H-expressions are not intended as a replacement for S-expressions to -represent Racket code. +@racketmod[honu] -Honu syntax is normally used via @litchar{#lang honu}, which reads -H-expressions repeatedly until an end-of-file is encountered, and -processes the result as a module in the Honu language.
- -Ignoring whitespace, an H-expression is either - -@itemize[ - - @item{a number (see @secref["honu:numbers"]);} - - @item{an identifier (see @secref["honu:identifiers"]);} - - @item{a string (see @secref["honu:strings"]);} - - @item{a character (see @secref["honu:chars"]);} - - @item{a sequence of H-expressions between parentheses (see @secref["honu:parens"]);} - - @item{a sequence of H-expressions between square brackets (see @secref["honu:parens"]);} - - @item{a sequence of H-expressions between curly braces (see @secref["honu:parens"]);} - - @item{a comment followed by an H-expression (see @secref["honu:comments"]);} - - @item{@litchar{#;} followed by two H-expressions (see @secref["honu:comments"]);} - - @item{@litchar{#hx} followed by an H-expression;} - - @item{@litchar{#sx} followed by an S-expression (see @secref[#:doc -'(lib "scribblings/reference/reference.scrbl") "reader"]).} - -] - -Within a sequence of H-expressions, a sub-sequence between angle -brackets is represented specially (see @secref["honu:parens"]). - -Whitespace for H-expressions is as in Racket: any character for which -@racket[char-whitespace?] returns true counts as a whitespace. - -@; ---------------------------------------------------------------------- - -@subsection[#:tag "honu:numbers"]{Numbers} - -The syntax for Honu numbers is the same as for Java. The S-expression -encoding of a particular H-expression number is the obvious Racket -number. - -@; ---------------------------------------------------------------------- - -@subsection[#:tag "honu:identifiers"]{Identifiers} - -The syntax for Honu identifiers is the union of Java identifiers plus -@litchar{;}, @litchar{,}, and a set of operator identifiers. 
An -@defterm{operator identifier} is any combination of the following -characters: - -@t{ - @hspace[2] @litchar{+} @litchar{-} @litchar{=} @litchar{?} - @litchar{:} @litchar{<} @litchar{>} @litchar{.} @litchar{!} @litchar{%} - @litchar{^} @litchar{&} @litchar{*} @litchar{/} @litchar{~} @litchar{|} +You can also use Honu at the REPL by invoking @exec{racket} from the command line as follows: +@verbatim{ +racket -Iq honu } -The S-expression encoding of an H-expression identifier is the obvious -Racket symbol. +@section{Reader} -Input is parsed to form maximally long identifiers. For example, the -input @litchar{int->int;} is parsed as four H-expressions represented -by symbols: @racket['int], @racket['->], @racket['int], and -@racket['|;|]. - -@; ---------------------------------------------------------------------- - -@subsection[#:tag "honu:strings"]{Strings} - -The syntax for an H-expression string is exactly the same as for an -S-expression string, and an H-expression string is represented by the -obvious Racket string. - -@; ---------------------------------------------------------------------- - -@subsection[#:tag "honu:chars"]{Characters} - -The syntax for an H-expression character is the same as for an -H-expression string that has a single content character, except that a -@litchar{'} surrounds the character instead of @litchar{"}. The -S-expression representation of an H-expression character is the -obvious Racket character. - -@; ---------------------------------------------------------------------- - -@subsection[#:tag "honu:parens"]{Parentheses, Brackets, and Braces} - -A H-expression between @litchar{(} and @litchar{)}, @litchar{[} and -@litchar{]}, or @litchar["{"] and @litchar["}"] is represented by a -Racket list. The first element of the list is @racket['#%parens] for a -@litchar{(}...@litchar{)} sequence, @racket['#%brackets] for a -@litchar{[}...@litchar{]} sequence, or @racket['#%braces] for a -@litchar["{"]...@litchar["}"] sequence.
The remaining elements are the -Racket representations for the grouped H-expressions in order. - -In an H-expression sequence, when a @litchar{<} is followed by a -@litchar{>}, and when nothing between the @litchar{<} and @litchar{>} -is an immediate symbol containing a @litchar{=}, @litchar{&}, or -@litchar{|}, then the sub-sequence is represented by a Racket list -that starts with @racket['#%angles] and continues with the elements of -the sub-sequence between the @litchar{<} and @litchar{>} -(exclusive). This representation is applied recursively, so that angle -brackets can be nested. - -An angle-bracketed sequence by itself is not a single H-expression, -since the @litchar{<} by itself is a single H-expression; the -angle-bracket conversion is performed only when representing sequences -of H-expressions. - -Symbols with a @litchar{=}, @litchar{&}, or @litchar{|} prevent -angle-bracket formation because they correspond to operators that -normally have lower or equal precedence compared to less-than and -greater-than operators. - -@; ---------------------------------------------------------------------- - -@subsection[#:tag "honu:comments"]{Comments} - -An H-expression comment starts with either @litchar{//} or -@litchar{/*}. In the former case, the comment runs until a linefeed or -return. In the second case, the comment runs until @litchar{*/}, but -@litchar{/*}...@litchar{*/} comments can be nested. Comments are -treated like whitespace. - -A @litchar{#;} starts an H-expression comment, as in S-expressions. It -is followed by an H-expression to be treated as whitespace. Note that -@litchar{#;} is equivalent to @litchar{#sx#;#hx}. - -@; ---------------------------------------------------------------------- - -@subsection{Honu Output Printing} - -Some Racket values have a standard H-expression representation. 
For -values with no H-expression representation but with a -@racket[read]able S-expression form, the Racket printer produces an -S-expression prefixed with @litchar{#sx}. For values with neither an -H-expression form nor a @racket[read]able S-expression form, then -printer produces output of the form @litchar{#<}...@litchar{>}, as in -Racket mode. The @racket[print-honu] parameter controls whether -Racket's printer produces Racket or Honu output. - -The values with H-expression forms are as follows: +@subsection{Tokens} +The Honu reader, @racket[honu-read], will tokenize the input stream according to +the following regular expressions. @itemize[ - - @item{Every real number has an H-expression form, although the - representation for an exact, non-integer rational number is - actually three H-expressions, where the middle H-expression is - @racket[/].} - - @item{Every character string is represented the same in H-expression - form as its S-expression form.} - - @item{Every character is represented like a single-character string, - but (1) using a @litchar{'} as the delimiter instead of - @litchar{"}, and (2) protecting a @litchar{'} character content - with a @litchar{\} instead of protecting @litchar{"} character - content.} - - @item{A list is represented with the H-expression sequence - @litchar{list(}@nonterm{v}@|lcomma|...@litchar{)}, - where each @nonterm{v} is the representation of each element of - the list.} - - @item{A pair that is not a list is represented with the H-expression - sequence - @litchar{cons(}@nonterm{v1}@|lcomma|@nonterm{v2}@litchar{)}, - where @nonterm{v1} and @nonterm{v2} are the representations of - the pair elements.} - - @item{A vector's representation depends on the value of the - @racket[print-vector-length] parameter. If it is @racket[#f], - the vector is represented with the H-expression sequence - @litchar{vectorN(}@nonterm{v}@|lcomma|...@litchar{)}, where - each @nonterm{v} is the representation of each element of the - vector. 
If @racket[print-vector-length] is set to @racket[#t], - the vector is represented with the H-expression sequence - @litchar{vectorN(}@nonterm{n}@|lcomma|@nonterm{v}@|lcomma|...@litchar{)}, - where @nonterm{n} is the length of the vector and each - @nonterm{v} is the representation of each element of the - vector, and multiple instances of the same value at the end of - the vector are represented by a single @nonterm{v}.} - - @item{The empty list is represented as the H-expression - @litchar{null}.} - - @item{True is represented as the H-expression @litchar{true}.} - - @item{False is represented as the H-expression @litchar{false}.} - + @item{Identifiers are [a-zA-Z_?][a-zA-Z_?0-9]*} + @item{Strings are "[^"]*"} + @item{Numbers are \d+(\.\d+)?} + @item{And the following tokens + = * / - ^ || | && <= >= <- < > ! + :: := : ; ` ' . , ( ) { } [ ]} ] + +@subsection{Structure} + +After tokenization, a Honu program is converted into a tree with minimal +structure. Each pair of enclosing tokens, (), {}, or [], is grouped together +with everything between them into a single object represented as an s-expression. + +Consider the following stream of tokens: + +@codeblock|{ +x ( 5 + 2 ) +}| + +This is converted into: +@codeblock|{ +(x (#%parens 5 + 2)) +}| + +Similarly, {} is converted to (#%braces ...) and [] is converted to (#%brackets +...). + +@defproc[(honu-read (port port?)) any]{ + Reads an s-expression from @racket[port]. +} + +@defproc[(honu-read-syntax (name any/c) (port port?)) any]{ + Reads a syntax object from @racket[port]. +} + +@defproc[(honu-lexer (port port?)) (listof position-token?)]{ + Tokenizes @racket[port] into a list of Honu tokens. +} + +@section{Parsing} + +Honu is parsed using an algorithm based primarily on operator precedence. The +main focus of the operator precedence algorithm is to support infix operators. +In short, the algorithm operates as follows: + +@itemlist[ +@item{1. Parse an @tech{expression}.} +@item{2. Check for a binary operator.
If one is found, continue to step 3; +otherwise, return the expression from step 1 immediately.} +@item{3. Parse another @tech{expression}.} +@item{4. Check for another binary operator. If one is found and its precedence is +higher than that of the operator from step 2, continue parsing from +step 3. If its precedence is lower or equal, or no operator is found, build an +infix expression from the left-hand expression from step 1, the binary operator +from step 2, and the right-hand expression from step 3.} +] + +The parser maintains the following registers: +@itemlist[ + @item{@bold{left} - a function that takes the right-hand side of an expression and + returns the infix expression by combining it with the left-hand side and the + operator.} + @item{@bold{current} - the current right-hand side, or @racket[#f] if there is none yet} + @item{@bold{precedence} - the current precedence level} + @item{@bold{stream} - the stream of tokens left to parse} +] + +This algorithm is illustrated with the following example. Consider the raw +stream of tokens: + +@codeblock|{ 1 + 2 * 3 - 9 }| + +@tabular[ + @list[ + @list["left" (hspace 1) "current" (hspace 1) "precedence" (hspace 1) "stream"] + @list[@racket[(lambda (x) x)] (hspace 1) + @racket[#f] (hspace 1) + @racket[0] (hspace 1) + @codeblock|{1 + 2 * 3 - 9}|] + @list[@racket[(lambda (x) x)] (hspace 1) + @racket[1] (hspace 1) + @racket[0] (hspace 1) + @codeblock|{+ 2 * 3 - 9}|] + @list[@racket[(lambda (x) #'(+ 1 x))] (hspace 1) + @racket[#f] (hspace 1) + @racket[1] (hspace 1) + @codeblock|{2 * 3 - 9}|] + @list[@racket[(lambda (x) #'(+ 1 x))] (hspace 1) + @racket[2] (hspace 1) + @racket[1] (hspace 1) + @codeblock|{* 3 - 9}|] + @list[@racket[(lambda (x) (left #'(* 2 x)))] (hspace 1) + @racket[#f] (hspace 1) + @racket[2] (hspace 1) + @codeblock|{3 - 9}|] + @list[@racket[(lambda (x) (left #'(* 2 x)))] (hspace 1) + @racket[3] (hspace 1) + @racket[2] (hspace 1) + @codeblock|{- 9}|] + @list[@racket[(lambda (x) #'(- (+ 1 (* 2 3)) x))] (hspace 1) + @racket[#f] (hspace
1) + @racket[1] (hspace 1) + @codeblock|{9}|] + @list[@racket[(lambda (x) #'(- (+ 1 (* 2 3)) x))] (hspace 1) + @racket[9] (hspace 1) + @racket[1] (hspace 1) + @codeblock|{}|] + ] +] + +When the stream of tokens is empty, the @bold{current} register is passed as an +argument to the @bold{left} function, which produces the final expression: +@codeblock|{(- (+ 1 (* 2 3)) 9)}| + +In this example, @racket[+] and @racket[-] both have a precedence of 1, while +@racket[*] has a precedence of 2. Currently, a precedence can be any real number, +that is, any value that can be compared with @racket[<=]. + +The example simplifies how the actual implementation +works. In particular, the binary operators are syntax transformers that accept +the left- and right-hand expressions as arguments and return new syntax objects. +Also, when the @racket[*] operator is parsed, the @bold{left} function for +@racket[+] is nested inside the new function for @racket[*]. + +An @deftech{expression} can be one of the following: +@itemlist[ + @item{@bold{datum} - a number, string, or symbol. @codeblock|{5}|} + @item{@bold{macro} - a symbol bound to a syntax transformer. + @codeblock|{cond x = 5: true, else: false}|} + @item{@bold{stop} - a symbol that immediately ends the current expression. + Currently these are @litchar{,}, @litchar{;}, and @litchar{:}.} + @item{@bold{lambda expression} - an identifier followed by @racket[(id ...)] + followed by a block of code in braces. @codeblock|{add(x, y){ x + y }}|} + @item{@bold{function application} - an expression followed by @racket[(arg + ...)]. @codeblock|{f(2, 2)}|} + @item{@bold{list comprehension} - @codeblock|{[x + 1: x <- [1, 2, 3]]}|} + @item{@bold{block of code} - a series of expressions wrapped in braces.} + @item{@bold{expression grouping} - any expression inside a pair of parentheses. + @codeblock|{(1 + 1) * 2}|} +] + +@section{Macros} +@section{Language} +@section{Examples}
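The worked parsing example in the Parsing section can be reproduced with a small Racket sketch. This is not Honu's implementation. It replaces the register-and-closure formulation with standard precedence climbing, where recursion plays the role of the nested left functions, and it builds plain S-expressions rather than syntax objects. The operator table `precedences` and the functions `parse-expression` and `parse-at-level` are hypothetical names chosen for illustration. The sketch also assumes a non-empty token list, operands that are single datums, and left-associative operators.

```racket
#lang racket/base
;; Sketch of operator-precedence parsing via precedence climbing.
;; Hypothetical operator table (not Honu's real one): + and - at level 1,
;; * and / at level 2, matching the worked example.
(define precedences (hash '+ 1 '- 1 '* 2 '/ 2))

(define (binary-operator? token)
  (hash-has-key? precedences token))

;; parse-expression : (listof token) -> s-expression
;; Assumes a non-empty token list where every operand is a single datum.
(define (parse-expression tokens)
  (define-values (expression remaining) (parse-at-level tokens 0))
  (unless (null? remaining)
    (error 'parse-expression "unexpected tokens: ~a" remaining))
  expression)

;; parse-at-level : (listof token) real -> (values s-expression (listof token))
;; Parses one operand, then keeps absorbing binary operators whose precedence
;; is strictly higher than `minimum`. Stopping on equal precedence makes
;; chains such as 8 - 3 - 1 associate to the left.
(define (parse-at-level tokens minimum)
  (let loop ([left (car tokens)] [stream (cdr tokens)])
    (define operator (and (pair? stream) (car stream)))
    (if (and operator
             (binary-operator? operator)
             (> (hash-ref precedences operator) minimum))
        ;; The right operand absorbs only operators that bind tighter than
        ;; `operator`; the combined result becomes the new left operand.
        (let-values ([(right remaining)
                      (parse-at-level (cdr stream)
                                      (hash-ref precedences operator))])
          (loop (list operator left right) remaining))
        (values left stream))))

(parse-expression '(1 + 2 * 3 - 9))  ; => '(- (+ 1 (* 2 3)) 9)
```

Each pending recursive call to `parse-at-level` corresponds roughly to one nested left function in the register table. It finishes the right-hand side of an operator before control returns to combine that side with the operator's left operand.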