update brag docs

2016-09-26 06:54:41 -07:00 · 2016-09-26 06:54:41 -07:00 · 7712ab31d4
commit 7712ab31d4
parent c8899a603b
1 changed files with 91 additions and 124 deletions
--- a/brag/brag/brag.scrbl
+++ b/brag/brag/brag.scrbl
@@ -27,7 +27,7 @@


@title{brag: the Beautiful Racket AST Generator}
-@author["Danny Yoo" "Matthew Butterick"]
+@author["Danny Yoo (95%)" "Matthew Butterick (5%)"]

@defmodulelang[brag]

@@ -38,21 +38,17 @@
                          racket/list
                          racket/match))

-Salutations!  Let's consider the following scenario: say that we're given the
+Suppose we're given the
 following string:
@racketblock["(radiant (humble))"]


-@margin-note{(... and pretend that we don't already know about the built-in
-@racket[read] function.)}  How do we go about turning this kind of string into a
-structured value?  That is, how would we @emph{parse} it?
+How would we turn this string into a structured value?  That is, how would we @emph{parse} it? (Let's also suppose we've never heard of @racket[read].)

-We need to first consider the shape of the things we'd like to parse.  The
-string above looks like a deeply nested list of words.  How might we describe
-this formally?  A convenient notation to describe the shape of these things is
-@link["http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form"]{Backus-Naur
-Form} (BNF).  So let's try to notate the structure of nested word lists in BNF.
+First, we need to consider the structure of the things we'd like to parse. The
+string above looks like a nested list of words. Good start.

+Second, how might we describe this formally — meaning, in a way that a computer could understand? A common notation to describe the structure of these things is @link["http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form"]{Backus-Naur Form} (BNF). So let's try to notate the structure of nested word lists in BNF.

@nested[#:style 'code-inset]{
@verbatim{
@@ -60,12 +56,7 @@ nested-word-list: WORD
                | LEFT-PAREN nested-word-list* RIGHT-PAREN
 }}

-What we intend by this notation is this: @racket[nested-word-list] is either an
-atomic @racket[WORD], or a parenthesized list of any number of
-@racket[nested-word-list]s.  We use the character @litchar{*} to represent zero
-or more repetitions of the previous thing, and we treat the uppercased
-@racket[LEFT-PAREN], @racket[RIGHT-PAREN], and @racket[WORD] as placeholders
-for atomic @emph{tokens}.
+What we intend by this notation is this: @racket[nested-word-list] is either a @racket[WORD] or a parenthesized list of @racket[nested-word-list]s. We use the character @litchar{*} to represent zero or more repetitions of the previous thing. We treat the uppercased @racket[LEFT-PAREN], @racket[RIGHT-PAREN], and @racket[WORD] as placeholders for @emph{tokens} (a @deftech{token} being the smallest meaningful item in the parsed string).

 Here are a few examples of tokens:
@interaction[#:eval my-eval
@@ -74,15 +65,11 @@ Here are a few examples of tokens:
 (token 'WORD "crunchy" #:span 7)
 (token 'RIGHT-PAREN)]

+This BNF description is also known as a @deftech{grammar}. Just as it does in a natural language like English or French, a grammar describes something in terms of what elements can fit where.

-Have we made progress?  At this point, we only have a BNF description in hand,
-but we're still missing a @emph{parser}, something to take that description and
-use it to make structures out of a sequence of tokens.
+Have we made progress?  We have a valid grammar. But we're still missing a @emph{parser}: a function that can use that grammar to make structures out of a sequence of tokens.

-
-It's clear that we don't yet have a program because there's no @litchar{#lang}
-line.  We should add one.  Put @litchar{#lang brag} at the top of the BNF
-description, and save it as a file called @filepath{nested-word-list.rkt}.
+Meanwhile, it's clear that we don't yet have a valid program because there's no @litchar{#lang} line. Let's add one: put @litchar{#lang brag} at the top of the grammar, and save it as a file called @filepath{nested-word-list.rkt}.

@filebox["nested-word-list.rkt"]{
@verbatim{
@@ -91,7 +78,7 @@ nested-word-list: WORD
                | LEFT-PAREN nested-word-list* RIGHT-PAREN
 }}

-Now it is a proper program.  But what does it do?
+Now it's a proper program. But what does it do?

@interaction[#:eval my-eval
@eval:alts[(require "nested-word-list.rkt") (void)]
@@ -99,7 +86,7 @@ parse
 ]

 It gives us a @racket[parse] function. Let's investigate what @racket[parse]
-does for us.  What happens if we pass it a sequence of tokens?
+does. What happens if we pass it a sequence of tokens?

@interaction[#:eval my-eval
             (define a-parsed-value
@@ -111,15 +98,16 @@ does for us.  What happens if we pass it a sequence of tokens?
                            (token 'RIGHT-PAREN ")"))))
             a-parsed-value]

-Wait... that looks suspiciously like a syntax object!
+Those who have messed around with macros will recognize this as a @tech[#:doc '(lib "scribblings/guide/guide.scrbl")]{syntax object}.
+
@interaction[#:eval my-eval
 (syntax->datum a-parsed-value)
 ]

-
 That's @racket[(some [pig])], essentially.

-What happens if we pass it a more substantial source of tokens?
+What happens if we pass our @racket[parse] function a bigger source of tokens?
+
@interaction[#:eval my-eval
@code:comment{tokenize: string -> (sequenceof token-struct?)}
@code:comment{Generate tokens from a string:}
@@ -143,39 +131,35 @@ Welcome to @tt{brag}.



-
-
-
@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

@section{Introduction}

-@tt{brag} is a parsing framework for Racket with the design goal to be easy
-to use.  It includes the following features:
+@tt{brag} is a parsing framework designed to be easy
+to use:
+
@itemize[

-@item{It provides a @litchar{#lang} for writing extended BNF grammars.
+@item{It provides a @litchar{#lang} for writing BNF grammars.
 A module written in @litchar{#lang brag} automatically generates a
 parser. The output of this parser tries to follow
@link["http://en.wikipedia.org/wiki/How_to_Design_Programs"]{HtDP}
-doctrine; the structure of the grammar informs the structure of the
+guidelines. The structure of the grammar informs the structure of the
 Racket syntax objects it generates.}

@item{The language uses a few conventions to simplify the expression of
-grammars.  The first rule in the grammar is automatically assumed to be the
-starting production.  Identifiers in uppercase are assumed to represent
-terminal tokens, and are otherwise the names of nonterminals.}
+grammars. The first rule in the grammar is assumed to be the
+starting production. Identifiers in @tt{UPPERCASE} are treated as
+terminal tokens. All other identifiers are treated as nonterminals.}

-@item{Tokenizers can be developed completely independently of parsers.
+@item{Tokenizers can be developed independently of parsers.
@tt{brag} takes a liberal view on tokens: they can be strings,
-symbols, or instances constructed with @racket[token].  Furthermore,
-tokens can optionally provide location: if tokens provide location, the
-generated syntax objects will as well.}
+symbols, or instances constructed with @racket[token]. Tokens can optionally provide source location, in which case a syntax object generated by the parser will too.}

-@item{The underlying parser should be able to handle ambiguous grammars.}
+@item{The parser can usually handle ambiguous grammars.}

-@item{It should integrate with the rest of the Racket
+@item{It integrates with the rest of the Racket
@link["http://docs.racket-lang.org/guide/languages.html"]{language toolchain}.}

 ]
@@ -184,11 +168,12 @@ generated syntax objects will as well.}

@subsection{Example: a small DSL for ASCII diagrams}

-@margin-note{This is a
-@link["http://stackoverflow.com/questions/12345647/rewrite-this-script-by-designing-an-interpreter-in-racket"]{restatement
-of a question on Stack Overflow}.}  To motivate @tt{brag}'s design, let's look
-at the following toy problem: we'd like to define a language for
-drawing simple ASCII diagrams.  We'd like to be able write something like this:
+@margin-note{This example is
+@link["http://stackoverflow.com/questions/12345647/rewrite-this-script-by-designing-an-interpreter-in-racket"]{derived from a question} on Stack Overflow.}
+
+To understand @tt{brag}'s design, let's look
+at a toy problem. We'd like to define a language for
+drawing simple ASCII diagrams. So if we write something like this:

@nested[#:style 'inset]{
@verbatim|{
@@ -197,7 +182,7 @@ drawing simple ASCII diagrams.  We'd like to be able write something like this:
 3 9 X;
 }|}

-whose interpretation should generate the following picture:
+it should generate the following picture:

@nested[#:style 'inset]{
@verbatim|{
@@ -218,10 +203,11 @@ XXXXXXXXX


@subsection{Syntax and semantics}
-We're being very fast-and-loose with what we mean by the program above, so
-let's try to nail down some meanings.  Each line of the program has a semicolon
-at the end, and describes the output of several @emph{rows} of the line
-drawing.  Let's look at two of the lines in the example:
+
+We're being somewhat casual about what we mean by the program above, so
+let's try to nail down some meanings.
+
+Each line of the program has a semicolon at the end, and describes the output of several @emph{rows} of the line drawing. Let's look at two of the lines in the example:

@itemize[
@item{@litchar{3 9 X;}: ``Repeat the following 3 times: print @racket["X"] nine times, followed by
@@ -232,21 +218,14 @@ followed by @racket["X"] three times, followed by @racket[" "] three times, foll
 ]

 Then each line consists of a @emph{repeat} number, followed by pairs of
-(number, character) @emph{chunks}.  We will
-assume here that the intent of the lowercased character @litchar{b} is to
-represent the printing of a 1-character whitespace @racket[" "], and for other
-uppercase letters to represent the printing of themselves.
+(number, character) @emph{chunks}. We'll assume here that the lowercase character @litchar{b} represents the printing of a single space @racket[" "], and that uppercase letters represent the printing of themselves.

-Once we have a better idea of the pieces of each line, we have a better chance
-to capture that meaning in a formal notation.  Once we have each instruction in
-a structured format, we should be able to interpret it with a straighforward
-case analysis.
-
-Here is a first pass at expressing the structure of these line-drawing
-programs.
+By understanding the pieces of each line, we can more easily capture that meaning in a grammar. Once we have each instruction of our ASCII DSL in a structured format, we should be able to interpret it.

+Here's a first pass at expressing the structure of these line-drawing programs.

@subsection{Parsing the concrete syntax}
+
@filebox["simple-line-drawing.rkt"]{
@verbatim|{
 #lang brag
@@ -258,7 +237,7 @@ chunk: INTEGER STRING
 }

@margin-note{@secref{brag-syntax} describes @tt{brag}'s syntax in more detail.}
-We write a @tt{brag} program as an extended BNF grammar, where patterns can be:
+We write a @tt{brag} program as a BNF grammar, where patterns can be:
@itemize[
@item{the names of other rules (e.g. @racket[chunk])}
@item{literal and symbolic token names (e.g. @racket[";"], @racket[INTEGER])}
@@ -282,17 +261,11 @@ Let's exercise this function:
 (syntax->datum stx)
 ]

-Tokens can either be: plain strings, symbols, or instances produced by the
-@racket[token] function.  (Plus a few more special cases, one in which we'll describe in a
-moment.)
+A @emph{token} is the smallest meaningful element of a source program. Tokens can be strings, symbols, or instances of the @racket[token] data structure. (Plus a few other special cases, which we'll discuss later.) Usually, a token holds a single character from the source program. But sometimes it makes sense to package a sequence of characters into a single token, if the sequence has an indivisible meaning.

-Preferably, we want to attach each token with auxiliary source location
-information.  The more source location we can provide, the better, as the
-syntax objects produced by @racket[parse] will incorporate them.
+If possible, we also want to attach source location information to each token. Why? Because this information will be incorporated into the syntax objects produced by @racket[parse].

-Let's write a helper function, a @emph{lexer}, to help us construct tokens more
-easily.  The Racket standard library comes with a module called
-@racketmodname[parser-tools/lex] which can help us write a position-sensitive
+A parser often works in conjunction with a helper function called a @emph{lexer} that converts the raw code of the source program into tokens. The @racketmodname[parser-tools/lex] library can help us write a position-sensitive
 tokenizer:

@interaction[#:eval my-eval
@@ -328,24 +301,19 @@ tokenizer:
 ]


-There are a few things to note from this lexer example: 
+A few things to note from this lexer example:
+
@itemize[

-@item{The @racket[parse] function can consume either sequences of tokens, or a
-function that produces tokens.  Both of these are considered sources of
-tokens.}
+@item{@racket[parse] accepts as input either a sequence of tokens, or a
+function that produces tokens (which @racket[parse] will call repeatedly to get the next token).}

-@item{As a special case for acceptable tokens, a token can also be an instance
-of the @racket[position-token] structure of @racketmodname[parser-tools/lex],
-in which case the token will try to derive its position from that of the
-position-token.}
+@item{As an alternative to the basic @racket[token] structure, a token can also be an instance of the @racket[position-token] structure (also found in @racketmodname[parser-tools/lex]). In that case, the token will try to derive its position from that of the position-token.}

-@item{The @racket[parse] function will stop reading from a token source if any
-token is @racket[void].}
+@item{@racket[parse] will stop if it gets @racket[void] (or @racket['eof]) as a token.}

-@item{The @racket[parse] function will skip over any token with the
-@racket[#:skip?]  attribute. Elements such as whitespace and comments will
-often have @racket[#:skip?] set to @racket[#t].}
+@item{@racket[parse] will skip any token that has the
+@racket[#:skip?] attribute set to @racket[#t]. For instance, tokens representing comments often use @racket[#:skip?].}

 ]

@@ -353,16 +321,16 @@ often have @racket[#:skip?] set to @racket[#t].}
@subsection{From parsing to interpretation}

 We now have a parser for programs written in this simple-line-drawing language.
-Our parser will give us back syntax objects:
+Our parser will return syntax objects:
+
@interaction[#:eval my-eval
 (define parsed-program
  (parse (tokenize (open-input-string "3 9 X; 6 3 b 3 X 3 b; 3 9 X;"))))
 (syntax->datum parsed-program)
 ]

-Moreover, we know that these syntax objects have a regular, predictable
-structure.  Their structure follows the grammar, so we know we'll be looking at
-values of the form:
+Better still, these syntax objects will have a predictable
+structure that follows the grammar:

@racketblock[
    (drawing (rows (repeat <number>)
@@ -374,10 +342,9 @@ where @racket[drawing], @racket[rows], @racket[repeat], and @racket[chunk]
 should be treated literally, and everything else will be numbers or strings.


-Still, these syntax object values are just inert structures.  How do we
-interpret them, and make them @emph{print}?  We did claim at the beginning of
-this section that these syntax objects should be fairly easy to case-analyze
-and interpret, so let's do it.
+Still, these syntax-object values are just inert structures. How do we
+interpret them, and make them @emph{print}?  We claimed at the beginning of
+this section that these syntax objects should be easy to interpret. So let's do it.

@margin-note{This is a very quick-and-dirty treatment of @racket[syntax-parse].
 See the @racketmodname[syntax/parse] documentation for a gentler guide to its
@@ -862,7 +829,7 @@ source.

 If @racket[parse] succeeds, it will return a structured syntax object. The
 structure of the syntax object follows the overall structure of the rules in
-the BNF.  For each rule @racket[r] and its associated pattern @racket[p],
+the BNF grammar. For each rule @racket[r] and its associated pattern @racket[p],
@racket[parse] generates a syntax object @racket[#'(r p-value)] where
@racket[p-value]'s structure follows a case analysis on @racket[p]: