svn: r9321
Matthew Flatt 2008-04-15 16:46:43 +00:00
parent 14d7f4dc9d
commit 9ca65af282
4 changed files with 440 additions and 372 deletions


@ -199,8 +199,12 @@ should be returned:
(extract-addresses "John Doe <doe@localhost>" 'all)
(extract-addresses "doe@localhost (Johnny Doe)" 'all)
(extract-addresses "doe@localhost" 'all)
(extract-addresses " \"Doe, John\" <doe@localhost>, jane"
'all)
(define r
(extract-addresses " \"John\" <doe@localhost>, jane"
'all))
(length r)
(car r)
(cadr r)
]}
]}


@ -1,370 +0,0 @@
_XML_ Library
=============
Files: _xml.ss_
Signature: _xml^_
Basic XML Data Types
====================
Document:
This structure represents an XML document. The only useful part is
the document-element, which contains all the content. The rest of
the structure contains DTD information, which isn't supported,
and processing-instructions.
Element:
Each pair of start/end tags and everything in between is an element.
It has the following pieces:
a name
attributes
contents including sub-elements
Xexpr:
S-expression representations of XML data.
The end of this document has more details.
Exceptions
==========
> (define-struct (exn:invalid-xexpr exn) (code))
Raised by validate-xexpr when passed an invalid Xexpr. Code contains an
invalid part of an Xexpr.
Functions
=========
> read-xml : [Input-port] -> Document
reads in an XML document from the given or current input port
XML documents contain exactly one element. It throws an xml-read:error
if there isn't any element or if there are more than one element.
Malformed xml is reported with source locations in
the form `l.c/o', where l, c, and o are the line number,
column number, and next port position, respectively as
returned by port-next-location.
Any non-characters other than eof read from the input-port will
appear in the document content. Such special values may only appear
where XML content may. See make-input-port for information
about creating ports that return non-character values.
> read-xml/element : [Input-port] -> Element
reads an XML element from the port. The next non-whitespace character
read must start an XML element. The input-port may contain other data
after the element.
> syntax:read-xml : [Input-port] -> Syntax
reads in an XML document and produces a syntax object version of
an xexpression.
> syntax:read-xml/element : [Input-port] -> Syntax
is just like read-xml/element except it produces a syntax version
of an xexpression
> write-xml : Document [Output-port] -> Void
writes a document to the given or current output port, currently
ignoring everything except the document's root element.
> write-xml/content : Content [Output-port] -> Void
writes a document's contents to the given or current output port
> display-xml : Document [Output-port] -> Void
just like write-xml, but newlines and indentation make the output more
readable, though less technically correct when white space is
significant.
> display-xml/content : Content [Output-port] -> Void
just like write-xml/content, but with indentation and newlines
> xml->xexpr : Content -> Xexpr
converts the interesting part of an XML document into an Xexpression
> xexpr->xml : Xexpr -> Content
converts an Xexpression into the interesting part of an XML document
> xexpr->string : Xexpression -> String
converts an Xexpression into a string representation
> eliminate-whitespace : (listof Symbol) (Bool -> Bool) -> Element -> Element
Some elements should not contain any text, only other tags, except they
often contain whitespace for formatting purposes. Given a list of tag
names and the identity function, eliminate-whitespace produces a
function that filters out pcdata consisting solely of whitespace from
those elements and raises an error if any non-whitespace text appears.
Passing in the function called "not" instead of the identity function
filters all elements which are not named in the list. Using void
filters all elements regardless of the list.
> xexpr? : any -> Boolean
Is the given thing an Xexpr?
> validate-xexpr : any -> #t
If the given thing is an Xexpr, produce true. Otherwise, raise
_exn:invalid-xexpr_, with the message set to "Expected something, given
something-else", where "something" is what it expected and
"something-else" set to what it was really given; and the code set to
the part of the non-Xexpr that caused the exception.
> correct-xexpr? : any (-> a) (exn -> a) -> a
If the given thing is an Xexpr, produce an a. Otherwise call the
second function with an exn:invalid-xexpr. This second function
may inspect this structure and decide to return a "correct" value.
This is a method of extending the definition of an Xexpr and is used
by the web-server's Xexpr/callbacks. (See for an example.)
Parameters
==========
> empty-tag-shorthand : 'always | 'never | (listof Symbol)
Default: 'always
This determines if the output functions should use the <empty/>
tag notation instead of writing <empty></empty>. If the
argument is 'always, the abbreviated notation is always used,
and if the argument is 'never, the open/close pair is always
generated. If a list of symbols is provided, tags with names
in this list will be abbreviated. The first form is the
preferred XML notation. However, most browsers designed for
HTML will only properly render XHTML if the document uses a
mixture of the two formats. _html-empty-tags_ contains the W3
consortium's recommended list of XHTML tags that should use the
shorthand.
> collapse-whitespace : Bool
Default: #f
All consecutive whitespace is replaced by a single space.
CDATA sections are not affected.
> trim-whitespace : Bool
This parameter no longer exists. Consider using collapse-whitespace
and eliminate-whitespace instead.
> read-comments : Bool
Default: #f
Comments, by definition, should be ignored by programs. However,
interoperating with ad hoc extensions to other languages sometimes
requires processing comments anyway.
> xexpr-drop-empty-attributes : Bool
Default: #f
It's easier to write functions processing Xexpressions, if they always
have a list of attributes. On the other hand, it's less cumbersome to
write Xexpressions by hand without empty lists of attributes
everywhere. Normally xml->xexpr leaves in empty attribute lists.
Setting this parameter to #t drops them, so further editing the
Xexpression by hand is less annoying.
Examples
========
Reading an Xexpression:
(xml->xexpr (document-element (read-xml input-port)))
Writing an Xexpression:
(empty-tag-shorthand html-empty-tags)
(write-xml/content (xexpr->xml `(html (head (title ,banner))
(body ((bgcolor "white"))
,text)))
output-port)
What this Library Doesn't Provide
=================================
Document Type Declaration (DTD) processing
Validation
Expanding user-defined entities
Reading user-defined entities in attributes
Unicode support
XML Datatype Details
====================
Note: Users of the XML collection don't need to know most of these definitions.
Note: Xexpr is the only important one to understand. Even then,
Processing-instructions may be ignored.
> Xexpr = String
| (cons Symbol (cons (listof (list Symbol String)) (listof Xexpr)))
| (cons Symbol (listof Xexpr)) ;; an element with no attributes
| Symbol ;; symbolic entities such as &nbsp;
| Number ;; numeric entities like &#20;
| Cdata
| Misc
> Document = (make-document Prolog Element (listof Processing-instruction))
(define-struct document (prolog element misc))
> Prolog = (make-prolog (listof Misc) Document-type [Misc ...])
(define-struct prolog (misc dtd misc2))
The last field is a (listof Misc), but the maker accepts optional
arguments instead for backwards compatibility.
> Document-type = #f | (make-document-type Symbol External-dtd #f)
(define-struct document-type (name external inlined))
> External-dtd = (make-external-dtd/public str str)
| (make-external-dtd/system str)
| #f
(define-struct external-dtd (system))
(define-struct (external-dtd/public external-dtd) (public))
(define-struct (external-dtd/system external-dtd) ())
> Element = (make-element Location Location
Symbol
(listof Attribute)
(listof Content))
(define-struct (element struct:source) (name attributes content))
> Attribute = (make-attribute Location Location Symbol String)
(define-struct (attribute struct:source) (name value))
> Content = Pcdata
| Element
| Entity
| Misc
> Misc = Comment
| Processing-instruction
> Pcdata = (make-pcdata Location Location String)
(define-struct (pcdata struct:source) (string))
> Cdata = (make-cdata Location Location String)
(define-struct (cdata struct:source) (string))
Note: The string of a cdata structure is assumed to be of the form
"<![CDATA[~a]]>" with proper quoting. If this is an incorrect
assumption, this library will generate invalid XML.
> Entity = (make-entity Location Location (U Nat Symbol))
(define-struct (entity struct:source) (text))
> Processing-instruction = (make-pi Location Location String String)
(define-struct (pi struct:source) (target-name instruction))
> Comment = (make-comment String)
(define-struct comment (text))
> Source = (make-source Location Location)
(define-struct source (start stop))
> Location = (make-location Nat Nat Nat) | Symbol
(define-struct location (line char offset))
Note: read-xml records location structures, while xexpr->xml inserts a
symbol. Other functions that must fabricate XML Locations
without prior source location should use a sensible "comment" symbol.
The PList Library
=================
Files: _plist.ss_
The PList library provides the ability to read and write xml documents which
conform to the "plist" DTD, used to store 'dictionaries' of string - value
associations. This format is typically used by Mac OS X --- the operating
system and its applications --- to store all kinds of data.
To Load
=======
(require (lib "plist.ss" "xml"))
Functions
=========
> read-plist : Port -> PLDict
reads a plist from a port, and produces a 'dict' x-expression
> write-plist : PLDict Port -> Void
writes a plist to the given port. May raise the exn:application:type
exception if the plist is badly formed.
Datatypes
=========
NB: all of these are subtypes of x-expression:
> PLDict = (list 'dict Assoc-pair ...)
> PLAssoc-pair = (list 'assoc-pair String PLValue)
> PLValue = String
| (list 'true)
| (list 'false)
| (list 'integer Integer)
| (list 'real Real)
| PLDict
| PLArray
> PLArray = (list 'array PLValue ...)
In fact, the PList DTD also defines Data and Date types, but we're ignoring
these for the moment.
Examples
========
Here's a sample PLDict:
(define my-dict
`(dict (assoc-pair "first-key"
"just a string
with some whitespace in it")
(assoc-pair "second-key"
(false))
(assoc-pair "third-key"
(dict ))
(assoc-pair "fourth-key"
(dict (assoc-pair "inner-key"
(real 3.432))))
(assoc-pair "fifth-key"
(array (integer 14)
"another string"
(true)))
(assoc-pair "sixth-key"
(array))))
Let's write it to disk:
(call-with-output-file "/Users/clements/tmp.plist"
(lambda (port)
(write-plist my-dict port))
'truncate)
Let's read it back from the disk:
(define new-dict
(call-with-input-file "/Users/clements/tmp.plist"
(lambda (port)
(read-plist port))))
Here's what that (hand-formatted) text file looks like:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist SYSTEM "file://localhost/System/Library/DTDs/PropertyList.dtd">
<plist version="0.9">
<dict>
<key>first-key</key>
<string>just a string
with some whitespace in it</string>
<key>second-key</key>
<false />
<key>third-key</key>
<dict />
<key>fourth-key</key>
<dict>
<key>inner-key</key>
<real>3.432</real>
</dict>
<key>fifth-key</key>
<array>
<integer>14</integer>
<string>another string</string>
<true />
</array>
<key>sixth-key</key>
<array />
</dict>
</plist>


@ -5,3 +5,5 @@
;; bit) more information
(define tools '(("text-box-tool.ss")))
(define tool-names '("Text Box"))
(define scribblings '(("xml.scrbl" ())))

collects/xml/xml.scrbl Normal file

@ -0,0 +1,432 @@
#lang scribble/doc
@(require scribble/manual
scribble/bnf
scribble/eval
(for-label scheme/base
scheme/contract
xml
xml/plist))
@(define xml-eval (make-base-eval))
@(define plist-eval (make-base-eval))
@interaction-eval[#:eval xml-eval (require xml)]
@interaction-eval[#:eval plist-eval (require xml/plist)]
@title{@bold{XML}: Parsing and Writing}
@defmodule[xml]
The @schememodname[xml] library provides functions for parsing and
generating XML. XML can be represented as an instance of the
@scheme[document] structure type, or as a kind of S-expression that is
called an @deftech{X-expression}.
The @schememodname[xml] library does not provide Document Type
Declaration (DTD) processing, validation, expanding user-defined
entities, or reading user-defined entities in attributes.
@; ----------------------------------------------------------------------
@section{Datatypes}
@defproc[(xexpr? [v any/c]) boolean?]{
Returns @scheme[#t] if @scheme[v] is an @tech{X-expression}, @scheme[#f] otherwise.
The following grammar describes expressions that create @tech{X-expressions}:
@schemegrammar[
#:literals (cons list)
xexpr string
(list symbol (list (list symbol string) ...) xexpr ...)
(cons symbol (list xexpr ...))
symbol
exact-nonnegative-integer
cdata
misc
]
A @scheme[_string] is literal data. When converted to an XML stream,
the characters of the data will be escaped as necessary.
A pair represents an element, optionally with attributes. Each
attribute's name is represented by a symbol, and its value is
represented by a string.
A @scheme[_symbol] represents a symbolic entity. For example,
@scheme['nbsp] represents @litchar{&nbsp;}.
An @scheme[_exact-nonnegative-integer] represents a numeric entity. For example,
@schemevalfont{#x20} represents @litchar{&#x20;}.
A @scheme[_cdata] is an instance of the @scheme[cdata] structure type,
and a @scheme[_misc] is an instance of the @scheme[comment] or
@scheme[p-i] structure types.}
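As a rough illustration of the grammar above, the following sketch builds one X-expression by hand, checks it, and serializes it. It assumes a current Racket installation in which the library is still required as xml; the name page is made up for the example.

```racket
#lang racket/base
(require xml)

;; A hand-written X-expression: elements, one attribute list,
;; literal strings, and the symbolic entity nbsp.
(define page
  '(html (head (title "Hi"))
         (body ((bgcolor "white"))
               "Hello" nbsp "world")))

(xexpr? page)                      ; => #t
(xexpr? '(a ((href 1)) "broken"))  ; => #f, attribute values must be strings
(xexpr->string page)               ; the same data as a string of XML
```

The quoted list is ordinary data, so it can be built with quasiquote or list operations before it is converted.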
@defstruct[document ([prolog prolog?]
[element element?]
[misc (listof (or/c comment? p-i?))])]{
Represents a document.}
@defstruct[prolog ([misc (listof (or/c comment? p-i?))]
[dtd (or/c document-type? false/c)]
[misc2 (listof (or/c comment? p-i?))])]{
Represents a document prolog. The @scheme[make-prolog] binding is
unusual: it accepts two or more arguments, and all arguments after the
first two are collected into the @scheme[misc2] field.}
@defstruct[document-type ([name symbol?]
[external external-dtd?]
[inlined false/c])]{
Represents a document type.}
@deftogether[(
@defstruct[external-dtd ([system string?])]
@defstruct[(external-dtd/public external-dtd) ([public string?])]
@defstruct[(external-dtd/system external-dtd) ()]
)]{
Represents an externally defined DTD.}
@defstruct[(element source) ([name symbol?]
[attributes (listof attribute?)]
[content (listof content?)])]{
Represents an element.}
@defproc[(content? [v any/c]) boolean?]{
Returns @scheme[#t] if @scheme[v] is a @scheme[pcdata] instance, an
@scheme[element] instance, an @scheme[entity] instance, a
@scheme[comment] instance, or a @scheme[p-i] instance.}
@defstruct[(attribute source) ([name symbol?] [value string?])]{
Represents an attribute within an element.}
@defstruct[(entity source) ([text (or/c symbol? exact-nonnegative-integer?)])]{
Represents a symbolic or numerical entity.}
@defstruct[(pcdata source) ([string string?])]{
Represents PCDATA content.}
@defstruct[(cdata source) ([string string?])]{
Represents CDATA content.
The @scheme[string] field is assumed to be of the form
@litchar{<![CDATA[}@nonterm{content}@litchar{]]>} with proper quoting
of @nonterm{content}. Otherwise, @scheme[write-xml] generates
incorrect output.}
@defstruct[(p-i source) ([target-name string?]
[instruction string?])]{
Represents a processing instruction.}
@defstruct[comment ([text string?])]{
Represents a comment.}
@defstruct[source ([start (or/c location? symbol?)]
[stop (or/c location? symbol?)])]{
Represents a source location. Other structure types extend @scheme[source].
When XML is generated from an input stream by @scheme[read-xml],
locations are represented by @scheme[location] instances. When XML
structures are generated by @scheme[xexpr->xml], then locations are
symbols.}
@defstruct[location ([line exact-nonnegative-integer?]
[char exact-nonnegative-integer?]
[offset exact-nonnegative-integer?])]{
Represents a location in an input stream.}
@defstruct[(exn:invalid-xexpr exn) ([code any/c])]{
Raised by @scheme[validate-xexpr] when passed an invalid
@tech{X-expression}. The @scheme[code] field contains an invalid part
of the input to @scheme[validate-xexpr].}
@; ----------------------------------------------------------------------
@section{Reading and Writing XML}
@defproc[(read-xml [in input-port? (current-input-port)]) document?]{
Reads in an XML document from the given or current input port. XML
documents contain exactly one element; @scheme[xml-read:error] is raised
if the input stream has zero elements or more than one element.
Malformed XML is reported with source locations in the form
@nonterm{l}@litchar{.}@nonterm{c}@litchar{/}@nonterm{o}, where
@nonterm{l}, @nonterm{c}, and @nonterm{o} are the line number, column
number, and next port position, respectively as returned by
@scheme[port-next-location].
Any non-characters other than @scheme[eof] read from the input-port
appear in the document content. Such special values may appear only
where XML content may. See @scheme[make-input-port] for information
about creating ports that return non-character values.
@examples[
#:eval xml-eval
(xml->xexpr (document-element
(read-xml (open-input-string
"<doc><bold>hi</bold> there!</doc>"))))
]}
@defproc[(read-xml/element [in input-port? (current-input-port)]) element?]{
Reads a single XML element from the port. The next non-whitespace
character read must start an XML element, but the input port can
contain other data after the element.}
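A minimal sketch of reading a single element from a port that carries other data afterwards, assuming a current Racket installation. The names in and elem and the sample input are made up for the example.

```racket
#lang racket/base
(require xml)

;; One element followed by unrelated text on the same port.
(define in
  (open-input-string "<item id=\"7\">first</item> and then more"))

(define elem (read-xml/element in))

(element-name elem)                                ; => 'item
(attribute-value (car (element-attributes elem)))  ; => "7"
(xml->xexpr elem)                                  ; => '(item ((id "7")) "first")
```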
@defproc[(syntax:read-xml [in input-port? (current-input-port)]) syntax?]{
Reads in an XML document and produces a syntax object version (like
@scheme[read-syntax]) of an @tech{X-expression}.}
@defproc[(syntax:read-xml/element [in input-port? (current-input-port)]) syntax?]{
Like @scheme[syntax:read-xml], but it reads an XML element like
@scheme[read-xml/element].}
@defproc[(write-xml [doc document?] [out output-port? (current-output-port)])
void?]{
Writes a document to the given output port, currently ignoring
everything except the document's root element.}
@defproc[(write-xml/content [content content?] [out output-port? (current-output-port)])
void?]{
Writes document content to the given output port.}
@defproc[(display-xml [doc document?] [out output-port? (current-output-port)])
void?]{
Like @scheme[write-xml], but newlines and indentation make the output
more readable, though less technically correct when whitespace is
significant.}
@defproc[(display-xml/content [content content?] [out output-port? (current-output-port)])
void?]{
Like @scheme[write-xml/content], but with indentation and newlines
like @scheme[display-xml].}
@; ----------------------------------------------------------------------
@section{XML and X-expression Conversions}
@defproc[(xml->xexpr [content content?]) xexpr?]{
Converts document content into an @tech{X-expression}.}
@defproc[(xexpr->xml [xexpr xexpr?]) content?]{
Converts an @tech{X-expression} into XML content.}
@defproc[(xexpr->string [xexpr xexpr?]) string?]{
Converts an @tech{X-expression} into a string containing XML.}
@defproc[(eliminate-whitespace [tags (listof symbol?)]
[choose (boolean? . -> . any/c)]
[elem element?])
element?]{
Some elements should not contain any text, only other tags, except
they often contain whitespace for formatting purposes. Given a list of
tag names as @scheme[tags] and the identity function as
@scheme[choose], @scheme[eliminate-whitespace] produces a function
that filters out PCDATA consisting solely of whitespace from those
elements, and it raises an error if any non-whitespace text appears.
Passing in @scheme[not] as @scheme[choose] filters all elements which
are not named in the @scheme[tags] list. Using @scheme[void] as
@scheme[choose] filters all elements regardless of the @scheme[tags]
list.}
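The sketch below applies the filter to a small list whose only text between items is indentation. It assumes a current Racket installation, where the function is applied in curried form: the tag list and chooser first, then the element. That matches the description of it producing a function. The names elem and cleaned are made up for the example.

```racket
#lang racket/base
(require xml)

;; A ul element whose direct text content is only newlines and spaces.
(define elem
  (read-xml/element
   (open-input-string "<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>")))

;; Identity as the chooser: strip whitespace-only PCDATA from ul elements.
(define cleaned
  ((eliminate-whitespace '(ul) (lambda (x) x)) elem))

(xml->xexpr cleaned)  ; => '(ul () (li () "one") (li () "two"))
```

Because li is not in the tag list, the text inside each item is left untouched.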
@defproc[(validate-xexpr [v any/c]) (one-of/c #t)]{
If @scheme[v] is an @tech{X-expression}, the result is
@scheme[#t]. Otherwise, @scheme[exn:invalid-xexpr] is raised, with
a message of the form ``Expected @nonterm{something}, given
@nonterm{something-else}.'' The @scheme[code] field of the exception
is the part of @scheme[v] that caused the exception.}
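One way to use the exception is sketched below, assuming a current Racket installation. The helper check is hypothetical and exists only for this example; it returns the result of validation, or a list holding the message and the offending part when validation fails.

```racket
#lang racket/base
(require xml)

;; Hypothetical helper: validate v, turning a failure into data.
(define (check v)
  (with-handlers ([exn:invalid-xexpr?
                   (lambda (e)
                     (list 'invalid
                           (exn-message e)
                           (exn:invalid-xexpr-code e)))])
    (validate-xexpr v)))

(check '(p ((class "note")) "text"))  ; => #t
(check '(p ((class 5)) "text"))       ; a list starting with 'invalid
```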
@defproc[(correct-xexpr? [v any/c]
[success-k (-> any/c)]
[fail-k (exn:invalid-xexpr? . -> . any/c)])
any/c]{
Like @scheme[validate-xexpr], except that @scheme[success-k] is called
on each valid leaf, and @scheme[fail-k] is called on invalid leaves;
@scheme[fail-k] may return a value instead of raising an exception
or otherwise escaping. Results from the leaves are combined with
@scheme[and] to arrive at the final result.}
@; ----------------------------------------------------------------------
@section{Parameters}
@defparam[empty-tag-shorthand shorthand (or/c (one-of/c 'always 'never) (listof symbol?))]{
A parameter that determines whether output functions should use the
@litchar{<}@nonterm{tag}@litchar{/>} tag notation instead of
@litchar{<}@nonterm{tag}@litchar{>}@litchar{</}@nonterm{tag}@litchar{>}
for elements that have no content.
When the parameter is set to @scheme['always], the abbreviated
notation is always used. When set to @scheme['never], the abbreviated
notation is never generated. When set to a list of symbols, tags with
names in the list are abbreviated. The default is
@scheme['always].
The abbreviated form is the preferred XML notation. However, most
browsers designed for HTML will only properly render XHTML if the
document uses a mixture of the two formats. The
@scheme[html-empty-tags] constant contains the W3 consortium's
recommended list of XHTML tags that should use the shorthand.}
@defthing[html-empty-tags (listof symbol?)]{
See @scheme[empty-tag-shorthand].
@examples[
#:eval xml-eval
(parameterize ([empty-tag-shorthand html-empty-tags])
(write-xml/content (xexpr->xml `(html
(body ((bgcolor "red"))
"Hi!" (br) "Bye!")))))
]}
@defboolparam[collapse-whitespace collapse?]{
A parameter that controls whether consecutive whitespace is replaced
by a single space. CDATA sections are not affected. The default is
@scheme[#f].}
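A small sketch of the effect on reading, assuming a current Racket installation. The helper read-p and the sample input are made up for the example.

```racket
#lang racket/base
(require xml)

;; Hypothetical helper: parse the same input under the current setting.
(define (read-p)
  (xml->xexpr
   (read-xml/element (open-input-string "<p>a   \n  b</p>"))))

(define kept (read-p))                  ; whitespace run preserved
(define collapsed
  (parameterize ([collapse-whitespace #t])
    (read-p)))                          ; whitespace run becomes one space

kept       ; => '(p () "a   \n  b")
collapsed  ; => '(p () "a b")
```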
@defboolparam[read-comments preserve?]{
A parameter that determines whether comments are preserved or
discarded when reading XML. The default is @scheme[#f], which
discards comments.}
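The difference can be sketched as follows, assuming a current Racket installation. The helper read-p and the sample input are made up for the example.

```racket
#lang racket/base
(require xml)

;; Hypothetical helper: parse the same input under the current setting.
(define (read-p)
  (xml->xexpr
   (read-xml/element (open-input-string "<p><!-- note -->hi</p>"))))

(define without (read-p))               ; comment discarded
(define with
  (parameterize ([read-comments #t])
    (read-p)))                          ; comment kept as a comment structure

without                        ; => '(p () "hi")
(comment? (caddr with))        ; => #t
(comment-text (caddr with))    ; => " note "
```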
@defboolparam[xexpr-drop-empty-attributes drop?]{
Controls whether @scheme[xml->xexpr] drops or preserves attribute
sections for an element that has no attributes. The default is
@scheme[#f], which means that all generated @tech{X-expression}
elements have an attributes list (even if it's empty).}
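A short sketch of both settings, assuming a current Racket installation. The name elem is made up for the example.

```racket
#lang racket/base
(require xml)

;; An element with no attributes.
(define elem
  (read-xml/element (open-input-string "<p>hi</p>")))

;; Default: the empty attribute list is kept.
(define kept (xml->xexpr elem))

;; With the parameter set, the empty attribute list is dropped.
(define dropped
  (parameterize ([xexpr-drop-empty-attributes #t])
    (xml->xexpr elem)))

kept     ; => '(p () "hi")
dropped  ; => '(p "hi")
```

Keeping the empty list makes pattern matching on the result uniform, while dropping it gives output closer to what one would write by hand.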
@; ----------------------------------------------------------------------
@section{PList Library}
@defmodule[xml/plist]
The @schememodname[xml/plist] library provides the ability to read and
write XML documents that conform to the @defterm{plist} DTD, which is
used to store dictionaries of string--value associations. This format
is used by Mac OS X (both the operating system and its applications)
to store all kinds of data.
A @deftech{dictionary X-expression} is an @tech{X-expression} that
could be created by an expression matching the following
@scheme[_dict-expr] grammar:
@schemegrammar*[
#:literals (list)
[dict-expr (list 'dict assoc-pair ...)]
[assoc-pair (list 'assoc-pair string pl-value)]
[pl-value string
(list 'true)
(list 'false)
(list 'integer integer)
(list 'real real)
dict-expr
(list 'array pl-value ...)]
]
@defproc[(read-plist [in input-port?]) xexpr?]{
Reads a plist from a port, and produces a @tech{dictionary
X-expression}.}
@defproc[(write-plist [dict xexpr?] [out output-port?]) void?]{
Writes a plist to the given port. If @scheme[dict] is not a
@tech{dictionary X-expression}, the @scheme[exn:fail:contract]
exception is raised.}
@examples[
#:eval plist-eval
(define my-dict
`(dict (assoc-pair "first-key"
"just a string with some whitespace")
(assoc-pair "second-key"
(false))
(assoc-pair "third-key"
(dict ))
(assoc-pair "fourth-key"
(dict (assoc-pair "inner-key"
(real 3.432))))
(assoc-pair "fifth-key"
(array (integer 14)
"another string"
(true)))
(assoc-pair "sixth-key"
(array))))
(define-values (in out) (make-pipe))
(write-plist my-dict out)
(close-output-port out)
(define new-dict (read-plist in))
(equal? my-dict new-dict)
]
The XML generated by @scheme[write-plist] in the above example
looks like the following, if re-formatted by hand:
@verbatim[#:indent 2]|{
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist SYSTEM
"file://localhost/System/Library/DTDs/PropertyList.dtd">
<plist version="0.9">
<dict>
<key>first-key</key>
<string>just a string with some whitespace</string>
<key>second-key</key>
<false />
<key>third-key</key>
<dict />
<key>fourth-key</key>
<dict>
<key>inner-key</key>
<real>3.432</real>
</dict>
<key>fifth-key</key>
<array>
<integer>14</integer>
<string>another string</string>
<true />
</array>
<key>sixth-key</key>
<array />
</dict>
</plist>
}|