#lang scribble/doc
@(require scribble/manual
scribble/bnf
scribble/eval
(for-label scheme/base
scheme/contract
scheme/list
xml
xml/plist))
@(define xml-eval (make-base-eval))
@(define plist-eval (make-base-eval))
@interaction-eval[#:eval xml-eval (require xml)]
@interaction-eval[#:eval xml-eval (require scheme/list)]
@interaction-eval[#:eval plist-eval (require xml/plist)]
@title{@bold{XML}: Parsing and Writing}
@author["Paul Graunke and Jay McCarthy"]
@defmodule[xml]
The @schememodname[xml] library provides functions for parsing and
generating XML. XML can be represented as an instance of the
@scheme[document] structure type, or as a kind of S-expression that is
called an @deftech{X-expression}.
The @schememodname[xml] library does not provide Document Type
Declaration (DTD) processing, including preservation of DTDs in read documents, or validation.
It also does not expand user-defined entities or read user-defined entities in attributes.
It does not interpret namespaces either.
@; ----------------------------------------------------------------------
@section{Datatypes}
@defstruct[location ([line exact-nonnegative-integer?]
[char exact-nonnegative-integer?]
[offset exact-nonnegative-integer?])]{
Represents a location in an input stream.}
@defthing[location/c contract?]{
Equivalent to @scheme[(or/c location? symbol? false/c)].
}
@defstruct[source ([start location/c]
[stop location/c])]{
Represents a source location. Other structure types extend @scheme[source].
When XML is generated from an input stream by @scheme[read-xml],
locations are represented by @scheme[location] instances. When XML
structures are generated by @scheme[xexpr->xml], then locations are
symbols.}
@deftogether[(
@defstruct[external-dtd ([system string?])]
@defstruct[(external-dtd/public external-dtd) ([public string?])]
@defstruct[(external-dtd/system external-dtd) ()]
)]{
Represents an externally defined DTD.}
@defstruct[document-type ([name symbol?]
[external external-dtd?]
[inlined false/c])]{
Represents a document type.}
@defstruct[comment ([text string?])]{
Represents a comment.}
@defstruct[(p-i source) ([target-name symbol?]
[instruction string?])]{
Represents a processing instruction.}
@defthing[misc/c contract?]{
Equivalent to @scheme[(or/c comment? p-i?)]
}
@defstruct[prolog ([misc (listof misc/c)]
[dtd (or/c document-type false/c)]
[misc2 (listof misc/c)])]{
Represents a document prolog.
}
@defstruct[document ([prolog prolog?]
[element element?]
[misc (listof misc/c)])]{
Represents a document.}
@defstruct[(element source) ([name symbol?]
[attributes (listof attribute?)]
[content (listof content/c)])]{
Represents an element.}
@defstruct[(attribute source) ([name symbol?] [value (or/c string? permissive/c)])]{
Represents an attribute within an element.}
@defthing[content/c contract?]{
Equivalent to @scheme[(or/c pcdata? element? entity? comment? cdata? p-i? permissive/c)].
}
@defthing[permissive/c contract?]{
If @scheme[(permissive-xexprs)] is @scheme[#t], then equivalent to @scheme[any/c], otherwise equivalent to @scheme[(make-none/c 'permissive)]}
@defstruct[(entity source) ([text (or/c symbol? exact-nonnegative-integer?)])]{
Represents a symbolic or numerical entity.}
@defstruct[(pcdata source) ([string string?])]{
Represents PCDATA content.}
@defstruct[(cdata source) ([string string?])]{
Represents CDATA content.
The @scheme[string] field is assumed to be of the form
@litchar{} with proper quoting
of @nonterm{content}. Otherwise, @scheme[write-xml] generates
incorrect output.}
@defstruct[(exn:invalid-xexpr exn:fail) ([code any/c])]{
Raised by @scheme[validate-xexpr] when passed an invalid
@tech{X-expression}. The @scheme[code] fields contains an invalid part
of the input to @scheme[validate-xexpr].}
@defstruct[(exn:xml exn:fail:read) ()]{
Raised by @scheme[read-xml] when an error in the XML input is found.
}
@defproc[(xexpr? [v any/c]) boolean?]{
Returns @scheme[#t] if @scheme[v] is a @tech{X-expression}, @scheme[#f] otherwise.
The following grammar describes expressions that create @tech{X-expressions}:
@schemegrammar[
#:literals (cons list)
xexpr string
(list symbol (list (list symbol string) ...) xexpr ...)
(cons symbol (list xexpr ...))
symbol
exact-nonnegative-integer
cdata
misc
]
A @scheme[_string] is literal data. When converted to an XML stream,
the characters of the data will be escaped as necessary.
A pair represents an element, optionally with attributes. Each
attribute's name is represented by a symbol, and its value is
represented by a string.
A @scheme[_symbol] represents a symbolic entity. For example,
@scheme['nbsp] represents @litchar{ }.
An @scheme[_exact-nonnegative-integer] represents a numeric entity. For example,
@schemevalfont{#x20} represents @litchar{}.
A @scheme[_cdata] is an instance of the @scheme[cdata] structure type,
and a @scheme[_misc] is an instance of the @scheme[comment] or
@scheme[p-i] structure types.}
@defthing[xexpr/c contract?]{
A contract that is like @scheme[xexpr?] except produces a better error message when the value is not an @tech{X-expression}.
}
@; ----------------------------------------------------------------------
@section{Reading and Writing XML}
@defproc[(read-xml [in input-port? (current-input-port)]) document?]{
Reads in an XML document from the given or current input port XML
documents contain exactly one element, raising @scheme[xml-read:error]
if the input stream has zero elements or more than one element.
Malformed xml is reported with source locations in the form
@nonterm{l}@litchar{.}@nonterm{c}@litchar{/}@nonterm{o}, where
@nonterm{l}, @nonterm{c}, and @nonterm{o} are the line number, column
number, and next port position, respectively as returned by
@scheme[port-next-location].
Any non-characters other than @scheme[eof] read from the input-port
appear in the document content. Such special values may appear only
where XML content may. See @scheme[make-input-port] for information
about creating ports that return non-character values.
@examples[
#:eval xml-eval
(xml->xexpr (document-element
(read-xml (open-input-string
"hi there!"))))
]}
@defproc[(read-xml/element [in input-port? (current-input-port)]) element?]{
Reads a single XML element from the port. The next non-whitespace
character read must start an XML element, but the input port can
contain other data after the element.}
@defproc[(syntax:read-xml [in input-port? (current-input-port)]) syntax?]{
Reads in an XML document and produces a syntax object version (like
@scheme[read-syntax]) of an @tech{X-expression}.}
@defproc[(syntax:read-xml/element [in input-port? (current-input-port)]) syntax?]{
Like @scheme[syntax:real-xml], but it reads an XML element like
@scheme[read-xml/element].}
@defproc[(write-xml [doc document?] [out output-port? (current-output-port)])
void?]{
Writes a document to the given output port, currently ignoring
everything except the document's root element.}
@defproc[(write-xml/content [content content/c] [out output-port? (current-output-port)])
void?]{
Writes document content to the given output port.}
@defproc[(display-xml [doc document?] [out output-port? (current-output-port)])
void?]{
Like @scheme[write-xml], but newlines and indentation make the output
more readable, though less technically correct when whitespace is
significant.}
@defproc[(display-xml/content [content content/c] [out output-port? (current-output-port)])
void?]{
Like @scheme[write-xml/content], but with indentation and newlines
like @scheme[display-xml].}
@; ----------------------------------------------------------------------
@section{XML and X-expression Conversions}
@defboolparam[permissive-xexprs v]{
If this is set to non-false, then @scheme[xml->xexpr] will allow
non-XML objects, such as other structs, in the content of the converted XML
and leave them in place in the resulting ``@tech{X-expression}''.
}
@defproc[(xml->xexpr [content content/c]) xexpr/c]{
Converts document content into an @tech{X-expression}, using
@scheme[permissive-xexprs] to determine if foreign objects are allowed.}
@defproc[(xexpr->xml [xexpr xexpr/c]) content/c]{
Converts an @tech{X-expression} into XML content.}
@defproc[(xexpr->string [xexpr xexpr/c]) string?]{
Converts an @tech{X-expression} into a string containing XML.}
@defproc[(string->xexpr [str string?]) xexpr/c]{
Converts XML represented with a string into an @tech{X-expression}.}
@defproc[((eliminate-whitespace [tags (listof symbol?)]
[choose (boolean? . -> . boolean?)])
[elem element?])
element?]{
Some elements should not contain any text, only other tags, except
they often contain whitespace for formating purposes. Given a list of
tag names as @scheme[tag]s and the identity function as
@scheme[choose], @scheme[eliminate-whitespace] produces a function
that filters out PCDATA consisting solely of whitespace from those
elements, and it raises an error if any non-whitespace text appears.
Passing in @scheme[not] as @scheme[choose] filters all elements which
are not named in the @scheme[tags] list. Using @scheme[(lambda (x) #t)] as
@scheme[choose] filters all elements regardless of the @scheme[tags]
list.}
@defproc[(validate-xexpr [v any/c]) (one-of/c #t)]{
If @scheme[v] is an @tech{X-expression}, the result
@scheme[#t]. Otherwise, @scheme[exn:invalid-xexpr]s is raised, with
the a message of the form ``Expected @nonterm{something}, given
@nonterm{something-else}/'' The @scheme[code] field of the exception
is the part of @scheme[v] that caused the exception.}
@defproc[(correct-xexpr? [v any/c]
[success-k (-> any/c)]
[fail-k (exn:invalid-xexpr? . -> . any/c)])
any/c]{
Like @scheme[validate-expr], except that @scheme[success-k] is called
on each valid leaf, and @scheme[fail-k] is called on invalid leaves;
the @scheme[fail-k] may return a value instead of raising an exception
of otherwise escaping. Results from the leaves are combined with
@scheme[and] to arrive at the final result.}
@; ----------------------------------------------------------------------
@section{Parameters}
@defparam[empty-tag-shorthand shorthand (or/c (one-of/c 'always 'never) (listof symbol?))]{
A parameter that determines whether output functions should use the
@litchar{<}@nonterm{tag}@litchar{/>} tag notation instead of
@litchar{<}@nonterm{tag}@litchar{>}@litchar{}@nonterm{tag}@litchar{>}
for elements that have no content.
When the parameter is set to @scheme['always], the abbreviated
notation is always used. When set of @scheme['never], the abbreviated
notation is never generated. when set to a list of symbols is
provided, tags with names in the list are abbreviated. The default is
@scheme['always].
The abbreviated form is the preferred XML notation. However, most
browsers designed for HTML will only properly render XHTML if the
document uses a mixture of the two formats. The
@scheme[html-empty-tags] constant contains the W3 consortium's
recommended list of XHTML tags that should use the shorthand.}
@defthing[html-empty-tags (listof symbol?)]{
See @scheme[empty-tag-shorthand].
@examples[
#:eval xml-eval
(parameterize ([empty-tag-shorthand html-empty-tags])
(write-xml/content (xexpr->xml `(html
(body ((bgcolor "red"))
"Hi!" (br) "Bye!")))))
]}
@defboolparam[collapse-whitespace collapse?]{
A parameter that controls whether consecutive whitespace is replaced
by a single space. CDATA sections are not affected. The default is
@scheme[#f].}
@defboolparam[read-comments preserve?]{
A parameter that determines whether comments are preserved or
discarded when reading XML. The default is @scheme[#f], which
discards comments.}
@defboolparam[xexpr-drop-empty-attributes drop?]{
Controls whether @scheme[xml->xexpr] drops or preserves attribute
sections for an element that has no attributes. The default is
@scheme[#f], which means that all generated @tech{X-expression}
elements have an attributes list (even if it's empty).}
@; ----------------------------------------------------------------------
@section{PList Library}
@defmodule[xml/plist]
The @schememodname[xml/plist] library provides the ability to read and
write XML documents that conform to the @defterm{plist} DTD, which is
used to store dictionaries of string--value associations. This format
is used by Mac OS X (both the operating system and its applications)
to store all kinds of data.
A @deftech{plist dictionary} is a value that could be created by an
expression matching the following @scheme[_dict-expr] grammar:
@schemegrammar*[
#:literals (list quote)
[dict-expr (list 'dict assoc-pair ...)]
[assoc-pair (list 'assoc-pair string pl-value)]
[pl-value string
(list 'true)
(list 'false)
(list 'integer integer)
(list 'real real)
dict-expr
(list 'array pl-value ...)]
]
@defproc[(plist-dict? [any/c v]) boolean?]{
Returns @scheme[#t] if @scheme[v] is a @tech{plist dictionary},
@scheme[#f] otherwise.}
@defproc[(read-plist [in input-port?]) plist-dict?]{
Reads a plist from a port, and produces a @tech{plist dictionary}.}
@defproc[(write-plist [dict plist-dict?] [out output-port?]) void?]{
Write a @tech{plist dictionary} to the given port.}
@examples[
#:eval plist-eval
(define my-dict
`(dict (assoc-pair "first-key"
"just a string with some whitespace")
(assoc-pair "second-key"
(false))
(assoc-pair "third-key"
(dict ))
(assoc-pair "fourth-key"
(dict (assoc-pair "inner-key"
(real 3.432))))
(assoc-pair "fifth-key"
(array (integer 14)
"another string"
(true)))
(assoc-pair "sixth-key"
(array))))
(define-values (in out) (make-pipe))
(write-plist my-dict out)
(close-output-port out)
(define new-dict (read-plist in))
(equal? my-dict new-dict)
]
The XML generated by @scheme[write-plist] in the above example
looks like the following, if re-formatted by:
@verbatim[#:indent 2]|{
first-key
just a string with some whitespace
second-key
third-key
fourth-key
inner-key
3.432
fifth-key
14
another string
sixth-key
}|