_XML_ Library
=============

Files: _xml.ss_
Signature: _xml^_

Basic XML Data Types
====================

Document:
  This structure represents an XML document. The only useful part is
  the document-element, which contains all the content. The rest of
  the structure contains DTD information, which isn't supported,
  and processing-instructions.

Element:
  Each pair of start/end tags and everything in between is an element.
  It has the following pieces:
    a name
    attributes
    contents including sub-elements
Xexpr:
  S-expression representations of XML data.

The end of this document has more details.

Exceptions
==========

> (define-struct (exn:invalid-xexpr exn) (code))
  Raised by validate-xexpr when passed an invalid Xexpr. The code field
  contains an invalid part of the Xexpr.

Functions
=========

> read-xml : [Input-port] -> Document
  Reads an XML document from the given or current input port.
  An XML document contains exactly one root element; read-xml raises an
  xml-read:error if there is no element or more than one element.

  Malformed XML is reported with source locations in
  the form `l.c/o', where l, c, and o are the line number,
  column number, and next port position, respectively, as
  returned by port-next-location.

  Any non-characters other than eof read from the input-port will
  appear in the document content. Such special values may only appear
  where XML content may. See make-input-port for information
  about creating ports that return non-character values.

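As a minimal sketch of read-xml, the following reads a document from a
string port and pulls out its root element. The require line is an
assumption modelled on the load instructions given for plist.ss, and the
sample XML is invented for illustration.

```scheme
;; Assumption: xml.ss loads the same way plist.ss does.
(require (lib "xml.ss" "xml"))

;; Hypothetical sample input; any well-formed document works.
(define sample "<note><to>Ann</to><body>Hi</body></note>")

;; read-xml consumes one complete document from the port.
(define doc (read-xml (open-input-string sample)))

;; The root element carries all of the content.
(define root (document-element doc))

(element-name root)  ; the root tag name, as a symbol
(xml->xexpr root)    ; the same content as an Xexpr
```

Reading from a string port keeps the sketch self-contained; a file port
obtained from call-with-input-file can be passed in the same way.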
> read-xml/element : [Input-port] -> Element
  Reads an XML element from the port. The next non-whitespace character
  read must start an XML element. The input-port may contain other data
  after the element.

> syntax:read-xml : [Input-port] -> Syntax
  Reads an XML document and produces a syntax object version of
  an Xexpression.

> syntax:read-xml/element : [Input-port] -> Syntax
  Just like read-xml/element, except that it produces a syntax object
  version of an Xexpression.

> write-xml : Document [Output-port] -> Void
  Writes a document to the given or current output port, currently
  ignoring everything except the document's root element.

> write-xml/content : Content [Output-port] -> Void
  Writes a document's contents to the given or current output port.

> display-xml : Document [Output-port] -> Void
  Just like write-xml, but newlines and indentation make the output more
  readable, though less technically correct when whitespace is
  significant.

> display-xml/content : Content [Output-port] -> Void
  Just like write-xml/content, but with indentation and newlines.

> xml->xexpr : Content -> Xexpr
  Converts the interesting part of an XML document into an Xexpression.

> xexpr->xml : Xexpr -> Content
  Converts an Xexpression into the interesting part of an XML document.

> xexpr->string : Xexpr -> String
  Converts an Xexpression into a string representation.

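The three conversion functions can be combined as in the sketch below.
The Xexpr named page is a made-up value, and the require line is an
assumption modelled on the plist.ss load instructions.

```scheme
;; Assumption: xml.ss loads the same way plist.ss does.
(require (lib "xml.ss" "xml"))

;; Hypothetical Xexpr: one element with an attribute and mixed content.
(define page '(p ((class "intro")) "Hello, " (b "world") "!"))

;; Xexpr -> Content, and back again.
(define content (xexpr->xml page))
(define back (xml->xexpr content))

;; Xexpr -> String, convenient for logging or for building a response.
(define text (xexpr->string page))
```

Note that back need not be equal? to page: xml->xexpr normally leaves
empty attribute lists in place, so the b element may come back with one.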
> eliminate-whitespace : (listof Symbol) (Bool -> Bool) -> Element -> Element
  Some elements should not contain any text, only other tags, except that
  they often contain whitespace for formatting purposes. Given a list of tag
  names and the identity function, eliminate-whitespace produces a
  function that filters out pcdata consisting solely of whitespace from
  those elements and raises an error if any non-whitespace text appears.
  Passing in the function called "not" instead of the identity function
  filters all elements which are not named in the list. Using void
  filters all elements regardless of the list.

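A short sketch of the identity-function case follows. The element names
items and item are hypothetical, and the require line is an assumption
modelled on the plist.ss load instructions. Only items is listed, because
item holds real text and listing it would raise an error.

```scheme
;; Assumption: xml.ss loads the same way plist.ss does.
(require (lib "xml.ss" "xml"))

;; Hypothetical input: <items> holds child elements plus layout whitespace.
(define src "<items>\n  <item>a</item>\n  <item>b</item>\n</items>")
(define root (document-element (read-xml (open-input-string src))))

;; Filter only the elements named in the list (identity function).
(define strip-items (eliminate-whitespace '(items) (lambda (x) x)))

(xml->xexpr root)                ; still contains the whitespace strings
(xml->xexpr (strip-items root))  ; whitespace-only pcdata under <items> is gone
```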
> xexpr? : any -> Boolean
  Is the given thing an Xexpr?

> validate-xexpr : any -> #t
  If the given thing is an Xexpr, produce true. Otherwise, raise
  _exn:invalid-xexpr_, with the message set to "Expected something, given
  something-else", where "something" is what was expected and
  "something-else" is what was actually given, and with the code set to
  the part of the non-Xexpr that caused the exception.

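One way to use validate-xexpr is to trap the exception and inspect its
code field, as sketched below. The helper name check is hypothetical, and
the require line is an assumption modelled on the plist.ss load
instructions.

```scheme
;; Assumption: xml.ss loads the same way plist.ss does.
(require (lib "xml.ss" "xml"))

;; Hypothetical helper: returns 'ok for an Xexpr, otherwise a list
;; holding 'invalid and the offending part reported by the exception.
(define (check v)
  (with-handlers ([exn:invalid-xexpr?
                   (lambda (e) (list 'invalid (exn:invalid-xexpr-code e)))])
    (validate-xexpr v)
    'ok))

(check '(p "fine"))
(check '(p (b #t)))  ; #t is not an Xexpr
```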
> correct-xexpr? : any (-> a) (exn -> a) -> a
  If the given thing is an Xexpr, call the first function to produce an a.
  Otherwise call the second function with an exn:invalid-xexpr. This second
  function may inspect this structure and decide to return a "correct" value.
  This is a method of extending the definition of an Xexpr and is used
  by the web-server's Xexpr/callbacks. (See for an example.)

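The two callbacks of correct-xexpr? can be sketched as follows. The
helper xexpr-or-false is hypothetical, and the require line is an
assumption modelled on the plist.ss load instructions.

```scheme
;; Assumption: xml.ss loads the same way plist.ss does.
(require (lib "xml.ss" "xml"))

;; Hypothetical helper: returns v itself when it is an Xexpr, else #f.
(define (xexpr-or-false v)
  (correct-xexpr? v
                  (lambda () v)        ; success: called with no arguments
                  (lambda (exn) #f)))  ; failure: receives the exn:invalid-xexpr

(xexpr-or-false '(em "ok"))
(xexpr-or-false '(em #t))  ; #t is not an Xexpr
```

A failure function that extends the definition of an Xexpr would inspect
the code field of the exception instead of always returning #f.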
Parameters
==========

> empty-tag-shorthand : 'always | 'never | (listof Symbol)
  Default: 'always
  This determines whether the output functions use the <empty/>
  tag notation instead of writing <empty></empty>. If the
  value is 'always, the abbreviated notation is always used,
  and if the value is 'never, the open/close pair is always
  generated. If a list of symbols is provided, tags with names
  in this list will be abbreviated. The abbreviated form is the
  preferred XML notation. However, most browsers designed for
  HTML will only properly render XHTML if the document uses a
  mixture of the two formats. _html-empty-tags_ contains the W3
  Consortium's recommended list of XHTML tags that should use the
  shorthand.

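The effect of the three kinds of value can be compared with a sketch like
the one below. It assumes xexpr->string honours the parameter in the same
way as the other output functions, and the require line is an assumption
modelled on the plist.ss load instructions.

```scheme
;; Assumption: xml.ss loads the same way plist.ss does.
(require (lib "xml.ss" "xml"))

(define line-break '(br))

;; parameterize restores the previous setting when its body finishes.
(define never-form
  (parameterize ([empty-tag-shorthand 'never])
    (xexpr->string line-break)))

(define always-form
  (parameterize ([empty-tag-shorthand 'always])
    (xexpr->string line-break)))

;; With a list, only the named tags are abbreviated.
(define mixed-form
  (parameterize ([empty-tag-shorthand html-empty-tags])
    (xexpr->string '(div (br) (p)))))
```

Using parameterize rather than calling (empty-tag-shorthand ...) directly
keeps the change local to one piece of output.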
> collapse-whitespace : Bool
  Default: #f
  All consecutive whitespace is replaced by a single space.
  CDATA sections are not affected.

> trim-whitespace : Bool
  This parameter no longer exists. Consider using collapse-whitespace
  and eliminate-whitespace instead.

> read-comments : Bool
  Default: #f
  Comments, by definition, should be ignored by programs. However,
  interoperating with ad hoc extensions to other languages sometimes
  requires processing comments anyway.

> xexpr-drop-empty-attributes : Bool
  Default: #f
  It's easier to write functions processing Xexpressions if they always
  have a list of attributes. On the other hand, it's less cumbersome to
  write Xexpressions by hand without empty lists of attributes
  everywhere. Normally xml->xexpr leaves in empty attribute lists.
  Setting this parameter to #t drops them, so further editing the
  Xexpression by hand is less annoying.

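The difference made by xexpr-drop-empty-attributes can be seen in a small
sketch. The sample XML is invented, and the require line is an assumption
modelled on the plist.ss load instructions.

```scheme
;; Assumption: xml.ss loads the same way plist.ss does.
(require (lib "xml.ss" "xml"))

(define root
  (document-element (read-xml (open-input-string "<ul><li>one</li></ul>"))))

;; Default setting: every element keeps its (possibly empty) attribute list.
(define with-lists (xml->xexpr root))

;; With the parameter set to #t, empty attribute lists are dropped.
(define without-lists
  (parameterize ([xexpr-drop-empty-attributes #t])
    (xml->xexpr root)))
```

Code that walks the result can rely on the attribute list always being
present only in the first form.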
Examples
========

Reading an Xexpression:
  (xml->xexpr (document-element (read-xml input-port)))

Writing an Xexpression:
  (empty-tag-shorthand html-empty-tags)
  (write-xml/content (xexpr->xml `(html (head (title ,banner))
                                        (body ((bgcolor "white"))
                                              ,text)))
                     output-port)

What this Library Doesn't Provide
=================================

  Document Type Declaration (DTD) processing
  Validation
  Expanding user-defined entities
  Reading user-defined entities in attributes
  Unicode support

XML Datatype Details
====================

Note: Users of the XML collection don't need to know most of these definitions.

Note: Xexpr is the only important one to understand. Even then,
      Processing-instructions may be ignored.

> Xexpr = String
        | (cons Symbol (cons (listof (list Symbol String)) (listof Xexpr)))
        | (cons Symbol (listof Xexpr)) ;; an element with no attributes
        | Symbol ;; symbolic entities
        | Number ;; numeric entities
        | Cdata
        | Misc

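To make the grammar concrete, the sketch below builds one value for most
of the alternatives and checks each with xexpr?. All sample values are
invented, and the require line is an assumption modelled on the plist.ss
load instructions.

```scheme
;; Assumption: xml.ss loads the same way plist.ss does.
(require (lib "xml.ss" "xml"))

(define samples
  (list "plain text"                       ; String
        '(a ((href "index.html")) "home")  ; element with attributes
        '(ul (li "one") (li "two"))        ; element with no attributes
        'nbsp                              ; symbolic entity
        160                                ; numeric entity
        (make-comment "a Misc value")))    ; Misc (a comment)

;; Each sample is expected to satisfy xexpr?.
(map xexpr? samples)
```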
> Document = (make-document Prolog Element (listof Processing-instruction))
  (define-struct document (prolog element misc))

> Prolog = (make-prolog (listof Misc) Document-type [Misc ...])
  (define-struct prolog (misc dtd misc2))
  The last field is a (listof Misc), but the maker accepts optional
  arguments instead for backwards compatibility.

> Document-type = #f | (make-document-type Symbol External-dtd #f)
  (define-struct document-type (name external inlined))

> External-dtd = (make-external-dtd/public str str)
               | (make-external-dtd/system str)
               | #f
  (define-struct external-dtd (system))
  (define-struct (external-dtd/public external-dtd) (public))
  (define-struct (external-dtd/system external-dtd) ())

> Element = (make-element Location Location
                          Symbol
                          (listof Attribute)
                          (listof Content))
  (define-struct (element struct:source) (name attributes content))

> Attribute = (make-attribute Location Location Symbol String)
  (define-struct (attribute struct:source) (name value))

> Content = Pcdata
          | Element
          | Entity
          | Misc

  Misc = Comment
       | Processing-instruction

> Pcdata = (make-pcdata Location Location String)
  (define-struct (pcdata struct:source) (string))

> Cdata = (make-cdata Location Location String)
  (define-struct (cdata struct:source) (string))
  Note: The string of a cdata structure is assumed to be of the form
        "<![CDATA[~a]]>" with proper quoting. If this is an incorrect
        assumption, this library will generate invalid XML.

> Entity = (make-entity Location Location (U Nat Symbol))
  (define-struct (entity struct:source) (text))

> Processing-instruction = (make-pi Location Location String String)
  (define-struct (pi struct:source) (target-name instruction))

> Comment = (make-comment String)
  (define-struct comment (text))

  Source = (make-source Location Location)
  (define-struct source (start stop))

  Location = (make-location Nat Nat Nat) | Symbol
  (define-struct location (line char offset))
  Note: read-xml records location structures, while xexpr->xml inserts a
        symbol. Other functions that must fabricate XML Locations
        without prior source location should use a sensible "comment" symbol.


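Code that fabricates XML structures without a source location might look
like the sketch below. The symbol generated is a hypothetical choice of
"comment" symbol, and the require line is an assumption modelled on the
plist.ss load instructions.

```scheme
;; Assumption: xml.ss loads the same way plist.ss does.
(require (lib "xml.ss" "xml"))

;; Hypothetical marker symbol standing in for a real source location.
(define here 'generated)

;; <p class="note">hello</p>, built directly from the structures.
(define greeting
  (make-element here here
                'p
                (list (make-attribute here here 'class "note"))
                (list (make-pcdata here here "hello"))))

(xml->xexpr greeting)
```

Using one named symbol for both the start and stop fields makes it easy
to tell fabricated structures apart from those produced by read-xml.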
The PList Library
=================

Files: _plist.ss_

The PList library provides the ability to read and write XML documents which
conform to the "plist" DTD, used to store 'dictionaries' of string - value
associations. This format is typically used by Mac OS X (both the operating
system and its applications) to store all kinds of data.

To Load
=======

  (require (lib "plist.ss" "xml"))

Functions
=========

> read-plist : Port -> PLDict
  Reads a plist from a port, and produces a 'dict' x-expression.

> write-plist : PLDict Port -> Void
  Writes a plist to the given port. May raise the exn:application:type
  exception if the plist is badly formed.

Datatypes
=========

NB: all of these are subtypes of x-expression:

> PLDict = (list 'dict PLAssoc-pair ...)

> PLAssoc-pair = (list 'assoc-pair String PLValue)

> PLValue = String
          | (list 'true)
          | (list 'false)
          | (list 'integer Integer)
          | (list 'real Real)
          | PLDict
          | PLArray

> PLArray = (list 'array PLValue ...)

In fact, the PList DTD also defines Data and Date types, but we're ignoring
these for the moment.

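Because a PLDict is an ordinary list, it can be searched with plain list
operations. The helper pldict-ref and the settings value below are
hypothetical and not part of the library.

```scheme
;; Hypothetical helper: return the PLValue stored under key in a PLDict,
;; or #f when the key is absent.
(define (pldict-ref dict key)
  (let loop ((pairs (cdr dict)))          ; skip the leading 'dict tag
    (cond
      ((null? pairs) #f)
      ((string=? (cadr (car pairs)) key)  ; each entry is (assoc-pair key value)
       (caddr (car pairs)))
      (else (loop (cdr pairs))))))

(define settings
  '(dict (assoc-pair "volume" (integer 7))
         (assoc-pair "muted" (false))))

(pldict-ref settings "volume")   ; => (integer 7)
(pldict-ref settings "missing")  ; => #f
```

Returning #f for a missing key is unambiguous here, since no PLValue is
itself #f; the boolean false is represented as the list (false).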
Examples
========

Here's a sample PLDict:

(define my-dict
  `(dict (assoc-pair "first-key"
                     "just a string
                      with some whitespace in it")
         (assoc-pair "second-key"
                     (false))
         (assoc-pair "third-key"
                     (dict ))
         (assoc-pair "fourth-key"
                     (dict (assoc-pair "inner-key"
                                       (real 3.432))))
         (assoc-pair "fifth-key"
                     (array (integer 14)
                            "another string"
                            (true)))
         (assoc-pair "sixth-key"
                     (array))))

Let's write it to disk:

(call-with-output-file "/Users/clements/tmp.plist"
  (lambda (port)
    (write-plist my-dict port))
  'truncate)

Let's read it back from the disk:

(define new-dict
  (call-with-input-file "/Users/clements/tmp.plist"
    (lambda (port)
      (read-plist port))))

Here's what that (hand-formatted) text file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist SYSTEM "file://localhost/System/Library/DTDs/PropertyList.dtd">
<plist version="0.9">
  <dict>
    <key>first-key</key>
    <string>just a string
            with some whitespace in it</string>
    <key>second-key</key>
    <false />
    <key>third-key</key>
    <dict />
    <key>fourth-key</key>
    <dict>
      <key>inner-key</key>
      <real>3.432</real>
    </dict>
    <key>fifth-key</key>
    <array>
      <integer>14</integer>
      <string>another string</string>
      <true />
    </array>
    <key>sixth-key</key>
    <array />
  </dict>
</plist>