SSAX Package
============

The SSAX functional XML parsing framework consists of a DOM/SXML parser, a SAX
parser, and a supporting library of lexing and parsing procedures. The
procedures in the package can be used separately to tokenize or parse various
pieces of XML documents. The framework supports XML Namespaces; character,
internal, and external parsed entities; attribute value normalization;
processing instructions; and CDATA sections. The package includes a
semi-validating SXML parser: a DOM-mode parser that is an instantiation of
the SAX parser (called SSAX).

SSAX is a full-featured, algorithmically optimal, pure-functional parser
that can act as a stream processor. It is an efficient SAX parser that is
easy to use. SSAX minimizes the amount of application-specific state that has
to be shared among user-supplied event handlers, and it makes the maintenance
of an application-specific element stack unnecessary, which eliminates several
classes of common bugs. SSAX is written in a pure-functional subset of Scheme,
so the event handlers are referentially transparent, which makes them
easier for a programmer to write and to reason about. This more expressive,
more reliable, and easier-to-use application interface for event-driven XML
parsing results from implementing the parsing engine as an enhanced tree
fold combinator, which fully captures the control pattern of depth-first
tree traversal.

-------------------------------------------------

Quick start

; procedure: ssax:xml->sxml PORT NAMESPACE-PREFIX-ASSIG
;
; This is an instance of the SSAX parser that returns an SXML
; representation of the XML document read from PORT.
; NAMESPACE-PREFIX-ASSIG is a list of (USER-PREFIX . URI-STRING) pairs
; that assigns USER-PREFIXes to the namespaces identified by
; particular URI-STRINGs. It may be an empty list.
; The procedure returns an SXML tree. On return, PORT is positioned at
; the first character after the root element.
(define (ssax:xml->sxml port namespace-prefix-assig) ...)

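As a minimal usage sketch (not part of the package), the calls below parse two
small documents from string ports. The sketch assumes the package is already
loaded; how to load it depends on the installation. open-input-string comes
from the host Scheme (SRFI 6), not from SSAX, and the sample documents and the
names doc and xhtml are made up for illustration.

```scheme
; Assumption: ssax:xml->sxml is already loaded (the loading form depends
; on the installation and is not shown here).
; open-input-string is provided by the host Scheme (SRFI 6), not by SSAX.
(define doc
  (ssax:xml->sxml
    (open-input-string "<a href='x'>text<b/></a>")
    '()))            ; no user-defined namespace prefixes

; doc should now be the SXML tree
;   (*TOP* (a (@ (href "x")) "text" (b)))

; With a prefix assignment, an element in the given namespace is named
; with the user prefix, so the element below appears as h:p rather than
; under its full namespace URI.
(define xhtml
  (ssax:xml->sxml
    (open-input-string "<p xmlns='http://www.w3.org/1999/xhtml'>hi</p>")
    '((h . "http://www.w3.org/1999/xhtml"))))
```

Passing an empty list is the simplest choice when the document uses no
namespaces.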
; procedure: pre-post-order TREE BINDINGS
;
; Traversal of an SXML tree or a grove:
; a <Node> or a <Nodelist>
;
; A <Node> and a <Nodelist> are mutually recursive datatypes that
; underlie the SXML tree:
;     <Node> ::= (name . <Nodelist>) | "text string"
; An (ordered) set of nodes is just a list of the constituent nodes:
;     <Nodelist> ::= (<Node> ...)
; Nodelists and Nodes other than text strings are both lists. A
; <Nodelist>, however, is either an empty list or a list whose head is
; not a symbol (or, more generally, not an atom). A symbol at the head
; of a node is either an XML name (in which case it is the tag of an
; XML element) or an administrative name such as '@'.
; See SXPath.scm and SSAX.scm for more information on SXML.
;
;
; Pre-Post-order traversal of a tree and creation of a new tree:
;     pre-post-order:: <tree> x <bindings> -> <new-tree>
; where
; <bindings> ::= (<binding> ...)
; <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
;               (<trigger-symbol> *macro* . <handler>) |
;               (<trigger-symbol> <new-bindings> . <handler>) |
;               (<trigger-symbol> . <handler>)
; <trigger-symbol> ::= XMLname | *text* | *default*
; <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
;
; The pre-post-order function visits the nodes and nodelists
; pre-post-order (depth-first). For each <Node> of the form (name
; <Node> ...), it looks up an association with the given 'name' among
; its <bindings>. If that lookup fails, pre-post-order tries to locate a
; *default* binding. It is an error if the latter attempt fails as
; well. Having found a binding, the pre-post-order function first
; checks whether the binding is of the form
;     (<trigger-symbol> *preorder* . <handler>)
; If it is, the handler is 'applied' to the current node. Otherwise,
; the pre-post-order function first calls itself recursively for each
; child of the current node, with <new-bindings> (if any) prepended to
; the <bindings> in effect. The results of these calls are passed to the
; <handler> (along with the head of the current <Node>). To be more
; precise, the handler is _applied_ to the head of the current node
; and its processed children. The result of the handler, which should
; also be a <tree>, replaces the current <Node>. If the current <Node>
; is a text string or other atom, a special binding with the symbol
; *text* is looked up.
;
; A binding can also be of the form
;     (<trigger-symbol> *macro* . <handler>)
; This is equivalent to *preorder* described above, except that the
; result is processed again with the current stylesheet.
;
(define (pre-post-order tree bindings) ...)

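The data model and the traversal described above can be sketched as follows.
This is an illustration under stated assumptions, not code from the package:
pre-post-order is assumed to be already loaded, and nodelist?, tree, bindings,
and result are hypothetical names introduced only for this example.

```scheme
; Assumption: pre-post-order is already loaded from the package.
; nodelist?, tree, bindings, and result are hypothetical names.

; A Nodelist is an empty list or a list whose head is not a symbol.
(define (nodelist? x)
  (or (null? x)
      (and (pair? x) (not (symbol? (car x))))))

(define tree
  '(doc (p "Hello, " (em "world") "!")
        (verbatim (em "raw"))))

; (nodelist? tree)       is #f: tree is a Node whose head is the symbol doc
; (nodelist? (cdr tree)) is #t: the children of doc form a Nodelist

(define bindings
  `(; ordinary binding: called with the tag and the processed children
    (em . ,(lambda (tag . kids) (cons 'strong kids)))
    ; *preorder* binding: called with the tag and the unprocessed children
    (verbatim *preorder* . ,(lambda (tag . kids) (cons 'pre kids)))
    ; text strings are returned unchanged
    (*text* . ,(lambda (trigger str) str))
    ; any other element is rebuilt from its processed children
    (*default* . ,(lambda (tag . kids) (cons tag kids)))))

(define result (pre-post-order tree bindings))
; result should be
;   (doc (p "Hello, " (strong "world") "!") (pre (em "raw")))
```

The em handler receives children that have already been processed, while the
verbatim handler, being a *preorder* binding, receives its subtree unprocessed,
which is why the inner em survives in the output.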
-------------------------------------------------

Additional tools included in the package

1. "access-remote.ss"
Uniform access to local and remote resources
Resolution of relative URIs in accordance with RFC 2396

2. "id.ss"
Creation and manipulation of the ID-index for faster access to SXML elements
by their unique ID
Provides a DTD parser for extracting ID attribute declarations

3. "xlink-parser.ss"
Parser for XML documents that contain XLink elements

4. "multi-parser.ss"
SSAX multi parser: combines several specialized parsers into one
Provides creation of parent pointers for the SXML document being constructed