697 lines
26 KiB
Plaintext
697 lines
26 KiB
Plaintext
_SSAX_/_SXML_ Library
|
|
=====================
|
|
|
|
These is Oleg Kiselyov's SSAX/SXML library. For more documentation
|
|
and papers, refer to
|
|
|
|
http://ssax.sourceforge.net/
|
|
|
|
SSAX
|
|
|
|
================================================================
|
|
_ssax.ss_
|
|
================================================================
|
|
|
|
To load the module, do
|
|
(require (lib "ssax.ss" "ssax"))
|
|
|
|
Besides the standard SSAX procedures described below, this defines:
|
|
|
|
> ssax:warn-parameter - Parameter for the procedure to display
|
|
warnings. It must accept a port, a warning message, and any number
|
|
of additional arguments.
|
|
|
|
> make-xml-token KIND HEAD
|
|
This creates an XML token.
|
|
|
|
> xml-token? THING
|
|
|
|
> xml-token-kind XML-TOKEN
|
|
|
|
> xml-token-head XML-TOKEN
|
|
|
|
> ssax:read-markup-token PORT
|
|
This procedure starts parsing of a markup token. The current position
|
|
in the stream must be #\<. This procedure scans enough of the input stream
|
|
to figure out what kind of a markup token it is seeing. The procedure returns
|
|
an xml-token structure describing the token. Note, generally reading
|
|
of the current markup is not finished! In particular, no attributes of
|
|
the start-tag token are scanned.
|
|
|
|
Here's a detailed break out of the return values and the position in the PORT
|
|
when that particular value is returned:
|
|
PI-token: only PI-target is read.
|
|
To finish the Processing Instruction and disregard it,
|
|
call ssax:skip-pi. ssax:read-attributes may be useful
|
|
as well (for PIs whose content is attribute-value
|
|
pairs)
|
|
END-token: The end tag is read completely; the current position
|
|
is right after the terminating #\> character.
|
|
COMMENT is read and skipped completely. The current position
|
|
is right after "-->" that terminates the comment.
|
|
CDSECT The current position is right after "<!CDATA["
|
|
Use ssax:read-cdata-body to read the rest.
|
|
DECL We have read the keyword (the one that follows "<!")
|
|
identifying this declaration markup. The current
|
|
position is after the keyword (usually a
|
|
whitespace character)
|
|
|
|
START-token We have read the keyword (GI) of this start tag.
|
|
No attributes are scanned yet. We don't know if this
|
|
tag has an empty content either.
|
|
Use ssax:complete-start-tag to finish parsing of
|
|
the token.
|
|
|
|
|
|
> ssax:read-pi-body-as-string PORT
|
|
The current position is right after reading the PITarget. We read the
|
|
body of PI and return is as a string. The port will point to the
|
|
character right after '?>' combination that terminates PI.
|
|
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
|
|
|
|
|
|
> ssax:skip-internal-dtd PORT
|
|
The current pos in the port is inside an internal DTD subset
|
|
(e.g., after reading #\[ that begins an internal DTD subset)
|
|
Skip until the "]>" combination that terminates this DTD
|
|
|
|
> ssax:read-cdata-body PORT STR-HANDLER SEED
|
|
|
|
This procedure must be called after we have read a string "<![CDATA["
|
|
that begins a CDATA section. The current position must be the first
|
|
position of the CDATA body. This function reads _lines_ of the CDATA
|
|
body and passes them to a STR-HANDLER, a character data consumer.
|
|
|
|
The str-handler is a STR-HANDLER, a procedure STRING1 STRING2 SEED.
|
|
The first STRING1 argument to STR-HANDLER never contains a newline.
|
|
The second STRING2 argument often will. On the first invocation of
|
|
the STR-HANDLER, the seed is the one passed to ssax:read-cdata-body
|
|
as the third argument. The result of this first invocation will be
|
|
passed as the seed argument to the second invocation of the line
|
|
consumer, and so on. The result of the last invocation of the
|
|
STR-HANDLER is returned by the ssax:read-cdata-body. Note a
|
|
similarity to the fundamental 'fold' iterator.
|
|
|
|
Within a CDATA section all characters are taken at their face value,
|
|
with only three exceptions:
|
|
CR, LF, and CRLF are treated as line delimiters, and passed
|
|
as a single #\newline to the STR-HANDLER
|
|
"]]>" combination is the end of the CDATA section.
|
|
> is treated as an embedded #\> character
|
|
Note, < and & are not specially recognized (and are not expanded)!
|
|
|
|
|
|
> ssax:read-char-ref PORT
|
|
|
|
[66] CharRef ::= '&#' [0-9]+ ';'
|
|
| '&#x' [0-9a-fA-F]+ ';'
|
|
|
|
This procedure must be called after we we have read "&#"
|
|
that introduces a char reference.
|
|
The procedure reads this reference and returns the corresponding char
|
|
The current position in PORT will be after ";" that terminates
|
|
the char reference
|
|
Faults detected:
|
|
WFC: XML-Spec.html#wf-Legalchar
|
|
|
|
According to Section "4.1 Character and Entity References"
|
|
of the XML Recommendation:
|
|
"[Definition: A character reference refers to a specific character
|
|
in the ISO/IEC 10646 character set, for example one not directly
|
|
accessible from available input devices.]"
|
|
Therefore, we use a ucscode->char function to convert a character
|
|
code into the character -- *regardless* of the current character
|
|
encoding of the input stream.
|
|
|
|
|
|
> ssax:handle-parsed-entity PORT NAME ENTITIES
|
|
CONTENT-HANDLER STR-HANDLER SEED
|
|
|
|
Expand and handle a parsed-entity reference
|
|
port - a PORT
|
|
name - the name of the parsed entity to expand, a symbol
|
|
entities - see ENTITIES
|
|
content-handler -- procedure PORT ENTITIES SEED
|
|
that is supposed to return a SEED
|
|
str-handler - a STR-HANDLER. It is called if the entity in question
|
|
turns out to be a pre-declared entity
|
|
|
|
The result is the one returned by CONTENT-HANDLER or STR-HANDLER
|
|
Faults detected:
|
|
WFC: XML-Spec.html#wf-entdeclared
|
|
WFC: XML-Spec.html#norecursion
|
|
|
|
|
|
> make-empty-attlist
|
|
The ATTLIST Abstract Data Type
|
|
Currently is implemented as an assoc list sorted in the ascending
|
|
order of NAMES.
|
|
|
|
|
|
> attlist-add ATTLIST NAME-VALUE-PAIR
|
|
Add a name-value pair to the existing attlist preserving the order
|
|
Return the new list, in the sorted ascending order.
|
|
Return #f if a pair with the same name already exists in the attlist
|
|
|
|
|
|
> attlist-null? ATTLIST
|
|
|
|
> attlist-remove-top ATTLIST
|
|
Given an non-null attlist, return a pair of values: the top and the rest
|
|
|
|
> attliast->alist
|
|
|
|
> attlist-fold
|
|
|
|
> ssax:read-attributes PORT ENTITIES
|
|
|
|
This procedure reads and parses a production Attribute*
|
|
[41] Attribute ::= Name Eq AttValue
|
|
[10] AttValue ::= '"' ([^<&"] | Reference)* '"'
|
|
| "'" ([^<&'] | Reference)* "'"
|
|
[25] Eq ::= S? '=' S?
|
|
|
|
|
|
The procedure returns an ATTLIST, of Name (as UNRES-NAME), Value (as string)
|
|
pairs. The current character on the PORT is a non-whitespace character
|
|
that is not an ncname-starting character.
|
|
|
|
Note the following rules to keep in mind when reading an 'AttValue'
|
|
"Before the value of an attribute is passed to the application
|
|
or checked for validity, the XML processor must normalize it as follows:
|
|
- a character reference is processed by appending the referenced
|
|
character to the attribute value
|
|
- an entity reference is processed by recursively processing the
|
|
replacement text of the entity [see ENTITIES]
|
|
[named entities amp lt gt quot apos are assumed pre-declared]
|
|
- a whitespace character (#x20, #xD, #xA, #x9) is processed by appending #x20
|
|
to the normalized value, except that only a single #x20 is appended for a
|
|
"#xD#xA" sequence that is part of an external parsed entity or the
|
|
literal entity value of an internal parsed entity
|
|
- other characters are processed by appending them to the normalized value "
|
|
|
|
|
|
Faults detected:
|
|
WFC: XML-Spec.html#CleanAttrVals
|
|
WFC: XML-Spec.html#uniqattspec
|
|
|
|
|
|
> ssax:uri-string->symbol URI-STR
|
|
Convert a URI-STR to an appropriate symbol
|
|
|
|
> ssax:complete-start-tag TAG PORT ELEMS ENTITIES NAMESPACES
|
|
|
|
This procedure is to complete parsing of a start-tag markup. The
|
|
procedure must be called after the start tag token has been
|
|
read. TAG is an UNRES-NAME. ELEMS is an instance of xml-decl::elems;
|
|
it can be #f to tell the function to do _no_ validation of elements
|
|
and their attributes.
|
|
|
|
This procedure returns several values:
|
|
ELEM-GI: a RES-NAME.
|
|
ATTRIBUTES: element's attributes, an ATTLIST of (RES-NAME . STRING)
|
|
pairs. The list does NOT include xmlns attributes.
|
|
NAMESPACES: the input list of namespaces amended with namespace
|
|
(re-)declarations contained within the start-tag under parsing
|
|
ELEM-CONTENT-MODEL
|
|
|
|
On exit, the current position in PORT will be the first character after
|
|
#\> that terminates the start-tag markup.
|
|
|
|
Faults detected:
|
|
VC: XML-Spec.html#enum
|
|
VC: XML-Spec.html#RequiredAttr
|
|
VC: XML-Spec.html#FixedAttr
|
|
VC: XML-Spec.html#ValueType
|
|
WFC: XML-Spec.html#uniqattspec (after namespaces prefixes are resolved)
|
|
VC: XML-Spec.html#elementvalid
|
|
WFC: REC-xml-names/#dt-NSName
|
|
|
|
Note, although XML Recommendation does not explicitly say it,
|
|
xmlns and xmlns: attributes don't have to be declared (although they
|
|
can be declared, to specify their default value)
|
|
|
|
Procedure: ssax:complete-start-tag tag-head port elems entities namespaces
|
|
|
|
> ssax:read-external-id PORT
|
|
|
|
This procedure parses an ExternalID production:
|
|
[75] ExternalID ::= 'SYSTEM' S SystemLiteral
|
|
| 'PUBLIC' S PubidLiteral S SystemLiteral
|
|
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
|
|
[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
|
|
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9]
|
|
| [-'()+,./:=?;!*#@$_%]
|
|
|
|
This procedure is supposed to be called when an ExternalID is expected;
|
|
that is, the current character must be either #\S or #\P that start
|
|
correspondingly a SYSTEM or PUBLIC token. This procedure returns the
|
|
SystemLiteral as a string. A PubidLiteral is disregarded if present.
|
|
|
|
> ssax:read-char-data PORT EXPECT-EOF? STR-HANDLER SEED
|
|
|
|
This procedure is to read the character content of an XML document
|
|
or an XML element.
|
|
[43] content ::=
|
|
(element | CharData | Reference | CDSect | PI
|
|
| Comment)*
|
|
To be more precise, the procedure reads CharData, expands CDSect
|
|
and character entities, and skips comments. The procedure stops
|
|
at a named reference, EOF, at the beginning of a PI or a start/end tag.
|
|
|
|
port
|
|
a PORT to read
|
|
expect-eof?
|
|
a boolean indicating if EOF is normal, i.e., the character
|
|
data may be terminated by the EOF. EOF is normal
|
|
while processing a parsed entity.
|
|
str-handler
|
|
a STR-HANDLER
|
|
seed
|
|
an argument passed to the first invocation of STR-HANDLER.
|
|
|
|
The procedure returns two results: SEED and TOKEN.
|
|
The SEED is the result of the last invocation of STR-HANDLER, or the
|
|
original seed if STR-HANDLER was never called.
|
|
|
|
TOKEN can be either an eof-object (this can happen only if
|
|
expect-eof? was #t), or:
|
|
- an xml-token describing a START tag or an END-tag;
|
|
For a start token, the caller has to finish reading it.
|
|
- an xml-token describing the beginning of a PI. It's up to an
|
|
application to read or skip through the rest of this PI;
|
|
- an xml-token describing a named entity reference.
|
|
|
|
CDATA sections and character references are expanded inline and
|
|
never returned. Comments are silently disregarded.
|
|
|
|
As the XML Recommendation requires, all whitespace in character data
|
|
must be preserved. However, a CR character (#xD) must be disregarded
|
|
if it appears before a LF character (#xA), or replaced by a #xA character
|
|
otherwise. See Secs. 2.10 and 2.11 of the XML Recommendation. See also
|
|
the canonical XML Recommendation.
|
|
|
|
|
|
> ssax:assert-token TOKEN KIND GI
|
|
Make sure that TOKEN is of anticipated KIND and has anticipated GI
|
|
Note GI argument may actually be a pair of two symbols, Namespace
|
|
URI or the prefix, and of the localname.
|
|
If the assertion fails, error-cont is evaluated by passing it
|
|
three arguments: token kind gi. The result of error-cont is returned.
|
|
|
|
> ssax:make-pi-parser my-pi-handlers
|
|
Create a parser to parse and process one Processing Element (PI).
|
|
|
|
my-pi-handlers
|
|
An assoc list of pairs (PI-TAG . PI-HANDLER)
|
|
where PI-TAG is an NCName symbol, the PI target, and
|
|
PI-HANDLER is a procedure PORT PI-TAG SEED
|
|
where PORT points to the first symbol after the PI target.
|
|
The handler should read the rest of the PI up to and including
|
|
the combination '?>' that terminates the PI. The handler should
|
|
return a new seed.
|
|
One of the PI-TAGs may be the symbol *DEFAULT*. The corresponding
|
|
handler will handle PIs that no other handler will. If the
|
|
*DEFAULT* PI-TAG is not specified, ssax:make-pi-parser will assume
|
|
the default handler that skips the body of the PI
|
|
|
|
The output of the ssax:make-pi-parser is a procedure
|
|
PORT PI-TAG SEED
|
|
that will parse the current PI according to the user-specified handlers.
|
|
|
|
The previous version of ssax:make-pi-parser was a low-level macro:
|
|
(define-macro ssax:make-pi-parser
|
|
(lambda (my-pi-handlers)
|
|
`(lambda (port target seed)
|
|
(case target
|
|
; Generate the body of the case statement
|
|
,@(let loop ((pi-handlers my-pi-handlers) (default #f))
|
|
(cond
|
|
((null? pi-handlers)
|
|
(if default `((else (,default port target seed)))
|
|
'((else
|
|
(ssax:warn port "Skipping PI: " target nl)
|
|
(ssax:skip-pi port)
|
|
seed))))
|
|
((eq? '*DEFAULT* (caar pi-handlers))
|
|
(loop (cdr pi-handlers) (cdar pi-handlers)))
|
|
(else
|
|
(cons
|
|
`((,(caar pi-handlers)) (,(cdar pi-handlers) port target seed))
|
|
(loop (cdr pi-handlers) default)))))))))
|
|
|
|
|
|
> ssax:make-elem-parser my-new-level-seed my-finish-element
|
|
my-char-data-handler my-pi-handlers
|
|
|
|
Create a parser to parse and process one element, including its
|
|
character content or children elements. The parser is typically
|
|
applied to the root element of a document.
|
|
|
|
my-new-level-seed
|
|
procedure ELEM-GI ATTRIBUTES NAMESPACES EXPECTED-CONTENT SEED
|
|
where ELEM-GI is a RES-NAME of the element
|
|
about to be processed.
|
|
This procedure is to generate the seed to be passed
|
|
to handlers that process the content of the element.
|
|
This is the function identified as 'fdown' in the denotational
|
|
semantics of the XML parser given in the title comments to this
|
|
file.
|
|
|
|
my-finish-element
|
|
procedure ELEM-GI ATTRIBUTES NAMESPACES PARENT-SEED SEED
|
|
This procedure is called when parsing of ELEM-GI is finished.
|
|
The SEED is the result from the last content parser (or
|
|
from my-new-level-seed if the element has the empty content).
|
|
PARENT-SEED is the same seed as was passed to my-new-level-seed.
|
|
The procedure is to generate a seed that will be the result
|
|
of the element parser.
|
|
This is the function identified as 'fup' in the denotational
|
|
semantics of the XML parser given in the title comments to this
|
|
file.
|
|
|
|
my-char-data-handler
|
|
A STR-HANDLER
|
|
|
|
my-pi-handlers
|
|
See ssax:make-pi-handler above
|
|
|
|
|
|
The generated parser is a
|
|
procedure START-TAG-HEAD PORT ELEMS ENTITIES
|
|
NAMESPACES PRESERVE-WS? SEED
|
|
The procedure must be called after the start tag token has been
|
|
read. START-TAG-HEAD is an UNRES-NAME from the start-element tag.
|
|
ELEMS is an instance of xml-decl::elems.
|
|
See ssax:complete-start-tag::preserve-ws?
|
|
|
|
Faults detected:
|
|
VC: XML-Spec.html#elementvalid
|
|
WFC: XML-Spec.html#GIMatch
|
|
|
|
|
|
|
|
> ssax:make-parser user-handler-tag user-handler-proc ...
|
|
|
|
Create an XML parser, an instance of the XML parsing framework.
|
|
This will be a SAX, a DOM, or a specialized parser depending
|
|
on the supplied user-handlers.
|
|
|
|
user-handler-tag is a symbol that identifies a procedural expression
|
|
that follows the tag. Given below are tags and signatures of the
|
|
corresponding procedures. Not all tags have to be specified. If some
|
|
are omitted, reasonable defaults will apply.
|
|
|
|
|
|
tag: DOCTYPE
|
|
handler-procedure: PORT DOCNAME SYSTEMID INTERNAL-SUBSET? SEED
|
|
If internal-subset? is #t, the current position in the port
|
|
is right after we have read #\[ that begins the internal DTD subset.
|
|
We must finish reading of this subset before we return
|
|
(or must call skip-internal-subset if we aren't interested in reading it).
|
|
The port at exit must be at the first symbol after the whole
|
|
DOCTYPE declaration.
|
|
The handler-procedure must generate four values:
|
|
ELEMS ENTITIES NAMESPACES SEED
|
|
See xml-decl::elems for ELEMS. It may be #f to switch off the validation.
|
|
NAMESPACES will typically contain USER-PREFIXes for selected URI-SYMBs.
|
|
The default handler-procedure skips the internal subset,
|
|
if any, and returns (values #f '() '() seed)
|
|
|
|
tag: UNDECL-ROOT
|
|
handler-procedure: ELEM-GI SEED
|
|
where ELEM-GI is an UNRES-NAME of the root element. This procedure
|
|
is called when an XML document under parsing contains _no_ DOCTYPE
|
|
declaration.
|
|
The handler-procedure, as a DOCTYPE handler procedure above,
|
|
must generate four values:
|
|
ELEMS ENTITIES NAMESPACES SEED
|
|
The default handler-procedure returns (values #f '() '() seed)
|
|
|
|
tag: DECL-ROOT
|
|
handler-procedure: ELEM-GI SEED
|
|
where ELEM-GI is an UNRES-NAME of the root element. This procedure
|
|
is called when an XML document under parsing does contains the DOCTYPE
|
|
declaration.
|
|
The handler-procedure must generate a new SEED (and verify
|
|
that the name of the root element matches the doctype, if the handler
|
|
so wishes).
|
|
The default handler-procedure is the identity function.
|
|
|
|
tag: NEW-LEVEL-SEED
|
|
handler-procedure: see ssax:make-elem-parser, my-new-level-seed
|
|
|
|
tag: FINISH-ELEMENT
|
|
handler-procedure: see ssax:make-elem-parser, my-finish-element
|
|
|
|
tag: CHAR-DATA-HANDLER
|
|
handler-procedure: see ssax:make-elem-parser, my-char-data-handler
|
|
|
|
tag: PI
|
|
handler-procedure: see ssax:make-pi-parser
|
|
The default value is '()
|
|
|
|
> ssax:reverse-collect-str LIST-OF-FRAGS -> LIST-OF-FRAGS
|
|
given the list of fragments (some of which are text strings)
|
|
reverse the list and concatenate adjacent text strings.
|
|
We can prove from the general case below that if LIST-OF-FRAGS
|
|
has zero or one element, the result of the procedure is equal?
|
|
to its argument. This fact justifies the shortcut evaluation below.
|
|
|
|
> ssax:xml->sxml PORT NAMESPACE-PREFIX-ASSIG
|
|
|
|
This is an instance of a SSAX parser above that returns an SXML
|
|
representation of the XML document to be read from PORT.
|
|
NAMESPACE-PREFIX-ASSIG is a list of (USER-PREFIX . URI-STRING)
|
|
that assigns USER-PREFIXes to certain namespaces identified by
|
|
particular URI-STRINGs. It may be an empty list.
|
|
The procedure returns an SXML tree. The port points out to the
|
|
first character after the root element.
|
|
|
|
|
|
|
|
|
|
================================================================
|
|
_input-parse.ss_
|
|
================================================================
|
|
|
|
To load the module, do
|
|
(require (lib "input-parse.ss" "ssax"))
|
|
|
|
This defines
|
|
|
|
> (define-struct (exn:ssax exn) (port stuff))
|
|
|
|
for parser errors raised by SSAX via RAISE. PORT contains the port on
|
|
which the error happens, STUFF is a list of additional, but random
|
|
information.
|
|
|
|
Note that this also exports a procedure called read-string, which
|
|
conflicts with the procedure of the same name in the mzscheme
|
|
language. The ssax collection comes with a module called
|
|
restricted-mzscheme that is identical to mzscheme, except that it
|
|
omits read-string.
|
|
|
|
> parser-error PORT MESSAGE SPECIALISING-MSG*
|
|
Many procedures of this package call parser-error to report a parsing
|
|
error. The first argument is a port, which typically points to the
|
|
offending character or its neighborhood. Most of the Scheme systems
|
|
let the user query a PORT for the current position. MESSAGE is the
|
|
description of the error. Other arguments supply more details about
|
|
the problem.
|
|
|
|
================================================================
|
|
_sxml-tree-trans.ss_
|
|
================================================================
|
|
|
|
To load the module, do
|
|
(require (lib "sxml-tree-trans.ss" "ssax"))
|
|
|
|
> SRV:send-reply FRAGMENT ...
|
|
|
|
Output the 'fragments'
|
|
The fragments are a list of strings, characters,
|
|
numbers, thunks, #f, #t -- and other fragments.
|
|
The function traverses the tree depth-first, writes out
|
|
strings and characters, executes thunks, and ignores
|
|
#f and '().
|
|
The function returns #t if anything was written at all;
|
|
otherwise the result is #f
|
|
If #t occurs among the fragments, it is not written out
|
|
but causes the result of SRV:send-reply to be #t
|
|
|
|
|
|
> pre-post-order TREE BINDINGS
|
|
|
|
Traversal of an SXML tree or a grove:
|
|
a <Node> or a <Nodelist>
|
|
|
|
A <Node> and a <Nodelist> are mutually-recursive datatypes that
|
|
underlie the SXML tree:
|
|
<Node> ::= (name . <Nodelist>) | "text string"
|
|
An (ordered) set of nodes is just a list of the constituent nodes:
|
|
<Nodelist> ::= (<Node> ...)
|
|
Nodelists, and Nodes other than text strings are both lists. A
|
|
<Nodelist> however is either an empty list, or a list whose head is
|
|
not a symbol (an atom in general). A symbol at the head of a node is
|
|
either an XML name (in which case it's a tag of an XML element), or
|
|
an administrative name such as '@'.
|
|
See SXPath.scm and SSAX.scm for more information on SXML.
|
|
|
|
|
|
Pre-Post-order traversal of a tree and creation of a new tree:
|
|
pre-post-order:: <tree> x <bindings> -> <new-tree>
|
|
where
|
|
<bindings> ::= (<binding> ...)
|
|
<binding> ::= (<trigger-symbol> *preorder* . <handler>) |
|
|
(<trigger-symbol> *macro* . <handler>) |
|
|
(<trigger-symbol> <new-bindings> . <handler>) |
|
|
(<trigger-symbol> . <handler>)
|
|
<trigger-symbol> ::= XMLname | *text* | *default*
|
|
<handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
|
|
|
|
The pre-post-order function visits the nodes and nodelists
|
|
pre-post-order (depth-first). For each <Node> of the form (name
|
|
<Node> ...) it looks up an association with the given 'name' among
|
|
its <bindings>. If failed, pre-post-order tries to locate a
|
|
*default* binding. It's an error if the latter attempt fails as
|
|
well. Having found a binding, the pre-post-order function first
|
|
checks to see if the binding is of the form
|
|
(<trigger-symbol> *preorder* . <handler>)
|
|
If it is, the handler is 'applied' to the current node. Otherwise,
|
|
the pre-post-order function first calls itself recursively for each
|
|
child of the current node, with <new-bindings> prepended to the
|
|
<bindings> in effect. The result of these calls is passed to the
|
|
<handler> (along with the head of the current <Node>). To be more
|
|
precise, the handler is _applied_ to the head of the current node
|
|
and its processed children. The result of the handler, which should
|
|
also be a <tree>, replaces the current <Node>. If the current <Node>
|
|
is a text string or other atom, a special binding with a symbol
|
|
*text* is looked up.
|
|
|
|
A binding can also be of a form
|
|
(<trigger-symbol> *macro* . <handler>)
|
|
This is equivalent to *preorder* described above. However, the result
|
|
is re-processed again, with the current stylesheet.
|
|
|
|
|
|
> post-order TREE BINDINGS
|
|
post-order is a strict subset of pre-post-order without *preorder*
|
|
(let alone *macro*) traversals.
|
|
Now pre-post-order is actually faster than the old post-order.
|
|
The function post-order is deprecated and is aliased below for
|
|
backward compatibility.
|
|
|
|
> replace-range:: BEG-PRED x END-PRED x FOREST -> FOREST
|
|
Traverse a forest depth-first and cut/replace ranges of nodes.
|
|
|
|
The nodes that define a range don't have to have the same immediate
|
|
parent, don't have to be on the same level, and the end node of a
|
|
range doesn't even have to exist. A replace-range procedure removes
|
|
nodes from the beginning node of the range up to (but not including)
|
|
the end node of the range. In addition, the beginning node of the
|
|
range can be replaced by a node or a list of nodes. The range of
|
|
nodes is cut while depth-first traversing the forest. If all
|
|
branches of the node are cut a node is cut as well. The procedure
|
|
can cut several non-overlapping ranges from a forest.
|
|
|
|
replace-range:: BEG-PRED x END-PRED x FOREST -> FOREST
|
|
where
|
|
type FOREST = (NODE ...)
|
|
type NODE = Atom | (Name . FOREST) | FOREST
|
|
|
|
The range of nodes is specified by two predicates, beg-pred and end-pred.
|
|
beg-pred:: NODE -> #f | FOREST
|
|
end-pred:: NODE -> #f | FOREST
|
|
The beg-pred predicate decides on the beginning of the range. The node
|
|
for which the predicate yields non-#f marks the beginning of the range
|
|
The non-#f value of the predicate replaces the node. The value can be a
|
|
list of nodes. The replace-range procedure then traverses the tree and skips
|
|
all the nodes, until the end-pred yields non-#f. The value of the end-pred
|
|
replaces the end-range node. The new end node and its brothers will be
|
|
re-scanned.
|
|
The predicates are evaluated pre-order. We do not descend into a node that
|
|
is marked as the beginning of the range.
|
|
|
|
|
|
|
|
|
|
================================================================
|
|
_sxml-to-html.ss_
|
|
================================================================
|
|
|
|
To load the module, do
|
|
(require (lib "sxml-to-html.ss" "ssax"))
|
|
|
|
> SXML->HTML TREE
|
|
|
|
The following procedure is the most generic transformation of SXML
|
|
into the corresponding HTML document. The SXML tree is traversed
|
|
post-oder (depth-first) and transformed into another tree, which,
|
|
written in a depth-first fashion, results in an HTML document.
|
|
|
|
> entag TAG ELEMS
|
|
Create the HTML markup for tags.
|
|
This is used in the node handlers for the post-order function, see
|
|
above.
|
|
|
|
|
|
> enattr ATTR-KEY VALUE
|
|
Create the HTML markup for attributes.
|
|
This and entag are being used in the node handlers for the post-order function, see
|
|
above.
|
|
|
|
> string->goodHTML STRING
|
|
Given a string, check to make sure it does not contain characters
|
|
such as '<' or '&' that require encoding. Return either the original
|
|
string, or a list of string fragments with special characters
|
|
replaced by appropriate character entities.
|
|
|
|
|
|
|
|
|
|
================================================================
|
|
_sxml-to-html-ext.ss_
|
|
================================================================
|
|
|
|
To load the module, do
|
|
(require (lib "sxml-to-html-ext.ss" "ssax"))
|
|
|
|
> make-header HEAD-PARMS
|
|
Create the 'head' SXML/HTML tag. HEAD-PARMS is an assoc list of
|
|
(h-key h-value), where h-value is a typically string;
|
|
h-key is a symbol:
|
|
title, description, AuthorAddress, keywords,
|
|
Date-Revision-yyyymmdd, Date-Creation-yyyymmdd,
|
|
long-title
|
|
One of the h-key can be Links.
|
|
In that case, h-value is a list of
|
|
(l-key l-href (attr value) ...)
|
|
where l-key is one of the following:
|
|
start, contents, prev, next, top, home
|
|
|
|
|
|
> make-navbar: HEAD-PARMS
|
|
Create a navigational bar. The argument head-parms is the same
|
|
as the one passed to make-header. We're only concerned with the
|
|
h-value Links
|
|
|
|
> make-footer HEAD-PARMS
|
|
Create a footer. The argument head-parms is the same
|
|
as passed to make-header.
|
|
|
|
> universal-conversion-rules
|
|
Bindings for the post-order function, which traverses the SXML tree
|
|
and converts it to a tree of fragments
|
|
|
|
The universal transformation from SXML to HTML. The following rules
|
|
work for every HTML, present and future
|
|
|
|
> universal-protected-rules
|
|
A variation of universal-conversion-rules which keeps '<', '>', '&'
|
|
and similar characters intact. The universal-protected-rules are
|
|
useful when the tree of fragments has to be traversed one more time.
|
|
|
|
> alist-conv-rules
|
|
The following rules define the identity transformation
|