diff --git a/collects/web-server/tmp/ssax/doc.txt b/collects/web-server/tmp/ssax/doc.txt
index 89df59cb93..4021d807aa 100644
--- a/collects/web-server/tmp/ssax/doc.txt
+++ b/collects/web-server/tmp/ssax/doc.txt
@@ -1,114 +1,114 @@
-SSAX Package
-============
-
-A SSAX functional XML parsing framework consists of a DOM/SXML parser, a SAX
-parser, and a supporting library of lexing and parsing procedures. The
-procedures in the package can be used separately to tokenize or parse various
-pieces of XML documents. The framework supports XML Namespaces, character,
-internal and external parsed entities, attribute value normalization,
-processing instructions and CDATA sections. The package includes a
-semi-validating SXML parser: a DOM-mode parser that is an instantiation of
-a SAX parser (called SSAX).
-
-SSAX is a full-featured, algorithmically optimal, pure-functional parser,
-which can act as a stream processor. SSAX is an efficient SAX parser that is
-easy to use. SSAX minimizes the amount of application-specific state that has
-to be shared among user-supplied event handlers. SSAX makes the maintenance
-of an application-specific element stack unnecessary, which eliminates several
-classes of common bugs. SSAX is written in a pure-functional subset of Scheme.
-Therefore, the event handlers are referentially transparent, which makes them
-easier for a programmer to write and to reason about. The more expressive,
-reliable and easier to use application interface for the event-driven XML
-parsing is the outcome of implementing the parsing engine as an enhanced tree
-fold combinator, which fully captures the control pattern of the depth-first
-tree traversal.
-
--------------------------------------------------
-
-Quick start
-
-; procedure: ssax:xml->sxml PORT NAMESPACE-PREFIX-ASSIG
-;
-; This is an instance of a SSAX parser that returns an SXML
-; representation of the XML document to be read from PORT.
-; NAMESPACE-PREFIX-ASSIG is a list of (USER-PREFIX . URI-STRING)
-; that assigns USER-PREFIXes to certain namespaces identified by
-; particular URI-STRINGs. It may be an empty list.
-; The procedure returns an SXML tree. The port points out to the
-; first character after the root element.
-(define (ssax:xml->sxml port namespace-prefix-assig) ...)
-
-; procedure: pre-post-order TREE BINDINGS
-;
-; Traversal of an SXML tree or a grove:
-; a <Node> or a <Nodelist>
-;
-; A <Node> and a <Nodelist> are mutually-recursive datatypes that
-; underlie the SXML tree:
-; <Node> ::= (name . <Nodelist>) | "text string"
-; An (ordered) set of nodes is just a list of the constituent nodes:
-; <Nodelist> ::= (<Node> ...)
-; Nodelists, and Nodes other than text strings are both lists. A
-; <Nodelist> however is either an empty list, or a list whose head is
-; not a symbol (an atom in general). A symbol at the head of a node is
-; either an XML name (in which case it's a tag of an XML element), or
-; an administrative name such as '@'.
-; See SXPath.scm and SSAX.scm for more information on SXML.
-;
-;
-; Pre-Post-order traversal of a tree and creation of a new tree:
-; pre-post-order:: <tree> x <bindings> -> <new-tree>
-; where
-; <bindings> ::= (<binding> ...)
-; <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
-; (<trigger-symbol> *macro* . <handler>) |
-; (<trigger-symbol> <new-bindings> . <handler>) |
-; (<trigger-symbol> . <handler>)
-; <trigger-symbol> ::= XMLname | *text* | *default*
-; <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
-;
-; The pre-post-order function visits the nodes and nodelists
-; pre-post-order (depth-first). For each <Node> of the form (name
-; <Node> ...) it looks up an association with the given 'name' among
-; its <bindings>. If failed, pre-post-order tries to locate a
-; *default* binding. It's an error if the latter attempt fails as
-; well. Having found a binding, the pre-post-order function first
-; checks to see if the binding is of the form
-; (<trigger-symbol> *preorder* . <handler>)
-; If it is, the handler is 'applied' to the current node. Otherwise,
-; the pre-post-order function first calls itself recursively for each
-; child of the current node, with <new-bindings> prepended to the
-; <bindings> in effect. The result of these calls is passed to the
-; <handler> (along with the head of the current <Node>). To be more
-; precise, the handler is _applied_ to the head of the current node
-; and its processed children. The result of the handler, which should
-; also be a <tree>, replaces the current <Node>. If the current <Node>
-; is a text string or other atom, a special binding with a symbol
-; *text* is looked up.
-;
-; A binding can also be of a form
-; (<trigger-symbol> *macro* . <handler>)
-; This is equivalent to *preorder* described above. However, the result
-; is re-processed again, with the current stylesheet.
-;
-(define (pre-post-order tree bindings) ...)
-
--------------------------------------------------
-
-Additional tools included into the package
-
-1. "access-remote.ss"
- Uniform access to local and remote resources
- Resolution for relative URIs in accordance with RFC 2396
-
-2. "id.ss"
- Creation and manipulation of the ID-index for a faster access to SXML elements
- by their unique ID
- Provides the DTD parser for extracting ID attribute declarations
-
-3. "xlink-parser.ss"
- Parser for XML documents that contain XLink elements
-
-4. "multi-parser.ss"
- SSAX multi parser: combines several specialized parsers into one
- Provides creation of parent pointers to SXML document constructed
+SSAX Package
+============
+
+A SSAX functional XML parsing framework consists of a DOM/SXML parser, a SAX
+parser, and a supporting library of lexing and parsing procedures. The
+procedures in the package can be used separately to tokenize or parse various
+pieces of XML documents. The framework supports XML Namespaces, character,
+internal and external parsed entities, attribute value normalization,
+processing instructions and CDATA sections. The package includes a
+semi-validating SXML parser: a DOM-mode parser that is an instantiation of
+a SAX parser (called SSAX).
+
+SSAX is a full-featured, algorithmically optimal, pure-functional parser,
+which can act as a stream processor. SSAX is an efficient SAX parser that is
+easy to use. SSAX minimizes the amount of application-specific state that has
+to be shared among user-supplied event handlers. SSAX makes the maintenance
+of an application-specific element stack unnecessary, which eliminates several
+classes of common bugs. SSAX is written in a pure-functional subset of Scheme.
+Therefore, the event handlers are referentially transparent, which makes them
+easier for a programmer to write and to reason about. The more expressive,
+reliable and easier to use application interface for the event-driven XML
+parsing is the outcome of implementing the parsing engine as an enhanced tree
+fold combinator, which fully captures the control pattern of the depth-first
+tree traversal.
+
+-------------------------------------------------
+
+Quick start
+
+; procedure: ssax:xml->sxml PORT NAMESPACE-PREFIX-ASSIG
+;
+; This is an instance of a SSAX parser that returns an SXML
+; representation of the XML document to be read from PORT.
+; NAMESPACE-PREFIX-ASSIG is a list of (USER-PREFIX . URI-STRING)
+; that assigns USER-PREFIXes to certain namespaces identified by
+; particular URI-STRINGs. It may be an empty list.
+; The procedure returns an SXML tree. The port points out to the
+; first character after the root element.
+(define (ssax:xml->sxml port namespace-prefix-assig) ...)
+
+; procedure: pre-post-order TREE BINDINGS
+;
+; Traversal of an SXML tree or a grove:
+; a <Node> or a <Nodelist>
+;
+; A <Node> and a <Nodelist> are mutually-recursive datatypes that
+; underlie the SXML tree:
+; <Node> ::= (name . <Nodelist>) | "text string"
+; An (ordered) set of nodes is just a list of the constituent nodes:
+; <Nodelist> ::= (<Node> ...)
+; Nodelists, and Nodes other than text strings are both lists. A
+; <Nodelist> however is either an empty list, or a list whose head is
+; not a symbol (an atom in general). A symbol at the head of a node is
+; either an XML name (in which case it's a tag of an XML element), or
+; an administrative name such as '@'.
+; See SXPath.scm and SSAX.scm for more information on SXML.
+;
+;
+; Pre-Post-order traversal of a tree and creation of a new tree:
+; pre-post-order:: <tree> x <bindings> -> <new-tree>
+; where
+; <bindings> ::= (<binding> ...)
+; <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
+; (<trigger-symbol> *macro* . <handler>) |
+; (<trigger-symbol> <new-bindings> . <handler>) |
+; (<trigger-symbol> . <handler>)
+; <trigger-symbol> ::= XMLname | *text* | *default*
+; <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
+;
+; The pre-post-order function visits the nodes and nodelists
+; pre-post-order (depth-first). For each <Node> of the form (name
+; <Node> ...) it looks up an association with the given 'name' among
+; its <bindings>. If failed, pre-post-order tries to locate a
+; *default* binding. It's an error if the latter attempt fails as
+; well. Having found a binding, the pre-post-order function first
+; checks to see if the binding is of the form
+; (<trigger-symbol> *preorder* . <handler>)
+; If it is, the handler is 'applied' to the current node. Otherwise,
+; the pre-post-order function first calls itself recursively for each
+; child of the current node, with <new-bindings> prepended to the
+; <bindings> in effect. The result of these calls is passed to the
+; <handler> (along with the head of the current <Node>). To be more
+; precise, the handler is _applied_ to the head of the current node
+; and its processed children. The result of the handler, which should
+; also be a <tree>, replaces the current <Node>. If the current <Node>
+; is a text string or other atom, a special binding with a symbol
+; *text* is looked up.
+;
+; A binding can also be of a form
+; (<trigger-symbol> *macro* . <handler>)
+; This is equivalent to *preorder* described above. However, the result
+; is re-processed again, with the current stylesheet.
+;
+(define (pre-post-order tree bindings) ...)
+
+-------------------------------------------------
+
+Additional tools included into the package
+
+1. "access-remote.ss"
+ Uniform access to local and remote resources
+ Resolution for relative URIs in accordance with RFC 2396
+
+2. "id.ss"
+ Creation and manipulation of the ID-index for a faster access to SXML elements
+ by their unique ID
+ Provides the DTD parser for extracting ID attribute declarations
+
+3. "xlink-parser.ss"
+ Parser for XML documents that contain XLink elements
+
+4. "multi-parser.ss"
+ SSAX multi parser: combines several specialized parsers into one
+ Provides creation of parent pointers to SXML document constructed
diff --git a/collects/web-server/tmp/ssax/info.ss b/collects/web-server/tmp/ssax/info.ss
index b748ce5c55..d55c2b49bf 100644
--- a/collects/web-server/tmp/ssax/info.ss
+++ b/collects/web-server/tmp/ssax/info.ss
@@ -1,10 +1,10 @@
-(module info (lib "infotab.ss" "setup")
- (define name "ssax")
- (define blurb
- (list "SSAX functional XML parsing framework "
- "to inter-convert between an angular-bracket and "
- "an S-expression-based notations for markup documents"))
- (define primary-file "ssax.ss")
- (define doc.txt "doc.txt")
- (define categories '(xml))
- )
+(module info (lib "infotab.ss" "setup")
+ (define name "ssax")
+ (define blurb
+ (list "SSAX functional XML parsing framework "
+ "to inter-convert between an angular-bracket and "
+ "an S-expression-based notations for markup documents"))
+ (define primary-file "ssax.ss")
+ (define doc.txt "doc.txt")
+ (define categories '(xml))
+ )
diff --git a/collects/web-server/tmp/sxml/doc.txt b/collects/web-server/tmp/sxml/doc.txt
index fd350b2829..e4cd02a363 100644
--- a/collects/web-server/tmp/sxml/doc.txt
+++ b/collects/web-server/tmp/sxml/doc.txt
@@ -1,376 +1,376 @@
-SXML Package
-============
-
-SXML package contains a collection of tools for processing markup documents
-(XML, XHTML, HTML) in the form of S-expressions (SXML, SHTML)
-
-You can find the API documentation in:
-http://modis.ispras.ru/Lizorkin/Apidoc/index.html
-
-SXML tools tutorial (under construction):
-http://modis.ispras.ru/Lizorkin/sxml-tutorial.html
-
-==========================================================================
-
-Description of the main high-level package components
------------------------------------------------------
-
- 1. SXML-tools
- 2. SXPath - SXML Query Language
- 3. SXPath with context
- 4. DDO SXPath
- 5. Functional-style modification tool for SXML
- 6. STX - Scheme-enabled XSLT processor
- 7. XPathLink - query language for a set of linked documents
-
--------------------------------------------------
-
- 1. SXML-tools
-
-XML is XML Infoset represented as native Scheme data - S-expressions.
-Any Scheme programm can manipulate SXML data directly, and DOM-like API is not
-necessary for SXML/Scheme applications.
-SXML-tools (former DOMS) is just a set of handy functions which may be
-convenient for some popular operations on SXML data.
-
-library file: Bigloo, Chicken, Gambit: "sxml/sxml-tools.scm"
- PLT: "sxml-tools.ss"
-
-http://www.pair.com/lisovsky/xml/sxmltools/
-
--------------------------------------------------
-
- 2. SXPath - SXML Query Language
-
-SXPath is a query language for SXML. It treats a location path as a composite
-query over an XPath tree or its branch. A single step is a combination of a
-projection, selection or a transitive closure. Multiple steps are combined via
-join and union operations.
-
-Lower-level SXPath consists of a set of predicates, filters, selectors and
-combinators, and higher-level abbreviated SXPath functions which are
-implemented in terms of lower-level functions.
-
-Higher level SXPath functions are dealing with XPath expressions which may be
-represented as a list of steps in the location path ("native" SXPath):
- (sxpath '(table (tr 3) td @ align))
-or as a textual representation of XPath expressions which is compatible with
-W3C XPath recommendation ("textual" SXPath):
- (sxpath "table/tr[3]/td/@align")
-
-An arbitrary converter implemented as a Scheme function may be used as a step
-in location path of "native" SXPath, which makes it extremely powerful and
-flexible tool. On other hand, a lot of W3C Recommendations such as XSLT,
-XPointer, XLink depends on a textual XPath expressions.
-
-It is possible to combine "native" and "textual" location paths and location
-step functions in one query, constructing an arbitrary XML query far beyond
-capabilities of XPath. For example, the query
- (sxpath `("document/chapter[3]" ,relevant-links @ author)
-makes a use of location step function relevant-links which implements an
-arbitrary algorithm in Scheme.
-
-SXPath may be considered as a compiler from abbreviated XPath (extended with
-native SXPath and location step functions) to SXPath primitives.
-
-library file: Bigloo, Chicken, Gambit: "sxml/sxpath.scm"
- PLT: "sxpath.ss"
-
-http://www.pair.com/lisovsky/query/sxpath/
-
--------------------------------------------------
-
- 3. SXPath with context
-
-SXPath with context provides the effective implementation for XPath reverse
-axes ("parent::", "ancestor::" and such) on SXML documents.
-
-The limitation of SXML is the absense of an upward link from a child to its
-parent, which makes the straightforward evaluation of XPath reverse axes
-ineffective. The previous approach for evaluating reverse axes in SXPath was
-searching for a parent from the root of the SXML tree.
-
-SXPath with context provides the fast reverse axes, which is achieved by
-storing previously visited ancestors of the context node in the context.
-With a special static analysis of an XPath expression, only the minimal
-required number of ancestors is stored in the context on each location step.
-
-library file: Bigloo, Chicken, Gambit: "sxml/xpath-context.scm"
- PLT: "xpath-context_xlink.ss"
-
--------------------------------------------------
-
- 4. DDO SXPath
-
-The optimized SXPath that implements distinct document order (DDO) of the
-nodeset produced.
-
-Unlike conventional SXPath and SXPath with context, DDO SXPath guarantees that
-the execution time is at worst polynomial of the XPath expression size and of
-the SXML document size.
-
-The API of DDO SXPath is compatible of that in conventional SXPath. The main
-following kinds of optimization methods are designed and implemented in DDO
-SXPath:
-
-- All XPath axes are implemented to keep a nodeset in distinct document
- order (DDO). An axis can now be considered as a converter:
- nodeset_in_DDO --> nodeset_in_DDO
-
-- Type inference for XPath expressions allows determining whether a
- predicate involves context-position implicitly;
-
-- Faster evaluation for particular kinds of XPath predicates that involve
- context-position, like: [position() > number] or [number];
-
-- Sort-merge join algorithm implemented for XPath EqualityComparison of
- two nodesets;
-
-- Deeply nested XPath predicates are evaluated at the very beginning of the
- evaluation phase, to guarantee that evaluation of deeply nested predicates
- is performed no more than once for each combination of
- (context-node, context-position, context-size)
-
-library file: Bigloo, Chicken, Gambit: "sxml/ddo-txpath.scm"
- PLT: "ddo-txpath.ss"
-
-http://modis.ispras.ru/Lizorkin/ddo.html
-
--------------------------------------------------
-
- 5. Functional-style modification tool for SXML
-
-A tool for making functional-style modifications to SXML documents
-The basics of modification language design was inspired by Patrick Lehti and
-his data manipulation processor for XML Query Language:
- http://www.ipsi.fraunhofer.de/~lehti/
-However, with functional techniques we can do this better...
-
-library file: Bigloo, Chicken, Gambit: "sxml/modif.scm"
- PLT: "modif.ss"
-
--------------------------------------------------
-
- 6. STX - Scheme-enabled XSLT processor
-
-STX is an XML transformation tool based on XSLT and Scheme which combines
-a processor for most common XSLT stylesheets and a framework for their
-extension in Scheme and provides an environment for a general-purpose
-transformation of XML data. It integrates two functional languages - Scheme
-and XSLT-like transformation language on the basis of the common data model -
-SXML.
-
-library file: Bigloo, Chicken, Gambit: "stx/stx-engine.scm"
- PLT: "stx-engine.ss"
-
-http://www.pair.com/lisovsky/transform/stx/
-
--------------------------------------------------
-
- 7. XPathLink - query language for a set of linked documents
-
-XLink is a language for describing links between resources using XML attributes
-and namespaces. XLink provides expressive means for linking information in
-different XML documents. With XLink, practical XML application data can be
-expressed as several linked XML documents, rather than a single complicated XML
-document. Such a design makes it very attractive to have a query language that
-would inherently recognize XLink links and provide a natural navigation
-mechanism over them.
-
-Such a query language has been designed and implemented in Scheme. This
-language is an extension to XPath with 3 additional axes. The implementation
-is naturally an extended SXPath. We call this language XPath with XLink
-support, or XPathLink.
-
-Additionally, an HTML hyperlink can be considered as a particular case of
-an XLink link. This observation makes it possible to query HTML documents with
-XPathLink as well. Neil W. Van Dyke and his permissive
-HTML parser HtmlPrag have made this feature possible.
-
-library file: Bigloo, Chicken, Gambit: "sxml/xlink.scm"
- PLT: "xpath-context_xlink.ss"
-
-http://modis.ispras.ru/Lizorkin/xpathlink.html
-
-
-==========================================================================
-
-Examples and expected results
------------------------------
-
-Obtaining an SXML document from XML
-(sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml")
-==>
-(*TOP*
- (*PI* xml "version='1.0'")
- (poem
- (@ (title "The Lovesong of J. Alfred Prufrock") (poet "T. S. Eliot"))
- (stanza
- (line "Let us go then, you and I,")
- (line "When the evening is spread out against the sky")
- (line "Like a patient etherized upon a table:"))
- (stanza
- (line "In the room the women come and go")
- (line "Talking of Michaelangelo."))))
-
-Accessing parts of the document with SXPath
-((sxpath "poem/stanza[2]/line/text()")
- (sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml"))
-==>
-("In the room the women come and go" "Talking of Michaelangelo.")
-
-Obtaining/querying HTML documents
-((sxpath "html/head/title")
- (sxml:document "http://modis.ispras.ru/Lizorkin/index.html"))
-==>
-((title "Dmitry Lizorkin homepage"))
-
--------------------------------------
-SXML Transformations
-
-Transforming the document according to XSLT stylesheet
-(apply
- string-append
- (sxml:clean-feed
- (stx:transform-dynamic
- (sxml:add-parents
- (sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml"))
- (stx:make-stx-stylesheet
- (sxml:document
- "http://modis.ispras.ru/Lizorkin/XML/poem2html.xsl"
- '((xsl . "http://www.w3.org/1999/XSL/Transform")))))))
-==>
-"The Lovesong of J. Alfred Prufrock
-The Lovesong of J. Alfred Prufrock
-Let us go then, you and I,
-When the evening is spread out against the sky
-Like a patient etherized upon a table:
-In the room the women come and go Talking of Michaelangelo.
-T. S. Eliot"
-
-Expressing the same transformation in pre-post-order (requires SSAX package)
-(pre-post-order
- (sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml")
- `((*TOP* *macro* . ,(lambda top (car ((sxpath '(*)) top))))
- (poem
- unquote
- (lambda elem
- `(html
- (head
- (title ,((sxpath "string(@title)") elem)))
- (body
- (h1 ,((sxpath "string(@title)") elem))
- ,@((sxpath "node()") elem)
- (i ,((sxpath "string(@poet)") elem))))))
- (@ *preorder* . ,(lambda x x))
- (stanza . ,(lambda (tag . content)
- `(p ,@(map-union (lambda (x) x) content))))
- (line . ,(lambda (tag . content) (append content '((br)))))
- (*text* . ,(lambda (tag text) text))))
-==>
-(html
- (head (title "The Lovesong of J. Alfred Prufrock"))
- (body
- (h1 "The Lovesong of J. Alfred Prufrock")
- (p
- "Let us go then, you and I,"
- (br)
- "When the evening is spread out against the sky"
- (br)
- "Like a patient etherized upon a table:"
- (br))
- (p "In the room the women come and go" (br)
- "Talking of Michaelangelo." (br))
- (i "T. S. Eliot")))
-
--------------------------------------
-XPathLink: a query language with XLink support
-
-Returning a chapter element that is linked with the first item
-in the table of contents
-((sxpath/c "doc/item[1]/traverse::chapter")
- (xlink:documents "http://modis.ispras.ru/Lizorkin/XML/doc.xml"))
-==>
-((chapter (@ (id "chap1"))
- (title "Abstract")
- (p "This document describes about XLink Engine...")))
-
-Traversing between documents with XPathLink
-((sxpath/c "descendant::a[.='XPathLink']/traverse::html/
- descendant::blockquote[1]/node()")
- (xlink:documents "http://modis.ispras.ru/Lizorkin/index.html"))
-==>
-((b "Abstract: ")
- "\r\n"
- "XPathLink is a query language for XML documents linked with XLink links.\r\n"
- "XPathLink is based on XPath and extends it with transparent XLink support.\r\n"
- "The implementation of XPathLink in Scheme is provided.\r\n")
-
--------------------------------------
-SXML Modifications
-
-Modifying the SXML representation of the document
-((sxml:modify '("/poem/stanza[2]" move-preceding "preceding-sibling::stanza"))
- (sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml"))
-==>
-(*TOP*
- (*PI* xml "version='1.0'")
- (poem
- (@ (title "The Lovesong of J. Alfred Prufrock") (poet "T. S. Eliot"))
- (stanza
- (line "In the room the women come and go")
- (line "Talking of Michaelangelo."))
- (stanza
- (line "Let us go then, you and I,")
- (line "When the evening is spread out against the sky")
- (line "Like a patient etherized upon a table:"))))
-
--------------------------------------
-DDO SXPath: the optimized XPath implementation
-
-Return all text nodes that follow the keyword ``XPointer'' and
-that are not descendants of the element appendix
-((ddo:sxpath "//text()[contains(., 'XPointer')]/
- following::text()[not(./ancestor::appendix)]")
- (sxml:document "http://modis.ispras.ru/Lizorkin/XML/doc.xml"))
-==>
-("XPointer is the fragment identifier of documents having the mime-type..."
- "Models for using XLink/XPointer "
- "There are important keywords."
- "samples"
- "Conclusion"
- "Thanks a lot.")
-
--------------------------------------
-Lazy XML processing
-
-Lazy XML-to-SXML conversion
-(define doc
- (lazy:xml->sxml
- (open-input-resource "http://modis.ispras.ru/Lizorkin/XML/poem.xml")
- '()))
-doc
-==>
-(*TOP*
- (*PI* xml "version='1.0'")
- (poem
- (@ (title "The Lovesong of J. Alfred Prufrock") (poet "T. S. Eliot"))
- (stanza (line "Let us go then, you and I,") #)
- #))
-
-Querying a lazy SXML document, lazyly
-(define res ((lazy:sxpath "poem/stanza/line[1]") doc))
-res
-==>
-((line "Let us go then, you and I,") #)
-
-Obtain the next portion of the result
-(force (cadr res))
-==>
-((line "In the room the women come and go") #)
-
-Converting the lazy result to a conventional SXML nodeset
-(lazy:result->list res)
-==>
-((line "Let us go then, you and I,")
- (line "In the room the women come and go"))
+SXML Package
+============
+
+SXML package contains a collection of tools for processing markup documents
+(XML, XHTML, HTML) in the form of S-expressions (SXML, SHTML)
+
+You can find the API documentation in:
+http://modis.ispras.ru/Lizorkin/Apidoc/index.html
+
+SXML tools tutorial (under construction):
+http://modis.ispras.ru/Lizorkin/sxml-tutorial.html
+
+==========================================================================
+
+Description of the main high-level package components
+-----------------------------------------------------
+
+ 1. SXML-tools
+ 2. SXPath - SXML Query Language
+ 3. SXPath with context
+ 4. DDO SXPath
+ 5. Functional-style modification tool for SXML
+ 6. STX - Scheme-enabled XSLT processor
+ 7. XPathLink - query language for a set of linked documents
+
+-------------------------------------------------
+
+ 1. SXML-tools
+
+XML is XML Infoset represented as native Scheme data - S-expressions.
+Any Scheme programm can manipulate SXML data directly, and DOM-like API is not
+necessary for SXML/Scheme applications.
+SXML-tools (former DOMS) is just a set of handy functions which may be
+convenient for some popular operations on SXML data.
+
+library file: Bigloo, Chicken, Gambit: "sxml/sxml-tools.scm"
+ PLT: "sxml-tools.ss"
+
+http://www.pair.com/lisovsky/xml/sxmltools/
+
+-------------------------------------------------
+
+ 2. SXPath - SXML Query Language
+
+SXPath is a query language for SXML. It treats a location path as a composite
+query over an XPath tree or its branch. A single step is a combination of a
+projection, selection or a transitive closure. Multiple steps are combined via
+join and union operations.
+
+Lower-level SXPath consists of a set of predicates, filters, selectors and
+combinators, and higher-level abbreviated SXPath functions which are
+implemented in terms of lower-level functions.
+
+Higher level SXPath functions are dealing with XPath expressions which may be
+represented as a list of steps in the location path ("native" SXPath):
+ (sxpath '(table (tr 3) td @ align))
+or as a textual representation of XPath expressions which is compatible with
+W3C XPath recommendation ("textual" SXPath):
+ (sxpath "table/tr[3]/td/@align")
+
+An arbitrary converter implemented as a Scheme function may be used as a step
+in location path of "native" SXPath, which makes it extremely powerful and
+flexible tool. On other hand, a lot of W3C Recommendations such as XSLT,
+XPointer, XLink depends on a textual XPath expressions.
+
+It is possible to combine "native" and "textual" location paths and location
+step functions in one query, constructing an arbitrary XML query far beyond
+capabilities of XPath. For example, the query
+ (sxpath `("document/chapter[3]" ,relevant-links @ author)
+makes a use of location step function relevant-links which implements an
+arbitrary algorithm in Scheme.
+
+SXPath may be considered as a compiler from abbreviated XPath (extended with
+native SXPath and location step functions) to SXPath primitives.
+
+library file: Bigloo, Chicken, Gambit: "sxml/sxpath.scm"
+ PLT: "sxpath.ss"
+
+http://www.pair.com/lisovsky/query/sxpath/
+
+-------------------------------------------------
+
+ 3. SXPath with context
+
+SXPath with context provides the effective implementation for XPath reverse
+axes ("parent::", "ancestor::" and such) on SXML documents.
+
+The limitation of SXML is the absense of an upward link from a child to its
+parent, which makes the straightforward evaluation of XPath reverse axes
+ineffective. The previous approach for evaluating reverse axes in SXPath was
+searching for a parent from the root of the SXML tree.
+
+SXPath with context provides the fast reverse axes, which is achieved by
+storing previously visited ancestors of the context node in the context.
+With a special static analysis of an XPath expression, only the minimal
+required number of ancestors is stored in the context on each location step.
+
+library file: Bigloo, Chicken, Gambit: "sxml/xpath-context.scm"
+ PLT: "xpath-context_xlink.ss"
+
+-------------------------------------------------
+
+ 4. DDO SXPath
+
+The optimized SXPath that implements distinct document order (DDO) of the
+nodeset produced.
+
+Unlike conventional SXPath and SXPath with context, DDO SXPath guarantees that
+the execution time is at worst polynomial of the XPath expression size and of
+the SXML document size.
+
+The API of DDO SXPath is compatible of that in conventional SXPath. The main
+following kinds of optimization methods are designed and implemented in DDO
+SXPath:
+
+- All XPath axes are implemented to keep a nodeset in distinct document
+ order (DDO). An axis can now be considered as a converter:
+ nodeset_in_DDO --> nodeset_in_DDO
+
+- Type inference for XPath expressions allows determining whether a
+ predicate involves context-position implicitly;
+
+- Faster evaluation for particular kinds of XPath predicates that involve
+ context-position, like: [position() > number] or [number];
+
+- Sort-merge join algorithm implemented for XPath EqualityComparison of
+ two nodesets;
+
+- Deeply nested XPath predicates are evaluated at the very beginning of the
+ evaluation phase, to guarantee that evaluation of deeply nested predicates
+ is performed no more than once for each combination of
+ (context-node, context-position, context-size)
+
+library file: Bigloo, Chicken, Gambit: "sxml/ddo-txpath.scm"
+ PLT: "ddo-txpath.ss"
+
+http://modis.ispras.ru/Lizorkin/ddo.html
+
+-------------------------------------------------
+
+ 5. Functional-style modification tool for SXML
+
+A tool for making functional-style modifications to SXML documents
+The basics of modification language design was inspired by Patrick Lehti and
+his data manipulation processor for XML Query Language:
+ http://www.ipsi.fraunhofer.de/~lehti/
+However, with functional techniques we can do this better...
+
+library file: Bigloo, Chicken, Gambit: "sxml/modif.scm"
+ PLT: "modif.ss"
+
+-------------------------------------------------
+
+ 6. STX - Scheme-enabled XSLT processor
+
+STX is an XML transformation tool based on XSLT and Scheme which combines
+a processor for most common XSLT stylesheets and a framework for their
+extension in Scheme and provides an environment for a general-purpose
+transformation of XML data. It integrates two functional languages - Scheme
+and XSLT-like transformation language on the basis of the common data model -
+SXML.
+
+library file: Bigloo, Chicken, Gambit: "stx/stx-engine.scm"
+ PLT: "stx-engine.ss"
+
+http://www.pair.com/lisovsky/transform/stx/
+
+-------------------------------------------------
+
+ 7. XPathLink - query language for a set of linked documents
+
+XLink is a language for describing links between resources using XML attributes
+and namespaces. XLink provides expressive means for linking information in
+different XML documents. With XLink, practical XML application data can be
+expressed as several linked XML documents, rather than a single complicated XML
+document. Such a design makes it very attractive to have a query language that
+would inherently recognize XLink links and provide a natural navigation
+mechanism over them.
+
+Such a query language has been designed and implemented in Scheme. This
+language is an extension to XPath with 3 additional axes. The implementation
+is naturally an extended SXPath. We call this language XPath with XLink
+support, or XPathLink.
+
+Additionally, an HTML hyperlink can be considered as a particular case of
+an XLink link. This observation makes it possible to query HTML documents with
+XPathLink as well. Neil W. Van Dyke and his permissive
+HTML parser HtmlPrag have made this feature possible.
+
+library file: Bigloo, Chicken, Gambit: "sxml/xlink.scm"
+ PLT: "xpath-context_xlink.ss"
+
+http://modis.ispras.ru/Lizorkin/xpathlink.html
+
+
+==========================================================================
+
+Examples and expected results
+-----------------------------
+
+Obtaining an SXML document from XML
+(sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml")
+==>
+(*TOP*
+ (*PI* xml "version='1.0'")
+ (poem
+ (@ (title "The Lovesong of J. Alfred Prufrock") (poet "T. S. Eliot"))
+ (stanza
+ (line "Let us go then, you and I,")
+ (line "When the evening is spread out against the sky")
+ (line "Like a patient etherized upon a table:"))
+ (stanza
+ (line "In the room the women come and go")
+ (line "Talking of Michaelangelo."))))
+
+Accessing parts of the document with SXPath
+((sxpath "poem/stanza[2]/line/text()")
+ (sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml"))
+==>
+("In the room the women come and go" "Talking of Michaelangelo.")
+
+Obtaining/querying HTML documents
+((sxpath "html/head/title")
+ (sxml:document "http://modis.ispras.ru/Lizorkin/index.html"))
+==>
+((title "Dmitry Lizorkin homepage"))
+
+-------------------------------------
+SXML Transformations
+
+Transforming the document according to XSLT stylesheet
+(apply
+ string-append
+ (sxml:clean-feed
+ (stx:transform-dynamic
+ (sxml:add-parents
+ (sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml"))
+ (stx:make-stx-stylesheet
+ (sxml:document
+ "http://modis.ispras.ru/Lizorkin/XML/poem2html.xsl"
+ '((xsl . "http://www.w3.org/1999/XSL/Transform")))))))
+==>
+"The Lovesong of J. Alfred Prufrock
+The Lovesong of J. Alfred Prufrock
+Let us go then, you and I,
+When the evening is spread out against the sky
+Like a patient etherized upon a table:

+

In the room the women come and go
Talking of Michaelangelo.

+T. S. Eliot" + +Expressing the same transformation in pre-post-order (requires SSAX package) +(pre-post-order + (sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml") + `((*TOP* *macro* . ,(lambda top (car ((sxpath '(*)) top)))) + (poem + unquote + (lambda elem + `(html + (head + (title ,((sxpath "string(@title)") elem))) + (body + (h1 ,((sxpath "string(@title)") elem)) + ,@((sxpath "node()") elem) + (i ,((sxpath "string(@poet)") elem)))))) + (@ *preorder* . ,(lambda x x)) + (stanza . ,(lambda (tag . content) + `(p ,@(map-union (lambda (x) x) content)))) + (line . ,(lambda (tag . content) (append content '((br))))) + (*text* . ,(lambda (tag text) text)))) +==> +(html + (head (title "The Lovesong of J. Alfred Prufrock")) + (body + (h1 "The Lovesong of J. Alfred Prufrock") + (p + "Let us go then, you and I," + (br) + "When the evening is spread out against the sky" + (br) + "Like a patient etherized upon a table:" + (br)) + (p "In the room the women come and go" (br) + "Talking of Michaelangelo." (br)) + (i "T. S. 
Eliot"))) + +------------------------------------- +XPathLink: a query language with XLink support + +Returning a chapter element that is linked with the first item +in the table of contents +((sxpath/c "doc/item[1]/traverse::chapter") + (xlink:documents "http://modis.ispras.ru/Lizorkin/XML/doc.xml")) +==> +((chapter (@ (id "chap1")) + (title "Abstract") + (p "This document describes about XLink Engine..."))) + +Traversing between documents with XPathLink +((sxpath/c "descendant::a[.='XPathLink']/traverse::html/ + descendant::blockquote[1]/node()") + (xlink:documents "http://modis.ispras.ru/Lizorkin/index.html")) +==> +((b "Abstract: ") + "\r\n" + "XPathLink is a query language for XML documents linked with XLink links.\r\n" + "XPathLink is based on XPath and extends it with transparent XLink support.\r\n" + "The implementation of XPathLink in Scheme is provided.\r\n") + +------------------------------------- +SXML Modifications + +Modifying the SXML representation of the document +((sxml:modify '("/poem/stanza[2]" move-preceding "preceding-sibling::stanza")) + (sxml:document "http://modis.ispras.ru/Lizorkin/XML/poem.xml")) +==> +(*TOP* + (*PI* xml "version='1.0'") + (poem + (@ (title "The Lovesong of J. Alfred Prufrock") (poet "T. S. Eliot")) + (stanza + (line "In the room the women come and go") + (line "Talking of Michaelangelo.")) + (stanza + (line "Let us go then, you and I,") + (line "When the evening is spread out against the sky") + (line "Like a patient etherized upon a table:")))) + +------------------------------------- +DDO SXPath: the optimized XPath implementation + +Return all text nodes that follow the keyword ``XPointer'' and +that are not descendants of the element appendix +((ddo:sxpath "//text()[contains(., 'XPointer')]/ + following::text()[not(./ancestor::appendix)]") + (sxml:document "http://modis.ispras.ru/Lizorkin/XML/doc.xml")) +==> +("XPointer is the fragment identifier of documents having the mime-type..." 
+ "Models for using XLink/XPointer " + "There are important keywords." + "samples" + "Conclusion" + "Thanks a lot.") + +------------------------------------- +Lazy XML processing + +Lazy XML-to-SXML conversion +(define doc + (lazy:xml->sxml + (open-input-resource "http://modis.ispras.ru/Lizorkin/XML/poem.xml") + '())) +doc +==> +(*TOP* + (*PI* xml "version='1.0'") + (poem + (@ (title "The Lovesong of J. Alfred Prufrock") (poet "T. S. Eliot")) + (stanza (line "Let us go then, you and I,") #) + #)) + +Querying a lazy SXML document, lazily +(define res ((lazy:sxpath "poem/stanza/line[1]") doc)) +res +==> +((line "Let us go then, you and I,") #) + +Obtain the next portion of the result +(force (cadr res)) +==> +((line "In the room the women come and go") #) + +Converting the lazy result to a conventional SXML nodeset +(lazy:result->list res) +==> +((line "Let us go then, you and I,") + (line "In the room the women come and go")) diff --git a/collects/web-server/tmp/sxml/info.ss b/collects/web-server/tmp/sxml/info.ss index 16967632db..71c9dd93f5 100644 --- a/collects/web-server/tmp/sxml/info.ss +++ b/collects/web-server/tmp/sxml/info.ss @@ -1,10 +1,10 @@ -(module info (lib "infotab.ss" "setup") - (define name "sxml") - (define blurb - (list "Collection of tools for processing markup documents " - "in the form of S-expressions")) - (define primary-file "sxml.ss") - (define doc.txt "doc.txt") - (define homepage "http://modis.ispras.ru/Lizorkin/sxml-tutorial.html") - (define categories '(xml)) - ) +(module info (lib "infotab.ss" "setup") + (define name "sxml") + (define blurb + (list "Collection of tools for processing markup documents " + "in the form of S-expressions")) + (define primary-file "sxml.ss") + (define doc.txt "doc.txt") + (define homepage "http://modis.ispras.ru/Lizorkin/sxml-tutorial.html") + (define categories '(xml)) + )