diff --git a/collects/net/scribblings/head.scrbl b/collects/net/scribblings/head.scrbl index 4b186161c7..28a6b258a6 100644 --- a/collects/net/scribblings/head.scrbl +++ b/collects/net/scribblings/head.scrbl @@ -199,8 +199,12 @@ should be returned: (extract-addresses "John Doe " 'all) (extract-addresses "doe@localhost (Johnny Doe)" 'all) (extract-addresses "doe@localhost" 'all) - (extract-addresses " \"Doe, John\" , jane" - 'all) + (define r + (extract-addresses " \"John\" , jane" + 'all)) + (length r) + (car r) + (cadr r) ]} ]} diff --git a/collects/xml/doc.txt b/collects/xml/doc.txt deleted file mode 100644 index 2b094bedd5..0000000000 --- a/collects/xml/doc.txt +++ /dev/null @@ -1,370 +0,0 @@ -_XML_ Library -============= - -Files: _xml.ss_ -Signature: _xml^_ - -Basic XML Data Types -==================== - -Document: - This structure represents an XML document. The only useful part is - the document-element, which contains all the content. The rest of - of the structure contains DTD information, which isn't supported, - and processing-instructions. - -Element: - Each pair of start/end tags and everything in between is an element. - It has the following pieces: - a name - attributes - contents including sub-elements -Xexpr: - S-expression representations of XML data. - -The end of this document has more details. - -Exceptions -========== - -> (define-struct (exn:invalid-xexpr exn) (code)) - Raised by validate-xexpr when passed an invalid Xexpr. Code contains an - invalid part of an Xexpr. - -Functions -========= - -> read-xml : [Input-port] -> Document - reads in an XML document from the given or current input port - XML documents contain exactly one element. It throws an xml-read:error - if there isn't any element or if there are more than one element. - - Malformed xml is reported with source locations in - the form `l.c/o', where l, c, and o are the line number, - column number, and next port position, respectively as - returned by port-next-location. 
- - Any non-characters other than eof read from the input-port will - appear in the document content. Such special values may only appear - where XML content may. See make-input-port for information - about creating ports that return non-character values. - -> read-xml/element : [Input-port] -> Element - reads an XML element from the port. The next non-whitespace character - read must start an XML element. The input-port may contain other data - after the element. - -> syntax:read-xml : [Input-port] -> Syntax - reads in an XML document and produces a syntax object version of - an xexpression. - -> syntax:read-xml/element : [Input-port] -> Syntax - is just like read-xml/element except it produces a syntax version - of an xexpression - -> write-xml : Document [Output-port] -> Void - writes a document to the given or current output port, currently - ignoring everything except the document's root element. - -> write-xml/content : Content [Output-port] -> Void - writes a document's contents to the given or current output port - -> display-xml : Document [Output-port] -> Void - just like write-xml, but newlines and indentation make the output more - readable, though less technically correct when white space is - significant. - -> display-xml/content : Content [Output-port] -> Void - just like write-xml/content, but with indentation and newlines - -> xml->xexpr : Content -> Xexpr - converts the interesting part of an XML document into an Xexpression - -> xexpr->xml : Xexpr -> Content - converts an Xexpression into the interesting part of an XML document - -> xexpr->string : Xexpression -> String - converts an Xexpression into a string representation - -> eliminate-whitespace : (listof Symbol) (Bool -> Bool) -> Element -> Element - Some elements should not contain any text, only other tags, except they - often contain whitespace for formating purposes. 
Given a list of tag - names and the identity function, eliminate-whitespace produces a - function that filters out pcdata consisting solely of whitespace from - those elements and raises an error if any non-whitespace text appears. - Passing in the function called "not" instead of the identity function - filters all elements which are not named in the list. Using void - filters all elements regardless of the list. - -> xexpr? : any -> Boolean - Is the given thing an Xexpr? - -> validate-xexpr : any -> #t - If the given thing is an Xexpr, produce true. Otherwise, raise - _exn:invalid-xexpr_, with the message set to "Expected something, given - something-else", where "something" is what it expected and - "something-else" set to what it was really given; and the code set to - the part of the non-Xexpr that caused the exception. - -> correct-xexpr? : any (-> a) (exn -> a) -> a - If the given thing is an Xexpr, produce an a. Otherwise call the - second function with an exn:invalid-xexpr. This second function - may inspect this structure and decide to return a "correct" value. - This is a method of extending the definition of an Xexpr and is used - by the web-server's Xexpr/callbacks. (See for an example.) - -Parameters -========== - -> empty-tag-shorthand : 'always | 'never | (listof Symbol) - Default: 'always - This determines if the output functions should use the - tag notation instead of writing . If the - argument is 'always, the abbreviated notation is always used, - and if the argument is 'never, the open/close pair is always - generated. If a list of symbols is provided, tags with names - in this list will be abbreviated. The first form is the - preferred XML notation. However, most browsers designed for - HTML will only properly render XHTML if the document uses a - mixture of the two formats. _html-empty-tags_ contains the W3 - consortium's recommended list of XHTML tags that should use the - shorthand. 
- -> collapse-whitespace : Bool - Default: #f - All consecutive whitespace is replaced by a single space. - CDATA sections are not affected. - -> trim-whitespace : Bool - This parameter no longer exists. Consider using collapse-whitespace - and eliminate-whitespace instead. - -> read-comments : Bool - Default: #f - Comments, by definition, should be ignored by programs. However, - interoperating with ad hoc extensions to other languages sometimes - requires processing comments anyway. - -> xexpr-drop-empty-attributes : Bool - Default: #f - It's easier to write functions processing Xexpressions, if they always - have a list of attributes. On the other hand, it's less cumbersome to - write Xexpresssions by hand without empty lists of attributes - everywhere. Normally xml->xexpr leaves in empty attribute lists. - Setting this parameter to #t drops them, so further editing the - Xexpression by hand is less annoying. - -Examples -======== - -Reading an Xexpression: - (xml->xexpr (document-element (read-xml input-port))) - -Writing an Xexpression: - (empty-tag-shorthand html-empty-tags) - (write-xml/content (xexpr->xml `(html (head (title ,banner)) - (body ((bgcolor "white")) - ,text))) - output-port) - -What this Library Doesn't Provide -================================= - - Document Type Declaration (DTD) processing - Validation - Expanding user-defined entities - Reading user-defined entities in attributes - Unicode support - -XML Datatype Details -==================== - -Note: Users of the XML collection don't need to know most of these definitions. - -Note: Xexpr is the only important one to understand. Even then, - Processing-instructions may be ignored. 
- -> Xexpr = String - | (cons Symbol (cons (listof (list Symbol String)) (listof Xexpr))) - | (cons Symbol (listof Xexpr)) ;; an element with no attributes - | Symbol ;; symbolic entities such as   - | Number ;; numeric entities like  - | Cdata - | Misc - -> Document = (make-document Prolog Element (listof Processing-instruction)) - (define-struct document (prolog element misc)) - -> Prolog = (make-prolog (listof Misc) Document-type [Misc ...]) - (define-struct prolog (misc dtd misc2)) - The last field is a (listof Misc), but the maker accepts optional - arguments instead for backwards compatibility. - -> Document-type = #f | (make-document-type Symbol External-dtd #f) - (define-struct document-type (name external inlined)) - -> External-dtd = (make-external-dtd/public str str) - | (make-external-dtd/system str) - | #f - (define-struct external-dtd (system)) - (define-struct (external-dtd/public external-dtd) (public)) - (define-struct (external-dtd/system external-dtd) ()) - -> Element = (make-element Location Location - Symbol - (listof Attribute) - (listof Content)) - (define-struct (element struct:source) (name attributes content)) - -> Attribute = (make-attribute Location Location Symbol String) - (define-struct (attribute struct:source) (name value)) - -> Content = Pcdata - | Element - | Entity - | Misc - -> Misc = Comment - | Processing-instruction - -> Pcdata = (make-pcdata Location Location String) - (define-struct (pcdata struct:source) (string)) - -> Cdata = (make-cdata Location Location String) - (define-struct (cdata struct:source) (string)) - Note: The string of a cdata structure is assumed to be of the form - "" with proper quoting. If this is an incorrect - assumption, this library will generate invalid XML. 
- -> Entity = (make-entity Location Location (U Nat Symbol)) - (define-struct (entity struct:source) (text)) - -> Processing-instruction = (make-pi Location Location String String) - (define-struct (pi struct:source) (target-name instruction)) - -> Comment = (make-comment String) - (define-struct comment (text)) - -> Source = (make-source Location Location) - (define-struct source (start stop)) - -> Location = (make-location Nat Nat Nat) | Symbol - (define-struct location (line char offset)) - Note: read-xml records location structures, while xexpr->xml inserts a - symbol. Other functions that must fabricate XML Locations - without prior source location should use a sensible "comment" symbol. - - -The PList Library -================= - -Files: _plist.ss_ - -The PList library provides the ability to read and write xml documents which -conform to the "plist" DTD, used to store 'dictionaries' of string - value -associations. This format is typically used by Mac OS X --- the operating -system and its applications --- to store all kinds of data. - -To Load -======= - -(require (lib "plist.ss" "xml")) - -Functions -========= - -> read-plist : Port -> PLDict - reads a plist from a port, and produces a 'dict' x-expression - -> write-plist : PLDict Port -> Void - writes a plist to the given port. May raise the exn:application:type - exception if the plist is badly formed. - -Datatypes -========= - -NB: all of these are subtypes of x-expression: - -> PLDict = (list 'dict Assoc-pair ...) - -> PLAssoc-pair = (list 'assoc-pair String PLValue) - -> PLValue = String - - | (list 'true) - | (list 'false) - | (list 'integer Integer) - | (list 'real Real) - | PLDict - | PLArray - -> PLArray = (list 'array PLValue ...) - -In fact, the PList DTD also defines Data and Date types, but we're ignoring -these for the moment. 
- -Examples -======== - -Here's a sample PLDict: - -(define my-dict - `(dict (assoc-pair "first-key" - "just a string - with some whitespace in it") - (assoc-pair "second-key" - (false)) - (assoc-pair "third-key" - (dict )) - (assoc-pair "fourth-key" - (dict (assoc-pair "inner-key" - (real 3.432)))) - (assoc-pair "fifth-key" - (array (integer 14) - "another string" - (true))) - (assoc-pair "sixth-key" - (array)))) - -Let's write it to disk: - - (call-with-output-file "/Users/clements/tmp.plist" - (lambda (port) - (write-plist my-dict port)) - 'truncate) - -Let's read it back from the disk: - - (define new-dict - (call-with-input-file "/Users/clements/tmp.plist" - (lambda (port) - (read-plist port)))) - -Here's what that (hand-formatted) text file looks like: - - - - - - first-key - just a string - with some whitespace in it - second-key - - third-key - - fourth-key - - inner-key - 3.432 - - fifth-key - - 14 - another string - - - sixth-key - - - diff --git a/collects/xml/info.ss b/collects/xml/info.ss index 48f9ec74bc..ad8b0b2042 100644 --- a/collects/xml/info.ss +++ b/collects/xml/info.ss @@ -5,3 +5,5 @@ ;; bit) more information (define tools '(("text-box-tool.ss"))) (define tool-names '("Text Box")) + +(define scribblings '(("xml.scrbl" ()))) diff --git a/collects/xml/xml.scrbl b/collects/xml/xml.scrbl new file mode 100644 index 0000000000..9656066d08 --- /dev/null +++ b/collects/xml/xml.scrbl @@ -0,0 +1,432 @@ +#lang scribble/doc +@(require scribble/manual + scribble/bnf + scribble/eval + (for-label scheme/base + scheme/contract + xml + xml/plist)) + +@(define xml-eval (make-base-eval)) +@(define plist-eval (make-base-eval)) +@interaction-eval[#:eval xml-eval (require xml)] +@interaction-eval[#:eval plist-eval (require xml/plist)] + +@title{@bold{XML}: Parsing and Writing} + +@defmodule[xml] + +The @schememodname[xml] library provides functions for parsing and +generating XML. 
XML can be represented as an instance of the +@scheme[document] structure type, or as a kind of S-expression that is +called an @deftech{X-expression}. + +The @schememodname[xml] library does not provide Document Type +Declaration (DTD) processing, validation, expanding user-defined +entities, or reading user-defined entities in attributes. + +@; ---------------------------------------------------------------------- + +@section{Datatypes} + +@defproc[(xexpr? [v any/c]) boolean?]{ + +Returns @scheme[#t] if @scheme[v] is an @tech{X-expression}, @scheme[#f] otherwise. + +The following grammar describes expressions that create @tech{X-expressions}: + +@schemegrammar[ +#:literals (cons list) +xexpr string + (list symbol (list (list symbol string) ...) xexpr ...) + (cons symbol (list xexpr ...)) + symbol + exact-nonnegative-integer + cdata + misc +] + +A @scheme[_string] is literal data. When converted to an XML stream, +the characters of the data will be escaped as necessary. + +A pair represents an element, optionally with attributes. Each +attribute's name is represented by a symbol, and its value is +represented by a string. + +A @scheme[_symbol] represents a symbolic entity. For example, +@scheme['nbsp] represents @litchar{&nbsp;}. + +An @scheme[_exact-nonnegative-integer] represents a numeric entity. For example, +@schemevalfont{#x20} represents @litchar{&#x20;}. + +A @scheme[_cdata] is an instance of the @scheme[cdata] structure type, +and a @scheme[_misc] is an instance of the @scheme[comment] or +@scheme[p-i] structure types.} + +@defstruct[document ([prolog prolog?] + [element element?] + [misc (or/c comment? pcdata?)])]{ + +Represents a document.} + +@defstruct[prolog ([misc (listof (or/c comment? pcdata?))] + [dtd (or/c document-type false/c)] + [misc2 (listof (or/c comment? pcdata?))])]{ + +Represents a document prolog.
The @scheme[make-prolog] binding is +unusual: it accepts two or more arguments, and all arguments after the +first two are collected into the @scheme[misc2] field.} + +@defstruct[document-type ([name symbol?] + [external external-dtd?] + [inlined false/c])]{ + +Represents a document type.} + +@deftogether[( +@defstruct[external-dtd ([system string?])] +@defstruct[(external-dtd/public external-dtd) ([public string?])] +@defstruct[(external-dtd/system external-dtd) ()] +)]{ + +Represents an externally defined DTD.} + +@defstruct[(element source) ([name symbol?] + [attributes (listof attribute?)] + [content (listof content?)])]{ + +Represents an element.} + +@defproc[(content? [v any/c]) boolean?]{ + +Returns @scheme[#t] if @scheme[v] is a @scheme[pcdata] instance, +an @scheme[element] instance, an @scheme[entity] instance, +a @scheme[comment] instance, or a @scheme[p-i] instance.} + +@defstruct[(attribute source) ([name symbol?] [value string?])]{ + +Represents an attribute within an element.} + +@defstruct[(entity source) ([text (or/c symbol? exact-nonnegative-integer?)])]{ + +Represents a symbolic or numerical entity.} + +@defstruct[(pcdata source) ([string string?])]{ + +Represents PCDATA content.} + + +@defstruct[(cdata source) ([string string?])]{ + +Represents CDATA content. + +The @scheme[string] field is assumed to be of the form +@litchar{<![CDATA[}@nonterm{content}@litchar{]]>} with proper quoting +of @nonterm{content}. Otherwise, @scheme[write-xml] generates +incorrect output.} + +@defstruct[(p-i source) ([target-name string?] + [instruction string?])]{ + +Represents a processing instruction.} + + +@defstruct[comment ([text string?])]{ + +Represents a comment.} + + +@defstruct[source ([start (or/c location? symbol?)] + [stop (or/c location? symbol?)])]{ + +Represents a source location. Other structure types extend @scheme[source]. + +When XML is generated from an input stream by @scheme[read-xml], +locations are represented by @scheme[location] instances.
When XML +structures are generated by @scheme[xexpr->xml], then locations are +symbols.} + + +@defstruct[location ([line exact-nonnegative-integer?] + [char exact-nonnegative-integer?] + [offset exact-nonnegative-integer?])]{ + +Represents a location in an input stream.} + + +@defstruct[(exn:invalid-xexpr exn) ([code any/c])]{ + +Raised by @scheme[validate-xexpr] when passed an invalid +@tech{X-expression}. The @scheme[code] field contains an invalid part +of the input to @scheme[validate-xexpr].} + +@; ---------------------------------------------------------------------- + +@section{Reading and Writing XML} + +@defproc[(read-xml [in input-port? (current-input-port)]) document?]{ + +Reads in an XML document from the given or current input port. XML +documents contain exactly one element; @scheme[xml-read:error] is raised +if the input stream has zero elements or more than one element. + +Malformed XML is reported with source locations in the form +@nonterm{l}@litchar{.}@nonterm{c}@litchar{/}@nonterm{o}, where +@nonterm{l}, @nonterm{c}, and @nonterm{o} are the line number, column +number, and next port position, respectively, as returned by +@scheme[port-next-location]. + +Any non-characters other than @scheme[eof] read from the input-port +appear in the document content. Such special values may appear only +where XML content may. See @scheme[make-input-port] for information +about creating ports that return non-character values. + +@examples[ +#:eval xml-eval +(xml->xexpr (document-element + (read-xml (open-input-string + "hi there!")))) +]} + +@defproc[(read-xml/element [in input-port? (current-input-port)]) element?]{ + +Reads a single XML element from the port. The next non-whitespace +character read must start an XML element, but the input port can +contain other data after the element.} + +@defproc[(syntax:read-xml [in input-port? 
(current-input-port)]) syntax?]{ + +Reads in an XML document and produces a syntax object version (like +@scheme[read-syntax]) of an @tech{X-expression}.} + +@defproc[(syntax:read-xml/element [in input-port? (current-input-port)]) syntax?]{ + +Like @scheme[syntax:read-xml], but it reads an XML element like +@scheme[read-xml/element].} + +@defproc[(write-xml [doc document?] [out output-port? (current-output-port)]) + void?]{ + +Writes a document to the given output port, currently ignoring +everything except the document's root element.} + +@defproc[(write-xml/content [content content?] [out output-port? (current-output-port)]) + void?]{ + +Writes document content to the given output port.} + +@defproc[(display-xml [doc document?] [out output-port? (current-output-port)]) + void?]{ + +Like @scheme[write-xml], but newlines and indentation make the output +more readable, though less technically correct when whitespace is +significant.} + +@defproc[(display-xml/content [content content?] [out output-port? (current-output-port)]) + void?]{ + +Like @scheme[write-xml/content], but with indentation and newlines +like @scheme[display-xml].} + + +@; ---------------------------------------------------------------------- + +@section{XML and X-expression Conversions} + +@defproc[(xml->xexpr [content content?]) xexpr?]{ + +Converts document content into an @tech{X-expression}.} + +@defproc[(xexpr->xml [xexpr xexpr?]) content?]{ + +Converts an @tech{X-expression} into XML content.} + +@defproc[(xexpr->string [xexpr xexpr?]) string?]{ + +Converts an @tech{X-expression} into a string containing XML.} + +@defproc[(eliminate-whitespace [tags (listof symbol?)] + [choose (boolean? . -> . any/c)] + [elem element?]) + element?]{ + +Some elements should not contain any text, only other tags, except +they often contain whitespace for formatting purposes.
Given a list of +tag names as @scheme[tags] and the identity function as +@scheme[choose], @scheme[eliminate-whitespace] produces a function +that filters out PCDATA consisting solely of whitespace from those +elements, and it raises an error if any non-whitespace text appears. +Passing in @scheme[not] as @scheme[choose] filters all elements which +are not named in the @scheme[tags] list. Using @scheme[void] as +@scheme[choose] filters all elements regardless of the @scheme[tags] +list.} + +@defproc[(validate-xexpr [v any/c]) (one-of/c #t)]{ + +If @scheme[v] is an @tech{X-expression}, the result is +@scheme[#t]. Otherwise, @scheme[exn:invalid-xexpr] is raised, with +a message of the form ``Expected @nonterm{something}, given +@nonterm{something-else}.'' The @scheme[code] field of the exception +is the part of @scheme[v] that caused the exception.} + +@defproc[(correct-xexpr? [v any/c] + [success-k (-> any/c)] + [fail-k (exn:invalid-xexpr? . -> . any/c)]) + any/c]{ + +Like @scheme[validate-xexpr], except that @scheme[success-k] is called +on each valid leaf, and @scheme[fail-k] is called on invalid leaves; +the @scheme[fail-k] may return a value instead of raising an exception +or otherwise escaping. Results from the leaves are combined with +@scheme[and] to arrive at the final result.} + +@; ---------------------------------------------------------------------- + +@section{Parameters} + +@defparam[empty-tag-shorthand shorthand (or/c (one-of/c 'always 'never) (listof symbol?))]{ + +A parameter that determines whether output functions should use the +@litchar{<}@nonterm{tag}@litchar{/>} tag notation instead of +@litchar{<}@nonterm{tag}@litchar{>}@litchar{</}@nonterm{tag}@litchar{>} +for elements that have no content. + +When the parameter is set to @scheme['always], the abbreviated +notation is always used. When set to @scheme['never], the abbreviated +notation is never generated. When set to a list of symbols, +tags with names in the list are abbreviated. 
The default is +@scheme['always]. + +The abbreviated form is the preferred XML notation. However, most +browsers designed for HTML will only properly render XHTML if the +document uses a mixture of the two formats. The +@scheme[html-empty-tags] constant contains the W3 consortium's +recommended list of XHTML tags that should use the shorthand.} + +@defthing[html-empty-tags (listof symbol?)]{ + +See @scheme[empty-tag-shorthand]. + +@examples[ +#:eval xml-eval +(parameterize ([empty-tag-shorthand html-empty-tags]) + (write-xml/content (xexpr->xml `(html + (body ((bgcolor "red")) + "Hi!" (br) "Bye!"))))) +]} + +@defboolparam[collapse-whitespace collapse?]{ + +A parameter that controls whether consecutive whitespace is replaced +by a single space. CDATA sections are not affected. The default is +@scheme[#f].} + +@defboolparam[read-comments preserve?]{ + +A parameter that determines whether comments are preserved or +discarded when reading XML. The default is @scheme[#f], which +discards comments.} + +@defboolparam[xexpr-drop-empty-attributes drop?]{ + +Controls whether @scheme[xml->xexpr] drops or preserves attribute +sections for an element that has no attributes. The default is +@scheme[#f], which means that all generated @tech{X-expression} +elements have an attributes list (even if it's empty).} + +@; ---------------------------------------------------------------------- + +@section{PList Library} + +@defmodule[xml/plist] + +The @schememodname[xml/plist] library provides the ability to read and +write XML documents that conform to the @defterm{plist} DTD, which is +used to store dictionaries of string--value associations. This format +is used by Mac OS X (both the operating system and its applications) +to store all kinds of data. 
+ +A @deftech{dictionary X-expression} is an @tech{X-expression} that +could be created by an expression matching the following +@scheme[_dict-expr] grammar: + +@schemegrammar*[ +#:literals (list) +[dict-expr (list 'dict assoc-pair ...)] +[assoc-pair (list 'assoc-pair string pl-value)] +[pl-value string + (list 'true) + (list 'false) + (list 'integer integer) + (list 'real real) + dict-expr + (list 'array pl-value ...)] +] + +@defproc[(read-plist [in input-port?]) xexpr?]{ + +Reads a plist from a port, and produces a @tech{dictionary +X-expression}.} + +@defproc[(write-plist [dict xexpr?] [out output-port?]) void?]{ + +Writes a plist to the given port. If @scheme[dict] is not a +@tech{dictionary X-expression}, the @scheme[exn:fail:contract] +exception is raised.} + +@examples[ +#:eval plist-eval +(define my-dict + `(dict (assoc-pair "first-key" + "just a string with some whitespace") + (assoc-pair "second-key" + (false)) + (assoc-pair "third-key" + (dict )) + (assoc-pair "fourth-key" + (dict (assoc-pair "inner-key" + (real 3.432)))) + (assoc-pair "fifth-key" + (array (integer 14) + "another string" + (true))) + (assoc-pair "sixth-key" + (array)))) +(define-values (in out) (make-pipe)) +(write-plist my-dict out) +(close-output-port out) +(define new-dict (read-plist in)) +(equal? my-dict new-dict) +] + +The XML generated by @scheme[write-plist] in the above example +looks like the following, if re-formatted by hand: + +@verbatim[#:indent 2]|{ + + + + + first-key + just a string with some whitespace + second-key + + third-key + + fourth-key + + inner-key + 3.432 + + fifth-key + + 14 + another string + + + sixth-key + + + +}| \ No newline at end of file