767 lines
25 KiB
HTML
767 lines
25 KiB
HTML
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<HTML><HEAD><TITLE>Man page of Parser</TITLE>
|
|
</HEAD><BODY>
|
|
<H1>Parser</H1>
|
|
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2019-10-03<BR><A HREF="#index">Index</A>
|
|
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<A NAME="lbAB"> </A>
|
|
<H2>NAME</H2>
|
|
|
|
XML::Parser - A perl module for parsing XML documents
|
|
<A NAME="lbAC"> </A>
|
|
<H2>SYNOPSIS</H2>
|
|
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
use XML::Parser;
|
|
|
|
$p1 = XML::Parser->new(Style => 'Debug');
|
|
$p1->parsefile('REC-xml-19980210.xml');
|
|
$p1->parse('<foo id="me">Hello World</foo>');
|
|
|
|
# Alternative
|
|
$p2 = XML::Parser->new(Handlers => {Start => \&handle_start,
|
|
End => \&handle_end,
|
|
Char => \&handle_char});
|
|
$p2->parse($socket);
|
|
|
|
# Another alternative
|
|
$p3 = XML::Parser->new(ErrorContext => 2);
|
|
|
|
$p3->setHandlers(Char => \&text,
|
|
Default => \&other);
|
|
|
|
open(my $fh, 'xmlgenerator |');
|
|
$p3->parse($foo, ProtocolEncoding => 'ISO-8859-1');
|
|
close($foo);
|
|
|
|
$p3->parsefile('junk.xml', ErrorContext => 3);
|
|
|
|
</PRE>
|
|
|
|
|
|
<A NAME="lbAD"> </A>
|
|
<H2>DESCRIPTION</H2>
|
|
|
|
|
|
|
|
This module provides ways to parse <FONT SIZE="-1">XML</FONT> documents. It is built on top of
|
|
XML::Parser::Expat, which is a lower level interface to James Clark's
|
|
expat library. Each call to one of the parsing methods creates a new
|
|
instance of XML::Parser::Expat which is then used to parse the document.
|
|
Expat options may be provided when the XML::Parser object is created.
|
|
These options are then passed on to the Expat object on each parse call.
|
|
They can also be given as extra arguments to the parse methods, in which
|
|
case they override options given at XML::Parser creation time.
|
|
<P>
|
|
|
|
The behavior of the parser is controlled either by <TT>"STYLES"</TT> and/or
|
|
<TT>"HANDLERS"</TT> options, or by ``setHandlers'' method. These all provide
|
|
mechanisms for XML::Parser to set the handlers needed by XML::Parser::Expat.
|
|
If neither <TT>"Style"</TT> nor <TT>"Handlers"</TT> are specified, then parsing just
|
|
checks the document for being well-formed.
|
|
<P>
|
|
|
|
When underlying handlers get called, they receive as their first parameter
|
|
the <I>Expat</I> object, not the Parser object.
|
|
<A NAME="lbAE"> </A>
|
|
<H2>METHODS</H2>
|
|
|
|
|
|
|
|
<DL COMPACT>
|
|
<DT id="1">new<DD>
|
|
|
|
|
|
This is a class method, the constructor for XML::Parser. Options are passed
|
|
as keyword value pairs. Recognized options are:
|
|
<DL COMPACT><DT id="2"><DD>
|
|
<DL COMPACT>
|
|
<DT id="3">•<DD>
|
|
Style
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This option provides an easy way to create a given style of parser. The
|
|
built in styles are: ``Debug'', ``Subs'', ``Tree'', ``Objects'',
|
|
and ``Stream''. These are all defined in separate packages under
|
|
<TT>"XML::Parser::Style::*"</TT>, and you can find further documentation for
|
|
each style both below, and in those packages.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Custom styles can be provided by giving a full package name containing
|
|
at least one '::'. This package should then have subs defined for each
|
|
handler it wishes to have installed. See ``<FONT SIZE="-1">STYLES''</FONT> below
|
|
for a discussion of each built in style.
|
|
<DT id="4">•<DD>
|
|
Handlers
|
|
|
|
|
|
<P>
|
|
|
|
|
|
When provided, this option should be an anonymous hash containing as
|
|
keys the type of handler and as values a sub reference to handle that
|
|
type of event. All the handlers get passed as their 1st parameter the
|
|
instance of expat that is parsing the document. Further details on
|
|
handlers can be found in ``<FONT SIZE="-1">HANDLERS''</FONT>. Any handler set here
|
|
overrides the corresponding handler set with the Style option.
|
|
<DT id="5">•<DD>
|
|
Pkg
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Some styles will refer to subs defined in this package. If not provided,
|
|
it defaults to the package which called the constructor.
|
|
<DT id="6">•<DD>
|
|
ErrorContext
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is an Expat option. When this option is defined, errors are reported
|
|
in context. The value should be the number of lines to show on either side
|
|
of the line in which the error occurred.
|
|
<DT id="7">•<DD>
|
|
ProtocolEncoding
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is an Expat option. This sets the protocol encoding name. It defaults
|
|
to none. The built-in encodings are: <TT>"UTF-8"</TT>, <TT>"ISO-8859-1"</TT>, <TT>"UTF-16"</TT>, and
|
|
<TT>"US-ASCII"</TT>. Other encodings may be used if they have encoding maps in one
|
|
of the directories in the <TT>@Encoding_Path</TT> list. Check ``<FONT SIZE="-1">ENCODINGS''</FONT> for
|
|
more information on encoding maps. Setting the protocol encoding overrides
|
|
any encoding in the <FONT SIZE="-1">XML</FONT> declaration.
|
|
<DT id="8">•<DD>
|
|
Namespaces
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is an Expat option. If this is set to a true value, then namespace
|
|
processing is done during the parse. See ``Namespaces'' in XML::Parser::Expat
|
|
for further discussion of namespace processing.
|
|
<DT id="9">•<DD>
|
|
NoExpand
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is an Expat option. Normally, the parser will try to expand references
|
|
to entities defined in the internal subset. If this option is set to a true
|
|
value, and a default handler is also set, then the default handler will be
|
|
called when an entity reference is seen in text. This has no effect if a
|
|
default handler has not been registered, and it has no effect on the expansion
|
|
of entity references inside attribute values.
|
|
<DT id="10">•<DD>
|
|
Stream_Delimiter
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is an Expat option. It takes a string value. When this string is found
|
|
alone on a line while parsing from a stream, then the parse is ended as if it
|
|
saw an end of file. The intended use is with a stream of xml documents in a
|
|
<FONT SIZE="-1">MIME</FONT> multipart format. The string should not contain a trailing newline.
|
|
<DT id="11">•<DD>
|
|
ParseParamEnt
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is an Expat option. Unless standalone is set to ``yes'' in the <FONT SIZE="-1">XML</FONT>
|
|
declaration, setting this to a true value allows the external <FONT SIZE="-1">DTD</FONT> to be read,
|
|
and parameter entities to be parsed and expanded.
|
|
<DT id="12">•<DD>
|
|
NoLWP
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This option has no effect if the ExternEnt or ExternEntFin handlers are
|
|
directly set. Otherwise, if true, it forces the use of a file based external
|
|
entity handler.
|
|
<DT id="13">•<DD>
|
|
Non_Expat_Options
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If provided, this should be an anonymous hash whose keys are options that
|
|
shouldn't be passed to Expat. This should only be of concern to those
|
|
subclassing XML::Parser.
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="14"><DD>
|
|
</DL>
|
|
|
|
<DT id="15">setHandlers(<FONT SIZE="-1">TYPE, HANDLER</FONT> [, <FONT SIZE="-1">TYPE, HANDLER</FONT> [...]])<DD>
|
|
|
|
|
|
This method registers handlers for various parser events. It overrides any
|
|
previous handlers registered through the Style or Handler options or through
|
|
earlier calls to setHandlers. By providing a false or undefined value as
|
|
the handler, the existing handler can be unset.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This method returns a list of type, handler pairs corresponding to the
|
|
input. The handlers returned are the ones that were in effect prior to
|
|
the call.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
See a description of the handler types in ``<FONT SIZE="-1">HANDLERS''</FONT>.
|
|
<DT id="16">parse(<FONT SIZE="-1">SOURCE</FONT> [, <FONT SIZE="-1">OPT</FONT> => <FONT SIZE="-1">OPT_VALUE</FONT> [...]])<DD>
|
|
|
|
|
|
The <FONT SIZE="-1">SOURCE</FONT> parameter should either be a string containing the whole <FONT SIZE="-1">XML</FONT>
|
|
document, or it should be an open IO::Handle. Constructor options to
|
|
XML::Parser::Expat given as keyword-value pairs may follow the <FONT SIZE="-1">SOURCE</FONT>
|
|
parameter. These override, for this call, any options or attributes passed
|
|
through from the XML::Parser instance.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
A die call is thrown if a parse error occurs. Otherwise it will return 1
|
|
or whatever is returned from the <B>Final</B> handler, if one is installed.
|
|
In other words, what parse may return depends on the style.
|
|
<DT id="17">parsestring<DD>
|
|
|
|
|
|
This is just an alias for parse for backwards compatibility.
|
|
<DT id="18">parsefile(<FONT SIZE="-1">FILE</FONT> [, <FONT SIZE="-1">OPT</FONT> => <FONT SIZE="-1">OPT_VALUE</FONT> [...]])<DD>
|
|
|
|
|
|
Open <FONT SIZE="-1">FILE</FONT> for reading, then call parse with the open handle. The file
|
|
is closed no matter how parse returns. Returns what parse returns.
|
|
<DT id="19">parse_start([ <FONT SIZE="-1">OPT</FONT> => <FONT SIZE="-1">OPT_VALUE</FONT> [...]])<DD>
|
|
|
|
|
|
Create and return a new instance of XML::Parser::ExpatNB. Constructor
|
|
options may be provided. If an init handler has been provided, it is
|
|
called before returning the ExpatNB object. Documents are parsed by
|
|
making incremental calls to the parse_more method of this object, which
|
|
takes a string. A single call to the parse_done method of this object,
|
|
which takes no arguments, indicates that the document is finished.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If there is a final handler installed, it is executed by the parse_done
|
|
method before returning and the parse_done method returns whatever is
|
|
returned by the final handler.
|
|
</DL>
|
|
<A NAME="lbAF"> </A>
|
|
<H2>HANDLERS</H2>
|
|
|
|
|
|
|
|
Expat is an event based parser. As the parser recognizes parts of the
|
|
document (say the start or end tag for an <FONT SIZE="-1">XML</FONT> element), then any handlers
|
|
registered for that type of an event are called with suitable parameters.
|
|
All handlers receive an instance of XML::Parser::Expat as their first
|
|
argument. See ``<FONT SIZE="-1">METHODS''</FONT> in XML::Parser::Expat for a discussion of the
|
|
methods that can be called on this object.
|
|
<A NAME="lbAG"> </A>
|
|
<H3>Init (Expat)</H3>
|
|
|
|
|
|
|
|
This is called just before the parsing of the document starts.
|
|
<A NAME="lbAH"> </A>
|
|
<H3>Final (Expat)</H3>
|
|
|
|
|
|
|
|
This is called just after parsing has finished, but only if no errors
|
|
occurred during the parse. Parse returns what this returns.
|
|
<A NAME="lbAI"> </A>
|
|
<H3>Start (Expat, Element [, Attr, Val [,...]])</H3>
|
|
|
|
|
|
|
|
This event is generated when an <FONT SIZE="-1">XML</FONT> start tag is recognized. Element is the
|
|
name of the <FONT SIZE="-1">XML</FONT> element type that is opened with the start tag. The Attr &
|
|
Val pairs are generated for each attribute in the start tag.
|
|
<A NAME="lbAJ"> </A>
|
|
<H3>End (Expat, Element)</H3>
|
|
|
|
|
|
|
|
This event is generated when an <FONT SIZE="-1">XML</FONT> end tag is recognized. Note that
|
|
an <FONT SIZE="-1">XML</FONT> empty tag (<foo/>) generates both a start and an end event.
|
|
<A NAME="lbAK"> </A>
|
|
<H3>Char (Expat, String)</H3>
|
|
|
|
|
|
|
|
This event is generated when non-markup is recognized. The non-markup
|
|
sequence of characters is in String. A single non-markup sequence of
|
|
characters may generate multiple calls to this handler. Whatever the
|
|
encoding of the string in the original document, this is given to the
|
|
handler in <FONT SIZE="-1">UTF-8.</FONT>
|
|
<A NAME="lbAL"> </A>
|
|
<H3>Proc (Expat, Target, Data)</H3>
|
|
|
|
|
|
|
|
This event is generated when a processing instruction is recognized.
|
|
<A NAME="lbAM"> </A>
|
|
<H3>Comment (Expat, Data)</H3>
|
|
|
|
|
|
|
|
This event is generated when a comment is recognized.
|
|
<A NAME="lbAN"> </A>
|
|
<H3>CdataStart (Expat)</H3>
|
|
|
|
|
|
|
|
This is called at the start of a <FONT SIZE="-1">CDATA</FONT> section.
|
|
<A NAME="lbAO"> </A>
|
|
<H3>CdataEnd (Expat)</H3>
|
|
|
|
|
|
|
|
This is called at the end of a <FONT SIZE="-1">CDATA</FONT> section.
|
|
<A NAME="lbAP"> </A>
|
|
<H3>Default (Expat, String)</H3>
|
|
|
|
|
|
|
|
This is called for any characters that don't have a registered handler.
|
|
This includes both characters that are part of markup for which no
|
|
events are generated (markup declarations) and characters that
|
|
could generate events, but for which no handler has been registered.
|
|
<P>
|
|
|
|
Whatever the encoding in the original document, the string is returned to
|
|
the handler in <FONT SIZE="-1">UTF-8.</FONT>
|
|
<A NAME="lbAQ"> </A>
|
|
<H3>Unparsed (Expat, Entity, Base, Sysid, Pubid, Notation)</H3>
|
|
|
|
|
|
|
|
This is called for a declaration of an unparsed entity. Entity is the name
|
|
of the entity. Base is the base to be used for resolving a relative <FONT SIZE="-1">URI.</FONT>
|
|
Sysid is the system id. Pubid is the public id. Notation is the notation
|
|
name. Base and Pubid may be undefined.
|
|
<A NAME="lbAR"> </A>
|
|
<H3>Notation (Expat, Notation, Base, Sysid, Pubid)</H3>
|
|
|
|
|
|
|
|
This is called for a declaration of notation. Notation is the notation name.
|
|
Base is the base to be used for resolving a relative <FONT SIZE="-1">URI.</FONT> Sysid is the system
|
|
id. Pubid is the public id. Base, Sysid, and Pubid may all be undefined.
|
|
<A NAME="lbAS"> </A>
|
|
<H3>ExternEnt (Expat, Base, Sysid, Pubid)</H3>
|
|
|
|
|
|
|
|
This is called when an external entity is referenced. Base is the base to be
|
|
used for resolving a relative <FONT SIZE="-1">URI.</FONT> Sysid is the system id. Pubid is the public
|
|
id. Base, and Pubid may be undefined.
|
|
<P>
|
|
|
|
This handler should either return a string, which represents the contents of
|
|
the external entity, or return an open filehandle that can be read to obtain
|
|
the contents of the external entity, or return undef, which indicates the
|
|
external entity couldn't be found and will generate a parse error.
|
|
<P>
|
|
|
|
If an open filehandle is returned, it must be returned as either a glob
|
|
(*FOO) or as a reference to a glob (e.g. an instance of IO::Handle).
|
|
<P>
|
|
|
|
A default handler is installed for this event. The default handler is
|
|
XML::Parser::lwp_ext_ent_handler unless the NoLWP option was provided with
|
|
a true value, otherwise XML::Parser::file_ext_ent_handler is the default
|
|
handler for external entities. Even without the NoLWP option, if the
|
|
<FONT SIZE="-1">URI</FONT> or <FONT SIZE="-1">LWP</FONT> modules are missing, the file based handler ends up being used
|
|
after giving a warning on the first external entity reference.
|
|
<P>
|
|
|
|
The <FONT SIZE="-1">LWP</FONT> external entity handler will use proxies defined in the environment
|
|
(http_proxy, ftp_proxy, etc.).
|
|
<P>
|
|
|
|
Please note that the <FONT SIZE="-1">LWP</FONT> external entity handler reads the entire
|
|
entity into a string and returns it, where as the file handler opens a
|
|
filehandle.
|
|
<P>
|
|
|
|
Also note that the file external entity handler will likely choke on
|
|
absolute URIs or file names that don't fit the conventions of the local
|
|
operating system.
|
|
<P>
|
|
|
|
The expat base method can be used to set a basename for
|
|
relative pathnames. If no basename is given, or if the basename is itself
|
|
a relative name, then it is relative to the current working directory.
|
|
<A NAME="lbAT"> </A>
|
|
<H3>ExternEntFin (Expat)</H3>
|
|
|
|
|
|
|
|
This is called after parsing an external entity. It's not called unless
|
|
an ExternEnt handler is also set. There is a default handler installed
|
|
that pairs with the default ExternEnt handler.
|
|
<P>
|
|
|
|
If you're going to install your own ExternEnt handler, then you should
|
|
set (or unset) this handler too.
|
|
<A NAME="lbAU"> </A>
|
|
<H3>Entity (Expat, Name, Val, Sysid, Pubid, Ndata, IsParam)</H3>
|
|
|
|
|
|
|
|
This is called when an entity is declared. For internal entities, the Val
|
|
parameter will contain the value and the remaining three parameters will be
|
|
undefined. For external entities, the Val parameter will be undefined, the
|
|
Sysid parameter will have the system id, the Pubid parameter will have the
|
|
public id if it was provided (it will be undefined otherwise), the Ndata
|
|
parameter will contain the notation for unparsed entities. If this is a
|
|
parameter entity declaration, then the IsParam parameter is true.
|
|
<P>
|
|
|
|
Note that this handler and the Unparsed handler above overlap. If both are
|
|
set, then this handler will not be called for unparsed entities.
|
|
<A NAME="lbAV"> </A>
|
|
<H3>Element (Expat, Name, Model)</H3>
|
|
|
|
|
|
|
|
The element handler is called when an element declaration is found. Name
|
|
is the element name, and Model is the content model as an XML::Parser::Content
|
|
object. See ``XML::Parser::ContentModel Methods'' in XML::Parser::Expat
|
|
for methods available for this class.
|
|
<A NAME="lbAW"> </A>
|
|
<H3>Attlist (Expat, Elname, Attname, Type, Default, Fixed)</H3>
|
|
|
|
|
|
|
|
This handler is called for each attribute in an <FONT SIZE="-1">ATTLIST</FONT> declaration.
|
|
So an <FONT SIZE="-1">ATTLIST</FONT> declaration that has multiple attributes will generate multiple
|
|
calls to this handler. The Elname parameter is the name of the element with
|
|
which the attribute is being associated. The Attname parameter is the name
|
|
of the attribute. Type is the attribute type, given as a string. Default is
|
|
the default value, which will either be ``#REQUIRED'', ``#IMPLIED'' or a quoted
|
|
string (i.e. the returned string will begin and end with a quote character).
|
|
If Fixed is true, then this is a fixed attribute.
|
|
<A NAME="lbAX"> </A>
|
|
<H3>Doctype (Expat, Name, Sysid, Pubid, Internal)</H3>
|
|
|
|
|
|
|
|
This handler is called for <FONT SIZE="-1">DOCTYPE</FONT> declarations. Name is the document type
|
|
name. Sysid is the system id of the document type, if it was provided,
|
|
otherwise it's undefined. Pubid is the public id of the document type,
|
|
which will be undefined if no public id was given. Internal is the internal
|
|
subset, given as a string. If there was no internal subset, it will be
|
|
undefined. Internal will contain all whitespace, comments, processing
|
|
instructions, and declarations seen in the internal subset. The declarations
|
|
will be there whether or not they have been processed by another handler
|
|
(except for unparsed entities processed by the Unparsed handler). However,
|
|
comments and processing instructions will not appear if they've been processed
|
|
by their respective handlers.
|
|
<A NAME="lbAY"> </A>
|
|
<H3>* DoctypeFin (Parser)</H3>
|
|
|
|
|
|
|
|
This handler is called after parsing of the <FONT SIZE="-1">DOCTYPE</FONT> declaration has finished,
|
|
including any internal or external <FONT SIZE="-1">DTD</FONT> declarations.
|
|
<A NAME="lbAZ"> </A>
|
|
<H3>XMLDecl (Expat, Version, Encoding, Standalone)</H3>
|
|
|
|
|
|
|
|
This handler is called for xml declarations. Version is a string containing
|
|
the version. Encoding is either undefined or contains an encoding string.
|
|
Standalone will be either true, false, or undefined if the standalone attribute
|
|
is yes, no, or not made respectively.
|
|
<A NAME="lbBA"> </A>
|
|
<H2>STYLES</H2>
|
|
|
|
|
|
|
|
<A NAME="lbBB"> </A>
|
|
<H3>Debug</H3>
|
|
|
|
|
|
|
|
This just prints out the document in outline form. Nothing special is
|
|
returned by parse.
|
|
<A NAME="lbBC"> </A>
|
|
<H3>Subs</H3>
|
|
|
|
|
|
|
|
Each time an element starts, a sub by that name in the package specified
|
|
by the Pkg option is called with the same parameters that the Start
|
|
handler gets called with.
|
|
<P>
|
|
|
|
Each time an element ends, a sub with that name appended with an underscore
|
|
(``_''), is called with the same parameters that the End handler gets called
|
|
with.
|
|
<P>
|
|
|
|
Nothing special is returned by parse.
|
|
<A NAME="lbBD"> </A>
|
|
<H3>Tree</H3>
|
|
|
|
|
|
|
|
Parse will return a parse tree for the document. Each node in the tree
|
|
takes the form of a tag, content pair. Text nodes are represented with
|
|
a pseudo-tag of ``0'' and the string that is their content. For elements,
|
|
the content is an array reference. The first item in the array is a
|
|
(possibly empty) hash reference containing attributes. The remainder of
|
|
the array is a sequence of tag-content pairs representing the content
|
|
of the element.
|
|
<P>
|
|
|
|
So for example the result of parsing:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
<foo><head id="a">Hello <em>there</em></head><bar>Howdy<ref/></bar>do</foo>
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
would be:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
Tag Content
|
|
==================================================================
|
|
[foo, [{}, head, [{id => "a"}, 0, "Hello ", em, [{}, 0, "there"]],
|
|
bar, [ {}, 0, "Howdy", ref, [{}]],
|
|
0, "do"
|
|
]
|
|
]
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
The root document ``foo'', has 3 children: a ``head'' element, a ``bar''
|
|
element and the text ``do''. After the empty attribute hash, these are
|
|
represented in it's contents by 3 tag-content pairs.
|
|
<A NAME="lbBE"> </A>
|
|
<H3>Objects</H3>
|
|
|
|
|
|
|
|
This is similar to the Tree style, except that a hash object is created for
|
|
each element. The corresponding object will be in the class whose name
|
|
is created by appending ``::'' and the element name to the package set with
|
|
the Pkg option. Non-markup text will be in the ::Characters class. The
|
|
contents of the corresponding object will be in an anonymous array that
|
|
is the value of the Kids property for that object.
|
|
<A NAME="lbBF"> </A>
|
|
<H3>Stream</H3>
|
|
|
|
|
|
|
|
This style also uses the Pkg package. If none of the subs that this
|
|
style looks for is there, then the effect of parsing with this style is
|
|
to print a canonical copy of the document without comments or declarations.
|
|
All the subs receive as their 1st parameter the Expat instance for the
|
|
document they're parsing.
|
|
<P>
|
|
|
|
It looks for the following routines:
|
|
<DL COMPACT>
|
|
<DT id="20">•<DD>
|
|
StartDocument
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Called at the start of the parse .
|
|
<DT id="21">•<DD>
|
|
StartTag
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Called for every start tag with a second parameter of the element type. The <TT>$_</TT>
|
|
variable will contain a copy of the tag and the <TT>%_</TT> variable will contain
|
|
attribute values supplied for that element.
|
|
<DT id="22">•<DD>
|
|
EndTag
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Called for every end tag with a second parameter of the element type. The <TT>$_</TT>
|
|
variable will contain a copy of the end tag.
|
|
<DT id="23">•<DD>
|
|
Text
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Called just before start or end tags with accumulated non-markup text in
|
|
the <TT>$_</TT> variable.
|
|
<DT id="24">•<DD>
|
|
<FONT SIZE="-1">PI</FONT>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Called for processing instructions. The <TT>$_</TT> variable will contain a copy of
|
|
the <FONT SIZE="-1">PI</FONT> and the target and data are sent as 2nd and 3rd parameters
|
|
respectively.
|
|
<DT id="25">•<DD>
|
|
EndDocument
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Called at conclusion of the parse.
|
|
</DL>
|
|
<A NAME="lbBG"> </A>
|
|
<H2>ENCODINGS</H2>
|
|
|
|
|
|
|
|
<FONT SIZE="-1">XML</FONT> documents may be encoded in character sets other than Unicode as
|
|
long as they may be mapped into the Unicode character set. Expat has
|
|
further restrictions on encodings. Read the xmlparse.h header file in
|
|
the expat distribution to see details on these restrictions.
|
|
<P>
|
|
|
|
Expat has built-in encodings for: <TT>"UTF-8"</TT>, <TT>"ISO-8859-1"</TT>, <TT>"UTF-16"</TT>, and
|
|
<TT>"US-ASCII"</TT>. Encodings are set either through the <FONT SIZE="-1">XML</FONT> declaration
|
|
encoding attribute or through the ProtocolEncoding option to XML::Parser
|
|
or XML::Parser::Expat.
|
|
<P>
|
|
|
|
For encodings other than the built-ins, expat calls the function
|
|
load_encoding in the Expat package with the encoding name. This function
|
|
looks for a file in the path list <TT>@XML::Parser::Expat::Encoding_Path</TT>, that
|
|
matches the lower-cased name with a '.enc' extension. The first one it
|
|
finds, it loads.
|
|
<P>
|
|
|
|
If you wish to build your own encoding maps, check out the XML::Encoding
|
|
module from <FONT SIZE="-1">CPAN.</FONT>
|
|
<A NAME="lbBH"> </A>
|
|
<H2>AUTHORS</H2>
|
|
|
|
|
|
|
|
Larry Wall <<I><A HREF="mailto:larry@wall.org">larry@wall.org</A></I>> wrote version 1.0.
|
|
<P>
|
|
|
|
Clark Cooper <<I><A HREF="mailto:coopercc@netheaven.com">coopercc@netheaven.com</A></I>> picked up support, changed the <FONT SIZE="-1">API</FONT>
|
|
for this version (2.x), provided documentation,
|
|
and added some standard package features.
|
|
<P>
|
|
|
|
Matt Sergeant <<I><A HREF="mailto:matt@sergeant.org">matt@sergeant.org</A></I>> is now maintaining XML::Parser
|
|
<P>
|
|
|
|
<HR>
|
|
<A NAME="index"> </A><H2>Index</H2>
|
|
<DL>
|
|
<DT id="26"><A HREF="#lbAB">NAME</A><DD>
|
|
<DT id="27"><A HREF="#lbAC">SYNOPSIS</A><DD>
|
|
<DT id="28"><A HREF="#lbAD">DESCRIPTION</A><DD>
|
|
<DT id="29"><A HREF="#lbAE">METHODS</A><DD>
|
|
<DT id="30"><A HREF="#lbAF">HANDLERS</A><DD>
|
|
<DL>
|
|
<DT id="31"><A HREF="#lbAG">Init (Expat)</A><DD>
|
|
<DT id="32"><A HREF="#lbAH">Final (Expat)</A><DD>
|
|
<DT id="33"><A HREF="#lbAI">Start (Expat, Element [, Attr, Val [,...]])</A><DD>
|
|
<DT id="34"><A HREF="#lbAJ">End (Expat, Element)</A><DD>
|
|
<DT id="35"><A HREF="#lbAK">Char (Expat, String)</A><DD>
|
|
<DT id="36"><A HREF="#lbAL">Proc (Expat, Target, Data)</A><DD>
|
|
<DT id="37"><A HREF="#lbAM">Comment (Expat, Data)</A><DD>
|
|
<DT id="38"><A HREF="#lbAN">CdataStart (Expat)</A><DD>
|
|
<DT id="39"><A HREF="#lbAO">CdataEnd (Expat)</A><DD>
|
|
<DT id="40"><A HREF="#lbAP">Default (Expat, String)</A><DD>
|
|
<DT id="41"><A HREF="#lbAQ">Unparsed (Expat, Entity, Base, Sysid, Pubid, Notation)</A><DD>
|
|
<DT id="42"><A HREF="#lbAR">Notation (Expat, Notation, Base, Sysid, Pubid)</A><DD>
|
|
<DT id="43"><A HREF="#lbAS">ExternEnt (Expat, Base, Sysid, Pubid)</A><DD>
|
|
<DT id="44"><A HREF="#lbAT">ExternEntFin (Expat)</A><DD>
|
|
<DT id="45"><A HREF="#lbAU">Entity (Expat, Name, Val, Sysid, Pubid, Ndata, IsParam)</A><DD>
|
|
<DT id="46"><A HREF="#lbAV">Element (Expat, Name, Model)</A><DD>
|
|
<DT id="47"><A HREF="#lbAW">Attlist (Expat, Elname, Attname, Type, Default, Fixed)</A><DD>
|
|
<DT id="48"><A HREF="#lbAX">Doctype (Expat, Name, Sysid, Pubid, Internal)</A><DD>
|
|
<DT id="49"><A HREF="#lbAY">* DoctypeFin (Parser)</A><DD>
|
|
<DT id="50"><A HREF="#lbAZ">XMLDecl (Expat, Version, Encoding, Standalone)</A><DD>
|
|
</DL>
|
|
<DT id="51"><A HREF="#lbBA">STYLES</A><DD>
|
|
<DL>
|
|
<DT id="52"><A HREF="#lbBB">Debug</A><DD>
|
|
<DT id="53"><A HREF="#lbBC">Subs</A><DD>
|
|
<DT id="54"><A HREF="#lbBD">Tree</A><DD>
|
|
<DT id="55"><A HREF="#lbBE">Objects</A><DD>
|
|
<DT id="56"><A HREF="#lbBF">Stream</A><DD>
|
|
</DL>
|
|
<DT id="57"><A HREF="#lbBG">ENCODINGS</A><DD>
|
|
<DT id="58"><A HREF="#lbBH">AUTHORS</A><DD>
|
|
</DL>
|
|
<HR>
|
|
This document was created by
|
|
<A HREF="/cgi-bin/man/man2html">man2html</A>,
|
|
using the manual pages.<BR>
|
|
Time: 00:06:01 GMT, March 31, 2021
|
|
</BODY>
|
|
</HTML>
|