<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<HTML><HEAD><TITLE>Man page of Twig</TITLE>
|
|
</HEAD><BODY>
|
|
<H1>Twig</H1>
|
|
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2019-10-13<BR><A HREF="#index">Index</A>
|
|
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<A NAME="lbAB"> </A>
|
|
<H2>NAME</H2>
|
|
|
|
XML::Twig - A perl module for processing huge XML documents in tree mode.
|
|
<A NAME="lbAC"> </A>
|
|
<H2>SYNOPSIS</H2>
|
|
|
|
|
|
|
|
Note that this documentation is intended as a reference to the module.
|
|
<P>
|
|
|
|
Complete docs, including a tutorial, examples, an easier to use <FONT SIZE="-1">HTML</FONT> version,
|
|
a quick reference card and a <FONT SIZE="-1">FAQ</FONT> are available at <<A HREF="http://www.xmltwig.org/xmltwig">http://www.xmltwig.org/xmltwig</A>>
|
|
<P>
|
|
|
|
Small documents (loaded in memory as a tree):
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
my $twig=XML::Twig->new(); # create the twig
|
|
$twig->parsefile( 'doc.xml'); # build it
|
|
my_process( $twig); # use twig methods to process it
|
|
$twig->print; # output the twig
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
Huge documents (processed in combined stream/tree mode):
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
# at most one div will be loaded in memory
|
|
my $twig=XML::Twig->new(
|
|
twig_handlers =>
|
|
{ title => sub { $_->set_tag( 'h2') }, # change title tags to h2
|
|
# $_ is the current element
|
|
para => sub { $_->set_tag( 'p') }, # change para to p
|
|
hidden => sub { $_->delete; }, # remove hidden elements
|
|
list => \&my_list_process, # process list elements
|
|
div => sub { $_[0]->flush; }, # output and free memory
|
|
},
|
|
pretty_print => 'indented', # output will be nicely formatted
|
|
empty_tags => 'html', # outputs <empty_tag />
|
|
);
|
|
$twig->parsefile( 'my_big.xml');
|
|
|
|
sub my_list_process
|
|
{ my( $twig, $list)= @_;
|
|
# ...
|
|
}
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
See XML::Twig 101 for other ways to use the module, as a
|
|
filter for example.
|
|
<A NAME="lbAD"> </A>
|
|
<H2>DESCRIPTION</H2>
|
|
|
|
|
|
|
|
This module provides a way to process <FONT SIZE="-1">XML</FONT> documents. It is built on top
of <TT>"XML::Parser"</TT>.
|
|
<P>
|
|
|
|
The module offers a tree interface to the document, while allowing you
|
|
to output the parts of it that have been completely processed.
|
|
<P>
|
|
|
|
It allows minimal resource (<FONT SIZE="-1">CPU</FONT> and memory) usage by building the tree
only for the parts of the document that need actual processing, through the
use of the <TT>"twig_roots"</TT> and
<TT>"twig_print_outside_roots"</TT> options. The
<TT>"finish"</TT> and <TT>"finish_print"</TT> methods also help
improve performance.
|
|
<P>
|
|
|
|
XML::Twig tries to make simple things easy, so it does its best to take care
of a lot of the (usually) annoying (but sometimes necessary) features that
come with <FONT SIZE="-1">XML</FONT> and XML::Parser.
|
|
<A NAME="lbAE"> </A>
|
|
<H2>TOOLS</H2>
|
|
|
|
|
|
|
|
XML::Twig comes with a few command-line utilities:
|
|
<A NAME="lbAF"> </A>
|
|
<H3>xml_pp - xml pretty-printer</H3>
|
|
|
|
|
|
|
|
<FONT SIZE="-1">XML</FONT> pretty printer using XML::Twig
|
|
<A NAME="lbAG"> </A>
|
|
<H3>xml_grep - grep <FONT SIZE="-1">XML</FONT> files looking for specific elements</H3>
|
|
|
|
|
|
|
|
<TT>"xml_grep"</TT> does a grep on <FONT SIZE="-1">XML</FONT> files. Instead of using regular expressions
|
|
it uses XPath expressions (in fact the subset of XPath supported by
|
|
XML::Twig).
|
|
<A NAME="lbAH"> </A>
|
|
<H3>xml_split - cut a big <FONT SIZE="-1">XML</FONT> file into smaller chunks</H3>
|
|
|
|
|
|
|
<TT>"xml_split"</TT> takes a (presumably big) <FONT SIZE="-1">XML</FONT> file and splits it into several smaller
files, based on various criteria (level in the tree, size or an XPath
expression).
|
|
<A NAME="lbAI"> </A>
|
|
<H3>xml_merge - merge back <FONT SIZE="-1">XML</FONT> files split with xml_split</H3>
|
|
|
|
|
|
|
|
<TT>"xml_merge"</TT> takes several xml files that have been split using <TT>"xml_split"</TT>
|
|
and recreates a single file.
|
|
<A NAME="lbAJ"> </A>
|
|
<H3>xml_spellcheck - spellcheck <FONT SIZE="-1">XML</FONT> files</H3>
|
|
|
|
|
|
|
<TT>"xml_spellcheck"</TT> lets you spell check the content of an <FONT SIZE="-1">XML</FONT> file. It extracts
the text (the content of elements and optionally of attributes), calls a spell
checker on it and then recreates the <FONT SIZE="-1">XML</FONT> document.
|
|
<A NAME="lbAK"> </A>
|
|
<H2>XML::Twig 101</H2>
|
|
|
|
|
|
|
|
XML::Twig can be used either on ``small'' <FONT SIZE="-1">XML</FONT> documents (that fit in memory)
|
|
or on huge ones, by processing parts of the document and outputting or
|
|
discarding them once they are processed.
|
|
<A NAME="lbAL"> </A>
|
|
<H3>Loading an <FONT SIZE="-1">XML</FONT> document and processing it</H3>
|
|
|
|
|
|
|
|
|
|
|
|
<PRE>
  my $t= XML::Twig->new();
  $t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');
  my $root= $t->root;
  $root->set_tag( 'html');                  # change doc to html
  my $title= $root->first_child( 'title');  # get the title
  $title->set_tag( 'h1');                   # turn it into h1
  my @para= $root->children( 'para');       # get the para children
  foreach my $para (@para)
    { $para->set_tag( 'p'); }               # turn them into p
  $t->print;                                # output the document
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
Other useful methods include:
|
|
<P>
|
|
|
|
att: <TT>"$elt->att( 'foo')"</TT> returns the <TT>"foo"</TT> attribute for an
element,
<P>
set_att: <TT>"$elt->set_att( foo => "bar")"</TT> sets the <TT>"foo"</TT>
attribute to the <TT>"bar"</TT> value,
<P>
next_sibling: <TT>"$elt->next_sibling"</TT> returns the next sibling
in the document (in the example <TT>"$title->next_sibling"</TT> is the first
<TT>"para"</TT>); you can also (and actually should) use
<TT>"$elt->next_sibling( 'para')"</TT> to get it.
|
|
<P>
|
|
|
|
The document can also be transformed through the use of the cut,
|
|
copy, paste and move methods:
|
|
<TT>"$title->cut; $title->paste( after => $p);"</TT> for example
|
|
<P>
|
|
|
|
And much, much more, see XML::Twig::Elt.
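<P>
As a minimal sketch of the methods just listed, the following reuses the small document from the example above. The attribute name <TT>"foo"</TT> and the variable names are only illustrative, and <TT>"sprint"</TT> is used instead of <TT>"print"</TT> so that the result ends up in a string:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $t= XML::Twig->new();
  $t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');
  my $root=  $t->root;
  my $title= $root->first_child( 'title');

  $title->set_att( foo => 'bar');                 # create an attribute
  my $foo= $title->att( 'foo');                   # read it back
  my $first_para= $title->next_sibling( 'para');  # first para after the title

  $title->cut;                                    # detach the title...
  $title->paste( after => $first_para);           # ...and re-attach it after that para

  my $result= $root->sprint;                      # serialize to a string
</PRE>
After this the title sits between the two <TT>"para"</TT> elements and carries the new attribute.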
|
|
<A NAME="lbAM"> </A>
|
|
<H3>Processing an <FONT SIZE="-1">XML</FONT> document chunk by chunk</H3>
|
|
|
|
|
|
|
|
One of the strengths of XML::Twig is that it lets you work with files that do
not fit in memory (<FONT SIZE="-1">BTW</FONT> storing an <FONT SIZE="-1">XML</FONT> document in memory as a tree is quite
memory-expensive, the expansion factor being often around 10).
|
|
<P>
|
|
|
|
To do this you can define handlers that will be called once a specific
element has been completely parsed. In these handlers you can access the
element and process it as you see fit, using the navigation and the
cut-n-paste methods, plus lots of convenient ones like <TT>"prefix"</TT>.
Once the element is completely processed you can then <TT>"flush"</TT> it,
which will output it and free the memory. You can also <TT>"purge"</TT> it
if you don't need to output it (if you are just extracting some data from
the document for example). The handler will be called again once the next
relevant element has been parsed.
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
  my $t= XML::Twig->new( twig_handlers =>
                          { section => \&section,
                            para    => sub { $_->set_tag( 'p'); }
                          },
                       );
  $t->parsefile( 'doc.xml');

  # the handler is called once a section is completely parsed, ie when
  # the end tag for section is found, it receives the twig itself and
  # the element (including all its sub-elements) as arguments
  sub section
    { my( $t, $section)= @_;      # arguments for all twig_handlers
      $section->set_tag( 'div');  # change the tag name
      # let's use the attribute nb as a prefix to the title
      my $title= $section->first_child( 'title'); # find the title
      my $nb= $title->att( 'nb'); # get the attribute
      $title->prefix( "$nb - ");  # easy isn't it?
      $section->flush;            # outputs the section and frees memory
    }
</PRE>
|
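<P>
When the goal is only to extract data, <TT>"purge"</TT> can take the place of <TT>"flush"</TT>. The following is a minimal sketch, assuming a made-up document (parsed from a string so that the example is self-contained) and an illustrative <TT>"@titles"</TT> array:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my @titles;
  my $t= XML::Twig->new( twig_handlers =>
                           { section => sub { my( $t, $section)= @_;
                                              # keep only the text of the title
                                              push @titles, $section->first_child_text( 'title');
                                              $t->purge;   # free the memory, output nothing
                                            },
                           },
                       );
  $t->parse( '<doc><section><title>One</title><para>a</para></section>'
           . '<section><title>Two</title><para>b</para></section></doc>');
</PRE>
Once the parse is over <TT>"@titles"</TT> holds the text of each section title, and the sections themselves have been discarded as they were parsed.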
|
|
|
|
|
<P>
|
|
|
|
There is of course more to it: you can trigger handlers on more elaborate
|
|
conditions than just the name of the element, <TT>"section/title"</TT> for example.
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
my $t= XML::Twig->new( twig_handlers =>
|
|
{ 'section/title' => sub { $_->print } }
|
|
)
|
|
->parsefile( 'doc.xml');
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
Here <TT>"sub { $_->print }"</TT> simply prints the current element (<TT>$_</TT> is aliased
|
|
to the element in the handler).
|
|
<P>
|
|
|
|
You can also trigger a handler on a test on an attribute:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
  my $t= XML::Twig->new( twig_handlers =>
                           { 'section[@level="1"]' => sub { $_->print } }
                       )
                  ->parsefile( 'doc.xml');
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
You can also use <TT>"start_tag_handlers"</TT> to process an
element as soon as the start tag is found. Besides <TT>"prefix"</TT> you
can also use <TT>"suffix"</TT>.
|
|
<A NAME="lbAN"> </A>
|
|
<H3>Processing just parts of an <FONT SIZE="-1">XML</FONT> document</H3>
|
|
|
|
|
|
|
|
The twig_roots mode builds only the required sub-trees from the document.
Anything outside of the twig roots will just be ignored:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
my $t= XML::Twig->new(
|
|
# the twig will include just the root and selected titles
|
|
twig_roots => { 'section/title' => \&print_n_purge,
|
|
'annex/title' => \&print_n_purge
|
|
}
|
|
);
|
|
$t->parsefile( 'doc.xml');
|
|
|
|
sub print_n_purge
|
|
{ my( $t, $elt)= @_;
|
|
print $elt->text; # print the text (including sub-element texts)
|
|
$t->purge; # frees the memory
|
|
}
|
|
|
|
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
You can use that mode when you want to process parts of a document but are
not interested in the rest and you don't want to pay the price, either in
time or memory, to build the tree for it.
|
|
<A NAME="lbAO"> </A>
|
|
<H3>Building an <FONT SIZE="-1">XML</FONT> filter</H3>
|
|
|
|
|
|
|
|
You can combine the <TT>"twig_roots"</TT> and the <TT>"twig_print_outside_roots"</TT> options to
|
|
build filters, which let you modify selected elements and will output the rest
|
|
of the document as is.
|
|
<P>
|
|
|
|
This would convert prices in $ to prices in Euro in a document:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
  my $t= XML::Twig->new(
           twig_roots               => { 'price' => \&convert, }, # process prices
           twig_print_outside_roots => 1,                         # print the rest
                       );
  $t->parsefile( 'doc.xml');

  sub convert
    { my( $t, $price)= @_;
      my $currency= $price->att( 'currency');       # get the currency
      if( $currency eq 'USD')
        { my $usd_price= $price->text;              # get the price
          # %rate is just a conversion table
          my $euro_price= $usd_price * $rate{usd2euro};
          $price->set_text( $euro_price);           # set the new price
          $price->set_att( currency => 'EUR');      # don't forget this!
        }
      $price->print;                                # output the price
    }
</PRE>
|
|
|
|
|
|
<A NAME="lbAP"> </A>
|
|
<H3>XML::Twig and various versions of Perl, XML::Parser and expat:</H3>
|
|
|
|
|
|
|
|
XML::Twig is a lot more sensitive to variations in versions of perl,
|
|
XML::Parser and expat than to the <FONT SIZE="-1">OS,</FONT> so this should cover some
|
|
reasonable configurations.
|
|
<P>
|
|
|
|
The ``recommended configuration'' is perl 5.8.3+ (for good Unicode
|
|
support), XML::Parser 2.31+ and expat 1.95.5+
|
|
<P>
|
|
|
|
See <<A HREF="http://testers.cpan.org/search?request=dist&dist=XML-Twig">http://testers.cpan.org/search?request=dist&dist=XML-Twig</A>> for the
|
|
<FONT SIZE="-1">CPAN</FONT> testers reports on XML::Twig, which list all tested configurations.
|
|
<P>
|
|
|
|
An Atom feed of the <FONT SIZE="-1">CPAN</FONT> Testers results is available at
|
|
<<A HREF="http://xmltwig.org/rss/twig_testers.rss">http://xmltwig.org/rss/twig_testers.rss</A>>
|
|
<P>
|
|
|
|
Finally:
|
|
<DL COMPACT>
|
|
<DT id="1">XML::Twig does <B></B><FONT SIZE="-1"><B>NOT</B></FONT><B></B> work with expat 1.95.4<DD>
|
|
|
|
|
|
|
|
<DT id="2">XML::Twig only works with XML::Parser 2.27 in perl 5.6.*<DD>
|
|
|
|
|
|
|
|
Note that I can't compile XML::Parser 2.27 anymore, so I can't guarantee
|
|
that it still works
|
|
<DT id="3">XML::Parser 2.28 does not really work<DD>
|
|
|
|
|
|
</DL>
|
|
<P>
|
|
|
|
When in doubt, upgrade expat, XML::Parser and Scalar::Util
|
|
<P>
|
|
|
|
Finally, for some optional features, XML::Twig depends on some additional
|
|
modules. The complete list, which depends somewhat on the version of Perl
|
|
that you are running, is given by running <TT>"t/zz_dump_config.t"</TT>
|
|
<A NAME="lbAQ"> </A>
|
|
<H2>Simplifying XML processing</H2>
|
|
|
|
|
|
|
|
<DL COMPACT>
|
|
<DT id="4">Whitespaces<DD>
|
|
|
|
|
|
Whitespace that looks non-significant is discarded. This behaviour can be
controlled using the <TT>"keep_spaces"</TT>,
<TT>"keep_spaces_in"</TT> and
<TT>"discard_spaces_in"</TT> options.
|
|
<DT id="5">Encoding<DD>
|
|
|
|
|
|
You can specify that you want the output in the same encoding as the input
(provided you have valid <FONT SIZE="-1">XML,</FONT> which means you have to specify the encoding
either in the document or when you create the Twig object) using the
<TT>"keep_encoding"</TT> option.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
You can also use <TT>"output_encoding"</TT> to convert the internal <FONT SIZE="-1">UTF-8</FONT> format
|
|
to the required encoding.
|
|
<DT id="6">Comments and Processing Instructions (<FONT SIZE="-1">PI</FONT>)<DD>
|
|
|
|
|
|
Comments and <FONT SIZE="-1">PI</FONT>'s can be hidden from the processing, but still appear in the
output (they are carried by the ``real'' element closest to them).
|
|
<DT id="7">Pretty Printing<DD>
|
|
|
|
|
|
XML::Twig can output the document pretty printed so it is easier to read for
|
|
us humans.
|
|
<DT id="8">Surviving an untimely death<DD>
|
|
|
|
|
|
<FONT SIZE="-1">XML</FONT> parsers are supposed to react violently when fed improper <FONT SIZE="-1">XML.</FONT>
|
|
XML::Parser just dies.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
XML::Twig provides the <TT>"safe_parse "</TT> and the
|
|
<TT>"safe_parsefile "</TT> methods which wrap the parse in an eval
|
|
and return either the parsed twig or 0 in case of failure.
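<P>
A minimal sketch of how the return value can be tested, assuming two made-up inline documents, the second of which is not well-formed. Since the parse is wrapped in an eval, the error message is expected in <TT>$@</TT>:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  # well-formed: safe_parse returns the twig
  my $good= XML::Twig->new->safe_parse( '<doc><p>fine</p></doc>');

  # the p element is never closed: safe_parse returns a false value
  my $bad=  XML::Twig->new->safe_parse( '<doc><p>not closed</doc>');
  my $error= $@;                      # message left by the failed parse

  my $status= $bad ? 'parsed' : 'parse failed';
</PRE>
The script keeps running after the failed parse, which is the whole point of these methods.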
|
|
<DT id="9">Private attributes<DD>
|
|
|
|
|
|
Attributes with a name starting with # (illegal in <FONT SIZE="-1">XML</FONT>) will not be
|
|
output, so you can safely use them to store temporary values during
|
|
processing. Note that you can store anything in a private attribute,
|
|
not just text, it's just a regular Perl variable, so a reference to
|
|
an object or a huge data structure is perfectly fine.
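<P>
As an illustration, here is a minimal sketch that stores a hash reference in a private attribute. The attribute name <TT>"#seen"</TT> and the document are made up for the example:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $t= XML::Twig->new;
  $t->parse( '<doc><item id="1"/></doc>');
  my $item= $t->root->first_child( 'item');

  # the value is a regular Perl scalar, here a reference to a hash
  $item->set_att( '#seen' => { count => 3 });
  my $count= $item->att( '#seen')->{count};

  # the private attribute is left out of the output, id is still there
  my $out= $item->sprint;
</PRE>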
|
|
</DL>
|
|
<A NAME="lbAR"> </A>
|
|
<H2>CLASSES</H2>
|
|
|
|
|
|
|
|
XML::Twig uses a very limited number of classes. The ones you are most likely to use
are <TT>"XML::Twig"</TT> of course, which represents a complete <FONT SIZE="-1">XML</FONT> document, including the
document itself (the root of the document itself is <TT>"root"</TT>), its handlers, its
input or output filters... The other main class is <TT>"XML::Twig::Elt"</TT>, which models
an <FONT SIZE="-1">XML</FONT> element. Element here has a very wide definition: it can be a regular element,
but also text, with an element <TT>"tag"</TT> of <TT>"#PCDATA"</TT> (or <TT>"#CDATA"</TT>), an entity (tag is
<TT>"#ENT"</TT>), a Processing Instruction (<TT>"#PI"</TT>) or a comment (<TT>"#COMMENT"</TT>).
|
|
<P>
|
|
|
|
Those are the 2 commonly used classes.
|
|
<P>
|
|
|
|
You might want to look at the <TT>"elt_class"</TT> option if you want to subclass <TT>"XML::Twig::Elt"</TT>.
|
|
<P>
|
|
|
|
Attributes are just attached to their parent element, they are not objects per se. (Please
use the provided methods <TT>"att"</TT> and <TT>"set_att"</TT> to access them: if you access them
as a hash, your code becomes implementation dependent and might break in the future.)
|
|
<P>
|
|
|
|
Other classes that are seldom used are <TT>"XML::Twig::Entity_list"</TT> and <TT>"XML::Twig::Entity"</TT>.
|
|
<P>
|
|
|
|
If you use <TT>"XML::Twig::XPath"</TT> instead of <TT>"XML::Twig"</TT>, elements are then created as
|
|
<TT>"XML::Twig::XPath::Elt"</TT>
|
|
<A NAME="lbAS"> </A>
|
|
<H2>METHODS</H2>
|
|
|
|
|
|
|
|
<A NAME="lbAT"> </A>
|
|
<H3>XML::Twig</H3>
|
|
|
|
|
|
|
|
A twig is a subclass of XML::Parser, so all XML::Parser methods can be
|
|
called on a twig object, including parse and parsefile.
|
|
<TT>"setHandlers"</TT> on the other hand cannot be used, see <TT>"BUGS "</TT>
|
|
<DL COMPACT>
|
|
<DT id="10">new<DD>
|
|
|
|
|
|
This is a class method, the constructor for XML::Twig. Options are passed
|
|
as keyword value pairs. Recognized options are the same as XML::Parser,
|
|
plus some (in fact a lot!) XML::Twig specifics.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
New Options:
|
|
<DL COMPACT><DT id="11"><DD>
|
|
<DL COMPACT>
|
|
<DT id="12">twig_handlers<DD>
|
|
|
|
|
|
This argument consists of a hash <TT>"{ expression => \&handler}"</TT> where
expression is an <I>XPath-like expression</I> (+ some others).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
XPath expressions are limited to using the child and descendant axis
|
|
(indeed you can't specify an axis), and predicates cannot be nested.
|
|
You can use the <TT>"string"</TT>, or <TT>"string(<tag>)"</TT> function (except
|
|
in <TT>"twig_roots"</TT> triggers).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Additionally you can use regexps (/ delimited) to match attribute
|
|
and string values.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Examples:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
foo
|
|
foo/bar
|
|
foo//bar
|
|
/foo/bar
|
|
/foo//bar
|
|
/foo/bar[@att1 = "val1" and @att2 = "val2"]/baz[@a >= 1]
|
|
foo[string()=~ /^duh!+/]
|
|
/foo[string(bar)=~ /\d+/]/baz[@att != 3]
|
|
|
|
</PRE>
|
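<P>
To make two of these trigger forms concrete, here is a minimal sketch using an attribute test and a regexp on the string value. The document, the tag names and the <TT>"@notes"</TT> and <TT>"@top"</TT> arrays are assumptions made for the example:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my( @notes, @top);
  my $t= XML::Twig->new( twig_handlers =>
           { # para elements whose text starts with Note
             'para[string()=~ /^Note/]' => sub { push @notes, $_->text; 1; },
             # section elements with a level attribute equal to 1
             'section[@level="1"]'      => sub { push @top, $_->att( 'id'); 1; },
           },
         );
  $t->parse( '<doc><section level="1" id="s1"><para>Note: a</para><para>plain</para></section>'
           . '<section level="2" id="s2"><para>Note: b</para></section></doc>');
</PRE>
Only the paragraphs starting with ``Note'' and the level 1 section should reach their handlers.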
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
#CDATA can be used to call a handler for a <FONT SIZE="-1">CDATA</FONT> section.
#COMMENT can be used to call a handler for comments.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Some additional (non-XPath) expressions are also provided for convenience:
|
|
<DL COMPACT><DT id="13"><DD>
|
|
<DL COMPACT>
|
|
<DT id="14">processing instructions<DD>
|
|
|
|
|
|
<TT>'?'</TT> or <TT>'#PI'</TT> triggers the handler for any processing instruction,
|
|
and <TT>'?<target>'</TT> or <TT>'#PI <target>'</TT> triggers a handler for processing
|
|
instruction with the given target( ex: <TT>'#PI xml-stylesheet'</TT>).
|
|
<DT id="15">level(<level>)<DD>
|
|
|
|
|
|
Triggers the handler on any element at that level in the tree (root is level 1)
|
|
<DT id="16">_all_<DD>
|
|
|
|
|
|
Triggers the handler for <B>all</B> elements in the tree
|
|
<DT id="17">_default_<DD>
|
|
|
|
|
|
Triggers the handler for each element that does <FONT SIZE="-1">NOT</FONT> have any other handler.
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="18"><DD>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Expressions are evaluated against the input document.
This means that even if you have changed the tag of an element (changing the
tag of a parent element from a handler for example) the change will not impact
the expression evaluation. There is an exception to this: ``private'' attributes
(whose names start with a '#', and can only be created during the parsing, as
they are not valid <FONT SIZE="-1">XML</FONT>) are checked against the current twig.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Handlers are triggered in fixed order, sorted by their type (xpath expressions
first, then regexps, then level), then by whether they specify a full path
(starting at the root element) or
not, then by number of steps in the expression, then number of
predicates, then number of tests in predicates. Handlers where the last
step is a wildcard (<TT>"foo/bar/*"</TT>) are triggered after other XPath
handlers. Finally <TT>"_all_"</TT> handlers are triggered last.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B>Important</B>: once a handler has been triggered if it returns 0 then no other
|
|
handler is called, except a <TT>"_all_"</TT> handler which will be called anyway.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If a handler returns a true value and other handlers apply, then the next
applicable handler will be called. Repeat, rinse, lather... The exception
to that rule is when the <TT>"do_not_chain_handlers"</TT>
option is set, in which case only the first handler will be called.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that it might be a good idea to explicitly return a short true value
(like 1) from handlers: this ensures that other applicable handlers are
called even if the last statement for the handler happens to evaluate to
false. This might also speed up the code, as it avoids copying the result of the last
statement of the handler and passing it to the code managing handlers.
It can really pay to return 1 instead of a long string.
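<P>
The chaining rules can be observed with a minimal sketch like the following. <TT>"run_handlers"</TT> is a hypothetical helper written for this example, and the document is made up. Both handlers match the same <TT>"para"</TT> element and both return 1:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  # parses a tiny document and returns the list of triggers that fired
  sub run_handlers
    { my( %options)= @_;
      my @called;
      my $t= XML::Twig->new( %options,
                             twig_handlers =>
                               { 'doc/para' => sub { push @called, 'doc/para'; 1; },
                                 'para'     => sub { push @called, 'para';     1; },
                               },
                           );
      $t->parse( '<doc><para>x</para></doc>');
      return @called;
    }

  my @chained= run_handlers();                             # handlers are chained
  my @single=  run_handlers( do_not_chain_handlers => 1);  # chaining is turned off
</PRE>
With chaining both handlers fire for the element, without it only one does.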
|
|
|
|
|
|
<P>
|
|
|
|
|
|
When the closing tag for an element is parsed the corresponding handler is
|
|
called, with 2 arguments: the twig and the <TT>"Element "</TT>. The twig includes
|
|
the document tree that has been built so far, the element is the complete
|
|
sub-tree for the element. The fact that the handler is called only when the
|
|
closing tag for the element is found means that handlers for inner elements
|
|
are called before handlers for outer elements.
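<P>
A minimal sketch showing that order, assuming a made-up three level document. The same handler records the tag of each element it is called for:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my @order;
  my $record= sub { push @order, $_->tag; 1; };   # $_ is the current element

  my $t= XML::Twig->new( twig_handlers =>
                           { doc => $record, section => $record, title => $record });
  $t->parse( '<doc><section><title>t</title></section></doc>');
</PRE>
The tags are recorded innermost first, as each end tag is reached.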
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>$_</TT> is also set to the element, so it is easy to write inline handlers like
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
para => sub { $_->set_tag( 'p'); }
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Text is stored in elements whose tag name is #PCDATA (because of mixed content,
where an element holds both text and sub-elements, there is no way to store the text as just
an attribute of the enclosing element).
|
|
|
|
|
|
<P>
|
|
|
|
|
<B>Warning</B>: if you have used purge or flush on the twig the element might not
be complete: some of its children might have been entirely flushed or purged,
and the start tag might even have been printed (by <TT>"flush"</TT>) already, so changing
its tag might not give the expected result.
|
|
</DL>
|
|
|
|
<DT id="19">twig_roots<DD>
|
|
|
|
|
|
This argument lets you build the tree only for those elements you are
|
|
interested in.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
  # Example:
  my $t= XML::Twig->new( twig_roots => { title => 1, subtitle => 1});
  $t->parsefile( 'doc.xml');

  # or, with a path:
  my $t2= XML::Twig->new( twig_roots => { 'section/title' => 1});
  $t2->parsefile( 'doc.xml');
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The first example builds a twig containing a document including only <TT>"title"</TT> and <TT>"subtitle"</TT>
elements, as children of the root element.
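<P>
The shape of the resulting twig can be checked with a minimal sketch such as this one, which assumes a made-up document parsed from a string:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $t= XML::Twig->new( twig_roots => { title => 1 });
  $t->parse( '<doc><section><title>A</title><para>p</para></section>'
           . '<section><title>B</title></section></doc>');

  # sections and paras were never built: the titles hang directly off the root
  my @tags=  map { $_->tag  } $t->root->children;
  my @texts= map { $_->text } $t->root->children;
</PRE>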
|
|
|
|
|
|
<P>
|
|
|
|
|
|
You can use <I>generic_attribute_condition</I>, <I>attribute_condition</I>,
<I>full_path</I>, <I>partial_path</I>, <I>tag</I>, <I>tag_regexp</I>, <I>_default_</I> and
<I>_all_</I> to trigger the building of the twig.
<I>string_condition</I> and <I>regexp_condition</I> cannot be used, as the content
of the element, and the string, have not yet been parsed when the condition
is checked.
|
|
|
|
|
|
<P>
|
|
|
|
|
<B></B><FONT SIZE="-1"><B>WARNING</B></FONT><B></B>: paths are checked against the document. Even if the <TT>"twig_roots"</TT> option
is used they will be checked against the full document tree, not the virtual
tree created by XML::Twig.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B></B><FONT SIZE="-1"><B>WARNING</B></FONT><B></B>: twig_roots elements should <FONT SIZE="-1">NOT</FONT> be nested, that would hopelessly
|
|
confuse XML::Twig ;--(
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note: you can set handlers (twig_handlers) using twig_roots:
<PRE>
  # Example:
  my $t= XML::Twig->new( twig_roots =>
                           { title    => sub { $_[1]->print;},
                             subtitle => \&process_subtitle
                           }
                       );
  $t->parsefile( 'doc.xml');
</PRE>
|
|
<DT id="20">twig_print_outside_roots<DD>
|
|
|
|
|
|
To be used in conjunction with the <TT>"twig_roots"</TT> argument. When set to a true
|
|
value this will print the document outside of the <TT>"twig_roots"</TT> elements.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
  # Example:
  my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                         twig_print_outside_roots => 1,
                       );
  $t->parsefile( 'doc.xml');
  { my $nb;
    sub number_title
      { my( $twig, $title)= @_;
        $nb++;
        $title->prefix( "$nb ");
        $title->print;
      }
  }
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This example prints the document outside of the title element, calls
|
|
<TT>"number_title"</TT> for each <TT>"title"</TT> element, prints it, and then resumes printing
|
|
the document. The twig is built only for the <TT>"title"</TT> elements.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the value is a reference to a file handle then the document outside the
|
|
<TT>"twig_roots"</TT> elements will be output to this file handle:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
  open( my $out, '>', 'out_file.xml') or die "cannot open out file out_file.xml: $!";
  my $t= XML::Twig->new( twig_roots => { title => \&number_title },
                         # default output to $out
                         twig_print_outside_roots => $out,
                       );

  { my $nb;
    sub number_title
      { my( $twig, $title)= @_;
        $nb++;
        $title->prefix( "$nb ");
        $title->print( $out);    # you have to print to $out here
      }
  }
</PRE>
|
|
|
|
|
|
<DT id="21">start_tag_handlers<DD>
|
|
|
|
|
|
A hash <TT>"{ expression => \&handler}"</TT>. Sets element handlers that are called when
the element is opened (at the end of the XML::Parser <TT>"Start"</TT> handler). The handlers
are called with 2 params: the twig and the element. The element is empty at
that point, but its attributes have already been created.
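<P>
A minimal sketch of what such a handler sees, assuming a made-up document. It records the <TT>"id"</TT> attribute and the number of children of each <TT>"section"</TT> at the time its start tag is found:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my @seen;
  my $t= XML::Twig->new( start_tag_handlers =>
           { section => sub { my( $t, $elt)= @_;
                              my @children= $elt->children;   # nothing parsed yet
                              push @seen, $elt->att( 'id') . ':' . scalar @children;
                              1;
                            },
           },
         );
  $t->parse( '<doc><section id="s1"><title>T</title></section></doc>');
</PRE>
The attribute is available, but the <TT>"title"</TT> child has not been parsed yet when the handler runs.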
|
|
|
|
|
|
<P>
|
|
|
|
|
|
You can use <I>generic_attribute_condition</I>, <I>attribute_condition</I>,
|
|
<I>full_path</I>, <I>partial_path</I>, <I>tag</I>, <I>tag_regexp</I>, <I>_default_</I> and <I>_all_</I>
|
|
to trigger the handler.
|
|
|
|
|
|
<P>
|
|
|
|
|
<I>string_condition</I> and <I>regexp_condition</I> cannot be used, as the content of
the element, and the string, have not yet been parsed when the condition is
checked.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The main uses for those handlers are to change the tag name (you might have to
do it as soon as you find the open tag if you plan to <TT>"flush"</TT> the twig at some
point in the element), and to create temporary attributes that will be used
when processing sub-elements with <TT>"twig_handlers"</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
You should also use it to change tags if you use <TT>"flush"</TT>. If you change the tag
|
|
in a regular <TT>"twig_handler"</TT> then the start tag might already have been flushed.
|
|
|
|
|
|
<P>
|
|
|
|
|
<B>Note</B>: <TT>"start_tag"</TT> handlers can be called outside of <TT>"twig_roots"</TT> if this
argument is used. In this case handlers are called with the following arguments:
<TT>$t</TT> (the twig), <TT>$tag</TT> (the tag of the element) and <TT>%att</TT> (a hash of the
attributes of the element).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the <TT>"twig_print_outside_roots"</TT> argument is also used, then if the last handler
called returns a <TT>"true"</TT> value, the start tag will be output as it
appeared in the original document; if the handler returns a <TT>"false"</TT> value
then the start tag will <B>not</B> be printed (so you can print a modified string
yourself for example).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that you can use the ignore method in <TT>"start_tag_handlers"</TT>
|
|
(and only there).
|
|
<DT id="22">end_tag_handlers<DD>
|
|
|
|
|
|
A hash <TT>"{ expression => \&handler}"</TT>. Sets element handlers that are called when
the element is closed (at the end of the XML::Parser <TT>"End"</TT> handler). The handlers
are called with 2 params: the twig and the tag of the element.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<I>twig_handlers</I> are called when an element is completely parsed, so why have
|
|
this redundant option? There is only one use for <TT>"end_tag_handlers"</TT>: when using
|
|
the <TT>"twig_roots"</TT> option, to trigger a handler for an element <B>outside</B> the roots.
|
|
It is for example very useful to number titles in a document using nested
|
|
sections:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my @no= (0);
|
|
my $no;
|
|
my $t= XML::Twig->new(
|
|
start_tag_handlers =>
|
|
{ section => sub { $no[$#no]++; $no= join '.', @no; push @no, 0; } },
|
|
twig_roots =>
|
|
{ title => sub { $_[1]->prefix( $no); $_[1]->print; } },
|
|
end_tag_handlers => { section => sub { pop @no; } },
|
|
twig_print_outside_roots => 1
|
|
);
|
|
$t->parsefile( $file);
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Using the <TT>"end_tag_handlers"</TT> argument without <TT>"twig_roots"</TT> will result in an
|
|
error.
|
|
<DT id="23">do_not_chain_handlers<DD>
|
|
|
|
|
|
If this option is set to a true value, then only one handler will be called for
each element, even if several satisfy the condition.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that the <TT>"_all_"</TT> handler will still be called regardless.
|
|
<DT id="24">ignore_elts<DD>
|
|
|
|
|
|
This option lets you ignore elements when building the twig. This is useful
|
|
in cases where you cannot use <TT>"twig_roots"</TT> to ignore elements, for example if
|
|
the element to ignore is a sibling of elements you are interested in.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Example:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $twig= XML::Twig->new( ignore_elts => { elt => 'discard' });
|
|
$twig->parsefile( 'doc.xml');
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This will build the complete twig for the document, except that all <TT>"elt"</TT>
|
|
elements (and their children) will be left out.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The keys in the hash are triggers, limited to the same subset as
|
|
<TT>"start_tag_handlers"</TT>. The values can be <TT>"discard"</TT>, to discard
|
|
the element, <TT>"print"</TT>, to output the element as-is, <TT>"string"</TT> to
|
|
store the text of the ignored element(s), including markup, in a field of
|
|
the twig: <TT>"$t->{twig_buffered_string}"</TT> or a reference to a scalar, in
|
|
which case the text of the ignored element(s), including markup, will be
|
|
stored in the scalar. Any other value will be treated as <TT>"discard"</TT>.
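<P>
A minimal sketch of the <TT>"discard"</TT> value, assuming a made-up document in which <TT>"note"</TT> elements are siblings of the <TT>"para"</TT> elements to keep:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $t= XML::Twig->new( ignore_elts => { note => 'discard' });
  $t->parse( '<doc><para>kept</para><note>dropped <b>too</b></note>'
           . '<para>also kept</para></doc>');

  # the note element and its b child never make it into the twig
  my $out= $t->root->sprint;
</PRE>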
|
|
<DT id="25">char_handler<DD>
|
|
|
|
|
|
A reference to a subroutine that will be called every time <TT>"PCDATA"</TT> is found.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The subroutine receives the string as argument, and returns the modified string:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
# we want all strings in upper case
|
|
sub my_char_handler
|
|
{ my( $text)= @_;
|
|
$text= uc( $text);
|
|
return $text;
|
|
}
|
|
|
|
</PRE>
|
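<P>
The handler is passed to the constructor. A minimal sketch, using an anonymous sub equivalent to <TT>"my_char_handler"</TT> above and a made-up inline document:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $t= XML::Twig->new( char_handler => sub { my( $text)= @_;
                                               return uc( $text);
                                             },
                       );
  $t->parse( '<doc><p>hello</p></doc>');
  my $text= $t->root->text;    # the text as stored in the twig
</PRE>
The text stored in the twig is the string returned by the handler, not the one found in the document.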
|
|
|
|
|
<DT id="26">elt_class<DD>
|
|
|
|
|
|
The name of a class used to store elements. This class should inherit from
<TT>"XML::Twig::Elt"</TT> (and by default it is <TT>"XML::Twig::Elt"</TT>). This option is used
to subclass the element class and extend it with new methods.
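<P>
A minimal sketch of such a subclass. The class name <TT>"My::Elt"</TT> and its <TT>"shout"</TT> method are hypothetical, made up for this example:
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  package My::Elt;                     # hypothetical subclass
  our @ISA= ('XML::Twig::Elt');
  sub shout                            # a method added to every element
    { my( $elt)= @_;
      return uc $elt->text;
    }

  package main;
  my $t= XML::Twig->new( elt_class => 'My::Elt');
  $t->parse( '<doc><p>hi</p></doc>');
  my $p= $t->root->first_child( 'p');
  my $class= ref $p;                   # class of the elements built by the parser
  my $loud=  $p->shout;                # the new method is available on them
</PRE>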
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This option is needed because during the parsing of the <FONT SIZE="-1">XML,</FONT> elements are created
|
|
by <TT>"XML::Twig"</TT>, without any control from the user code.
|
|
<DT id="27">keep_atts_order<DD>
|
|
|
|
|
|
Setting this option to a true value causes the attribute hash to be tied to
|
|
a <TT>"Tie::IxHash"</TT> object.
|
|
This means that <TT>"Tie::IxHash"</TT> needs to be installed for this option to be
|
|
available. It also means that the hash keeps its order, so you will get
|
|
the attributes in order. This allows outputting the attributes in the same
|
|
order as they were in the original document.
|
|
<DT id="28">keep_encoding<DD>
|
|
|
|
|
|
This is a (slightly?) evil option: if the <FONT SIZE="-1">XML</FONT> document is not <FONT SIZE="-1">UTF-8</FONT> encoded and
|
|
you want to keep it that way, then setting keep_encoding will use the <TT>"Expat"</TT>
|
|
original_string method for character data, thus keeping the original encoding, as
|
|
well as the original entities in the strings.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
See the <TT>"t/test6.t"</TT> test file to see what results you can expect from the
|
|
various encoding options.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B></B><FONT SIZE="-1"><B>WARNING</B></FONT><B></B>: if the original encoding is multi-byte then attribute parsing will
|
|
be <FONT SIZE="-1">EXTREMELY</FONT> unsafe under any Perl before 5.6, as it uses regular expressions
|
|
which do not deal properly with multi-byte characters. You can specify an
|
|
alternate function to parse the start tags with the <TT>"parse_start_tag"</TT> option
|
|
(see below).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B></B><FONT SIZE="-1"><B>WARNING</B></FONT><B></B>: this option is <FONT SIZE="-1">NOT</FONT> used when parsing with the non-blocking parser
|
|
(<TT>"parse_start"</TT>, <TT>"parse_more"</TT>, parse_done methods) which you probably should
|
|
not use with XML::Twig anyway as they are totally untested!
|
|
<DT id="29">output_encoding<DD>
|
|
|
|
|
|
This option generates an output_filter using <TT>"Encode"</TT>, <TT>"Text::Iconv"</TT> or
|
|
<TT>"Unicode::Map8"</TT> and <TT>"Unicode::String"</TT>, and sets the encoding in the <FONT SIZE="-1">XML</FONT>
|
|
declaration. This is the easiest way to deal with encodings; if you need
|
|
more sophisticated features, look at <TT>"output_filter"</TT> below.
|
|
<DT id="30">output_filter<DD>
|
|
|
|
|
|
This option is used to convert the character encoding of the output document.
|
|
It is passed either a string corresponding to a predefined filter or
|
|
a subroutine reference. The filter will be called every time a document or
|
|
element is processed by the ``print'' functions (<TT>"print"</TT>, <TT>"sprint"</TT>, <TT>"flush"</TT>).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Pre-defined filters:
|
|
<DL COMPACT><DT id="31"><DD>
|
|
<DL COMPACT>
|
|
<DT id="32">latin1<DD>
|
|
|
|
|
|
uses either <TT>"Encode"</TT>, <TT>"Text::Iconv"</TT> or <TT>"Unicode::Map8"</TT> and <TT>"Unicode::String"</TT>
|
|
or a regexp (which works only with XML::Parser 2.27), in this order, to convert
|
|
all characters to <FONT SIZE="-1">ISO-8859-15</FONT> (usually latin1 is synonym to <FONT SIZE="-1">ISO-8859-1,</FONT> but
|
|
in practice it seems that <FONT SIZE="-1">ISO-8859-15,</FONT> which includes the euro sign, is more
|
|
useful and probably what most people want).
|
|
<DT id="33">html<DD>
|
|
|
|
|
|
does the same conversion as <TT>"latin1"</TT>, plus encodes entities using
|
|
<TT>"HTML::Entities"</TT> (oddly enough you will need to have HTML::Entities installed
|
|
for it to be available). This should only be used if the tags and attribute
|
|
names themselves are in US-ASCII, or they will be converted and the output will
|
|
not be valid <FONT SIZE="-1">XML</FONT> any more
|
|
<DT id="34">safe<DD>
|
|
|
|
|
|
converts the output to <FONT SIZE="-1">ASCII</FONT> (<FONT SIZE="-1">US</FONT>) only plus <I>character entities</I> (<TT>"&#nnn;"</TT>)
|
|
this should be used only if the tags and attribute names themselves are in
|
|
US-ASCII, or they will be converted and the output will not be valid <FONT SIZE="-1">XML</FONT> any
|
|
more
|
|
<DT id="35">safe_hex<DD>
|
|
|
|
|
|
same as <TT>"safe"</TT> except that the character entities are in hex (<TT>"&#xnnn;"</TT>)
|
|
<DT id="36">encode_convert ($encoding)<DD>
|
|
|
|
|
|
Returns a subref that can be used to convert utf8 strings to <TT>$encoding</TT>.
|
|
Uses <TT>"Encode"</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $conv = XML::Twig::encode_convert( 'latin1');
|
|
my $t = XML::Twig->new(output_filter => $conv);
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="37">iconv_convert ($encoding)<DD>
|
|
|
|
|
|
this function is used to create a filter subroutine that will be used to
|
|
convert the characters to the target encoding using <TT>"Text::Iconv"</TT> (which needs
|
|
to be installed, look at the documentation for the module and for the
|
|
<TT>"iconv"</TT> library to find out which encodings are available on your system)
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $conv = XML::Twig::iconv_convert( 'latin1');
|
|
my $t = XML::Twig->new(output_filter => $conv);
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="38">unicode_convert ($encoding)<DD>
|
|
|
|
|
|
this function is used to create a filter subroutine that will be used to
|
|
convert the characters to the target encoding using  <TT>"Unicode::String"</TT>
|
|
and <TT>"Unicode::Map8"</TT> (which need to be installed, look at the documentation
|
|
for the modules to find out which encodings are available on your system)
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $conv = XML::Twig::unicode_convert( 'latin1');
|
|
my $t = XML::Twig->new(output_filter => $conv);
|
|
|
|
</PRE>
|
|
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="39"><DD>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The <TT>"text"</TT> and <TT>"att"</TT> methods do not use the filter, so their
|
|
results are always in Unicode.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Those predeclared filters are based on subroutines that can be used
|
|
by themselves (as <TT>"XML::Twig::foo"</TT>).
|
|
<DL COMPACT>
|
|
<DT id="40">html_encode ($string)<DD>
|
|
|
|
|
|
Use <TT>"HTML::Entities"</TT> to encode a utf8 string
|
|
<DT id="41">safe_encode ($string)<DD>
|
|
|
|
|
|
Use either a regexp (perl < 5.8) or <TT>"Encode"</TT> to encode non-ascii characters
|
|
in the string in <TT>"&#<nnnn>;"</TT> format
|
|
<DT id="42">safe_encode_hex ($string)<DD>
|
|
|
|
|
|
Use either a regexp (perl < 5.8) or <TT>"Encode"</TT> to encode non-ascii characters
|
|
in the string in <TT>"&#x<nnnn>;"</TT> format
|
|
<DT id="43">regexp2latin1 ($string)<DD>
|
|
|
|
|
|
Use a regexp to encode a utf8 string into latin 1 (<FONT SIZE="-1">ISO-8859-1</FONT>). Does not
|
|
work with Perl 5.8.0!
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="44"><DD>
|
|
</DL>
|
|
|
|
<DT id="45">output_text_filter<DD>
|
|
|
|
|
|
same as output_filter, except it doesn't apply to the brackets and quotes
|
|
around attribute values. This is useful for all filters that could change
|
|
the tagging, basically anything that does not just change the encoding of
|
|
the output. <TT>"html"</TT>, <TT>"safe"</TT> and <TT>"safe_hex"</TT> are better used with this option.
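<P>
For instance, a sketch using the predefined <TT>"safe"</TT> filter on the text only
(the sample document is an assumption; on recent perls the filter relies on
<TT>"Encode"</TT>):
<P>
<PRE>
  use XML::Twig;

  my $t= XML::Twig->new( output_text_filter => 'safe');
  $t->parse( '<doc att="v">caf&#233;</doc>');
  $t->print;      # non-ASCII text is output as numeric character references
  print "\n";
</PRE>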
|
|
<DT id="46">input_filter<DD>
|
|
|
|
|
|
This option is similar to <TT>"output_filter"</TT> except the filter is applied to
|
|
the characters before they are stored in the twig, at parsing time.
|
|
<DT id="47">remove_cdata<DD>
|
|
|
|
|
|
Setting this option to a true value will force the twig to output <FONT SIZE="-1">CDATA</FONT>
|
|
sections as regular (escaped) <FONT SIZE="-1">PCDATA</FONT>.
|
|
<DT id="48">parse_start_tag<DD>
|
|
|
|
|
|
If you use the <TT>"keep_encoding"</TT> option then this option can be used to replace
|
|
the default parsing function. You should provide a coderef (a reference to a
|
|
subroutine) as the argument, this subroutine takes the original tag (given
|
|
by XML::Parser::Expat <TT>"original_string()"</TT> method) and returns a tag and the
|
|
attributes in a hash (or in a list attribute_name/attribute value).
|
|
<DT id="49">expand_external_ents<DD>
|
|
|
|
|
|
When this option is used external entities (that are defined) are expanded
|
|
when the document is output using ``print'' functions such as <TT>"print"</TT>,
|
|
<TT>"sprint"</TT>, <TT>"flush"</TT> and <TT>"xml_string"</TT>.
|
|
Note that in the twig the entity will be stored as an element with a
|
|
tag '<TT>"#ENT"</TT>', the entity will not be expanded there, so you might want to
|
|
process the entities before outputting it.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If an external entity is not available, then the parse will fail.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
A special case is when the value of this option is -1. In that case a missing
|
|
entity will not cause the parser to die, but its <TT>"name"</TT>, <TT>"sysid"</TT> and <TT>"pubid"</TT>
|
|
will be stored in the twig as <TT>"$twig->{twig_missing_system_entities}"</TT>
|
|
(a reference to an array of hashes { name => <name>, sysid => <sysid>,
|
|
pubid => <pubid> }). Yes, this is a bit of a hack, but it's useful in some
|
|
cases.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B></B><FONT SIZE="-1"><B>WARNING</B></FONT><B></B>: setting expand_external_ents to 0 or -1 currently doesn't work
|
|
as expected; cf. <<A HREF="https://rt.cpan.org/Public/Bug/Display.html?id=118097">https://rt.cpan.org/Public/Bug/Display.html?id=118097</A>>.
|
|
To completely turn off expanding external entities use <TT>"no_xxe"</TT>.
|
|
<DT id="50">no_xxe<DD>
|
|
|
|
|
|
If this argument is set to a true value, expanding of external entities is
|
|
turned off.
|
|
<DT id="51">load_DTD<DD>
|
|
|
|
|
|
If this argument is set to a true value, <TT>"parse"</TT> or <TT>"parsefile"</TT> on the twig
|
|
will load the <FONT SIZE="-1">DTD</FONT> information. This information can then be accessed through
|
|
the twig, in a <TT>"DTD_handler"</TT> for example. This will load even an external <FONT SIZE="-1">DTD.</FONT>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Default and fixed values for attributes will also be filled, based on the <FONT SIZE="-1">DTD.</FONT>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that to do this the module will generate a temporary file in the current
|
|
directory. If this is a problem let me know and I will add an option to
|
|
specify an alternate directory.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
See ``<FONT SIZE="-1">DTD</FONT> Handling'' for more information
|
|
<DT id="52">DTD_base <path_to_DTD_directory><DD>
|
|
|
|
|
|
If the <FONT SIZE="-1">DTD</FONT> is in a different directory, XML::Twig looks for it there. This is useful to make up
|
|
somewhat for the lack of catalog support in <TT>"expat"</TT>. You still need a <FONT SIZE="-1">SYSTEM</FONT>
|
|
declaration
|
|
<DT id="53">DTD_handler<DD>
|
|
|
|
|
|
Set a handler that will be called once the doctype (and the <FONT SIZE="-1">DTD</FONT>) have been
|
|
loaded, with 2 arguments, the twig and the <FONT SIZE="-1">DTD.</FONT>
|
|
<DT id="54">no_prolog<DD>
|
|
|
|
|
|
Does not output a prolog (<FONT SIZE="-1">XML</FONT> declaration and <FONT SIZE="-1">DTD</FONT>)
|
|
<DT id="55">id<DD>
|
|
|
|
|
|
This optional argument gives the name of an attribute that can be used as
|
|
an <FONT SIZE="-1">ID</FONT> in the document. Elements whose <FONT SIZE="-1">ID</FONT> is known can be accessed through
|
|
the elt_id method. id defaults to 'id'.
|
|
See <TT>"BUGS"</TT>.
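<P>
A short sketch, assuming a document that uses a <TT>"name"</TT> attribute as its
identifier (the attribute and element names are invented for the example):
<P>
<PRE>
  use XML::Twig;

  my $t= XML::Twig->new( id => 'name');      # 'name' plays the role of the ID
  $t->parse( '<doc><item name="a1">first</item><item name="a2">second</item></doc>');
  my $item= $t->elt_id( 'a2');               # direct access, no tree walk
  print $item->text, "\n";
</PRE>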
|
|
<DT id="56">discard_spaces<DD>
|
|
|
|
|
|
If this optional argument is set to a true value then spaces are discarded
|
|
when they look non-significant: strings containing only spaces and at least
|
|
one line feed are discarded. This argument is set to true by default.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The exact algorithm to drop spaces is: strings including only spaces (perl \s)
|
|
and at least one \n right before an open or close tag are dropped.
|
|
<DT id="57">discard_all_spaces<DD>
|
|
|
|
|
|
If this argument is set to a true value, spaces are discarded more
|
|
aggressively than with <TT>"discard_spaces"</TT>: strings not including a \n are also
|
|
dropped. This option is appropriate for data-oriented <FONT SIZE="-1">XML.</FONT>
|
|
<DT id="58">keep_spaces<DD>
|
|
|
|
|
|
If this optional argument is set to a true value then all spaces in the
|
|
document are kept, and stored as <TT>"PCDATA"</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B>Warning</B>: adding this option can result in changes in the twig generated:
|
|
space that was previously discarded might end up in a new text element. See
|
|
the difference by calling the following code with 0 and 1 as arguments:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
perl -MXML::Twig -e'print XML::Twig->new( keep_spaces => shift)->parse( "<d> \n<e/></d>")->_dump'
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>"keep_spaces"</TT> and <TT>"discard_spaces"</TT> cannot both be set.
|
|
<DT id="59">discard_spaces_in<DD>
|
|
|
|
|
|
This argument sets <TT>"keep_spaces"</TT> to true but will cause the twig builder to
|
|
discard spaces in the elements listed.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The syntax for using this argument is:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
XML::Twig->new( discard_spaces_in => [ 'elt1', 'elt2']);
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="60">keep_spaces_in<DD>
|
|
|
|
|
|
This argument sets <TT>"discard_spaces"</TT> to true but will cause the twig builder to
|
|
keep spaces in the elements listed.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The syntax for using this argument is:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
XML::Twig->new( keep_spaces_in => [ 'elt1', 'elt2']);
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B>Warning</B>: adding this option can result in changes in the twig generated:
|
|
space that was previously discarded might end up in a new text element.
|
|
<DT id="61">pretty_print<DD>
|
|
|
|
|
|
Set the pretty print method, amongst '<TT>"none"</TT>' (default), '<TT>"nsgmls"</TT>',
|
|
'<TT>"nice"</TT>', '<TT>"indented"</TT>', '<TT>"indented_c"</TT>', '<TT>"indented_a"</TT>',
|
|
'<TT>"indented_close_tag"</TT>', '<TT>"cvs"</TT>', '<TT>"wrapped"</TT>', '<TT>"record"</TT>' and '<TT>"record_c"</TT>'
|
|
|
|
|
|
<P>
|
|
|
|
|
|
pretty_print formats:
|
|
<DL COMPACT><DT id="62"><DD>
|
|
<DL COMPACT>
|
|
<DT id="63">none<DD>
|
|
|
|
|
|
The document is output as one long string, with no line breaks except those
|
|
found within text elements
|
|
<DT id="64">nsgmls<DD>
|
|
|
|
|
|
Line breaks are inserted in safe places: that is within tags, between a tag
|
|
and an attribute, between attributes and before the > at the end of a tag.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is quite ugly but better than <TT>"none"</TT>, and it is very safe, the document
|
|
will still be valid (conforming to its <FONT SIZE="-1">DTD</FONT>).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is how the <FONT SIZE="-1">SGML</FONT> parser <TT>"sgmls"</TT> splits documents, hence the name.
|
|
<DT id="65">nice<DD>
|
|
|
|
|
|
This option inserts line breaks before any tag that does not contain text (so
|
|
elements with textual content are not broken, as the \n is significant).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B></B><FONT SIZE="-1"><B>WARNING</B></FONT><B></B>: this option leaves the document well-formed but might make it
|
|
invalid (not conformant to its <FONT SIZE="-1">DTD</FONT>). If you have elements declared as
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<!ELEMENT foo (#PCDATA|bar)>
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
then a <TT>"foo"</TT> element including a <TT>"bar"</TT> one will be printed as
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<foo>
|
|
<bar>bar is just pcdata</bar>
|
|
</foo>
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is invalid, as the parser will take the line break after the <TT>"foo"</TT> tag
|
|
as a sign that the element contains <FONT SIZE="-1">PCDATA,</FONT> it will then die when it finds the
|
|
<TT>"bar"</TT> tag. This may or may not be important for you, but be aware of it!
|
|
<DT id="66">indented<DD>
|
|
|
|
|
|
Same as <TT>"nice"</TT> (and with the same warning) but indents elements according to
|
|
their level
|
|
<DT id="67">indented_c<DD>
|
|
|
|
|
|
Same as <TT>"indented"</TT> but a little more compact: the closing tags are on the
|
|
same line as the preceding text
|
|
<DT id="68">indented_close_tag<DD>
|
|
|
|
|
|
Same as <TT>"indented"</TT> except that the closing tag is also indented, to line up
|
|
with the tags within the element
|
|
<DT id="69">indented_a<DD>
|
|
|
|
|
|
This formats <FONT SIZE="-1">XML</FONT> files in a line-oriented version control friendly way.
|
|
The format is described in <<A HREF="http://tinyurl.com/2kwscq">http://tinyurl.com/2kwscq</A>> (that's an Oracle
|
|
document with an insanely long <FONT SIZE="-1">URL</FONT>).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that to be totally conformant to the ``spec'', the order of attributes
|
|
should not be changed, so if they are not already in alphabetical order
|
|
you will need to use the <TT>"keep_atts_order"</TT> option.
|
|
<DT id="70">cvs<DD>
|
|
|
|
|
|
Same as <TT>"indented_a"</TT>.
|
|
<DT id="71">wrapped<DD>
|
|
|
|
|
|
Same as <TT>"indented_c"</TT> but lines are wrapped using Text::Wrap::wrap. The
|
|
default length for lines is the default for <TT>$Text::Wrap::columns</TT>, and can
|
|
be changed by changing that variable.
|
|
<DT id="72">record<DD>
|
|
|
|
|
|
This is a record-oriented pretty print that displays data in records, one field
|
|
per line (which looks a <FONT SIZE="-1">LOT</FONT> like <TT>"indented"</TT>)
|
|
<DT id="73">record_c<DD>
|
|
|
|
|
|
Stands for record compact, one record per line
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="74"><DD>
|
|
</DL>
|
|
|
|
<DT id="75">empty_tags<DD>
|
|
|
|
|
|
Set the empty tag display style ('<TT>"normal"</TT>', '<TT>"html"</TT>' or '<TT>"expand"</TT>').
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>"normal"</TT> outputs an empty tag '<TT>"<tag/>"</TT>', <TT>"html"</TT> adds a space
|
|
'<TT>"<tag />"</TT>' for elements that can be empty in <FONT SIZE="-1">XHTML</FONT> and <TT>"expand"</TT> outputs
|
|
'<TT>"<tag></tag>"</TT>'
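<P>
To compare the three styles, a quick sketch along these lines can be used (the
sample document is an assumption):
<P>
<PRE>
  use XML::Twig;

  my %output;
  foreach my $style ( qw( normal html expand))
    { my $t= XML::Twig->new( empty_tags => $style);
      $t->parse( '<doc><br/><e/></doc>');
      $output{$style}= $t->sprint;           # serialized with the chosen style
      print "$style: $output{$style}\n";
    }
</PRE>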
|
|
<DT id="76">quote<DD>
|
|
|
|
|
|
Set the quote character for attributes ('<TT>"single"</TT>' or '<TT>"double"</TT>').
|
|
<DT id="77">escape_gt<DD>
|
|
|
|
|
|
By default XML::Twig does not escape the character > in its output, as it is not
|
|
mandated by the <FONT SIZE="-1">XML</FONT> spec. With this option on, > will be replaced by <TT>"&gt;"</TT>
|
|
<DT id="78">comments<DD>
|
|
|
|
|
|
Set the way comments are processed: '<TT>"drop"</TT>' (default), '<TT>"keep"</TT>' or
|
|
'<TT>"process"</TT>'
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Comments processing options:
|
|
<DL COMPACT><DT id="79"><DD>
|
|
<DL COMPACT>
|
|
<DT id="80">drop<DD>
|
|
|
|
|
|
drops the comments, they are not read, nor printed to the output
|
|
<DT id="81">keep<DD>
|
|
|
|
|
|
comments are loaded and will appear on the output, they are not
|
|
accessible within the twig and will not interfere with processing
|
|
though
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B>Note</B>: comments in the middle of a text element such as
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
  <p>text <!-- comment --> more text</p>
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
are kept at their original position in the text. Using ``print''
|
|
methods like <TT>"print"</TT> or <TT>"sprint"</TT> will return the comments in the
|
|
text. Using <TT>"text"</TT> or <TT>"field"</TT> on the other hand will not.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Any use of <TT>"set_pcdata"</TT> on the <TT>"#PCDATA"</TT> element (directly or
|
|
through other methods like <TT>"set_content"</TT>) will delete the comment(s).
|
|
<DT id="82">process<DD>
|
|
|
|
|
|
comments are loaded in the twig and will be treated as regular elements
|
|
(their <TT>"tag"</TT> is <TT>"#COMMENT"</TT>). This can interfere with processing if you
|
|
expect <TT>"$elt->{first_child}"</TT> to be an element but find a comment there.
|
|
Validation will not protect you from this as comments can happen anywhere.
|
|
You can use <TT>"$elt->first_child( 'tag')"</TT> (which is a good habit anyway)
|
|
to get where you want.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Consider using <TT>"process"</TT> if you are outputting <FONT SIZE="-1">SAX</FONT> events from XML::Twig.
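<P>
The following sketch shows the pitfall described above and the safer call. The
document is made up for the example:
<P>
<PRE>
  use XML::Twig;

  my $t= XML::Twig->new( comments => 'process');
  $t->parse( '<doc><!-- note --><p>text</p></doc>');

  my $first= $t->root->first_child;          # the comment, not the p element
  my $p    = $t->root->first_child( 'p');    # asking for the tag skips the comment
  print $first->tag, " then ", $p->tag, "\n";
</PRE>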
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="83"><DD>
|
|
</DL>
|
|
|
|
<DT id="84">pi<DD>
|
|
|
|
|
|
Set the way processing instructions are processed: '<TT>"drop"</TT>', '<TT>"keep"</TT>'
|
|
(default) or '<TT>"process"</TT>'
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that you can also set <FONT SIZE="-1">PI</FONT> handlers in the <TT>"twig_handlers"</TT> option:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
'?' => \&handler
|
|
  '?target' => \&handler2
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The handlers will be called with 2 parameters, the twig and the <FONT SIZE="-1">PI</FONT> element if
|
|
<TT>"pi"</TT> is set to <TT>"process"</TT>, and with 3, the twig, the target and the data if
|
|
<TT>"pi"</TT> is set to <TT>"keep"</TT>. Of course they will not be called if <TT>"pi"</TT> is set to
|
|
<TT>"drop"</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If <TT>"pi"</TT> is set to <TT>"keep"</TT> the handler should return a string that will be used
|
|
as-is as the <FONT SIZE="-1">PI</FONT> text (it should look like <TT>"<?target data?>"</TT> or '' if you
|
|
want to remove the <FONT SIZE="-1">PI</FONT>).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Only one handler will be called, <TT>"?target"</TT> or <TT>"?"</TT> if no specific handler for
|
|
that target is available.
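<P>
As a sketch of the <TT>"process"</TT> case, the generic <TT>"?"</TT> handler below collects
the targets of all processing instructions. The target <TT>"app"</TT> and the
<TT>@targets</TT> array are assumptions made for the example:
<P>
<PRE>
  use XML::Twig;

  my @targets;
  my $t= XML::Twig->new(
      pi            => 'process',
      twig_handlers => { '?' => sub { my( $twig, $pi)= @_;      # 2 arguments in process mode
                                      push @targets, $pi->target;
                                    },
                       },
  );
  $t->parse( '<doc><?app do-something?><p>text</p></doc>');
  print "PI targets: @targets\n";
</PRE>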
|
|
<DT id="85">map_xmlns<DD>
|
|
|
|
|
|
This option is passed a hashref that maps uri's to prefixes. The prefixes in
|
|
the document will be replaced by the ones in the map. The mapped prefixes can
|
|
(actually have to) be used to trigger handlers, navigate or query the document.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Here is an example:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
  my $t= XML::Twig->new( map_xmlns => {'<A HREF="http://www.w3.org/2000/svg">http://www.w3.org/2000/svg</A>' => "svg"},
|
|
twig_handlers =>
|
|
{ 'svg:circle' => sub { $_->set_att( r => 20) } },
|
|
pretty_print => 'indented',
|
|
)
|
|
->parse( '<doc xmlns:gr="<A HREF="http://www.w3.org/2000/svg">http://www.w3.org/2000/svg</A>">
|
|
<gr:circle cx="10" cy="90" r="10"/>
|
|
</doc>'
|
|
)
|
|
->print;
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This will output:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<doc xmlns:svg="<A HREF="http://www.w3.org/2000/svg">http://www.w3.org/2000/svg</A>">
|
|
<svg:circle cx="10" cy="90" r="20"/>
|
|
</doc>
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="86">keep_original_prefix<DD>
|
|
|
|
|
|
When used with <TT>"map_xmlns"</TT> this option will make <TT>"XML::Twig"</TT> use the original
|
|
namespace prefixes when outputting a document. The mapped prefix will still be used
|
|
for triggering handlers and in navigation and query methods.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
  my $t= XML::Twig->new( map_xmlns => {'<A HREF="http://www.w3.org/2000/svg">http://www.w3.org/2000/svg</A>' => "svg"},
|
|
twig_handlers =>
|
|
{ 'svg:circle' => sub { $_->set_att( r => 20) } },
|
|
keep_original_prefix => 1,
|
|
pretty_print => 'indented',
|
|
)
|
|
->parse( '<doc xmlns:gr="<A HREF="http://www.w3.org/2000/svg">http://www.w3.org/2000/svg</A>">
|
|
<gr:circle cx="10" cy="90" r="10"/>
|
|
</doc>'
|
|
)
|
|
->print;
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This will output:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<doc xmlns:gr="<A HREF="http://www.w3.org/2000/svg">http://www.w3.org/2000/svg</A>">
|
|
<gr:circle cx="10" cy="90" r="20"/>
|
|
</doc>
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="87">original_uri ($prefix)<DD>
|
|
|
|
|
|
called within a handler, this will return the uri bound to the namespace prefix
|
|
in the original document.
|
|
<DT id="88">index ($arrayref or $hashref)<DD>
|
|
|
|
|
|
|
|
|
|
This option creates lists of specific elements during the parsing of the <FONT SIZE="-1">XML.</FONT>
|
|
It takes a reference to either a list of triggering expressions or to a hash
|
|
name => expression, and for each one generates the list of elements that
|
|
match the expression. The list can be accessed through the <TT>"index"</TT> method.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
example:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
# using an array ref
|
|
my $t= XML::Twig->new( index => [ 'div', 'table' ])
|
|
->parsefile( "foo.xml");
|
|
my $divs= $t->index( 'div');
|
|
my $first_div= $divs->[0];
|
|
my $last_table= $t->index( table => -1);
|
|
|
|
# using a hashref to name the indexes
|
|
my $t= XML::Twig->new( index => { email => 'a[@href=~/^ \s*mailto:/]'})
|
|
->parsefile( "foo.xml");
|
|
my $last_emails= $t->index( email => -1);
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that the index is not maintained after the parsing. If elements are
|
|
deleted, renamed or otherwise hurt during processing, the index is <FONT SIZE="-1">NOT</FONT> updated.
|
|
(changing the id element <FONT SIZE="-1">OTOH</FONT> will update the index)
|
|
<DT id="89">att_accessors <list of attribute names><DD>
|
|
|
|
|
|
creates methods that give direct access to attributes:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $t= XML::Twig->new( att_accessors => [ 'href', 'src'])
|
|
->parsefile( $file);
|
|
  my $first_src= $t->first_elt( 'img')->src; # same as ->att( 'src')
|
|
  $t->first_elt( 'img')->src( 'new_logo.png'); # changes the attribute value
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="90">elt_accessors<DD>
|
|
|
|
|
|
creates methods that give direct access to the first child element (in scalar context)
|
|
or the list of elements (in list context):
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The list of accessors to create can be given in 2 different ways: in an array,
|
|
or in a hash alias => expression
|
|
<P>
<PRE>
  my $t=  XML::Twig->new( elt_accessors => [ 'head'])
                  ->parsefile( $file);
  my $title_text= $t->root->head->field( 'title');
  # same as $title_text= $t->root->first_child( 'head')->field( 'title');
</PRE>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $t= XML::Twig->new( elt_accessors => { warnings => 'p[@class="warning"]', d2 => 'div[2]'}, )
|
|
->parsefile( $file);
|
|
my $body= $t->first_elt( 'body');
|
|
my @warnings= $body->warnings; # same as $body->children( 'p[@class="warning"]');
|
|
my $s2= $body->d2; # same as $body->first_child( 'div[2]')
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="91">field_accessors<DD>
|
|
|
|
|
|
creates methods that give direct access to the first child element text:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
  my $t=  XML::Twig->new( field_accessors => [ 'title'])
|
|
->parsefile( $file);
|
|
my $div_title_text= $t->first_elt( 'div')->title;
|
|
  # same as $div_title_text= $t->first_elt( 'div')->field( 'title');
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="92">use_tidy<DD>
|
|
|
|
|
|
set this option to use HTML::Tidy instead of HTML::TreeBuilder to convert
|
|
<FONT SIZE="-1">HTML</FONT> to <FONT SIZE="-1">XML</FONT>. <FONT SIZE="-1">HTML</FONT>, especially real (read ``crap'') <FONT SIZE="-1">HTML</FONT> found in the wild, is messy,
|
|
so depending on the data, one module or the other does a better job at
|
|
the conversion. Also, HTML::Tidy can be a bit difficult to install, so
|
|
XML::Twig offers both options. <FONT SIZE="-1">TIMTOWTDI</FONT>
|
|
<DT id="93">output_html_doctype<DD>
|
|
|
|
|
|
when using HTML::TreeBuilder to convert <FONT SIZE="-1">HTML,</FONT> this option causes the <FONT SIZE="-1">DOCTYPE</FONT>
|
|
declaration to be output, which may be important for some legacy browsers.
|
|
Without that option the <FONT SIZE="-1">DOCTYPE</FONT> definition is <FONT SIZE="-1">NOT</FONT> output. Also if the definition
|
|
is completely wrong (ie not easily parsable), it is not output either.
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="94"><DD>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B>Note</B>: I _HATE_ the Java-like name of arguments used by most <FONT SIZE="-1">XML</FONT> modules.
|
|
So in pure <FONT SIZE="-1">TIMTOWTDI</FONT> fashion all arguments can be written either as
|
|
<TT>"UglyJavaLikeName"</TT> or as <TT>"readable_perl_name"</TT>: <TT>"twig_print_outside_roots"</TT>
|
|
or <TT>"TwigPrintOutsideRoots"</TT> (or even <TT>"twigPrintOutsideRoots"</TT> {shudder}).
|
|
XML::Twig normalizes them before processing them.
|
|
</DL>
|
|
|
|
<DT id="95">parse ( $source)<DD>
|
|
|
|
|
|
|
|
|
|
The <TT>$source</TT> parameter should either be a string containing the whole <FONT SIZE="-1">XML</FONT>
|
|
document, or it should be an open <TT>"IO::Handle"</TT> (aka a filehandle).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
A die call is thrown if a parse error occurs. Otherwise it will return
|
|
the twig built by the parse. Use <TT>"safe_parse"</TT> if you want the parsing
|
|
to return even when an error occurs.
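<P>
Since <TT>"parse"</TT> dies on error, a caller that wants to recover has to trap the
exception itself. A minimal sketch (the malformed document is made up to
trigger the error):
<P>
<PRE>
  use XML::Twig;

  my $t= XML::Twig->new( pretty_print => 'indented');
  # the block returns 1 only if parse did not die
  my $ok= eval { $t->parse( '<doc><p>unclosed</doc>'); 1 };
  print "parse failed: $@" unless $ok;
</PRE>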
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If this method is called as a class method
|
|
(<TT>"XML::Twig->parse( $some_xml_or_html)"</TT>) then an XML::Twig object is
|
|
created, using the parameters except the last one (eg
|
|
<TT>"XML::Twig->parse( pretty_print => 'indented', $some_xml_or_html)"</TT>)
|
|
and <TT>"xparse"</TT> is called on it.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that when parsing a filehandle, the handle should <FONT SIZE="-1">NOT</FONT> be open with an
|
|
encoding (ie open with <TT>"open( my $in, '<', $filename)"</TT>). The file will be
|
|
parsed by <TT>"expat"</TT>, so specifying the encoding actually causes problems
|
|
for the parser (as in: it can crash it, see
|
|
<A HREF="https://rt.cpan.org/Ticket/Display.html?id=78877">https://rt.cpan.org/Ticket/Display.html?id=78877</A>). For parsing a file it
|
|
is actually recommended to use <TT>"parsefile"</TT> on the file name, instead of
|
|
<TT>"parse"</TT> on the open file.
|
|
<DT id="96">parsestring<DD>
|
|
|
|
|
|
This is just an alias for <TT>"parse"</TT> for backwards compatibility.
|
|
<DT id="97">parsefile (<FONT SIZE="-1">FILE</FONT> [, <FONT SIZE="-1">OPT</FONT> => <FONT SIZE="-1">OPT_VALUE</FONT> [...]])<DD>
|
|
|
|
|
|
Open <TT>"FILE"</TT> for reading, then call <TT>"parse"</TT> with the open handle. The file
|
|
is closed no matter how <TT>"parse"</TT> returns.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
A <TT>"die"</TT> call is thrown if a parse error occurs. Otherwise it will return
|
|
the twig built by the parse. Use <TT>"safe_parsefile"</TT> if you want the parsing
|
|
to return even when an error occurs.
|
|
<DT id="98">parsefile_inplace ( $file, $optional_extension)<DD>
|
|
|
|
|
|
|
|
|
|
Parse and update a file ``in place''. It does this by creating a temp file,
|
|
selecting it as the default for <B>print()</B> statements (and methods), then parsing
|
|
the input file. If the parsing is successful, then the temp file is
|
|
moved to replace the input file.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If an extension is given then the original file is backed-up (the rules for
|
|
the extension are the same as the rule for the -i option in perl).
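<P>
A sketch of an in-place update, under the assumption that the handlers print
their output with <TT>"flush"</TT> (which writes to the selected temp file). The
file name <TT>"inplace_demo.xml"</TT> and the element names are invented for the
example:
<P>
<PRE>
  use XML::Twig;

  # create a small sample file to work on
  my $file= 'inplace_demo.xml';
  open( my $out, '>', $file) or die "cannot create $file: $!";
  print {$out} '<doc><para>one</para><para>two</para></doc>';
  close $out;

  my $t= XML::Twig->new(
      twig_handlers => { para => sub { $_->set_tag( 'p');   # rename the element
                                       $_[0]->flush;        # output what was parsed so far
                                     },
                       },
  );
  $t->parsefile_inplace( $file, '.bak');   # the original should be kept with a .bak suffix
</PRE>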
|
|
<DT id="99">parsefile_html_inplace ( $file, $optional_extension)<DD>
|
|
|
|
|
|
|
|
|
|
Same as parsefile_inplace, except that it parses <FONT SIZE="-1">HTML</FONT> instead of <FONT SIZE="-1">XML</FONT>
|
|
<DT id="100">parseurl ($url $optional_user_agent)<DD>
|
|
|
|
|
|
|
|
|
|
Gets the data from <TT>$url</TT> and parses it. The data is piped to the parser in
|
|
chunks the size of the XML::Parser::Expat buffer, so memory consumption and
|
|
hopefully speed are optimal.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that <TT>"parseurl"</TT> forks a child process that calls <TT>"exit"</TT> once the data
|
|
has been retrieved, which can interfere with locks. If that's a problem, see
|
|
below:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
For most (read ``small'') <FONT SIZE="-1">XML</FONT> it is probably as efficient (and easier to debug)
|
|
to just <TT>"get"</TT> the <FONT SIZE="-1">XML</FONT> file and then parse it as a string.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
use XML::Twig;
|
|
use LWP::Simple;
|
|
my $twig= XML::Twig->new();
|
|
$twig->parse( LWP::Simple::get( $URL ));
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
or more simply to call <TT>"nparse"</TT>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
use XML::Twig;
|
|
my $twig= XML::Twig->nparse( $URL);
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the <TT>$optional_user_agent</TT> argument is passed to the method then it is used,
|
|
otherwise a new one is created.
|
|
<DT id="101">safe_parse ( <FONT SIZE="-1">SOURCE</FONT> [, <FONT SIZE="-1">OPT</FONT> => <FONT SIZE="-1">OPT_VALUE</FONT> [...]])<DD>
|
|
|
|
|
|
This method is similar to <TT>"parse"</TT> except that it wraps the parsing in an
|
|
<TT>"eval"</TT> block. It returns the twig on success and 0 on failure (the twig object
|
|
also contains the parsed twig). <TT>$@</TT> contains the error message on failure.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that the parsing still stops as soon as an error is detected, there is
|
|
no way to keep going after an error.
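<P>
A minimal sketch of the calling pattern described above. The two sample strings
are made up for the illustration, and the second one is deliberately not well-formed:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  # the second string has a mismatched tag, so its parse fails
  foreach my $xml ( '&lt;doc&gt;&lt;p&gt;ok&lt;/p&gt;&lt;/doc&gt;', '&lt;doc&gt;&lt;p&gt;broken&lt;/doc&gt;')
    { my $twig= XML::Twig-&gt;new();
      if( $twig-&gt;safe_parse( $xml))
        { print "parsed, root is ", $twig-&gt;root-&gt;tag, "\n"; }
      else
        { print "parse failed: $@\n"; }   # $@ holds the parser's message
    }
</PRE>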
|
|
<DT id="102">safe_parsefile (<FONT SIZE="-1">FILE</FONT> [, <FONT SIZE="-1">OPT</FONT> => <FONT SIZE="-1">OPT_VALUE</FONT> [...]])<DD>
|
|
|
|
|
|
This method is similar to <TT>"parsefile"</TT> except that it wraps the parsing in an
|
|
<TT>"eval"</TT> block. It returns the twig on success and 0 on failure (the twig object
|
|
also contains the parsed twig). <TT>$@</TT> contains the error message on failure.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that the parsing still stops as soon as an error is detected, there is
|
|
no way to keep going after an error.
|
|
<DT id="103">safe_parseurl ($url $optional_user_agent)<DD>
|
|
|
|
|
|
|
|
|
|
Same as <TT>"parseurl"</TT> except that it wraps the parsing in an <TT>"eval"</TT> block. It
|
|
returns the twig on success and 0 on failure (the twig object also contains
|
|
the parsed twig). <TT>$@</TT> contains the error message on failure.
|
|
<DT id="104">parse_html ($string_or_fh)<DD>
|
|
|
|
|
|
parse an <FONT SIZE="-1">HTML</FONT> string or file handle (by converting it to <FONT SIZE="-1">XML</FONT> using
|
|
HTML::TreeBuilder, which needs to be available).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This works nicely, but some information gets lost in the process:
|
|
newlines are removed, and (at least on the version I use), comments
|
|
get an extra <FONT SIZE="-1">CDATA</FONT> section inside ( <!-- foo --> becomes
|
|
<!-- <![CDATA[ foo ]]> -->
|
|
<DT id="105">parsefile_html ($file)<DD>
|
|
|
|
|
|
parse an <FONT SIZE="-1">HTML</FONT> file (by converting it to <FONT SIZE="-1">XML</FONT> using HTML::TreeBuilder, which
|
|
needs to be available, or HTML::Tidy if the <TT>"use_tidy"</TT> option was used).
|
|
The file is loaded completely in memory and converted to <FONT SIZE="-1">XML</FONT> before being parsed.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This method should be used with caution though: it doesn't know about the
file encoding, so it is usually better to use <TT>"parse_html"</TT>, which gives you
|
|
a chance to open the file with the proper encoding layer.
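<P>
As a rough sketch of that advice, the code below opens a file with an explicit
encoding layer and passes the filehandle to <TT>"parse_html"</TT>. The temporary file
and its content are only stand-ins for a real page, the <FONT SIZE="-1">UTF-8</FONT> encoding is an
assumption, and HTML::TreeBuilder needs to be installed:
<P>
<PRE>
  use strict;
  use warnings;
  use File::Temp qw( tempfile);
  use XML::Twig;

  # write a small UTF-8 encoded HTML file to work on
  my( $out, $file)= tempfile( SUFFIX =&gt; '.html', UNLINK =&gt; 1);
  binmode( $out, ':encoding(UTF-8)');
  print {$out} "&lt;html&gt;&lt;head&gt;&lt;title&gt;Caf\x{e9}&lt;/title&gt;&lt;/head&gt;&lt;body&gt;&lt;p&gt;text&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;";
  close $out;

  # re-open it with the proper encoding layer and parse from the filehandle
  open( my $in, '&lt;:encoding(UTF-8)', $file) or die "cannot open $file: $!";
  my $twig= XML::Twig-&gt;new();
  $twig-&gt;parse_html( $in);
  close $in;

  my $title= $twig-&gt;first_elt( 'title');
  binmode( STDOUT, ':encoding(UTF-8)');
  print "title: ", $title-&gt;text, "\n" if $title;
</PRE>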
|
|
<DT id="106">parseurl_html ($url $optional_user_agent)<DD>
|
|
|
|
|
|
|
|
|
|
parse an <FONT SIZE="-1">URL</FONT> as html the same way <TT>"parse_html"</TT> does
|
|
<DT id="107">safe_parseurl_html ($url $optional_user_agent)<DD>
|
|
|
|
|
|
|
|
|
|
Same as <TT>"parseurl_html"</TT> except that it wraps the parsing in an <TT>"eval"</TT>
|
|
block. It returns the twig on success and 0 on failure (the twig object also
|
|
contains the parsed twig). <TT>$@</TT> contains the error message on failure.
|
|
<DT id="108">safe_parsefile_html ($file $optional_user_agent)<DD>
|
|
|
|
|
|
|
|
|
|
Same as <TT>"parsefile_html"</TT> except that it wraps the parsing in an <TT>"eval"</TT>
|
|
block. It returns the twig on success and 0 on failure (the twig object also
|
|
contains the parsed twig). <TT>$@</TT> contains the error message on failure.
|
|
<DT id="109">safe_parse_html ($string_or_fh)<DD>
|
|
|
|
|
|
Same as <TT>"parse_html"</TT> except that it wraps the parsing in an <TT>"eval"</TT> block.
|
|
It returns the twig on success and 0 on failure (the twig object also contains
|
|
the parsed twig). <TT>$@</TT> contains the error message on failure.
|
|
<DT id="110">xparse ($thing_to_parse)<DD>
|
|
|
|
|
|
parse the <TT>$thing_to_parse</TT>, whether it is a filehandle, a string, an <FONT SIZE="-1">HTML</FONT>
|
|
file, an <FONT SIZE="-1">HTML URL,</FONT> an <FONT SIZE="-1">URL</FONT> or a file.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that this is mostly a convenience method for one-off scripts. For example
|
|
files that end in '.htm' or '.html' are parsed first as <FONT SIZE="-1">XML,</FONT> and if this fails
|
|
as <FONT SIZE="-1">HTML.</FONT> This is certainly not the most efficient way to do this in general.
|
|
<DT id="111">nparse ($optional_twig_options, $thing_to_parse)<DD>
|
|
|
|
|
|
|
|
|
|
Create a twig with the <TT>$optional_twig_options</TT>, and parse the <TT>$thing_to_parse</TT>,
|
|
whether it is a filehandle, a string, an <FONT SIZE="-1">HTML</FONT> file, an <FONT SIZE="-1">HTML URL,</FONT> an <FONT SIZE="-1">URL</FONT> or a
|
|
file.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Examples:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
XML::Twig->nparse( "file.xml");
|
|
XML::Twig->nparse( error_context => 1, "<A HREF="file://file.xml">file://file.xml</A>");
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="112">nparse_pp ($optional_twig_options, $thing_to_parse)<DD>
|
|
|
|
|
|
|
|
|
|
same as <TT>"nparse"</TT> but also sets the <TT>"pretty_print"</TT> option to <TT>"indented"</TT>.
|
|
<DT id="113">nparse_e ($optional_twig_options, $thing_to_parse)<DD>
|
|
|
|
|
|
|
|
|
|
same as <TT>"nparse"</TT> but also sets the <TT>"error_context"</TT> option to 1.
|
|
<DT id="114">nparse_ppe ($optional_twig_options, $thing_to_parse)<DD>
|
|
|
|
|
|
|
|
|
|
same as <TT>"nparse"</TT> but also sets the <TT>"pretty_print"</TT> option to <TT>"indented"</TT>
|
|
and the <TT>"error_context"</TT> option to 1.
|
|
<DT id="115">parser<DD>
|
|
|
|
|
|
This method returns the <TT>"expat"</TT> object (actually the XML::Parser::Expat object)
|
|
used during parsing. It is useful for example to call XML::Parser::Expat methods
|
|
on it. To get the line of a tag for example use <TT>"$t->parser->current_line"</TT>.
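<P>
A small sketch of that use, collecting the line number at which each start tag is
seen. The <TT>"item"</TT> element and the sample document are made up for the example:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my @lines;
  my $twig= XML::Twig-&gt;new(
    start_tag_handlers =&gt;
      { item =&gt; sub { my( $t, $item)= @_;
                      # ask the underlying expat object where the parse currently is
                      push @lines, $t-&gt;parser-&gt;current_line;
                    },
      },
  );
  $twig-&gt;parse( "&lt;list&gt;\n  &lt;item&gt;a&lt;/item&gt;\n  &lt;item&gt;b&lt;/item&gt;\n&lt;/list&gt;");
  print "item start tags seen at lines: @lines\n";
</PRE>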
|
|
<DT id="116">setTwigHandlers ($handlers)<DD>
|
|
|
|
|
|
Set the twig_handlers. <TT>$handlers</TT> is a reference to a hash similar to the
|
|
one in the <TT>"twig_handlers"</TT> option of new. All previous handlers are unset.
|
|
The method returns the reference to the previous handlers.
|
|
<DT id="117">setTwigHandler ($exp $handler)<DD>
|
|
|
|
|
|
|
|
|
|
Set a single twig_handler for elements matching <TT>$exp</TT>. <TT>$handler</TT> is a
|
|
reference to a subroutine. If the handler was previously set then the reference
|
|
to the previous handler is returned.
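<P>
For illustration, a sketch that replaces a handler set at creation time before
parsing starts. The tag names are arbitrary:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig-&gt;new(
    twig_handlers =&gt; { title =&gt; sub { $_-&gt;set_tag( 'h1') } },
  );

  # replace the handler for title; the call returns the handler it replaces
  my $previous= $twig-&gt;setTwigHandler( title =&gt; sub { $_-&gt;set_tag( 'h2') });

  $twig-&gt;parse( '&lt;doc&gt;&lt;title&gt;Intro&lt;/title&gt;&lt;/doc&gt;');
  print $twig-&gt;sprint, "\n";
</PRE>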
|
|
<DT id="118">setStartTagHandlers ($handlers)<DD>
|
|
|
|
|
|
Set the start_tag handlers. <TT>$handlers</TT> is a reference to a hash similar to the
|
|
one in the <TT>"start_tag_handlers"</TT> option of new. All previous handlers are unset.
|
|
The method returns the reference to the previous handlers.
|
|
<DT id="119">setStartTagHandler ($exp $handler)<DD>
|
|
|
|
|
|
|
|
|
|
Set a single start_tag handlers for elements matching <TT>$exp</TT>. <TT>$handler</TT> is a
|
|
reference to a subroutine. If the handler was previously set then the reference
|
|
to the previous handler is returned.
|
|
<DT id="120">setEndTagHandlers ($handlers)<DD>
|
|
|
|
|
|
Set the end_tag handlers. <TT>$handlers</TT> is a reference to a hash similar to the
|
|
one in the <TT>"end_tag_handlers"</TT> option of new. All previous handlers are unset.
|
|
The method returns the reference to the previous handlers.
|
|
<DT id="121">setEndTagHandler ($exp $handler)<DD>
|
|
|
|
|
|
|
|
|
|
Set a single end_tag handlers for elements matching <TT>$exp</TT>. <TT>$handler</TT> is a
|
|
reference to a subroutine. If the handler was previously set then the
|
|
reference to the previous handler is returned.
|
|
<DT id="122">setTwigRoots ($handlers)<DD>
|
|
|
|
|
|
Same as using the <TT>"twig_roots"</TT> option when creating the twig
|
|
<DT id="123">setCharHandler ($exp $handler)<DD>
|
|
|
|
|
|
|
|
|
|
Set a <TT>"char_handler"</TT>
|
|
<DT id="124">setIgnoreEltsHandler ($exp)<DD>
|
|
|
|
|
|
Set an <TT>"ignore_elt"</TT> handler (elements that match <TT>$exp</TT> will be ignored)
|
|
<DT id="125">setIgnoreEltsHandlers ($exp)<DD>
|
|
|
|
|
|
Set all <TT>"ignore_elt"</TT> handlers (previous handlers are replaced)
|
|
<DT id="126">dtd<DD>
|
|
|
|
|
|
Return the dtd (an XML::Twig::DTD object) of a twig
|
|
<DT id="127">xmldecl<DD>
|
|
|
|
|
|
Return the <FONT SIZE="-1">XML</FONT> declaration for the document, or a default one if it doesn't
|
|
have one
|
|
<DT id="128">doctype<DD>
|
|
|
|
|
|
Return the doctype for the document
|
|
<DT id="129">doctype_name<DD>
|
|
|
|
|
|
returns the doctype of the document from the doctype declaration
|
|
<DT id="130">system_id<DD>
|
|
|
|
|
|
returns the system value of the <FONT SIZE="-1">DTD</FONT> of the document from the doctype declaration
|
|
<DT id="131">public_id<DD>
|
|
|
|
|
|
returns the public doctype of the document from the doctype declaration
|
|
<DT id="132">internal_subset<DD>
|
|
|
|
|
|
returns the internal subset of the <FONT SIZE="-1">DTD</FONT>
|
|
<DT id="133">dtd_text<DD>
|
|
|
|
|
|
Return the <FONT SIZE="-1">DTD</FONT> text
|
|
<DT id="134">dtd_print<DD>
|
|
|
|
|
|
Print the <FONT SIZE="-1">DTD</FONT>
|
|
<DT id="135">model ($tag)<DD>
|
|
|
|
|
|
Return the model (in the <FONT SIZE="-1">DTD</FONT>) for the element <TT>$tag</TT>
|
|
<DT id="136">root<DD>
|
|
|
|
|
|
Return the root element of a twig
|
|
<DT id="137">set_root ($elt)<DD>
|
|
|
|
|
|
Set the root of a twig
|
|
<DT id="138">first_elt ($optional_condition)<DD>
|
|
|
|
|
|
Return the first element matching <TT>$optional_condition</TT> of a twig, if
|
|
no condition is given then the root is returned
|
|
<DT id="139">last_elt ($optional_condition)<DD>
|
|
|
|
|
|
Return the last element matching <TT>$optional_condition</TT> of a twig, if
|
|
no condition is given then the last element of the twig is returned
|
|
<DT id="140">elt_id ($id)<DD>
|
|
|
|
|
|
Return the element whose <TT>"id"</TT> attribute is <TT>$id</TT>
|
|
<DT id="141">getEltById<DD>
|
|
|
|
|
|
Same as <TT>"elt_id"</TT>
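<P>
A short sketch of these navigation methods on a made-up document (it relies on
the default <TT>"id"</TT> attribute being used for <FONT SIZE="-1">ID</FONT>s):
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig-&gt;new();
  $twig-&gt;parse( '&lt;doc&gt;&lt;p id="p1"&gt;one&lt;/p&gt;&lt;p id="p2"&gt;two&lt;/p&gt;&lt;/doc&gt;');

  my $root = $twig-&gt;root;               # the doc element
  my $first= $twig-&gt;first_elt( 'p');    # first p in document order
  my $last = $twig-&gt;last_elt( 'p');     # last p in document order
  my $p2   = $twig-&gt;elt_id( 'p2');      # looked up through its id attribute

  print join( ", ", $root-&gt;tag, $first-&gt;text, $last-&gt;text, $p2-&gt;text), "\n";
</PRE>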
|
|
<DT id="142">index ($index_name, $optional_index)<DD>
|
|
|
|
|
|
|
|
|
|
If the <TT>$optional_index</TT> argument is present, return the corresponding element
|
|
in the index (created using the <TT>"index"</TT> option for <TT>"XML::Twig-&gt;new"</TT>).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the argument is not present, return an arrayref to the index
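<P>
A sketch of both calling forms, assuming an index on a made-up <TT>"item"</TT> element
was requested when the twig was created:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig-&gt;new( index =&gt; [ 'item' ]);
  $twig-&gt;parse( '&lt;list&gt;&lt;item&gt;a&lt;/item&gt;&lt;item&gt;b&lt;/item&gt;&lt;item&gt;c&lt;/item&gt;&lt;/list&gt;');

  my $second= $twig-&gt;index( 'item', 1);   # positions are array indexes, so 1 is the second item
  my $all   = $twig-&gt;index( 'item');      # array reference to all indexed items

  print "second item: ", $second-&gt;text, "\n";
  print "number of items: ", scalar @$all, "\n";
</PRE>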
|
|
<DT id="143">normalize<DD>
|
|
|
|
|
|
merge together all consecutive pcdata elements in the document (if for example
|
|
you have turned some elements into pcdata using <TT>"erase"</TT>, this will give you
|
|
a ``clean'' document in which all text elements are as long as possible).
|
|
<DT id="144">encoding<DD>
|
|
|
|
|
|
This method returns the encoding of the <FONT SIZE="-1">XML</FONT> document, as defined by the
|
|
<TT>"encoding"</TT> attribute in the <FONT SIZE="-1">XML</FONT> declaration (ie it is <TT>"undef"</TT> if the attribute
|
|
is not defined)
|
|
<DT id="145">set_encoding<DD>
|
|
|
|
|
|
This method sets the value of the <TT>"encoding"</TT> attribute in the <FONT SIZE="-1">XML</FONT> declaration.
|
|
Note that if the document did not have a declaration it is generated (with
|
|
an <FONT SIZE="-1">XML</FONT> version of 1.0)
|
|
<DT id="146">xml_version<DD>
|
|
|
|
|
|
This method returns the <FONT SIZE="-1">XML</FONT> version, as defined by the <TT>"version"</TT> attribute in
|
|
the <FONT SIZE="-1">XML</FONT> declaration (ie it is <TT>"undef"</TT> if the attribute is not defined)
|
|
<DT id="147">set_xml_version<DD>
|
|
|
|
|
|
This method sets the value of the <TT>"version"</TT> attribute in the <FONT SIZE="-1">XML</FONT> declaration.
|
|
If the declaration did not exist it is created.
|
|
<DT id="148">standalone<DD>
|
|
|
|
|
|
This method returns the value of the <TT>"standalone"</TT> declaration for the document
|
|
<DT id="149">set_standalone<DD>
|
|
|
|
|
|
This method sets the value of the <TT>"standalone"</TT> attribute in the <FONT SIZE="-1">XML</FONT>
|
|
declaration. Note that if the document did not have a declaration it is
|
|
generated (with an <FONT SIZE="-1">XML</FONT> version of 1.0)
|
|
<DT id="150">set_output_encoding<DD>
|
|
|
|
|
|
Set the <TT>"encoding"</TT> ``attribute'' in the <FONT SIZE="-1">XML</FONT> declaration
|
|
<DT id="151">set_doctype ($name, $system, $public, $internal)<DD>
|
|
|
|
|
|
|
|
|
|
Set the doctype of the document. If an argument is <TT>"undef"</TT> (or not present)
then its former value is retained, if a false ('' or 0) value is passed then
the former value is deleted.
|
|
<DT id="152">entity_list<DD>
|
|
|
|
|
|
Return the entity list of a twig
|
|
<DT id="153">entity_names<DD>
|
|
|
|
|
|
Return the list of all defined entities
|
|
<DT id="154">entity ($entity_name)<DD>
|
|
|
|
|
|
Return the entity
|
|
<DT id="155">change_gi ($old_gi, $new_gi)<DD>
|
|
|
|
|
|
|
|
|
|
Performs a (very fast) global change. All elements <TT>$old_gi</TT> are now
|
|
<TT>$new_gi</TT>. This is a bit dangerous though and should be avoided if
|
|
possible, as the new tag might be ignored in subsequent processing.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
See <TT>"BUGS"</TT>
|
|
<DT id="156">flush ($optional_filehandle, %options)<DD>
|
|
|
|
|
|
|
|
|
|
Flushes a twig up to (and including) the current element, then deletes
|
|
all unnecessary elements from the tree that's kept in memory.
|
|
<TT>"flush"</TT> keeps track of which elements need to be open/closed, so if you
|
|
flush from handlers you don't have to worry about anything. Just keep
|
|
flushing the twig every time you're done with a sub-tree and it will
|
|
come out well-formed. After parsing is complete, don't forget to <TT>"flush"</TT>
|
|
one more time to print the end of the document.
|
|
The doctype and entity declarations are also printed.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>"flush"</TT> takes an optional filehandle as an argument.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If you use <TT>"flush"</TT> at any point during parsing, the document will be flushed
|
|
one last time at the end of the parsing, to the proper filehandle.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
options: use the <TT>"update_DTD"</TT> option if you have updated the (internal) <FONT SIZE="-1">DTD</FONT>
|
|
and/or the entity list and you want the updated <FONT SIZE="-1">DTD</FONT> to be output
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The <TT>"pretty_print"</TT> option sets the pretty printing of the document.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
Example: $t->flush( Update_DTD => 1);
|
|
$t->flush( $filehandle, pretty_print => 'indented');
|
|
$t->flush( \*FILE);
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="157">flush_up_to ($elt, $optional_filehandle, %options)<DD>
|
|
|
|
|
|
|
|
|
|
Flushes up to the <TT>$elt</TT> element. This allows you to keep part of the
|
|
tree in memory when you <TT>"flush"</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
options: see flush.
|
|
<DT id="158">purge<DD>
|
|
|
|
|
|
Does the same as a <TT>"flush"</TT> except it does not print the twig. It just deletes
|
|
all elements that have been completely parsed so far.
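<P>
A sketch of the usual pattern, for when you only need to extract information and
do not want any output. The <TT>"record"</TT> element is made up for the example:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $count= 0;
  my $twig= XML::Twig-&gt;new(
    twig_handlers =&gt;
      { record =&gt; sub { my( $t, $record)= @_;
                        $count++;      # use the record...
                        $t-&gt;purge;     # ...then free everything parsed so far
                      },
      },
  );
  $twig-&gt;parse( '&lt;log&gt;&lt;record/&gt;&lt;record/&gt;&lt;record/&gt;&lt;/log&gt;');
  print "$count records processed\n";
</PRE>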
|
|
<DT id="159">purge_up_to ($elt)<DD>
|
|
|
|
|
|
Purges up to the <TT>$elt</TT> element. This allows you to keep part of the tree in
|
|
memory when you <TT>"purge"</TT>.
|
|
<DT id="160">print ($optional_filehandle, %options)<DD>
|
|
|
|
|
|
|
|
|
|
Prints the whole document associated with the twig. To be used only <FONT SIZE="-1">AFTER</FONT> the
|
|
parse.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
options: see <TT>"flush"</TT>.
|
|
<DT id="161">print_to_file ($filename, %options)<DD>
|
|
|
|
|
|
|
|
|
|
Prints the whole document associated with the twig to file <TT>$filename</TT>.
|
|
To be used only <FONT SIZE="-1">AFTER</FONT> the parse.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
options: see <TT>"flush"</TT>.
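<P>
A minimal sketch. The output file lives in a temporary directory here only so that
the example cleans up after itself:
<P>
<PRE>
  use strict;
  use warnings;
  use File::Temp qw( tempdir);
  use XML::Twig;

  my $dir = tempdir( CLEANUP =&gt; 1);
  my $file= "$dir/out.xml";              # hypothetical output file

  my $twig= XML::Twig-&gt;new();
  $twig-&gt;parse( '&lt;doc&gt;&lt;p&gt;text&lt;/p&gt;&lt;/doc&gt;');

  # the parse is complete, so the whole document can be written out
  $twig-&gt;print_to_file( $file, pretty_print =&gt; 'indented');
  printf "%d bytes written to %s\n", ( -s $file), $file;
</PRE>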
|
|
<DT id="162">safe_print_to_file ($filename, %options)<DD>
|
|
|
|
|
|
|
|
|
|
Prints the whole document associated with the twig to file <TT>$filename</TT>.
|
|
This variant, which probably only works on *nix, prints to a temp file,
then moves the temp file to overwrite the original file.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is a bit safer when 2 processes can potentially write the same file:
only the last one will succeed, but the file won't be corrupted. I often
|
|
use this for cron jobs, so testing the code doesn't interfere with the
|
|
cron job running at the same time.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
options: see <TT>"flush"</TT>.
|
|
<DT id="163">sprint<DD>
|
|
|
|
|
|
Return the text of the whole document associated with the twig. To be used only
|
|
<FONT SIZE="-1">AFTER</FONT> the parse.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
options: see <TT>"flush"</TT>.
|
|
<DT id="164">trim<DD>
|
|
|
|
|
|
Trim the document: gets rid of initial and trailing spaces, and replaces multiple spaces
|
|
by a single one.
|
|
<DT id="165">toSAX1 ($handler)<DD>
|
|
|
|
|
|
Send <FONT SIZE="-1">SAX</FONT> events for the twig to the <FONT SIZE="-1">SAX1</FONT> handler <TT>$handler</TT>
|
|
<DT id="166">toSAX2 ($handler)<DD>
|
|
|
|
|
|
Send <FONT SIZE="-1">SAX</FONT> events for the twig to the <FONT SIZE="-1">SAX2</FONT> handler <TT>$handler</TT>
|
|
<DT id="167">flush_toSAX1 ($handler)<DD>
|
|
|
|
|
|
Same as flush, except that <FONT SIZE="-1">SAX</FONT> events are sent to the <FONT SIZE="-1">SAX1</FONT> handler
|
|
<TT>$handler</TT> instead of the twig being printed
|
|
<DT id="168">flush_toSAX2 ($handler)<DD>
|
|
|
|
|
|
Same as flush, except that <FONT SIZE="-1">SAX</FONT> events are sent to the <FONT SIZE="-1">SAX2</FONT> handler
|
|
<TT>$handler</TT> instead of the twig being printed
|
|
<DT id="169">ignore<DD>
|
|
|
|
|
|
This method should be called during parsing, usually in <TT>"start_tag_handlers"</TT>.
|
|
It causes the element to be skipped during the parsing: the twig is not built
|
|
for this element, it will not be accessible during parsing or after it. The
|
|
element will not take up any memory and parsing will be faster.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that this method can also be called on an element. If the element is a
|
|
parent of the current element then this element will be ignored (the twig will
|
|
not be built any more for it and what has already been built will be deleted).
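<P>
A sketch of the first form, called from a start tag handler. The element names
are made up:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig-&gt;new(
    # skip private elements (and their content) as soon as the start tag is seen
    start_tag_handlers =&gt; { private =&gt; sub { $_[0]-&gt;ignore } },
  );
  $twig-&gt;parse( '&lt;doc&gt;&lt;public&gt;keep&lt;/public&gt;&lt;private&gt;&lt;secret&gt;drop&lt;/secret&gt;&lt;/private&gt;&lt;/doc&gt;');
  print $twig-&gt;sprint, "\n";
</PRE>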
|
|
<DT id="170">set_pretty_print ($style)<DD>
|
|
|
|
|
|
Set the pretty print method, amongst '<TT>"none"</TT>' (default), '<TT>"nsgmls"</TT>',
|
|
'<TT>"nice"</TT>', '<TT>"indented"</TT>', '<TT>"indented_c"</TT>', '<TT>"wrapped"</TT>', '<TT>"record"</TT>' and
|
|
'<TT>"record_c"</TT>'
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B></B><FONT SIZE="-1"><B>WARNING:</B></FONT><B></B> the pretty print style is a <B></B><FONT SIZE="-1"><B>GLOBAL</B></FONT><B></B> variable, so once set it's
|
|
applied to <B></B><FONT SIZE="-1"><B>ALL</B></FONT><B></B> <TT>"print"</TT>'s (and <TT>"sprint"</TT>'s). Same goes if you use XML::Twig
|
|
with <TT>"mod_perl"</TT> . This should not be a problem as the <FONT SIZE="-1">XML</FONT> that's generated
|
|
is valid anyway, and <FONT SIZE="-1">XML</FONT> processors (as well as <FONT SIZE="-1">HTML</FONT> processors, including
|
|
browsers) should not care. Let me know if this is a big problem, but at the
|
|
moment the performance/cleanliness trade-off clearly favors the global
|
|
approach.
|
|
<DT id="171">set_empty_tag_style ($style)<DD>
|
|
|
|
|
|
Set the empty tag display style ('<TT>"normal"</TT>', '<TT>"html"</TT>' or '<TT>"expand"</TT>'). As
|
|
with <TT>"set_pretty_print"</TT> this sets a global flag.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>"normal"</TT> outputs an empty tag '<TT>"<tag/>"</TT>', <TT>"html"</TT> adds a space
|
|
'<TT>"<tag />"</TT>' for elements that can be empty in <FONT SIZE="-1">XHTML</FONT> and <TT>"expand"</TT> outputs
|
|
'<TT>"<tag></tag>"</TT>'
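<P>
A sketch that prints the same small document in each style. As the flag is
global, the example sets it back to <TT>"normal"</TT> at the end:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig-&gt;new();
  $twig-&gt;parse( '&lt;doc&gt;&lt;br/&gt;&lt;/doc&gt;');

  my %output;
  foreach my $style ( qw( normal html expand))
    { $twig-&gt;set_empty_tag_style( $style);
      $output{$style}= $twig-&gt;sprint;
      print "$style: $output{$style}\n";
    }
  $twig-&gt;set_empty_tag_style( 'normal');   # restore the global flag
</PRE>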
|
|
<DT id="172">set_remove_cdata ($flag)<DD>
|
|
|
|
|
|
set (or unset) the flag that forces the twig to output <FONT SIZE="-1">CDATA</FONT> sections as
|
|
regular (escaped) <FONT SIZE="-1">PCDATA</FONT>
|
|
<DT id="173">print_prolog ($optional_filehandle, %options)<DD>
|
|
|
|
|
|
|
|
|
|
Prints the prolog (<FONT SIZE="-1">XML</FONT> declaration + <FONT SIZE="-1">DTD +</FONT> entity declarations) of a document.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
options: see <TT>"flush"</TT>.
|
|
<DT id="174">prolog ($optional_filehandle, %options)<DD>
|
|
|
|
|
|
|
|
|
|
Return the prolog (<FONT SIZE="-1">XML</FONT> declaration + <FONT SIZE="-1">DTD +</FONT> entity declarations) of a document.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
options: see <TT>"flush"</TT>.
|
|
<DT id="175">finish<DD>
|
|
|
|
|
|
Call Expat <TT>"finish"</TT> method.
|
|
Unsets all handlers (including internal ones that set context), but expat
|
|
continues parsing to the end of the document or until it finds an error.
|
|
It should finish up a lot faster than with the handlers set.
|
|
<DT id="176">finish_print<DD>
|
|
|
|
|
|
Stops twig processing, flushes the twig and proceeds to finish printing the
|
|
document as fast as possible. Use this method when modifying a document and
|
|
the modification is done.
|
|
<DT id="177">finish_now<DD>
|
|
|
|
|
|
Stops twig processing, does not finish parsing the document (which could
|
|
actually be not well-formed after the point where <TT>"finish_now"</TT> is called).
|
|
Execution resumes after the <TT>"parse"</TT> or <TT>"parsefile"</TT> call. The content
|
|
of the twig is what has been parsed so far (all open elements at the time
|
|
<TT>"finish_now"</TT> is called are considered closed).
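<P>
A sketch of a typical use: stop as soon as the first piece of interesting data
has been found. The <TT>"item"</TT> element is made up for the example:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $first;
  my $twig= XML::Twig-&gt;new(
    twig_handlers =&gt;
      { item =&gt; sub { my( $t, $item)= @_;
                      $first= $item-&gt;text;
                      $t-&gt;finish_now;    # the rest of the document is not parsed
                    },
      },
  );
  $twig-&gt;parse( '&lt;list&gt;&lt;item&gt;a&lt;/item&gt;&lt;item&gt;b&lt;/item&gt;&lt;/list&gt;');

  # execution resumes here, right after the call to parse
  print "first item: $first\n";
</PRE>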
|
|
<DT id="178">set_expand_external_entities<DD>
|
|
|
|
|
|
Same as using the <TT>"expand_external_ents"</TT> option when creating the twig
|
|
<DT id="179">set_input_filter<DD>
|
|
|
|
|
|
Same as using the <TT>"input_filter"</TT> option when creating the twig
|
|
<DT id="180">set_keep_atts_order<DD>
|
|
|
|
|
|
Same as using the <TT>"keep_atts_order"</TT> option when creating the twig
|
|
<DT id="181">set_keep_encoding<DD>
|
|
|
|
|
|
Same as using the <TT>"keep_encoding"</TT> option when creating the twig
|
|
<DT id="182">escape_gt<DD>
|
|
|
|
|
|
Usually XML::Twig does not escape &gt; in its output. Calling this method
makes it replace &gt; by &amp;gt;
|
|
<DT id="183">do_not_escape_gt<DD>
|
|
|
|
|
|
reverts XML::Twig behavior to its default of not escaping > in its output.
|
|
<DT id="184">set_output_filter<DD>
|
|
|
|
|
|
Same as using the <TT>"output_filter"</TT> option when creating the twig
|
|
<DT id="185">set_output_text_filter<DD>
|
|
|
|
|
|
Same as using the <TT>"output_text_filter"</TT> option when creating the twig
|
|
<DT id="186">add_stylesheet ($type, @options)<DD>
|
|
|
|
|
|
|
|
|
|
Adds an external stylesheet to an <FONT SIZE="-1">XML</FONT> document.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Supported types and options:
|
|
<DL COMPACT><DT id="187"><DD>
|
|
<DL COMPACT>
|
|
<DT id="188">xsl<DD>
|
|
|
|
|
|
option: the url of the stylesheet
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Example:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$t->add_stylesheet( xsl => "xsl_style.xsl");
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
will generate the following <FONT SIZE="-1">PI</FONT> at the beginning of the document:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<?xml-stylesheet type="text/xsl" href="xsl_style.xsl"?>
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="189">css<DD>
|
|
|
|
|
|
option: the url of the stylesheet
|
|
<DT id="190">active_twig<DD>
|
|
|
|
|
|
a class method that returns the last processed twig, so you don't necessarily
|
|
need the object to call methods on it.
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="191"><DD>
|
|
</DL>
|
|
|
|
<DT id="192">Methods inherited from XML::Parser::Expat<DD>
|
|
|
|
|
|
A twig inherits all the relevant methods from XML::Parser::Expat. These
|
|
methods can only be used during the parsing phase (they will generate
|
|
a fatal error otherwise).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Inherited methods are:
|
|
<DL COMPACT><DT id="193"><DD>
|
|
<DL COMPACT>
|
|
<DT id="194">depth<DD>
|
|
|
|
|
|
Returns the size of the context list.
|
|
<DT id="195">in_element<DD>
|
|
|
|
|
|
Returns true if <FONT SIZE="-1">NAME</FONT> is equal to the name of the innermost
|
|
currently opened element. If namespace processing is being used and
|
|
you want to check against a name that may be in a namespace, then
|
|
use the generate_ns_name method to create the <FONT SIZE="-1">NAME</FONT> argument.
|
|
<DT id="196">within_element<DD>
|
|
|
|
|
|
Returns the number of times the given name appears in the context
|
|
list. If namespace processing is being used and you want to check
|
|
against a name that may be in a namespace, then use the
|
|
generate_ns_name method to create the <FONT SIZE="-1">NAME</FONT> argument.
|
|
<DT id="197">context<DD>
|
|
|
|
|
|
Returns a list of element names that represent open elements, with
|
|
the last one being the innermost. Inside start and end tag
|
|
handlers, this will be the tag of the parent element.
|
|
<DT id="198">current_line<DD>
|
|
|
|
|
|
Returns the line number of the current position of the parse.
|
|
<DT id="199">current_column<DD>
|
|
|
|
|
|
Returns the column number of the current position of the parse.
|
|
<DT id="200">current_byte<DD>
|
|
|
|
|
|
Returns the current position of the parse.
|
|
<DT id="201">position_in_context<DD>
|
|
|
|
|
|
Returns a string that shows the current parse position. <FONT SIZE="-1">LINES</FONT>
|
|
should be an integer >= 0 that represents the number of lines on
|
|
either side of the current parse line to place into the returned
|
|
string.
|
|
<DT id="202">base ([<FONT SIZE="-1">NEWBASE</FONT>])<DD>
|
|
|
|
|
|
Returns the current value of the base for resolving relative URIs.
|
|
If <FONT SIZE="-1">NEWBASE</FONT> is supplied, changes the base to that value.
|
|
<DT id="203">current_element<DD>
|
|
|
|
|
|
Returns the name of the innermost currently opened element. Inside
|
|
start or end handlers, returns the parent of the element associated
|
|
with those tags.
|
|
<DT id="204">element_index<DD>
|
|
|
|
|
|
Returns an integer that is the depth-first visit order of the
|
|
current element. This will be zero outside of the root element. For
|
|
example, this will return 1 when called from the start handler for
|
|
the root element start tag.
|
|
<DT id="205">recognized_string<DD>
|
|
|
|
|
|
Returns the string from the document that was recognized in order
|
|
to call the current handler. For instance, when called from a start
|
|
handler, it will give us the start-tag string. The string is
|
|
encoded in <FONT SIZE="-1">UTF-8.</FONT> This method doesn't return a meaningful string
|
|
inside declaration handlers.
|
|
<DT id="206">original_string<DD>
|
|
|
|
|
|
Returns the verbatim string from the document that was recognized
|
|
in order to call the current handler. The string is in the original
|
|
document encoding. This method doesn't return a meaningful string
|
|
inside declaration handlers.
|
|
<DT id="207">xpcroak<DD>
|
|
|
|
|
|
Concatenate onto the given message the current line number within
|
|
the <FONT SIZE="-1">XML</FONT> document plus the message implied by ErrorContext. Then
|
|
croak with the formed message.
|
|
<DT id="208">xpcarp<DD>
|
|
|
|
|
|
Concatenate onto the given message the current line number within
|
|
the <FONT SIZE="-1">XML</FONT> document plus the message implied by ErrorContext. Then
|
|
carp with the formed message.
|
|
<DT id="209">xml_escape(<FONT SIZE="-1">TEXT</FONT> [, <FONT SIZE="-1">CHAR</FONT> [, <FONT SIZE="-1">CHAR ...</FONT>]])<DD>
|
|
|
|
|
|
Returns <FONT SIZE="-1">TEXT</FONT> with markup characters turned into character entities.
|
|
Any additional characters provided as arguments are also turned
|
|
into character references where found in <FONT SIZE="-1">TEXT.</FONT>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
(this method is broken on some versions of expat/XML::Parser)
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="210"><DD>
|
|
</DL>
|
|
|
|
<DT id="211">path ( $optional_tag)<DD>
|
|
|
|
|
|
|
|
|
|
Return the element context in a form similar to XPath's short
|
|
form: '<TT>"/root/tag1/../tag"</TT>'
|
|
<DT id="212">get_xpath ( $optional_array_ref, $xpath, $optional_offset)<DD>
|
|
|
|
|
|
|
|
|
|
Performs a <TT>"get_xpath"</TT> on the document root (see <TT>"XML::Twig::Elt"</TT>)
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the <TT>$optional_array_ref</TT> argument is used the array must contain
|
|
elements. The <TT>$xpath</TT> expression is applied to each element in turn
|
|
and the result is the union of all results. This way a first query can be
|
|
refined in further steps.
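<P>
A sketch of such a two-step query on a made-up document: a first query selects
the sections, a second one is applied to each of them:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig-&gt;new();
  $twig-&gt;parse( '&lt;doc&gt;&lt;sec&gt;&lt;p class="note"&gt;n1&lt;/p&gt;&lt;p&gt;x&lt;/p&gt;&lt;/sec&gt;'
              . '&lt;sec&gt;&lt;p class="note"&gt;n2&lt;/p&gt;&lt;/sec&gt;&lt;/doc&gt;');

  # one step: the query is applied to the document root
  my @notes= $twig-&gt;get_xpath( '//p[@class="note"]');

  # two steps: select the sections, then query each section in turn
  my @secs   = $twig-&gt;get_xpath( '/doc/sec');
  my @refined= $twig-&gt;get_xpath( \@secs, 'p[@class="note"]');

  print scalar @notes, " notes, ", scalar @refined, " found through the sections\n";
</PRE>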
|
|
<DT id="213">find_nodes ( $optional_array_ref, $xpath, $optional_offset)<DD>
|
|
|
|
|
|
|
|
|
|
same as <TT>"get_xpath"</TT>
|
|
<DT id="214">findnodes ( $optional_array_ref, $xpath, $optional_offset)<DD>
|
|
|
|
|
|
|
|
|
|
same as <TT>"get_xpath"</TT> (similar to the XML::LibXML method)
|
|
<DT id="215">findvalue ( $optional_array_ref, $xpath, $optional_offset)<DD>
|
|
|
|
|
|
|
|
|
|
Return the <TT>"join"</TT> of all texts of the results of applying <TT>"get_xpath"</TT>
|
|
to the node (similar to the XML::LibXML method)
|
|
<DT id="216">findvalues ( $optional_array_ref, $xpath, $optional_offset)<DD>
|
|
|
|
|
|
|
|
|
|
Return an array of all texts of the results of applying <TT>"get_xpath"</TT>
|
|
to the node
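<P>
A sketch contrasting the two methods on a made-up document:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig-&gt;new();
  $twig-&gt;parse( '&lt;doc&gt;&lt;name&gt;Ann&lt;/name&gt;&lt;name&gt;Bob&lt;/name&gt;&lt;/doc&gt;');

  my $joined= $twig-&gt;findvalue( '//name');    # one string, the texts joined together
  my @names = $twig-&gt;findvalues( '//name');   # one entry per matching element

  print "joined: $joined\n";
  print "separate: ", join( ', ', @names), "\n";
</PRE>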
|
|
<DT id="217">subs_text ($regexp, $replace)<DD>
|
|
|
|
|
|
|
|
|
|
subs_text does text substitution on the whole document, similar to perl's
|
|
<TT>" s///"</TT> operator.
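<P>
A minimal sketch, assuming the regular expression is passed as a <TT>"qr"</TT> object
and the replacement as a plain string. The document is made up:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig-&gt;new();
  $twig-&gt;parse( '&lt;doc&gt;&lt;p&gt;see foo and foo&lt;/p&gt;&lt;/doc&gt;');

  # replace foo by bar in the text of the whole document
  $twig-&gt;subs_text( qr/foo/, 'bar');
  print $twig-&gt;sprint, "\n";
</PRE>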
|
|
<DT id="218">dispose<DD>
|
|
|
|
|
|
Useful only if you don't have <TT>"Scalar::Util"</TT> or <TT>"WeakRef"</TT> installed.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Reclaims properly the memory used by an XML::Twig object. As the object has
|
|
circular references it never goes out of scope, so if you want to parse lots
|
|
of <FONT SIZE="-1">XML</FONT> documents then the memory leak becomes a problem. Use
|
|
<TT>"$twig->dispose"</TT> to clear this problem.
|
|
<DT id="219">att_accessors (list_of_attribute_names)<DD>
|
|
|
|
|
|
A convenience method that creates l-valued accessors for attributes.
|
|
So <TT>"$twig->create_accessors( 'foo')"</TT> will create a <TT>"foo"</TT> method
|
|
that can be called on elements:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt->foo; # equivalent to $elt->{'att'}->{'foo'};
|
|
$elt->foo( 'bar'); # equivalent to $elt->set_att( foo => 'bar');
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The methods are l-valued only under those perl's that support this
|
|
feature (5.6 and above)
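<P>
A short sketch using a made-up <TT>"href"</TT> attribute. The accessor name must not
clash with an existing XML::Twig::Elt method:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig-&gt;new();
  $twig-&gt;att_accessors( 'href');         # creates the href method for elements
  $twig-&gt;parse( '&lt;doc&gt;&lt;a href="old.html"&gt;link&lt;/a&gt;&lt;/doc&gt;');

  my $link= $twig-&gt;first_elt( 'a');
  my $old = $link-&gt;href;                 # read the attribute
  $link-&gt;href( 'new.html');              # set the attribute
  print "href changed from $old to ", $link-&gt;att( 'href'), "\n";
</PRE>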
|
|
<DT id="220">create_accessors (list_of_attribute_names)<DD>
|
|
|
|
|
|
Same as att_accessors
|
|
<DT id="221">elt_accessors (list_of_attribute_names)<DD>
|
|
|
|
|
|
A convenience method that creates accessors for elements.
|
|
So <TT>"$twig-&gt;elt_accessors( 'foo')"</TT> will create a <TT>"foo"</TT> method
|
|
that can be called on elements:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt->foo; # equivalent to $elt->first_child( 'foo');
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="222">field_accessors (list_of_attribute_names)<DD>
|
|
|
|
|
|
A convenience method that creates accessors for element values (<TT>"field"</TT>).
|
|
So <TT>"$twig-&gt;field_accessors( 'foo')"</TT> will create a <TT>"foo"</TT> method
|
|
that can be called on elements:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt->foo; # equivalent to $elt->field( 'foo');
|
|
|
|
</PRE>
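<P>
A sketch showing element and field accessors side by side. The <TT>"author"</TT> and
<TT>"price"</TT> elements are made up for the example:
<P>
<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig-&gt;new();
  $twig-&gt;elt_accessors( 'author');       # author returns the child element
  $twig-&gt;field_accessors( 'price');      # price returns the text of the child element
  $twig-&gt;parse( '&lt;book&gt;&lt;author&gt;A. Writer&lt;/author&gt;&lt;price&gt;12&lt;/price&gt;&lt;/book&gt;');

  my $book  = $twig-&gt;root;
  my $author= $book-&gt;author;             # an XML::Twig::Elt object
  my $price = $book-&gt;price;              # a string
  print $author-&gt;text, " costs $price\n";
</PRE>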
|
|
|
|
|
|
<DT id="223">set_do_not_escape_amp_in_atts<DD>
|
|
|
|
|
|
An evil method, that I only document because Test::Pod::Coverage complains otherwise,
|
|
but really, you don't want to know about it.
|
|
</DL>
|
|
<A NAME="lbAU"> </A>
|
|
<H3>XML::Twig::Elt</H3>
|
|
|
|
|
|
|
|
<DL COMPACT>
|
|
<DT id="224">new ($optional_tag, $optional_atts, @optional_content)<DD>
|
|
|
|
|
|
|
|
|
|
The <TT>"tag"</TT> is optional (but then you can't have any content), the <TT>$optional_atts</TT>
argument is a reference to a hash of attributes, the content can be just a
string or a list of strings and elements. A content of '<TT>"#EMPTY"</TT>' creates an empty
element.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
Examples: my $elt= XML::Twig::Elt->new();
|
|
my $elt= XML::Twig::Elt->new( para => { align => 'center' });
|
|
my $elt= XML::Twig::Elt->new( para => { align => 'center' }, 'foo');
|
|
my $elt= XML::Twig::Elt->new( br => '#EMPTY');
|
|
my $elt= XML::Twig::Elt->new( 'para');
|
|
my $elt= XML::Twig::Elt->new( para => 'this is a para');
|
|
my $elt= XML::Twig::Elt->new( para => $elt3, 'another para');
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The strings are not parsed, the element is not attached to any twig.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<FONT SIZE="-1"><B>WARNING</B></FONT>: if you rely on <FONT SIZE="-1">ID</FONT>s, you will have to set the id yourself. At
this point the element does not belong to a twig yet, so the <FONT SIZE="-1">ID</FONT> attribute
is not known and the element is not stored in the <FONT SIZE="-1">ID</FONT> list.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that <TT>"#COMMENT"</TT>, <TT>"#PCDATA"</TT> or <TT>"#CDATA"</TT> are valid tag names, that will
|
|
create text elements.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
To create an element <TT>"foo"</TT> containing a <FONT SIZE="-1">CDATA</FONT> section:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $foo= XML::Twig::Elt->new( '#CDATA' => "content of the CDATA section")
|
|
->wrap_in( 'foo');
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
An attribute named <TT>"#CDATA"</TT> causes the content of the element to be created as a <FONT SIZE="-1">CDATA</FONT> section:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $elt= XML::Twig::Elt->new( 'p' => { '#CDATA' => 1}, 'foo < bar');
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
creates an element
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
  &lt;p&gt;&lt;![CDATA[foo &lt; bar]]&gt;&lt;/p&gt;
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="225">parse ($string, %args)<DD>
|
|
|
|
|
|
|
|
|
|
Creates an element from an <FONT SIZE="-1">XML</FONT> string. The string is actually
|
|
parsed as a new twig, then the root of that twig is returned.
|
|
The arguments in <TT>%args</TT> are passed to the twig.
|
|
As always, if the parse fails the parser dies, so use an
<TT>"eval"</TT> if you want to trap syntax errors.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Since the element does not exist beforehand, this method has to be
called on the class:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
  my $elt= XML::Twig::Elt->parse( "&lt;a&gt; string to parse, with &lt;sub/&gt;
|
|
<elements>, actually tons of </elements>
|
|
h</a>");
|
|
|
|
</PRE>
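<P>

As a hedged illustration of trapping a parse failure with <TT>"eval"</TT> (the XML strings are invented for the example):

<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  # well-formed input: the root of the temporary twig is returned
  my $ok= XML::Twig::Elt->parse( '&lt;a&gt;some &lt;b&gt;text&lt;/b&gt;&lt;/a&gt;');
  print $ok->tag, "\n";

  # malformed input: the parser dies, so the call is wrapped in eval
  my $bad= eval { XML::Twig::Elt->parse( '&lt;a&gt;never closed') };
  warn "parse failed: $@" unless defined $bad;
</PRE>

<P>

The block form of <TT>"eval"</TT> is used so that the error message ends up in <TT>$@</TT> and the program carries on.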
|
|
|
|
|
|
<DT id="226">set_inner_xml ($string)<DD>
|
|
|
|
|
|
Sets the content of the element to be the tree created from the string
|
|
<DT id="227">set_inner_html ($string)<DD>
|
|
|
|
|
|
Sets the content of the element, after parsing the string with an <FONT SIZE="-1">HTML</FONT>
|
|
parser (HTML::Parser)
|
|
<DT id="228">set_outer_xml ($string)<DD>
|
|
|
|
|
|
Replaces the element with the tree created from the string
|
|
<DT id="229">print ($optional_filehandle, $optional_pretty_print_style)<DD>
|
|
|
|
|
|
|
|
|
|
Prints an entire element, including the tags, optionally to a
|
|
<TT>$optional_filehandle</TT>, optionally with a <TT>$pretty_print_style</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The print outputs <FONT SIZE="-1">XML</FONT> data so base entities are escaped.
|
|
<DT id="230">print_to_file ($filename, %options)<DD>
|
|
|
|
|
|
|
|
|
|
Prints the element to file <TT>$filename</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
options: see <TT>"flush"</TT>.
|
|
<DT>sprint ($elt, $optional_no_enclosing_tag)<DD>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Return the xml string for an entire element, including the tags.
|
|
If the optional second argument is true then only the string inside the
|
|
element is returned (the start and end tag for <TT>$elt</TT> are not).
|
|
The text is XML-escaped: base entities (&amp; and &lt; in text, &amp; &lt; and &quot; in
|
|
attribute values) are turned into entities.
|
|
<DT id="231">gi<DD>
|
|
|
|
|
|
Return the gi of the element (the gi is the <TT>"generic identifier"</TT>, the tag
|
|
name in <FONT SIZE="-1">SGML</FONT> parlance).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>"tag"</TT> and <TT>"name"</TT> are synonyms of <TT>"gi"</TT>.
|
|
<DT id="232">tag<DD>
|
|
|
|
|
|
Same as <TT>"gi"</TT>
|
|
<DT id="233">name<DD>
|
|
|
|
|
|
Same as <TT>"tag"</TT>
|
|
<DT id="234">set_gi ($tag)<DD>
|
|
|
|
|
|
Set the gi (tag) of an element
|
|
<DT id="235">set_tag ($tag)<DD>
|
|
|
|
|
|
Set the tag (=<TT>"gi"</TT>) of an element
|
|
<DT id="236">set_name ($name)<DD>
|
|
|
|
|
|
Set the name (=<TT>"tag"</TT>) of an element
|
|
<DT id="237">root<DD>
|
|
|
|
|
|
Return the root of the twig in which the element is contained.
|
|
<DT id="238">twig<DD>
|
|
|
|
|
|
Return the twig containing the element.
|
|
<DT id="239">parent ($optional_condition)<DD>
|
|
|
|
|
|
Return the parent of the element, or the first ancestor matching the
|
|
<TT>$optional_condition</TT>
|
|
<DT id="240">first_child ($optional_condition)<DD>
|
|
|
|
|
|
Return the first child of the element, or the first child matching the
|
|
<TT>$optional_condition</TT>
|
|
<DT id="241">has_child ($optional_condition)<DD>
|
|
|
|
|
|
Return the first child of the element, or the first child matching the
|
|
<TT>$optional_condition</TT> (same as first_child)
|
|
<DT id="242">has_children ($optional_condition)<DD>
|
|
|
|
|
|
Return the first child of the element, or the first child matching the
|
|
<TT>$optional_condition</TT> (same as first_child)
|
|
<DT id="243">first_child_text ($optional_condition)<DD>
|
|
|
|
|
|
Return the text of the first child of the element, or of the first child
matching the <TT>$optional_condition</TT>.
If there is no such child, returns ''. This avoids getting the
child, checking for its existence, then getting the text for trivial cases.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Similar methods are available for the other navigation methods:
|
|
<DL COMPACT><DT id="244"><DD>
|
|
<DL COMPACT>
|
|
<DT id="245">last_child_text<DD>
|
|
|
|
|
|
|
|
<DT id="246">prev_sibling_text<DD>
|
|
|
|
|
|
<DT id="247">next_sibling_text<DD>
|
|
|
|
|
|
<DT id="248">prev_elt_text<DD>
|
|
|
|
|
|
<DT id="249">next_elt_text<DD>
|
|
|
|
|
|
<DT id="250">child_text<DD>
|
|
|
|
|
|
<DT id="251">parent_text<DD>
|
|
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="252"><DD>
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
All these methods also exist in a ``trimmed'' variant:
|
|
<DL COMPACT>
|
|
<DT id="253">first_child_trimmed_text<DD>
|
|
|
|
|
|
|
|
<DT id="254">last_child_trimmed_text<DD>
|
|
|
|
|
|
<DT id="255">prev_sibling_trimmed_text<DD>
|
|
|
|
|
|
<DT id="256">next_sibling_trimmed_text<DD>
|
|
|
|
|
|
<DT id="257">prev_elt_trimmed_text<DD>
|
|
|
|
|
|
<DT id="258">next_elt_trimmed_text<DD>
|
|
|
|
|
|
<DT id="259">child_trimmed_text<DD>
|
|
|
|
|
|
<DT id="260">parent_trimmed_text<DD>
|
|
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="261"><DD>
|
|
</DL>
|
|
|
|
<DT id="262">field ($condition)<DD>
|
|
|
|
|
|
|
|
Same method as <TT>"first_child_text"</TT> with a different name
|
|
<DT id="263">fields ($condition_list)<DD>
|
|
|
|
|
|
Return the list of fields (the text of the first child matching each condition);
missing fields are returned as the empty string.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Each field is obtained in the same way as with <TT>"first_child_text"</TT>.
|
|
<DT id="264">trimmed_field ($optional_condition)<DD>
|
|
|
|
|
|
Same method as <TT>"first_child_trimmed_text"</TT> with a different name
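<P>

A short sketch of the field methods, using a made-up <TT>"person"</TT> record; it assumes nothing beyond the methods described above:

<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $rec= XML::Twig::Elt->parse(
      '&lt;person&gt;&lt;name&gt;Ann&lt;/name&gt;&lt;city&gt; Oslo &lt;/city&gt;&lt;/person&gt;');

  # one value per condition; there is no phone child, so $phone is ''
  my( $name, $phone)= $rec->fields( 'name', 'phone');

  # same as first_child_trimmed_text( 'city')
  my $city= $rec->trimmed_field( 'city');

  print "name=[$name] phone=[$phone] city=[$city]\n";
</PRE>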
|
|
<DT id="265">set_field ($condition, $optional_atts, @list_of_elt_and_strings)<DD>
|
|
|
|
|
|
|
|
|
|
Set the content of the first child of the element that matches
|
|
<TT>$condition</TT>; the remaining arguments are the same as for <TT>"set_content"</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If no child matches <TT>$condition</TT> <I>and</I> <TT>$condition</TT> is a valid
|
|
<FONT SIZE="-1">XML</FONT> element name, then a new element by that name is created and
|
|
inserted as the last child.
|
|
<DT id="266">first_child_matches ($optional_condition)<DD>
|
|
|
|
|
|
Return the element if the first child of the element (if it exists) passes
|
|
the <TT>$optional_condition</TT>, <TT>"undef"</TT> otherwise
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
if( $elt->first_child_matches( 'title')) ...
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
is equivalent to
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
  if( $elt->first_child &amp;&amp; $elt->first_child->passes( 'title'))
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>"first_child_is"</TT> is another name for this method
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Similar methods are available for the other navigation methods:
|
|
<DL COMPACT><DT id="267"><DD>
|
|
<DL COMPACT>
|
|
<DT id="268">last_child_matches<DD>
|
|
|
|
|
|
|
|
<DT id="269">prev_sibling_matches<DD>
|
|
|
|
|
|
<DT id="270">next_sibling_matches<DD>
|
|
|
|
|
|
<DT id="271">prev_elt_matches<DD>
|
|
|
|
|
|
<DT id="272">next_elt_matches<DD>
|
|
|
|
|
|
<DT id="273">child_matches<DD>
|
|
|
|
|
|
<DT id="274">parent_matches<DD>
|
|
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="275"><DD>
|
|
</DL>
|
|
|
|
<DT id="276">is_first_child ($optional_condition)<DD>
|
|
|
|
|
|
|
|
returns true (the element) if the element is the first child of its parent
|
|
(optionally that satisfies the <TT>$optional_condition</TT>)
|
|
<DT id="277">is_last_child ($optional_condition)<DD>
|
|
|
|
|
|
returns true (the element) if the element is the last child of its parent
|
|
(optionally that satisfies the <TT>$optional_condition</TT>)
|
|
<DT id="278">prev_sibling ($optional_condition)<DD>
|
|
|
|
|
|
Return the previous sibling of the element, or the previous sibling matching
|
|
<TT>$optional_condition</TT>
|
|
<DT id="279">next_sibling ($optional_condition)<DD>
|
|
|
|
|
|
Return the next sibling of the element, or the first one matching
|
|
<TT>$optional_condition</TT>.
|
|
<DT id="280">next_elt ($optional_elt, $optional_condition)<DD>
|
|
|
|
|
|
|
|
|
|
Return the next elt (optionally matching <TT>$optional_condition</TT>) of the element. This
|
|
is defined as the next element which opens after the current element opens,
which usually means the first child of the element.
Counter-intuitive as it might look, this allows you to loop through the
whole document by starting from the root.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The <TT>$optional_elt</TT> is the root of a subtree. When the <TT>"next_elt"</TT> is out of the
|
|
subtree then the method returns undef. You can then walk a sub-tree with:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $elt= $subtree_root;
|
|
while( $elt= $elt->next_elt( $subtree_root))
|
|
{ # insert processing code here
|
|
}
|
|
|
|
</PRE>
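<P>

The same loop can be sketched as a complete script. The document below is invented for the example, and the condition <TT>'p'</TT> is passed as the second argument so that only <TT>"p"</TT> elements inside the subtree are visited:

<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig->new->parse(
      '&lt;doc&gt;&lt;sect&gt;&lt;p&gt;one&lt;/p&gt;&lt;p&gt;two&lt;/p&gt;&lt;/sect&gt;&lt;p&gt;three&lt;/p&gt;&lt;/doc&gt;');
  my $sect= $twig->root->first_child( 'sect');

  # walk the sect subtree only: the last p is outside it and is not reached
  my @texts;
  my $elt= $sect;
  while( $elt= $elt->next_elt( $sect, 'p'))
    { push @texts, $elt->text; }

  print join( ', ', @texts), "\n";
</PRE>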
|
|
|
|
|
|
<DT id="281">prev_elt ($optional_condition)<DD>
|
|
|
|
|
|
Return the previous elt (optionally matching <TT>$optional_condition</TT>) of the
|
|
element. This is the first element which opens before the current one.
|
|
It is usually either the last descendant of the previous sibling or
|
|
simply the parent
|
|
<DT id="282">next_n_elt ($offset, $optional_condition)<DD>
|
|
|
|
|
|
|
|
|
|
Return the <TT>$offset</TT>-th element that matches the <TT>$optional_condition</TT>
|
|
<DT id="283">following_elt<DD>
|
|
|
|
|
|
Return the following element (as per the XPath following axis)
|
|
<DT id="284">preceding_elt<DD>
|
|
|
|
|
|
Return the preceding element (as per the XPath preceding axis)
|
|
<DT id="285">following_elts<DD>
|
|
|
|
|
|
Return the list of following elements (as per the XPath following axis)
|
|
<DT id="286">preceding_elts<DD>
|
|
|
|
|
|
Return the list of preceding elements (as per the XPath preceding axis)
|
|
<DT id="287">children ($optional_condition)<DD>
|
|
|
|
|
|
Return the list of children (optionally only those matching <TT>$optional_condition</TT>) of
|
|
the element. The list is in document order.
|
|
<DT id="288">children_count ($optional_condition)<DD>
|
|
|
|
|
|
Return the number of children of the element (optionally only those matching
|
|
<TT>$optional_condition</TT>)
|
|
<DT id="289">children_text ($optional_condition)<DD>
|
|
|
|
|
|
In array context, returns an array containing the text of children of the
|
|
element (optionally only those matching <TT>$optional_condition</TT>).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
In scalar context, returns the concatenation of the text of children of
|
|
the element
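<P>

A small sketch of the two calling contexts, on a made-up list element:

<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $list= XML::Twig::Elt->parse(
      '&lt;ul&gt;&lt;li&gt;a&lt;/li&gt;&lt;li&gt;b&lt;/li&gt;&lt;p&gt;x&lt;/p&gt;&lt;/ul&gt;');

  my @texts=  $list->children_text( 'li');   # list context: one string per li
  my $joined= $list->children_text( 'li');   # scalar context: the texts concatenated

  print scalar( @texts), " items, joined: $joined\n";
</PRE>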
|
|
<DT id="290">children_trimmed_text ($optional_condition)<DD>
|
|
|
|
|
|
In array context, returns an array containing the trimmed text of children
|
|
of the element (optionally only those matching <TT>$optional_condition</TT>).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
In scalar context, returns the concatenation of the trimmed text of children of
|
|
the element
|
|
<DT id="291">children_copy ($optional_condition)<DD>
|
|
|
|
|
|
Return a list of elements that are copies of the children of the element,
|
|
optionally only those matching <TT>$optional_condition</TT>.
|
|
<DT id="292">descendants ($optional_condition)<DD>
|
|
|
|
|
|
Return the list of all descendants (optionally only those matching
|
|
<TT>$optional_condition</TT>) of the element. This is the equivalent of the
|
|
<TT>"getElementsByTagName"</TT> of the <FONT SIZE="-1">DOM</FONT> (by the way, if you are really a <FONT SIZE="-1">DOM</FONT>
|
|
addict, you can use <TT>"getElementsByTagName"</TT> instead)
|
|
<DT id="293">getElementsByTagName ($optional_condition)<DD>
|
|
|
|
|
|
Same as <TT>"descendants"</TT>
|
|
<DT id="294">find_by_tag_name ($optional_condition)<DD>
|
|
|
|
|
|
Same as <TT>"descendants"</TT>
|
|
<DT id="295">descendants_or_self ($optional_condition)<DD>
|
|
|
|
|
|
Same as <TT>"descendants"</TT> except that the element itself is included in the list
|
|
if it matches the <TT>$optional_condition</TT>
|
|
<DT id="296">first_descendant ($optional_condition)<DD>
|
|
|
|
|
|
Return the first descendant of the element that matches the condition
|
|
<DT id="297">last_descendant ($optional_condition)<DD>
|
|
|
|
|
|
Return the last descendant of the element that matches the condition
|
|
<DT id="298">ancestors ($optional_condition)<DD>
|
|
|
|
|
|
Return the list of ancestors (optionally matching <TT>$optional_condition</TT>) of the
|
|
element. The list is ordered from the innermost ancestor to the outermost one
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<FONT SIZE="-1">NOTE:</FONT> the element itself is not part of the list, in order to include it
|
|
you will have to use ancestors_or_self
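<P>

To illustrate the ordering, here is a sketch on an invented four-level document; the variable names are arbitrary:

<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig->new->parse(
      '&lt;doc&gt;&lt;sect&gt;&lt;p&gt;&lt;b&gt;x&lt;/b&gt;&lt;/p&gt;&lt;/sect&gt;&lt;/doc&gt;');
  my $bold= $twig->root->first_descendant( 'b');

  # innermost ancestor first, root last
  my @up=      map { $_->tag } $bold->ancestors;
  # same list, with the element itself in front
  my @up_self= map { $_->tag } $bold->ancestors_or_self;

  print join( '/', @up), "\n";
  print join( '/', @up_self), "\n";
</PRE>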
|
|
<DT id="299">ancestors_or_self ($optional_condition)<DD>
|
|
|
|
|
|
Return the list of ancestors (optionally matching <TT>$optional_condition</TT>) of the
|
|
element, including the element (if it matches the condition).
|
|
The list is ordered from the innermost ancestor to the outermost one
|
|
<DT id="300">passes ($condition)<DD>
|
|
|
|
|
|
Return the element if it passes the <TT>$condition</TT>
|
|
<DT id="301">att ($att)<DD>
|
|
|
|
|
|
Return the value of attribute <TT>$att</TT> or <TT>"undef"</TT>
|
|
<DT id="302">latt ($att)<DD>
|
|
|
|
|
|
Return the value of attribute <TT>$att</TT> or <TT>"undef"</TT>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This method is an lvalue, so you can write <TT>"$elt->latt( 'foo')= 'bar'"</TT> or <TT>"$elt->latt( 'foo')++;"</TT>.
|
|
<DT id="303">set_att ($att, $att_value)<DD>
|
|
|
|
|
|
|
|
|
|
Set the attribute of the element to the given value
|
|
|
|
|
|
<P>
|
|
|
|
|
|
You can actually set several attributes this way:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt->set_att( att1 => "val1", att2 => "val2");
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="304">del_att ($att)<DD>
|
|
|
|
|
|
Delete the attribute for the element
|
|
|
|
|
|
<P>
|
|
|
|
|
|
You can actually delete several attributes at once:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt->del_att( 'att1', 'att2', 'att3');
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="305">att_exists ($att)<DD>
|
|
|
|
|
|
Returns true if the attribute <TT>$att</TT> exists for the element, false
|
|
otherwise
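<P>

The attribute methods above can be combined as in this sketch (the <TT>"img"</TT> element and its attributes are made up for the example):

<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $elt= XML::Twig::Elt->new( 'img');

  $elt->set_att( src =&gt; 'logo.png', width =&gt; 10);   # several attributes at once
  $elt->latt( 'width')++;                              # latt is an lvalue
  $elt->del_att( 'src');

  print "width is ", $elt->att( 'width'), "\n";
  print "src is gone\n" unless $elt->att_exists( 'src');
</PRE>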
|
|
<DT id="306">cut<DD>
|
|
|
|
|
|
Cut the element from the tree. The element still exists, it can be copied
|
|
or pasted somewhere else, it is just not attached to the tree anymore.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that the ``old'' links to the parent, previous and next siblings can
|
|
still be accessed using the former_* methods
|
|
<DT id="307">former_next_sibling<DD>
|
|
|
|
|
|
Returns the former next sibling of a cut node (or undef if the node has not been cut)
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This makes it easier to write loops where you cut elements:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $child= $parent->first_child( 'achild');
|
|
  while( $child &amp;&amp; $child->att( 'cut'))
  { $child->cut; $child= $child->former_next_sibling; }
|
|
|
|
</PRE>
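<P>

A self-contained version of such a loop might look like the sketch below. The <TT>"list"</TT> document and its <TT>"cut"</TT> attribute are invented for the example; the loop stops at the first child that is not marked.

<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig->new->parse(
      '&lt;list&gt;&lt;item cut="1"&gt;a&lt;/item&gt;&lt;item cut="1"&gt;b&lt;/item&gt;&lt;item&gt;c&lt;/item&gt;&lt;/list&gt;');

  my $child= $twig->root->first_child( 'item');
  while( $child &amp;&amp; $child->att( 'cut'))
    { $child->cut;
      # the cut element has no siblings any more, so use its former one
      $child= $child->former_next_sibling;
    }

  print $twig->root->children_count, " item(s) left\n";
</PRE>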
|
|
|
|
|
|
<DT id="308">former_prev_sibling<DD>
|
|
|
|
|
|
Returns the former previous sibling of a cut node (or undef if the node has not been cut)
|
|
<DT id="309">former_parent<DD>
|
|
|
|
|
|
Returns the former parent of a cut node (or undef if the node has not been cut)
|
|
<DT id="310">cut_children ($optional_condition)<DD>
|
|
|
|
|
|
Cut all the children of the element (or all of those which satisfy the
|
|
<TT>$optional_condition</TT>).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Return the list of children
|
|
<DT id="311">cut_descendants ($optional_condition)<DD>
|
|
|
|
|
|
Cut all the descendants of the element (or all of those which satisfy the
|
|
<TT>$optional_condition</TT>).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Return the list of descendants
|
|
<DT id="312">copy ($elt)<DD>
|
|
|
|
|
|
Return a copy of the element. The copy is a ``deep'' copy: all sub-elements of
|
|
the element are duplicated.
|
|
<DT id="313">paste ($optional_position, $ref)<DD>
|
|
|
|
|
|
|
|
|
|
Paste a (previously <TT>"cut"</TT> or newly generated) element. Dies if the element
|
|
already belongs to a tree.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that the calling element is pasted:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$child->paste( first_child => $existing_parent);
|
|
$new_sibling->paste( after => $this_sibling_is_already_in_the_tree);
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
or
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $new_elt= XML::Twig::Elt->new( tag => $content);
|
|
$new_elt->paste( $position => $existing_elt);
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Example:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
  my $t= XML::Twig->new->parsefile( 'doc.xml');
|
|
my $toc= $t->root->new( 'toc');
|
|
$toc->paste( $t->root); # $toc is pasted as first child of the root
|
|
foreach my $title ($t->findnodes( '/doc/section/title'))
|
|
{ my $title_toc= $title->copy;
|
|
# paste $title_toc as the last child of toc
|
|
$title_toc->paste( last_child => $toc)
|
|
}
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Position options:
|
|
<DL COMPACT><DT id="314"><DD>
|
|
<DL COMPACT>
|
|
<DT id="315">first_child (default)<DD>
|
|
|
|
|
|
The element is pasted as the first child of <TT>$ref</TT>
|
|
<DT id="316">last_child<DD>
|
|
|
|
|
|
The element is pasted as the last child of <TT>$ref</TT>
|
|
<DT id="317">before<DD>
|
|
|
|
|
|
The element is pasted before <TT>$ref</TT>, as its previous sibling.
|
|
<DT id="318">after<DD>
|
|
|
|
|
|
The element is pasted after <TT>$ref</TT>, as its next sibling.
|
|
<DT id="319">within<DD>
|
|
|
|
|
|
In this case an extra argument, <TT>$offset</TT>, should be supplied. The element
|
|
will be pasted in the reference element (or in its first text child) at the
|
|
given offset. To achieve this the reference element will be split at the
|
|
offset.
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="320"><DD>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that you can call the underlying methods directly:
|
|
<DL COMPACT>
|
|
<DT id="321">paste_before<DD>
|
|
|
|
|
|
|
|
<DT id="322">paste_after<DD>
|
|
|
|
|
|
<DT id="323">paste_first_child<DD>
|
|
|
|
|
|
<DT id="324">paste_last_child<DD>
|
|
|
|
|
|
<DT id="325">paste_within<DD>
|
|
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="326"><DD>
|
|
</DL>
|
|
|
|
<DT id="327">move ($optional_position, $ref)<DD>
|
|
|
|
|
|
|
|
|
|
|
|
Move an element in the tree.
|
|
This is just a <TT>"cut"</TT> then a <TT>"paste"</TT>. The syntax is the same as <TT>"paste"</TT>.
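<P>

A brief sketch of <TT>"paste"</TT> followed by <TT>"move"</TT>, on a made-up three-element document:

<PRE>
  use strict;
  use warnings;
  use XML::Twig;

  my $twig= XML::Twig->new->parse( '&lt;doc&gt;&lt;a/&gt;&lt;b/&gt;&lt;/doc&gt;');
  my $root= $twig->root;

  # a new element is not in any tree, so it can be pasted directly
  my $new= XML::Twig::Elt->new( c =&gt; 'text');
  $new->paste( last_child =&gt; $root);               # children: a b c

  # an element already in the tree has to be moved (cut, then pasted)
  $root->first_child( 'a')->move( after =&gt; $new);   # children: b c a

  print join( ' ', map { $_->tag } $root->children), "\n";
</PRE>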
|
|
<DT id="328">replace ($ref)<DD>
|
|
|
|
|
|
Replaces an element in the tree. Sometimes it is just not possible to <TT>"cut"</TT>
|
|
an element then <TT>"paste"</TT> another in its place, so <TT>"replace"</TT> comes in handy.
|
|
The calling element replaces <TT>$ref</TT>.
|
|
<DT id="329">replace_with (@elts)<DD>
|
|
|
|
|
|
Replaces the calling element with one or more elements
|
|
<DT id="330">delete<DD>
|
|
|
|
|
|
Cuts the element and frees its memory.
|
|
<DT id="331">prefix ($text, $optional_option)<DD>
|
|
|
|
|
|
|
|
|
|
Add a prefix to an element. If the element is a <TT>"PCDATA"</TT> element the text
|
|
is added to the pcdata, if the element's first child is a <TT>"PCDATA"</TT> then the
text is added to its pcdata, otherwise a new <TT>"PCDATA"</TT> element is created
|
|
and pasted as the first child of the element.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the option is <TT>"asis"</TT> then the prefix is added asis: it is created in
|
|
a separate <TT>"PCDATA"</TT> element with an <TT>"asis"</TT> property. You can then write:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt1->prefix( '<b>', 'asis');
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
to create a <TT>"&lt;b&gt;"</TT> in the output of <TT>"print"</TT>.
|
|
<DT id="332">suffix ($text, $optional_option)<DD>
|
|
|
|
|
|
|
|
|
|
Add a suffix to an element. If the element is a <TT>"PCDATA"</TT> element the text
|
|
is added to the pcdata, if the element's last child is a <TT>"PCDATA"</TT> then the
text is added to its pcdata, otherwise a new <FONT SIZE="-1">PCDATA</FONT> element is created
|
|
and pasted as the last child of the element.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the option is <TT>"asis"</TT> then the suffix is added asis: it is created in
|
|
a separate <TT>"PCDATA"</TT> element with an <TT>"asis"</TT> property. You can then write:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt2->suffix( '</b>', 'asis');
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="333">trim<DD>
|
|
|
|
|
|
Trim the element in-place: spaces at the beginning and at the end of the element
|
|
are discarded and multiple spaces within the element (or its descendants) are
|
|
replaced by a single space.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that in some cases you can still end up with multiple spaces, if they are
|
|
split between several elements:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<doc> text <b> hah! </b> yep</doc>
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
gets trimmed to
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<doc>text <b> hah! </b> yep</doc>
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is somewhere in between a bug and a feature.
|
|
<DT id="334">normalize<DD>
|
|
|
|
|
|
Merge together all consecutive pcdata elements in the element (if for example
you have turned some elements into pcdata using <TT>"erase"</TT>, this will give you
a ``clean'' element in which all text fragments are as long as possible).
|
|
<DT id="335">simplify (%options)<DD>
|
|
|
|
|
|
Return a data structure suspiciously similar to XML::Simple's. Options are
|
|
identical to XMLin options, see XML::Simple doc for more details (or use
|
|
Data::Dumper or <FONT SIZE="-1">YAML</FONT> to dump the data structure).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B>Note</B>: there is no magic here, if you write
|
|
<TT>"$twig->parsefile( $file )->simplify();"</TT> then it will load the entire
|
|
document in memory. I am afraid you will have to put some work into it to
|
|
get just the bits you want and discard the rest. Look at the synopsis or
|
|
the XML::Twig 101 section at the top of the docs for more information.
|
|
<DL COMPACT><DT id="336"><DD>
|
|
<DL COMPACT>
|
|
<DT id="337">content_key<DD>
|
|
|
|
|
|
|
|
<DT id="338">forcearray<DD>
|
|
|
|
|
|
<DT id="339">keyattr<DD>
|
|
|
|
|
|
<DT id="340">noattr<DD>
|
|
|
|
|
|
<DT id="341">normalize_space<DD>
|
|
|
|
|
|
|
|
aka normalise_space
|
|
<DT id="342">variables (%var_hash)<DD>
|
|
|
|
|
|
<TT>%var_hash</TT> is a hash { name => value }
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This option allows variables in the <FONT SIZE="-1">XML</FONT> to be expanded when the file is read. (there is no facility for putting the variable names back if you regenerate <FONT SIZE="-1">XML</FONT> using XMLout).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
A 'variable' is any text of the form ${name} (or <TT>$name</TT>) which occurs in an attribute value or in the text content of an element. If 'name' matches a key in the supplied hashref, ${name} will be replaced with the corresponding value from the hashref. If no matching key is found, the variable will not be replaced.
|
|
<DT id="343">var_att ($attribute_name)<DD>
|
|
|
|
|
|
This option gives the name of an attribute that will be used to create
|
|
variables in the <FONT SIZE="-1">XML:</FONT>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<dirs>
|
|
<dir name="prefix">/usr/local</dir>
|
|
<dir name="exec_prefix">$prefix/bin</dir>
|
|
</dirs>
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
use <TT>"var => 'name'"</TT> to get <TT>$prefix</TT> replaced by /usr/local in the
|
|
generated data structure
|
|
|
|
|
|
<P>
|
|
|
|
|
|
By default variables are captured by the following regexp: /\$(\w+)/
|
|
<DT id="344">var_regexp (regexp)<DD>
|
|
|
|
|
|
This option changes the regexp used to capture variables. The variable
|
|
name should be in <TT>$1</TT>
|
|
<DT id="345">group_tags { grouping tag => grouped tag, grouping tag 2 => grouped tag 2...}<DD>
|
|
|
|
|
|
Option used to simplify the structure: the grouping elements listed are not
used themselves. Their children are kept and treated as children of the
grouping element's parent.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the element is:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<config host="laptop.xmltwig.org">
|
|
<server>localhost</server>
|
|
<dirs>
|
|
<dir name="base">/home/mrodrigu/standards</dir>
|
|
<dir name="tools">$base/tools</dir>
|
|
</dirs>
|
|
<templates>
|
|
<template name="std_def">std_def.templ</template>
|
|
<template name="dummy">dummy</template>
|
|
</templates>
|
|
</config>
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Then calling simplify with <TT>"group_tags => { dirs => 'dir',
|
|
templates => 'template'}"</TT>
|
|
makes the data structure be exactly as if the start and end tags for <TT>"dirs"</TT> and
|
|
<TT>"templates"</TT> were not there.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
A <FONT SIZE="-1">YAML</FONT> dump of the structure
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
base: '/home/mrodrigu/standards'
|
|
host: laptop.xmltwig.org
|
|
server: localhost
|
|
template:
|
|
- std_def.templ
|
|
- dummy.templ
|
|
tools: '$base/tools'
|
|
|
|
</PRE>
|
|
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="346"><DD>
|
|
</DL>
|
|
|
|
<DT id="347">split_at ($offset)<DD>
|
|
|
|
|
|
Split a text (<TT>"PCDATA"</TT> or <TT>"CDATA"</TT>) element in 2 at <TT>$offset</TT>, the original
|
|
element now holds the first part of the string and a new element holds the
|
|
right part. The new element is returned
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the element is not a text element then the first text child of the element
|
|
is split
|
|
<DT id="348">split ( $optional_regexp, $tag1, $atts1, $tag2, $atts2...)<DD>
|
|
|
|
|
|
|
|
|
|
Split the text descendants of an element in place, the text is split using
|
|
the <TT>$regexp</TT>, if the regexp includes () then the matched separators will be
|
|
wrapped in elements. <TT>$1</TT> is wrapped in <TT>$tag1</TT>, with attributes <TT>$atts1</TT> if
|
|
<TT>$atts1</TT> is given (as a hashref), <TT>$2</TT> is wrapped in <TT>$tag2</TT>...
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If <TT>$elt</TT> is <TT>"&lt;p&gt;tati tata &lt;b&gt;tutu tati titi&lt;/b&gt; tata tati tata&lt;/p&gt;"</TT>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt->split( qr/(ta)ti/, 'foo', {type => 'toto'} )
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
will change <TT>$elt</TT> to
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<p><foo type="toto">ta</foo> tata <b>tutu <foo type="toto">ta</foo>
|
|
titi</b> tata <foo type="toto">ta</foo> tata</p>
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The regexp can be passed either as a string or as <TT>"qr//"</TT> (perl 5.005 and
|
|
later), it defaults to \s+ just as the <TT>"split"</TT> built-in (but this would be
|
|
quite a useless behaviour without the <TT>$optional_tag</TT> parameter)
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>$optional_tag</TT> defaults to <FONT SIZE="-1">PCDATA</FONT> or <FONT SIZE="-1">CDATA,</FONT> depending on the initial element
|
|
type
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The list of descendants is returned (including un-touched original elements
|
|
and newly created ones)
|
|
<DT id="349">mark ( $regexp, $optional_tag, $optional_attribute_ref)<DD>
|
|
|
|
|
|
|
|
|
|
This method behaves exactly as split, except only the newly created
|
|
elements are returned
|
|
<DT id="350">wrap_children ( $regexp_string, $tag, $optional_attribute_hashref)<DD>
|
|
|
|
|
|
|
|
|
|
Wrap the children of the element that match the regexp in an element <TT>$tag</TT>.
|
|
If <TT>$optional_attribute_hashref</TT> is passed then the new element will
|
|
have these attributes.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The <TT>$regexp_string</TT> includes tags, within pointy brackets, as in
|
|
<TT>"&lt;title&gt;&lt;para&gt;+"</TT> and the usual Perl modifiers (+*?...).
Tags can be further qualified with attributes:
<TT>"&lt;para type="warning" classif="cosmic_secret"&gt;+"</TT>. The values
for attributes should be xml-escaped: <TT>"&lt;candy type="M&amp;amp;Ms"&gt;*"</TT>
(<TT>"&lt;"</TT>, <TT>"&amp;"</TT>, <TT>"&gt;"</TT> and <TT>"&quot;"</TT> should be escaped).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that elements might get extra <TT>"id"</TT> attributes in the process. See add_id.
|
|
Use strip_att to remove unwanted id's.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Here is an example:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the element <TT>$elt</TT> has the following content:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<elt>
|
|
<p>para 1</p>
|
|
<l_l1_1>list 1 item 1 para 1</l_l1_1>
|
|
<l_l1>list 1 item 1 para 2</l_l1>
|
|
<l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
|
|
<l_l1_n>list 1 item 3 para 1</l_l1_n>
|
|
<l_l1>list 1 item 3 para 2</l_l1>
|
|
<l_l1>list 1 item 3 para 3</l_l1>
|
|
<l_l1_1>list 2 item 1 para 1</l_l1_1>
|
|
<l_l1>list 2 item 1 para 2</l_l1>
|
|
<l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
|
|
<l_l1_n>list 2 item 3 para 1</l_l1_n>
|
|
<l_l1>list 2 item 3 para 2</l_l1>
|
|
<l_l1>list 2 item 3 para 3</l_l1>
|
|
</elt>
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Then the code
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt->wrap_children( q{<l_l1_1><l_l1>*} , li => { type => "ul1" });
|
|
$elt->wrap_children( q{<l_l1_n><l_l1>*} , li => { type => "ul" });
|
|
|
|
$elt->wrap_children( q{<li type="ul1"><li type="ul">+}, "ul");
|
|
$elt->strip_att( 'id');
|
|
$elt->strip_att( 'type');
|
|
$elt->print;
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
will output:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<elt>
|
|
<p>para 1</p>
|
|
<ul>
|
|
<li>
|
|
<l_l1_1>list 1 item 1 para 1</l_l1_1>
|
|
<l_l1>list 1 item 1 para 2</l_l1>
|
|
</li>
|
|
<li>
|
|
<l_l1_n>list 1 item 2 para 1 (only para)</l_l1_n>
|
|
</li>
|
|
<li>
|
|
<l_l1_n>list 1 item 3 para 1</l_l1_n>
|
|
<l_l1>list 1 item 3 para 2</l_l1>
|
|
<l_l1>list 1 item 3 para 3</l_l1>
|
|
</li>
|
|
</ul>
|
|
<ul>
|
|
<li>
|
|
<l_l1_1>list 2 item 1 para 1</l_l1_1>
|
|
<l_l1>list 2 item 1 para 2</l_l1>
|
|
</li>
|
|
<li>
|
|
<l_l1_n>list 2 item 2 para 1 (only para)</l_l1_n>
|
|
</li>
|
|
<li>
|
|
<l_l1_n>list 2 item 3 para 1</l_l1_n>
|
|
<l_l1>list 2 item 3 para 2</l_l1>
|
|
<l_l1>list 2 item 3 para 3</l_l1>
|
|
</li>
|
|
</ul>
|
|
</elt>
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="351">subs_text ($regexp, $replace)<DD>
|
|
|
|
|
|
|
|
|
|
subs_text does text substitution, similar to perl's <TT>"s///"</TT> operator.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>$regexp</TT> must be a perl regexp, created with the <TT>"qr"</TT> operator.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>$replace</TT> can include <TT>"$1, $2"</TT>... from the <TT>$regexp</TT>. It can also be
used to create elements and entities, by using
<TT>"&amp;elt( tag =&gt; { att =&gt; val }, text)"</TT> (similar syntax as <TT>"new"</TT>) and
<TT>"&amp;ent( name)"</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Here is a rather complex example:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
  $elt->subs_text( qr{(?&lt;!do not )link to (http://([^\s,]*))},
|
|
'see &elt( a =>{ href => $1 }, $2)'
|
|
);
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This will replace text like <I>link to <A HREF="http://www.xmltwig.org">http://www.xmltwig.org</A></I> by
<I>see &lt;a href="http://www.xmltwig.org"&gt;www.xmltwig.org&lt;/a&gt;</I>, but not
<I>do not link to...</I>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Generating entities (here replacing spaces with &nbsp;):
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt->subs_text( qr{ }, '&ent( "&nbsp;")');
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
or, using a variable:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $ent="&nbsp;";
|
|
$elt->subs_text( qr{ }, "&ent( '$ent')");
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that the substitution is always global, as in using the <TT>"g"</TT> modifier
|
|
in a perl substitution, and that it is performed on all text descendants
|
|
of the element.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B>Bug</B>: in the <TT>$regexp</TT>, you can only use <TT>"\1"</TT>, <TT>"\2"</TT>... if the replacement
|
|
expression does not include elements or attributes. For example:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$t->subs_text( qr/((t[aiou])\2)/, '$2'); # ok, replaces toto, tata, titi, tutu by to, ta, ti, tu
|
|
$t->subs_text( qr/((t[aiou])\2)/, '&elt(p => $1)' ); # NOK, does not find toto...
|
|
|
|
</PRE>
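<P>
A minimal sketch of the two common uses of <TT>"subs_text"</TT>, a plain replacement and a replacement that creates an element. The sample document and variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input, for illustration only.
my $twig = XML::Twig->new;
$twig->parse('<doc><p>see foo, then <i>foo</i> again</p></doc>');

# Plain replacement: applied globally, to all text descendants of the root.
$twig->root->subs_text( qr/foo/, 'bar' );
my $after_text = $twig->root->text;

# Replacement that wraps each match in a new <b> element.
$twig->root->subs_text( qr/(bar)/, '&elt( b => $1)' );
my $after_elt = $twig->root->sprint;

print "$after_text\n$after_elt\n";
```

Note the single quotes around the replacement: <TT>$1</TT> must reach <TT>"subs_text"</TT> uninterpolated.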
|
|
|
|
|
|
<DT id="352">add_id ($optional_coderef)<DD>
|
|
|
|
|
|
Add an id to the element.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The id is an attribute, <TT>"id"</TT> by default, see the <TT>"id"</TT> option for XML::Twig
|
|
<TT>"new"</TT> to change it. Use an id starting with <TT>"#"</TT> to get an id that's not
|
|
output by print, flush or sprint, yet that allows you to use the
|
|
elt_id method to get the element easily.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the element already has an id, no new id is generated.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
By default the method creates an id of the form <TT>"twig_id_<nnnn>"</TT>,
|
|
where <TT>"<nnnn>"</TT> is a number, incremented each time the method is called
|
|
successfully.
|
|
<DT id="353">set_id_seed ($prefix)<DD>
|
|
|
|
|
|
By default the id generated by <TT>"add_id"</TT> is <TT>"twig_id_<nnnn>"</TT>,
|
|
<TT>"set_id_seed"</TT> changes the prefix to <TT>$prefix</TT> and resets the number
|
|
to 1
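<P>
A short sketch of how <TT>"set_id_seed"</TT> and <TT>"add_id"</TT> might be combined. The prefix <TT>"sec_"</TT> and the sample document are assumptions chosen for the example.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: the second section already carries an id.
my $twig = XML::Twig->new;
$twig->parse('<doc><sec/><sec id="intro"/><sec/></doc>');
my @secs = $twig->root->children('sec');

# Change the prefix used for generated ids.
$secs[0]->set_id_seed('sec_');

# Elements that already have an id keep it; the others get a generated one.
$_->add_id for @secs;

my @ids = map { $_->id } @secs;
print join( ', ', @ids ), "\n";
```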
|
|
<DT id="354">strip_att ($att)<DD>
|
|
|
|
|
|
Remove the attribute <TT>$att</TT> from all descendants of the element (including
|
|
the element)
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Return the element
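<P>
For illustration, a minimal sketch of <TT>"strip_att"</TT> on a small made-up document; since the method returns the element, the call can be chained.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input with id attributes at several levels.
my $twig = XML::Twig->new;
$twig->parse('<doc id="d1"><p id="p1" class="a">text</p><p id="p2">more</p></doc>');

# Remove "id" from the root and all of its descendants, then serialize.
my $xml = $twig->root->strip_att('id')->sprint;

print "$xml\n";
```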
|
|
<DT id="355">change_att_name ($old_name, $new_name)<DD>
|
|
|
|
|
|
|
|
|
|
Change the name of the attribute from <TT>$old_name</TT> to <TT>$new_name</TT>. If there is no
|
|
attribute <TT>$old_name</TT> nothing happens.
|
|
<DT id="356">lc_attnames<DD>
|
|
|
|
|
|
Lowercases the names of all the attributes of the element.
|
|
<DT id="357">sort_children_on_value( %options)<DD>
|
|
|
|
|
|
|
|
|
|
Sort the children of the element in place according to their text.
|
|
All children are sorted.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Return the element, with its children sorted.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>%options</TT> are
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
type : numeric | alpha (default: alpha)
|
|
order : normal | reverse (default: normal)
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
|
|
<DT id="358">sort_children_on_att ($att, %options)<DD>
|
|
|
|
|
|
|
|
|
|
Sort the children of the element in place according to attribute <TT>$att</TT>.
|
|
<TT>%options</TT> are the same as for <TT>"sort_children_on_value"</TT>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Return the element.
|
|
<DT id="359">sort_children_on_field ($tag, %options)<DD>
|
|
|
|
|
|
|
|
|
|
Sort the children of the element in place, according to the field <TT>$tag</TT> (the
|
|
text of the first child of the child with this tag). <TT>%options</TT> are the same
|
|
as for <TT>"sort_children_on_value"</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Return the element, with its children sorted
|
|
<DT id="360">sort_children( $get_key, %options)<DD>
|
|
|
|
|
|
|
|
|
|
Sort the children of the element in place. The <TT>$get_key</TT> argument is
|
|
a reference to a function that returns the sort key when passed an element.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
For example:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt->sort_children( sub { $_[0]->{'att'}->{"nb"} + $_[0]->text },
|
|
type => 'numeric', order => 'reverse'
|
|
);
|
|
|
|
</PRE>
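<P>
The sorting methods above could be used as in the following sketch. The <TT>"list"</TT> document and the <TT>"nb"</TT> attribute are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: items carry a numeric "nb" attribute.
my $twig = XML::Twig->new;
$twig->parse('<list><item nb="10">kiwi</item><item nb="2">apple</item><item nb="1">pear</item></list>');
my $list = $twig->root;

# Numeric sort on the attribute (an alpha sort would put "10" before "2").
$list->sort_children_on_att( 'nb', type => 'numeric' );
my $by_att = join ',', map { $_->att('nb') } $list->children;

# Reverse alphabetical sort on the text of the children.
$list->sort_children_on_value( order => 'reverse' );
my $by_text = join ',', map { $_->text } $list->children;

print "$by_att\n$by_text\n";
```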
|
|
|
|
|
|
<DT id="361">field_to_att ($cond, $att)<DD>
|
|
|
|
|
|
|
|
|
|
Turn the text of the first sub-element matched by <TT>$cond</TT> into the value of
|
|
attribute <TT>$att</TT> of the element. If <TT>$att</TT> is omitted then <TT>$cond</TT> is used
|
|
as the name of the attribute, which makes sense only if <TT>$cond</TT> is a valid
|
|
element (and attribute) name.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The sub-element is then cut.
|
|
<DT id="362">att_to_field ($att, $tag)<DD>
|
|
|
|
|
|
|
|
|
|
Take the value of attribute <TT>$att</TT> and create a sub-element <TT>$tag</TT> as first
|
|
child of the element. If <TT>$tag</TT> is omitted then <TT>$att</TT> is used as the name of
|
|
the sub-element.
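<P>
A small sketch showing <TT>"field_to_att"</TT> and <TT>"att_to_field"</TT> as a round trip. The <TT>"person"</TT> record is a made-up example.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical record with two fields.
my $twig = XML::Twig->new;
$twig->parse('<person><name>Ann</name><age>30</age></person>');
my $person = $twig->root;

# The text of the <name> child becomes the "name" attribute; the child is cut.
$person->field_to_att('name');
my $as_att = $person->sprint;

# And back: the attribute becomes a <name> child again.
$person->att_to_field('name');
my $as_field = $person->sprint;

print "$as_att\n$as_field\n";
```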
|
|
<DT id="363">get_xpath ($xpath, $optional_offset)<DD>
|
|
|
|
|
|
|
|
|
|
Return a list of elements satisfying the <TT>$xpath</TT>. <TT>$xpath</TT> is an XPATH-like
|
|
expression.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
A subset of the <FONT SIZE="-1">XPATH</FONT> abbreviated syntax is covered:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
tag
|
|
tag[1] (or any other positive number)
|
|
tag[last()]
|
|
tag[@att] (the attribute exists for the element)
|
|
tag[@att="val"]
|
|
tag[@att=~ /regexp/]
|
|
  tag[@att1="val1" and @att2="val2"]
|
|
  tag[@att1="val1" or @att2="val2"]
|
|
  tag[string()="toto"]    (returns tag elements whose text (as per the text method)
|
|
is toto)
|
|
  tag[string()=~/regexp/] (returns tag elements whose text (as per the text
|
|
method) matches regexp)
|
|
expressions can start with / (search starts at the document root)
|
|
expressions can start with . (search starts at the current element)
|
|
// can be used to get all descendants instead of just direct children
|
|
* matches any tag
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
So the following examples from the
|
|
<I>XPath recommendation<<A HREF="http://www.w3.org/TR/xpath.html#path-abbrev">http://www.w3.org/TR/xpath.html#path-abbrev</A>></I> work:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
para selects the para element children of the context node
|
|
* selects all element children of the context node
|
|
para[1] selects the first para child of the context node
|
|
para[last()] selects the last para child of the context node
|
|
*/para selects all para grandchildren of the context node
|
|
/doc/chapter[5]/section[2] selects the second section of the fifth chapter
|
|
of the doc
|
|
chapter//para selects the para element descendants of the chapter element
|
|
children of the context node
|
|
//para selects all the para descendants of the document root and thus selects
|
|
all para elements in the same document as the context node
|
|
//olist/item selects all the item elements in the same document as the
|
|
context node that have an olist parent
|
|
.//para selects the para element descendants of the context node
|
|
.. selects the parent of the context node
|
|
para[@type="warning"] selects all para children of the context node that have
|
|
a type attribute with value warning
|
|
employee[@secretary and @assistant] selects all the employee children of the
|
|
context node that have both a secretary attribute and an assistant
|
|
attribute
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The elements will be returned in the document order.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If <TT>$optional_offset</TT> is used then only one element will be returned, the one
|
|
with the appropriate offset in the list, starting at 0
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Quoting and interpolating variables can be a pain when the Perl syntax and the
|
|
<FONT SIZE="-1">XPATH</FONT> syntax collide, so use alternate quoting mechanisms like q or qq
|
|
(I like q{} and qq{} myself).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Here are some more examples to get you started:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
my $p1= "p1";
|
|
my $p2= "p2";
|
|
my @res= $t->get_xpath( qq{p[string( "$p1") or string( "$p2")]});
|
|
|
|
my $a= "a1";
|
|
  my @res= $t->get_xpath( qq{//*[\@att="$a"]});   # \@ prevents interpolation of @att
|
|
|
|
my $val= "a1";
|
|
my $exp= qq{//p[ \@att='$val']}; # you need to use \@ or you will get a warning
|
|
my @res= $t->get_xpath( $exp);
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that the only supported regexp delimiter is / and that you must
|
|
backslash all / in regexps <FONT SIZE="-1">AND</FONT> in regular strings.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
XML::Twig does not natively provide full <FONT SIZE="-1">XPATH</FONT> support, but you can use
|
|
<TT>"XML::Twig::XPath"</TT> to get <TT>"findnodes"</TT> to use <TT>"XML::XPath"</TT> as the
|
|
XPath engine, with full coverage of the spec.
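<P>
As an illustration of the supported subset, here is a minimal sketch of <TT>"get_xpath"</TT> with and without the optional offset. The document and the <TT>$type</TT> variable are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document, for illustration only.
my $twig = XML::Twig->new;
$twig->parse('<doc><chapter><para type="warning">w1</para><para>p1</para></chapter>'
           . '<chapter><para type="warning">w2</para></chapter></doc>');
my $root = $twig->root;

# qq{} interpolates $type; the @ is escaped so Perl does not look for an array.
my $type     = 'warning';
my @warnings = $root->get_xpath( qq{//para[\@type="$type"]} );

# With an offset (starting at 0) a single element is returned.
my $second = $root->get_xpath( qq{//para[\@type="$type"]}, 1 );

# Relative expression: the last chapter child of the root.
my @last = $root->get_xpath('chapter[last()]');

print scalar(@warnings), " warnings, second is '", $second->text, "'\n";
```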
|
|
|
|
|
|
|
|
<DT id="364">find_nodes<DD>
|
|
|
|
|
|
same as <TT>"get_xpath"</TT>
|
|
<DT id="365">findnodes<DD>
|
|
|
|
|
|
same as <TT>"get_xpath"</TT>
|
|
<DT id="366">text @optional_options<DD>
|
|
|
|
|
|
|
|
|
|
Return a string consisting of all the <TT>"PCDATA"</TT> and <TT>"CDATA"</TT> in an element,
|
|
without any tags. The text is not XML-escaped: base entities such as <TT>"&"</TT>
|
|
and <TT>"<"</TT> are not escaped.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The '<TT>"no_recurse"</TT>' option will only return the text of the element, not
|
|
of any included sub-elements (same as <TT>"text_only"</TT>).
|
|
<DT id="367">text_only<DD>
|
|
|
|
|
|
Same as <TT>"text"</TT> except that the text returned doesn't include
|
|
the text of sub-elements.
|
|
<DT id="368">trimmed_text<DD>
|
|
|
|
|
|
Same as <TT>"text"</TT> except that the text is trimmed: leading and trailing spaces
|
|
are discarded, consecutive spaces are collapsed
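<P>
The difference between <TT>"text"</TT>, <TT>"text_only"</TT> and <TT>"trimmed_text"</TT> can be sketched as follows, on a made-up paragraph with extra spaces.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input with irregular spacing and a nested element.
my $twig = XML::Twig->new;
$twig->parse('<p> Hello   <b>big</b> world </p>');
my $p = $twig->root;

my $all     = $p->text;          # includes the text of <b>
my $own     = $p->text_only;     # text of <p> itself, without sub-elements
my $trimmed = $p->trimmed_text;  # spaces trimmed and collapsed

print "[$all]\n[$own]\n[$trimmed]\n";
```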
|
|
<DT id="369">set_text ($string)<DD>
|
|
|
|
|
|
Set the text for the element: if the element is a <TT>"PCDATA"</TT>, just set its
|
|
text, otherwise cut all the children of the element and create a single
|
|
<TT>"PCDATA"</TT> child for it, which holds the text.
|
|
<DT id="370">merge ($elt2)<DD>
|
|
|
|
|
|
Move the content of <TT>$elt2</TT> within the element
|
|
<DT id="371">insert ($tag1, [$optional_atts1], $tag2, [$optional_atts2],...)<DD>
|
|
|
|
|
|
|
|
|
|
For each tag in the list inserts an element <TT>$tag</TT> as the only child of the
|
|
element. The element gets the optional attributes in <TT>"$optional_atts<n>"</TT>.
|
|
All children of the element are set as children of the new element.
|
|
The upper level element is returned.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$p->insert( table => { border=> 1}, 'tr', 'td')
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
puts the content of <TT>$p</TT> in a table with a visible border, a single <TT>"tr"</TT> and a single <TT>"td"</TT>
|
|
and returns the <TT>"table"</TT> element:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<p><table border="1"><tr><td>original content of p</td></tr></table></p>
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="372">wrap_in (@tag)<DD>
|
|
|
|
|
|
Wrap the element in new elements whose tags are given in <TT>@tag</TT>, each one becoming the parent of the previous one, and return the
|
|
outermost new element.
|
|
<TT>"$elt->wrap_in( 'td', 'tr', 'table')"</TT> wraps the element as a single cell in a
|
|
table for example.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Optionally each tag can be followed by a hashref of attributes, that will be
|
|
set on the wrapping element:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$elt->wrap_in( p => { class => "advisory" }, div => { class => "intro", id => "div_intro" });
|
|
|
|
</PRE>
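<P>
A minimal sketch of <TT>"wrap_in"</TT>, assuming a made-up document and a hypothetical <TT>"grid"</TT> class on the outermost wrapper.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: a paragraph to be turned into a one-cell table.
my $twig = XML::Twig->new;
$twig->parse('<doc><p>cell</p></doc>');
my $p = $twig->root->first_child('p');

# Tags are listed from the innermost wrapper to the outermost one;
# a hashref after a tag sets attributes on that wrapper.
my $table = $p->wrap_in( 'td', 'tr', table => { class => 'grid' } );

my $xml = $twig->root->sprint;
print $table->tag, "\n$xml\n";
```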
|
|
|
|
|
|
<DT id="373">insert_new_elt ($opt_position, $tag, $opt_atts_hashref, @opt_content)<DD>
|
|
|
|
|
|
|
|
|
|
Combines a <TT>"new"</TT> and a <TT>"paste"</TT>: creates a new element using
|
|
<TT>$tag</TT>, <TT>$opt_atts_hashref</TT> and <TT>@opt_content</TT> which are arguments similar
|
|
to those for <TT>"new"</TT>, then paste it, using <TT>$opt_position</TT> or <TT>'first_child'</TT>,
|
|
relative to <TT>$elt</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Return the newly created element
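<P>
For example, <TT>"insert_new_elt"</TT> might be used as in this sketch, with and without an explicit position. The tags and the <TT>"note"</TT> class are illustrative.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input.
my $twig = XML::Twig->new;
$twig->parse('<doc><p>first</p></doc>');
my $doc = $twig->root;

# Explicit position, attributes and text content.
my $note = $doc->insert_new_elt( last_child => 'p', { class => 'note' }, 'added last' );

# No position given: the new element is pasted as first child.
my $title = $doc->insert_new_elt( 'h1', 'Title' );

my $xml = $doc->sprint;
print "$xml\n";
```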
|
|
<DT id="374">erase<DD>
|
|
|
|
|
|
Erase the element: the element is deleted and all of its children are
|
|
pasted in its place.
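<P>
A short sketch of <TT>"erase"</TT>, removing inline markup from a made-up paragraph while keeping its text.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input.
my $twig = XML::Twig->new;
$twig->parse('<p>some <b>bold</b> text</p>');

# The <b> element goes away, its children take its place.
$twig->root->first_child('b')->erase;

my $xml = $twig->root->sprint;
print "$xml\n";
```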
|
|
<DT id="375">set_content ( $optional_atts, @list_of_elt_and_strings) ( $optional_atts, '#EMPTY')<DD>
|
|
|
|
|
|
|
|
|
|
Set the content for the element, from a list of strings and
|
|
elements. Cuts all the element children, then pastes the list
|
|
elements as the children. This method will create a <TT>"PCDATA"</TT> element
|
|
for any strings in the list.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The <TT>$optional_atts</TT> argument is the ref of a hash of attributes. If this
|
|
argument is used then the previous attributes are deleted, otherwise they
|
|
are left untouched.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B></B><FONT SIZE="-1"><B>WARNING</B></FONT><B></B>: if you rely on <FONT SIZE="-1">ID</FONT>'s then you will have to set the id yourself. At
|
|
this point the element does not belong to a twig yet, so the <FONT SIZE="-1">ID</FONT> attribute
|
|
is not known so it won't be stored in the <FONT SIZE="-1">ID</FONT> list.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
A content of '<TT>"#EMPTY"</TT>' creates an empty element.
|
|
<DT id="376">namespace ($optional_prefix)<DD>
|
|
|
|
|
|
Return the <FONT SIZE="-1">URI</FONT> of the namespace that <TT>$optional_prefix</TT> or the element name
|
|
belongs to. If the name doesn't belong to any namespace, <TT>"undef"</TT> is returned.
|
|
<DT id="377">local_name<DD>
|
|
|
|
|
|
Return the local name (without the prefix) for the element
|
|
<DT id="378">ns_prefix<DD>
|
|
|
|
|
|
Return the namespace prefix for the element
|
|
<DT id="379">current_ns_prefixes<DD>
|
|
|
|
|
|
Return a list of namespace prefixes valid for the element. The order of the
|
|
prefixes in the list has no meaning. If the default namespace is currently
|
|
bound, '' appears in the list.
|
|
<DT id="380">inherit_att ($att, @optional_tag_list)<DD>
|
|
|
|
|
|
|
|
|
|
Return the value of an attribute inherited from parent tags. The value
|
|
returned is found by looking for the attribute in the element then in turn
|
|
in each of its ancestors. If the <TT>@optional_tag_list</TT> is supplied only those
|
|
ancestors whose tag is in the list will be checked.
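<P>
A minimal sketch of <TT>"inherit_att"</TT>, assuming a made-up document where a <TT>"lang"</TT> attribute is set at two levels.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: lang is set on the root and overridden on a section.
my $twig = XML::Twig->new;
$twig->parse('<doc lang="en"><sec lang="fr"><p>bonjour</p></sec></doc>');
my $p = $twig->root->first_child('sec')->first_child('p');

# Nearest value: <p> has none, so the one on <sec> is used.
my $nearest = $p->inherit_att('lang');

# Restricted to elements whose tag is "doc".
my $from_doc = $p->inherit_att( 'lang', 'doc' );

print "$nearest $from_doc\n";
```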
|
|
<DT id="381">all_children_are ($optional_condition)<DD>
|
|
|
|
|
|
Return 1 if all children of the element pass the <TT>$optional_condition</TT>,
|
|
0 otherwise
|
|
<DT id="382">level ($optional_condition)<DD>
|
|
|
|
|
|
Return the depth of the element in the twig (root is 0).
|
|
If <TT>$optional_condition</TT> is given then only ancestors that match the condition are
|
|
counted.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<B></B><FONT SIZE="-1"><B>WARNING</B></FONT><B></B>: in a tree created using the <TT>"twig_roots"</TT> option this will not return
|
|
the level in the document tree, level 0 will be the document root, level 1
|
|
will be the <TT>"twig_roots"</TT> elements. During the parsing (in a <TT>"twig_handler"</TT>)
|
|
you can use the <TT>"depth"</TT> method on the twig object to get the real parsing depth.
|
|
<DT id="383">in ($potential_parent)<DD>
|
|
|
|
|
|
Return true if the element is in the potential_parent (<TT>$potential_parent</TT> is
|
|
an element)
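<P>
The following sketch illustrates <TT>"level"</TT> and <TT>"in"</TT> on a small made-up tree parsed without <TT>"twig_roots"</TT>.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input.
my $twig = XML::Twig->new;
$twig->parse('<doc><sec><p>text</p></sec></doc>');
my $sec = $twig->root->first_child('sec');
my $p   = $sec->first_child('p');

my $depth     = $p->level;          # ancestors: sec and doc
my $sec_depth = $p->level('sec');   # only ancestors matching 'sec' are counted
my $inside    = $p->in($sec)  ? 1 : 0;
my $reverse   = $sec->in($p)  ? 1 : 0;

print "$depth $sec_depth $inside $reverse\n";
```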
|
|
<DT id="384">in_context ($cond, $optional_level)<DD>
|
|
|
|
|
|
|
|
|
|
Return true if the element is included in an element which passes <TT>$cond</TT>
|
|
optionally within <TT>$optional_level</TT> levels. The returned value is the
|
|
including element.
|
|
<DT id="385">pcdata<DD>
|
|
|
|
|
|
Return the text of a <TT>"PCDATA"</TT> element or <TT>"undef"</TT> if the element is not
|
|
<TT>"PCDATA"</TT>.
|
|
<DT id="386">pcdata_xml_string<DD>
|
|
|
|
|
|
Return the text of a <TT>"PCDATA"</TT> element or undef if the element is not <TT>"PCDATA"</TT>.
|
|
The text is ``XML-escaped'' ('&' and '<' are replaced by '&amp;' and '&lt;')
|
|
<DT id="387">set_pcdata ($text)<DD>
|
|
|
|
|
|
Set the text of a <TT>"PCDATA"</TT> element. This method does not check that the element is
|
|
indeed a <TT>"PCDATA"</TT> so usually you should use <TT>"set_text"</TT> instead.
|
|
<DT id="388">append_pcdata ($text)<DD>
|
|
|
|
|
|
Add the text at the end of a <TT>"PCDATA"</TT> element.
|
|
<DT id="389">is_cdata<DD>
|
|
|
|
|
|
Return 1 if the element is a <TT>"CDATA"</TT> element, returns 0 otherwise.
|
|
<DT id="390">is_text<DD>
|
|
|
|
|
|
Return 1 if the element is a <TT>"CDATA"</TT> or <TT>"PCDATA"</TT> element, returns 0 otherwise.
|
|
<DT id="391">cdata<DD>
|
|
|
|
|
|
Return the text of a <TT>"CDATA"</TT> element or <TT>"undef"</TT> if the element is not
|
|
<TT>"CDATA"</TT>.
|
|
<DT id="392">cdata_string<DD>
|
|
|
|
|
|
Return the <FONT SIZE="-1">XML</FONT> string of a <TT>"CDATA"</TT> element, including the opening and
|
|
closing markers.
|
|
<DT id="393">set_cdata ($text)<DD>
|
|
|
|
|
|
Set the text of a <TT>"CDATA"</TT> element.
|
|
<DT id="394">append_cdata ($text)<DD>
|
|
|
|
|
|
Add the text at the end of a <TT>"CDATA"</TT> element.
|
|
<DT id="395">remove_cdata<DD>
|
|
|
|
|
|
Turns all <TT>"CDATA"</TT> sections in the element into regular <TT>"PCDATA"</TT> elements. This is useful
|
|
when converting <FONT SIZE="-1">XML</FONT> to <FONT SIZE="-1">HTML,</FONT> as browsers do not support <FONT SIZE="-1">CDATA</FONT> sections.
|
|
<DT id="396">extra_data<DD>
|
|
|
|
|
|
Return the extra_data (comments and <FONT SIZE="-1">PI</FONT>'s) attached to an element
|
|
<DT id="397">set_extra_data ($extra_data)<DD>
|
|
|
|
|
|
Set the extra_data (comments and <FONT SIZE="-1">PI</FONT>'s) attached to an element
|
|
<DT id="398">append_extra_data ($extra_data)<DD>
|
|
|
|
|
|
Append extra_data to the existing extra_data before the element (if no
|
|
previous extra_data exists then it is created)
|
|
<DT id="399">set_asis<DD>
|
|
|
|
|
|
Set a property of the element that causes it to be output without being <FONT SIZE="-1">XML</FONT>
|
|
escaped by the print functions: if it contains <TT>"a < b"</TT> it will be output
|
|
as such and not as <TT>"a &lt; b"</TT>. This can be useful to create text elements
|
|
that will be output as markup. Note that all <TT>"PCDATA"</TT> descendants of the
|
|
element are also marked as having the property (they are the ones that are
|
|
actually impacted by the change).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the element is a <TT>"CDATA"</TT> element it will also be output asis, without the
|
|
<TT>"CDATA"</TT> markers. The same goes for any <TT>"CDATA"</TT> descendant of the element
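<P>
A minimal sketch of the effect of <TT>"set_asis"</TT> on output, using two elements created with made-up content.

```perl
use strict;
use warnings;
use XML::Twig;

# Two elements whose text looks like markup (hypothetical content).
my $escaped = XML::Twig::Elt->new( p => 'a < b' );
my $raw     = XML::Twig::Elt->new( p => '<b>bold</b> text' );

# Only $raw is marked: its text is output without XML escaping.
$raw->set_asis;

my $escaped_xml = $escaped->sprint;
my $raw_xml     = $raw->sprint;
print "$escaped_xml\n$raw_xml\n";
```

Since no check is made on the text, an element marked this way can make the output ill-formed.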
|
|
<DT id="400">set_not_asis<DD>
|
|
|
|
|
|
Unsets the <TT>"asis"</TT> property for the element and its text descendants.
|
|
<DT id="401">is_asis<DD>
|
|
|
|
|
|
Return the <TT>"asis"</TT> property status of the element ( 1 or <TT>"undef"</TT>)
|
|
<DT id="402">closed<DD>
|
|
|
|
|
|
Return true if the element has been closed. Might be useful if you are
|
|
somewhere in the tree, during the parse, and have no idea whether a parent
|
|
element is completely loaded or not.
|
|
<DT id="403">get_type<DD>
|
|
|
|
|
|
Return the type of the element: '<TT>"#ELT"</TT>' for ``real'' elements, or '<TT>"#PCDATA"</TT>',
|
|
'<TT>"#CDATA"</TT>', '<TT>"#COMMENT"</TT>', '<TT>"#ENT"</TT>', '<TT>"#PI"</TT>'
|
|
<DT id="404">is_elt<DD>
|
|
|
|
|
|
Return the tag if the element is a ``real'' element, or 0 if it is <TT>"PCDATA"</TT>,
|
|
<TT>"CDATA"</TT>...
|
|
<DT id="405">contains_only_text<DD>
|
|
|
|
|
|
Return 1 if the element does not contain any other ``real'' element
|
|
<DT id="406">contains_only ($exp)<DD>
|
|
|
|
|
|
Return the list of children if all children of the element match
|
|
the expression <TT>$exp</TT>
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
if( $para->contains_only( 'tt')) { ... }
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="407">contains_a_single ($exp)<DD>
|
|
|
|
|
|
If the element contains a single child that matches the expression <TT>$exp</TT>
|
|
returns that element. Otherwise returns 0.
|
|
<DT id="408">is_field<DD>
|
|
|
|
|
|
same as <TT>"contains_only_text"</TT>
|
|
<DT id="409">is_pcdata<DD>
|
|
|
|
|
|
Return 1 if the element is a <TT>"PCDATA"</TT> element, returns 0 otherwise.
|
|
<DT id="410">is_ent<DD>
|
|
|
|
|
|
Return 1 if the element is an entity (an unexpanded entity) element,
|
|
return 0 otherwise.
|
|
<DT id="411">is_empty<DD>
|
|
|
|
|
|
Return 1 if the element is empty, 0 otherwise
|
|
<DT id="412">set_empty<DD>
|
|
|
|
|
|
Flags the element as empty. No further check is made, so if the element
|
|
is actually not empty the output will be messed up. The only effect of this
|
|
method is that the output will be <TT>"<tag att="value"/>"</TT>.
|
|
<DT id="413">set_not_empty<DD>
|
|
|
|
|
|
Flags the element as not empty. If it is actually empty then the element will
|
|
be output as <TT>"<tag att="value"></tag>"</TT>.
|
|
<DT id="414">is_pi<DD>
|
|
|
|
|
|
Return 1 if the element is a processing instruction (<TT>"#PI"</TT>) element,
|
|
return 0 otherwise.
|
|
<DT id="415">target<DD>
|
|
|
|
|
|
Return the target of a processing instruction
|
|
<DT id="416">set_target ($target)<DD>
|
|
|
|
|
|
Set the target of a processing instruction
|
|
<DT id="417">data<DD>
|
|
|
|
|
|
Return the data part of a processing instruction
|
|
<DT id="418">set_data ($data)<DD>
|
|
|
|
|
|
Set the data of a processing instruction
|
|
<DT id="419">set_pi ($target, $data)<DD>
|
|
|
|
|
|
|
|
|
|
Set the target and data of a processing instruction
|
|
<DT id="420">pi_string<DD>
|
|
|
|
|
|
Return the string form of a processing instruction
|
|
(<TT>"<?target data?>"</TT>)
|
|
<DT id="421">is_comment<DD>
|
|
|
|
|
|
Return 1 if the element is a comment (<TT>"#COMMENT"</TT>) element,
|
|
return 0 otherwise.
|
|
<DT id="422">set_comment ($comment_text)<DD>
|
|
|
|
|
|
Set the text for a comment
|
|
<DT id="423">comment<DD>
|
|
|
|
|
|
Return the content of a comment (just the text, not the <TT>"<!--"</TT>
|
|
and <TT>"-->"</TT>)
|
|
<DT id="424">comment_string<DD>
|
|
|
|
|
|
Return the <FONT SIZE="-1">XML</FONT> string for a comment (<TT>"<!-- comment -->"</TT>)
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that an <FONT SIZE="-1">XML</FONT> comment cannot start or end with a '-', or include '--'
|
|
(<A HREF="http://www.w3.org/TR/2008/REC-xml-20081126/#sec-comments">http://www.w3.org/TR/2008/REC-xml-20081126/#sec-comments</A>),
|
|
if that is the case (because you have created the comment yourself presumably,
|
|
as it could not be in the input <FONT SIZE="-1">XML</FONT>), then a space will be inserted before
|
|
an initial '-', after a trailing one or between two '-' in the comment
|
|
(which could presumably mangle javascript ``hidden'' in an <FONT SIZE="-1">XHTML</FONT> comment);
|
|
<DT id="425">set_ent ($entity)<DD>
|
|
|
|
|
|
Set a (non-expanded) entity (<TT>"#ENT"</TT>). <TT>$entity</TT> is the entity
|
|
text (<TT>"&ent;"</TT>)
|
|
<DT id="426">ent<DD>
|
|
|
|
|
|
Return the entity for an entity (<TT>"#ENT"</TT>) element (<TT>"&ent;"</TT>)
|
|
<DT id="427">ent_name<DD>
|
|
|
|
|
|
Return the entity name for an entity (<TT>"#ENT"</TT>) element (<TT>"ent"</TT>)
|
|
<DT id="428">ent_string<DD>
|
|
|
|
|
|
Return the entity, either expanded if the expanded version is available,
|
|
or non-expanded (<TT>"&ent;"</TT>) otherwise
|
|
<DT id="429">child ($offset, $optional_condition)<DD>
|
|
|
|
|
|
|
|
|
|
Return the <TT>$offset</TT>-th child of the element, optionally the <TT>$offset</TT>-th
|
|
child that matches <TT>$optional_condition</TT>. The children are treated as a list, so
|
|
<TT>"$elt->child( 0)"</TT> is the first child, while <TT>"$elt->child( -1)"</TT> is
|
|
the last child.
|
|
<DT id="430">child_text ($offset, $optional_condition)<DD>
|
|
|
|
|
|
|
|
|
|
Return the text of a child or <TT>"undef"</TT> if the child does not exist. Arguments
|
|
are the same as child.
|
|
<DT id="431">last_child ($optional_condition)<DD>
|
|
|
|
|
|
Return the last child of the element, or the last child matching
|
|
<TT>$optional_condition</TT> (ie the last of the element children matching
|
|
the condition).
|
|
<DT id="432">last_child_text ($optional_condition)<DD>
|
|
|
|
|
|
Same as <TT>"first_child_text"</TT> but for the last child.
|
|
<DT id="433">sibling ($offset, $optional_condition)<DD>
|
|
|
|
|
|
|
|
|
|
Return the next or previous <TT>$offset</TT>-th sibling of the element, or the
|
|
<TT>$offset</TT>-th one matching <TT>$optional_condition</TT>. If <TT>$offset</TT> is negative then a
|
|
previous sibling is returned, if <TT>$offset</TT> is positive then a next sibling is
|
|
returned. <TT>"$offset=0"</TT> returns the element if there is no condition or
|
|
if the element matches the condition, <TT>"undef"</TT> otherwise.
|
|
<DT id="434">sibling_text ($offset, $optional_condition)<DD>
|
|
|
|
|
|
|
|
|
|
Return the text of a sibling or <TT>"undef"</TT> if the sibling does not exist.
|
|
Arguments are the same as <TT>"sibling"</TT>.
|
|
<DT id="435">prev_siblings ($optional_condition)<DD>
|
|
|
|
|
|
Return the list of previous siblings (optionally matching <TT>$optional_condition</TT>)
|
|
for the element. The elements are ordered in document order.
|
|
<DT id="436">next_siblings ($optional_condition)<DD>
|
|
|
|
|
|
Return the list of siblings (optionally matching <TT>$optional_condition</TT>)
|
|
following the element. The elements are ordered in document order.
|
|
<DT id="437">siblings ($optional_condition)<DD>
|
|
|
|
|
|
Return the list of siblings (optionally matching <TT>$optional_condition</TT>)
|
|
of the element (excluding the element itself). The elements are ordered
|
|
in document order.
|
|
<DT id="438">pos ($optional_condition)<DD>
|
|
|
|
|
|
Return the position of the element in the children list. The first child has a
|
|
position of 1 (as in XPath).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If the <TT>$optional_condition</TT> is given then only siblings that match the condition
|
|
are counted. If the element itself does not match the condition then
|
|
0 is returned.
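<P>
The offset-based navigation methods (<TT>"child"</TT>, <TT>"sibling"</TT>, <TT>"pos"</TT>) can be sketched as follows on a made-up list. Note that <TT>"child"</TT> offsets start at 0 while <TT>"pos"</TT> starts at 1.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: three items and one note.
my $twig = XML::Twig->new;
$twig->parse('<list><item>a</item><note>n</note><item>b</item><item>c</item></list>');
my $list = $twig->root;

my $first  = $list->child(0)->text;            # first child
my $last   = $list->child(-1)->text;           # last child
my $second = $list->child( 1, 'item' );        # second child matching 'item'

my $next      = $second->sibling(1)->text;            # next sibling
my $prev      = $second->sibling(-1)->text;           # previous sibling
my $prev_item = $second->sibling( -1, 'item' )->text; # previous 'item' sibling

my $pos      = $second->pos;           # position among all children
my $item_pos = $second->pos('item');   # position among 'item' children

print "$first $last ", $second->text, " $next $prev $prev_item $pos $item_pos\n";
```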
|
|
<DT id="439">atts<DD>
|
|
|
|
|
|
Return a hash ref containing the element attributes
|
|
<DT id="440">set_atts ({ att1=>$att1_val, att2=> $att2_val... })<DD>
|
|
|
|
|
|
|
|
|
|
Set the element attributes with the hash ref supplied as the argument. The previous
|
|
attributes are lost (ie the attributes set by <TT>"set_atts"</TT> replace all of the
|
|
attributes of the element).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
You can also pass a list instead of a hashref: <TT>"$elt->set_atts( att1 => 'val1',...)"</TT>
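<P>
A short sketch of the attribute-level methods, assuming a made-up element; <TT>"att_names"</TT> is sorted here because the order of the returned names should not be relied upon.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input.
my $twig = XML::Twig->new;
$twig->parse('<p class="old" title="t">text</p>');
my $p = $twig->root;

# Replaces all existing attributes: "title" is lost.
$p->set_atts( { class => 'new', lang => 'en' } );

my @names = sort $p->att_names;
my $count = $p->att_nb;

# Remove every attribute.
$p->del_atts;
my $none = $p->has_no_atts ? 1 : 0;

print "@names ($count) $none\n";
```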
|
|
<DT id="441">del_atts<DD>
|
|
|
|
|
|
Deletes all the element attributes.
|
|
<DT id="442">att_nb<DD>
|
|
|
|
|
|
Return the number of attributes for the element
|
|
<DT id="443">has_atts<DD>
|
|
|
|
|
|
Return true if the element has attributes (in fact return the number of
|
|
attributes, thus being an alias to <TT>"att_nb"</TT>)
|
|
<DT id="444">has_no_atts<DD>
|
|
|
|
|
|
Return true if the element has no attributes, false (0) otherwise
|
|
<DT id="445">att_names<DD>
|
|
|
|
|
|
Return a list of the attribute names for the element
|
|
<DT id="446">att_xml_string ($att, $options)<DD>
|
|
|
|
|
|
|
|
|
|
Return the attribute value, where '&', '<' and quote (" or the value of the quote option
|
|
at twig creation) are XML-escaped.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The options are passed as a hashref, setting <TT>"escape_gt"</TT> to a true value will also escape
|
|
'>': <TT>"$elt->att_xml_string( 'myatt', { escape_gt => 1 })"</TT>.
|
|
<DT id="447">set_id ($id)<DD>
|
|
|
|
|
|
Set the <TT>"id"</TT> attribute of the element to the value.
|
|
See <TT>"elt_id "</TT> to change the id attribute name
|
|
<DT id="448">id<DD>
|
|
|
|
|
|
Gets the id attribute value
|
|
<DT id="449">del_id ($id)<DD>
|
|
|
|
|
|
Deletes the <TT>"id"</TT> attribute of the element and remove it from the id list
|
|
for the document
|
|
<DT id="450">class<DD>
|
|
|
|
|
|
Return the <TT>"class"</TT> attribute for the element (methods on the <TT>"class"</TT>
|
|
attribute are quite convenient when dealing with <FONT SIZE="-1">XHTML,</FONT> or plain <FONT SIZE="-1">XML</FONT> that
|
|
will eventually be displayed using <FONT SIZE="-1">CSS</FONT>)
|
|
<DT id="451">lclass<DD>
|
|
|
|
|
|
same as class, except that
|
|
this method is an lvalue, so you can do <TT>"$elt->lclass= "foo""</TT>
|
|
<DT id="452">set_class ($class)<DD>
|
|
|
|
|
|
Set the <TT>"class"</TT> attribute for the element to <TT>$class</TT>
|
|
<DT id="453">add_class ($class)<DD>
|
|
|
|
|
|
Add <TT>$class</TT> to the element <TT>"class"</TT> attribute: the new class is added
|
|
only if it is not already present.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that classes are then sorted alphabetically, so the <TT>"class"</TT> attribute
|
|
can be changed even if the class is already there
|
|
<DT id="454">remove_class ($class)<DD>
|
|
|
|
|
|
Remove <TT>$class</TT> from the element <TT>"class"</TT> attribute.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Note that classes are then sorted alphabetically, so the <TT>"class"</TT> attribute can be
|
|
changed even if the class is already there
|
|
<DT id="455">add_to_class ($class)<DD>
|
|
|
|
|
|
alias for add_class
|
|
<DT id="456">att_to_class ($att)<DD>
|
|
|
|
|
|
Set the <TT>"class"</TT> attribute to the value of attribute <TT>$att</TT>
|
|
<DT id="457">add_att_to_class ($att)<DD>
|
|
|
|
|
|
Add the value of attribute <TT>$att</TT> to the <TT>"class"</TT> attribute of the element
|
|
<DT id="458">move_att_to_class ($att)<DD>
|
|
|
|
|
|
Add the value of attribute <TT>$att</TT> to the <TT>"class"</TT> attribute of the element
|
|
and delete the attribute
|
|
<DT id="459">tag_to_class<DD>
|
|
|
|
|
|
Set the <TT>"class"</TT> attribute of the element to the element tag
|
|
<DT id="460">add_tag_to_class<DD>
|
|
|
|
|
|
Add the element tag to its <TT>"class"</TT> attribute
|
|
<DT id="461">set_tag_class ($new_tag)<DD>
|
|
|
|
|
|
Add the element tag to its <TT>"class"</TT> attribute and sets the tag to <TT>$new_tag</TT>
|
|
<DT id="462">in_class ($class)<DD>
|
|
|
|
|
|
Return true (<TT>1</TT>) if the element is in the class <TT>$class</TT> (if <TT>$class</TT> is
|
|
one of the tokens in the element <TT>"class"</TT> attribute)
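<P>
A minimal sketch of the <TT>"class"</TT> helpers, using made-up class names on an element created from scratch.

```perl
use strict;
use warnings;
use XML::Twig;

# A standalone element (hypothetical content).
my $elt = XML::Twig::Elt->new( p => 'text' );

# Adding the same class twice has no further effect.
$elt->add_class('note');
$elt->add_class('important');
$elt->add_class('note');
my $classes = $elt->class;

my $is_note = $elt->in_class('note') ? 1 : 0;

$elt->remove_class('note');
my $remaining = $elt->class;

print "$classes | $is_note | $remaining\n";
```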
|
|
<DT id="463">tag_to_span<DD>
|
|
|
|
|
|
Change the element tag to <TT>"span"</TT> and set its class to the old tag
|
|
<DT id="464">tag_to_div<DD>
|
|
|
|
|
|
Change the element tag to <TT>"div"</TT> and set its class to the old tag
|
|
<DT id="465"><FONT SIZE="-1">DESTROY</FONT><DD>
|
|
|
|
|
|
Frees the element from memory.
|
|
<DT id="466">start_tag<DD>
|
|
|
|
|
|
Return the string for the start tag for the element, including
|
|
the <TT>"/>"</TT> at the end of an empty element tag
|
|
<DT id="467">end_tag<DD>
|
|
|
|
|
|
Return the string for the end tag of an element. For an empty
|
|
element, this returns the empty string ('').
|
|
<DT id="468">xml_string @optional_options<DD>
|
|
|
|
|
|
|
|
|
|
Equivalent to <TT>"$elt->sprint( 1)"</TT>, returns the string for the entire
|
|
element, excluding the element's tags (but nested element tags are present)
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The '<TT>"no_recurse"</TT>' option will only return the text of the element, not
|
|
of any included sub-elements (same as <TT>"xml_text_only"</TT>).
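<P>
How <TT>"start_tag"</TT>, <TT>"xml_string"</TT>, <TT>"end_tag"</TT> and <TT>"sprint"</TT> relate can be sketched on a made-up element:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input.
my $twig = XML::Twig->new;
$twig->parse('<p class="x">a <b>b</b></p>');
my $p = $twig->root;

my $start = $p->start_tag;    # opening tag, with attributes
my $inner = $p->xml_string;   # content only, nested tags included
my $end   = $p->end_tag;      # closing tag
my $outer = $p->sprint;       # the whole element

print "$start | $inner | $end\n$outer\n";
```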
|
|
<DT id="469">inner_xml<DD>
|
|
|
|
|
|
Another synonym for xml_string
|
|
<DT id="470">outer_xml<DD>
|
|
|
|
|
|
Another synonym for sprint
|
|
<DT id="471">xml_text<DD>
|
|
|
|
|
|
Return the text of the element, encoded (and processed by the current
|
|
<TT>"output_filter"</TT> or <TT>"output_encoding"</TT> options), without any tag.
|
|
<DT id="472">xml_text_only<DD>
|
|
|
|
|
|
Same as <TT>"xml_text"</TT> except that the text returned doesn't include
|
|
the text of sub-elements.
|
|
<DT id="473">set_pretty_print ($style)<DD>
|
|
|
|
|
|
Set the pretty print method, amongst '<TT>"none"</TT>' (default), '<TT>"nsgmls"</TT>',
|
|
'<TT>"nice"</TT>', '<TT>"indented"</TT>', '<TT>"record"</TT>' and '<TT>"record_c"</TT>'
|
|
|
|
|
|
<P>
|
|
|
|
|
|
pretty_print styles:
|
|
<DL COMPACT><DT id="474"><DD>
|
|
<DL COMPACT>
|
|
<DT id="475">none<DD>
|
|
|
|
|
|
the default, no <TT>"\n"</TT> is used
|
|
<DT id="476">nsgmls<DD>
|
|
|
|
|
|
nsgmls style, with <TT>"\n"</TT> added within tags
|
|
<DT id="477">nice<DD>
|
|
|
|
|
|
adds <TT>"\n"</TT> wherever possible (<FONT SIZE="-1">NOT SAFE,</FONT> can lead to invalid <FONT SIZE="-1">XML</FONT>)
|
|
<DT id="478">indented<DD>
|
|
|
|
|
|
same as <TT>"nice"</TT> plus indents elements (<FONT SIZE="-1">NOT SAFE,</FONT> can lead to invalid <FONT SIZE="-1">XML</FONT>)
|
|
<DT id="479">record<DD>
|
|
|
|
|
|
table-oriented pretty print, one field per line
|
|
<DT id="480">record_c<DD>
|
|
|
|
|
|
table-oriented pretty print, more compact than <TT>"record"</TT>, one record per line
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="481"><DD>
|
|
</DL>
|
|
|
|
<DT id="482">set_empty_tag_style ($style)<DD>
|
|
|
|
|
|
Set the method to output empty tags, amongst '<TT>"normal"</TT>' (default), '<TT>"html"</TT>',
|
|
and '<TT>"expand"</TT>'.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>"normal"</TT> outputs an empty tag '<TT>"<tag/>"</TT>', <TT>"html"</TT> adds a space
|
|
'<TT>"<tag />"</TT>' for elements that can be empty in <FONT SIZE="-1">XHTML</FONT> and <TT>"expand"</TT> outputs
|
|
'<TT>"<tag></tag>"</TT>'
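<P>
To see the three styles side by side, something like the following should do. It is only a sketch: the one-element document is invented, and <TT>"br"</TT> was picked because it is one of the elements that can be empty in <FONT SIZE="-1">XHTML</FONT>.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document with a single empty element.
my $twig = XML::Twig->new;
$twig->parse('<doc><br/></doc>');
my $root = $twig->root;

my %output;    # style name => serialized document
for my $style (qw(normal html expand)) {
    $root->set_empty_tag_style($style);
    $output{$style} = $root->sprint;
    printf "%-7s %s\n", $style, $output{$style};
}

# Put the default back, as the setting is shared by all elements.
$root->set_empty_tag_style('normal');
```

Note that the style is not stored per element, hence the reset at the end.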
|
|
<DT id="483">set_remove_cdata ($flag)<DD>
|
|
|
|
|
|
set (or unset) the flag that forces the twig to output <FONT SIZE="-1">CDATA</FONT> sections as
|
|
regular (escaped) <FONT SIZE="-1">PCDATA</FONT>
|
|
<DT id="484">set_indent ($string)<DD>
|
|
|
|
|
|
Set the indentation for the indented pretty print style (default is 2 spaces)
|
|
<DT id="485">set_quote ($quote)<DD>
|
|
|
|
|
|
Set the quotes used for attributes. Can be '<TT>"double"</TT>' (default) or '<TT>"single"</TT>'
|
|
<DT id="486">cmp ($elt)<DD>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
Compare the order of the 2 elements in a twig.
|
|
|
|
$a is the <A>..</A> element, $b is the <B>...</B> element
|
|
|
|
document $a->cmp( $b)
|
|
<A> ... </A> ... <B> ... </B> -1
|
|
<A> ... <B> ... </B> ... </A> -1
|
|
<B> ... </B> ... <A> ... </A> 1
|
|
<B> ... <A> ... </A> ... </B> 1
|
|
$a == $b 0
|
|
$a and $b not in the same tree undef
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="487">before ($elt)<DD>
|
|
|
|
|
|
Return 1 if the element starts before <TT>$elt</TT>, 0 otherwise. If the 2 elements
|
|
are not in the same twig then return <TT>"undef"</TT>.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
if( $a->cmp( $b) == -1) { return 1; } else { return 0; }
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="488">after ($elt)<DD>
|
|
|
|
|
|
Return 1 if the element starts after <TT>$elt</TT>, 0 otherwise. If the 2 elements
are not in the same twig then return <TT>"undef"</TT>.
<P>
<PRE>
if( $a->cmp( $b) == 1) { return 1; } else { return 0; }
|
|
|
|
</PRE>
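<P>
A short sketch of the comparison methods on a made-up document, following the table given for <TT>"cmp"</TT>. The variable names are illustrative (and deliberately not <TT>$a</TT> and <TT>$b</TT>, which Perl reserves for <TT>"sort"</TT>).

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: <a> starts before <b>, both inside <doc>.
my $twig = XML::Twig->new;
$twig->parse('<doc><a>first</a><b>second</b></doc>');

my $root   = $twig->root;
my $first  = $root->first_child('a');
my $second = $root->first_child('b');

printf "first  cmp second: %d\n", $first->cmp($second);
printf "second cmp first:  %d\n", $second->cmp($first);
printf "first  cmp first:  %d\n", $first->cmp($first);
printf "root   cmp second: %d\n", $root->cmp($second);   # ancestor starts first

print "first is before second\n" if $first->before($second);
print "second is after first\n"  if $second->after($first);
```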
|
|
|
|
|
|
<DT id="489">other comparison methods<DD>
|
|
|
|
|
|
<DL COMPACT><DT id="490"><DD>
|
|
|
|
<DL COMPACT>
|
|
<DT id="491">lt<DD>
|
|
|
|
|
|
<DT id="492">le<DD>
|
|
|
|
|
|
<DT id="493">gt<DD>
|
|
|
|
|
|
<DT id="494">ge<DD>
|
|
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="495"><DD>
|
|
</DL>
|
|
|
|
<DT id="496">path<DD>
|
|
|
|
|
|
|
|
Return the element context in a form similar to XPath's short
|
|
form: '<TT>"/root/tag1/../tag"</TT>'
|
|
<DT id="497">xpath<DD>
|
|
|
|
|
|
Return a unique XPath expression that can be used to find the element
|
|
again.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
It looks like <TT>"/doc/sect[3]/title"</TT>: unique elements do not have an index,
|
|
the others do.
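<P>
The difference between <TT>"path"</TT> and <TT>"xpath"</TT> is probably easiest to see on a small example. This is a sketch on an invented document; feeding the result back to <TT>"get_xpath"</TT> is one way to check that the expression really finds the element again.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: three sections, only the last one has a title.
my $twig = XML::Twig->new;
$twig->parse('<doc><sect/><sect/><sect><title>Third</title></sect></doc>');

my $title = $twig->root->last_child('sect')->first_child('title');

print "path:  ", $title->path,  "\n";   # context only, no positions
print "xpath: ", $title->xpath, "\n";   # positions added where needed

# The xpath expression can be used to get back to the element.
my ($again) = $twig->get_xpath($title->xpath);
print "found again: ", $again->text, "\n" if $again;
```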
|
|
<DT id="498">flush<DD>
|
|
|
|
|
|
Flushes the twig up to the current element (strictly equivalent to
|
|
<TT>"$elt->root->flush"</TT>)
|
|
<DT id="499">private methods<DD>
|
|
|
|
|
|
Low-level methods on the twig:
|
|
<DL COMPACT><DT id="500"><DD>
|
|
<DL COMPACT>
|
|
<DT id="501">set_parent ($parent)<DD>
|
|
|
|
|
|
|
|
<DT id="502">set_first_child ($first_child)<DD>
|
|
|
|
|
|
<DT id="503">set_last_child ($last_child)<DD>
|
|
|
|
|
|
<DT id="504">set_prev_sibling ($prev_sibling)<DD>
|
|
|
|
|
|
<DT id="505">set_next_sibling ($next_sibling)<DD>
|
|
|
|
|
|
<DT id="506">set_twig_current<DD>
|
|
|
|
|
|
<DT id="507">del_twig_current<DD>
|
|
|
|
|
|
<DT id="508">twig_current<DD>
|
|
|
|
|
|
<DT id="509">contains_text<DD>
|
|
|
|
|
|
</DL>
|
|
</DL>
|
|
|
|
<DL COMPACT><DT id="510"><DD>
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Those methods should not be used, unless of course you find some creative
|
|
and interesting, not to mention useful, ways to do it.
|
|
</DL>
|
|
|
|
</DL>
|
|
<A NAME="lbAV"> </A>
|
|
<H3>cond</H3>
|
|
|
|
|
|
|
|
Most of the navigation functions accept a condition as an optional argument.
The first element (or all elements for <TT>"children"</TT> or
<TT>"ancestors"</TT>) that passes the condition is returned.
|
|
<P>
|
|
|
|
The condition is a single step of an XPath expression using the XPath subset
|
|
defined by <TT>"get_xpath"</TT>. In addition, the condition can be:
|
|
<DL COMPACT>
|
|
<DT id="511">#ELT<DD>
|
|
|
|
|
|
return a ``real'' element (not a <FONT SIZE="-1">PCDATA, CDATA,</FONT> comment or pi element)
|
|
<DT id="512">#TEXT<DD>
|
|
|
|
|
|
return a <FONT SIZE="-1">PCDATA</FONT> or <FONT SIZE="-1">CDATA</FONT> element
|
|
<DT id="513">regular expression<DD>
|
|
|
|
|
|
return an element whose tag matches the regexp. The regexp has to be created
|
|
with <TT>"qr//"</TT> (hence this is available only on perl 5.005 and above)
|
|
<DT id="514">code reference<DD>
|
|
|
|
|
|
applies the code, passing the current element as argument. If the code returns
true then the element is returned; if it returns false then the code is applied
to the next candidate.
|
|
</DL>
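<P>
The four kinds of conditions listed above can be sketched as follows. The document and the variable names are invented for the example; <TT>"first_child"</TT> and <TT>"children"</TT> stand in for any navigation method that takes a condition.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: text first, then a few elements.
my $twig = XML::Twig->new;
$twig->parse('<doc>intro<h1>One</h1><p>text</p><h2>Two</h2></doc>');
my $root = $twig->root;

# '#ELT' skips the leading text, '#TEXT' selects it.
my $first_elt  = $root->first_child('#ELT');
my $first_text = $root->first_child('#TEXT');

# A regexp (created with qr//) is matched against the tag.
my @headings = $root->children(qr/^h\d$/);

# A code reference receives each candidate as its argument.
my $long = $root->first_child(
    sub { $_[0]->is_elt && length($_[0]->text) > 3 });

print "first element: ", $first_elt->tag,   "\n";
print "first text:    ", $first_text->text, "\n";
print "headings:      ", join(',', map { $_->tag } @headings), "\n";
print "long element:  ", $long->tag, "\n";
```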
|
|
<A NAME="lbAW"> </A>
|
|
<H3>XML::Twig::XPath</H3>
|
|
|
|
|
|
|
|
XML::Twig implements a subset of XPath through the <TT>"get_xpath"</TT> method.
|
|
<P>
|
|
|
|
If you want to use the whole XPath power, then you can use <TT>"XML::Twig::XPath"</TT>
|
|
instead. In this case <TT>"XML::Twig"</TT> uses <TT>"XML::XPath"</TT> to execute XPath queries.
|
|
You will of course need <TT>"XML::XPath"</TT> installed to be able to use <TT>"XML::Twig::XPath"</TT>.
|
|
<P>
|
|
|
|
See XML::XPath for more information.
|
|
<P>
|
|
|
|
The methods you can use are:
|
|
<DL COMPACT>
|
|
<DT id="515">findnodes ($path)<DD>
|
|
|
|
|
|
return a list of nodes found by <TT>$path</TT>.
|
|
<DT id="516">findnodes_as_string ($path)<DD>
|
|
|
|
|
|
return the nodes found reproduced as <FONT SIZE="-1">XML.</FONT> The result is not guaranteed
|
|
to be valid <FONT SIZE="-1">XML</FONT> though.
|
|
<DT id="517">findvalue ($path)<DD>
|
|
|
|
|
|
return the concatenation of the text content of the result nodes
|
|
</DL>
|
|
<P>
|
|
|
|
In order for <TT>"XML::XPath"</TT> to be used as the XPath engine the following methods
|
|
are included in <TT>"XML::Twig"</TT>:
|
|
<P>
|
|
|
|
in XML::Twig
|
|
<DL COMPACT>
|
|
<DT id="518">getRootNode<DD>
|
|
|
|
|
|
|
|
<DT id="519">getParentNode<DD>
|
|
|
|
|
|
<DT id="520">getChildNodes<DD>
|
|
|
|
|
|
|
|
</DL>
|
|
<P>
|
|
|
|
in XML::Twig::Elt
|
|
<DL COMPACT>
|
|
<DT id="521">string_value<DD>
|
|
|
|
|
|
|
|
<DT id="522">toString<DD>
|
|
|
|
|
|
<DT id="523">getName<DD>
|
|
|
|
|
|
<DT id="524">getRootNode<DD>
|
|
|
|
|
|
<DT id="525">getNextSibling<DD>
|
|
|
|
|
|
<DT id="526">getPreviousSibling<DD>
|
|
|
|
|
|
<DT id="527">isElementNode<DD>
|
|
|
|
|
|
<DT id="528">isTextNode<DD>
|
|
|
|
|
|
<DT id="529">isPI<DD>
|
|
|
|
|
|
<DT id="530">isPINode<DD>
|
|
|
|
|
|
<DT id="531">isProcessingInstructionNode<DD>
|
|
|
|
|
|
<DT id="532">isComment<DD>
|
|
|
|
|
|
<DT id="533">isCommentNode<DD>
|
|
|
|
|
|
<DT id="534">getTarget<DD>
|
|
|
|
|
|
<DT id="535">getChildNodes<DD>
|
|
|
|
|
|
<DT id="536">getElementById<DD>
|
|
|
|
|
|
|
|
</DL>
|
|
<A NAME="lbAX"> </A>
|
|
<H3>XML::Twig::XPath::Elt</H3>
|
|
|
|
|
|
|
|
The methods you can use are the same as on <TT>"XML::Twig::XPath"</TT> elements:
|
|
<DL COMPACT>
|
|
<DT id="537">findnodes ($path)<DD>
|
|
|
|
|
|
return a list of nodes found by <TT>$path</TT>.
|
|
<DT id="538">findnodes_as_string ($path)<DD>
|
|
|
|
|
|
return the nodes found reproduced as <FONT SIZE="-1">XML.</FONT> The result is not guaranteed
|
|
to be valid <FONT SIZE="-1">XML</FONT> though.
|
|
<DT id="539">findvalue ($path)<DD>
|
|
|
|
|
|
return the concatenation of the text content of the result nodes
|
|
</DL>
|
|
<A NAME="lbAY"> </A>
|
|
<H3>XML::Twig::Entity_list</H3>
|
|
|
|
|
|
|
|
<DL COMPACT>
|
|
<DT id="540">new<DD>
|
|
|
|
|
|
Create an entity list.
|
|
<DT id="541">add ($ent)<DD>
|
|
|
|
|
|
Add an entity to an entity list.
|
|
<DT id="542">add_new_ent ($name, $val, $sysid, $pubid, $ndata, $param)<DD>
|
|
|
|
|
|
|
|
|
|
Create a new entity and add it to the entity list
|
|
<DT id="543">delete ($ent or $tag)<DD>
|
|
|
|
|
|
|
|
|
|
Delete an entity (defined by its name or by the Entity object)
|
|
from the list.
|
|
<DT id="544">print ($optional_filehandle)<DD>
|
|
|
|
|
|
Print the entity list.
|
|
<DT id="545">list<DD>
|
|
|
|
|
|
Return the list as an array
|
|
</DL>
|
|
<A NAME="lbAZ"> </A>
|
|
<H3>XML::Twig::Entity</H3>
|
|
|
|
|
|
|
|
<DL COMPACT>
|
|
<DT id="546">new ($name, $val, $sysid, $pubid, $ndata, $param)<DD>
|
|
|
|
|
|
|
|
|
|
Same arguments as the Entity handler for XML::Parser.
|
|
<DT id="547">print ($optional_filehandle)<DD>
|
|
|
|
|
|
Print an entity declaration.
|
|
<DT id="548">name<DD>
|
|
|
|
|
|
Return the name of the entity
|
|
<DT id="549">val<DD>
|
|
|
|
|
|
Return the value of the entity
|
|
<DT id="550">sysid<DD>
|
|
|
|
|
|
Return the system id for the entity (for <FONT SIZE="-1">NDATA</FONT> entities)
|
|
<DT id="551">pubid<DD>
|
|
|
|
|
|
Return the public id for the entity (for <FONT SIZE="-1">NDATA</FONT> entities)
|
|
<DT id="552">ndata<DD>
|
|
|
|
|
|
Return true if the entity is an <FONT SIZE="-1">NDATA</FONT> entity
|
|
<DT id="553">param<DD>
|
|
|
|
|
|
Return true if the entity is a parameter entity
|
|
<DT id="554">text<DD>
|
|
|
|
|
|
Return the entity declaration text.
|
|
</DL>
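<P>
A small sketch of how the two entity classes fit together, assuming a document with an internal <FONT SIZE="-1">DTD</FONT>. The entity name and value are made up, and the list is reached through the <TT>"entity_list"</TT> method of the twig.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: one general entity declared in the internal subset.
my $twig = XML::Twig->new;
$twig->parse('<!DOCTYPE doc [ <!ENTITY author "A. Writer"> ]><doc>text</doc>');

# entity_list returns an XML::Twig::Entity_list, list returns its entities.
my @entities = $twig->entity_list->list;

for my $ent (@entities) {
    printf "%s => %s\n", $ent->name, $ent->val;
    print  "declaration: ", $ent->text, "\n";
}
```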
|
|
<A NAME="lbBA"> </A>
|
|
<H2>EXAMPLES</H2>
|
|
|
|
|
|
|
|
Additional examples (and a complete tutorial) can be found on the
|
|
<I>XML::Twig Page<<A HREF="http://www.xmltwig.org/xmltwig/">http://www.xmltwig.org/xmltwig/</A>></I>
|
|
<P>
|
|
|
|
To figure out what <TT>"flush"</TT> does, call the following script with an
<FONT SIZE="-1">XML</FONT> file and an element name as arguments:
|
|
<P>
|
|
|
|
|
|
|
|
<PRE>
|
|
use XML::Twig;
|
|
|
|
my ($file, $elt)= @ARGV;
|
|
my $t= XML::Twig->new( twig_handlers =>
|
|
{ $elt => sub {$_[0]->flush; print "\n[flushed here]\n";} });
|
|
$t->parsefile( $file, ErrorContext => 2);
|
|
$t->flush;
|
|
print "\n";
|
|
|
|
</PRE>
|
|
|
|
|
|
<A NAME="lbBB"> </A>
|
|
<H2>NOTES</H2>
|
|
|
|
|
|
|
|
<A NAME="lbBC"> </A>
|
|
<H3>Subclassing XML::Twig</H3>
|
|
|
|
|
|
|
|
Useful methods:
|
|
<DL COMPACT>
|
|
<DT id="555">elt_class<DD>
|
|
|
|
|
|
In order to subclass <TT>"XML::Twig"</TT> you will probably also need to subclass
<TT>"XML::Twig::Elt"</TT>. Use the <TT>"elt_class"</TT> option when you create the
<TT>"XML::Twig"</TT> object to get the elements created in a different class
(which should be a subclass of <TT>"XML::Twig::Elt"</TT>).
|
|
<DT id="556">add_options<DD>
|
|
|
|
|
|
If you inherit the <TT>"XML::Twig"</TT> <TT>"new"</TT> method but want to add more options to it,
you can use this method to prevent XML::Twig from issuing warnings for those
additional options.
|
|
</DL>
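<P>
The <TT>"elt_class"</TT> option can be sketched like this. The class <TT>"My::Elt"</TT> and its <TT>"shout"</TT> method are hypothetical, they only show that elements end up blessed into the subclass.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical element subclass adding one convenience method.
package My::Elt;
our @ISA = ('XML::Twig::Elt');

sub shout {
    my ($self) = @_;
    return uc $self->text;    # text of the element, upper-cased
}

package main;

# Ask the twig to create its elements in the subclass.
my $twig = XML::Twig->new(elt_class => 'My::Elt');
$twig->parse('<doc><a>hi</a></doc>');

my $elt = $twig->root->first_child('a');
print ref($elt), ": ", $elt->shout, "\n";
```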
|
|
<A NAME="lbBD"> </A>
|
|
<H3><FONT SIZE="-1">DTD</FONT> Handling</H3>
|
|
|
|
|
|
|
|
There are 3 possibilities here. They are:
|
|
<DL COMPACT>
|
|
<DT id="557">No <FONT SIZE="-1">DTD</FONT><DD>
|
|
|
|
|
|
No doctype, no <FONT SIZE="-1">DTD</FONT> information, no entity information, the world is simple...
|
|
<DT id="558">Internal <FONT SIZE="-1">DTD</FONT><DD>
|
|
|
|
|
|
The <FONT SIZE="-1">XML</FONT> document includes an internal <FONT SIZE="-1">DTD,</FONT> and maybe entity declarations.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If you use the load_DTD option when creating the twig the <FONT SIZE="-1">DTD</FONT> information and
|
|
the entity declarations can be accessed.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The <FONT SIZE="-1">DTD</FONT> and the entity declarations will be <TT>"flush"</TT>'ed (or <TT>"print"</TT>'ed) either
|
|
as is (if they have not been modified) or as reconstructed (poorly, comments
|
|
are lost, order is not kept, due to its content this <FONT SIZE="-1">DTD</FONT> should not be viewed
|
|
by anyone) if they have been modified. You can also modify them directly by
|
|
changing the <TT>"$twig->{twig_doctype}->{internal}"</TT> field (straight from
|
|
XML::Parser, see the <TT>"Doctype"</TT> handler doc)
|
|
<DT id="559">External <FONT SIZE="-1">DTD</FONT><DD>
|
|
|
|
|
|
The <FONT SIZE="-1">XML</FONT> document includes a reference to an external <FONT SIZE="-1">DTD,</FONT> and maybe entity
|
|
declarations.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If you use the <TT>"load_DTD"</TT> option when creating the twig the <FONT SIZE="-1">DTD</FONT> information and the
|
|
entity declarations can be accessed. The entity declarations will be
|
|
<TT>"flush"</TT>'ed (or <TT>"print"</TT>'ed) either as is (if they have not been modified) or
|
|
as reconstructed (badly, comments are lost, order is not kept).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
You can change the doctype through the <TT>"$twig->set_doctype"</TT> method and
|
|
print the <FONT SIZE="-1">DTD</FONT> through the <TT>"$twig->dtd_text"</TT> or <TT>"$twig->dtd_print"</TT>
methods.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If you need to modify the entity list this is probably the easiest way to do it.
|
|
</DL>
|
|
<A NAME="lbBE"> </A>
|
|
<H3>Flush</H3>
|
|
|
|
|
|
|
|
Remember that element handlers are called when the element is <FONT SIZE="-1">CLOSED,</FONT> so
|
|
if you have handlers for nested elements the inner handlers will be called
|
|
first. For example, this makes numbering nested sections (or clauses, or divs)
trickier than it would seem, as the titles in the inner sections are handled
before the outer sections.
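<P>
One possible way around this, sketched here rather than prescribed: do the numbering in <TT>"start_tag_handlers"</TT>, which are called when the element is opened, and only use the regular handler to leave the section. The <TT>"number"</TT> attribute and the <TT>@counter</TT> stack are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Twig;

# One counter per nesting depth; the last one is the current depth.
my @counter = (0);

my $twig = XML::Twig->new(
    # Called when <section> opens: outer sections come first here.
    start_tag_handlers => {
        section => sub {
            my ($t, $section) = @_;
            $counter[-1]++;
            $section->set_att(number => join('.', @counter));
            push @counter, 0;    # start counting sub-sections
        },
    },
    # Called when </section> closes: drop the sub-section counter.
    twig_handlers => {
        section => sub { pop @counter; },
    },
);

$twig->parse(
    '<doc><section><section/><section/></section><section/></doc>');

my @numbers = map { $_->att('number') } $twig->root->descendants('section');
print join(' ', @numbers), "\n";
```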
|
|
<A NAME="lbBF"> </A>
|
|
<H2>BUGS</H2>
|
|
|
|
|
|
|
|
<DL COMPACT>
|
|
<DT id="560">segfault during parsing<DD>
|
|
|
|
|
|
This happens when parsing huge documents, or lots of small ones, with a version
|
|
of Perl before 5.16.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
This is due to a bug in the way weak references are handled in Perl itself.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The fix is to upgrade to Perl 5.16 or later (<TT>"perlbrew"</TT> is a great
|
|
tool to manage several installations of perl on the same machine).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Another, <FONT SIZE="-1">NOT RECOMMENDED,</FONT> way of fixing the problem, is to switch off weak
|
|
references by writing <TT>"XML::Twig::_set_weakrefs( 0);"</TT> at the top of the code.
|
|
This is totally unsupported, and may lead to other problems though.
|
|
<DT id="561">entity handling<DD>
|
|
|
|
|
|
Due to XML::Parser behaviour, non-base entities in attribute values disappear if
|
|
they are not declared in the document:
|
|
<TT>"att="val&ent;""</TT> will be turned into <TT>"att => val"</TT>, unless you use the
|
|
<TT>"keep_encoding"</TT> argument to <TT>"XML::Twig->new"</TT>
|
|
<DT id="562"><FONT SIZE="-1">DTD</FONT> handling<DD>
|
|
|
|
|
|
The <FONT SIZE="-1">DTD</FONT> handling methods are quite buggy. No one uses them and
|
|
it seems very difficult to get them to work in all cases, including with
|
|
several slightly incompatible versions of XML::Parser and of libexpat.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Basically you can read the <FONT SIZE="-1">DTD,</FONT> output it back properly, and update entities,
|
|
but not much more.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
So use XML::Twig with standalone documents, or with documents referring to an
|
|
external <FONT SIZE="-1">DTD,</FONT> but don't expect it to properly parse and even output back the
|
|
<FONT SIZE="-1">DTD.</FONT>
|
|
<DT id="563">memory leak<DD>
|
|
|
|
|
|
If you use a <FONT SIZE="-1">REALLY</FONT> old Perl (5.005!) and
|
|
a lot of twigs you might find that you leak quite a lot of memory
|
|
(about 2KB per twig). You can use the <TT>"dispose"</TT> method to free
|
|
that memory after you are done.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
If you create elements the same thing might happen, use the <TT>"delete"</TT>
|
|
method to get rid of them.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Alternatively installing the <TT>"Scalar::Util"</TT> (or <TT>"WeakRef"</TT>) module on a version
|
|
of Perl that supports it (>5.6.0) will get rid of the memory leaks automagically.
|
|
<DT id="564"><FONT SIZE="-1">ID</FONT> list<DD>
|
|
|
|
|
|
The <FONT SIZE="-1">ID</FONT> list is <FONT SIZE="-1">NOT</FONT> updated when elements are cut or deleted.
|
|
<DT id="565">change_gi<DD>
|
|
|
|
|
|
This method will not function properly if you do:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$twig->change_gi( $old1, $new);
|
|
$twig->change_gi( $old2, $new);
|
|
$twig->change_gi( $new, $even_newer);
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="566">sanity check on XML::Parser method calls<DD>
|
|
|
|
|
|
XML::Twig should really prevent calls to some XML::Parser methods, especially
|
|
the <TT>"setHandlers"</TT> method.
|
|
<DT id="567">pretty printing<DD>
|
|
|
|
|
|
Pretty printing (at least using the '<TT>"indented"</TT>' style) is hard to get right!
|
|
Only elements that belong to the document will be properly indented. Printing
|
|
elements that do not belong to the twig makes it impossible for XML::Twig to
|
|
figure out their depth, and thus their indentation level.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
Also there is an unavoidable bug when using <TT>"flush"</TT> and pretty printing for
|
|
elements with mixed content that start with an embedded element:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
<elt><b>b</b>toto<b>bold</b></elt>
|
|
|
|
will be output as
|
|
|
|
<elt>
|
|
<b>b</b>toto<b>bold</b></elt>
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
if you flush the twig when you find the <TT>"<b>"</TT> element.
|
|
</DL>
|
|
<A NAME="lbBG"> </A>
|
|
<H2>Globals</H2>
|
|
|
|
|
|
|
|
These are the things that can mess up calling code, especially if threaded.
|
|
They might also cause problems under mod_perl.
|
|
<DL COMPACT>
|
|
<DT id="568">Exported constants<DD>
|
|
|
|
|
|
Whether you want them or not you get them! These are subroutines to use
|
|
as constants when creating or testing elements:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
PCDATA return '#PCDATA'
|
|
CDATA return '#CDATA'
|
|
PI return '#PI', I had the choice between PROC and PI :--(
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="569">Module scoped values: constants<DD>
|
|
|
|
|
|
these should cause no trouble:
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
%base_ent= ( '>' => '&gt;',
|
|
'<' => '&lt;',
|
|
'&' => '&amp;',
|
|
"'" => '&apos;',
|
|
'"' => '&quot;',
|
|
);
|
|
CDATA_START = "<![CDATA[";
|
|
CDATA_END = "]]>";
|
|
PI_START = "<?";
|
|
PI_END = "?>";
|
|
COMMENT_START = "<!--";
|
|
COMMENT_END = "-->";
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
pretty print styles
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
( $NSGMLS, $NICE, $INDENTED, $INDENTED_C, $WRAPPED, $RECORD1, $RECORD2)= (1..7);
|
|
|
|
</PRE>
|
|
|
|
|
|
|
|
|
|
<P>
|
|
|
|
|
|
empty tag output style
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
( $HTML, $EXPAND)= (1..2);
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="570">Module scoped values: might be changed<DD>
|
|
|
|
|
|
Most of these deal with pretty printing, so the worst that can
|
|
happen is probably that <FONT SIZE="-1">XML</FONT> output does not look right, but is
|
|
still valid and processed identically by <FONT SIZE="-1">XML</FONT> processors.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
<TT>$empty_tag_style</TT> can mess up <FONT SIZE="-1">HTML</FONT> browsers though and changing <TT>$ID</TT>
|
|
would most likely create problems.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
$pretty=0; # pretty print style
|
|
$quote='"'; # quote for attributes
|
|
$INDENT= ' '; # indent for indented pretty print
|
|
$empty_tag_style= 0; # how to display empty tags
|
|
$ID # attribute used as an id ('id' by default)
|
|
|
|
</PRE>
|
|
|
|
|
|
<DT id="571">Module scoped values: definitely changed<DD>
|
|
|
|
|
|
These 2 variables are used to replace tags by an index, thus
|
|
saving some space when creating a twig. If they really cause
|
|
you too much trouble, let me know, it is probably possible to
|
|
create either a switch or at least a version of XML::Twig that
|
|
does not perform this optimization.
|
|
|
|
|
|
<P>
|
|
|
|
|
|
|
|
|
|
<PRE>
|
|
%gi2index; # tag => index
|
|
@index2gi; # list of tags
|
|
|
|
</PRE>
|
|
|
|
|
|
</DL>
|
|
<P>
|
|
|
|
If you need to manipulate all those values, you can use the following methods on the
|
|
XML::Twig object:
|
|
<DL COMPACT>
|
|
<DT id="572">global_state<DD>
|
|
|
|
|
|
Return a hashref with all the global variables used by XML::Twig
|
|
|
|
|
|
<P>
|
|
|
|
|
|
The hash has the following fields: <TT>"pretty"</TT>, <TT>"quote"</TT>, <TT>"indent"</TT>,
|
|
<TT>"empty_tag_style"</TT>, <TT>"keep_encoding"</TT>, <TT>"expand_external_entities"</TT>,
|
|
<TT>"output_filter"</TT>, <TT>"output_text_filter"</TT>, <TT>"keep_atts_order"</TT>
|
|
<DT id="573">set_global_state ($state)<DD>
|
|
|
|
|
|
Set the global state, <TT>$state</TT> is a hashref
|
|
<DT id="574">save_global_state<DD>
|
|
|
|
|
|
Save the current global state
|
|
<DT id="575">restore_global_state<DD>
|
|
|
|
|
|
Restore the previously saved (using <TT>"save_global_state"</TT>) state
|
|
</DL>
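<P>
A sketch of how saving and restoring the global state might be used to change the pretty print style for a single output. The document is invented, and the example assumes the state saved at that point is the default one.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document.
my $twig = XML::Twig->new;
$twig->parse('<doc><a>1</a></doc>');

# Remember the current global settings before touching them.
$twig->save_global_state;

$twig->set_pretty_print('indented');
my $pretty = $twig->root->sprint;

# Put the settings back as they were when saved.
$twig->restore_global_state;
my $plain = $twig->root->sprint;

print "indented:\n$pretty\n";
print "restored:\n$plain\n";
```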
|
|
<A NAME="lbBH"> </A>
|
|
<H2>TODO</H2>
|
|
|
|
|
|
|
|
<DL COMPACT>
|
|
<DT id="576"><FONT SIZE="-1">SAX</FONT> handlers<DD>
|
|
|
|
|
|
Allowing XML::Twig to work on top of any <FONT SIZE="-1">SAX</FONT> parser
|
|
<DT id="577">multiple twigs are not well supported<DD>
|
|
|
|
|
|
A number of twig features are just global at the moment. These include
|
|
the <FONT SIZE="-1">ID</FONT> list and the ``tag pool'' (if you use <TT>"change_gi"</TT> then you change the tag
|
|
for <FONT SIZE="-1">ALL</FONT> twigs).
|
|
|
|
|
|
<P>
|
|
|
|
|
|
A future version will try to support this while trying not to be too
|
|
hard on performance (at least when a single twig is used!).
|
|
</DL>
|
|
<A NAME="lbBI"> </A>
|
|
<H2>AUTHOR</H2>
|
|
|
|
|
|
|
|
Michel Rodriguez <<A HREF="mailto:mirod@cpan.org">mirod@cpan.org</A>>
|
|
<A NAME="lbBJ"> </A>
|
|
<H2>LICENSE</H2>
|
|
|
|
|
|
|
|
This library is free software; you can redistribute it and/or modify
|
|
it under the same terms as Perl itself.
|
|
<P>
|
|
|
|
Bug reports should be sent using:
|
|
<I></I><FONT SIZE="-1"><I>RT</I></FONT><I> <<A HREF="http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-Twig">http://rt.cpan.org/NoAuth/Bugs.html?Dist=XML-Twig</A>></I>
|
|
<P>
|
|
|
|
Comments can be sent to <A HREF="mailto:mirod@cpan.org">mirod@cpan.org</A>
|
|
<P>
|
|
|
|
The XML::Twig page is at <<A HREF="http://www.xmltwig.org/xmltwig/">http://www.xmltwig.org/xmltwig/</A>>
|
|
It includes the development version of the module, a slightly better version
|
|
of the documentation, examples and a tutorial:
<I>Processing </I><FONT SIZE="-1"><I>XML</I></FONT><I> efficiently with Perl and XML::Twig
<<A HREF="http://www.xmltwig.org/xmltwig/tutorial/index.html">http://www.xmltwig.org/xmltwig/tutorial/index.html</A>></I>
|
|
<A NAME="lbBK"> </A>
|
|
<H2>SEE ALSO</H2>
|
|
|
|
|
|
|
|
Complete docs, including a tutorial, examples, an easier to use <FONT SIZE="-1">HTML</FONT> version of
|
|
the docs, a quick reference card and a <FONT SIZE="-1">FAQ</FONT> are available at
|
|
<<A HREF="http://www.xmltwig.org/xmltwig/">http://www.xmltwig.org/xmltwig/</A>>
|
|
<P>
|
|
|
|
git repository at <<A HREF="http://github.com/mirod/xmltwig">http://github.com/mirod/xmltwig</A>>
|
|
<P>
|
|
|
|
XML::Parser, XML::Parser::Expat, XML::XPath, Encode,
|
|
Text::Iconv, Scalar::Util
|
|
<A NAME="lbBL"> </A>
|
|
<H3>Alternative Modules</H3>
|
|
|
|
|
|
|
|
XML::Twig is not the only <FONT SIZE="-1">XML</FONT> processing module available on <FONT SIZE="-1">CPAN</FONT> (far from
|
|
it!).
|
|
<P>
|
|
|
|
The main alternative I would recommend is XML::LibXML.
|
|
<P>
|
|
|
|
Here is a quick comparison of the 2 modules:
|
|
<P>
|
|
|
|
XML::LibXML, actually <TT>"libxml2"</TT> on which it is based, sticks to the standards,
|
|
and implements a good number of them in a rather strict way: <FONT SIZE="-1">XML,</FONT> XPath, <FONT SIZE="-1">DOM,</FONT>
|
|
RelaxNG, I must be forgetting a couple (XInclude?). It is fast and rather
|
|
frugal memory-wise.
|
|
<P>
|
|
|
|
XML::Twig is older: when I started writing it XML::Parser/expat was the only
|
|
game in town. It implements <FONT SIZE="-1">XML</FONT> and that's about it (plus a subset of XPath,
|
|
and you can use XML::Twig::XPath if you have XML::XPathEngine installed for full
|
|
support). It is slower and requires more memory for a full tree than
|
|
XML::LibXML. On the plus side (yes, there is a plus side!) it lets you process
|
|
a big document in chunks, and thus lets you tackle documents that couldn't be
|
|
loaded in memory by XML::LibXML, and it offers a lot (and I mean a <FONT SIZE="-1">LOT</FONT>!) of
|
|
higher-level methods, for everything, from adding structure to ``low-level'' <FONT SIZE="-1">XML,</FONT>
|
|
to shortcuts for <FONT SIZE="-1">XHTML</FONT> conversions and more. It also DWIMs quite a bit, getting
|
|
comments and non-significant whitespaces out of the way but preserving them in
|
|
the output for example. As it does not stick to the <FONT SIZE="-1">DOM,</FONT> it also usually leads
|
|
to shorter code than in XML::LibXML.
|
|
<P>
|
|
|
|
Beyond the pure features of the 2 modules, XML::LibXML seems to be preferred by
|
|
``XML-purists'', while XML::Twig seems to be more used by Perl Hackers who have
|
|
to deal with <FONT SIZE="-1">XML.</FONT> As you have noted, XML::Twig also comes with quite a lot of
|
|
docs, but I am sure if you ask for help about XML::LibXML here or on Perlmonks
|
|
you will get answers.
|
|
<P>
|
|
|
|
Note that it is actually quite hard for me to compare the 2 modules: on one hand
|
|
I know XML::Twig inside-out and I can get it to do pretty much anything I need
|
|
to (or I improve it ;--), while I have a very basic knowledge of XML::LibXML.
|
|
So feature-wise, I'd rather use XML::Twig ;--). On the other hand, I am
|
|
painfully aware of some of the deficiencies, potential bugs and plain ugly code
|
|
that lurk in XML::Twig, even though you are unlikely to be affected by them
|
|
(unless for example you need to change the <FONT SIZE="-1">DTD</FONT> of a document programmatically),
|
|
while I haven't looked much into XML::LibXML so it still looks shiny and clean
|
|
to me.
|
|
<P>
|
|
|
|
That said, if you need to process a document that is too big to fit in memory
|
|
and XML::Twig is too slow for you, my reluctant advice would be to use ``bare''
|
|
XML::Parser. It won't be as easy to use as XML::Twig: basically with XML::Twig
|
|
you trade some speed (depending on what you do from a factor 3 to... none)
|
|
for ease-of-use, but it will be easier <FONT SIZE="-1">IMHO</FONT> than using <FONT SIZE="-1">SAX</FONT> (albeit not
|
|
standard), and at this point a <FONT SIZE="-1">LOT</FONT> faster (see the last test in
|
|
<<A HREF="http://www.xmltwig.org/article/simple_benchmark/">http://www.xmltwig.org/article/simple_benchmark/</A>>).
|
|
<P>
|
|
|
|
<HR>
|
|
<A NAME="index"> </A><H2>Index</H2>
|
|
<DL>
|
|
<DT id="578"><A HREF="#lbAB">NAME</A><DD>
|
|
<DT id="579"><A HREF="#lbAC">SYNOPSIS</A><DD>
|
|
<DT id="580"><A HREF="#lbAD">DESCRIPTION</A><DD>
|
|
<DT id="581"><A HREF="#lbAE">TOOLS</A><DD>
|
|
<DL>
|
|
<DT id="582"><A HREF="#lbAF">xml_pp - xml pretty-printer</A><DD>
|
|
<DT id="583"><A HREF="#lbAG">xml_grep - grep <FONT SIZE="-1">XML</FONT> files looking for specific elements</A><DD>
|
|
<DT id="584"><A HREF="#lbAH">xml_split - cut a big <FONT SIZE="-1">XML</FONT> file into smaller chunks</A><DD>
|
|
<DT id="585"><A HREF="#lbAI">xml_merge - merge back <FONT SIZE="-1">XML</FONT> files split with xml_split</A><DD>
|
|
<DT id="586"><A HREF="#lbAJ">xml_spellcheck - spellcheck <FONT SIZE="-1">XML</FONT> files</A><DD>
|
|
</DL>
|
|
<DT id="587"><A HREF="#lbAK">XML::Twig 101</A><DD>
|
|
<DL>
|
|
<DT id="588"><A HREF="#lbAL">Loading an <FONT SIZE="-1">XML</FONT> document and processing it</A><DD>
|
|
<DT id="589"><A HREF="#lbAM">Processing an <FONT SIZE="-1">XML</FONT> document chunk by chunk</A><DD>
|
|
<DT id="590"><A HREF="#lbAN">Processing just parts of an <FONT SIZE="-1">XML</FONT> document</A><DD>
|
|
<DT id="591"><A HREF="#lbAO">Building an <FONT SIZE="-1">XML</FONT> filter</A><DD>
|
|
<DT id="592"><A HREF="#lbAP">XML::Twig and various versions of Perl, XML::Parser and expat:</A><DD>
|
|
</DL>
|
|
<DT id="593"><A HREF="#lbAQ">Simplifying XML processing</A><DD>
|
|
<DT id="594"><A HREF="#lbAR">CLASSES</A><DD>
|
|
<DT id="595"><A HREF="#lbAS">METHODS</A><DD>
|
|
<DL>
|
|
<DT id="596"><A HREF="#lbAT">XML::Twig</A><DD>
|
|
<DT id="597"><A HREF="#lbAU">XML::Twig::Elt</A><DD>
|
|
<DT id="598"><A HREF="#lbAV">cond</A><DD>
|
|
<DT id="599"><A HREF="#lbAW">XML::Twig::XPath</A><DD>
|
|
<DT id="600"><A HREF="#lbAX">XML::Twig::XPath::Elt</A><DD>
|
|
<DT id="601"><A HREF="#lbAY">XML::Twig::Entity_list</A><DD>
|
|
<DT id="602"><A HREF="#lbAZ">XML::Twig::Entity</A><DD>
|
|
</DL>
|
|
<DT id="603"><A HREF="#lbBA">EXAMPLES</A><DD>
|
|
<DT id="604"><A HREF="#lbBB">NOTES</A><DD>
|
|
<DL>
|
|
<DT id="605"><A HREF="#lbBC">Subclassing XML::Twig</A><DD>
|
|
<DT id="606"><A HREF="#lbBD"><FONT SIZE="-1">DTD</FONT> Handling</A><DD>
|
|
<DT id="607"><A HREF="#lbBE">Flush</A><DD>
|
|
</DL>
|
|
<DT id="608"><A HREF="#lbBF">BUGS</A><DD>
|
|
<DT id="609"><A HREF="#lbBG">Globals</A><DD>
|
|
<DT id="610"><A HREF="#lbBH">TODO</A><DD>
|
|
<DT id="611"><A HREF="#lbBI">AUTHOR</A><DD>
|
|
<DT id="612"><A HREF="#lbBJ">LICENSE</A><DD>
|
|
<DT id="613"><A HREF="#lbBK">SEE ALSO</A><DD>
|
|
<DL>
|
|
<DT id="614"><A HREF="#lbBL">Alternative Modules</A><DD>
|
|
</DL>
|
|
</DL>
|
|
<HR>
|
|
This document was created by
|
|
<A HREF="/cgi-bin/man/man2html">man2html</A>,
|
|
using the manual pages.<BR>
|
|
Time: 00:06:01 GMT, March 31, 2021
|
|
</BODY>
|
|
</HTML>
|