<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of HTML::Tree</TITLE>
</HEAD><BODY>
<H1>HTML::Tree</H1>
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2019-01-13<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
<A NAME="lbAB"> </A>
<H2>NAME</H2>
HTML::Tree - build and scan parse-trees of HTML
<A NAME="lbAC"> </A>
<H2>VERSION</H2>
This document describes version 5.07 of
HTML::Tree, released August 31, 2017
as part of HTML-Tree.
<A NAME="lbAD"> </A>
<H2>SYNOPSIS</H2>
<PRE>
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new();
$tree->parse_file($filename);

# Then do something with the tree, using HTML::Element
# methods -- for example:

$tree->dump;

# Finally:

$tree->delete;
</PRE>
<A NAME="lbAE"> </A>
<H2>DESCRIPTION</H2>
HTML-Tree is a suite of Perl modules for making parse trees out of
<FONT SIZE="-1">HTML</FONT> source. It consists mainly of two modules, whose documentation
you should refer to: HTML::TreeBuilder
and HTML::Element.
<P>
HTML::TreeBuilder is the module that builds the parse trees. (It uses
HTML::Parser to do the work of breaking the <FONT SIZE="-1">HTML</FONT> up into tokens.)
<P>
The tree that TreeBuilder builds for you is made up of objects of the
class HTML::Element.
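<P>
As a rough illustration of the two paragraphs above, the following sketch
builds a tree with HTML::TreeBuilder and then works with the resulting
HTML::Element objects. The helper name link_targets is hypothetical and
not part of HTML-Tree; look_down, attr and delete are HTML::Element methods,
and parse_file is inherited from HTML::Parser.
<PRE>
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML-Tree): return the href value of
# every "a" element found in the given HTML file.
sub link_targets {
    my ($filename) = @_;

    my $tree = HTML::TreeBuilder->new();
    defined $tree->parse_file($filename)
        or die "Cannot parse $filename: $!";

    # In list context, look_down returns every matching HTML::Element.
    my @anchors = $tree->look_down(_tag => 'a');

    # Anchors without an href attribute yield undef; skip those.
    my @hrefs = grep { defined $_ } map { $_->attr('href') } @anchors;

    $tree->delete;    # release the tree once we are done with it
    return @hrefs;
}

# Print one link target per line when a filename is given.
if (@ARGV) {
    print "$_\n" for link_targets($ARGV[0]);
}
</PRE>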
<P>
If you find that you do not properly understand the documentation
for HTML::TreeBuilder and HTML::Element, it may be because you are
unfamiliar with tree-shaped data structures, or with object-oriented
modules in general. Sean Burke has written some articles for
<I>The Perl Journal</I> (<TT>"<A HREF="http://www.tpj.com">www.tpj.com</A>"</TT>) that seek to provide that background.
The full text of those articles is contained in this distribution, as:
<DL COMPACT>
<DT id="1">HTML::Tree::AboutObjects<DD>
``User's View of Object-Oriented Modules'' from <FONT SIZE="-1">TPJ17</FONT>
<DT id="2">HTML::Tree::AboutTrees<DD>
``Trees'' from <FONT SIZE="-1">TPJ18</FONT>
<DT id="3">HTML::Tree::Scanning<DD>
``Scanning <FONT SIZE="-1">HTML</FONT>'' from <FONT SIZE="-1">TPJ19</FONT>
</DL>
<P>
Readers already familiar with object-oriented modules and tree-shaped
data structures should read just the last article. Readers without
that background should read the first, then the second, and then the
third.
<A NAME="lbAF"> </A>
<H2>METHODS</H2>
All these methods simply redirect to the corresponding method in
HTML::TreeBuilder. It's more efficient to use HTML::TreeBuilder
directly, and skip loading HTML::Tree at all.
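<P>
A minimal sketch of using HTML::TreeBuilder directly, without loading
HTML::Tree at all. The helper name page_title is hypothetical;
new_from_file is the HTML::TreeBuilder constructor that the method of the
same name listed below redirects to.
<PRE>
use strict;
use warnings;
use HTML::TreeBuilder;    # loaded directly; HTML::Tree itself is not needed

# Hypothetical helper: return the text of a file's title element,
# or undef when the document has none.
sub page_title {
    my ($filename) = @_;

    # new_from_file creates a new tree and parses the file in one step.
    my $tree = HTML::TreeBuilder->new_from_file($filename);

    # In scalar context, look_down returns the first match (or undef).
    my $title = $tree->look_down(_tag => 'title');
    my $text  = defined $title ? $title->as_text : undef;

    $tree->delete;
    return $text;
}
</PRE>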
<A NAME="lbAG"> </A>
<H3>new</H3>
Redirects to ``new'' in HTML::TreeBuilder.
<A NAME="lbAH"> </A>
<H3>new_from_file</H3>
Redirects to ``new_from_file'' in HTML::TreeBuilder.
<A NAME="lbAI"> </A>
<H3>new_from_content</H3>
Redirects to ``new_from_content'' in HTML::TreeBuilder.
<A NAME="lbAJ"> </A>
<H3>new_from_url</H3>
Redirects to ``new_from_url'' in HTML::TreeBuilder.
<A NAME="lbAK"> </A>
<H2>SUPPORT</H2>
You can find documentation for this module with the perldoc command.
<P>
<PRE>
perldoc HTML::Tree
</PRE>
<P>
You can also look for information at:
<DL COMPACT>
<DT id="4">•<DD>
AnnoCPAN: Annotated <FONT SIZE="-1">CPAN</FONT> documentation
<P>
&lt;<A HREF="http://annocpan.org/dist/HTML-Tree">http://annocpan.org/dist/HTML-Tree</A>&gt;
<DT id="5">•<DD>
<FONT SIZE="-1">CPAN</FONT> Ratings
<P>
&lt;<A HREF="http://cpanratings.perl.org/d/HTML-Tree">http://cpanratings.perl.org/d/HTML-Tree</A>&gt;
<DT id="6">•<DD>
<FONT SIZE="-1">RT: CPAN</FONT>'s request tracker
<P>
&lt;<A HREF="http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-Tree">http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-Tree</A>&gt;
<DT id="7">•<DD>
Search <FONT SIZE="-1">CPAN</FONT>
<P>
&lt;<A HREF="http://search.cpan.org/dist/HTML-Tree">http://search.cpan.org/dist/HTML-Tree</A>&gt;
<DT id="8">•<DD>
Stack Overflow
<P>
&lt;<A HREF="http://stackoverflow.com/questions/tagged/html-tree">http://stackoverflow.com/questions/tagged/html-tree</A>&gt;
<P>
If you have a question about how to use HTML-Tree, Stack Overflow is
the place to ask it. Make sure you tag it both <TT>"perl"</TT> and <TT>"html-tree"</TT>.
</DL>
<A NAME="lbAL"> </A>
<H2>SEE ALSO</H2>
HTML::TreeBuilder, HTML::Element, HTML::Tagset,
HTML::Parser, HTML::DOMbo
<P>
The book <I>Perl &amp; </I><FONT SIZE="-1"><I>LWP</I></FONT> by Sean M. Burke, published by
O'Reilly and Associates, 2002. <FONT SIZE="-1">ISBN: 0-596-00178-9</FONT>
<P>
Several of its chapters cover <FONT SIZE="-1">HTML</FONT> processing in general
and HTML-Tree specifically. There's more info at:
<P>
<PRE>
<A HREF="http://www.oreilly.com/catalog/perllwp/">http://www.oreilly.com/catalog/perllwp/</A>

<A HREF="http://www.amazon.com/exec/obidos/ASIN/0596001789">http://www.amazon.com/exec/obidos/ASIN/0596001789</A>
</PRE>
<A NAME="lbAM"> </A>
<H2>SOURCE REPOSITORY</H2>
HTML-Tree is now maintained using Git. The main public repository is
&lt;<A HREF="https://github.com/kentfredric/HTML-Tree">https://github.com/kentfredric/HTML-Tree</A>&gt;.
<P>
The best way to send a patch is to make a pull request there.
<A NAME="lbAN"> </A>
<H2>ACKNOWLEDGEMENTS</H2>
Thanks to Gisle Aas, Sean Burke and Andy Lester for their original work.
<P>
Thanks to Chicago Perl Mongers (<A HREF="http://chicago.pm.org">http://chicago.pm.org</A>) for their
patches submitted to HTML::Tree as part of the Phalanx project
(<A HREF="http://qa.perl.org/phalanx">http://qa.perl.org/phalanx</A>).
<P>
Thanks to the following people for additional patches and documentation:
Terrence Brannon, Gordon Lack, Chris Madsen and Ricardo Signes.
<A NAME="lbAO"> </A>
<H2>AUTHOR</H2>
Current maintainers:
<DL COMPACT>
<DT id="9">•<DD>
Christopher J. Madsen <TT>"&lt;perl AT cjmweb.net&gt;"</TT>
<DT id="10">•<DD>
Jeff Fearn <TT>"&lt;jfearn AT cpan.org&gt;"</TT>
</DL>
<P>
Original HTML-Tree author:
<DL COMPACT>
<DT id="11">•<DD>
Gisle Aas
</DL>
<P>
Former maintainers:
<DL COMPACT>
<DT id="12">•<DD>
Sean M. Burke
<DT id="13">•<DD>
Andy Lester
<DT id="14">•<DD>
Pete Krawczyk <TT>"&lt;petek AT cpan.org&gt;"</TT>
</DL>
<P>
You can follow or contribute to HTML-Tree's development at
&lt;<A HREF="https://github.com/kentfredric/HTML-Tree">https://github.com/kentfredric/HTML-Tree</A>&gt;.
<A NAME="lbAP"> </A>
<H2>COPYRIGHT AND LICENSE</H2>
Copyright 1995-1998 Gisle Aas, 1999-2004 Sean M. Burke,
2005 Andy Lester, 2006 Pete Krawczyk, 2010 Jeff Fearn,
2012 Christopher J. Madsen.
(Except the articles contained in HTML::Tree::AboutObjects,
HTML::Tree::AboutTrees, and HTML::Tree::Scanning, which are all
copyright 2000 The Perl Journal.)
<P>
Except for those three <FONT SIZE="-1">TPJ</FONT> articles, the whole HTML-Tree distribution,
of which this file is a part, is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
<P>
Those three <FONT SIZE="-1">TPJ</FONT> articles may be distributed under the same terms as
Perl itself.
<P>
The programs in this library are distributed in the hope that they
will be useful, but without any warranty; without even the implied
warranty of merchantability or fitness for a particular purpose.
<P>
<HR>
<A NAME="index"> </A><H2>Index</H2>
<DL>
<DT id="15"><A HREF="#lbAB">NAME</A><DD>
<DT id="16"><A HREF="#lbAC">VERSION</A><DD>
<DT id="17"><A HREF="#lbAD">SYNOPSIS</A><DD>
<DT id="18"><A HREF="#lbAE">DESCRIPTION</A><DD>
<DT id="19"><A HREF="#lbAF">METHODS</A><DD>
<DL>
<DT id="20"><A HREF="#lbAG">new</A><DD>
<DT id="21"><A HREF="#lbAH">new_from_file</A><DD>
<DT id="22"><A HREF="#lbAI">new_from_content</A><DD>
<DT id="23"><A HREF="#lbAJ">new_from_url</A><DD>
</DL>
<DT id="24"><A HREF="#lbAK">SUPPORT</A><DD>
<DT id="25"><A HREF="#lbAL">SEE ALSO</A><DD>
<DT id="26"><A HREF="#lbAM">SOURCE REPOSITORY</A><DD>
<DT id="27"><A HREF="#lbAN">ACKNOWLEDGEMENTS</A><DD>
<DT id="28"><A HREF="#lbAO">AUTHOR</A><DD>
<DT id="29"><A HREF="#lbAP">COPYRIGHT AND LICENSE</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:45 GMT, March 31, 2021
</BODY>
</HTML>