<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of HTML::Tree</TITLE>
</HEAD><BODY>
<H1>HTML::Tree</H1>
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2019-01-13<BR><A HREF="#index">Index</A>
<HR>
<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>
HTML::Tree - build and scan parse-trees of HTML
<A NAME="lbAC">&nbsp;</A>
<H2>VERSION</H2>
This document describes version 5.07 of
HTML::Tree, released August 31, 2017,
as part of HTML-Tree.
<A NAME="lbAD">&nbsp;</A>
<H2>SYNOPSIS</H2>
<PRE>
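use strict;
use warnings;
use HTML::TreeBuilder;

my $filename = shift @ARGV;  # path to an HTML file (assumed to be given on the command line)
my $tree = HTML::TreeBuilder-&gt;new();
$tree-&gt;parse_file($filename);
# Then do something with the tree, using HTML::Element
# methods -- for example, print an indented outline of it:
$tree-&gt;dump;
# Finally, free the tree explicitly:
$tree-&gt;delete;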
</PRE>
<A NAME="lbAE">&nbsp;</A>
<H2>DESCRIPTION</H2>
HTML-Tree is a suite of Perl modules for making parse trees out of
<FONT SIZE="-1">HTML</FONT> source. It consists mainly of two modules, whose documentation
you should refer to: HTML::TreeBuilder
and HTML::Element.
<P>
HTML::TreeBuilder is the module that builds the parse trees. (It uses
HTML::Parser to do the work of breaking the <FONT SIZE="-1">HTML</FONT> up into tokens.)
<P>
The tree that TreeBuilder builds for you is made up of objects of the
class HTML::Element.
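<P>
As a rough illustration of what working with those objects looks like, here
is a minimal sketch. The sample markup and the variable names are made up for
this example; the calls used (new_from_content, look_down, as_text, attr and
delete) are described in the HTML::TreeBuilder and HTML::Element documentation,
which remains the authoritative reference.
<P>
<PRE>
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup, invented for this illustration.
my $html = '&lt;html&gt;&lt;body&gt;&lt;p&gt;See &lt;a href="/a"&gt;A&lt;/a&gt; and &lt;a href="/b"&gt;B&lt;/a&gt;.&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;';

# Build the tree; the root is itself an HTML::Element.
my $tree = HTML::TreeBuilder-&gt;new_from_content($html);

# In list context, look_down() returns every matching HTML::Element.
my @links = $tree-&gt;look_down(_tag =&gt; 'a');
for my $link (@links) {
    printf "%s -&gt; %s\n", $link-&gt;as_text, $link-&gt;attr('href');
}

# Free the tree once it is no longer needed.
$tree-&gt;delete;
</PRE>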
<P>
If you find that you do not properly understand the documentation
for HTML::TreeBuilder and HTML::Element, it may be because you are
unfamiliar with tree-shaped data structures, or with object-oriented
modules in general. Sean Burke has written some articles for
<I>The Perl Journal</I> (<TT>&quot;<A HREF="http://www.tpj.com">www.tpj.com</A>&quot;</TT>) that seek to provide that background.
The full text of those articles is contained in this distribution, as:
<DL COMPACT>
<DT id="1">HTML::Tree::AboutObjects<DD>
``User's View of Object-Oriented Modules'' from <FONT SIZE="-1">TPJ17</FONT>
<DT id="2">HTML::Tree::AboutTrees<DD>
``Trees'' from <FONT SIZE="-1">TPJ18</FONT>
<DT id="3">HTML::Tree::Scanning<DD>
``Scanning <FONT SIZE="-1">HTML</FONT>'' from <FONT SIZE="-1">TPJ19</FONT>
</DL>
<P>
Readers already familiar with object-oriented modules and tree-shaped
data structures should read just the last article. Readers without
that background should read the first, then the second, and then the
third.
<A NAME="lbAF">&nbsp;</A>
<H2>METHODS</H2>
Each of these methods simply redirects to the method of the same name in
HTML::TreeBuilder. It is more efficient to use HTML::TreeBuilder
directly and skip loading HTML::Tree altogether.
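<P>
A minimal sketch of that direct use, assuming a small made-up input string:
the constructor is called on HTML::TreeBuilder itself, so HTML::Tree never
needs to be loaded. The variable names are illustrative only.
<P>
<PRE>
use strict;
use warnings;
use HTML::TreeBuilder;   # loaded directly; no "use HTML::Tree" needed

# Made-up input for this illustration.
my $html = '&lt;title&gt;Example&lt;/title&gt;&lt;p&gt;Hello&lt;/p&gt;';

# Same constructor name as listed below, called on HTML::TreeBuilder.
my $tree = HTML::TreeBuilder-&gt;new_from_content($html);
my $class = ref $tree;
print "$class\n";

# Look up the first title element, if there is one.
my $title = $tree-&gt;find_by_tag_name('title');
my $title_text = $title ? $title-&gt;as_text : '';
print "$title_text\n";

$tree-&gt;delete;
</PRE>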
<A NAME="lbAG">&nbsp;</A>
<H3>new</H3>
Redirects to ``new'' in HTML::TreeBuilder.
<A NAME="lbAH">&nbsp;</A>
<H3>new_from_file</H3>
Redirects to ``new_from_file'' in HTML::TreeBuilder.
<A NAME="lbAI">&nbsp;</A>
<H3>new_from_content</H3>
Redirects to ``new_from_content'' in HTML::TreeBuilder.
<A NAME="lbAJ">&nbsp;</A>
<H3>new_from_url</H3>
Redirects to ``new_from_url'' in HTML::TreeBuilder.
<A NAME="lbAK">&nbsp;</A>
<H2>SUPPORT</H2>
You can find documentation for this module with the perldoc command.
<P>
<PRE>
perldoc HTML::Tree
</PRE>
<P>
You can also look for information at:
<DL COMPACT>
<DT id="4">&bull;<DD>
AnnoCPAN: Annotated <FONT SIZE="-1">CPAN</FONT> documentation
<P>
&lt;<A HREF="http://annocpan.org/dist/HTML-Tree">http://annocpan.org/dist/HTML-Tree</A>&gt;
<DT id="5">&bull;<DD>
<FONT SIZE="-1">CPAN</FONT> Ratings
<P>
&lt;<A HREF="http://cpanratings.perl.org/d/HTML-Tree">http://cpanratings.perl.org/d/HTML-Tree</A>&gt;
<DT id="6">&bull;<DD>
<FONT SIZE="-1">RT: CPAN</FONT>'s request tracker
<P>
&lt;<A HREF="http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-Tree">http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-Tree</A>&gt;
<DT id="7">&bull;<DD>
Search <FONT SIZE="-1">CPAN</FONT>
<P>
&lt;<A HREF="http://search.cpan.org/dist/HTML-Tree">http://search.cpan.org/dist/HTML-Tree</A>&gt;
<DT id="8">&bull;<DD>
Stack Overflow
<P>
&lt;<A HREF="http://stackoverflow.com/questions/tagged/html-tree">http://stackoverflow.com/questions/tagged/html-tree</A>&gt;
<P>
If you have a question about how to use HTML-Tree, Stack Overflow is
the place to ask it. Make sure you tag it with both <TT>&quot;perl&quot;</TT> and <TT>&quot;html-tree&quot;</TT>.
</DL>
<A NAME="lbAL">&nbsp;</A>
<H2>SEE ALSO</H2>
HTML::TreeBuilder, HTML::Element, HTML::Tagset,
HTML::Parser, HTML::DOMbo
<P>
The book <I>Perl &amp; </I><FONT SIZE="-1"><I>LWP</I></FONT> by Sean M. Burke, published by
O'Reilly and Associates, 2002. <FONT SIZE="-1">ISBN: 0-596-00178-9</FONT>
<P>
It has several chapters on <FONT SIZE="-1">HTML</FONT> processing in general
and on HTML-Tree specifically. There's more info at:
<P>
<PRE>
<A HREF="http://www.oreilly.com/catalog/perllwp/">http://www.oreilly.com/catalog/perllwp/</A>
<A HREF="http://www.amazon.com/exec/obidos/ASIN/0596001789">http://www.amazon.com/exec/obidos/ASIN/0596001789</A>
</PRE>
<A NAME="lbAM">&nbsp;</A>
<H2>SOURCE REPOSITORY</H2>
HTML-Tree is now maintained using Git. The main public repository is
&lt;<A HREF="https://github.com/kentfredric/HTML-Tree">https://github.com/kentfredric/HTML-Tree</A>&gt;.
<P>
The best way to send a patch is to make a pull request there.
<A NAME="lbAN">&nbsp;</A>
<H2>ACKNOWLEDGEMENTS</H2>
Thanks to Gisle Aas, Sean Burke and Andy Lester for their original work.
<P>
Thanks to Chicago Perl Mongers (<A HREF="http://chicago.pm.org">http://chicago.pm.org</A>) for their
patches submitted to HTML::Tree as part of the Phalanx project
(<A HREF="http://qa.perl.org/phalanx">http://qa.perl.org/phalanx</A>).
<P>
Thanks to the following people for additional patches and documentation:
Terrence Brannon, Gordon Lack, Chris Madsen and Ricardo Signes.
<A NAME="lbAO">&nbsp;</A>
<H2>AUTHOR</H2>
Current maintainers:
<DL COMPACT>
<DT id="9">&bull;<DD>
Christopher J. Madsen <TT>&quot;&lt;perl&nbsp;AT&nbsp;cjmweb.net&gt;&quot;</TT>
<DT id="10">&bull;<DD>
Jeff Fearn <TT>&quot;&lt;jfearn&nbsp;AT&nbsp;cpan.org&gt;&quot;</TT>
</DL>
<P>
Original HTML-Tree author:
<DL COMPACT>
<DT id="11">&bull;<DD>
Gisle Aas
</DL>
<P>
Former maintainers:
<DL COMPACT>
<DT id="12">&bull;<DD>
Sean M. Burke
<DT id="13">&bull;<DD>
Andy Lester
<DT id="14">&bull;<DD>
Pete Krawczyk <TT>&quot;&lt;petek&nbsp;AT&nbsp;cpan.org&gt;&quot;</TT>
</DL>
<P>
You can follow or contribute to HTML-Tree's development at
&lt;<A HREF="https://github.com/kentfredric/HTML-Tree">https://github.com/kentfredric/HTML-Tree</A>&gt;.
<A NAME="lbAP">&nbsp;</A>
<H2>COPYRIGHT AND LICENSE</H2>
Copyright 1995-1998 Gisle Aas, 1999-2004 Sean M. Burke,
2005 Andy Lester, 2006 Pete Krawczyk, 2010 Jeff Fearn,
2012 Christopher J. Madsen.
(Except the articles contained in HTML::Tree::AboutObjects,
HTML::Tree::AboutTrees, and HTML::Tree::Scanning, which are all
copyright 2000 The Perl Journal.)
<P>
Except for those three <FONT SIZE="-1">TPJ</FONT> articles, the whole HTML-Tree distribution,
of which this file is a part, is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
<P>
Those three <FONT SIZE="-1">TPJ</FONT> articles may be distributed under the same terms as
Perl itself.
<P>
The programs in this library are distributed in the hope that they
will be useful, but without any warranty; without even the implied
warranty of merchantability or fitness for a particular purpose.
<P>
<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="15"><A HREF="#lbAB">NAME</A><DD>
<DT id="16"><A HREF="#lbAC">VERSION</A><DD>
<DT id="17"><A HREF="#lbAD">SYNOPSIS</A><DD>
<DT id="18"><A HREF="#lbAE">DESCRIPTION</A><DD>
<DT id="19"><A HREF="#lbAF">METHODS</A><DD>
<DL>
<DT id="20"><A HREF="#lbAG">new</A><DD>
<DT id="21"><A HREF="#lbAH">new_from_file</A><DD>
<DT id="22"><A HREF="#lbAI">new_from_content</A><DD>
<DT id="23"><A HREF="#lbAJ">new_from_url</A><DD>
</DL>
<DT id="24"><A HREF="#lbAK">SUPPORT</A><DD>
<DT id="25"><A HREF="#lbAL">SEE ALSO</A><DD>
<DT id="26"><A HREF="#lbAM">SOURCE REPOSITORY</A><DD>
<DT id="27"><A HREF="#lbAN">ACKNOWLEDGEMENTS</A><DD>
<DT id="28"><A HREF="#lbAO">AUTHOR</A><DD>
<DT id="29"><A HREF="#lbAP">COPYRIGHT AND LICENSE</A><DD>
</DL>
</BODY>
</HTML>