<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of HTML::Parse</TITLE>
</HEAD><BODY>
<H1>HTML::Parse</H1>
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2019-01-13<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
<A NAME="lbAB"> </A>
<H2>NAME</H2>
HTML::Parse - Deprecated, a wrapper around HTML::TreeBuilder
<A NAME="lbAC"> </A>
<H2>VERSION</H2>
This document describes version 5.07 of HTML::Parse, released August 31, 2017,
as part of HTML-Tree.
<A NAME="lbAD"> </A>
<H2>SYNOPSIS</H2>
<PRE>
  See the documentation for HTML::TreeBuilder
</PRE>
<A NAME="lbAE"> </A>
<H2>DESCRIPTION</H2>
Disclaimer: This module is provided only for backwards compatibility
with earlier versions of this library. New code should <I>not</I> use
this module; use the HTML::Parser and HTML::TreeBuilder modules
directly instead.
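<P>
As a minimal sketch of that recommendation, the following builds a tree with
HTML::TreeBuilder directly rather than through this module. The sample markup
and the variable names are illustrative assumptions, not part of the module.
<PRE>
  use strict;
  use warnings;
  use HTML::TreeBuilder;

  # Sample markup; any HTML string would do.
  my $html = '&lt;title&gt;Demo&lt;/title&gt;&lt;p&gt;Hello, &lt;b&gt;world&lt;/b&gt;!&lt;/p&gt;';

  my $tree = HTML::TreeBuilder-&gt;new;
  $tree-&gt;strict_comment(1);    # same comment handling as the default parse_html() object
  $tree-&gt;parse($html);
  $tree-&gt;eof;                  # signal end of input so the tree is finalized

  # The tree is made of HTML::Element nodes, so its methods apply.
  my $title = $tree-&gt;look_down(_tag =&gt; 'title')-&gt;as_text;
  print "$title\n";

  $tree-&gt;delete;               # release the tree when finished
</PRE>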
<P>
The <TT>"HTML::Parse"</TT> module provides functions to parse <FONT SIZE="-1">HTML</FONT> documents.
This module exports two functions:
<DL COMPACT>
<DT id="1">parse_html($html) or parse_html($html, $obj)<DD>
This function is simply a synonym for <TT>$obj</TT>-&gt;parse($html), where <TT>$obj</TT>
is assumed to be an object of a subclass of <TT>"HTML::Parser"</TT>. Refer to
HTML::Parser for more documentation.
<P>
If <TT>$obj</TT> is not specified, it defaults to an internally
created <TT>"HTML::TreeBuilder"</TT> object configured with <B>strict_comment()</B>
turned on. That class implements a parser that builds (and is) an <FONT SIZE="-1">HTML</FONT>
syntax tree with HTML::Element objects as nodes.
<P>
The return value from <B>parse_html()</B> is <TT>$obj</TT>.
<DT id="2">parse_htmlfile($file, [$obj])<DD>
Same as <B>parse_html()</B>, but reads the <FONT SIZE="-1">HTML</FONT> to parse from the named file.
<P>
Returns <TT>"undef"</TT> if the file could not be opened, or <TT>$obj</TT> otherwise.
</DL>
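<P>
The two functions described above might be used as in the following sketch.
The markup, the variable names, and the file path are made-up examples; the
path is assumed not to exist so that the failure case is visible.
<PRE>
  use strict;
  use warnings;
  use HTML::Parse;                 # provides parse_html and parse_htmlfile
  use HTML::TreeBuilder;
  use Scalar::Util qw(refaddr);

  # With no $obj, an HTML::TreeBuilder is created internally and returned.
  my $tree = parse_html('&lt;p&gt;Hello, &lt;b&gt;world&lt;/b&gt;!&lt;/p&gt;');
  my $bold = $tree-&gt;look_down(_tag =&gt; 'b')-&gt;as_text;
  print "$bold\n";

  # With an explicit $obj, that same object is the return value.
  my $builder = HTML::TreeBuilder-&gt;new;
  my $same    = parse_html('&lt;p&gt;again&lt;/p&gt;', $builder);
  print "same object\n" if refaddr($same) == refaddr($builder);

  # A file that cannot be opened yields undef.
  my $result = parse_htmlfile('/no/such/dir/missing.html');
  print "open failed\n" unless defined $result;

  $_-&gt;delete for $tree, $builder;
</PRE>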
<P>
When an <TT>"HTML::TreeBuilder"</TT> object is created, the following variables
control how parsing takes place:
<DL COMPACT>
<DT id="3">$HTML::Parse::IMPLICIT_TAGS<DD>
Setting this variable to true instructs the parser to try to
deduce implicit elements and implicit end tags. If this variable is
false, you get a parse tree that just reflects the text as it stands,
which might be useful for quick and dirty parsing. Default is true.
<P>
Implicit elements have the <B>implicit()</B> attribute set.
<DT id="4">$HTML::Parse::IGNORE_UNKNOWN<DD>
This variable controls whether unknown tags should be represented as
elements in the parse tree. When it is true, unknown tags are ignored
rather than represented. Default is true.
<DT id="5">$HTML::Parse::IGNORE_TEXT<DD>
When true, the text content of elements is not represented in the parse
tree. This saves space if all you want is to examine the structure of the
document. Default is false.
<DT id="6">$HTML::Parse::WARN<DD>
When true, <B>warn()</B> is called with an appropriate message for syntax errors. Default is
false.
</DL>
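<P>
As a sketch of how these variables might be used, the following sets
<TT>$HTML::Parse::IGNORE_TEXT</TT> before calling <B>parse_html()</B> so that the
internally created tree keeps structure only. It assumes the variables are
read at the moment the default object is created; the markup and variable
names are illustrative.
<PRE>
  use strict;
  use warnings;
  use HTML::Parse;

  my $html = '&lt;p&gt;Hello, &lt;b&gt;world&lt;/b&gt;!&lt;/p&gt;';

  my $structure_only;
  {
      # local restores the previous value when this block ends.
      local $HTML::Parse::IGNORE_TEXT = 1;
      $structure_only = parse_html($html);
  }

  # Parsed with the defaults again, so text nodes are kept.
  my $full = parse_html($html);

  my $bare_text = $structure_only-&gt;as_text;
  my $full_text = $full-&gt;as_text;
  print "structure only: '$bare_text'\n";
  print "full:           '$full_text'\n";

  $_-&gt;delete for $structure_only, $full;
</PRE>
Using <TT>local</TT> keeps the change from leaking into later calls that expect
the default settings.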
<A NAME="lbAF"> </A>
<H2>REMEMBER!</H2>
HTML::TreeBuilder objects should be explicitly destroyed when you're
finished with them. See HTML::TreeBuilder.
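<P>
A minimal sketch of that cleanup, assuming the tree came from <B>parse_html()</B>
with its default object: copy out whatever is needed, then call the tree's
<B>delete()</B> method. The markup and variable names are illustrative.
<PRE>
  use strict;
  use warnings;
  use HTML::Parse;

  my $tree = parse_html('&lt;p&gt;temporary&lt;/p&gt;');

  my $text = $tree-&gt;as_text;   # copy out what is needed first
  $tree-&gt;delete;               # then explicitly destroy the tree

  print "$text\n";
</PRE>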
<A NAME="lbAG"> </A>
<H2>SEE ALSO</H2>
HTML::Parser, HTML::TreeBuilder, HTML::Element
<A NAME="lbAH"> </A>
<H2>AUTHOR</H2>
Current maintainers:
<DL COMPACT>
<DT id="7">•<DD>
Christopher J. Madsen <TT>"&lt;perl AT cjmweb.net&gt;"</TT>
<DT id="8">•<DD>
Jeff Fearn <TT>"&lt;jfearn AT cpan.org&gt;"</TT>
</DL>
<P>
Original HTML-Tree author:
<DL COMPACT>
<DT id="9">•<DD>
Gisle Aas
</DL>
<P>
Former maintainers:
<DL COMPACT>
<DT id="10">•<DD>
Sean M. Burke
<DT id="11">•<DD>
Andy Lester
<DT id="12">•<DD>
Pete Krawczyk <TT>"&lt;petek AT cpan.org&gt;"</TT>
</DL>
<P>
You can follow or contribute to HTML-Tree's development at
&lt;<A HREF="https://github.com/kentfredric/HTML-Tree">https://github.com/kentfredric/HTML-Tree</A>&gt;.
<A NAME="lbAI"> </A>
<H2>COPYRIGHT AND LICENSE</H2>
Copyright 1995-1998 Gisle Aas, 1999-2004 Sean M. Burke,
2005 Andy Lester, 2006 Pete Krawczyk, 2010 Jeff Fearn,
2012 Christopher J. Madsen.
<P>
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
<P>
The programs in this library are distributed in the hope that they
will be useful, but without any warranty; without even the implied
warranty of merchantability or fitness for a particular purpose.
<P>
<HR>
<A NAME="index"> </A><H2>Index</H2>
<DL>
<DT id="13"><A HREF="#lbAB">NAME</A><DD>
<DT id="14"><A HREF="#lbAC">VERSION</A><DD>
<DT id="15"><A HREF="#lbAD">SYNOPSIS</A><DD>
<DT id="16"><A HREF="#lbAE">DESCRIPTION</A><DD>
<DT id="17"><A HREF="#lbAF">REMEMBER!</A><DD>
<DT id="18"><A HREF="#lbAG">SEE ALSO</A><DD>
<DT id="19"><A HREF="#lbAH">AUTHOR</A><DD>
<DT id="20"><A HREF="#lbAI">COPYRIGHT AND LICENSE</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:45 GMT, March 31, 2021
</BODY>
</HTML>