man-pages/man3/HTML::HeadParser.3pm.html
2021-03-31 01:06:50 +01:00

219 lines
6.0 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of HTML::HeadParser</TITLE>
</HEAD><BODY>
<H1>HTML::HeadParser</H1>
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2020-02-18<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>
HTML::HeadParser - Parse &lt;HEAD&gt; section of a HTML document
<A NAME="lbAC">&nbsp;</A>
<H2>SYNOPSIS</H2>
<PRE>
require HTML::HeadParser;
$p = HTML::HeadParser-&gt;new;
$p-&gt;parse($text) and print &quot;not finished&quot;;
$p-&gt;header('Title') # to access &lt;title&gt;....&lt;/title&gt;
$p-&gt;header('Content-Base') # to access &lt;base href=&quot;<A HREF="http://...">http://...</A>&quot;&gt;
$p-&gt;header('Foo') # to access &lt;meta http-equiv=&quot;Foo&quot; content=&quot;...&quot;&gt;
$p-&gt;header('X-Meta-Author') # to access &lt;meta name=&quot;author&quot; content=&quot;...&quot;&gt;
$p-&gt;header('X-Meta-Charset') # to access &lt;meta charset=&quot;...&quot;&gt;
</PRE>
<A NAME="lbAD">&nbsp;</A>
<H2>DESCRIPTION</H2>
The <TT>&quot;HTML::HeadParser&quot;</TT> is a specialized (and lightweight)
<TT>&quot;HTML::Parser&quot;</TT> that will only parse the &lt;<FONT SIZE="-1">HEAD</FONT>&gt;...&lt;/HEAD&gt;
section of an <FONT SIZE="-1">HTML</FONT> document. The <B>parse()</B> method
will return a <FONT SIZE="-1">FALSE</FONT> value as soon as some &lt;<FONT SIZE="-1">BODY</FONT>&gt; element or body
text are found, and should not be called again after this.
<P>
Note that the <TT>&quot;HTML::HeadParser&quot;</TT> might get confused if raw undecoded
<FONT SIZE="-1">UTF-8</FONT> is passed to the <B>parse()</B> method. Make sure the strings are
properly decoded before passing them on.
<P>
The <TT>&quot;HTML::HeadParser&quot;</TT> keeps a reference to a header object, and the
parser will update this header object as the various elements of the
&lt;<FONT SIZE="-1">HEAD</FONT>&gt; section of the <FONT SIZE="-1">HTML</FONT> document are recognized. The following
header fields are affected:
<DL COMPACT>
<DT id="1">Content-Base:<DD>
The <I>Content-Base</I> header is initialized from the &lt;base
href=``...''&gt; element.
<DT id="2">Title:<DD>
The <I>Title</I> header is initialized from the &lt;title&gt;...&lt;/title&gt;
element.
<DT id="3">Isindex:<DD>
The <I>Isindex</I> header will be added if there is a &lt;isindex&gt;
element in the &lt;head&gt;. The header value is initialized from the
<I>prompt</I> attribute if it is present. If no <I>prompt</I> attribute is
given it will have '?' as the value.
<DT id="4">X-Meta-Foo:<DD>
All &lt;meta&gt; elements containing a <TT>&quot;name&quot;</TT> attribute will result in
headers using the prefix <TT>&quot;X-Meta-&quot;</TT> appended with the value of the
<TT>&quot;name&quot;</TT> attribute as the name of the header, and the value of the
<TT>&quot;content&quot;</TT> attribute as the pushed header value.
<P>
&lt;meta&gt; elements containing a <TT>&quot;http-equiv&quot;</TT> attribute will result
in headers as in above, but without the <TT>&quot;X-Meta-&quot;</TT> prefix in the
header name.
<P>
&lt;meta&gt; elements containing a <TT>&quot;charset&quot;</TT> attribute will result in
an <TT>&quot;X-Meta-Charset&quot;</TT> header, using the value of the <TT>&quot;charset&quot;</TT>
attribute as the pushed header value.
<P>
The ':' character can't be represented in header field names, so
if the meta element contains this char it's substituted with '-'
before forming the field name.
</DL>
<A NAME="lbAE">&nbsp;</A>
<H2>METHODS</H2>
The following methods (in addition to those provided by the
superclass) are available:
<DL COMPACT>
<DT id="5">$hp = HTML::HeadParser-&gt;new<DD>
<DT id="6">$hp = HTML::HeadParser-&gt;new( $header )<DD>
The object constructor. The optional <TT>$header</TT> argument should be a
reference to an object that implement the <B>header()</B> and <B>push_header()</B>
methods as defined by the <TT>&quot;HTTP::Headers&quot;</TT> class. Normally it will be
of some class that is a or delegates to the <TT>&quot;HTTP::Headers&quot;</TT> class.
<P>
If no <TT>$header</TT> is given <TT>&quot;HTML::HeadParser&quot;</TT> will create an
<TT>&quot;HTTP::Headers&quot;</TT> object by itself (initially empty).
<DT id="7">$hp-&gt;header;<DD>
Returns a reference to the header object.
<DT id="8">$hp-&gt;header( $key )<DD>
Returns a header value. It is just a shorter way to write
<TT>&quot;$hp-&gt;header-&gt;header($key)&quot;</TT>.
</DL>
<A NAME="lbAF">&nbsp;</A>
<H2>EXAMPLE</H2>
<PRE>
$h = HTTP::Headers-&gt;new;
$p = HTML::HeadParser-&gt;new($h);
$p-&gt;parse(&lt;&lt;EOT);
&lt;title&gt;Stupid example&lt;/title&gt;
&lt;base href=&quot;<A HREF="http://www.linpro.no/lwp/">http://www.linpro.no/lwp/</A>&quot;&gt;
Normal text starts here.
EOT
undef $p;
print $h-&gt;title; # should print &quot;Stupid example&quot;
</PRE>
<A NAME="lbAG">&nbsp;</A>
<H2>SEE ALSO</H2>
HTML::Parser, HTTP::Headers
<P>
The <TT>&quot;HTTP::Headers&quot;</TT> class is distributed as part of the
<I>libwww-perl</I> package. If you don't have that distribution installed
you need to provide the <TT>$header</TT> argument to the <TT>&quot;HTML::HeadParser&quot;</TT>
constructor with your own object that implements the documented
protocol.
<A NAME="lbAH">&nbsp;</A>
<H2>COPYRIGHT</H2>
Copyright 1996-2001 Gisle Aas. All rights reserved.
<P>
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
<P>
<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="9"><A HREF="#lbAB">NAME</A><DD>
<DT id="10"><A HREF="#lbAC">SYNOPSIS</A><DD>
<DT id="11"><A HREF="#lbAD">DESCRIPTION</A><DD>
<DT id="12"><A HREF="#lbAE">METHODS</A><DD>
<DT id="13"><A HREF="#lbAF">EXAMPLE</A><DD>
<DT id="14"><A HREF="#lbAG">SEE ALSO</A><DD>
<DT id="15"><A HREF="#lbAH">COPYRIGHT</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:45 GMT, March 31, 2021
</BODY>
</HTML>