<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of HTML::HeadParser</TITLE>
</HEAD><BODY>
<H1>HTML::HeadParser</H1>
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2020-02-18<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
<A NAME="lbAB"> </A>
<H2>NAME</H2>
HTML::HeadParser - Parse &lt;HEAD&gt; section of an HTML document
<A NAME="lbAC"> </A>
<H2>SYNOPSIS</H2>
<PRE>
require HTML::HeadParser;
$p = HTML::HeadParser->new;
$p->parse($text) and print "not finished";

$p->header('Title') # to access &lt;title&gt;....&lt;/title&gt;
$p->header('Content-Base') # to access &lt;base href="http://..."&gt;
$p->header('Foo') # to access &lt;meta http-equiv="Foo" content="..."&gt;
$p->header('X-Meta-Author') # to access &lt;meta name="author" content="..."&gt;
$p->header('X-Meta-Charset') # to access &lt;meta charset="..."&gt;
</PRE>
<A NAME="lbAD"> </A>
<H2>DESCRIPTION</H2>
The <TT>"HTML::HeadParser"</TT> is a specialized (and lightweight)
<TT>"HTML::Parser"</TT> that will only parse the &lt;<FONT SIZE="-1">HEAD</FONT>&gt;...&lt;/HEAD&gt;
section of an <FONT SIZE="-1">HTML</FONT> document. The <B>parse()</B> method
will return a <FONT SIZE="-1">FALSE</FONT> value as soon as a &lt;<FONT SIZE="-1">BODY</FONT>&gt; element or body
text is found, and should not be called again after this.
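<P>
The stop-on-FALSE behaviour described above can be sketched as a loop that feeds the document in pieces and stops as soon as <B>parse()</B> reports that the head is finished. This is only a minimal sketch: the <TT>@chunks</TT> array and its contents are made up for illustration and stand in for data arriving from a file or network read.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Hypothetical input, split into pieces as it might arrive from a socket.
my @chunks = (
    "<html><head><title>Chunked ",
    "document</title></head>",
    "<body><p>Body text.</p></body></html>",
);

my $p = HTML::HeadParser->new;

for my $chunk (@chunks) {
    # parse() returns a false value once the parser has seen the end
    # of the head section; do not call it again after that.
    last unless $p->parse($chunk);
}

print $p->header('Title'), "\n";
```

Stopping early this way avoids pushing the rest of a potentially large document through the parser when only the head is of interest.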
<P>
Note that the <TT>"HTML::HeadParser"</TT> might get confused if raw undecoded
<FONT SIZE="-1">UTF-8</FONT> is passed to the <B>parse()</B> method. Make sure the strings are
properly decoded into Perl character strings before passing them on.
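<P>
One way to follow the advice above is to decode the octets with the core <TT>Encode</TT> module before parsing. The sketch below assumes the input is known to be UTF-8; the sample byte string is invented for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::HeadParser;

# Hypothetical raw input: "Caf" followed by the two UTF-8 octets for e-acute.
my $bytes = "<title>Caf\xC3\xA9</title>";

# Turn the octets into a Perl character string before parsing.
my $text = decode('UTF-8', $bytes);

my $p = HTML::HeadParser->new;
$p->parse($text);
$p->eof;

# The title is now 4 characters long rather than 5 octets.
print length($p->header('Title')), "\n";
```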
<P>
The <TT>"HTML::HeadParser"</TT> keeps a reference to a header object, and the
parser will update this header object as the various elements of the
&lt;<FONT SIZE="-1">HEAD</FONT>&gt; section of the <FONT SIZE="-1">HTML</FONT> document are recognized. The following
header fields are affected:
<DL COMPACT>
<DT id="1">Content-Base:<DD>
The <I>Content-Base</I> header is initialized from the &lt;base
href="..."&gt; element.
<DT id="2">Title:<DD>
The <I>Title</I> header is initialized from the &lt;title&gt;...&lt;/title&gt;
element.
<DT id="3">Isindex:<DD>
The <I>Isindex</I> header will be added if there is an &lt;isindex&gt;
element in the &lt;head&gt;. The header value is initialized from the
<I>prompt</I> attribute if it is present. If no <I>prompt</I> attribute is
given, it will have '?' as the value.
<DT id="4">X-Meta-Foo:<DD>
All &lt;meta&gt; elements containing a <TT>"name"</TT> attribute will result in
headers whose name is the prefix <TT>"X-Meta-"</TT> followed by the value of the
<TT>"name"</TT> attribute, with the value of the
<TT>"content"</TT> attribute as the pushed header value.
<P>
&lt;meta&gt; elements containing an <TT>"http-equiv"</TT> attribute will result
in headers as above, but without the <TT>"X-Meta-"</TT> prefix in the
header name.
<P>
&lt;meta&gt; elements containing a <TT>"charset"</TT> attribute will result in
an <TT>"X-Meta-Charset"</TT> header, using the value of the <TT>"charset"</TT>
attribute as the pushed header value.
<P>
The ':' character can't be represented in header field names, so
if the attribute value used to form the field name contains this
character, it is replaced with '-' before the field name is formed.
</DL>
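<P>
The mapping from head elements to header fields listed above can be tried out with a small script. This is a sketch rather than a complete reference: the document text, the URL and the author name are invented, and the field names queried are the ones the list above leads one to expect.

```perl
use strict;
use warnings;
use HTML::HeadParser;

my $p = HTML::HeadParser->new;

# Hypothetical head section exercising each kind of element.
$p->parse(<<'EOT');
<head>
<title>Header mapping</title>
<base href="http://www.example.com/docs/">
<isindex>
<meta http-equiv="Content-Language" content="en">
<meta name="author" content="A. N. Other">
<meta name="dc:creator" content="A. N. Other">
<meta charset="utf-8">
</head>
EOT

# Header lookups are case-insensitive, as in HTTP::Headers.
for my $field (qw(Title Content-Base Isindex Content-Language
                  X-Meta-Author X-Meta-Dc-Creator X-Meta-Charset)) {
    my $value = $p->header($field);
    printf "%-18s %s\n", "$field:", defined $value ? $value : '(not set)';
}
```

Note how the <TT>"dc:creator"</TT> name is looked up as <TT>"X-Meta-Dc-Creator"</TT>, because the ':' is replaced with '-'.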
<A NAME="lbAE"> </A>
<H2>METHODS</H2>
The following methods (in addition to those provided by the
superclass) are available:
<DL COMPACT>
<DT id="5">$hp = HTML::HeadParser->new<DD>
<DT id="6">$hp = HTML::HeadParser->new( $header )<DD>
The object constructor. The optional <TT>$header</TT> argument should be a
reference to an object that implements the <B>header()</B> and <B>push_header()</B>
methods as defined by the <TT>"HTTP::Headers"</TT> class. Normally it will be
an <TT>"HTTP::Headers"</TT> object, or an object of some class that inherits
from or delegates to the <TT>"HTTP::Headers"</TT> class.
<P>
If no <TT>$header</TT> is given, <TT>"HTML::HeadParser"</TT> will create an
<TT>"HTTP::Headers"</TT> object by itself (initially empty).
<DT id="7">$hp->header;<DD>
Returns a reference to the header object.
<DT id="8">$hp->header( $key )<DD>
Returns a header value. It is just a shorter way to write
<TT>"$hp->header->header($key)"</TT>.
</DL>
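<P>
To illustrate the constructor's <TT>$header</TT> argument, here is a sketch of a stand-in header object that implements only <B>header()</B> and <B>push_header()</B>. The class name <TT>My::SimpleHeaders</TT> and its internals are hypothetical and cover just enough of the protocol for this example; a real application would more likely pass an <TT>"HTTP::Headers"</TT> object.

```perl
use strict;
use warnings;
use HTML::HeadParser;

{
    # Hypothetical minimal header store, not part of any distribution.
    package My::SimpleHeaders;

    sub new { return bless { fields => {} }, shift }

    # Append a value to a (case-insensitively named) field.
    sub push_header {
        my ($self, $name, $value) = @_;
        push @{ $self->{fields}{ lc $name } }, $value;
        return;
    }

    # Return the values of a field: a list in list context,
    # a comma-joined string in scalar context.
    sub header {
        my ($self, $name) = @_;
        my $values = $self->{fields}{ lc $name } or return;
        return wantarray ? @$values : join(', ', @$values);
    }
}

my $h = My::SimpleHeaders->new;
my $p = HTML::HeadParser->new($h);

$p->parse('<title>Custom store</title><meta name="author" content="A. N. Other">');
$p->eof;

print $p->header('Title'), "\n";          # shortcut through the parser
print $h->header('X-Meta-Author'), "\n";  # direct access to the store
print ref($p->header), "\n";              # the object passed to new()
```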
<A NAME="lbAF"> </A>
<H2>EXAMPLE</H2>
<PRE>
use HTTP::Headers;
use HTML::HeadParser;

$h = HTTP::Headers->new;
$p = HTML::HeadParser->new($h);
$p->parse(&lt;&lt;EOT);
&lt;title&gt;Stupid example&lt;/title&gt;
&lt;base href="http://www.linpro.no/lwp/"&gt;
Normal text starts here.
EOT
undef $p;
print $h->title; # should print "Stupid example"
</PRE>
<A NAME="lbAG"> </A>
<H2>SEE ALSO</H2>
HTML::Parser, HTTP::Headers
<P>
The <TT>"HTTP::Headers"</TT> class is distributed as part of the
<I>libwww-perl</I> package. If you don't have that distribution installed,
you need to provide the <TT>$header</TT> argument to the <TT>"HTML::HeadParser"</TT>
constructor with your own object that implements the documented
protocol.
<A NAME="lbAH"> </A>
<H2>COPYRIGHT</H2>
Copyright 1996-2001 Gisle Aas. All rights reserved.
<P>
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
<P>
<HR>
<A NAME="index"> </A><H2>Index</H2>
<DL>
<DT id="9"><A HREF="#lbAB">NAME</A><DD>
<DT id="10"><A HREF="#lbAC">SYNOPSIS</A><DD>
<DT id="11"><A HREF="#lbAD">DESCRIPTION</A><DD>
<DT id="12"><A HREF="#lbAE">METHODS</A><DD>
<DT id="13"><A HREF="#lbAF">EXAMPLE</A><DD>
<DT id="14"><A HREF="#lbAG">SEE ALSO</A><DD>
<DT id="15"><A HREF="#lbAH">COPYRIGHT</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:45 GMT, March 31, 2021
</BODY>
</HTML>