man-pages/man3/HTML::Filter.3pm.html
2021-03-31 01:06:50 +01:00

169 lines
3.7 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of HTML::Filter</TITLE>
</HEAD><BODY>
<H1>HTML::Filter</H1>
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2020-02-18<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>
HTML::Filter - Filter HTML text through the parser
<A NAME="lbAC">&nbsp;</A>
<H2>NOTE</H2>
<B>This module is deprecated.</B> The <TT>&quot;HTML::Parser&quot;</TT> now provides the
functionally of <TT>&quot;HTML::Filter&quot;</TT> much more efficiently with the
<TT>&quot;default&quot;</TT> handler.
<A NAME="lbAD">&nbsp;</A>
<H2>SYNOPSIS</H2>
<PRE>
require HTML::Filter;
$p = HTML::Filter-&gt;new-&gt;parse_file(&quot;index.html&quot;);
</PRE>
<A NAME="lbAE">&nbsp;</A>
<H2>DESCRIPTION</H2>
<TT>&quot;HTML::Filter&quot;</TT> is an <FONT SIZE="-1">HTML</FONT> parser that by default prints the
original text of each <FONT SIZE="-1">HTML</FONT> element (a slow version of <B><A HREF="/cgi-bin/man/man2html?1+cat">cat</A></B>(1) basically).
The callback methods may be overridden to modify the filtering for some
<FONT SIZE="-1">HTML</FONT> elements and you can override <B>output()</B> method which is called to
print the <FONT SIZE="-1">HTML</FONT> text.
<P>
<TT>&quot;HTML::Filter&quot;</TT> is a subclass of <TT>&quot;HTML::Parser&quot;</TT>. This means that
the document should be given to the parser by calling the <TT>$p</TT>-&gt;<B>parse()</B>
or <TT>$p</TT>-&gt;<B>parse_file()</B> methods.
<A NAME="lbAF">&nbsp;</A>
<H2>EXAMPLES</H2>
The first example is a filter that will remove all comments from an
<FONT SIZE="-1">HTML</FONT> file. This is achieved by simply overriding the comment method
to do nothing.
<P>
<PRE>
package CommentStripper;
require HTML::Filter;
@ISA=qw(HTML::Filter);
sub comment { } # ignore comments
</PRE>
<P>
The second example shows a filter that will remove any &lt;<FONT SIZE="-1">TABLE</FONT>&gt;s
found in the <FONT SIZE="-1">HTML</FONT> file. We specialize the <B>start()</B> and <B>end()</B> methods
to count table tags and then make output not happen when inside a
table.
<P>
<PRE>
package TableStripper;
require HTML::Filter;
@ISA=qw(HTML::Filter);
sub start
{
my $self = shift;
$self-&gt;{table_seen}++ if $_[0] eq &quot;table&quot;;
$self-&gt;SUPER::start(@_);
}
sub end
{
my $self = shift;
$self-&gt;SUPER::end(@_);
$self-&gt;{table_seen}-- if $_[0] eq &quot;table&quot;;
}
sub output
{
my $self = shift;
unless ($self-&gt;{table_seen}) {
$self-&gt;SUPER::output(@_);
}
}
</PRE>
<P>
If you want to collect the parsed text internally you might want to do
something like this:
<P>
<PRE>
package FilterIntoString;
require HTML::Filter;
@ISA=qw(HTML::Filter);
sub output { push(@{$_[0]-&gt;{fhtml}}, $_[1]) }
sub filtered_html { join(&quot;&quot;, @{$_[0]-&gt;{fhtml}}) }
</PRE>
<A NAME="lbAG">&nbsp;</A>
<H2>SEE ALSO</H2>
HTML::Parser
<A NAME="lbAH">&nbsp;</A>
<H2>COPYRIGHT</H2>
Copyright 1997-1999 Gisle Aas.
<P>
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
<P>
<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="1"><A HREF="#lbAB">NAME</A><DD>
<DT id="2"><A HREF="#lbAC">NOTE</A><DD>
<DT id="3"><A HREF="#lbAD">SYNOPSIS</A><DD>
<DT id="4"><A HREF="#lbAE">DESCRIPTION</A><DD>
<DT id="5"><A HREF="#lbAF">EXAMPLES</A><DD>
<DT id="6"><A HREF="#lbAG">SEE ALSO</A><DD>
<DT id="7"><A HREF="#lbAH">COPYRIGHT</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:45 GMT, March 31, 2021
</BODY>
</HTML>