man-pages/man3/HTML::Filter.3pm.html


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of HTML::Filter</TITLE>
</HEAD><BODY>
<H1>HTML::Filter</H1>
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2020-02-18<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>


<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>

HTML::Filter - Filter HTML text through the parser
<A NAME="lbAC">&nbsp;</A>
<H2>NOTE</H2>


<B>This module is deprecated.</B> The <TT>&quot;HTML::Parser&quot;</TT> now provides the
functionally of <TT>&quot;HTML::Filter&quot;</TT> much more efficiently with the
<TT>&quot;default&quot;</TT> handler.
<A NAME="lbAD">&nbsp;</A>
<H2>SYNOPSIS</H2>


<PRE>
 require HTML::Filter;
 $p = HTML::Filter-&gt;new-&gt;parse_file(&quot;index.html&quot;);

</PRE>


<A NAME="lbAE">&nbsp;</A>
<H2>DESCRIPTION</H2>


<TT>&quot;HTML::Filter&quot;</TT> is an <FONT SIZE="-1">HTML</FONT> parser that by default prints the
original text of each <FONT SIZE="-1">HTML</FONT> element (a slow version of <B><A HREF="/cgi-bin/man/man2html?1+cat">cat</A></B>(1) basically).
The callback methods may be overridden to modify the filtering for some
<FONT SIZE="-1">HTML</FONT> elements and you can override <B>output()</B> method which is called to
print the <FONT SIZE="-1">HTML</FONT> text.
<P>

<TT>&quot;HTML::Filter&quot;</TT> is a subclass of <TT>&quot;HTML::Parser&quot;</TT>. This means that
the document should be given to the parser by calling the <TT>$p</TT>-&gt;<B>parse()</B>
or <TT>$p</TT>-&gt;<B>parse_file()</B> methods.
<A NAME="lbAF">&nbsp;</A>
<H2>EXAMPLES</H2>


The first example is a filter that will remove all comments from an
<FONT SIZE="-1">HTML</FONT> file.  This is achieved by simply overriding the comment method
to do nothing.
<P>


<PRE>
  package CommentStripper;
  require HTML::Filter;
  @ISA=qw(HTML::Filter);
  sub comment { }  # ignore comments

</PRE>


<P>

The second example shows a filter that will remove any &lt;<FONT SIZE="-1">TABLE</FONT>&gt;s
found in the <FONT SIZE="-1">HTML</FONT> file.  We specialize the <B>start()</B> and <B>end()</B> methods
to count table tags and then make output not happen when inside a
table.
<P>


<PRE>
  package TableStripper;
  require HTML::Filter;
  @ISA=qw(HTML::Filter);
  sub start
  {
     my $self = shift;
     $self-&gt;{table_seen}++ if $_[0] eq &quot;table&quot;;
     $self-&gt;SUPER::start(@_);
  }

  sub end
  {
     my $self = shift;
     $self-&gt;SUPER::end(@_);
     $self-&gt;{table_seen}-- if $_[0] eq &quot;table&quot;;
  }

  sub output
  {
      my $self = shift;
      unless ($self-&gt;{table_seen}) {
          $self-&gt;SUPER::output(@_);
      }
  }

</PRE>


<P>

If you want to collect the parsed text internally you might want to do
something like this:
<P>


<PRE>
  package FilterIntoString;
  require HTML::Filter;
  @ISA=qw(HTML::Filter);
  sub output { push(@{$_[0]-&gt;{fhtml}}, $_[1]) }
  sub filtered_html { join(&quot;&quot;, @{$_[0]-&gt;{fhtml}}) }

</PRE>


<A NAME="lbAG">&nbsp;</A>
<H2>SEE ALSO</H2>


HTML::Parser
<A NAME="lbAH">&nbsp;</A>
<H2>COPYRIGHT</H2>


Copyright 1997-1999 Gisle Aas.
<P>

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
<P>

<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="1"><A HREF="#lbAB">NAME</A><DD>
<DT id="2"><A HREF="#lbAC">NOTE</A><DD>
<DT id="3"><A HREF="#lbAD">SYNOPSIS</A><DD>
<DT id="4"><A HREF="#lbAE">DESCRIPTION</A><DD>
<DT id="5"><A HREF="#lbAF">EXAMPLES</A><DD>
<DT id="6"><A HREF="#lbAG">SEE ALSO</A><DD>
<DT id="7"><A HREF="#lbAH">COPYRIGHT</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:45 GMT, March 31, 2021
</BODY>
</HTML>