<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of HTML::Filter</TITLE>
</HEAD><BODY>
<H1>HTML::Filter</H1>
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2020-02-18<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>

<A NAME="lbAB"> </A>
<H2>NAME</H2>

HTML::Filter - Filter HTML text through the parser
<A NAME="lbAC"> </A>
<H2>NOTE</H2>

<B>This module is deprecated.</B> <TT>"HTML::Parser"</TT> now provides the
functionality of <TT>"HTML::Filter"</TT> much more efficiently through its
<TT>"default"</TT> handler.
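<P>

As a rough illustration of the note above, the following minimal sketch uses <TT>"HTML::Parser"</TT> directly, with a <TT>"default"</TT> handler that passes the source text through and an empty comment handler that drops comments. The sample input string and the <TT>$out</TT> variable are assumptions made for this example; they are not part of either module.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Collect the filtered document here instead of printing it
# ($out is a name chosen for this sketch).
my $out = '';

my $p = HTML::Parser->new(
    api_version => 3,
    # Events without a specific handler are passed through as source text.
    default_h   => [ sub { $out .= shift }, 'text' ],
    # An empty handler means comment events are ignored.
    comment_h   => [ '' ],
);

# Hypothetical input, used only to show the effect.
$p->parse('<p>Hello<!-- hidden --> world</p>');
$p->eof;    # signal end of document so buffered text is flushed

print "$out\n";
```

Replacing the body of the default handler with <TT>print shift</TT> gives the pass-through behavior that <TT>"HTML::Filter"</TT> has by default.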
<A NAME="lbAD"> </A>
<H2>SYNOPSIS</H2>

<PRE>
require HTML::Filter;
$p = HTML::Filter->new->parse_file("index.html");
</PRE>

<A NAME="lbAE"> </A>
<H2>DESCRIPTION</H2>

<TT>"HTML::Filter"</TT> is an <FONT SIZE="-1">HTML</FONT> parser that by default prints the
original text of each <FONT SIZE="-1">HTML</FONT> element (basically a slow version of <B><A HREF="/cgi-bin/man/man2html?1+cat">cat</A></B>(1)).
The callback methods may be overridden to modify the filtering for some
<FONT SIZE="-1">HTML</FONT> elements, and you can override the <B>output()</B> method, which is called to
print the <FONT SIZE="-1">HTML</FONT> text.
<P>

<TT>"HTML::Filter"</TT> is a subclass of <TT>"HTML::Parser"</TT>. This means that
the document should be given to the parser by calling the <TT>$p</TT>-><B>parse()</B>
or <TT>$p</TT>-><B>parse_file()</B> methods.
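<P>

The sketch below shows one way the document might be fed to the filter in pieces with <B>parse()</B>, followed by <B>eof()</B> to mark the end of the input. Because the default <B>output()</B> prints to the currently selected filehandle, the example temporarily selects an in-memory filehandle so the result can be inspected. The chunk contents and the names <TT>$fh</TT> and <TT>$buffer</TT> are assumptions made for illustration.

```perl
use strict;
use warnings;
require HTML::Filter;

# In-memory filehandle that stands in for STDOUT in this sketch.
open(my $fh, '>', \my $buffer) or die "cannot open in-memory handle: $!";
my $old = select($fh);    # output() prints to the selected handle

my $p = HTML::Filter->new;
$p->parse('<p>Hel');      # a chunk may end anywhere, even mid-word
$p->parse('lo</p>');
$p->eof;                  # flush whatever the parser is still holding

select($old);             # restore the previous default handle
close($fh);

print "$buffer\n";
```

Calling <B>parse_file()</B> with a file name, as in the SYNOPSIS, reads and parses the whole file in one call.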
<A NAME="lbAF"> </A>
<H2>EXAMPLES</H2>

The first example is a filter that will remove all comments from an
<FONT SIZE="-1">HTML</FONT> file. This is achieved by simply overriding the comment method
to do nothing.
<P>

<PRE>
package CommentStripper;
require HTML::Filter;
@ISA=qw(HTML::Filter);
sub comment { }  # ignore comments
</PRE>
<P>

The second example shows a filter that will remove any &lt;<FONT SIZE="-1">TABLE</FONT>&gt;s
found in the <FONT SIZE="-1">HTML</FONT> file. We specialize the <B>start()</B> and <B>end()</B> methods
to count table tags and then suppress output while inside a
table.
<P>

<PRE>
package TableStripper;
require HTML::Filter;
@ISA=qw(HTML::Filter);

sub start
{
    my $self = shift;
    # count first, so the opening table tag itself is suppressed
    $self->{table_seen}++ if $_[0] eq "table";
    $self->SUPER::start(@_);
}

sub end
{
    my $self = shift;
    # emit first, so the closing table tag is still suppressed
    $self->SUPER::end(@_);
    $self->{table_seen}-- if $_[0] eq "table";
}

sub output
{
    my $self = shift;
    # print only while no table is open
    unless ($self->{table_seen}) {
        $self->SUPER::output(@_);
    }
}
</PRE>
<P>

If you want to collect the parsed text internally you might want to do
something like this:
<P>

<PRE>
package FilterIntoString;
require HTML::Filter;
@ISA=qw(HTML::Filter);
sub output { push(@{$_[0]->{fhtml}}, $_[1]) }
sub filtered_html { join("", @{$_[0]->{fhtml}}) }
</PRE>
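<P>

A minimal usage sketch for the class above follows. It repeats the package definition (with <TT>our</TT> added so that it compiles under <TT>use strict</TT>) to stay self-contained; the input string and the <TT>$html</TT> variable are assumptions made for this example.

```perl
use strict;
use warnings;

package FilterIntoString;
require HTML::Filter;
our @ISA = qw(HTML::Filter);
# Store each piece of output on the object instead of printing it.
sub output        { push(@{$_[0]->{fhtml}}, $_[1]) }
# Join the stored pieces into one string.
sub filtered_html { join("", @{$_[0]->{fhtml}}) }

package main;

my $p = FilterIntoString->new;
$p->parse('<h1>Title</h1><!-- note -->');   # hypothetical input
$p->eof;

my $html = $p->filtered_html;
print "$html\n";
```

Nothing is filtered out here, so the collected string should match the input. Combining this <B>output()</B> override with one of the earlier filters would be one way to capture a stripped document in memory.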
<A NAME="lbAG"> </A>
<H2>SEE ALSO</H2>

HTML::Parser
<A NAME="lbAH"> </A>
<H2>COPYRIGHT</H2>

Copyright 1997-1999 Gisle Aas.
<P>

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
<P>

<HR>
<A NAME="index"> </A><H2>Index</H2>
<DL>
<DT id="1"><A HREF="#lbAB">NAME</A><DD>
<DT id="2"><A HREF="#lbAC">NOTE</A><DD>
<DT id="3"><A HREF="#lbAD">SYNOPSIS</A><DD>
<DT id="4"><A HREF="#lbAE">DESCRIPTION</A><DD>
<DT id="5"><A HREF="#lbAF">EXAMPLES</A><DD>
<DT id="6"><A HREF="#lbAG">SEE ALSO</A><DD>
<DT id="7"><A HREF="#lbAH">COPYRIGHT</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:45 GMT, March 31, 2021
</BODY>
</HTML>