298 lines
6.2 KiB
HTML
298 lines
6.2 KiB
HTML
|
||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
||
<HTML><HEAD><TITLE>Man page of HTML::Entities</TITLE>
|
||
</HEAD><BODY>
|
||
<H1>HTML::Entities</H1>
|
||
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2020-02-18<BR><A HREF="#index">Index</A>
|
||
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
|
||
|
||
|
||
|
||
|
||
|
||
|
||
<A NAME="lbAB"> </A>
|
||
<H2>NAME</H2>
|
||
|
||
HTML::Entities - Encode or decode strings with HTML entities
|
||
<A NAME="lbAC"> </A>
|
||
<H2>SYNOPSIS</H2>
|
||
|
||
|
||
|
||
|
||
|
||
<PRE>
|
||
use HTML::Entities;
|
||
|
||
$a = "V&aring;re norske tegn b&oslash;r &#230res";
|
||
decode_entities($a);
|
||
encode_entities($a, "\200-\377");
|
||
|
||
</PRE>
|
||
|
||
|
||
<P>
|
||
|
||
For example, this:
|
||
<P>
|
||
|
||
|
||
|
||
<PRE>
|
||
$input = "vis-à-vis Beyoncé's naïve\npapier-mâché résumé";
|
||
print encode_entities($input), "\n"
|
||
|
||
</PRE>
|
||
|
||
|
||
<P>
|
||
|
||
Prints this out:
|
||
<P>
|
||
|
||
|
||
|
||
<PRE>
|
||
vis-&agrave;-vis Beyonc&eacute;'s na&iuml;ve
|
||
papier-m&acirc;ch&eacute; r&eacute;sum&eacute;
|
||
|
||
</PRE>
|
||
|
||
|
||
<A NAME="lbAD"> </A>
|
||
<H2>DESCRIPTION</H2>
|
||
|
||
|
||
|
||
This module deals with encoding and decoding of strings with <FONT SIZE="-1">HTML</FONT>
|
||
character entities. The module provides the following functions:
|
||
<DL COMPACT>
|
||
<DT id="1">decode_entities( $string, ... )<DD>
|
||
|
||
|
||
|
||
|
||
This routine replaces <FONT SIZE="-1">HTML</FONT> entities found in the <TT>$string</TT> with the
|
||
corresponding Unicode character. Unrecognized entities are left alone.
|
||
|
||
|
||
<P>
|
||
|
||
|
||
If multiple strings are provided as argument they are each decoded
|
||
separately and the same number of strings are returned.
|
||
|
||
|
||
<P>
|
||
|
||
|
||
If called in void context the arguments are decoded in-place.
|
||
|
||
|
||
<P>
|
||
|
||
|
||
This routine is exported by default.
|
||
<DT id="2">_decode_entities( $string, \%entity2char )<DD>
|
||
|
||
|
||
|
||
|
||
|
||
<DT id="3">_decode_entities( $string, \%entity2char, $expand_prefix )<DD>
|
||
|
||
|
||
|
||
|
||
|
||
This will in-place replace <FONT SIZE="-1">HTML</FONT> entities in <TT>$string</TT>. The <TT>%entity2char</TT>
|
||
hash must be provided. Named entities not found in the <TT>%entity2char</TT>
|
||
hash are left alone. Numeric entities are expanded unless their value
|
||
overflow.
|
||
|
||
|
||
<P>
|
||
|
||
|
||
The keys in <TT>%entity2char</TT> are the entity names to be expanded and their
|
||
values are what they should expand into. The values do not have to be
|
||
single character strings. If a key has ``;'' as suffix,
|
||
then occurrences in <TT>$string</TT> are only expanded if properly terminated
|
||
with ``;''. Entities without ``;'' will be expanded regardless of how
|
||
they are terminated for compatibility with how common browsers treat
|
||
entities in the Latin-1 range.
|
||
|
||
|
||
<P>
|
||
|
||
|
||
If <TT>$expand_prefix</TT> is <FONT SIZE="-1">TRUE</FONT> then entities without trailing ``;'' in
|
||
<TT>%entity2char</TT> will even be expanded as a prefix of a longer
|
||
unrecognized name. The longest matching name in <TT>%entity2char</TT> will be
|
||
used. This is mainly present for compatibility with an <FONT SIZE="-1">MSIE</FONT>
|
||
misfeature.
|
||
|
||
|
||
<P>
|
||
|
||
|
||
|
||
|
||
<PRE>
|
||
$string = "foo&nbspbar";
|
||
_decode_entities($string, { nb => "@", nbsp => "\xA0" }, 1);
|
||
print $string; # will print "foo bar"
|
||
|
||
</PRE>
|
||
|
||
|
||
|
||
|
||
<P>
|
||
|
||
|
||
This routine is exported by default.
|
||
<DT id="4">encode_entities( $string )<DD>
|
||
|
||
|
||
|
||
|
||
|
||
<DT id="5">encode_entities( $string, $unsafe_chars )<DD>
|
||
|
||
|
||
|
||
|
||
|
||
This routine replaces unsafe characters in <TT>$string</TT> with their entity
|
||
representation. A second argument can be given to specify which characters to
|
||
consider unsafe. The unsafe characters is specified using the regular
|
||
expression character class syntax (what you find within brackets in regular
|
||
expressions).
|
||
|
||
|
||
<P>
|
||
|
||
|
||
The default set of characters to encode are control chars, high-bit chars, and
|
||
the <TT>"<"</TT>, <TT>"&"</TT>, <TT>">"</TT>, <TT>"'"</TT> and <TT>"""</TT> characters. But this,
|
||
for example, would encode <I>just</I> the <TT>"<"</TT>, <TT>"&"</TT>, <TT>">"</TT>, and <TT>"""</TT> characters:
|
||
|
||
|
||
<P>
|
||
|
||
|
||
|
||
|
||
<PRE>
|
||
$encoded = encode_entities($input, '<>&"');
|
||
|
||
</PRE>
|
||
|
||
|
||
|
||
|
||
<P>
|
||
|
||
|
||
and this would only encode non-plain ascii:
|
||
|
||
|
||
<P>
|
||
|
||
|
||
|
||
|
||
<PRE>
|
||
$encoded = encode_entities($input, '^\n\x20-\x25\x27-\x7e');
|
||
|
||
</PRE>
|
||
|
||
|
||
|
||
|
||
<P>
|
||
|
||
|
||
This routine is exported by default.
|
||
<DT id="6">encode_entities_numeric( $string )<DD>
|
||
|
||
|
||
|
||
|
||
|
||
<DT id="7">encode_entities_numeric( $string, $unsafe_chars )<DD>
|
||
|
||
|
||
|
||
|
||
|
||
This routine works just like encode_entities, except that the replacement
|
||
entities are always <TT>"&#x</TT>hexnum<TT>;"</TT> and never <TT>"&</TT>entname<TT>;"</TT>. For
|
||
example, <TT>"encode_entities("r\xF4le")"</TT> returns ``r&ocirc;le'', but
|
||
<TT>"encode_entities_numeric("r\xF4le")"</TT> returns ``r&#xF4;le''.
|
||
|
||
|
||
<P>
|
||
|
||
|
||
This routine is <I>not</I> exported by default. But you can always
|
||
export it with <TT>"use HTML::Entities qw(encode_entities_numeric);"</TT>
|
||
or even <TT>"use HTML::Entities qw(:DEFAULT encode_entities_numeric);"</TT>
|
||
</DL>
|
||
<P>
|
||
|
||
All these routines modify the string passed as the first argument, if
|
||
called in a void context. In scalar and array contexts, the encoded or
|
||
decoded string is returned (without changing the input string).
|
||
<P>
|
||
|
||
If you prefer not to import these routines into your namespace, you can
|
||
call them as:
|
||
<P>
|
||
|
||
|
||
|
||
<PRE>
|
||
use HTML::Entities ();
|
||
$decoded = HTML::Entities::decode($a);
|
||
$encoded = HTML::Entities::encode($a);
|
||
$encoded = HTML::Entities::encode_numeric($a);
|
||
|
||
</PRE>
|
||
|
||
|
||
<P>
|
||
|
||
The module can also export the <TT>%char2entity</TT> and the <TT>%entity2char</TT>
|
||
hashes, which contain the mapping from all characters to the
|
||
corresponding entities (and vice versa, respectively).
|
||
<A NAME="lbAE"> </A>
|
||
<H2>COPYRIGHT</H2>
|
||
|
||
|
||
|
||
Copyright 1995-2006 Gisle Aas. All rights reserved.
|
||
<P>
|
||
|
||
This library is free software; you can redistribute it and/or
|
||
modify it under the same terms as Perl itself.
|
||
<P>
|
||
|
||
<HR>
|
||
<A NAME="index"> </A><H2>Index</H2>
|
||
<DL>
|
||
<DT id="8"><A HREF="#lbAB">NAME</A><DD>
|
||
<DT id="9"><A HREF="#lbAC">SYNOPSIS</A><DD>
|
||
<DT id="10"><A HREF="#lbAD">DESCRIPTION</A><DD>
|
||
<DT id="11"><A HREF="#lbAE">COPYRIGHT</A><DD>
|
||
</DL>
|
||
<HR>
|
||
This document was created by
|
||
<A HREF="/cgi-bin/man/man2html">man2html</A>,
|
||
using the manual pages.<BR>
|
||
Time: 00:05:45 GMT, March 31, 2021
|
||
</BODY>
|
||
</HTML>
|