man-pages/man3/URI::Escape.3pm.html
2021-03-31 01:06:50 +01:00

280 lines
6.7 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of URI::Escape</TITLE>
</HEAD><BODY>
<H1>URI::Escape</H1>
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2020-02-08<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>
<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>
URI::Escape - Percent-encode and percent-decode unsafe characters
<A NAME="lbAC">&nbsp;</A>
<H2>SYNOPSIS</H2>
<PRE>
use URI::Escape;
$safe = uri_escape(&quot;10% is enough\n&quot;);
$verysafe = uri_escape(&quot;foo&quot;, &quot;\0-\377&quot;);
$str = uri_unescape($safe);
</PRE>
<A NAME="lbAD">&nbsp;</A>
<H2>DESCRIPTION</H2>
This module provides functions to percent-encode and percent-decode <FONT SIZE="-1">URI</FONT> strings as
defined by <FONT SIZE="-1">RFC 3986.</FONT> Percent-encoding <FONT SIZE="-1">URI</FONT>'s is informally called ``<FONT SIZE="-1">URI</FONT> escaping''.
This is the terminology used by this module, which predates the formalization of the
terms by the <FONT SIZE="-1">RFC</FONT> by several years.
<P>
A <FONT SIZE="-1">URI</FONT> consists of a restricted set of characters. The restricted set
of characters consists of digits, letters, and a few graphic symbols
chosen from those common to most of the character encodings and input
facilities available to Internet users. They are made up of the
``unreserved'' and ``reserved'' character sets as defined in <FONT SIZE="-1">RFC 3986.</FONT>
<P>
<PRE>
unreserved = ALPHA / DIGIT / &quot;-&quot; / &quot;.&quot; / &quot;_&quot; / &quot;~&quot;
reserved = &quot;:&quot; / &quot;/&quot; / &quot;?&quot; / &quot;#&quot; / &quot;[&quot; / &quot;]&quot; / &quot;@&quot;
&quot;!&quot; / &quot;$&quot; / &quot;&amp;&quot; / &quot;'&quot; / &quot;(&quot; / &quot;)&quot;
/ &quot;*&quot; / &quot;+&quot; / &quot;,&quot; / &quot;;&quot; / &quot;=&quot;
</PRE>
<P>
In addition, any byte (octet) can be represented in a <FONT SIZE="-1">URI</FONT> by an escape
sequence: a triplet consisting of the character ``%'' followed by two
hexadecimal digits. A byte can also be represented directly by a
character, using the US-ASCII character for that octet.
<P>
Some of the characters are <I>reserved</I> for use as delimiters or as
part of certain <FONT SIZE="-1">URI</FONT> components. These must be escaped if they are to
be treated as ordinary data. Read <FONT SIZE="-1">RFC 3986</FONT> for further details.
<P>
The functions provided (and exported by default) from this module are:
<DL COMPACT>
<DT id="1">uri_escape( $string )<DD>
<DT id="2">uri_escape( $string, $unsafe )<DD>
Replaces each unsafe character in the <TT>$string</TT> with the corresponding
escape sequence and returns the result. The <TT>$string</TT> argument should
be a string of bytes. The <B>uri_escape()</B> function will croak if given a
characters with code above 255. Use <B>uri_escape_utf8()</B> if you know you
have such chars or/and want chars in the 128 .. 255 range treated as
<FONT SIZE="-1">UTF-8.</FONT>
<P>
The <B>uri_escape()</B> function takes an optional second argument that
overrides the set of characters that are to be escaped. The set is
specified as a string that can be used in a regular expression
character class (between [ ]). E.g.:
<P>
<PRE>
&quot;\x00-\x1f\x7f-\xff&quot; # all control and hi-bit characters
&quot;a-z&quot; # all lower case characters
&quot;^A-Za-z&quot; # everything not a letter
</PRE>
<P>
The default set of characters to be escaped is all those which are
<I>not</I> part of the <TT>&quot;unreserved&quot;</TT> character class shown above as well
as the reserved characters. I.e. the default is:
<P>
<PRE>
&quot;^A-Za-z0-9\-\._~&quot;
</PRE>
<DT id="3">uri_escape_utf8( $string )<DD>
<DT id="4">uri_escape_utf8( $string, $unsafe )<DD>
Works like <B>uri_escape()</B>, but will encode chars as <FONT SIZE="-1">UTF-8</FONT> before
escaping them. This makes this function able to deal with characters
with code above 255 in <TT>$string</TT>. Note that chars in the 128 .. 255
range will be escaped differently by this function compared to what
<B>uri_escape()</B> would. For chars in the 0 .. 127 range there is no
difference.
<P>
Equivalent to:
<P>
<PRE>
utf8::encode($string);
my $uri = uri_escape($string);
</PRE>
<P>
Note: JavaScript has a function called <B>escape()</B> that produces the
sequence ``%uXXXX'' for chars in the 256 .. 65535 range. This function
has really nothing to do with <FONT SIZE="-1">URI</FONT> escaping but some folks got confused
since it ``does the right thing'' in the 0 .. 255 range. Because of
this you sometimes see ``URIs'' with these kind of escapes. The
JavaScript <B>encodeURIComponent()</B> function is similar to <B>uri_escape_utf8()</B>.
<DT id="5">uri_unescape($string,...)<DD>
Returns a string with each <TT>%XX</TT> sequence replaced with the actual byte
(octet).
<P>
This does the same as:
<P>
<PRE>
$string =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
</PRE>
<P>
but does not modify the string in-place as this <FONT SIZE="-1">RE</FONT> would. Using the
<B>uri_unescape()</B> function instead of the <FONT SIZE="-1">RE</FONT> might make the code look
cleaner and is a few characters less to type.
<P>
In a simple benchmark test I did,
calling the function (instead of the inline <FONT SIZE="-1">RE</FONT> above) if a few chars
were unescaped was something like 40% slower, and something like 700% slower if none were. If
you are going to unescape a lot of times it might be a good idea to
inline the <FONT SIZE="-1">RE.</FONT>
<P>
If the <B>uri_unescape()</B> function is passed multiple strings, then each
one is returned unescaped.
</DL>
<P>
The module can also export the <TT>%escapes</TT> hash, which contains the
mapping from all 256 bytes to the corresponding escape codes. Lookup
in this hash is faster than evaluating <TT>&quot;sprintf(&quot;%%%02X&quot;, ord($byte))&quot;</TT>
each time.
<A NAME="lbAE">&nbsp;</A>
<H2>SEE ALSO</H2>
<FONT SIZE="-1">URI</FONT>
<A NAME="lbAF">&nbsp;</A>
<H2>COPYRIGHT</H2>
Copyright 1995-2004 Gisle Aas.
<P>
This program is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
<P>
<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="6"><A HREF="#lbAB">NAME</A><DD>
<DT id="7"><A HREF="#lbAC">SYNOPSIS</A><DD>
<DT id="8"><A HREF="#lbAD">DESCRIPTION</A><DD>
<DT id="9"><A HREF="#lbAE">SEE ALSO</A><DD>
<DT id="10"><A HREF="#lbAF">COPYRIGHT</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:59 GMT, March 31, 2021
</BODY>
</HTML>