man-pages/man3/URI::Escape.3pm.html


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of URI::Escape</TITLE>
</HEAD><BODY>
<H1>URI::Escape</H1>
Section: User Contributed Perl Documentation (3pm)<BR>Updated: 2020-02-08<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>


<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>

URI::Escape - Percent-encode and percent-decode unsafe characters
<A NAME="lbAC">&nbsp;</A>
<H2>SYNOPSIS</H2>


<PRE>
 use URI::Escape;
 $safe = uri_escape(&quot;10% is enough\n&quot;);
 $verysafe = uri_escape(&quot;foo&quot;, &quot;\0-\377&quot;);
 $str  = uri_unescape($safe);

</PRE>


<A NAME="lbAD">&nbsp;</A>
<H2>DESCRIPTION</H2>


This module provides functions to percent-encode and percent-decode <FONT SIZE="-1">URI</FONT> strings as
defined by <FONT SIZE="-1">RFC 3986.</FONT> Percent-encoding <FONT SIZE="-1">URI</FONT>'s is informally called ``<FONT SIZE="-1">URI</FONT> escaping''.
This is the terminology used by this module, which predates the formalization of the
terms by the <FONT SIZE="-1">RFC</FONT> by several years.
<P>

A <FONT SIZE="-1">URI</FONT> consists of a restricted set of characters.  The restricted set
of characters consists of digits, letters, and a few graphic symbols
chosen from those common to most of the character encodings and input
facilities available to Internet users.  They are made up of the
``unreserved'' and ``reserved'' character sets as defined in <FONT SIZE="-1">RFC 3986.</FONT>
<P>


<PRE>
   unreserved    = ALPHA / DIGIT / &quot;-&quot; / &quot;.&quot; / &quot;_&quot; / &quot;~&quot;
   reserved      = &quot;:&quot; / &quot;/&quot; / &quot;?&quot; / &quot;#&quot; / &quot;[&quot; / &quot;]&quot; / &quot;@&quot;
                   &quot;!&quot; / &quot;$&quot; / &quot;&amp;&quot; / &quot;'&quot; / &quot;(&quot; / &quot;)&quot;
                 / &quot;*&quot; / &quot;+&quot; / &quot;,&quot; / &quot;;&quot; / &quot;=&quot;

</PRE>


<P>

In addition, any byte (octet) can be represented in a <FONT SIZE="-1">URI</FONT> by an escape
sequence: a triplet consisting of the character ``%'' followed by two
hexadecimal digits.  A byte can also be represented directly by a
character, using the US-ASCII character for that octet.
<P>

Some of the characters are <I>reserved</I> for use as delimiters or as
part of certain <FONT SIZE="-1">URI</FONT> components.  These must be escaped if they are to
be treated as ordinary data.  Read <FONT SIZE="-1">RFC 3986</FONT> for further details.
<P>

The functions provided (and exported by default) from this module are:
<DL COMPACT>
<DT id="1">uri_escape( $string )<DD>


<DT id="2">uri_escape( $string, $unsafe )<DD>


Replaces each unsafe character in the <TT>$string</TT> with the corresponding
escape sequence and returns the result.  The <TT>$string</TT> argument should
be a string of bytes.  The <B>uri_escape()</B> function will croak if given a
characters with code above 255.  Use <B>uri_escape_utf8()</B> if you know you
have such chars or/and want chars in the 128 .. 255 range treated as
<FONT SIZE="-1">UTF-8.</FONT>


<P>


The <B>uri_escape()</B> function takes an optional second argument that
overrides the set of characters that are to be escaped.  The set is
specified as a string that can be used in a regular expression
character class (between [ ]).  E.g.:


<P>


<PRE>
  &quot;\x00-\x1f\x7f-\xff&quot;          # all control and hi-bit characters
  &quot;a-z&quot;                         # all lower case characters
  &quot;^A-Za-z&quot;                     # everything not a letter

</PRE>


<P>


The default set of characters to be escaped is all those which are
<I>not</I> part of the <TT>&quot;unreserved&quot;</TT> character class shown above as well
as the reserved characters.  I.e. the default is:


<P>


<PRE>
    &quot;^A-Za-z0-9\-\._~&quot;

</PRE>


<DT id="3">uri_escape_utf8( $string )<DD>


<DT id="4">uri_escape_utf8( $string, $unsafe )<DD>


Works like <B>uri_escape()</B>, but will encode chars as <FONT SIZE="-1">UTF-8</FONT> before
escaping them.  This makes this function able to deal with characters
with code above 255 in <TT>$string</TT>.  Note that chars in the 128 .. 255
range will be escaped differently by this function compared to what
<B>uri_escape()</B> would.  For chars in the 0 .. 127 range there is no
difference.


<P>


Equivalent to:


<P>


<PRE>
    utf8::encode($string);
    my $uri = uri_escape($string);

</PRE>


<P>


Note: JavaScript has a function called <B>escape()</B> that produces the
sequence ``%uXXXX'' for chars in the 256 .. 65535 range.  This function
has really nothing to do with <FONT SIZE="-1">URI</FONT> escaping but some folks got confused
since it ``does the right thing'' in the 0 .. 255 range.  Because of
this you sometimes see ``URIs'' with these kind of escapes.  The
JavaScript <B>encodeURIComponent()</B> function is similar to <B>uri_escape_utf8()</B>.
<DT id="5">uri_unescape($string,...)<DD>


Returns a string with each <TT>%XX</TT> sequence replaced with the actual byte
(octet).


<P>


This does the same as:


<P>


<PRE>
   $string =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;

</PRE>


<P>


but does not modify the string in-place as this <FONT SIZE="-1">RE</FONT> would.  Using the
<B>uri_unescape()</B> function instead of the <FONT SIZE="-1">RE</FONT> might make the code look
cleaner and is a few characters less to type.


<P>


In a simple benchmark test I did,
calling the function (instead of the inline <FONT SIZE="-1">RE</FONT> above) if a few chars
were unescaped was something like 40% slower, and something like 700% slower if none were.  If
you are going to unescape a lot of times it might be a good idea to
inline the <FONT SIZE="-1">RE.</FONT>


<P>


If the <B>uri_unescape()</B> function is passed multiple strings, then each
one is returned unescaped.
</DL>
<P>

The module can also export the <TT>%escapes</TT> hash, which contains the
mapping from all 256 bytes to the corresponding escape codes.  Lookup
in this hash is faster than evaluating <TT>&quot;sprintf(&quot;%%%02X&quot;, ord($byte))&quot;</TT>
each time.
<A NAME="lbAE">&nbsp;</A>
<H2>SEE ALSO</H2>


<FONT SIZE="-1">URI</FONT>
<A NAME="lbAF">&nbsp;</A>
<H2>COPYRIGHT</H2>


Copyright 1995-2004 Gisle Aas.
<P>

This program is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
<P>

<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="6"><A HREF="#lbAB">NAME</A><DD>
<DT id="7"><A HREF="#lbAC">SYNOPSIS</A><DD>
<DT id="8"><A HREF="#lbAD">DESCRIPTION</A><DD>
<DT id="9"><A HREF="#lbAE">SEE ALSO</A><DD>
<DT id="10"><A HREF="#lbAF">COPYRIGHT</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:05:59 GMT, March 31, 2021
</BODY>
</HTML>