#lang scribble/doc
@(require "common.ss"
scribble/bnf
scribble/eval
(for-label net/url
net/uri-codec
net/uri-codec-unit
net/uri-codec-sig))
@(define uri-codec-eval (make-base-eval))
@interaction-eval[#:eval uri-codec-eval (require net/uri-codec)]
@title[#:tag "uri-codec"]{URI Codec: Encoding and Decoding URIs}
@defmodule[net/uri-codec]{The @schememodname[net/uri-codec] module
provides utilities for encoding and decoding strings using the URI
encoding rules given in RFC 2396 @cite["RFC2396"], and for encoding and
decoding name/value pairs using the
@tt{application/x-www-form-urlencoded} mimetype given in the HTML 4.0
specification. There are minor differences between the two encodings.}
The URI encoding allows a few characters to be represented as-is:
@litchar{a} through @litchar{z}, @litchar{A} through @litchar{Z},
@litchar{0}-@litchar{9}, @litchar{-}, @litchar{_}, @litchar{.},
@litchar{!}, @litchar{~}, @litchar{*}, @litchar{'}, @litchar{(} and
@litchar{)}. The remaining characters are encoded as
@litchar{%}@nonterm{xx}, where @nonterm{xx} is the two-character hex
representation of the integer value of the character (where the
mapping character--integer is determined by US-ASCII if the integer is
less than 128).
The encoding, in line with RFC 2396's recommendation, represents a
character as-is, if possible. The decoding allows any characters
to be represented by their hex values, and allows characters to be
incorrectly represented as-is.
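
As a small illustration of these rules (the input strings are made up
for this example), unreserved characters should pass through unchanged,
other characters should come out as @litchar{%}@nonterm{xx}, and the
decoder should accept a character in either form:

@examples[
#:eval uri-codec-eval
(uri-encode "abc-XYZ_0.9")
(uri-encode "a b/c")
(uri-decode "a%20b%2Fc")
(uri-decode "%61%62%63")
]
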
The rules for the @tt{application/x-www-form-urlencoded} mimetype
given in the HTML 4.0 spec are:
@itemize[
@item{Control names and values are escaped. Space characters are
replaced by @litchar{+}, and then reserved characters are escaped as
described in RFC 1738, section 2.2: Non-alphanumeric characters are
replaced by @litchar{%}@nonterm{xx} representing the ASCII code of
the character. Line breaks are represented as CRLF pairs:
@litchar{%0D%0A}. Note that RFC 2396 supersedes RFC 1738
@cite["RFC1738"].}
@item{The control names/values are listed in the order they appear
in the document. The name is separated from the value by @litchar{=}
and name/value pairs are separated from each other by either
@litchar{;} or @litchar{&}. When encoding, @litchar{&} is used as
the separator by default. When decoding, both @litchar{;} and
@litchar{&} are parsed as separators by default.}
]
These rules differ slightly from the straight encoding in RFC 2396 in
that @litchar{+} is allowed, and it represents a space. The
@schememodname[net/uri-codec] library follows this convention,
encoding a space as @litchar{+} and decoding @litchar{+} as a space.
In addition, since there appear to be some brain-dead decoders on the
web, the library also encodes @litchar{!}, @litchar{~}, @litchar{'},
@litchar{(}, and @litchar{)} using their hex representation, which is
the same choice made by Java's @tt{URLEncoder}.
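
The difference between the two encodings is easiest to see by running
both encoders on the same input. The string below is only an
illustration; the point of interest is how the space and the
punctuation characters are treated in each result:

@examples[
#:eval uri-codec-eval
(uri-encode "it's (a) test!")
(form-urlencoded-encode "it's (a) test!")
(form-urlencoded-decode "a+b%21")
]
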
@; ----------------------------------------
@section[#:tag "uri-codec-proc"]{Functions}
@defproc[(uri-encode [str string?]) string?]{
Encode a string using the URI encoding rules.}
@defproc[(uri-decode [str string?]) string?]{
Decode a string using the URI decoding rules.}
@defproc[(uri-path-segment-encode [str string?]) string?]{
Encodes a string according to the rules in @cite["RFC3986"] for path segments.
}
@defproc[(uri-path-segment-decode [str string?]) string?]{
Decodes a string according to the rules in @cite["RFC3986"] for path segments.
}
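
A brief sketch of the path-segment functions, using a made-up segment
that contains a space and a @litchar{/}. Decoding the encoded result
should give back the original string:

@examples[
#:eval uri-codec-eval
(uri-path-segment-encode "reports/2009 q2")
(uri-path-segment-decode (uri-path-segment-encode "reports/2009 q2"))
]
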
@defproc[(uri-userinfo-encode [str string?]) string?]{
Encodes a string according to the rules in @cite["RFC3986"] for the userinfo field.
}
@defproc[(uri-userinfo-decode [str string?]) string?]{
Decodes a string according to the rules in @cite["RFC3986"] for the userinfo field.
}
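
The userinfo functions can be tried the same way. The user name and
password below are invented for the example; the round trip through
the encoder and decoder should be lossless:

@examples[
#:eval uri-codec-eval
(uri-userinfo-encode "user name:p@ss")
(uri-userinfo-decode (uri-userinfo-encode "user name:p@ss"))
]
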
@defproc[(form-urlencoded-encode [str string?]) string?]{
Encode a string using the @tt{application/x-www-form-urlencoded}
encoding rules. The result string contains no non-ASCII characters.}
@defproc[(form-urlencoded-decode [str string?]) string?]{
Decode a string encoded using the
@tt{application/x-www-form-urlencoded} encoding rules.}
@defproc[(alist->form-urlencoded [alist (listof (cons/c symbol? string?))])
string?]{
Encode an association list using the
@tt{application/x-www-form-urlencoded} encoding rules.
The @scheme[current-alist-separator-mode] parameter determines the
separator used in the result.}
@defproc[(form-urlencoded->alist [str string?])
(listof (cons/c symbol? string?))]{
Decode a string encoded using the
@tt{application/x-www-form-urlencoded} encoding rules into an
association list. All keys are case-folded for conversion to symbols.
The @scheme[current-alist-separator-mode] parameter determines the way
that separators are parsed in the input.}
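
The two association-list functions are intended to be inverses of each
other. A minimal round trip, with keys and values chosen only for
illustration (@scheme[query] is a name introduced for this example):

@examples[
#:eval uri-codec-eval
(define query
  (alist->form-urlencoded '((q . "uri codec") (lang . "en"))))
query
(form-urlencoded->alist query)
]
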
@defparam[current-alist-separator-mode mode
(one-of/c 'amp 'semi 'amp-or-semi 'semi-or-amp)]{
A parameter that determines the separator used/recognized between
associations in @scheme[form-urlencoded->alist],
@scheme[alist->form-urlencoded], @scheme[url->string], and
@scheme[string->url].
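The default value is @scheme['amp-or-semi], which means that both
@litchar{&} and @litchar{;} are treated as separators when parsing,
and @litchar{&} is used as a separator when encoding. The
@scheme['semi-or-amp] mode is like @scheme['amp-or-semi] when parsing,
but @litchar{;} is used when encoding. The @scheme['amp] and
@scheme['semi] modes use/recognize only one of the separators.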
@examples[
#:eval uri-codec-eval
(define ex '((x . "foo") (y . "bar") (z . "baz")))
(code:line (current-alist-separator-mode 'amp) (code:comment @#,t{try @scheme['amp]...}))
(form-urlencoded->alist "x=foo&y=bar&z=baz")
(form-urlencoded->alist "x=foo;y=bar;z=baz")
(alist->form-urlencoded ex)
(code:line (current-alist-separator-mode 'semi) (code:comment @#,t{try @scheme['semi]...}))
(form-urlencoded->alist "x=foo;y=bar;z=baz")
(form-urlencoded->alist "x=foo&y=bar&z=baz")
(alist->form-urlencoded ex)
(code:line (current-alist-separator-mode 'amp-or-semi) (code:comment @#,t{try @scheme['amp-or-semi]...}))
(form-urlencoded->alist "x=foo&y=bar&z=baz")
(form-urlencoded->alist "x=foo;y=bar;z=baz")
(alist->form-urlencoded ex)
(code:line (current-alist-separator-mode 'semi-or-amp) (code:comment @#,t{try @scheme['semi-or-amp]...}))
(form-urlencoded->alist "x=foo&y=bar&z=baz")
(form-urlencoded->alist "x=foo;y=bar;z=baz")
(alist->form-urlencoded ex)
]}
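
Because @scheme[current-alist-separator-mode] is a parameter, a mode
can also be installed for a limited dynamic extent with
@scheme[parameterize], instead of being set globally as in the examples
above. A short sketch, with an association list made up for the
example:

@examples[
#:eval uri-codec-eval
(parameterize ([current-alist-separator-mode 'semi])
  (alist->form-urlencoded '((x . "foo") (y . "bar"))))
(parameterize ([current-alist-separator-mode 'amp])
  (alist->form-urlencoded '((x . "foo") (y . "bar"))))
]

Outside the @scheme[parameterize] body, the previously installed mode
is in effect again.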