net/uri-codec: added `uri-path-segment-unreserved-encode'

original commit: 3d1f1289ef
Matthew Flatt 2012-12-17 06:32:03 -07:00
parent 95d818431b
commit 8a061420d0
2 changed files with 25 additions and 4 deletions


@ -28,7 +28,11 @@ less than 128).
The encoding, in line with RFC 2396's recommendation, represents a
character as-is, if possible. The decoding allows any characters
to be represented by their hex values, and allows characters to be
-incorrectly represented as-is.
+incorrectly represented as-is. The library provides ``unreserved''
+encoders that encode @litchar{!}, @litchar{*}, @litchar{'},
+@litchar{(}, and @litchar{)} using their hex representation,
+which is not recommended by RFC 2396 but avoids problems with some
+contexts.
The rules for the @tt{application/x-www-form-urlencoded} mimetype
given in the HTML 4.0 spec are:
@ -52,15 +56,17 @@ given in the HTML 4.0 spec are:
]
-These rules differs slightly from the straight encoding in RFC 2396 in
+These @tt{application/x-www-form-urlencoded} rules differs slightly from the straight encoding in RFC 2396 in
that @litchar{+} is allowed, and it represents a space. The
@racketmodname[net/uri-codec] library follows this convention,
encoding a space as @litchar{+} and decoding @litchar{+} as a space.
-In addtion, since there appear to be some brain-dead decoders on the
+In addition, since there appear to be some broken decoders on the
web, the library also encodes @litchar{!}, @litchar{~}, @litchar{'},
@litchar{(}, and @litchar{)} using their hex representation, which is
the same choice as made by the Java's @tt{URLEncoder}.
@; ----------------------------------------
@section[#:tag "uri-codec-proc"]{Functions}
@ -92,6 +98,14 @@ Encodes a string according to the rules in @cite["RFC3986"](section 2.3) for the
@defproc[(uri-unreserved-decode [str string?]) string?]{
Decodes a string according to the rules in @cite["RFC3986"](section 2.3) for the unreserved characters.
}
+@defproc[(uri-path-segment-unreserved-encode [str string?]) string?]{
+Encodes a string according to the rules in @cite["RFC3986"] for path segments,
+but also encodes characters that @racket[uri-unreserved-encode] encodes
+and that @racket[uri-encode] does not.
+}
+@defproc[(uri-path-segment-unreserved-decode [str string?]) string?]{
+Decodes a string according to the rules in @cite["RFC3986"] for path segments.
+}
@defproc[(form-urlencoded-encode [str string?]) string?]{
@ -184,7 +198,9 @@ Imports nothing, exports @racket[uri-codec^].}
@defsignature[uri-codec^ ()]{}
-Includes everything exported by the @racketmodname[net/uri-codec] module.
+Includes everything exported by the @racketmodname[net/uri-codec]
+module except @racket[uri-path-segment-unreserved-encode] and
+@racket[uri-path-segment-unreserved-decode].
@close-eval[uri-codec-eval]
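
Reviewer note: the documentation changes above can be illustrated with a minimal sketch that runs three of the encoders on the same input. The sample string is the one used in the test file of this commit, and the values in the comments are the ones those tests expect. The helper `show` is hypothetical and exists only for this illustration. Because the `uri-codec^` signature omits the two new functions, the sketch requires the `net/uri-codec` module directly.

```racket
#lang racket/base
;; Sketch only: compares encoders from net/uri-codec on one input.
(require net/uri-codec)

;; Same sample string as in the commit's test file.
(define sample "M~(@; ")

;; `show` is a hypothetical helper, not part of the library.
(define (show name encode)
  (printf "~a: ~s\n" name (encode sample)))

;; Path-segment rules: "(" and "@" stay as-is.
;; The commit's tests expect "M~(@%3B%20".
(show 'uri-path-segment-encode uri-path-segment-encode)

;; Unreserved rules: "(" and "@" are both hex-encoded.
;; The commit's tests expect "M~%28%40%3B%20".
(show 'uri-unreserved-encode uri-unreserved-encode)

;; New in this commit: path-segment rules, but "(" is hex-encoded too.
;; The commit's tests expect "M~%28@%3B%20".
(show 'uri-path-segment-unreserved-encode uri-path-segment-unreserved-encode)
```

The new encoder sits between the other two: it keeps `@`, which is legal in a path segment, while still hex-encoding the parenthesis as the unreserved encoder does.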


@ -75,11 +75,16 @@
(uri-path-segment-encode "M~(@; ") => "M~(@%3B%20"
(uri-userinfo-encode "M~(@; ") => "M~(%40;%20"
(uri-unreserved-encode "M~(@; ") => "M~%28%40%3B%20"
+(uri-path-segment-unreserved-encode "M~(@; ") => "M~%28@%3B%20"
;; matching decodes:
(uri-decode "M~(%40%3B%20") => "M~(@; "
(uri-path-segment-decode "M~(@%3B%20") => "M~(@; "
(uri-userinfo-decode "M~(%40;%20") => "M~(@; "
(uri-unreserved-decode "M~%28%40%3B%20") => "M~(@; "
+(uri-path-segment-unreserved-decode "M~%28@%3B%20") => "M~(@; "
+(uri-path-segment-decode "M~%28@%3B%20") => "M~(@; "
+(uri-path-segment-unreserved-decode "M~(@%3B%20") => "M~(@; "
))
;; tests adapted from Noel Welsh's original test suite
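
Reviewer note: the added test cases check fixed strings. A complementary round-trip check could look like the sketch below. It assumes `rackunit` is available, and the sample inputs are hypothetical, chosen only to exercise the characters `!`, `*`, `'`, `(`, and `)` named in the documentation change. It is not part of the commit's test suite.

```racket
#lang racket/base
;; Sketch only: round-trip check for the new path-segment "unreserved" codec.
(require net/uri-codec rackunit)

;; Hypothetical sample inputs, for illustration only.
(define samples '("M~(@; " "a!b*c'd(e)f" "plain" "x/y z"))

;; The characters that the "unreserved" encoders are documented to hex-encode.
(define must-be-encoded '(#\! #\* #\' #\( #\)))

(for ([s (in-list samples)])
  (define enc (uri-path-segment-unreserved-encode s))
  ;; Decoding the encoded form should give back the original string.
  (check-equal? (uri-path-segment-unreserved-decode enc) s)
  ;; The plain path-segment decoder should accept the same encoded form.
  (check-equal? (uri-path-segment-decode enc) s)
  ;; None of the listed characters should appear literally in the output.
  (check-false (for/or ([c (in-string enc)])
                 (and (memv c must-be-encoded) #t))))
```

The second check mirrors the cross-decoding cases in the diff above, where either path-segment decoder accepts output from either path-segment encoder.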