net/uri-codec: added `uri-path-segment-unreserved-encode'

original commit: 3d1f1289ef
2012-12-17 06:32:03 -07:00
commit 8a061420d0
parent 95d818431b
2 changed files with 25 additions and 4 deletions
--- a/collects/net/scribblings/uri-codec.scrbl
+++ b/collects/net/scribblings/uri-codec.scrbl
@@ -28,7 +28,11 @@ less than 128).
 The encoding, in line with RFC 2396's recommendation, represents a
 character as-is, if possible.  The decoding allows any characters
 to be represented by their hex values, and allows characters to be
-incorrectly represented as-is.
+incorrectly represented as-is. The library provides ``unreserved''
+encoders that encode @litchar{!}, @litchar{*}, @litchar{'},
+@litchar{(}, and @litchar{)} using their hex representation,
+which is not recommended by RFC 2396 but avoids problems with some
+contexts.

 The rules for the @tt{application/x-www-form-urlencoded} mimetype
 given in the HTML 4.0 spec are:
@@ -52,15 +56,17 @@ given in the HTML 4.0 spec are:

 ]

-These rules differs slightly from the straight encoding in RFC 2396 in
+These @tt{application/x-www-form-urlencoded} rules differs slightly from the straight encoding in RFC 2396 in
 that @litchar{+} is allowed, and it represents a space.  The
@racketmodname[net/uri-codec] library follows this convention,
 encoding a space as @litchar{+} and decoding @litchar{+} as a space.
-In addtion, since there appear to be some brain-dead decoders on the
+In addition, since there appear to be some broken decoders on the
 web, the library also encodes @litchar{!}, @litchar{~}, @litchar{'},
@litchar{(}, and @litchar{)} using their hex representation, which is
 the same choice as made by the Java's @tt{URLEncoder}.

+
+
@; ----------------------------------------

@section[#:tag "uri-codec-proc"]{Functions}
@@ -92,6 +98,14 @@ Encodes a string according to the rules in @cite["RFC3986"](section 2.3) for the
@defproc[(uri-unreserved-decode [str string?]) string?]{
 Decodes a string according to the rules in @cite["RFC3986"](section 2.3) for the unreserved characters.
 }
+@defproc[(uri-path-segment-unreserved-encode [str string?]) string?]{
+Encodes a string according to the rules in @cite["RFC3986"] for path segments,
+but also encodes characters that @racket[uri-unreserved-encode] encodes
+and that @racket[uri-encode] does not.
+}
+@defproc[(uri-path-segment-unreserved-decode [str string?]) string?]{
+Decodes a string according to the rules in @cite["RFC3986"] for path segments.
+}


@defproc[(form-urlencoded-encode [str string?]) string?]{
@@ -184,7 +198,9 @@ Imports nothing, exports @racket[uri-codec^].}

@defsignature[uri-codec^ ()]{}

-Includes everything exported by the @racketmodname[net/uri-codec] module.
+Includes everything exported by the @racketmodname[net/uri-codec]
+module except @racket[uri-path-segment-unreserved-encode] and
+@racket[uri-path-segment-unreserved-decode].


@close-eval[uri-codec-eval]
--- a/collects/tests/net/uri-codec.rkt
+++ b/collects/tests/net/uri-codec.rkt
@@ -75,11 +75,16 @@
        (uri-path-segment-encode "M~(@; ") =>  "M~(@%3B%20"
        (uri-userinfo-encode "M~(@; ")     =>  "M~(%40;%20"
        (uri-unreserved-encode "M~(@; ")   =>  "M~%28%40%3B%20"         
+        (uri-path-segment-unreserved-encode "M~(@; ") =>  "M~%28@%3B%20"
        ;; matching decodes:
        (uri-decode "M~(%40%3B%20")              =>  "M~(@; "
        (uri-path-segment-decode "M~(@%3B%20")   =>  "M~(@; "
        (uri-userinfo-decode "M~(%40;%20")       =>  "M~(@; "
        (uri-unreserved-decode "M~%28%40%3B%20") =>  "M~(@; "
+        (uri-path-segment-unreserved-decode "M~%28@%3B%20")   =>  "M~(@; "
+
+        (uri-path-segment-decode "M~%28@%3B%20")   =>  "M~(@; "
+        (uri-path-segment-unreserved-decode "M~(@%3B%20")   =>  "M~(@; "
        ))

 ;; tests adapted from Noel Welsh's original test suite
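
Usage sketch (not part of the commit): the calls below restate, in runnable form, the behavior exercised by the new test cases above. It assumes a Racket installation whose net/uri-codec includes this commit. The name `input` is introduced here for illustration only, and the expected strings in the comments are copied from the test cases in this diff rather than measured independently.

```racket
#lang racket/base
;; Sketch only: compares the new path-segment "unreserved" codec with the
;; existing encoders, using the same sample string as the commit's tests.
(require net/uri-codec)

(define input "M~(@; ") ; hypothetical name; the value is taken from the tests

;; Existing encoders, shown for comparison.
(uri-path-segment-encode input)             ; tests expect "M~(@%3B%20"
(uri-unreserved-encode input)               ; tests expect "M~%28%40%3B%20"

;; New encoder: path-segment rules, plus hex-encoding of "(".
(uri-path-segment-unreserved-encode input)  ; tests expect "M~%28@%3B%20"

;; New decoder: the tests show it accepts both encoded forms.
(uri-path-segment-unreserved-decode "M~%28@%3B%20") ; tests expect "M~(@; "
(uri-path-segment-unreserved-decode "M~(@%3B%20")   ; tests expect "M~(@; "
```

Going by those expected values, the new encoder leaves `@` literal as `uri-path-segment-encode` does, and hex-encodes `(` as `uri-unreserved-encode` does.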