From 51cf8696b3eb15a918b0f33b611af993dd111ab3 Mon Sep 17 00:00:00 2001
From: Robby Findler
Date: Thu, 29 Mar 2012 17:22:49 -0500
Subject: [PATCH] adjust the contract on string->url so that it actually catches all of the errors that would be signalled by the body.

also, remove url-regexp from the exports (it was only recently added)

I believe this eliminates two of Eli's concerns:
- the contract is no longer so painful to read
- the performance is more reasonable.

Specifically, for the performance, here are the times I see to
call string->url on "http://www.racket-lang.org":

no contract: any/c
cpu time: 564 real time: 566 gc time: 3

weak contract: (-> (or/c string? bytes?) url?)
cpu time: 590 real time: 590 gc time: 3

strong, regexp-based contract: (-> (or/c (not/c #rx"^([^:/?#]*):") #rx"^[a-zA-Z][a-zA-Z0-9+.-]*:") url?)
cpu time: 632 real time: 633 gc time: 5

This appears to be about a 10% slowdown for the regexp-based
contract over the weaker contract.

related to PR 12652

original commit: 86572cc8c33ba12482043718d16225806c7601f5
---
 collects/net/scribblings/url.scrbl | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/collects/net/scribblings/url.scrbl b/collects/net/scribblings/url.scrbl
index 2d04a7dc43..4c7297efbf 100644
--- a/collects/net/scribblings/url.scrbl
+++ b/collects/net/scribblings/url.scrbl
@@ -1,6 +1,5 @@
 #lang scribble/doc
 @(require "common.rkt" scribble/bnf
-          (only-in net/url url-regexp)
           (for-label net/url net/url-unit net/url-sig
                      net/head net/uri-codec net/tcp-sig
                      (only-in net/url-connect current-https-protocol)
@@ -96,7 +95,9 @@ An HTTP connection is created as a @deftech{pure port} or a
 have been removed, so that what remains is purely the first content
 fragment. An impure port is one that still has its MIME headers.
 
-@defproc[(string->url [str (and/c (or/c string? bytes?) url-regexp)]) url?]{
+@defproc[(string->url [str (or/c (not/c #rx"^([^:/?#]*):")
+                                 #rx"^[a-zA-Z][a-zA-Z0-9+.-]*:")])
+         url?]{
 
 Parses the URL specified by @racket[str] into a @racket[url]
 struct. The @racket[string->url] procedure uses
@@ -104,6 +105,10 @@ struct. The @racket[string->url] procedure uses
 sensitive to the @racket[current-alist-separator-mode] parameter for
 determining the association separator.
 
+The contract on @racket[str] insists that, if the url has a scheme,
+then the scheme begins with a letter and consists only of letters,
+numbers, @litchar{+}, @litchar{-}, and @litchar{.} characters.
+
 If @racket[str] starts with @racket["file:"], then the path is always
 parsed as an absolute path, and the parsing details depend on
 @racket[file-url-path-convention-type]:
@@ -123,17 +128,6 @@ parsed as an absolute path, and the parsing details depend on
 
 ]}
 
-@defthing[url-regexp regexp?]{
-
-This is a regular expression based on the one in
-Appendix B of RFC 3986 for recognizing urls.
-
-This is the precise regexp:
-
-@centered{@tt{@(object-name url-regexp)}}
-
-}
-
 @defproc[(combine-url/relative [base url?] [relative string?]) url?]{
 
 Given a base URL and a relative path, combines the two and returns a
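The new contract on `string->url` combines two regexps: a string either has no scheme prefix at all, or its scheme matches the RFC 3986 shape (a letter followed by letters, digits, `+`, `.`, or `-`). The sketch below restates that same two-regexp test as a plain predicate over strings so the accepted and rejected cases are easy to see. It assumes only `racket/base`, and `scheme-ok?` is a hypothetical helper name that is not part of `net/url`.

```racket
#lang racket/base
;; Hypothetical helper (not part of net/url): the same two-regexp test
;; that the new string->url contract expresses, written as a plain
;; predicate over strings.
(define (scheme-ok? str)
  ;; Accept when the string has no scheme prefix, i.e. no ':' appears
  ;; before the first '/', '?', or '#'...
  (or (not (regexp-match? #rx"^([^:/?#]*):" str))
      ;; ...or when the scheme starts with a letter and then uses only
      ;; letters, digits, '+', '.', and '-'.
      (regexp-match? #rx"^[a-zA-Z][a-zA-Z0-9+.-]*:" str)))

(scheme-ok? "http://www.racket-lang.org") ; => #t (well-formed scheme)
(scheme-ok? "/relative/path")             ; => #t (no scheme)
(scheme-ok? "1http://example.com")        ; => #f (scheme starts with a digit)
```

The contract itself, as shown in the `@defproc` in the diff, uses `or/c` and `not/c` from `racket/contract` with the regexps used directly as flat contracts. The predicate form here only mirrors that logic for string inputs.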
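The "about 10%" figure in the commit message can be checked against the quoted cpu times. Here is a small Racket calculation that uses only those numbers (it is not a new measurement). The variable names and the `overhead-pct` helper are illustrative.

```racket
#lang racket/base
;; cpu times (in ms) copied from the commit message
(define cpu-none 564)   ; no contract: any/c
(define cpu-weak 590)   ; (-> (or/c string? bytes?) url?)
(define cpu-strong 632) ; regexp-based contract

;; relative overhead of `slow` over `base`, as a percentage
(define (overhead-pct slow base)
  (* 100.0 (/ (- slow base) base)))

(overhead-pct cpu-strong cpu-weak) ; about 7.1
(overhead-pct cpu-strong cpu-none) ; about 12.1
(overhead-pct cpu-weak cpu-none)   ; about 4.6
```

With these numbers, the regexp-based contract costs roughly 7% over the weak contract and roughly 12% over no contract. The quoted "about 10%" falls between those two values.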