From 86572cc8c33ba12482043718d16225806c7601f5 Mon Sep 17 00:00:00 2001
From: Robby Findler
Date: Thu, 29 Mar 2012 17:22:49 -0500
Subject: [PATCH] adjust the contract on string->url so that it actually catches
 all of the errors that would be signalled by the body.

also, remove url-regexp from the exports (it was only recently added)

I believe this eliminates two of Eli's concerns:
  - the contract is no longer so painful to read
  - the performance is more reasonable.

Specifically, for the performance, here are the times I see
to call string->url on "http://www.racket-lang.org":

no contract: any/c
cpu time: 564 real time: 566 gc time: 3

weak contract: (-> (or/c string? bytes?) url?)
cpu time: 590 real time: 590 gc time: 3

strong, regexp-based contract:
  (-> (or/c (not/c #rx"^([^:/?#]*):")
            #rx"^[a-zA-Z][a-zA-Z0-9+.-]*:")
      url?)
cpu time: 632 real time: 633 gc time: 5

This appears to be about a 10% slowdown for the regexp-based
contract over the weaker contract.

related to PR 12652
---
 collects/net/scribblings/url.scrbl | 20 +++++++-------------
 collects/net/url.rkt               |  5 +++--
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/collects/net/scribblings/url.scrbl b/collects/net/scribblings/url.scrbl
index 2d04a7dc43..4c7297efbf 100644
--- a/collects/net/scribblings/url.scrbl
+++ b/collects/net/scribblings/url.scrbl
@@ -1,6 +1,5 @@
 #lang scribble/doc
 @(require "common.rkt" scribble/bnf
-          (only-in net/url url-regexp)
           (for-label net/url net/url-unit net/url-sig
                      net/head net/uri-codec net/tcp-sig
                      (only-in net/url-connect current-https-protocol)
@@ -96,7 +95,9 @@ An HTTP connection is created as a @deftech{pure port} or a
 have been removed, so that what remains is purely the first content
 fragment. An impure port is one that still has its MIME headers.
 
-@defproc[(string->url [str (and/c (or/c string? bytes?) url-regexp)]) url?]{
+@defproc[(string->url [str (or/c (not/c #rx"^([^:/?#]*):")
+                                 #rx"^[a-zA-Z][a-zA-Z0-9+.-]*:")])
+         url?]{
 
 Parses the URL specified by @racket[str] into a @racket[url]
 struct. The @racket[string->url] procedure uses
@@ -104,6 +105,10 @@ struct. The @racket[string->url] procedure uses
 sensitive to the @racket[current-alist-separator-mode] parameter for
 determining the association separator.
 
+The contract on @racket[str] insists that, if the url has a scheme,
+then the scheme begins with a letter and consists only of letters,
+numbers, @litchar{+}, @litchar{-}, and @litchar{.} characters.
+
 If @racket[str] starts with @racket["file:"], then the path is always
 parsed as an absolute path, and the parsing details depend on
 @racket[file-url-path-convention-type]:
@@ -123,17 +128,6 @@ parsed as an absolute path, and the parsing details depend on
 
 ]}
 
-@defthing[url-regexp regexp?]{
-
-This is a regular expression based on the one in
-Appendix B of RFC 3986 for recognizing urls.
-
-This is the precise regexp:
-
-@centered{@tt{@(object-name url-regexp)}}
-
-}
-
 @defproc[(combine-url/relative [base url?] [relative string?]) url?]{
 
 Given a base URL and a relative path, combines the two and returns a
diff --git a/collects/net/url.rkt b/collects/net/url.rkt
index b55dbda5b1..6abbbe000d 100644
--- a/collects/net/url.rkt
+++ b/collects/net/url.rkt
@@ -660,8 +660,9 @@
 (provide (struct-out url) (struct-out path/param))
 
 (provide/contract
- [url-regexp regexp?]
- (string->url (url-regexp . -> . url?))
+ (string->url (-> (or/c #rx"^[a-zA-Z][a-zA-Z0-9+.-]*:"
+                        (not/c #rx"^([^:/?#]*):"))
+                  url?))
  (path->url ((or/c path-string? path-for-some-system?) . -> . url?))
  (url->string (url? . -> . string?))
  (url->path (->* (url?) ((one-of/c 'unix 'windows)) path-for-some-system?))
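
For reference, the domain of the new contract can be probed on its own, without calling string->url. The following is a minimal sketch and not part of the patch: the two regexps are copied from the diff, while the names has-scheme-rx, valid-scheme-rx, url-input/c, and url-input? are hypothetical and introduced only for illustration.

```racket
#lang racket/base
(require racket/contract)

;; Regexps copied from the patch. The first detects a scheme-like prefix
;; (everything before the first colon, provided it contains no /, ?, or #);
;; the second accepts only well-formed schemes.
(define has-scheme-rx   #rx"^([^:/?#]*):")
(define valid-scheme-rx #rx"^[a-zA-Z][a-zA-Z0-9+.-]*:")

;; Hypothetical name: the domain contract from the patch, bound to an
;; identifier. Read it as "either there is no scheme, or the scheme is valid".
(define url-input/c
  (or/c (not/c has-scheme-rx) valid-scheme-rx))

;; Hypothetical helper: extract a plain predicate from the flat contract.
(define url-input? (flat-contract-predicate url-input/c))

(url-input? "http://www.racket-lang.org") ; well-formed scheme
(url-input? "/relative/path")             ; no scheme at all
(url-input? "1http://example.com")        ; scheme begins with a digit
```

Under these assumptions, the first two inputs should satisfy the contract and the third should not, since its scheme-like prefix does not begin with a letter.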