Adjust the contract on string->url so that it actually catches all of
the errors that would be signalled by the body. Also, remove
url-regexp from the exports (it was only recently added).

I believe this eliminates two of Eli's concerns:

  - the contract is no longer so painful to read

  - the performance is more reasonable.
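
As a rough illustration of how the new contract reads (it is quoted in full with the timings below): it accepts a string either when it has no scheme-like prefix at all, or when the scheme is well formed. This is only a sketch; the names url-string/c and accepted? are hypothetical and do not appear in net/url.

```racket
#lang racket/base
(require racket/contract)

;; The domain contract from this commit. A regexp used as a contract is a
;; flat contract that accepts strings and byte strings matching it.
;; ASSUMPTION: url-string/c is an illustrative name, not part of net/url.
(define url-string/c
  (or/c (not/c #rx"^([^:/?#]*):")          ; no "scheme:" prefix at all
        #rx"^[a-zA-Z][a-zA-Z0-9+.-]*:"))   ; or a well-formed scheme

;; Hypothetical helper: does the contract accept s? Normalized to a boolean.
(define (accepted? s)
  (and ((flat-contract-predicate url-string/c) s) #t))

(accepted? "http://www.racket-lang.org") ; accepted: scheme "http" is well formed
(accepted? "a/b/c")                      ; accepted: no scheme prefix
(accepted? "1http://example.com")        ; rejected: scheme starts with a digit
```

The two regexps are read independently: the first only detects whether a scheme is present, and the second constrains its shape.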

Specifically, for the performance, here are the times I see (in
milliseconds, as printed by `time`) for calling string->url on
"http://www.racket-lang.org" under each contract:

no contract: any/c
cpu time: 564 real time: 566 gc time: 3

weak contract: (-> (or/c string? bytes?) url?)
cpu time: 590 real time: 590 gc time: 3

strong, regexp-based contract:
(-> (or/c (not/c #rx"^([^:/?#]*):") #rx"^[a-zA-Z][a-zA-Z0-9+.-]*:") url?)
cpu time: 632 real time: 633 gc time: 5

By cpu time, the regexp-based contract is about 7% slower than the
weaker contract (632 vs. 590) and about 12% slower than no contract
(632 vs. 564).
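
A minimal sketch of how a measurement like the ones above might be taken. The iteration count is an assumption (this message does not state it), and switching between contracts would mean editing the export of string->url in net/url itself, which is not shown here.

```racket
#lang racket/base
(require net/url)

;; ASSUMPTION: the number of calls is not given in the commit message;
;; 100000 is an arbitrary placeholder.
(define iterations 100000)

;; `time` prints "cpu time: ... real time: ... gc time: ..." in milliseconds.
(time
 (for ([i (in-range iterations)])
   (string->url "http://www.racket-lang.org")))
```

Absolute numbers will differ by machine and Racket version, so only the ratios between the three contract variants are meaningful.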

related to PR 12652

original commit: 86572cc8c3
Robby Findler 2012-03-29 17:22:49 -05:00
parent 1b243ce46b
commit 51cf8696b3


@@ -1,6 +1,5 @@
 #lang scribble/doc
 @(require "common.rkt" scribble/bnf
-          (only-in net/url url-regexp)
           (for-label net/url net/url-unit net/url-sig
                      net/head net/uri-codec net/tcp-sig
                      (only-in net/url-connect current-https-protocol)
@@ -96,7 +95,9 @@ An HTTP connection is created as a @deftech{pure port} or a
 have been removed, so that what remains is purely the first content
 fragment. An impure port is one that still has its MIME headers.
-@defproc[(string->url [str (and/c (or/c string? bytes?) url-regexp)]) url?]{
+@defproc[(string->url [str (or/c (not/c #rx"^([^:/?#]*):")
+                                 #rx"^[a-zA-Z][a-zA-Z0-9+.-]*:")])
+         url?]{
 Parses the URL specified by @racket[str] into a @racket[url]
 struct. The @racket[string->url] procedure uses
@@ -104,6 +105,10 @@ struct. The @racket[string->url] procedure uses
 sensitive to the @racket[current-alist-separator-mode] parameter for
 determining the association separator.
+The contract on @racket[str] insists that, if the url has a scheme,
+then the scheme begins with a letter and consists only of letters,
+numbers, @litchar{+}, @litchar{-}, and @litchar{.} characters.
 If @racket[str] starts with @racket["file:"], then the path is always
 parsed as an absolute path, and the parsing details depend on
 @racket[file-url-path-convention-type]:
@@ -123,17 +128,6 @@ parsed as an absolute path, and the parsing details depend on
 ]}
-@defthing[url-regexp regexp?]{
-This is a regular expression based on the one in
-Appendix B of RFC 3986 for recognizing urls.
-This is the precise regexp:
-@centered{@tt{@(object-name url-regexp)}}
-}
 @defproc[(combine-url/relative [base url?] [relative string?]) url?]{
 Given a base URL and a relative path, combines the two and returns a