racket/collects/scribblings/reference/windows-paths.scrbl
Eli Barzilay 264af9a6d0 improved scribble syntax use
svn: r8720
2008-02-19 12:22:45 +00:00

#lang scribble/doc
@(require scribble/bnf
"mz.ss")
@(define MzAdd (italic "Scheme-specific:"))
@title[#:tag "windowspaths"]{Windows Path Conventions}
In general, a Windows pathname consists of an optional drive specifier
and a drive-specific path. A Windows path can be @defterm{absolute}
but still relative to the current drive; such paths start with a
@litchar{/} or @litchar["\\"] separator and are not UNC paths or paths
that start with @litchar["\\\\?\\"].
A path that starts with a drive specification is @defterm{complete}.
Roughly, a drive specification is either a Roman letter followed by a
colon, a UNC path of the form
@litchar["\\\\"]@nonterm{machine}@litchar["\\"]@nonterm{volume}, or a
@litchar["\\\\?\\"] form followed by something other than
@litchar["REL\\"]@nonterm{element} or
@litchar["RED\\"]@nonterm{element}. (Variants of @litchar["\\\\?\\"]
paths are described further below.)
Scheme fails to implement the usual Windows path syntax in one
way. Outside of Scheme, a pathname @filepath{C:rant.txt} can be a
drive-specific relative path. That is, it names a file @filepath{rant.txt}
on drive @filepath{C:}, but the complete path to the file is determined by
the current working directory for drive @filepath{C:}. Scheme does not
support drive-specific working directories (only a working directory
across all drives, as reflected by the @scheme[current-directory]
parameter). Consequently, Scheme implicitly converts a path like
@filepath{C:rant.txt} into @filepath["C:\\rant.txt"].
@itemize{
@item{@|MzAdd| Whenever a path starts with a drive specifier
@nonterm{letter}@litchar{:} that is not followed by a
@litchar{/} or @litchar["\\"], a @litchar["\\"] is inserted as
the path is @tech{cleanse}d.}
}
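
As a rough sketch of the rule above, the following expression cleanses
a drive-specific relative path. It assumes that @scheme[bytes->path]
and @scheme[cleanse-path] accept a path with an explicit
@scheme['windows] convention, so that the example does not depend on
the platform where it runs; the printed form of the result is
deliberately not shown.

@schemeblock[
;; Assumption: bytes->path takes an explicit 'windows convention type,
;; so the path is parsed with Windows rules on any platform.
(define rel-on-drive (bytes->path #"C:rant.txt" 'windows))
;; Cleansing inserts the separator after the drive specifier.
(path->bytes (cleanse-path rel-on-drive))
]
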
Otherwise, Scheme follows standard Windows path conventions, but also
adds @litchar["\\\\?\\REL"] and @litchar["\\\\?\\RED"] conventions to
deal with paths inexpressible in the standard conventions, plus
conventions to deal with excessive @litchar["\\"]s in @litchar["\\\\?\\"]
paths.
In the following, @nonterm{letter} stands for a Roman letter (case
does not matter), @nonterm{machine} stands for any sequence of
characters that does not include @litchar["\\"] or @litchar{/} and is
not @litchar{?}, @nonterm{volume} stands for any sequence of
characters that does not include @litchar["\\"] or @litchar{/}, and
@nonterm{element} stands for any sequence of characters that does not
include @litchar["\\"].
@itemize{
@item{Trailing spaces and @litchar{.}s in a path element are ignored
when the element is the last one in the path, unless the path
starts with @litchar["\\\\?\\"] or the element consists of only
spaces and @litchar{.}s.}
@item{The following special ``files'', which access devices, exist in
all directories, case-insensitively, and with all possible
endings after a period or colon, except in pathnames that start
with @litchar["\\\\?\\"]: @indexed-file{NUL}, @indexed-file{CON},
@indexed-file{PRN}, @indexed-file{AUX}, @indexed-file{COM1},
@indexed-file{COM2}, @indexed-file{COM3}, @indexed-file{COM4},
@indexed-file{COM5}, @indexed-file{COM6}, @indexed-file{COM7},
@indexed-file{COM8}, @indexed-file{COM9}, @indexed-file{LPT1},
@indexed-file{LPT2}, @indexed-file{LPT3}, @indexed-file{LPT4},
@indexed-file{LPT5}, @indexed-file{LPT6}, @indexed-file{LPT7},
@indexed-file{LPT8}, @indexed-file{LPT9}.}
@item{Except for @litchar["\\\\?\\"] paths, @litchar{/}s are
equivalent to @litchar["\\"]s. Except for @litchar["\\\\?\\"]
paths and the start of UNC paths, multiple adjacent
@litchar{/}s and @litchar["\\"]s count as a single
@litchar["\\"]. In a path that starts with @litchar["\\\\?\\"],
elements can be separated by either a single or double
@litchar["\\"].}
@item{A directory can be accessed with or without a trailing
separator. In the case of a non-@litchar["\\\\?\\"] path, the
trailing separator can be any number of @litchar{/}s and
@litchar["\\"]s; in the case of a @litchar["\\\\?\\"] path, a
trailing separator must be a single @litchar["\\"], except that
two @litchar["\\"]s can follow
@litchar["\\\\?\\"]@nonterm{letter}@litchar{:}.}
@item{Except for @litchar["\\\\?\\"] paths, a single @litchar{.} as a
path element means ``the current directory,'' and a
@litchar{..} as a path element means ``the parent directory.''
Up-directory path elements (i.e., @litchar{..}) immediately
after a drive are ignored.}
@item{A pathname that starts
@litchar["\\\\"]@nonterm{machine}@litchar["\\"]@nonterm{volume}
(where a @litchar{/} can replace any @litchar["\\"]) is a UNC
path, and the starting
@litchar["\\\\"]@nonterm{machine}@litchar["\\"]@nonterm{volume}
counts as the drive specifier.}
@item{Normally, a path element cannot contain any of the following
characters:
@centerline{@litchar{<} @litchar{>} @litchar{:} @litchar{"} @litchar{/} @litchar["\\"] @litchar["|"]}
Except for @litchar["\\"], path elements containing these
characters can be accessed using a @litchar["\\\\?\\"] path
(assuming that the underlying filesystem allows the
characters).}
@item{In a pathname that starts
@litchar["\\\\?\\"]@nonterm{letter}@litchar[":\\"], the
@litchar["\\\\?\\"]@nonterm{letter}@litchar[":\\"] prefix
counts as the path's drive, as long as the path does not both
contain non-drive elements and end with two consecutive
@litchar["\\"]s, and as long as the path contains no sequence
of three or more @litchar["\\"]s. Two @litchar["\\"]s can
appear in place of the @litchar["\\"] before
@nonterm{letter}. A @litchar{/} cannot be used in place of a
@litchar["\\"] (but @litchar{/}s can be used in element names,
though the result typically does not name an actual directory
or file).}
@item{In a pathname that starts
@litchar["\\\\?\\UNC\\"]@nonterm{machine}@litchar["\\"]@nonterm{volume},
the
@litchar["\\\\?\\UNC\\"]@nonterm{machine}@litchar["\\"]@nonterm{volume}
prefix counts as the path's drive, as long as the path does
not end with two consecutive @litchar["\\"]s, and as long as
the path contains no sequence of three or more
@litchar["\\"]s. Two @litchar["\\"]s can appear in place of
the @litchar["\\"] before @litchar{UNC}, the @litchar["\\"]s
after @litchar{UNC}, and/or the @litchar["\\"]s
after @nonterm{machine}. The letters in the @litchar{UNC} part
can be uppercase or lowercase, and @litchar{/} cannot be used
in place of @litchar["\\"]s (but @litchar{/} can be used in
element names).}
@item{@|MzAdd| A pathname that starts
@litchar["\\\\?\\REL\\"]@nonterm{element} or
@litchar["\\\\?\\REL\\\\"]@nonterm{element} is a relative
path, as long as the path does not end with two consecutive
@litchar["\\"]s, and as long as the path contains no sequence of
three or more @litchar["\\"]s. This Scheme-specific path form
supports relative paths with elements that are not normally
expressible in Windows paths (e.g., a final element that ends
in a space). The @litchar{REL} part must be exactly the three
uppercase letters, and @litchar{/}s cannot be used in place
of @litchar["\\"]s. If the path starts
@litchar["\\\\?\\REL\\.."] then for as long as the
path continues with repetitions of @litchar["\\.."],
each element counts as an up-directory element; a single
@litchar["\\"] must be used to separate the up-directory
elements. As soon as a second @litchar["\\"] is used to separate
the elements, or as soon as a non-@litchar{..} element is
encountered, the remaining elements are all literals (never
up-directory elements). When a @litchar["\\\\?\\REL"] path
value is converted to a string (or when the path value is
written or displayed), the string does not contain the
starting @litchar["\\\\?\\REL"] or the immediately following
@litchar["\\"]s; converting a path value to a byte string
preserves the @litchar["\\\\?\\REL"] prefix.}
@item{@|MzAdd| A pathname that starts
@litchar["\\\\?\\RED\\"]@nonterm{element} or
@litchar["\\\\?\\RED\\\\"]@nonterm{element} is a
drive-relative path, as long as the path does not end with two
consecutive @litchar["\\"]s, and as long as the path contains
no sequence of three or more @litchar["\\"]s. This
Scheme-specific path form supports drive-relative paths (i.e.,
absolute given a drive) with elements that are not normally
expressible in Windows paths. The @litchar{RED} part must be
exactly the three uppercase letters, and @litchar{/}s cannot
be used in place of @litchar["\\"]s. Unlike
@litchar["\\\\?\\REL"] paths, a @litchar{..} element is always
a literal path element. When a @litchar["\\\\?\\RED"] path
value is converted to a string (or when the path value is
written or displayed), the string does not contain the
starting @litchar["\\\\?\\RED"] and it contains a single
starting @litchar["\\"]; converting a path value to a byte
string preserves the @litchar["\\\\?\\RED"] prefix.}
}
Three additional Scheme-specific rules provide meanings to character
sequences that are otherwise ill-formed as Windows paths:
@itemize{
@item{@|MzAdd| In a pathname of the form
@litchar["\\\\?\\"]@nonterm{any}@litchar["\\\\"] where
@nonterm{any} is any non-empty sequence of characters other
than @nonterm{letter}@litchar{:} or
@litchar["\\"]@nonterm{letter}@litchar{:}, the entire path
counts as the path's (non-existent) drive.}
@item{@|MzAdd| In a pathname of the form
@litchar["\\\\?\\"]@nonterm{any}@litchar["\\\\\\"]@nonterm{elements},
where @nonterm{any} is any non-empty sequence of characters
and @nonterm{elements} is any sequence that does not start
with a @litchar["\\"], does not end with two @litchar["\\"]s,
and does not contain a sequence of three @litchar["\\"]s, then
@litchar["\\\\?\\"]@nonterm{any}@litchar["\\\\"] counts as the
path's (non-existent) drive.}
@item{@|MzAdd| In a pathname that starts @litchar["\\\\?\\"] and
does not match any of the patterns from the preceding bullets,
@litchar["\\\\?\\"] counts as the path's (non-existent)
drive.}
}
Outside of Scheme, except for @litchar["\\\\?\\"] paths, pathnames are
typically limited to 259 characters. Scheme internally converts
pathnames to @litchar["\\\\?\\"] form as needed to avoid this
limit. The operating system cannot access files through
@litchar["\\\\?\\"] paths that are longer than 32,000 characters or
so.
Where the above descriptions say ``character,'' substitute ``byte''
for interpreting byte strings as paths. The encoding of Windows paths
into bytes preserves ASCII characters, and all special characters
mentioned above are ASCII, so all of the rules are the same.
Beware that the @litchar["\\"] path separator is an escape character
in Scheme strings. Thus, the path @litchar["\\\\?\\REL\\..\\\\.."] as
a string must be written @scheme["\\\\?\\REL\\..\\\\.."].
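
For instance, the path above can be turned into a path value from
either a string or a byte string. This is only a sketch: the explicit
@scheme['windows] argument to @scheme[bytes->path] is an assumption
made so that the second expression uses Windows conventions on every
platform, whereas @scheme[string->path] always uses the conventions of
the current platform.

@schemeblock[
;; Each backslash of the path is doubled inside the string literal.
(string->path "\\\\?\\REL\\..\\\\..")
;; The same characters as a byte string, parsed with Windows conventions.
(bytes->path #"\\\\?\\REL\\..\\\\.." 'windows)
]
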
A path that ends with a directory separator syntactically refers to a
directory. In addition, a path syntactically refers to a directory if
its last element is a same-directory or up-directory indicator (not
quoted by a @litchar["\\\\?\\"] form), or if it refers to a root.
Windows paths are @techlink{cleanse}d as follows: In paths that start
@litchar["\\\\?\\"], redundant @litchar["\\"]s are removed, an extra
@litchar["\\"] is added in a @litchar["\\\\?\\REL"] if an extra one is
not already present to separate up-directory indicators from literal
path elements, and an extra @litchar["\\"] is similarly added after
@litchar["\\\\?\\RED"] if an extra one is not already present. When
@litchar["\\\\?\\"] acts as the root and the path contains additional
elements, two additional @litchar{/}s (which might otherwise be redundant) are
included after the root. For other paths, multiple @litchar{/}s are
converted to single @litchar{/}s (except at the beginning of a shared
folder name), and a @litchar{/} is inserted after the colon in a drive
specification if it is missing.
For @scheme[(bytes->path-element _bstr)], @litchar{/}s, colons,
trailing dots, trailing whitespace, and special device names (e.g.,
``aux'') in @scheme[_bstr] are encoded as a literal part of the path
element by using a @litchar["\\\\?\\REL"] prefix. The @scheme[_bstr]
argument must not contain a @litchar["\\"]; otherwise, the
@exnraise[exn:fail:contract].
For @scheme[(path-element->bytes _path)] or
@scheme[(path-element->string _path)], if the byte-string form of
@scheme[_path] starts with a @litchar["\\\\?\\REL"], the prefix is not
included in the result.
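
The two conversions above can be sketched as a round trip. The
example assumes that @scheme[bytes->path-element] accepts an explicit
@scheme['windows] convention and that @scheme[path->bytes] and
@scheme[path-element->bytes] accept the resulting path on any
platform; the name @scheme[elem] is introduced only for illustration.

@schemeblock[
;; A device name is kept as a literal element by way of a \\?\REL prefix.
(define elem (bytes->path-element #"aux" 'windows))
;; The full byte-string form of the path keeps the prefix ...
(path->bytes elem)
;; ... while the element form omits it.
(path-element->bytes elem)
]
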
For @scheme[(build-path _base-path _sub-path ...)], trailing spaces
and periods are removed from the last element of @scheme[_base-path]
and all but the last @scheme[_sub-path] (unless the element consists of
only spaces and periods), except for those that start with
@litchar["\\\\?\\"]. If @scheme[_base-path] starts @litchar["\\\\?\\"],
then after each non-@litchar["\\\\?\\REL\\"] and
non-@litchar["\\\\?\\RED\\"] @scheme[_sub-path] is added, all
@litchar{/}s in the addition are converted to @litchar["\\"]s,
multiple consecutive @litchar["\\"]s are converted to a single
@litchar["\\"], added @litchar{.} elements are removed, and added
@litchar{..} elements are removed along with the preceding element;
these conversions are not performed on the original @scheme[_base-path]
part of the result or on any @litchar["\\\\?\\REL\\"] or
@litchar["\\\\?\\RED\\"] @scheme[_sub-path]. If a
@litchar["\\\\?\\REL\\"] or @litchar["\\\\?\\RED\\"]
@scheme[_sub-path] is added to a non-@litchar["\\\\?\\"]
@scheme[_base-path], the @scheme[_base-path] (with any additions up
to the @litchar["\\\\?\\REL\\"] or @litchar["\\\\?\\RED\\"]
@scheme[_sub-path]) is simplified and converted to a
@litchar["\\\\?\\"] path. In other cases, a @litchar["\\"] may be
added or removed before combining paths to avoid changing the root
meaning of the path (e.g., combining @litchar{//x} and @litchar{y}
produces @litchar{/x/y}, because @litchar{//x/y} would be a UNC path
instead of a drive-relative path).
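
A minimal sketch of two of the cases above follows. It uses
@scheme[build-path/convention-type] with byte-string-derived
arguments, on the assumption that this is the way to combine
Windows-convention paths independent of the current platform (string
arguments would be parsed with the current platform's conventions).
The second base path, @litchar{C:/a/../b}, is a hypothetical input
chosen for illustration.

@schemeblock[
;; Combining //x and y: the separator is adjusted so that the result
;; stays drive-relative rather than becoming a UNC path.
(build-path/convention-type 'windows
                            (bytes->path #"//x" 'windows)
                            (bytes->path #"y" 'windows))
;; Adding a \\?\REL element to a base that does not start with \\?\:
;; the base is simplified and converted to a \\?\ path first.
(build-path/convention-type 'windows
                            (bytes->path #"C:/a/../b" 'windows)
                            (bytes->path-element #"aux" 'windows))
]
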
For @scheme[(simplify-path _path _use-filesystem?)], @scheme[_path] is
expanded, and if @scheme[_path] does not start with
@litchar["\\\\?\\"], trailing spaces and periods are removed, a
@litchar{/} is inserted after the colon in a drive specification if it
is missing, and a @litchar["\\"] is inserted after @litchar["\\\\?\\"]
as a root if there are elements and no extra @litchar["\\"]
already. Otherwise, if no indicators or redundant separators are in
@scheme[_path], then @scheme[_path] is returned.
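
As an illustration only, the following expression simplifies a
Windows-convention path without consulting the filesystem. The input
@litchar{C:a/./b/../c} is hypothetical, and the example assumes that
@scheme[simplify-path] accepts a path with an explicit
@scheme['windows] convention when @scheme[_use-filesystem?] is
@scheme[#f].

@schemeblock[
;; The drive specification lacks a separator after the colon, and the
;; path contains same-directory and up-directory indicators.
(define messy (bytes->path #"C:a/./b/../c" 'windows))
;; Pass #f so that only syntactic simplification is performed.
(path->bytes (simplify-path messy #f))
]
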
For @scheme[(split-path _path)] producing @scheme[_base],
@scheme[_name], and @scheme[_must-be-dir?], splitting a path that does
not start with @litchar["\\\\?\\"] can produce parts that start with
@litchar["\\\\?\\"]. For example, splitting @litchar{C:/x~/aux/}
produces @litchar["\\\\?\\C:\\x~\\"] and @litchar["\\\\?\\REL\\\\aux"];
the @litchar["\\\\?\\"] is needed in these cases to preserve a
trailing space after @litchar{x} and to avoid referring to the AUX
device instead of an @filepath{aux} file.
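
The split described above can be sketched as follows. The byte string
uses a literal space after @litchar{x}, which is an assumption about
how the example path is meant to be read, and the explicit
@scheme['windows] convention is likewise an assumption made so that
the sketch does not depend on the current platform.

@schemeblock[
;; Split a path whose parts need \\?\ forms to be preserved.
(define-values (base name must-be-dir?)
  (split-path (bytes->path #"C:/x /aux/" 'windows)))
;; Inspect the byte-string forms, which keep any \\?\ prefix.
(list (path->bytes base) (path->bytes name) must-be-dir?)
]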