[//000000001]: # (uri \- Tcl Uniform Resource Identifier Management)
[//000000002]: # (Generated from file 'uri\.man' by tcllib/doctools with format 'markdown')
[//000000003]: # (uri\(n\) 1\.2\.7 tcllib "Tcl Uniform Resource Identifier Management")
# NAME
uri \- URI utilities
# Table Of Contents
- [Table Of Contents](#toc)
- [Synopsis](#synopsis)
- [Description](#section1)
- [COMMANDS](#section2)
- [SCHEMES](#section3)
- [EXTENDING](#section4)
- [QUIRK OPTIONS](#section5)
- [BACKWARD COMPATIBILITY](#subsection1)
- [NEW DESIGNS](#subsection2)
- [DEFAULT VALUES](#subsection3)
- [EXAMPLES](#section6)
- [CREDITS](#section7)
- [Bugs, Ideas, Feedback](#section8)
- [Keywords](#keywords)
- [Category](#category)
# SYNOPSIS
package require Tcl 8\.2
package require uri ?1\.2\.7?
[__uri::setQuirkOption__ *option* ?*value*?](#1)
[__uri::split__ *url* ?*defaultscheme*?](#2)
[__uri::join__ ?*key* *value*?\.\.\.](#3)
[__uri::resolve__ *base* *url*](#4)
[__uri::isrelative__ *url*](#5)
[__uri::geturl__ *url* ?*options*\.\.\.?](#6)
[__uri::canonicalize__ *uri*](#7)
[__uri::register__ *schemeList* *script*](#8)
# DESCRIPTION
This package does two things\.
First, it provides a number of commands for manipulating URLs/URIs and fetching
data specified by them\. For fetching data this package analyses the requested
URL/URI and then dispatches it to the appropriate package
\(__[http](\.\./\.\./\.\./\.\./index\.md\#http)__,
__[ftp](\.\./ftp/ftp\.md)__, \.\.\.\) for actual retrieval\. Currently these
commands are defined for the schemes *[http](\.\./\.\./\.\./\.\./index\.md\#http)*,
*[https](\.\./\.\./\.\./\.\./index\.md\#https)*,
*[ftp](\.\./\.\./\.\./\.\./index\.md\#ftp)*,
*[mailto](\.\./\.\./\.\./\.\./index\.md\#mailto)*,
*[news](\.\./\.\./\.\./\.\./index\.md\#news)*,
*[ldap](\.\./\.\./\.\./\.\./index\.md\#ldap)*, *ldaps* and
*[file](\.\./\.\./\.\./\.\./index\.md\#file)*\. The package __uri::urn__ adds
scheme *[urn](\.\./\.\./\.\./\.\./index\.md\#urn)*\.
Second, it provides regular expressions for a number of __registered__
URL/URI schemes\. Registered schemes are currently
*[ftp](\.\./\.\./\.\./\.\./index\.md\#ftp)*,
*[ldap](\.\./\.\./\.\./\.\./index\.md\#ldap)*, *ldaps*,
*[file](\.\./\.\./\.\./\.\./index\.md\#file)*,
*[http](\.\./\.\./\.\./\.\./index\.md\#http)*,
*[https](\.\./\.\./\.\./\.\./index\.md\#https)*,
*[gopher](\.\./\.\./\.\./\.\./index\.md\#gopher)*,
*[mailto](\.\./\.\./\.\./\.\./index\.md\#mailto)*,
*[news](\.\./\.\./\.\./\.\./index\.md\#news)*,
*[wais](\.\./\.\./\.\./\.\./index\.md\#wais)* and
*[prospero](\.\./\.\./\.\./\.\./index\.md\#prospero)*\. The package __uri::urn__
adds scheme *[urn](\.\./\.\./\.\./\.\./index\.md\#urn)*\.
The commands of the package conform to RFC 3986
\([https://www\.rfc\-editor\.org/rfc/rfc3986\.txt](https://www\.rfc\-editor\.org/rfc/rfc3986\.txt)\),
with the exception of a loophole arising from RFC 1630 and described in RFC 3986
Sections 5\.2\.2 and 5\.4\.2\. The loophole allows a relative URI to include a scheme
if it is the same as the scheme of the base URI against which it is resolved\.
RFC 3986 recommends avoiding this usage\.
# COMMANDS
- __uri::setQuirkOption__ *option* ?*value*?
__uri::setQuirkOption__ is an accessor command for a number of "quirk
options"\. The command has the same semantics as the command
__[set](\.\./\.\./\.\./\.\./index\.md\#set)__: when called with one argument
it reads an existing value; with two arguments it writes a new value\. The
value of a "quirk option" is boolean: the value __false__ requests
conformance with RFC 3986, while __true__ requests use of the quirk\. See
section [QUIRK OPTIONS](#section5) for discussion of the different
options and their purpose\.
- __uri::split__ *url* ?*defaultscheme*?
__uri::split__ takes a *url*, parses it and then returns a list of
key/value pairs suitable for __array set__ containing the constituents
of the *url*\. If the scheme is missing from the *url* it defaults to the
value of *defaultscheme* if that was specified, and to
*[http](\.\./\.\./\.\./\.\./index\.md\#http)* otherwise\. Currently the schemes
*[http](\.\./\.\./\.\./\.\./index\.md\#http)*,
*[https](\.\./\.\./\.\./\.\./index\.md\#https)*,
*[ftp](\.\./\.\./\.\./\.\./index\.md\#ftp)*,
*[mailto](\.\./\.\./\.\./\.\./index\.md\#mailto)*,
*[news](\.\./\.\./\.\./\.\./index\.md\#news)*,
*[ldap](\.\./\.\./\.\./\.\./index\.md\#ldap)*, *ldaps* and
*[file](\.\./\.\./\.\./\.\./index\.md\#file)* are supported by the package
itself\. See section [EXTENDING](#section4) on how to expand that range\.
The set of constituents of a URL \(= the set of keys in the returned
dictionary\) depends on the scheme of the URL\. Consequently, the only key
that is always present is __scheme__\. For the following schemes the
constituents and their keys are known:
* ftp
__user__, __pwd__, __host__, __port__, __path__,
__type__, __pbare__\. The pbare is optional\.
* http\(s\)
__user__, __pwd__, __host__, __port__, __path__,
__query__, __fragment__, __pbare__\. The pbare is optional\.
* file
__path__, __host__\. The host is optional\.
* mailto
__user__, __host__\. The host is optional\.
* ldap\(s\)
__host__, __port__, __dn__, __attrs__, __scope__,
__filter__, __extensions__
* news
Either __message\-id__ or __newsgroup\-name__\.
For discussion of the boolean __pbare__ see options *NoInitialSlash*
and *NoExtraKeys* in [QUIRK OPTIONS](#section5)\.
The constituents are returned as slices of the argument *url*, without
removal of percent\-encoding \("url\-encoding"\) or other adaptations\. Notably,
on Windows® the __path__ in scheme
*[file](\.\./\.\./\.\./\.\./index\.md\#file)* is not a valid local filename\. See
[EXAMPLES](#section6) for more information\.
- __uri::join__ ?*key* *value*?\.\.\.
__uri::join__ takes a list of key/value pairs \(generated by
__uri::split__, for example\) and returns the canonical URL they
represent\. Currently the schemes *[http](\.\./\.\./\.\./\.\./index\.md\#http)*,
*[https](\.\./\.\./\.\./\.\./index\.md\#https)*,
*[ftp](\.\./\.\./\.\./\.\./index\.md\#ftp)*,
*[mailto](\.\./\.\./\.\./\.\./index\.md\#mailto)*,
*[news](\.\./\.\./\.\./\.\./index\.md\#news)*,
*[ldap](\.\./\.\./\.\./\.\./index\.md\#ldap)*, *ldaps* and
*[file](\.\./\.\./\.\./\.\./index\.md\#file)* are supported by the package
itself\. See section [EXTENDING](#section4) on how to expand that range\.
The arguments are expected to be slices of a valid URL, with
percent\-encoding \("url\-encoding"\) and any other necessary adaptations\.
Notably, on Windows the __path__ in scheme
*[file](\.\./\.\./\.\./\.\./index\.md\#file)* is not a valid local filename\. See
[EXAMPLES](#section6) for more information\.
- __uri::resolve__ *base* *url*
__uri::resolve__ resolves the specified *url* relative to *base*, in
conformance with RFC 3986\. In other words: a non\-relative *url* is
returned unchanged, whereas for a relative *url* the missing parts are
taken from *base* and prepended to it\. The result of this operation is
returned\. For an empty *url* the result is *base*, without its URI
fragment \(if any\)\. The command is available for schemes
*[http](\.\./\.\./\.\./\.\./index\.md\#http)*,
*[https](\.\./\.\./\.\./\.\./index\.md\#https)*,
*[ftp](\.\./\.\./\.\./\.\./index\.md\#ftp)*, and
*[file](\.\./\.\./\.\./\.\./index\.md\#file)*\.
- __uri::isrelative__ *url*
__uri::isrelative__ determines whether the specified *url* is absolute
or relative, and returns a boolean value that is true for a relative
*url*\. The command is available for a *url* of any scheme\.
- __uri::geturl__ *url* ?*options*\.\.\.?
__uri::geturl__ decodes the specified *url* and then dispatches the
request to the package appropriate for the scheme found in the URL\. The
command assumes that the package to handle the given scheme either has the
same name as the scheme itself \(including possible capitalization\) followed
by __::geturl__, or, in case of this failing, has the same name as the
scheme itself \(including possible capitalization\)\. It further assumes that
whatever package was loaded provides a __geturl__\-command in the
namespace of the same name as the package itself\. This command is called
with the given *url* and all given *options*\. Currently __geturl__
does not handle any options itself\.
*Note:* *[file](\.\./\.\./\.\./\.\./index\.md\#file)*\-URLs are an exception to
the rule described above\. They are handled internally\.
The result of the command cannot be specified here, because it depends on the
__geturl__\-command for the scheme the request was dispatched to\.
- __uri::canonicalize__ *uri*
__uri::canonicalize__ returns the canonical form of a URI\. The canonical
form of a URI is one where relative path specifications, i\.e\. "\." and "\.\.",
have been resolved\. The command is available for all URI schemes that have
__uri::split__ and __uri::join__ commands\. The command returns a
canonicalized URI if the URI scheme has a __path__ component \(i\.e\.
*[http](\.\./\.\./\.\./\.\./index\.md\#http)*,
*[https](\.\./\.\./\.\./\.\./index\.md\#https)*,
*[ftp](\.\./\.\./\.\./\.\./index\.md\#ftp)*, and
*[file](\.\./\.\./\.\./\.\./index\.md\#file)*\)\. For schemes that have
__uri::split__ and __uri::join__ commands but no __path__
component \(i\.e\. *[mailto](\.\./\.\./\.\./\.\./index\.md\#mailto)*,
*[news](\.\./\.\./\.\./\.\./index\.md\#news)*,
*[ldap](\.\./\.\./\.\./\.\./index\.md\#ldap)*, and *ldaps*\), the command
returns the *uri* unchanged\.
- __uri::register__ *schemeList* *script*
__uri::register__ registers the first element of *schemeList* as a new
scheme and the remaining elements as aliases for this scheme\. It creates the
namespace for the scheme and executes the *script* in the new namespace\.
The script has to declare variables containing regular expressions relevant
to the scheme\. At the very least it has to declare the variable
__schemepart__, because that variable is used to extend the variables
keeping track of the registered schemes\.
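The following is a minimal sketch of how the commands above might be combined\. The host name www\.example\.com and the variable names are illustrative assumptions, and the quirk option *NoInitialSlash* is set explicitly so that __path__ keeps its leading "/"\.

```tcl
package require uri

# Pin the quirk option so that path keeps its leading "/" (RFC 3986 form).
uri::setQuirkOption NoInitialSlash 0

# Split a URL into its constituents and load them into an array.
array set parts [uri::split http://www.example.com/a/b?x=1]
puts "scheme=$parts(scheme) host=$parts(host) path=$parts(path) query=$parts(query)"

# Reassemble a URL from key/value pairs.
set rebuilt [uri::join scheme http host www.example.com path /a/b]
puts $rebuilt

# Distinguish relative from absolute references.
set rel [uri::isrelative ../d.html]
set abs [uri::isrelative http://www.example.com/]
puts "relative: $rel, absolute: $abs"

# Resolve a relative reference against a base URL.
set resolved [uri::resolve http://www.example.com/a/b/c.html ../d.html]
puts $resolved

# Remove "." and ".." segments from the path.
set canonical [uri::canonicalize http://www.example.com/a/./b/../c]
puts $canonical
```

Under the RFC 3986 rules the relative reference resolves to http://www\.example\.com/a/d\.html, and the canonical form of the last URL is http://www\.example\.com/a/c\.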
# SCHEMES
In addition to the commands mentioned above this package provides regular
expressions to recognize URLs for a number of URL schemes\.
For each supported scheme a namespace of the same name as the scheme itself is
provided inside of the namespace *uri* containing the variable __url__
whose contents are a regular expression to recognize URLs of that scheme\.
Additional variables may contain regular expressions for parts of URLs for that
scheme\.
The variable __uri::schemes__ contains a list of all registered schemes\.
Currently these are *[ftp](\.\./\.\./\.\./\.\./index\.md\#ftp)*,
*[ldap](\.\./\.\./\.\./\.\./index\.md\#ldap)*, *ldaps*,
*[file](\.\./\.\./\.\./\.\./index\.md\#file)*,
*[http](\.\./\.\./\.\./\.\./index\.md\#http)*,
*[https](\.\./\.\./\.\./\.\./index\.md\#https)*,
*[gopher](\.\./\.\./\.\./\.\./index\.md\#gopher)*,
*[mailto](\.\./\.\./\.\./\.\./index\.md\#mailto)*,
*[news](\.\./\.\./\.\./\.\./index\.md\#news)*,
*[wais](\.\./\.\./\.\./\.\./index\.md\#wais)* and
*[prospero](\.\./\.\./\.\./\.\./index\.md\#prospero)*\.
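As a rough illustration of the variables described in this section, the sketch below lists the registered schemes and uses the regular expression stored in the __url__ variable of the *http* namespace to find a URL inside a line of text\. The sample text is an assumption made for the example\.

```tcl
package require uri

# List every scheme currently registered with the package.
puts $::uri::schemes

# Look for an http URL embedded in a line of text.
set text "Documentation lives at http://www.example.com/index.html today"
set found [regexp -- $::uri::http::url $text match]
if {$found} {
    puts "found: $match"
}
```

The expression is not anchored, so it can be used both to validate a whole string \(by adding anchors\) and to search within longer text, as done here\.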
# EXTENDING
Extending the range of schemes supported by __uri::split__ and
__uri::join__ is easy because neither command handles the request itself\.
Each dispatches it to another command in the *uri* namespace, using the
scheme of the URL as criterion\.
__uri::split__ and __uri::join__ call __Split\[string totitle
*scheme*\]__ and __Join\[string totitle *scheme*\]__ respectively, where
*scheme* is the scheme of the URL\.
The provision of split and join commands is sufficient to extend the commands
__uri::canonicalize__ and __uri::geturl__ \(the latter subject to the
availability of a suitable package with a __geturl__ command\)\. In contrast,
to extend the command __uri::resolve__ to a new scheme, the command itself
must be modified\.
To extend the range of schemes for which pattern information is available, use
the command __uri::register__\.
An example of a package that provides both commands and pattern information for
a new scheme is __uri::urn__, which adds scheme
*[urn](\.\./\.\./\.\./\.\./index\.md\#urn)*\.
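A minimal sketch of such an extension follows\. The scheme name *demo*, its made\-up syntax demo:token, and the key __token__ are all hypothetical\. The sketch also assumes that the split handler receives the URL with the scheme prefix already removed, and that the join handler receives the key/value pairs as its arguments\.

```tcl
package require uri

# Pattern information for the hypothetical scheme "demo".
# schemepart is mandatory; url follows the convention of section SCHEMES.
uri::register demo {
    variable schemepart {[a-z0-9]+}
    variable url        "demo:$schemepart"
}

# Split handler: assumed to be called with the leading "demo:" removed.
proc ::uri::SplitDemo {url} {
    return [list token $url]
}

# Join handler: assumed to be called with the key/value pairs as arguments.
proc ::uri::JoinDemo {args} {
    array set components $args
    return "demo:$components(token)"
}

array set parts [uri::split demo:abc123]
puts "$parts(scheme) $parts(token)"

set joined [uri::join scheme demo token abc123]
puts $joined
```

Registering the same scheme a second time is reported as an error, so code like this belongs in a package that is loaded only once\.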
# QUIRK OPTIONS
The value of a "quirk option" is boolean: the value __false__ requests
conformance with RFC 3986, while __true__ requests use of the quirk\. Use
command __uri::setQuirkOption__ to access the values of quirk options\.
Quirk options are useful both for allowing backwards compatibility when a
command specification changes, and for adding useful features that are not
included in RFC specifications\. The following quirk options are currently
defined:
- *NoInitialSlash*
This quirk option concerns the leading character of __path__ \(if
non\-empty\) in the schemes *[http](\.\./\.\./\.\./\.\./index\.md\#http)*,
*[https](\.\./\.\./\.\./\.\./index\.md\#https)*, and
*[ftp](\.\./\.\./\.\./\.\./index\.md\#ftp)*\.
RFC 3986 defines __path__ in an absolute URI to have an initial "/",
unless the value of __path__ is the empty string\. For the scheme
*[file](\.\./\.\./\.\./\.\./index\.md\#file)*, all versions of package
__uri__ follow this rule\. The quirk option *NoInitialSlash* does not
apply to scheme *[file](\.\./\.\./\.\./\.\./index\.md\#file)*\.
For the schemes *[http](\.\./\.\./\.\./\.\./index\.md\#http)*,
*[https](\.\./\.\./\.\./\.\./index\.md\#https)*, and
*[ftp](\.\./\.\./\.\./\.\./index\.md\#ftp)*, versions of __uri__ before
1\.2\.7 define the __path__ *NOT* to include an initial "/"\. When the
quirk option *NoInitialSlash* is __true__ \(the default\), this behavior
is also used in version 1\.2\.7\. To use instead values of __path__ as
defined by RFC 3986, set this quirk option to __false__\.
This setting does not affect RFC 3986 conformance\. If *NoInitialSlash* is
__true__, then the value of __path__ in the schemes
*[http](\.\./\.\./\.\./\.\./index\.md\#http)*,
*[https](\.\./\.\./\.\./\.\./index\.md\#https)*, or
*[ftp](\.\./\.\./\.\./\.\./index\.md\#ftp)*, cannot distinguish between URIs in
which the full "RFC 3986 path" is the empty string "" or a single slash "/"
respectively\. The missing information is recorded in an additional
__uri::split__ key __pbare__\.
The boolean __pbare__ is defined when quirk options *NoInitialSlash*
and *NoExtraKeys* have values __true__ and __false__ respectively\.
In this case, if the value of __path__ is the empty string "",
__pbare__ is __true__ if the full "RFC 3986 path" is "", and
__pbare__ is __false__ if the full "RFC 3986 path" is "/"\.
Using this quirk option *NoInitialSlash* is a matter of preference\.
- *NoExtraKeys*
This quirk option permits full backward compatibility with versions of
__uri__ before 1\.2\.7, by omitting the __uri::split__ key
__pbare__ described above \(see quirk option *NoInitialSlash*\)\. The
outcome is greater backward compatibility of the __uri::split__ command,
but an inability to distinguish between URIs in which the full "RFC 3986
path" is the empty string "" or a single slash "/" respectively \- i\.e\. a
minor non\-conformance with RFC 3986\.
If the quirk option *NoExtraKeys* is __false__ \(the default\), command
__uri::split__ returns an additional key __pbare__, and the commands
comply with RFC 3986\. If the quirk option *NoExtraKeys* is __true__,
the key __pbare__ is not defined and there is not full conformance with
RFC 3986\.
Using the quirk option *NoExtraKeys* is *NOT* recommended, because if
set to __true__ it will reduce conformance with RFC 3986\. The option is
included only for compatibility with code, written for earlier versions of
__uri__, that needs values of __path__ without a leading "/", *AND
ALSO* cannot tolerate unexpected keys in the results of __uri::split__\.
- *HostAsDriveLetter*
When handling the scheme *[file](\.\./\.\./\.\./\.\./index\.md\#file)* on the
Windows platform, versions of __uri__ before 1\.2\.7 use the __host__
field to represent a Windows drive letter and the colon that follows it, and
the __path__ field to represent the filename path after the colon\. Such
URIs are invalid, and are not recognized by any RFC\. When the quirk option
*HostAsDriveLetter* is __true__, this behavior is also used in version
1\.2\.7\. To use *[file](\.\./\.\./\.\./\.\./index\.md\#file)* URIs on Windows that
conform to RFC 3986, set this quirk option to __false__ \(the default\)\.
Using this quirk is *NOT* recommended, because if set to __true__ it
will cause the __uri__ commands to expect and produce invalid URIs\. The
option is included only for compatibility with legacy code\.
- *RemoveDoubleSlashes*
When a URI is canonicalized by __uri::canonicalize__, its __path__
is normalized by removal of segments "\." and "\.\."\. RFC 3986 does not mandate
the removal of empty segments "" \(i\.e\. the merger of double slashes, which
is a feature of filename normalization but not of URI __path__
normalization\): it treats URIs with excess slashes as referring to different
resources\. When the quirk option *RemoveDoubleSlashes* is __true__
\(the default\), empty segments will be removed from __path__\. To prevent
removal, and thereby conform to RFC 3986, set this quirk option to
__false__\.
Using this quirk is a matter of preference\. A URI with double slashes in its
path was most likely generated by error, certainly so if it has a
straightforward mapping to a file on a server\. In some cases it may be
better to sanitize the URI; in others, to keep the URI and let the server
handle the possible error\.
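The effect of *NoInitialSlash* and the role of __pbare__ can be illustrated with a short sketch\. The host name and the array names are assumptions made for the example\.

```tcl
package require uri

set url http://www.example.com/a/b

# Quirk enabled: path is reported without its leading "/".
uri::setQuirkOption NoInitialSlash 1
uri::setQuirkOption NoExtraKeys 0
array set legacy [uri::split $url]
puts "legacy path: $legacy(path)"

# With the quirk enabled, an empty path is ambiguous; pbare resolves it.
array set bare  [uri::split http://www.example.com]
array set slash [uri::split http://www.example.com/]
puts "bare:  path='$bare(path)' pbare=$bare(pbare)"
puts "slash: path='$slash(path)' pbare=$slash(pbare)"

# Quirk disabled: path is reported as defined by RFC 3986.
uri::setQuirkOption NoInitialSlash 0
array set rfc [uri::split $url]
puts "rfc path: $rfc(path)"
```

Quirk options are global to the package, so code that changes them temporarily may want to read the old value first and restore it afterwards\.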
## BACKWARD COMPATIBILITY
To behave as similarly as possible to versions of __uri__ earlier than
1\.2\.7, set the following quirk options:
- __uri::setQuirkOption__ *NoInitialSlash* 1
- __uri::setQuirkOption__ *NoExtraKeys* 1
- __uri::setQuirkOption__ *HostAsDriveLetter* 1
- __uri::setQuirkOption__ *RemoveDoubleSlashes* 0
In code that can tolerate the return by __uri::split__ of an additional key
__pbare__, set
- __uri::setQuirkOption__ *NoExtraKeys* 0
in order to achieve greater compliance with RFC 3986\.
## NEW DESIGNS
For new projects, the following settings are recommended:
- __uri::setQuirkOption__ *NoInitialSlash* 0
- __uri::setQuirkOption__ *NoExtraKeys* 0
- __uri::setQuirkOption__ *HostAsDriveLetter* 0
- __uri::setQuirkOption__ *RemoveDoubleSlashes* 0|1
## DEFAULT VALUES
The default values for package __uri__ version 1\.2\.7 are intended to be a
compromise between backwards compatibility and improved features\. Different
default values may be chosen in future versions of package __uri__\.
- __uri::setQuirkOption__ *NoInitialSlash* 1
- __uri::setQuirkOption__ *NoExtraKeys* 0
- __uri::setQuirkOption__ *HostAsDriveLetter* 0
- __uri::setQuirkOption__ *RemoveDoubleSlashes* 1
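Because the defaults may change in future versions, code that depends on particular settings may prefer to set them explicitly\. The sketch below first reports the current values and then applies the settings recommended for new designs; the choice of 0 for *RemoveDoubleSlashes* is an assumption, since either value is acceptable there\.

```tcl
package require uri

# Report the current value of every quirk option.
foreach option {NoInitialSlash NoExtraKeys HostAsDriveLetter RemoveDoubleSlashes} {
    puts "$option = [uri::setQuirkOption $option]"
}

# Pin the settings instead of relying on the package defaults.
uri::setQuirkOption NoInitialSlash      0
uri::setQuirkOption NoExtraKeys         0
uri::setQuirkOption HostAsDriveLetter   0
uri::setQuirkOption RemoveDoubleSlashes 0
```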
# EXAMPLES
A Windows® local filename such as "__C:\\Other Files\\startup\.txt__" is not
suitable for use as the __path__ element of a URI in the scheme
*[file](\.\./\.\./\.\./\.\./index\.md\#file)*\.
The Tcl command __file normalize__ will convert the backslashes to forward
slashes\. To generate a valid __path__ for the scheme
*[file](\.\./\.\./\.\./\.\./index\.md\#file)*, the normalized filename must be
prepended with "__/__", and then any characters that do not match the
__regexp__ bracket expression

    [a-zA-Z0-9$_.+!*'(,)?:@&=-]

must be percent\-encoded\.
The result in this example is "__/C:/Other%20Files/startup\.txt__" which is a
valid value for __path__\.

    % uri::join path /C:/Other%20Files/startup.txt scheme file
    file:///C:/Other%20Files/startup.txt
    % uri::split file:///C:/Other%20Files/startup.txt
    path /C:/Other%20Files/startup.txt scheme file

On UNIX® systems filenames begin with "__/__" which is also used as the
directory separator\. The only action needed to convert a filename to a valid
__path__ is percent\-encoding\.
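The conversion described in this section can be sketched as a small helper\. The procedure name __encodeUriPath__ is hypothetical and not part of package __uri__\. The sketch assumes that "/" separates path segments and is therefore left unencoded, and that non\-ASCII characters are encoded as UTF\-8 bytes\.

```tcl
# Hypothetical helper, not part of package uri.
# Takes a filename already processed by [file normalize] (forward slashes).
proc encodeUriPath {name} {
    # A Windows name such as C:/dir/file lacks the leading "/".
    if {![string match /* $name]} {
        set name /$name
    }
    set segments {}
    foreach segment [split $name /] {
        set encoded ""
        foreach char [split $segment ""] {
            if {[regexp -- {^[a-zA-Z0-9$_.+!*'(,)?:@&=-]$} $char]} {
                append encoded $char
            } else {
                # Percent-encode each UTF-8 byte of the character.
                foreach byte [split [encoding convertto utf-8 $char] ""] {
                    append encoded [format %%%02X [scan $byte %c]]
                }
            }
        }
        lappend segments $encoded
    }
    return [join $segments /]
}

set winPath  [encodeUriPath "C:/Other Files/startup.txt"]
set unixPath [encodeUriPath "/home/user/Other Files/startup.txt"]
puts $winPath
puts $unixPath
```

In practice the argument would come from __file normalize__, and the result can be passed as the __path__ value to __uri::join__ in the same way as in the session shown earlier in this section\.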
# CREDITS
Original code \(regular expressions\) by Andreas Kupries\. Modularisation by Steve
Ball, also the split/join/resolve functionality\. RFC 3986 conformance by Keith
Nash\.
# Bugs, Ideas, Feedback
This document, and the package it describes, will undoubtedly contain bugs and
other problems\. Please report such in the category *uri* of the [Tcllib
Trackers](http://core\.tcl\.tk/tcllib/reportlist)\. Please also report any ideas
for enhancements you may have for either package and/or documentation\.
When proposing code changes, please provide *unified diffs*, i\.e\. the output of
__diff \-u__\.
Note further that *attachments* are strongly preferred over inlined patches\.
Attachments can be made by going to the __Edit__ form of the ticket
immediately after its creation, and then using the left\-most button in the
secondary navigation bar\.
# KEYWORDS
[fetching information](\.\./\.\./\.\./\.\./index\.md\#fetching\_information),
[file](\.\./\.\./\.\./\.\./index\.md\#file), [ftp](\.\./\.\./\.\./\.\./index\.md\#ftp),
[gopher](\.\./\.\./\.\./\.\./index\.md\#gopher),
[http](\.\./\.\./\.\./\.\./index\.md\#http), [https](\.\./\.\./\.\./\.\./index\.md\#https),
[ldap](\.\./\.\./\.\./\.\./index\.md\#ldap),
[mailto](\.\./\.\./\.\./\.\./index\.md\#mailto),
[news](\.\./\.\./\.\./\.\./index\.md\#news),
[prospero](\.\./\.\./\.\./\.\./index\.md\#prospero), [rfc
1630](\.\./\.\./\.\./\.\./index\.md\#rfc\_1630), [rfc
2255](\.\./\.\./\.\./\.\./index\.md\#rfc\_2255), [rfc
2396](\.\./\.\./\.\./\.\./index\.md\#rfc\_2396), [rfc
3986](\.\./\.\./\.\./\.\./index\.md\#rfc\_3986), [uri](\.\./\.\./\.\./\.\./index\.md\#uri),
[url](\.\./\.\./\.\./\.\./index\.md\#url), [wais](\.\./\.\./\.\./\.\./index\.md\#wais),
[www](\.\./\.\./\.\./\.\./index\.md\#www)
# CATEGORY
Networking