Codebase list hyperlink / dfbb138
New upstream version 17.3.1 Free Ekanayaka 6 years ago
33 changed file(s) with 4284 addition(s) and 0 deletion(s).
0 [run]
1 branch = True
2 source =
3 hyperlink
4 ../hyperlink
5
6 [paths]
7 source =
8 ../hyperlink
9 */lib/python*/site-packages/hyperlink
10 */Lib/site-packages/hyperlink
11 */pypy/site-packages/hyperlink
0 # Hyperlink Changelog
1
2 ## dev (not yet released)
3
4 * *None so far*
5
6 ## 17.3.0
7
8 *(July 18, 2017)*
9
10 Fixed a couple major decoding issues and simplified the URL API.
11
12 * limit types accepted by `URL.from_text()` to just text (str on py3,
13 unicode on py2), see #20
14 * fix percent decoding issues surrounding multiple calls to
15 `URL.to_iri()` (see #16)
16 * remove the `socket`-inspired `family` argument from `URL`'s APIs. It
17 was never consistently implemented and leaked slightly more problems
18 than it solved.
19 * Improve authority parsing (see #26)
20 * include LICENSE, README, docs, and other resources in the package
21
22 ## 17.2.1
23
24 *(June 18, 2017)*
25
26 A small bugfix release after yesterday's big changes. This patch
27 version simply reverts an exception message for parameters expecting
28 strings on Python 3, returning to compliance with Twisted's test
29 suite.
30
31 ## 17.2.0
32
33 *(June 17, 2017)*
34
35 Fixed a great round of issues based on the amazing community review
36 (@wsanchez and @jvanasco) after our first listserv announcement and
37 [PyConWeb talk](https://www.youtube.com/watch?v=EIkmADO-r10).
38
39 * Add checking for invalid unescaped delimiters in parameters to the
40 `URL` constructor. No more slashes and question marks allowed in
41 path segments themselves.
42 * More robust support for IDNA decoding on "narrow"/UCS-2 Python
43 builds (e.g., Mac's built-in Python).
44 * Correctly encode colons in the first segment of relative paths for
45 URLs with no scheme set.
46 * Make URLs with empty paths compare as equal (`http://example.com`
47 vs. `http://example.com/`) per RFC 3986. If you need the stricter
48 check, you can check the attributes directly or compare the strings.
49 * Automatically escape the arguments to `.child()` and `.sibling()`
50 * Fix some IPv6 and port parsing corner cases.
51
52 ## 17.1.1
53
54 * Python 2.6 support
55 * Added LICENSE
56 * Automated CI and code coverage
57 * When a host and a query string are present, empty paths are now
58 rendered as a single slash. This is slightly more in line with RFC
59 3986 section 6.2.3, but might need to go further and use an empty
60 slash whenever the authority is present. This also better replicates
61 Twisted URL's old behavior.
62
63 ## 17.1.0
64
65 * Correct encoding for username/password part of URL (userinfo)
66 * Dot segments are resolved on empty URL.click
67 * Many, many more schemes and default ports
68 * Faster percent-encoding with segment-specific functions
69 * Better detection and inference of scheme netloc usage (the presence
70 of `//` in URLs)
71 * IPv6 support with IP literal validation
72 * Faster, regex-based parsing
73 * URLParseError type for errors while parsing URLs
74 * URL is now hashable, so feel free to use URLs as keys in dicts
75 * Improved error on invalid scheme, directing users to URL.from_text
76 in the event that they used the wrong constructor
77 * PEP8-compatible API, with full, transparent backwards compatibility
78 for Twisted APIs, guaranteed.
79 * Extensive docstring expansion.
80
81 ## Pre-17.0.0
82
83 * Lots of good features! Used to be called twisted.python.url
0 Copyright (c) 2017
1 Glyph Lefkowitz
2 Itamar Turner-Trauring
3 Jean Paul Calderone
4 Adi Roiban
5 Amber Hawkie Brown
6 Mahmoud Hashemi
7
8 and others that have contributed code to the public domain.
9
10 Permission is hereby granted, free of charge, to any person obtaining
11 a copy of this software and associated documentation files (the
12 "Software"), to deal in the Software without restriction, including
13 without limitation the rights to use, copy, modify, merge, publish,
14 distribute, sublicense, and/or sell copies of the Software, and to
15 permit persons to whom the Software is furnished to do so, subject to
16 the following conditions:
17
18 The above copyright notice and this permission notice shall be
19 included in all copies or substantial portions of the Software.
20
21 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
22 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
23 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
24 NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
25 LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
26 OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
27 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
0 include README.md LICENSE CHANGELOG.md tox.ini requirements-test.txt .coveragerc Makefile pytest.ini .tox-coveragerc
1 exclude TODO.md appveyor.yml
2
3 graft docs
4 prune docs/_build
0 Metadata-Version: 1.1
1 Name: hyperlink
2 Version: 17.3.1
3 Summary: A featureful, correct URL for Python.
4 Home-page: https://github.com/python-hyper/hyperlink
5 Author: Mahmoud Hashemi and Glyph Lefkowitz
6 Author-email: mahmoud@hatnote.com
7 License: MIT
8 Description: The humble, but powerful, URL runs everything around us. Chances
9 are you've used several just to read this text.
10
11 Hyperlink is a featureful, pure-Python implementation of the URL, with
12 an emphasis on correctness. MIT licensed.
13
14 See the docs at http://hyperlink.readthedocs.io.
15
16 Platform: any
17 Classifier: Topic :: Utilities
18 Classifier: Intended Audience :: Developers
19 Classifier: Topic :: Software Development :: Libraries
20 Classifier: Development Status :: 5 - Production/Stable
21 Classifier: Programming Language :: Python :: 2.6
22 Classifier: Programming Language :: Python :: 2.7
23 Classifier: Programming Language :: Python :: 3.4
24 Classifier: Programming Language :: Python :: 3.5
25 Classifier: Programming Language :: Python :: 3.6
26 Classifier: Programming Language :: Python :: Implementation :: PyPy
0 # Hyperlink
1
2 *Cool URLs that don't change.*
3
4 <a href="https://hyperlink.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat"></a>
5 <a href="https://pypi.python.org/pypi/hyperlink"><img src="https://img.shields.io/pypi/v/hyperlink.svg"></a>
6 <a href="http://calver.org"><img src="https://img.shields.io/badge/calver-YY.MINOR.MICRO-22bfda.svg"></a>
7
8 Hyperlink provides a pure-Python implementation of immutable
9 URLs. Based on [RFC 3986][rfc3986] and [3987][rfc3987], the Hyperlink URL
10 makes working with both URIs and IRIs easy.
11
12 Hyperlink is tested against Python 2.7, 3.4, 3.5, 3.6, and PyPy.
13
14 Full documentation is available on [Read the Docs][docs].
15
16 [rfc3986]: https://tools.ietf.org/html/rfc3986
17 [rfc3987]: https://tools.ietf.org/html/rfc3987
18 [docs]: http://hyperlink.readthedocs.io/en/latest/
19
20 ## Installation
21
22 Hyperlink is a pure-Python package and requires nothing but
23 Python. The easiest way to install is with pip:
24
25 ```
26 pip install hyperlink
27 ```
28
29 Then, hyperlink away!
30
31 ```python
32 from hyperlink import URL
33
34 url = URL.from_text('http://github.com/mahmoud/hyperlink?utm_source=README')
35 utm_source = url.get('utm_source')
36 better_url = url.replace(scheme='https')
37 user_url = better_url.click('..')
38 ```
39
40 See the full API docs on [Read the Docs][docs].
41
42 ## More information
43
44 Hyperlink would not have been possible without the help of
45 [Glyph Lefkowitz](https://glyph.twistedmatrix.com/) and many other
46 community members, especially considering that it started as an
47 extract from the Twisted networking library. Thanks to them,
48 Hyperlink's URL has been production-grade for well over a decade.
49
50 Still, should you encounter any issues, do file an issue, or submit a
51 pull request.
0 # Makefile for Sphinx documentation
1 #
2
3 # You can set these variables from the command line.
4 SPHINXOPTS =
5 SPHINXBUILD = sphinx-build
6 PAPER =
7 BUILDDIR = _build
8
9 # User-friendly check for sphinx-build
10 ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
11 $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
12 endif
13
14 # Internal variables.
15 PAPEROPT_a4 = -D latex_paper_size=a4
16 PAPEROPT_letter = -D latex_paper_size=letter
17 ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
18 # the i18n builder cannot share the environment and doctrees with the others
19 I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
20
21 .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest coverage gettext
22
23 help:
24 @echo "Please use \`make <target>' where <target> is one of"
25 @echo " html to make standalone HTML files"
26 @echo " dirhtml to make HTML files named index.html in directories"
27 @echo " singlehtml to make a single large HTML file"
28 @echo " pickle to make pickle files"
29 @echo " json to make JSON files"
30 @echo " htmlhelp to make HTML files and a HTML help project"
31 @echo " qthelp to make HTML files and a qthelp project"
32 @echo " applehelp to make an Apple Help Book"
33 @echo " devhelp to make HTML files and a Devhelp project"
34 @echo " epub to make an epub"
35 @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
36 @echo " latexpdf to make LaTeX files and run them through pdflatex"
37 @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
38 @echo " text to make text files"
39 @echo " man to make manual pages"
40 @echo " texinfo to make Texinfo files"
41 @echo " info to make Texinfo files and run them through makeinfo"
42 @echo " gettext to make PO message catalogs"
43 @echo " changes to make an overview of all changed/added/deprecated items"
44 @echo " xml to make Docutils-native XML files"
45 @echo " pseudoxml to make pseudoxml-XML files for display purposes"
46 @echo " linkcheck to check all external links for integrity"
47 @echo " doctest to run all doctests embedded in the documentation (if enabled)"
48 @echo " coverage to run coverage check of the documentation (if enabled)"
49
50 clean:
51 rm -rf $(BUILDDIR)/*
52
53 html:
54 $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
55 @echo
56 @echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
57
58 dirhtml:
59 $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
60 @echo
61 @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
62
63 singlehtml:
64 $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
65 @echo
66 @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
67
68 pickle:
69 $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
70 @echo
71 @echo "Build finished; now you can process the pickle files."
72
73 json:
74 $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
75 @echo
76 @echo "Build finished; now you can process the JSON files."
77
78 htmlhelp:
79 $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
80 @echo
81 @echo "Build finished; now you can run HTML Help Workshop with the" \
82 ".hhp project file in $(BUILDDIR)/htmlhelp."
83
84 qthelp:
85 $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
86 @echo
87 @echo "Build finished; now you can run "qcollectiongenerator" with the" \
88 ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
89 @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/hyperlink.qhcp"
90 @echo "To view the help file:"
91 @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/hyperlink.qhc"
92
93 applehelp:
94 $(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp
95 @echo
96 @echo "Build finished. The help book is in $(BUILDDIR)/applehelp."
97 @echo "N.B. You won't be able to view it unless you put it in" \
98 "~/Library/Documentation/Help or install it in your application" \
99 "bundle."
100
101 devhelp:
102 $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
103 @echo
104 @echo "Build finished."
105 @echo "To view the help file:"
106 @echo "# mkdir -p $$HOME/.local/share/devhelp/hyperlink"
107 @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/hyperlink"
108 @echo "# devhelp"
109
110 epub:
111 $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
112 @echo
113 @echo "Build finished. The epub file is in $(BUILDDIR)/epub."
114
115 latex:
116 $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
117 @echo
118 @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
119 @echo "Run \`make' in that directory to run these through (pdf)latex" \
120 "(use \`make latexpdf' here to do that automatically)."
121
122 latexpdf:
123 $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
124 @echo "Running LaTeX files through pdflatex..."
125 $(MAKE) -C $(BUILDDIR)/latex all-pdf
126 @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
127
128 latexpdfja:
129 $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
130 @echo "Running LaTeX files through platex and dvipdfmx..."
131 $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
132 @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
133
134 text:
135 $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
136 @echo
137 @echo "Build finished. The text files are in $(BUILDDIR)/text."
138
139 man:
140 $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
141 @echo
142 @echo "Build finished. The manual pages are in $(BUILDDIR)/man."
143
144 texinfo:
145 $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
146 @echo
147 @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
148 @echo "Run \`make' in that directory to run these through makeinfo" \
149 "(use \`make info' here to do that automatically)."
150
151 info:
152 $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
153 @echo "Running Texinfo files through makeinfo..."
154 make -C $(BUILDDIR)/texinfo info
155 @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
156
157 gettext:
158 $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
159 @echo
160 @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
161
162 changes:
163 $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
164 @echo
165 @echo "The overview file is in $(BUILDDIR)/changes."
166
167 linkcheck:
168 $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
169 @echo
170 @echo "Link check complete; look for any errors in the above output " \
171 "or in $(BUILDDIR)/linkcheck/output.txt."
172
173 doctest:
174 $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
175 @echo "Testing of doctests in the sources finished, look at the " \
176 "results in $(BUILDDIR)/doctest/output.txt."
177
178 coverage:
179 $(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage
180 @echo "Testing of coverage in the sources finished, look at the " \
181 "results in $(BUILDDIR)/coverage/python.txt."
182
183 xml:
184 $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
185 @echo
186 @echo "Build finished. The XML files are in $(BUILDDIR)/xml."
187
188 pseudoxml:
189 $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
190 @echo
191 @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
0 {% extends "!page.html" %}
1 {% block menu %}
2 {{ super() }}
3 <iframe src="https://ghbtns.com/github-btn.html?user=python-hyper&repo=hyperlink&type=star&count=true&size=medium" frameborder="0" scrolling="0" width="160px" height="30px" style="margin-left: 23px; margin-top: 10px;"></iframe>
4 {% endblock %}
0 .. _hyperlink_api:
1
2 Hyperlink API
3 =============
4
5 .. automodule:: hyperlink._url
6
7 Creation
8 --------
9
10 Before you can work with URLs, you must create URLs. There are two
11 ways to create URLs, from parts and from text.
12
13 .. autoclass:: hyperlink.URL
14 .. automethod:: hyperlink.URL.from_text
15
16 Transformation
17 --------------
18
19 Once a URL is created, some of the most common tasks are to transform
20 it into other URLs and text.
21
22 .. automethod:: hyperlink.URL.to_text
23 .. automethod:: hyperlink.URL.to_uri
24 .. automethod:: hyperlink.URL.to_iri
25 .. automethod:: hyperlink.URL.replace
26
27 Navigation
28 ----------
29
30 Go places with URLs. Simulate browser behavior and perform semantic
31 path operations.
32
33 .. automethod:: hyperlink.URL.click
34 .. automethod:: hyperlink.URL.sibling
35 .. automethod:: hyperlink.URL.child
36
37 Query Parameters
38 ----------------
39
40 CRUD operations on the query string multimap.
41
42 .. automethod:: hyperlink.URL.get
43 .. automethod:: hyperlink.URL.add
44 .. automethod:: hyperlink.URL.set
45 .. automethod:: hyperlink.URL.remove
46
47 Attributes
48 ----------
49
50 URLs have many parts, and URL objects have many attributes to represent them.
51
52 .. autoattribute:: hyperlink.URL.absolute
53 .. autoattribute:: hyperlink.URL.scheme
54 .. autoattribute:: hyperlink.URL.host
55 .. autoattribute:: hyperlink.URL.port
56 .. autoattribute:: hyperlink.URL.path
57 .. autoattribute:: hyperlink.URL.query
58 .. autoattribute:: hyperlink.URL.fragment
59 .. autoattribute:: hyperlink.URL.userinfo
60 .. autoattribute:: hyperlink.URL.user
61 .. autoattribute:: hyperlink.URL.rooted
62 .. autoattribute:: hyperlink.URL.family
63
64 Low-level functions
65 -------------------
66
67 A couple of notable helpers used by the :class:`~hyperlink.URL` type.
68
69 .. autoclass:: hyperlink.URLParseError
70 .. autofunction:: hyperlink.register_scheme
71 .. autofunction:: hyperlink.parse_host
72
73 .. TODO: run doctests in docs?
0 # -*- coding: utf-8 -*-
1 #
2 # hyperlink documentation build configuration file, created by
3 # sphinx-quickstart on Sat Mar 21 00:34:18 2015.
4 #
5 # This file is execfile()d with the current directory set to its
6 # containing dir.
7 #
8 # Note that not all possible configuration values are present in this
9 # autogenerated file.
10 #
11 # All configuration values have a default; values that are commented out
12 # serve to show the default.
13
14 import os
15 import sys
16 import sphinx
17 from pprint import pprint
18
19 # If extensions (or modules to document with autodoc) are in another directory,
20 # add these directories to sys.path here. If the directory is relative to the
21 # documentation root, use os.path.abspath to make it absolute, like shown here.
22 CUR_PATH = os.path.dirname(os.path.abspath(__file__))
23 PROJECT_PATH = os.path.abspath(CUR_PATH + '/../')
24 PACKAGE_PATH = os.path.abspath(CUR_PATH + '/../hyperlink')
25 sys.path.insert(0, PROJECT_PATH)
26 sys.path.insert(0, PACKAGE_PATH)
27
28 pprint(os.environ)
29
30 # -- General configuration ------------------------------------------------
31
32 autosummary_generate = True
33
34 # Add any Sphinx extension module names here, as strings. They can be
35 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
36 # ones.
37 extensions = [
38 'sphinx.ext.autodoc',
39 'sphinx.ext.autosummary',
40 'sphinx.ext.doctest',
41 'sphinx.ext.intersphinx',
42 'sphinx.ext.coverage',
43 'sphinx.ext.viewcode',
44 ]
45
46 # Read the Docs is version 1.2 as of writing
47 if sphinx.version_info[:2] < (1, 3):
48 extensions.append('sphinxcontrib.napoleon')
49 else:
50 extensions.append('sphinx.ext.napoleon')
51
52 # Add any paths that contain templates here, relative to this directory.
53 templates_path = ['_templates']
54
55 # source_suffix = ['.rst', '.md']
56 source_suffix = '.rst'
57
58 # The master toctree document.
59 master_doc = 'index'
60
61 # General information about the project.
62 project = u'hyperlink'
63 copyright = u'2017, Mahmoud Hashemi'
64 author = u'Mahmoud Hashemi'
65
66 version = '17.3'
67 release = '17.3.1'
68
69 if os.name != 'nt':
70 today_fmt = '%B %d, %Y'
71
72 exclude_patterns = ['_build']
73
74 # The name of the Pygments (syntax highlighting) style to use.
75 pygments_style = 'sphinx'
76
77 # Example configuration for intersphinx: refer to the Python standard library.
78 intersphinx_mapping = {'python': ('https://docs.python.org/2.7', None)}
79
80
81 # -- Options for HTML output ----------------------------------------------
82
83 # The theme to use for HTML and HTML Help pages. See the documentation for
84 # a list of builtin themes.
85 on_rtd = os.environ.get('READTHEDOCS', None) == 'True'
86
87 if on_rtd:
88 html_theme = 'default'
89 else: # only import and set the theme if we're building docs locally
90 import sphinx_rtd_theme
91 html_theme = 'sphinx_rtd_theme'
92 html_theme_path = ['_themes', sphinx_rtd_theme.get_html_theme_path()]
93
94 html_theme_options = {'navigation_depth': 3,
95 'collapse_navigation': False}
96
97 # Add any paths that contain custom themes here, relative to this directory.
98 # html_theme_path = []
99
100 # TEMP: see https://github.com/rtfd/readthedocs.org/issues/1692
101 # Add RTD Theme Path.
102 #if 'html_theme_path' in globals():
103 # html_theme_path.append('/home/docs/checkouts/readthedocs.org/readthedocs/templates/sphinx')
104 #else:
105 # html_theme_path = ['_themes', '/home/docs/checkouts/readthedocs.org/readthedocs/templates/sphinx']
106
107 # The name for this set of Sphinx documents. If None, it defaults to
108 # "<project> v<release> documentation".
109 #html_title = None
110
111 # A shorter title for the navigation bar. Default is the same as html_title.
112 #html_short_title = None
113
114 # The name of an image file (relative to this directory) to place at the top
115 # of the sidebar.
116 #html_logo = None
117
118 # The name of an image file (within the static path) to use as favicon of the
119 # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
120 # pixels large.
121 #html_favicon = None
122
123 # Add any paths that contain custom static files (such as style sheets) here,
124 # relative to this directory. They are copied after the builtin static files,
125 # so a file named "default.css" will overwrite the builtin "default.css".
126 html_static_path = ['_static']
127
128 # Add any extra paths that contain custom files (such as robots.txt or
129 # .htaccess) here, relative to this directory. These files are copied
130 # directly to the root of the documentation.
131 #html_extra_path = []
132
133 # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
134 # using the given strftime format.
135 #html_last_updated_fmt = '%b %d, %Y'
136
137 # If true, SmartyPants will be used to convert quotes and dashes to
138 # typographically correct entities.
139 #html_use_smartypants = True
140
141 # Custom sidebar templates, maps document names to template names.
142 #html_sidebars = {}
143
144 # Additional templates that should be rendered to pages, maps page names to
145 # template names.
146 #html_additional_pages = {}
147
148 # If false, no module index is generated.
149 #html_domain_indices = True
150
151 # If false, no index is generated.
152 #html_use_index = True
153
154 # If true, the index is split into individual pages for each letter.
155 #html_split_index = False
156
157 # If true, links to the reST sources are added to the pages.
158 #html_show_sourcelink = True
159
160 # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
161 #html_show_sphinx = True
162
163 # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
164 #html_show_copyright = True
165
166 # If true, an OpenSearch description file will be output, and all pages will
167 # contain a <link> tag referring to it. The value of this option must be the
168 # base URL from which the finished HTML is served.
169 #html_use_opensearch = ''
170
171 # This is the file name suffix for HTML files (e.g. ".xhtml").
172 #html_file_suffix = None
173
174 # Language to be used for generating the HTML full-text search index.
175 # Sphinx supports the following languages:
176 # 'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'ja'
177 # 'nl', 'no', 'pt', 'ro', 'ru', 'sv', 'tr'
178 #html_search_language = 'en'
179
180 # A dictionary with options for the search language support, empty by default.
181 # Now only 'ja' uses this config value
182 #html_search_options = {'type': 'default'}
183
184 # The name of a javascript file (relative to the configuration directory) that
185 # implements a search results scorer. If empty, the default will be used.
186 #html_search_scorer = 'scorer.js'
187
188 # Output file base name for HTML help builder.
189 htmlhelp_basename = 'hyperlinkdoc'
190
191 # -- Options for LaTeX output ---------------------------------------------
192
193 latex_elements = {
194 # The paper size ('letterpaper' or 'a4paper').
195 #'papersize': 'letterpaper',
196
197 # The font size ('10pt', '11pt' or '12pt').
198 #'pointsize': '10pt',
199
200 # Additional stuff for the LaTeX preamble.
201 #'preamble': '',
202
203 # Latex figure (float) alignment
204 #'figure_align': 'htbp',
205 }
206
207 # Grouping the document tree into LaTeX files. List of tuples
208 # (source start file, target name, title,
209 # author, documentclass [howto, manual, or own class]).
210 latex_documents = [
211 (master_doc, 'hyperlink.tex', u'hyperlink Documentation',
212 u'Mahmoud Hashemi', 'manual'),
213 ]
214
215 # The name of an image file (relative to this directory) to place at the top of
216 # the title page.
217 #latex_logo = None
218
219 # For "manual" documents, if this is true, then toplevel headings are parts,
220 # not chapters.
221 #latex_use_parts = False
222
223 # If true, show page references after internal links.
224 #latex_show_pagerefs = False
225
226 # If true, show URL addresses after external links.
227 #latex_show_urls = False
228
229 # Documents to append as an appendix to all manuals.
230 #latex_appendices = []
231
232 # If false, no module index is generated.
233 #latex_domain_indices = True
234
235
236 # -- Options for manual page output ---------------------------------------
237
238 # One entry per manual page. List of tuples
239 # (source start file, name, description, authors, manual section).
240 man_pages = [
241 (master_doc, 'hyperlink', u'hyperlink Documentation',
242 [author], 1)
243 ]
244
245 # If true, show URL addresses after external links.
246 #man_show_urls = False
247
248
249 # -- Options for Texinfo output -------------------------------------------
250
251 # Grouping the document tree into Texinfo files. List of tuples
252 # (source start file, target name, title, author,
253 # dir menu entry, description, category)
254 texinfo_documents = [
255 (master_doc, 'hyperlink', u'hyperlink Documentation',
256 author, 'hyperlink', 'One line description of project.',
257 'Miscellaneous'),
258 ]
259
260 # Documents to append as an appendix to all manuals.
261 #texinfo_appendices = []
262
263 # If false, no module index is generated.
264 #texinfo_domain_indices = True
265
266 # How to display URL addresses: 'footnote', 'no', or 'inline'.
267 #texinfo_show_urls = 'footnote'
268
269 # If true, do not generate a @detailmenu in the "Top" node's menu.
270 #texinfo_no_detailmenu = False
0 Hyperlink Design
1 ================
2
3 The URL is a nuanced format with a long history. Suitably, a lot of
4 work has gone into translating the standards, `RFC 3986`_ and `RFC
5 3987`_, into a Pythonic interface. Hyperlink's design strikes a unique
6 balance of correctness and usability.
7
8 .. _uris_and_iris:
9
10 A Tale of Two Representations
11 -----------------------------
12
13 The URL is a powerful construct, designed to be used by both humans
14 and computers.
15
16 This dual purpose has resulted in two canonical representations: the
17 URI and the IRI.
18
19 Even though the W3C themselves have `recognized the confusion`_ this can
20 cause, Hyperlink's URL makes the distinction quite natural. Simply:
21
22 * **URI**: Fully-encoded, ASCII-only, suitable for network transfer
23 * **IRI**: Fully-decoded, Unicode-friendly, suitable for display (e.g., in a browser bar)
24
25 We can use Hyperlink to very easily demonstrate the difference::
26
27 >>> url = URL.from_text('http://example.com/café')
28 >>> url.to_uri().to_text()
29 u'http://example.com/caf%C3%A9'
30
31 We construct a URL from text containing Unicode (``é``), then
32 transform it using :meth:`~URL.to_uri()`. This results in ASCII-only
33 percent-encoding familiar to all web developers, and a common
34 characteristic of URIs.
35
36 Still, Hyperlink's distinction between URIs and IRIs is pragmatic, and
37 only limited to output. Input can contain *any mix* of percent
38 encoding and Unicode, without issue:
39
40 >>> url = URL.from_text('http://example.com/caf%C3%A9/au láit')
41 >>> print(url.to_iri().to_text())
42 http://example.com/café/au láit
43 >>> print(url.to_uri().to_text())
44 http://example.com/caf%C3%A9/au%20l%C3%A1it
45
46 Note that even when a URI and IRI point to the same resource, they
47 will often be different URLs:
48
49 >>> url.to_uri() == url.to_iri()
50 False
51
52 And with that caveat out of the way, you're qualified to correct other
53 people (and their code) on the nuances of URI vs IRI.
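
The encoding step can be approximated with the standard library alone. The sketch below is not Hyperlink's implementation: it assumes Python 3, works on a bare path rather than a full URL, and the helper names ``path_to_uri`` and ``path_to_iri`` are invented for illustration. Decoding before re-encoding keeps already-encoded input from being encoded twice, at the cost of ignoring the special status of encoded reserved characters such as ``%2F``.

```python
from urllib.parse import quote, unquote


def path_to_iri(path):
    """Decode percent-escapes, leaving any Unicode characters untouched."""
    return unquote(path)


def path_to_uri(path):
    """Return an ASCII-only path: decode first, then percent-encode as UTF-8.

    Hypothetical helper; a real URL library must treat encoded
    delimiters (for example %2F) more carefully than this does.
    """
    return quote(unquote(path), safe="/")


# Input mixing percent-encoding and raw Unicode, as in the example above.
mixed = "/caf%C3%A9/au láit"
uri_path = path_to_uri(mixed)
iri_path = path_to_iri(mixed)
print(uri_path)
print(iri_path)
```

Both results name the same path, but as strings they differ, which is why a URI and an IRI for one resource generally do not compare equal.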
54
55 .. _recognized the confusion: https://www.w3.org/TR/uri-clarification/
56
57 Immutability
58 ------------
59
60 Hyperlink's URL is notable for being an `immutable`_ representation. Once
61 constructed, instances are not changed. Methods like
62 :meth:`~URL.click()`, :meth:`~URL.set()`, and :meth:`~URL.replace()`,
63 all return new URL objects. This enables URLs to be used in sets, as
64 well as dictionary keys.
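
The link between immutability and hashing can be seen in miniature with a ``namedtuple``. This is only an analogy under stated assumptions: ``SimpleURL`` is a hypothetical stand-in with three fields, not Hyperlink's ``URL`` class, and ``_replace`` here is the standard namedtuple method rather than :meth:`~URL.replace()`.

```python
from collections import namedtuple

# Hypothetical three-field stand-in for an immutable URL value.
SimpleURL = namedtuple("SimpleURL", ["scheme", "host", "path"])

original = SimpleURL("http", "example.com", "/a")

# _replace returns a new tuple; the original is left untouched.
secure = original._replace(scheme="https")

# Equal values hash alike, so duplicates collapse inside a set
# and values can serve as dictionary keys.
seen = {original, secure, SimpleURL("http", "example.com", "/a")}
visits = {original: 1, secure: 2}

print(original.scheme, secure.scheme, len(seen))
```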
65
66 .. _immutable: https://docs.python.org/2/glossary.html#term-immutable
67 .. _multidict: https://en.wikipedia.org/wiki/Multimap
68 .. _query string: https://en.wikipedia.org/wiki/Query_string
69 .. _GET parameters: http://php.net/manual/en/reserved.variables.get.php
70 .. _twisted.python.url.URL: https://twistedmatrix.com/documents/current/api/twisted.python.url.URL.html
71 .. _boltons.urlutils: http://boltons.readthedocs.io/en/latest/urlutils.html
72 .. _uri clarification: https://www.w3.org/TR/uri-clarification/
73 .. _BNF grammar: https://tools.ietf.org/html/rfc3986#appendix-A
74
75
76 .. _RFC 3986: https://tools.ietf.org/html/rfc3986
77 .. _RFC 3987: https://tools.ietf.org/html/rfc3987
78 .. _section 5.4: https://tools.ietf.org/html/rfc3986#section-5.4
79 .. _section 3.4: https://tools.ietf.org/html/rfc3986#section-3.4
80 .. _section 5.2.4: https://tools.ietf.org/html/rfc3986#section-5.2.4
81 .. _section 2.2: https://tools.ietf.org/html/rfc3986#section-2.2
82 .. _section 2.3: https://tools.ietf.org/html/rfc3986#section-2.3
83 .. _section 3.2.1: https://tools.ietf.org/html/rfc3986#section-3.2.1
84
85
86 Query parameters
87 ----------------
88
89 One of the URL format's most useful features is the mapping formed
90 by the query parameters, sometimes called "query arguments" or "GET
91 parameters". Regardless of what you call them, they are encoded in
92 the query string portion of the URL, and they are very powerful.
93
94 Query parameters are actually a type of "multidict", where a given key
95 can have multiple values. This is why the :meth:`~URL.get()` method
96 returns a list of strings. Keys can also have no value, which is
97 conventionally interpreted as a truthy flag.
98
99 >>> url = URL.from_text('http://example.com/?a=b&c')
100 >>> url.get(u'a')
101 ['b']
102 >>> url.get(u'c')
103 [None]
104 >>> url.get('missing') # returns an empty list
105 []
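
The lookup behavior shown above can be modeled with a list of key-value pairs. The following is a minimal sketch, not Hyperlink's parser: ``parse_query`` and ``get_values`` are hypothetical names, and percent-decoding is deliberately left out.

```python
def parse_query(query):
    """Split a query string into (key, value) pairs.

    A key without '=' gets the value None, mirroring the flag-style
    parameter 'c' in the example above.
    """
    pairs = []
    for part in query.split("&"):
        if not part:
            continue
        key, sep, value = part.partition("=")
        pairs.append((key, value if sep else None))
    return pairs


def get_values(pairs, name):
    """Return every value stored under name, in order of appearance."""
    return [value for key, value in pairs if key == name]


pairs = parse_query("a=b&c")
print(get_values(pairs, "a"))
print(get_values(pairs, "c"))
print(get_values(pairs, "missing"))
```

Returning a list in every case, including the empty one, means callers never have to distinguish "absent" from "present once" by type.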
106
107
108 Values can be modified and added using :meth:`~URL.set()` and
109 :meth:`~URL.add()`.
110
111 >>> url = url.add(u'x', u'x')
112 >>> url = url.add(u'x', u'y')
113 >>> url.to_text()
114 u'http://example.com/?a=b&c&x=x&x=y'
115 >>> url = url.set(u'x', u'z')
116 >>> url.to_text()
117 u'http://example.com/?a=b&c&x=z'
118
119
120 Values can be unset with :meth:`~URL.remove()`.
121
122 >>> url = url.remove(u'a')
123 >>> url = url.remove(u'c')
124 >>> url.to_text()
125 u'http://example.com/?x=z'
126
127 Note how all modifying methods return copies of the URL and do not
128 mutate the URL in place, much like methods on strings.
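
The multidict behavior described in this section can be sketched without the library. This is a minimal illustration, not Hyperlink's implementation: the helper names ``parse_query``, ``get``, and ``add`` are hypothetical, and no percent-decoding is attempted. It models the query as a tuple of ``(key, value)`` pairs, uses ``None`` for keys without a value, and returns new tuples instead of mutating.

```python
def parse_query(query_text):
    """Split a raw query string into a tuple of (key, value) pairs.

    A key without '=' gets the value None, mirroring the flag-style
    parameters described above. Percent-decoding is left out on purpose.
    """
    pairs = []
    for part in query_text.split('&'):
        if not part:
            continue  # skip empty chunks such as a trailing '&'
        key, sep, value = part.partition('=')
        pairs.append((key, value if sep else None))
    return tuple(pairs)


def get(pairs, name):
    """Return every value stored under *name*, in order (possibly empty)."""
    return [value for key, value in pairs if key == name]


def add(pairs, name, value=None):
    """Return a new tuple with one more pair; the input is left untouched."""
    return pairs + ((name, value),)


query = parse_query('a=b&c')
extended = add(add(query, 'x', 'x'), 'x', 'y')
print(get(extended, 'x'))  # both values stored under 'x'
print(get(query, 'x'))     # the original tuple was not modified
```

Storing pairs in an immutable tuple is one simple way to get copy-on-modify semantics: every "modification" has to build a new value.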
129
130 Origins and backwards-compatibility
131 -----------------------------------
132
133 Hyperlink's URL is descended directly from `twisted.python.url.URL`_,
134 in all but the literal code-inheritance sense. While a lot of
135 functionality has been incorporated from `boltons.urlutils`_, extra
136 care has been taken to maintain backwards-compatibility for legacy
137 APIs, making Hyperlink's URL a drop-in replacement for Twisted's URL type.
138
139 If you are porting a Twisted project to use Hyperlink's URL, and
140 encounter any sort of incompatibility, please do not hesitate to `file
141 an issue`_.
142
143 .. _file an issue: https://github.com/python-hyper/hyperlink/issues
0 FAQ
1 ===
2
3 There were bound to be questions.
4
5 .. contents::
6 :local:
7
8 Why not just use text?
9 ----------------------
10
11 URLs were designed as a text format, so, apart from the principle of
12 structuring structured data, why use URL objects?
13
14 There are two major advantages of using :class:`~hyperlink.URL` over
15 representing URLs as strings. The first is that it's really easy to
16 evaluate a relative hyperlink, for example, when crawling documents,
17 to figure out what is linked::
18
19 >>> URL.from_text(u'https://example.com/base/uri/').click(u"/absolute")
20 URL.from_text(u'https://example.com/absolute')
21 >>> URL.from_text(u'https://example.com/base/uri/').click(u"rel/path")
22 URL.from_text(u'https://example.com/base/uri/rel/path')
23
24 The other is that URLs have two normalizations. One representation is
25 suitable for humans to read, because it can represent data from many
26 character sets - this is the Internationalized, or IRI, normalization.
27 The other is the older, US-ASCII-only representation, which is
28 necessary for most contexts where you would need to put a URI. You
29 can convert *between* these representations according to certain
30 rules. :class:`~hyperlink.URL` exposes these conversions as methods::
31
32 >>> URL.from_text(u"https://→example.com/foo⇧bar/").to_uri()
33 URL.from_text(u'https://xn--example-dk9c.com/foo%E2%87%A7bar/')
34 >>> URL.from_text(u'https://xn--example-dk9c.com/foo%E2%87%A7bar/').to_iri()
35 URL.from_text(u'https://\\u2192example.com/foo\\u21e7bar/')
36
37 For more info, see A Tale of Two Representations, above.
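
The path half of this conversion can be approximated with the standard library alone. The sketch below assumes Python 3 and uses ``urllib.parse.quote`` and ``unquote`` as stand-ins; it is not how :class:`~hyperlink.URL` is implemented, and it leaves out the host, which is converted separately through IDNA.

```python
from urllib.parse import quote, unquote

# The path segment from the example above, written with an escape so the
# source file stays ASCII-only. U+21E7 is the upwards white arrow.
iri_segment = 'foo\u21e7bar'

# IRI -> URI: encode the text as UTF-8, then percent-encode each byte
# that is not an unreserved character.
uri_segment = quote(iri_segment, safe='')
print(uri_segment)

# URI -> IRI: undo the percent-encoding and decode the bytes as UTF-8.
restored = unquote(uri_segment)
print(restored == iri_segment)
```

Unlike the field-aware methods on the URL type, ``unquote`` decodes every escape, including encoded delimiters such as ``%2F``, so this round trip is only safe for a single segment that contains no encoded delimiters.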
38
39 How does Hyperlink compare to other libraries?
40 ----------------------------------------------
41
42 Hyperlink certainly isn't the first library to provide a Python model
43 for URLs. It just happens to be among the best.
44
45 urlparse: Built-in to the standard library (renamed to urllib.parse in
46 Python 3). No URL type, requires user to juggle a bunch of
47 strings. Overly simple approach makes it easy to make mistakes.
48
49 boltons.urlutils: Shares some underlying implementation. Two key
50 differences. First, the boltons URL is mutable, intended to work like
51 a string factory for URL text. Second, the boltons URL has an advanced
52 query parameter mapping type. Complete implementation in a single
53 file.
54
55 furl: Not a single URL type, but types for many parts of the
56 URL. Similar approach to boltons for query parameters. Poor netloc
57 handling (limited support for non-network schemes like mailto). Unlicensed.
58
59 purl: Another immutable implementation. Method-heavy API.
60
61 rfc3986: Very heavily focused on various types of validation. Large
62 for a URL library, if that matters to you. Exclusively supports URIs,
63 `lacking IRI support`_ at the time of writing.
64
65 In reality, any of the third-party libraries above does a better job
66 than the standard library, and a better job than much of the hastily
67 thrown-together code in a corner of a util.py deep in a project. URLs
68 are easy to mess up, so make sure you use a tested implementation.
69
70 .. _lacking IRI support: https://github.com/sigmavirus24/rfc3986/issues/23
71
72 Are URLs really a big deal in 201X?
73 -----------------------------------
74
75 Hyperlink's first release, in 2017, comes somewhere between 23 and 30
76 years after URLs were already in use. Is the URL really still that big
77 of a deal?
78
79 Look, buddy, I don't know how you got this document, but I'm pretty
80 sure you (and your computer) used one if not many URLs to get
81 here. URLs are only getting more relevant. Buy stock in URLs.
82
83 And if you're worried that URLs are just another technology with an
84 obsoletion date planned in advance, I'll direct your attention to the
85 ``IPvFuture`` rule in the `BNF grammar`_. If it has plans to outlast
86 IPv6, the URL will probably outlast you and me, too.
87
88 .. _BNF grammar: https://tools.ietf.org/html/rfc3986#appendix-A
0 .. hyperlink documentation master file, created on Mon Apr 10 00:34:18 2017.
1 hyperlink
2 =========
3
4 *Cool URLs that don't change.*
5
6 |release| |calver|
7
8 **Hyperlink** provides a pure-Python implementation of immutable
9 URLs. Based on `RFC 3986`_ and `RFC 3987`_, the Hyperlink URL balances
10 simplicity and correctness for both :ref:`URIs and IRIs <uris_and_iris>`.
11
12 Hyperlink is tested against Python 2.7, 3.4, 3.5, and PyPy.
13
14 For an introduction to the hyperlink library, its background, and URLs
15 in general, see `this talk from PyConWeb 2017`_ (and `the accompanying
16 slides`_).
17
18 .. _RFC 3986: https://tools.ietf.org/html/rfc3986
19 .. _RFC 3987: https://tools.ietf.org/html/rfc3987
20 .. _this talk from PyConWeb 2017: https://www.youtube.com/watch?v=EIkmADO-r10
21 .. _the accompanying slides: https://speakerdeck.com/mhashemi/urls-in-plain-view
22 .. |release| image:: https://img.shields.io/pypi/v/hyperlink.svg
23 :target: https://pypi.python.org/pypi/hyperlink
24
25 .. |calver| image:: https://img.shields.io/badge/calver-YY.MINOR.MICRO-22bfda.svg
26 :target: http://calver.org
27
28
29 Installation and Integration
30 ----------------------------
31
32 Hyperlink is a pure-Python package and only depends on the standard
33 library. The easiest way to install is with pip::
34
35 pip install hyperlink
36
37 Then, URLs are just an import away::
38
39 from hyperlink import URL
40
41 url = URL.from_text(u'http://github.com/mahmoud/hyperlink?utm_source=readthedocs')
42
43 better_url = url.replace(scheme=u'https', port=443)
44 user_url = better_url.click(u'.')
45
46 print(user_url.to_text())
47 # prints: https://github.com/mahmoud/
48
49 print(better_url.get(u'utm_source')[0])
50 # prints: readthedocs
51
52 See :ref:`the API docs <hyperlink_api>` for more usage examples.
53
54 Gaps
55 ----
56
57 Found something missing in hyperlink? `Pull Requests`_ and `Issues`_ welcome!
58
59 .. _Pull Requests: https://github.com/python-hyper/hyperlink/pulls
60 .. _Issues: https://github.com/python-hyper/hyperlink/issues
61
62 Section listing
63 ---------------
64
65 .. toctree::
66 :maxdepth: 2
67
68 design
69 api
70 faq
0 @ECHO OFF
1
2 REM Command file for Sphinx documentation
3
4 if "%SPHINXBUILD%" == "" (
5 set SPHINXBUILD=sphinx-build
6 )
7 set BUILDDIR=_build
8 set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
9 set I18NSPHINXOPTS=%SPHINXOPTS% .
10 if NOT "%PAPER%" == "" (
11 set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
12 set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
13 )
14
15 if "%1" == "" goto help
16
17 if "%1" == "help" (
18 :help
19 echo.Please use `make ^<target^>` where ^<target^> is one of
20 echo. html to make standalone HTML files
21 echo. dirhtml to make HTML files named index.html in directories
22 echo. singlehtml to make a single large HTML file
23 echo. pickle to make pickle files
24 echo. json to make JSON files
25 echo. htmlhelp to make HTML files and a HTML help project
26 echo. qthelp to make HTML files and a qthelp project
27 echo. devhelp to make HTML files and a Devhelp project
28 echo. epub to make an epub
29 echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
30 echo. text to make text files
31 echo. man to make manual pages
32 echo. texinfo to make Texinfo files
33 echo. gettext to make PO message catalogs
34 echo. changes to make an overview over all changed/added/deprecated items
35 echo. xml to make Docutils-native XML files
36 echo. pseudoxml to make pseudoxml-XML files for display purposes
37 echo. linkcheck to check all external links for integrity
38 echo. doctest to run all doctests embedded in the documentation if enabled
39 echo. coverage to run coverage check of the documentation if enabled
40 goto end
41 )
42
43 if "%1" == "clean" (
44 for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
45 del /q /s %BUILDDIR%\*
46 goto end
47 )
48
49
50 REM Check if sphinx-build is available and fallback to Python version if any
51 %SPHINXBUILD% 2> nul
52 if errorlevel 9009 goto sphinx_python
53 goto sphinx_ok
54
55 :sphinx_python
56
57 set SPHINXBUILD=python -m sphinx.__init__
58 %SPHINXBUILD% 2> nul
59 if errorlevel 9009 (
60 echo.
61 echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
62 echo.installed, then set the SPHINXBUILD environment variable to point
63 echo.to the full path of the 'sphinx-build' executable. Alternatively you
64 echo.may add the Sphinx directory to PATH.
65 echo.
66 echo.If you don't have Sphinx installed, grab it from
67 echo.http://sphinx-doc.org/
68 exit /b 1
69 )
70
71 :sphinx_ok
72
73
74 if "%1" == "html" (
75 %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
76 if errorlevel 1 exit /b 1
77 echo.
78 echo.Build finished. The HTML pages are in %BUILDDIR%/html.
79 goto end
80 )
81
82 if "%1" == "dirhtml" (
83 %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
84 if errorlevel 1 exit /b 1
85 echo.
86 echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
87 goto end
88 )
89
90 if "%1" == "singlehtml" (
91 %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
92 if errorlevel 1 exit /b 1
93 echo.
94 echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
95 goto end
96 )
97
98 if "%1" == "pickle" (
99 %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
100 if errorlevel 1 exit /b 1
101 echo.
102 echo.Build finished; now you can process the pickle files.
103 goto end
104 )
105
106 if "%1" == "json" (
107 %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
108 if errorlevel 1 exit /b 1
109 echo.
110 echo.Build finished; now you can process the JSON files.
111 goto end
112 )
113
114 if "%1" == "htmlhelp" (
115 %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
116 if errorlevel 1 exit /b 1
117 echo.
118 echo.Build finished; now you can run HTML Help Workshop with the ^
119 .hhp project file in %BUILDDIR%/htmlhelp.
120 goto end
121 )
122
123 if "%1" == "qthelp" (
124 %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
125 if errorlevel 1 exit /b 1
126 echo.
127 echo.Build finished; now you can run "qcollectiongenerator" with the ^
128 .qhcp project file in %BUILDDIR%/qthelp, like this:
129 echo.^> qcollectiongenerator %BUILDDIR%\qthelp\hyperlink.qhcp
130 echo.To view the help file:
131 echo.^> assistant -collectionFile %BUILDDIR%\qthelp\hyperlink.qhc
132 goto end
133 )
134
135 if "%1" == "devhelp" (
136 %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
137 if errorlevel 1 exit /b 1
138 echo.
139 echo.Build finished.
140 goto end
141 )
142
143 if "%1" == "epub" (
144 %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
145 if errorlevel 1 exit /b 1
146 echo.
147 echo.Build finished. The epub file is in %BUILDDIR%/epub.
148 goto end
149 )
150
151 if "%1" == "latex" (
152 %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
153 if errorlevel 1 exit /b 1
154 echo.
155 echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
156 goto end
157 )
158
159 if "%1" == "latexpdf" (
160 %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
161 cd %BUILDDIR%/latex
162 make all-pdf
163 cd %~dp0
164 echo.
165 echo.Build finished; the PDF files are in %BUILDDIR%/latex.
166 goto end
167 )
168
169 if "%1" == "latexpdfja" (
170 %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
171 cd %BUILDDIR%/latex
172 make all-pdf-ja
173 cd %~dp0
174 echo.
175 echo.Build finished; the PDF files are in %BUILDDIR%/latex.
176 goto end
177 )
178
179 if "%1" == "text" (
180 %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
181 if errorlevel 1 exit /b 1
182 echo.
183 echo.Build finished. The text files are in %BUILDDIR%/text.
184 goto end
185 )
186
187 if "%1" == "man" (
188 %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
189 if errorlevel 1 exit /b 1
190 echo.
191 echo.Build finished. The manual pages are in %BUILDDIR%/man.
192 goto end
193 )
194
195 if "%1" == "texinfo" (
196 %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
197 if errorlevel 1 exit /b 1
198 echo.
199 echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
200 goto end
201 )
202
203 if "%1" == "gettext" (
204 %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
205 if errorlevel 1 exit /b 1
206 echo.
207 echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
208 goto end
209 )
210
211 if "%1" == "changes" (
212 %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
213 if errorlevel 1 exit /b 1
214 echo.
215 echo.The overview file is in %BUILDDIR%/changes.
216 goto end
217 )
218
219 if "%1" == "linkcheck" (
220 %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
221 if errorlevel 1 exit /b 1
222 echo.
223 echo.Link check complete; look for any errors in the above output ^
224 or in %BUILDDIR%/linkcheck/output.txt.
225 goto end
226 )
227
228 if "%1" == "doctest" (
229 %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
230 if errorlevel 1 exit /b 1
231 echo.
232 echo.Testing of doctests in the sources finished, look at the ^
233 results in %BUILDDIR%/doctest/output.txt.
234 goto end
235 )
236
237 if "%1" == "coverage" (
238 %SPHINXBUILD% -b coverage %ALLSPHINXOPTS% %BUILDDIR%/coverage
239 if errorlevel 1 exit /b 1
240 echo.
241 echo.Testing of coverage in the sources finished, look at the ^
242 results in %BUILDDIR%/coverage/python.txt.
243 goto end
244 )
245
246 if "%1" == "xml" (
247 %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml
248 if errorlevel 1 exit /b 1
249 echo.
250 echo.Build finished. The XML files are in %BUILDDIR%/xml.
251 goto end
252 )
253
254 if "%1" == "pseudoxml" (
255 %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml
256 if errorlevel 1 exit /b 1
257 echo.
258 echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml.
259 goto end
260 )
261
262 :end
0
1 from ._url import URL, URLParseError, register_scheme, parse_host
2
3 __all__ = [
4 "URL",
5 "URLParseError",
6 "register_scheme",
7 "parse_host"
8 ]
0 # -*- coding: utf-8 -*-
1 u"""Hyperlink provides Pythonic URL parsing, construction, and rendering.
2
3 Usage is straightforward::
4
5 >>> from hyperlink import URL
6 >>> url = URL.from_text(u'http://github.com/mahmoud/hyperlink?utm_source=docs')
7 >>> url.host
8 u'github.com'
9 >>> secure_url = url.replace(scheme=u'https')
10 >>> secure_url.get('utm_source')[0]
11 u'docs'
12
13 As seen here, the API revolves around the lightweight and immutable
14 :class:`URL` type, documented below.
15 """
16
17 import re
18 import string
19 import socket
20 from unicodedata import normalize
21 try:
22 from socket import inet_pton
23 except ImportError:
24 # based on https://gist.github.com/nnemkin/4966028
25 # this code only applies on Windows Python 2.7
26 import ctypes
27
28 class _sockaddr(ctypes.Structure):
29 _fields_ = [("sa_family", ctypes.c_short),
30 ("__pad1", ctypes.c_ushort),
31 ("ipv4_addr", ctypes.c_byte * 4),
32 ("ipv6_addr", ctypes.c_byte * 16),
33 ("__pad2", ctypes.c_ulong)]
34
35 WSAStringToAddressA = ctypes.windll.ws2_32.WSAStringToAddressA
36 WSAAddressToStringA = ctypes.windll.ws2_32.WSAAddressToStringA
37
38 def inet_pton(address_family, ip_string):
39 addr = _sockaddr()
40 ip_string = ip_string.encode('ascii')
41 addr.sa_family = address_family
42 addr_size = ctypes.c_int(ctypes.sizeof(addr))
43
44 if WSAStringToAddressA(ip_string, address_family, None, ctypes.byref(addr), ctypes.byref(addr_size)) != 0:
45 raise socket.error(ctypes.FormatError())
46
47 if address_family == socket.AF_INET:
48 return ctypes.string_at(addr.ipv4_addr, 4)
49 if address_family == socket.AF_INET6:
50 return ctypes.string_at(addr.ipv6_addr, 16)
51 raise socket.error('unknown address family')
52
53
54 unicode = type(u'')
55 try:
56 unichr
57 except NameError:
58 unichr = chr # py3
59 NoneType = type(None)
60
61
62 # from boltons.typeutils
63 def make_sentinel(name='_MISSING', var_name=None):
64 """Creates and returns a new **instance** of a new class, suitable for
65 usage as a "sentinel", a kind of singleton often used to indicate
66 a value is missing when ``None`` is a valid input.
67
68 Args:
69 name (str): Name of the Sentinel
70 var_name (str): Set this name to the name of the variable in
71 its respective module to enable pickleability.
72
73 >>> make_sentinel(var_name='_MISSING')
74 _MISSING
75
76 The most common use cases here in boltons are as default values
77 for optional function arguments, partly because of its
78 less-confusing appearance in automatically generated
79 documentation. Sentinels also function well as placeholders in queues
80 and linked lists.
81
82 .. note::
83
84 By design, additional calls to ``make_sentinel`` with the same
85 values will not produce equivalent objects.
86
87 >>> make_sentinel('TEST') == make_sentinel('TEST')
88 False
89 >>> type(make_sentinel('TEST')) == type(make_sentinel('TEST'))
90 False
91
92 """
93 class Sentinel(object):
94 def __init__(self):
95 self.name = name
96 self.var_name = var_name
97
98 def __repr__(self):
99 if self.var_name:
100 return self.var_name
101 return '%s(%r)' % (self.__class__.__name__, self.name)
102 if var_name:
103 def __reduce__(self):
104 return self.var_name
105
106 def __nonzero__(self):
107 return False
108
109 __bool__ = __nonzero__
110
111 return Sentinel()
112
113
114 _unspecified = _UNSET = make_sentinel('_UNSET')
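
A short sketch of why a sentinel is useful when ``None`` is itself a meaningful argument. The function ``replace_port`` is hypothetical and exists only for illustration, and the bare ``object()`` is a simplified stand-in for the sentinel created above.

```python
# Hypothetical stand-in for the module's sentinel; a bare object() works
# for identity checks, although it lacks the readable repr shown above.
_UNSET = object()


def replace_port(current_port, port=_UNSET):
    """Return the port a copy should carry.

    Omitting *port* keeps the current value, while passing None
    explicitly clears it. A None default could not tell these apart.
    """
    if port is _UNSET:
        return current_port
    return port


print(replace_port(8080))        # keeps the existing port
print(replace_port(8080, None))  # explicitly clears the port
print(replace_port(8080, 443))   # overrides the port
```

The comparison uses ``is`` rather than ``==`` because the sentinel is meaningful only by identity.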
115
116
117 # RFC 3986 Section 2.3, Unreserved URI Characters
118 # https://tools.ietf.org/html/rfc3986#section-2.3
119 _UNRESERVED_CHARS = frozenset('~-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
120 'abcdefghijklmnopqrstuvwxyz')
121
122
123 # URL parsing regex (based on RFC 3986 Appendix B, with modifications)
124 _URL_RE = re.compile(r'^((?P<scheme>[^:/?#]+):)?'
125 r'((?P<_netloc_sep>//)'
126 r'(?P<authority>[^/?#]*))?'
127 r'(?P<path>[^?#]*)'
128 r'(\?(?P<query>[^#]*))?'
129 r'(#(?P<fragment>.*))?$')
130 _SCHEME_RE = re.compile(r'^[a-zA-Z0-9+.-]*$')
131 _AUTHORITY_RE = re.compile(r'^(?:(?P<userinfo>[^@/?#]*)@)?'
132 r'(?P<host>'
133 r'(?:\[(?P<ipv6_host>[^[\]/?#]*)\])'
134 r'|(?P<plain_host>[^:/?#[\]]*)'
135 r'|(?P<bad_host>.*?))?'
136 r'(?::(?P<port>.*))?$')
137
138
139 _HEX_CHAR_MAP = dict([((a + b).encode('ascii'),
140 unichr(int(a + b, 16)).encode('charmap'))
141 for a in string.hexdigits for b in string.hexdigits])
142 _ASCII_RE = re.compile('([\x00-\x7f]+)')
143
144 # RFC 3986 section 2.2, Reserved Characters
145 # https://tools.ietf.org/html/rfc3986#section-2.2
146 _GEN_DELIMS = frozenset(u':/?#[]@')
147 _SUB_DELIMS = frozenset(u"!$&'()*+,;=")
148 _ALL_DELIMS = _GEN_DELIMS | _SUB_DELIMS
149
150 _USERINFO_SAFE = _UNRESERVED_CHARS | _SUB_DELIMS
151 _USERINFO_DELIMS = _ALL_DELIMS - _USERINFO_SAFE
152 _PATH_SAFE = _UNRESERVED_CHARS | _SUB_DELIMS | set(u':@%')
153 _PATH_DELIMS = _ALL_DELIMS - _PATH_SAFE
154 _SCHEMELESS_PATH_SAFE = _PATH_SAFE - set(':')
155 _SCHEMELESS_PATH_DELIMS = _ALL_DELIMS - _SCHEMELESS_PATH_SAFE
156 _FRAGMENT_SAFE = _UNRESERVED_CHARS | _PATH_SAFE | set(u'/?')
157 _FRAGMENT_DELIMS = _ALL_DELIMS - _FRAGMENT_SAFE
158 _QUERY_SAFE = _UNRESERVED_CHARS | _FRAGMENT_SAFE - set(u'&=+')
159 _QUERY_DELIMS = _ALL_DELIMS - _QUERY_SAFE
160
161
162 def _make_decode_map(delims, allow_percent=False):
163 ret = dict(_HEX_CHAR_MAP)
164 if not allow_percent:
165 delims = set(delims) | set([u'%'])
166 for delim in delims:
167 _hexord = '{0:02X}'.format(ord(delim)).encode('ascii')
168 _hexord_lower = _hexord.lower()
169 ret.pop(_hexord)
170 if _hexord != _hexord_lower:
171 ret.pop(_hexord_lower)
172 return ret
173
174
175 def _make_quote_map(safe_chars):
176 ret = {}
177 # v is included in the dict for py3 mostly, because bytestrings
178 # are iterables of ints, of course!
179 for i, v in zip(range(256), range(256)):
180 c = chr(v)
181 if c in safe_chars:
182 ret[c] = ret[v] = c
183 else:
184 ret[c] = ret[v] = '%{0:02X}'.format(i)
185 return ret
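
The lookup-table approach above can be tried in isolation. The sketch below assumes Python 3, where iterating over ``bytes`` yields integers, and uses a deliberately tiny safe set; ``make_quote_map`` and ``SAFE`` are illustrative names, not part of the module.

```python
def make_quote_map(safe_chars):
    """Map every byte value 0-255 to itself or to its %XX escape."""
    table = {}
    for byte in range(256):
        char = chr(byte)
        if char in safe_chars:
            table[byte] = char
        else:
            table[byte] = '%{0:02X}'.format(byte)
    return table


# Illustrative safe set: lowercase ASCII letters only.
SAFE = frozenset('abcdefghijklmnopqrstuvwxyz')
table = make_quote_map(SAFE)

# Encoding is a table lookup per UTF-8 byte, with no per-character
# branching at encode time.
text = 'a b/\u00e9'
encoded = ''.join(table[b] for b in text.encode('utf8'))
print(encoded)
```

Building the table once per URL component trades a little memory for a fast inner loop, which is presumably why each component gets its own precomputed map.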
186
187
188 _USERINFO_PART_QUOTE_MAP = _make_quote_map(_USERINFO_SAFE)
189 _USERINFO_DECODE_MAP = _make_decode_map(_USERINFO_DELIMS)
190 _PATH_PART_QUOTE_MAP = _make_quote_map(_PATH_SAFE)
191 _SCHEMELESS_PATH_PART_QUOTE_MAP = _make_quote_map(_SCHEMELESS_PATH_SAFE)
192 _PATH_DECODE_MAP = _make_decode_map(_PATH_DELIMS)
193 _QUERY_PART_QUOTE_MAP = _make_quote_map(_QUERY_SAFE)
194 _QUERY_DECODE_MAP = _make_decode_map(_QUERY_DELIMS)
195 _FRAGMENT_QUOTE_MAP = _make_quote_map(_FRAGMENT_SAFE)
196 _FRAGMENT_DECODE_MAP = _make_decode_map(_FRAGMENT_DELIMS)
197 _UNRESERVED_DECODE_MAP = dict([(k, v) for k, v in _HEX_CHAR_MAP.items()
198 if v.decode('ascii', 'replace')
199 in _UNRESERVED_CHARS])
200
201 _ROOT_PATHS = frozenset(((), (u'',)))
202
203
204 def _encode_path_part(text, maximal=True):
205 "Percent-encode a single segment of a URL path."
206 if maximal:
207 bytestr = normalize('NFC', text).encode('utf8')
208 return u''.join([_PATH_PART_QUOTE_MAP[b] for b in bytestr])
209 return u''.join([_PATH_PART_QUOTE_MAP[t] if t in _PATH_DELIMS else t
210 for t in text])
211
212
213 def _encode_schemeless_path_part(text, maximal=True):
214 """Percent-encode the first segment of a URL path for a URL without a
215 scheme specified.
216 """
217 if maximal:
218 bytestr = normalize('NFC', text).encode('utf8')
219 return u''.join([_SCHEMELESS_PATH_PART_QUOTE_MAP[b] for b in bytestr])
220 return u''.join([_SCHEMELESS_PATH_PART_QUOTE_MAP[t]
221 if t in _SCHEMELESS_PATH_DELIMS else t for t in text])
222
223
224 def _encode_path_parts(text_parts, rooted=False, has_scheme=True,
225 has_authority=True, joined=True, maximal=True):
226 """
227 Percent-encode a tuple of path parts into a complete path.
228
229 Setting *maximal* to False percent-encodes only the reserved
230 characters that are syntactically necessary for serialization,
231 preserving any IRI-style textual data.
232
233 Leaving *maximal* set to its default True percent-encodes
234 everything required to convert a portion of an IRI to a portion of
235 a URI.
236
237 RFC 3986 3.3:
238
239 If a URI contains an authority component, then the path component
240 must either be empty or begin with a slash ("/") character. If a URI
241 does not contain an authority component, then the path cannot begin
242 with two slash characters ("//"). In addition, a URI reference
243 (Section 4.1) may be a relative-path reference, in which case the
244 first path segment cannot contain a colon (":") character.
245 """
246 if not text_parts:
247 return u'' if joined else text_parts
248 if rooted:
249 text_parts = (u'',) + text_parts
250 # elif has_authority and text_parts:
251 # raise Exception('see rfc above') # TODO: too late to fail like this?
252 encoded_parts = []
253 if has_scheme:
254 encoded_parts = [_encode_path_part(part, maximal=maximal)
255 if part else part for part in text_parts]
256 else:
257 encoded_parts = [_encode_schemeless_path_part(text_parts[0], maximal=maximal)]
258 encoded_parts.extend([_encode_path_part(part, maximal=maximal)
259 if part else part for part in text_parts[1:]])
260 if joined:
261 return u'/'.join(encoded_parts)
262 return tuple(encoded_parts)
263
264
265 def _encode_query_part(text, maximal=True):
266 """
267 Percent-encode a single query string key or value.
268 """
269 if maximal:
270 bytestr = normalize('NFC', text).encode('utf8')
271 return u''.join([_QUERY_PART_QUOTE_MAP[b] for b in bytestr])
272 return u''.join([_QUERY_PART_QUOTE_MAP[t] if t in _QUERY_DELIMS else t
273 for t in text])
274
275
276 def _encode_fragment_part(text, maximal=True):
277 """Quote the fragment part of the URL. Fragments don't have
278 subdelimiters, so the whole URL fragment can be passed.
279 """
280 if maximal:
281 bytestr = normalize('NFC', text).encode('utf8')
282 return u''.join([_FRAGMENT_QUOTE_MAP[b] for b in bytestr])
283 return u''.join([_FRAGMENT_QUOTE_MAP[t] if t in _FRAGMENT_DELIMS else t
284 for t in text])
285
286
287 def _encode_userinfo_part(text, maximal=True):
288 """Quote special characters in either the username or password
289 section of the URL.
290 """
291 if maximal:
292 bytestr = normalize('NFC', text).encode('utf8')
293 return u''.join([_USERINFO_PART_QUOTE_MAP[b] for b in bytestr])
294 return u''.join([_USERINFO_PART_QUOTE_MAP[t] if t in _USERINFO_DELIMS
295 else t for t in text])
296
297
298
299 # This port list painstakingly curated by hand searching through
300 # https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
301 # and
302 # https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml
303 SCHEME_PORT_MAP = {'acap': 674, 'afp': 548, 'dict': 2628, 'dns': 53,
304 'file': None, 'ftp': 21, 'git': 9418, 'gopher': 70,
305 'http': 80, 'https': 443, 'imap': 143, 'ipp': 631,
306 'ipps': 631, 'irc': 194, 'ircs': 6697, 'ldap': 389,
307 'ldaps': 636, 'mms': 1755, 'msrp': 2855, 'msrps': None,
308 'mtqp': 1038, 'nfs': 111, 'nntp': 119, 'nntps': 563,
309 'pop': 110, 'prospero': 1525, 'redis': 6379, 'rsync': 873,
310 'rtsp': 554, 'rtsps': 322, 'rtspu': 5005, 'sftp': 22,
311 'smb': 445, 'snmp': 161, 'ssh': 22, 'steam': None,
312 'svn': 3690, 'telnet': 23, 'ventrilo': 3784, 'vnc': 5900,
313 'wais': 210, 'ws': 80, 'wss': 443, 'xmpp': None}
314
315 # This list of schemes that don't use authorities is also from the link above.
316 NO_NETLOC_SCHEMES = set(['urn', 'about', 'bitcoin', 'blob', 'data', 'geo',
317 'magnet', 'mailto', 'news', 'pkcs11',
318 'sip', 'sips', 'tel'])
319 # As of Mar 11, 2017, there were 44 netloc schemes, and 13 non-netloc
320
321
322 def register_scheme(text, uses_netloc=True, default_port=None):
323 """Registers new scheme information, resulting in correct port and
324 slash behavior from the URL object. There are dozens of standard
325 schemes preregistered, so this function is mostly meant for
326 proprietary internal customizations or stopgaps on missing
327 standards information. If a scheme seems to be missing, please
328 `file an issue`_!
329
330 Args:
331 text (unicode): Text representing the scheme.
332 (the 'http' in 'http://hatnote.com')
333 uses_netloc (bool): Does the scheme support specifying a
334 network host? For instance, "http" does, "mailto" does
335 not. Defaults to True.
336 default_port (int): The default port, if any, for netloc-using
337 schemes.
338
339 .. _file an issue: https://github.com/mahmoud/hyperlink/issues
340
341 """
342 text = text.lower()
343 if default_port is not None:
344 try:
345 default_port = int(default_port)
346 except (ValueError, TypeError):
347 raise ValueError('default_port expected integer or None, not %r'
348 % (default_port,))
349
350 if uses_netloc is True:
351 SCHEME_PORT_MAP[text] = default_port
352 elif uses_netloc is False:
353 if default_port is not None:
354 raise ValueError('unexpected default port while specifying'
355 ' non-netloc scheme: %r' % default_port)
356 NO_NETLOC_SCHEMES.add(text)
357 else:
358 raise ValueError('uses_netloc expected bool, not: %r' % uses_netloc)
359
360 return
361
362
363 def scheme_uses_netloc(scheme, default=None):
364 """Whether or not a URL uses :code:`:` or :code:`://` to separate the
365 scheme from the rest of the URL depends on the scheme's own
366 standard definition. There is no way to infer this behavior
367 from other parts of the URL. A scheme either supports network
368 locations or it does not.
369
370 The URL type's approach to this is to check for explicitly
371 registered schemes, with common schemes like HTTP
372 preregistered. This is the same approach taken by
373 :mod:`urlparse`.
374
375 URL adds two additional heuristics if the scheme as a whole is
376 not registered. First, it attempts to check the subpart of the
377 scheme after the last ``+`` character. This adds intuitive
378 behavior for schemes like ``git+ssh``. Second, if a URL with
379 an unrecognized scheme is loaded, it will maintain the
380 separator it sees.
381 """
382 if not scheme:
383 return False
384 scheme = scheme.lower()
385 if scheme in SCHEME_PORT_MAP:
386 return True
387 if scheme in NO_NETLOC_SCHEMES:
388 return False
389 if scheme.split('+')[-1] in SCHEME_PORT_MAP:
390 return True
391 return default
392
393
394 class URLParseError(ValueError):
395 """Exception inheriting from :exc:`ValueError`, raised when failing to
396 parse a URL. Mostly raised on invalid ports and IPv6 addresses.
397 """
398 pass
399
400
401 def _optional(argument, default):
402 if argument is _UNSET:
403 return default
404 else:
405 return argument
406
407
408 def _typecheck(name, value, *types):
409 """
410 Check that the given *value* is one of the given *types*, or raise an
411 exception describing the problem using *name*.
412 """
413 if not types:
414 raise ValueError('expected one or more types, maybe use _textcheck?')
415 if not isinstance(value, types):
416 raise TypeError("expected %s for %s, got %r"
417 % (" or ".join([t.__name__ for t in types]),
418 name, value))
419 return value
420
421
422 def _textcheck(name, value, delims=frozenset(), nullable=False):
423 if not isinstance(value, unicode):
424 if nullable and value is None:
425 return value # used by query string values
426 else:
427 str_name = "unicode" if bytes is str else "str"
428 exp = str_name + ' or NoneType' if nullable else str_name
429 raise TypeError('expected %s for %s, got %r' % (exp, name, value))
430 if delims and set(value) & set(delims): # TODO: test caching into regexes
431 raise ValueError('one or more reserved delimiters %s present in %s: %r'
432 % (''.join(delims), name, value))
433 return value
434
435
436 def _decode_unreserved(text, normalize_case=False):
437 return _percent_decode(text, normalize_case=normalize_case,
438 _decode_map=_UNRESERVED_DECODE_MAP)
439
440
441 def _decode_userinfo_part(text, normalize_case=False):
442 return _percent_decode(text, normalize_case=normalize_case,
443 _decode_map=_USERINFO_DECODE_MAP)
444
445
446 def _decode_path_part(text, normalize_case=False):
447 """
448 >>> _decode_path_part(u'%61%77%2f%7a')
449 u'aw%2fz'
450 >>> _decode_path_part(u'%61%77%2f%7a', normalize_case=True)
451 u'aw%2Fz'
452 """
453 return _percent_decode(text, normalize_case=normalize_case,
454 _decode_map=_PATH_DECODE_MAP)
455
456
457 def _decode_query_part(text, normalize_case=False):
458 return _percent_decode(text, normalize_case=normalize_case,
459 _decode_map=_QUERY_DECODE_MAP)
460
461
462 def _decode_fragment_part(text, normalize_case=False):
463 return _percent_decode(text, normalize_case=normalize_case,
464 _decode_map=_FRAGMENT_DECODE_MAP)
465
466
467 def _percent_decode(text, normalize_case=False, _decode_map=_HEX_CHAR_MAP):
468 """Convert percent-encoded text characters to their normal,
469 human-readable equivalents.
470
471 All characters in the input text must be valid ASCII. All special
472 characters underlying the values in the percent-encoding must be
473 valid UTF-8. If a non-UTF8-valid string is passed, the original
474 text is returned with no changes applied.
475
476 Only called by field-tailored variants, e.g.,
477 :func:`_decode_path_part`, as every percent-encodable part of the
478 URL has characters which should not be percent decoded.
479
480 >>> _percent_decode(u'abc%20def')
481 u'abc def'
482
483 Args:
484 text (unicode): The ASCII text with percent-encoding present.
485 normalize_case (bool): Whether undecoded percent segments, such
486 as encoded delimiters, should be uppercased, per RFC 3986
487 Section 2.1. See :func:`_decode_path_part` for an example.
488
489 Returns:
490 unicode: The percent-decoded version of *text*, with UTF-8
491 decoding applied.
492
493 """
494 try:
495 quoted_bytes = text.encode("ascii")
496 except UnicodeEncodeError:
497 return text
498
499 bits = quoted_bytes.split(b'%')
500 if len(bits) == 1:
501 return text
502
503 res = [bits[0]]
504 append = res.append
505
506 if not normalize_case:
507 for item in bits[1:]:
508 try:
509 append(_decode_map[item[:2]])
510 append(item[2:])
511 except KeyError:
512 append(b'%')
513 append(item)
514 else:
515 for item in bits[1:]:
516 try:
517 append(_decode_map[item[:2]])
518 append(item[2:])
519 except KeyError:
520 append(b'%')
521 if item[:2] in _HEX_CHAR_MAP:
522 append(item[:2].upper())
523 append(item[2:])
524 else:
525 append(item)
526
527 unquoted_bytes = b''.join(res)
528
529 try:
530 return unquoted_bytes.decode("utf-8")
531 except UnicodeDecodeError:
532 return text
533
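The selective decoding performed by `_percent_decode` and its field-tailored wrappers can be illustrated with a minimal standalone sketch. It assumes Python 3 and builds its own map of unreserved characters; `UNRESERVED_DECODE_MAP` and `decode_unreserved` below are hypothetical stand-ins, not the module's actual constants or functions.

```python
import string

# Hypothetical stand-ins for the module's decode maps (not the real constants).
_HEX_DIGITS = b"0123456789ABCDEFabcdef"
_UNRESERVED = (string.ascii_letters + string.digits + "-._~").encode("ascii")

# Map every two-digit hex pair that encodes an unreserved byte to that byte.
UNRESERVED_DECODE_MAP = {}
for hi in _HEX_DIGITS:
    for lo in _HEX_DIGITS:
        pair = bytes([hi, lo])
        byte = bytes([int(pair, 16)])
        if byte in _UNRESERVED:
            UNRESERVED_DECODE_MAP[pair] = byte


def decode_unreserved(text, normalize_case=False):
    """Decode only escapes of unreserved characters; keep all others."""
    try:
        quoted = text.encode("ascii")
    except UnicodeEncodeError:
        return text  # non-ASCII input is returned untouched
    bits = quoted.split(b"%")
    res = [bits[0]]
    for item in bits[1:]:
        pair = item[:2]
        if pair in UNRESERVED_DECODE_MAP:
            # Safe to decode: the escape stands for an unreserved character.
            res.append(UNRESERVED_DECODE_MAP[pair] + item[2:])
        elif (normalize_case and len(pair) == 2
              and all(c in _HEX_DIGITS for c in pair)):
            # Keep the escape, but uppercase its hex digits.
            res.append(b"%" + pair.upper() + item[2:])
        else:
            # Not a decodable escape: restore the percent sign as-is.
            res.append(b"%" + item)
    return b"".join(res).decode("ascii")
```

Escapes for delimiters such as `%2f` survive decoding, which is why each URL part needs its own map of characters that are safe to decode.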
534
535 def _resolve_dot_segments(path):
536 """Normalize the URL path by resolving segments of '.' and '..'. For
537 more details, see `RFC 3986 section 5.2.4, Remove Dot Segments`_.
538
539 Args:
540 path (list): path segments in string form
541
542 Returns:
543 list: a new list of path segments with the '.' and '..' elements
544 removed and resolved.
545
546 .. _RFC 3986 section 5.2.4, Remove Dot Segments: https://tools.ietf.org/html/rfc3986#section-5.2.4
547 """
548 segs = []
549
550 for seg in path:
551 if seg == u'.':
552 pass
553 elif seg == u'..':
554 if segs:
555 segs.pop()
556 else:
557 segs.append(seg)
558
559 if list(path[-1:]) in ([u'.'], [u'..']):
560 segs.append(u'')
561
562 return segs
563
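As a worked illustration of the dot-segment rules above, the following standalone sketch mirrors the same logic on plain lists. The name `resolve_dot_segments` is a hypothetical copy for demonstration, assuming Python 3 strings.

```python
def resolve_dot_segments(path):
    """Return a new list of segments with '.' and '..' resolved."""
    segs = []
    for seg in path:
        if seg == ".":
            continue          # "current directory": drop it
        if seg == "..":
            if segs:
                segs.pop()    # "parent directory": drop the previous segment
        else:
            segs.append(seg)
    # A trailing '.' or '..' refers to a directory, so keep the final slash
    # by appending an empty segment.
    if list(path[-1:]) in (["."], [".."]):
        segs.append("")
    return segs


print(resolve_dot_segments(["a", "..", "b", ".", "c"]))  # ['b', 'c']
print(resolve_dot_segments(["a", "b", ".."]))            # ['a', '']
```

Note that a leading `..` with nothing to pop is silently discarded rather than treated as an error.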
564
565 def parse_host(host):
566 """Parse the host into a tuple of ``(family, host)``, where family
567 is the appropriate :mod:`socket` module constant when the host is
568 an IP address. Family is ``None`` when the host is not an IP.
569
570 Will raise :class:`URLParseError` on invalid IPv6 addresses.
571
572 Returns:
573 tuple: family (socket constant or None), host (string)
574
575 >>> parse_host('googlewebsite.com') == (None, 'googlewebsite.com')
576 True
577 >>> parse_host('::1') == (socket.AF_INET6, '::1')
578 True
579 >>> parse_host('192.168.1.1') == (socket.AF_INET, '192.168.1.1')
580 True
581 """
582 if not host:
583 return None, u''
584 if u':' in host:
585 try:
586 inet_pton(socket.AF_INET6, host)
587 except socket.error as se:
588 raise URLParseError('invalid IPv6 host: %r (%r)' % (host, se))
589 except UnicodeEncodeError:
590 pass # TODO: this can't be a real host right?
591 else:
592 family = socket.AF_INET6
593 return family, host
594 try:
595 inet_pton(socket.AF_INET, host)
596 except (socket.error, UnicodeEncodeError):
597 family = None # not an IP
598 else:
599 family = socket.AF_INET
600 return family, host
601
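The address-family detection in `parse_host` relies on `inet_pton` rejecting anything that is not a well-formed address. A minimal sketch of that idea, assuming Python 3 and the standard library's `socket.inet_pton`, is shown below. The helper name `classify_host` is hypothetical, and unlike `parse_host` it never raises: a host containing a colon that is not valid IPv6 is simply reported as a non-IP host.

```python
import socket


def classify_host(host):
    """Return (family, host); family is None when host is not an IP."""
    for family in (socket.AF_INET6, socket.AF_INET):
        try:
            # inet_pton raises OSError on malformed addresses and
            # ValueError on embedded null characters.
            socket.inet_pton(family, host)
        except (OSError, ValueError):
            continue
        return family, host
    return None, host


print(classify_host("192.168.1.1")[0] == socket.AF_INET)
print(classify_host("::1")[0] == socket.AF_INET6)
print(classify_host("example.com")[0] is None)
```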
602
603 class URL(object):
604 """From blogs to billboards, URLs are so common, that it's easy to
605 overlook their complexity and power. With hyperlink's
606 :class:`URL` type, working with URLs doesn't have to be hard.
607
608 URLs are made of many parts. Most of these parts are officially
609 named in `RFC 3986`_ and this diagram may prove handy in identifying
610 them::
611
612 foo://user:pass@example.com:8042/over/there?name=ferret#nose
613 \_/ \_______/ \_________/ \__/\_________/ \_________/ \__/
614 | | | | | | |
615 scheme userinfo host port path query fragment
616
617 While :meth:`~URL.from_text` is used for parsing whole URLs, the
618 :class:`URL` constructor builds a URL from the individual
619 components, like so::
620
621 >>> from hyperlink import URL
622 >>> url = URL(scheme=u'https', host=u'example.com', path=[u'hello', u'world'])
623 >>> print(url.to_text())
624 https://example.com/hello/world
625
626 The constructor runs basic type checks. All strings are expected
627 to be decoded (:class:`unicode` in Python 2). All arguments are
628 optional, defaulting to appropriately empty values. A full list of
629 constructor arguments is below.
630
631 Args:
632 scheme (unicode): The text name of the scheme.
633 host (unicode): The host portion of the network location
634 port (int): The port part of the network location. If
635 ``None`` or no port is passed, the port will default to
636 the default port of the scheme, if it is known. See the
637 ``SCHEME_PORT_MAP`` and :func:`register_default_port`
638 for more info.
639 path (tuple): A tuple of strings representing the
640 slash-separated parts of the path.
641 query (tuple): The query parameters, as a tuple of
642 key-value pairs.
643 fragment (unicode): The fragment part of the URL.
644 rooted (bool): Whether or not the path begins with a slash.
645 userinfo (unicode): The username or colon-separated
646 username:password pair.
647 uses_netloc (bool): Indicates whether two slashes appear
648 between the scheme and the host (``http://eg.com`` vs
649 ``mailto:e@g.com``). Set automatically based on scheme.
650
651 All of these parts are also exposed as read-only attributes of
652 URL instances, along with several useful methods.
653
654 .. _RFC 3986: https://tools.ietf.org/html/rfc3986
655 .. _RFC 3987: https://tools.ietf.org/html/rfc3987
656 """
657
658 def __init__(self, scheme=None, host=None, path=(), query=(), fragment=u'',
659 port=None, rooted=None, userinfo=u'', uses_netloc=None):
660 if host is not None and scheme is None:
661 scheme = u'http' # TODO: why
662 if port is None:
663 port = SCHEME_PORT_MAP.get(scheme)
664 if host and query and not path:
665 # per RFC 3986 6.2.3, "a URI that uses the generic syntax
666 # for authority with an empty path should be normalized to
667 # a path of '/'."
668 path = (u'',)
669
670 # Now that we're done detecting whether they were passed, we can set
671 # them to their defaults:
672 if scheme is None:
673 scheme = u''
674 if host is None:
675 host = u''
676 if rooted is None:
677 rooted = bool(host)
678
679 # Set attributes.
680 self._scheme = _textcheck("scheme", scheme)
681 if self._scheme:
682 if not _SCHEME_RE.match(self._scheme):
683 raise ValueError('invalid scheme: %r. Only alphanumeric, "+",'
684 ' "-", and "." allowed. Did you mean to call'
685 ' %s.from_text()?'
686 % (self._scheme, self.__class__.__name__))
687
688 _, self._host = parse_host(_textcheck('host', host, '/?#@'))
689 if isinstance(path, unicode):
690 raise TypeError("expected iterable of text for path, not: %r"
691 % (path,))
692 self._path = tuple((_textcheck("path segment", segment, '/?#')
693 for segment in path))
694 self._query = tuple(
695 (_textcheck("query parameter name", k, '&=#'),
696 _textcheck("query parameter value", v, '&#', nullable=True))
697 for (k, v) in query
698 )
699 self._fragment = _textcheck("fragment", fragment)
700 self._port = _typecheck("port", port, int, NoneType)
701 self._rooted = _typecheck("rooted", rooted, bool)
702 self._userinfo = _textcheck("userinfo", userinfo, '/?#@')
703
704 uses_netloc = scheme_uses_netloc(self._scheme, uses_netloc)
705 self._uses_netloc = _typecheck("uses_netloc",
706 uses_netloc, bool, NoneType)
707
708 return
709
710 @property
711 def scheme(self):
712 """The scheme is a string, and the first part of an absolute URL, the
713 part before the first colon, and the part which defines the
714 semantics of the rest of the URL. Examples include "http",
715 "https", "ssh", "file", "mailto", and many others. See
716 :func:`~hyperlink.register_scheme()` for more info.
717 """
718 return self._scheme
719
720 @property
721 def host(self):
722 """The host is a string, and the second standard part of an absolute
723 URL. When present, a valid host must be a domain name, or an
724 IP (v4 or v6). It occurs before the first slash, or the second
725 colon, if a :attr:`~hyperlink.URL.port` is provided.
726 """
727 return self._host
728
729 @property
730 def port(self):
731 """The port is an integer that is commonly used in connecting to the
732 :attr:`host`, and almost never appears without it.
733
734 When not present in the original URL, this attribute defaults
735 to the scheme's default port. If the scheme's default port is
736 not known, and the port is not provided, this attribute will
737 be set to None.
738
739 >>> URL.from_text(u'http://example.com/pa/th').port
740 80
741 >>> URL.from_text(u'foo://example.com/pa/th').port
742 >>> URL.from_text(u'foo://example.com:8042/pa/th').port
743 8042
744
745 .. note::
746
747 Per the standard, when the port is the same as the scheme's
748 default port, it will be omitted in the text URL.
749
750 """
751 return self._port
752
753 @property
754 def path(self):
755 """A tuple of strings, created by splitting the slash-separated
756 hierarchical path. Started by the first slash after the host,
757 terminated by a "?", which indicates the start of the
758 :attr:`~hyperlink.URL.query` string.
759 """
760 return self._path
761
762 @property
763 def query(self):
764 """Tuple of pairs, created by splitting the ampersand-separated
765 mapping of keys and optional values representing
766 non-hierarchical data used to identify the resource. Keys are
767 always strings. Values are strings when present, or None when
768 missing.
769
770 For more operations on the mapping, see
771 :meth:`~hyperlink.URL.get()`, :meth:`~hyperlink.URL.add()`,
772 :meth:`~hyperlink.URL.set()`, and
773 :meth:`~hyperlink.URL.remove()`.
774 """
775 return self._query
776
777 @property
778 def fragment(self):
779 """A string, the last part of the URL, indicated by the first "#"
780 after the :attr:`~hyperlink.URL.path` or
781 :attr:`~hyperlink.URL.query`. Enables indirect identification
782 of a secondary resource, like an anchor within an HTML page.
783
784 """
785 return self._fragment
786
787 @property
788 def rooted(self):
789 """Whether or not the path starts with a forward slash (``/``).
790
791 This is taken from the terminology in the BNF grammar,
792 specifically the "path-rootless" rule, since "absolute path"
793 and "absolute URI" are somewhat ambiguous. :attr:`path` does
794 not contain the implicit prefixed ``"/"`` since that is
795 somewhat awkward to work with.
796
797 """
798 return self._rooted
799
800 @property
801 def userinfo(self):
802 """The colon-separated string forming the username-password
803 combination.
804 """
805 return self._userinfo
806
807 @property
808 def uses_netloc(self):
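809 """Whether two slashes appear between the scheme and the host
810 (``http://eg.com`` vs ``mailto:e@g.com``)."""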
811 return self._uses_netloc
812
813 @property
814 def user(self):
815 """
816 The user portion of :attr:`~hyperlink.URL.userinfo`.
817 """
818 return self.userinfo.split(u':')[0]
819
820 def authority(self, with_password=False, **kw):
821 """Compute and return the appropriate host/port/userinfo combination.
822
823 >>> url = URL.from_text(u'http://user:pass@localhost:8080/a/b?x=y')
824 >>> url.authority()
825 u'user:@localhost:8080'
826 >>> url.authority(with_password=True)
827 u'user:pass@localhost:8080'
828
829 Args:
830 with_password (bool): Whether the return value of this
831 method includes the password in the URL, if it is
832 set. Defaults to False.
833
834 Returns:
835 str: The authority (network location and user information) portion
836 of the URL.
837 """
838 # first, a bit of twisted compat
839 with_password = kw.pop('includeSecrets', with_password)
840 if kw:
841 raise TypeError('got unexpected keyword arguments: %r' % kw.keys())
842 host = self.host
843 if ':' in host:
844 hostport = ['[' + host + ']']
845 else:
846 hostport = [self.host]
847 if self.port != SCHEME_PORT_MAP.get(self.scheme):
848 hostport.append(unicode(self.port))
849 authority = []
850 if self.userinfo:
851 userinfo = self.userinfo
852 if not with_password and u":" in userinfo:
853 userinfo = userinfo[:userinfo.index(u":") + 1]
854 authority.append(userinfo)
855 authority.append(u":".join(hostport))
856 return u"@".join(authority)
857
858 def __eq__(self, other):
859 if not isinstance(other, self.__class__):
860 return NotImplemented
861 for attr in ['scheme', 'userinfo', 'host', 'query',
862 'fragment', 'port', 'uses_netloc']:
863 if getattr(self, attr) != getattr(other, attr):
864 return False
865 if self.path == other.path or (self.path in _ROOT_PATHS
866 and other.path in _ROOT_PATHS):
867 return True
868 return False
869
870 def __ne__(self, other):
871 if not isinstance(other, self.__class__):
872 return NotImplemented
873 return not self.__eq__(other)
874
875 def __hash__(self):
876 return hash((self.__class__, self.scheme, self.userinfo, self.host,
877 self.path, self.query, self.fragment, self.port,
878 self.rooted, self.uses_netloc))
879
880 @property
881 def absolute(self):
882 """Whether or not the URL is "absolute". Absolute URLs are complete
883 enough to resolve to a network resource without being relative
884 to a base URI.
885
886 >>> URL.from_text(u'http://wikipedia.org/').absolute
887 True
888 >>> URL.from_text(u'?a=b&c=d').absolute
889 False
890
891 Absolute URLs must have both a scheme and a host set.
892 """
893 return bool(self.scheme and self.host)
894
895 def replace(self, scheme=_UNSET, host=_UNSET, path=_UNSET, query=_UNSET,
896 fragment=_UNSET, port=_UNSET, rooted=_UNSET, userinfo=_UNSET,
897 uses_netloc=_UNSET):
898 """:class:`URL` objects are immutable, which means that attributes
899 are designed to be set only once, at construction. Instead of
900 modifying an existing URL, one simply creates a copy with the
901 desired changes.
902
903 If any of the following arguments is omitted, it defaults to
904 the value on the current URL.
905
906 Args:
907 scheme (unicode): The text name of the scheme.
908 host (unicode): The host portion of the network location
909 port (int): The port part of the network location.
910 path (tuple): A tuple of strings representing the
911 slash-separated parts of the path.
912 query (tuple): The query parameters, as a tuple of
913 key-value pairs.
914 fragment (unicode): The fragment part of the URL.
915 rooted (bool): Whether or not the path begins with a slash.
916 userinfo (unicode): The username or colon-separated
917 username:password pair.
918 uses_netloc (bool): Indicates whether two slashes appear
919 between the scheme and the host (``http://eg.com`` vs
920 ``mailto:e@g.com``)
921
922 Returns:
923 URL: a copy of the current :class:`URL`, with new values for
924 parameters passed.
925
926 """
927 return self.__class__(
928 scheme=_optional(scheme, self.scheme),
929 host=_optional(host, self.host),
930 path=_optional(path, self.path),
931 query=_optional(query, self.query),
932 fragment=_optional(fragment, self.fragment),
933 port=_optional(port, self.port),
934 rooted=_optional(rooted, self.rooted),
935 userinfo=_optional(userinfo, self.userinfo),
936 uses_netloc=_optional(uses_netloc, self.uses_netloc)
937 )
938
939 @classmethod
940 def from_text(cls, text):
941 """Whereas the :class:`URL` constructor is useful for constructing
942 URLs from parts, :meth:`~URL.from_text` supports parsing whole
943 URLs from their string form::
944
945 >>> URL.from_text(u'http://example.com')
946 URL.from_text(u'http://example.com')
947 >>> URL.from_text(u'?a=b&x=y')
948 URL.from_text(u'?a=b&x=y')
949
950 As you can see above, it's also used as the :func:`repr` of
951 :class:`URL` objects. It is the natural counterpart to
952 :meth:`~URL.to_text()`. This method only accepts *text*, so be
953 sure to decode any bytestrings first.
954
955 Args:
956 text (unicode): A valid URL string.
957
958 Returns:
959 URL: The structured object version of the parsed string.
960
961 .. note::
962
963 Somewhat unexpectedly, URLs are a far more permissive
964 format than most would assume. Many strings which don't
965 look like URLs are still valid URLs. As a result, this
966 method only raises :class:`URLParseError` on invalid port
967 and IPv6 values in the host portion of the URL.
968
969 """
970 um = _URL_RE.match(_textcheck('text', text))
971 try:
972 gs = um.groupdict()
973 except AttributeError:
974 raise URLParseError('could not parse url: %r' % text)
975
976 au_text = gs['authority'] or u''
977 au_m = _AUTHORITY_RE.match(au_text)
978 try:
979 au_gs = au_m.groupdict()
980 except AttributeError:
981 raise URLParseError('invalid authority %r in url: %r'
982 % (au_text, text))
983 if au_gs['bad_host']:
984 raise URLParseError('invalid host %r in url: %r' % (au_gs['bad_host'], text))
985
986 userinfo = au_gs['userinfo'] or u''
987
988 host = au_gs['ipv6_host'] or au_gs['plain_host']
989 port = au_gs['port']
990 if port is not None:
991 try:
992 port = int(port)
993 except ValueError:
994 if not port: # TODO: excessive?
995 raise URLParseError('port must not be empty: %r' % au_text)
996 raise URLParseError('expected integer for port, not %r' % port)
997
998 scheme = gs['scheme'] or u''
999 fragment = gs['fragment'] or u''
1000 uses_netloc = bool(gs['_netloc_sep'])
1001
1002 if gs['path']:
1003 path = gs['path'].split(u"/")
1004 if not path[0]:
1005 path.pop(0)
1006 rooted = True
1007 else:
1008 rooted = False
1009 else:
1010 path = ()
1011 rooted = bool(au_text)
1012 if gs['query']:
1013 query = ((qe.split(u"=", 1) if u'=' in qe else (qe, None))
1014 for qe in gs['query'].split(u"&"))
1015 else:
1016 query = ()
1017 return cls(scheme, host, path, query, fragment, port,
1018 rooted, userinfo, uses_netloc)
1019
1020 def normalize(self, scheme=True, host=True, path=True, query=True,
1021 fragment=True):
1022 """Return a new URL object with several standard normalizations
1023 applied:
1024
1025 * Decode unreserved characters (`RFC 3986 2.3`_)
1026 * Uppercase remaining percent-encoded octets (`RFC 3986 2.1`_)
1027 * Convert scheme and host casing to lowercase (`RFC 3986 3.2.2`_)
1028 * Resolve any "." and ".." references in the path (`RFC 3986 6.2.2.3`_)
1029 * Ensure an ending slash on URLs with an empty path (`RFC 3986 6.2.3`_)
1030
1031 All are applied by default, but normalizations can be disabled
1032 per-part by passing `False` for that part's corresponding
1033 name.
1034
1035 Args:
1036 scheme (bool): Convert the scheme to lowercase
1037 host (bool): Convert the host to lowercase
1038 path (bool): Normalize the path (see above for details)
1039 query (bool): Normalize the query string
1040 fragment (bool): Normalize the fragment
1041
1042 >>> url = URL.from_text(u'Http://example.COM/a/../b/./c%2f?%61')
1043 >>> print(url.normalize().to_text())
1044 http://example.com/b/c%2F?a
1045
1046 .. _RFC 3986 3.2.2: https://tools.ietf.org/html/rfc3986#section-3.2.2
1047 .. _RFC 3986 2.3: https://tools.ietf.org/html/rfc3986#section-2.3
1048 .. _RFC 3986 2.1: https://tools.ietf.org/html/rfc3986#section-2.1
1049 .. _RFC 3986 6.2.2.3: https://tools.ietf.org/html/rfc3986#section-6.2.2.3
1050 .. _RFC 3986 6.2.3: https://tools.ietf.org/html/rfc3986#section-6.2.3
1051
1052 """
1053 # TODO: userinfo?
1054 kw = {}
1055 if scheme:
1056 kw['scheme'] = self.scheme.lower()
1057 if host:
1058 kw['host'] = self.host.lower()
1059 if path:
1060 if self.path:
1061 kw['path'] = [_decode_unreserved(p, normalize_case=True)
1062 for p in _resolve_dot_segments(self.path)]
1063 else:
1064 kw['path'] = (u'',)
1065 if query:
1066 kw['query'] = [(_decode_unreserved(k, normalize_case=True),
1067 _decode_unreserved(v, normalize_case=True)
1068 if v else v) for k, v in self.query]
1069 if fragment:
1070 kw['fragment'] = _decode_unreserved(self.fragment,
1071 normalize_case=True)
1072 return self.replace(**kw)
1073
1074 def child(self, *segments):
1075 """Make a new :class:`URL` where the given path segments are a child
1076 of this URL, preserving other parts of the URL, including the
1077 query string and fragment.
1078
1079 For example::
1080
1081 >>> url = URL.from_text(u'http://localhost/a/b?x=y')
1082 >>> child_url = url.child(u"c", u"d")
1083 >>> child_url.to_text()
1084 u'http://localhost/a/b/c/d?x=y'
1085
1086 Args:
1087 segments (unicode): Additional parts to be joined and added to
1088 the path, like :func:`os.path.join`. Special characters
1089 in segments will be percent encoded.
1090
1091 Returns:
1092 URL: A copy of the current URL with the extra path segments.
1093
1094 """
1095 segments = [_textcheck('path segment', s) for s in segments]
1096 new_segs = _encode_path_parts(segments, joined=False, maximal=False)
1097 new_path = self.path[:-1 if (self.path and self.path[-1] == u'')
1098 else None] + new_segs
1099 return self.replace(path=new_path)
1100
1101 def sibling(self, segment):
1102 """Make a new :class:`URL` with a single path segment that is a
1103 sibling of this URL path.
1104
1105 Args:
1106 segment (unicode): A single path segment.
1107
1108 Returns:
1109 URL: A copy of the current URL with the last path segment
1110 replaced by *segment*. Special characters such as
1111 ``/?#`` will be percent encoded.
1112
1113 """
1114 _textcheck('path segment', segment)
1115 new_path = self.path[:-1] + (_encode_path_part(segment),)
1116 return self.replace(path=new_path)
1117
1118 def click(self, href=u''):
1119 """Resolve the given URL relative to this URL.
1120
1121 The resulting URI should match what a web browser would
1122 generate if you visited the current URL and clicked on *href*.
1123
1124 >>> url = URL.from_text(u'http://blog.hatnote.com/')
1125 >>> url.click(u'/post/155074058790').to_text()
1126 u'http://blog.hatnote.com/post/155074058790'
1127 >>> url = URL.from_text(u'http://localhost/a/b/c/')
1128 >>> url.click(u'../d/./e').to_text()
1129 u'http://localhost/a/b/d/e'
1130
1131 Args:
1132 href (unicode): A string representing a clicked URL.
1133
1134 Returns:
1135 URL: A copy of the current URL with navigation logic applied.
1136
1137 For more information, see `RFC 3986 section 5`_.
1138
1139 .. _RFC 3986 section 5: https://tools.ietf.org/html/rfc3986#section-5
1140 """
1141 if href:
1142 if isinstance(href, URL):
1143 clicked = href
1144 else:
1145 # TODO: This error message is not completely accurate,
1146 # as URL objects are now also valid, but Twisted's
1147 # test suite (wrongly) relies on this exact message.
1148 _textcheck('relative URL', href)
1149 clicked = URL.from_text(href)
1150 if clicked.absolute:
1151 return clicked
1152 else:
1153 clicked = self
1154
1155 query = clicked.query
1156 if clicked.scheme and not clicked.rooted:
1157 # Schemes with relative paths are not well-defined. RFC 3986 calls
1158 # them a "loophole in prior specifications" that should be avoided,
1159 # or supported only for backwards compatibility.
1160 raise NotImplementedError('absolute URI with rootless path: %r'
1161 % (href,))
1162 else:
1163 if clicked.rooted:
1164 path = clicked.path
1165 elif clicked.path:
1166 path = self.path[:-1] + clicked.path
1167 else:
1168 path = self.path
1169 if not query:
1170 query = self.query
1171 return self.replace(scheme=clicked.scheme or self.scheme,
1172 host=clicked.host or self.host,
1173 port=clicked.port or self.port,
1174 path=_resolve_dot_segments(path),
1175 query=query,
1176 fragment=clicked.fragment)
1177
1178 def to_uri(self):
1179 u"""Make a new :class:`URL` instance with all non-ASCII characters
1180 appropriately percent-encoded. This is useful to do in preparation
1181 for sending a :class:`URL` over a network protocol.
1182
1183 For example::
1184
1185 >>> URL.from_text(u'https://→example.com/foo⇧bar/').to_uri()
1186 URL.from_text(u'https://xn--example-dk9c.com/foo%E2%87%A7bar/')
1187
1188 Returns:
1189 URL: A new instance with its path segments, query parameters, and
1190 hostname encoded, so that they are all in the standard
1191 US-ASCII range.
1192 """
1193 new_userinfo = u':'.join([_encode_userinfo_part(p) for p in
1194 self.userinfo.split(':', 1)])
1195 new_path = _encode_path_parts(self.path, has_scheme=bool(self.scheme),
1196 rooted=False, joined=False, maximal=True)
1197 return self.replace(
1198 userinfo=new_userinfo,
1199 host=self.host.encode("idna").decode("ascii"),
1200 path=new_path,
1201 query=tuple([tuple(_encode_query_part(x, maximal=True)
1202 if x is not None else None
1203 for x in (k, v))
1204 for k, v in self.query]),
1205 fragment=_encode_fragment_part(self.fragment, maximal=True)
1206 )
1207
1208 def to_iri(self):
1209 u"""Make a new :class:`URL` instance with all but a few reserved
1210 characters decoded into human-readable format.
1211
1212 Percent-encoded Unicode and IDNA-encoded hostnames are
1213 decoded, like so::
1214
1215 >>> url = URL.from_text(u'https://xn--example-dk9c.com/foo%E2%87%A7bar/')
1216 >>> print(url.to_iri().to_text())
1217 https://→example.com/foo⇧bar/
1218
1219 .. note::
1220
1221 As a general Python issue, "narrow" (UCS-2) builds of
1222 Python may not be able to fully decode certain URLs, and
1223 in those cases, this method will return a best-effort,
1224 partially-decoded, URL which is still valid. This issue
1225 does not affect any Python builds 3.4+.
1226
1227 Returns:
1228 URL: A new instance with its path segments, query parameters, and
1229 hostname decoded for display purposes.
1230 """
1231 new_userinfo = u':'.join([_decode_userinfo_part(p) for p in
1232 self.userinfo.split(':', 1)])
1233 try:
1234 asciiHost = self.host.encode("ascii")
1235 except UnicodeEncodeError:
1236 textHost = self.host
1237 else:
1238 try:
1239 textHost = asciiHost.decode("idna")
1240 except ValueError:
1241 # only reached on "narrow" (UCS-2) Python builds <3.4, see #7
1242 textHost = self.host
1243 return self.replace(userinfo=new_userinfo,
1244 host=textHost,
1245 path=[_decode_path_part(segment)
1246 for segment in self.path],
1247 query=[tuple(_decode_query_part(x)
1248 if x is not None else None
1249 for x in (k, v))
1250 for k, v in self.query],
1251 fragment=_decode_fragment_part(self.fragment))
1252
1253 def to_text(self, with_password=False):
1254 """Render this URL to its textual representation.
1255
1256 By default, the URL text will *not* include a password, if one
1257 is set. RFC 3986 considers using URLs to represent such
1258 sensitive information as deprecated. Quoting from RFC 3986,
1259 `section 3.2.1`_:
1260
1261 "Applications should not render as clear text any data after the
1262 first colon (":") character found within a userinfo subcomponent
1263 unless the data after the colon is the empty string (indicating no
1264 password)."
1265
1266 Args:
1267 with_password (bool): Whether or not to include the
1268 password in the URL text. Defaults to False.
1269
1270 Returns:
1271 str: The serialized textual representation of this URL,
1272 such as ``u"http://example.com/some/path?some=query"``.
1273
1274 The natural counterpart to :meth:`URL.from_text()`.
1275
1276 .. _section 3.2.1: https://tools.ietf.org/html/rfc3986#section-3.2.1
1277 """
1278 scheme = self.scheme
1279 authority = self.authority(with_password)
1280 path = _encode_path_parts(self.path,
1281 rooted=self.rooted,
1282 has_scheme=bool(scheme),
1283 has_authority=bool(authority),
1284 maximal=False)
1285 query_string = u'&'.join(
1286 u'='.join((_encode_query_part(x, maximal=False)
1287 for x in ([k] if v is None else [k, v])))
1288 for (k, v) in self.query)
1289
1290 fragment = self.fragment
1291
1292 parts = []
1293 _add = parts.append
1294 if scheme:
1295 _add(scheme)
1296 _add(':')
1297 if authority:
1298 _add('//')
1299 _add(authority)
1300 elif (scheme and path[:2] != '//' and self.uses_netloc):
1301 _add('//')
1302 if path:
1303 if scheme and authority and path[:1] != '/':
1304 _add('/') # relpaths with abs authorities auto get '/'
1305 _add(path)
1306 if query_string:
1307 _add('?')
1308 _add(query_string)
1309 if fragment:
1310 _add('#')
1311 _add(fragment)
1312 return u''.join(parts)
1313
1314 def __repr__(self):
1315 """Convert this URL to a representation that shows all of its
1316 constituent parts, as well as being a valid argument to
1317 :func:`eval`.
1318 """
1319 return '%s.from_text(%r)' % (self.__class__.__name__, self.to_text())
1320
1321 # # Begin Twisted Compat Code
1322 asURI = to_uri
1323 asIRI = to_iri
1324
1325 @classmethod
1326 def fromText(cls, s):
1327 return cls.from_text(s)
1328
1329 def asText(self, includeSecrets=False):
1330 return self.to_text(with_password=includeSecrets)
1331
1332 def __dir__(self):
1333 try:
1334 ret = object.__dir__(self)
1335 except AttributeError:
1336 # object has no __dir__ on Python 2, so build the list by hand
1337 ret = dir(self.__class__) + list(self.__dict__.keys())
1338 ret = sorted(set(ret) - set(['fromText', 'asURI', 'asIRI', 'asText']))
1339 return ret
1340
1341 # # End Twisted Compat Code
1342
1343 def add(self, name, value=None):
1344 """Make a new :class:`URL` instance with a given query argument,
1345 *name*, added to it with the value *value*, like so::
1346
1347 >>> URL.from_text(u'https://example.com/?x=y').add(u'x')
1348 URL.from_text(u'https://example.com/?x=y&x')
1349 >>> URL.from_text(u'https://example.com/?x=y').add(u'x', u'z')
1350 URL.from_text(u'https://example.com/?x=y&x=z')
1351
1352 Args:
1353 name (unicode): The name of the query parameter to add. The
1354 part before the ``=``.
1355 value (unicode): The value of the query parameter to add. The
1356 part after the ``=``. Defaults to ``None``, meaning no
1357 value.
1358
1359 Returns:
1360 URL: A new :class:`URL` instance with the parameter added.
1361 """
1362 return self.replace(query=self.query + ((name, value),))
1363
1364 def set(self, name, value=None):
1365 """Make a new :class:`URL` instance with the query parameter *name*
1366 set to *value*. All existing occurrences, if any, are replaced
1367 by the single name-value pair.
1368
1369 >>> URL.from_text(u'https://example.com/?x=y').set(u'x')
1370 URL.from_text(u'https://example.com/?x')
1371 >>> URL.from_text(u'https://example.com/?x=y').set(u'x', u'z')
1372 URL.from_text(u'https://example.com/?x=z')
1373
1374 Args:
1375 name (unicode): The name of the query parameter to set. The
1376 part before the ``=``.
1377 value (unicode): The value of the query parameter to set. The
1378 part after the ``=``. Defaults to ``None``, meaning no
1379 value.
1380
1381 Returns:
1382 URL: A new :class:`URL` instance with the parameter set.
1383 """
1384 # Preserve the original position of the query key in the list
1385 q = [(k, v) for (k, v) in self.query if k != name]
1386 idx = next((i for (i, (k, v)) in enumerate(self.query)
1387 if k == name), len(q))
1388 q[idx:idx] = [(name, value)]
1389 return self.replace(query=q)
1390
1391 def get(self, name):
1392 """Get a list of values for the given query parameter, *name*::
1393
1394 >>> url = URL.from_text(u'?x=1&x=2')
1395 >>> url.get('x')
1396 [u'1', u'2']
1397 >>> url.get('y')
1398 []
1399
1400 If the given *name* is not set, an empty list is returned. A
1401 list is always returned, and this method raises no exceptions.
1402
1403 Args:
1404 name (unicode): The name of the query parameter to get.
1405
1406 Returns:
1407 list: A list of all the values associated with the key, in
1408 string form.
1409
1410 """
1411 return [value for (key, value) in self.query if name == key]
1412
1413 def remove(self, name):
1414 """Make a new :class:`URL` instance with all occurrences of the query
1415 parameter *name* removed. No exception is raised if the
1416 parameter is not already set.
1417
1418 Args:
1419 name (unicode): The name of the query parameter to remove.
1420
1421 Returns:
1422 URL: A new :class:`URL` instance with the parameter removed.
1423
1424 """
1425 return self.replace(query=((k, v) for (k, v) in self.query
1426 if k != name))
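The query helpers above all treat the query as an immutable sequence of `(name, value)` pairs and return a modified copy. A minimal sketch of those semantics on plain tuples follows. The function names (`q_add`, `q_get`, `q_set`, `q_remove`) are hypothetical, and the choice to append a name that is not yet present in `q_set` is an assumption of this sketch rather than a statement about the library.

```python
def q_add(query, name, value=None):
    """Append a pair, keeping any existing pairs with the same name."""
    return tuple(query) + ((name, value),)


def q_get(query, name):
    """Return every value stored under name, in order (possibly empty)."""
    return [v for (k, v) in query if k == name]


def q_remove(query, name):
    """Drop every pair stored under name."""
    return tuple((k, v) for (k, v) in query if k != name)


def q_set(query, name, value=None):
    """Replace all pairs under name with one pair at the first position."""
    kept = [(k, v) for (k, v) in query if k != name]
    # Index of the first occurrence; append when the name is absent.
    idx = next((i for i, (k, _) in enumerate(query) if k == name), len(kept))
    kept.insert(idx, (name, value))
    return tuple(kept)


query = (("x", "1"), ("y", None), ("x", "2"))
print(q_get(query, "x"))       # ['1', '2']
print(q_set(query, "x", "9"))  # (('x', '9'), ('y', None))
```

Because every pair before the first occurrence of the name has a different name, that index is the same in the original and in the filtered list.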
(New empty file)
0
1
2 from unittest import TestCase
3
4
5 class HyperlinkTestCase(TestCase):
6 """This type mostly exists to provide a backwards-compatible
7 assertRaises method for Python 2.6 testing.
8 """
9 def assertRaises(self, excClass, callableObj=None, *args, **kwargs):
10 """Fail unless an exception of class excClass is raised
11 by callableObj when invoked with arguments args and keyword
12 arguments kwargs. If a different type of exception is
13 raised, it will not be caught, and the test case will be
14 deemed to have suffered an error, exactly as for an
15 unexpected exception.
16
17 If called with callableObj omitted or None, will return a
18 context object used like this::
19
20 with self.assertRaises(SomeException):
21 do_something()
22
23 The context manager keeps a reference to the exception as
24 the 'exception' attribute. This allows you to inspect the
25 exception after the assertion::
26
27 with self.assertRaises(SomeException) as cm:
28 do_something()
29 the_exception = cm.exception
30 self.assertEqual(the_exception.error_code, 3)
31 """
32 context = _AssertRaisesContext(excClass, self)
33 if callableObj is None:
34 return context
35 with context:
36 callableObj(*args, **kwargs)
37
38
39 class _AssertRaisesContext(object):
40 "A context manager used to implement HyperlinkTestCase.assertRaises."
41
42 def __init__(self, expected, test_case):
43 self.expected = expected
44 self.failureException = test_case.failureException
45
46 def __enter__(self):
47 return self
48
49 def __exit__(self, exc_type, exc_value, tb):
50 if exc_type is None:
51 exc_name = self.expected.__name__
52 raise self.failureException("%s not raised" % (exc_name,))
53 if not issubclass(exc_type, self.expected):
54 # let unexpected exceptions pass through
55 return False
56 self.exception = exc_value # store for later retrieval
57 return True
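The behaviour of `_AssertRaisesContext` rests on the context-manager protocol: a truthy return value from `__exit__` suppresses the exception, while a falsy one lets it propagate. Below is a small stand-alone sketch of the three outcomes. The class name `ExpectRaises` is hypothetical and it raises a plain `AssertionError` instead of using a test case's `failureException`.

```python
class ExpectRaises(object):
    """Hypothetical stand-alone analogue of the context manager above."""

    def __init__(self, expected):
        self.expected = expected
        self.exception = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        if exc_type is None:
            # The block finished without raising anything: report a failure.
            raise AssertionError("%s not raised" % (self.expected.__name__,))
        if not issubclass(exc_type, self.expected):
            # Falsy return value: the unexpected exception keeps propagating.
            return False
        self.exception = exc_value  # keep it for later inspection
        return True  # truthy return value: the exception is suppressed


# 1. The expected exception is swallowed and stored on the context object.
with ExpectRaises(KeyError) as cm:
    {}["missing"]
print(repr(cm.exception))

# 2. An unexpected exception passes straight through.
propagated = None
try:
    with ExpectRaises(KeyError):
        int("not a number")
except ValueError as error:
    propagated = error
print(repr(propagated))

# 3. No exception at all is turned into an AssertionError.
failure = None
try:
    with ExpectRaises(KeyError):
        pass
except AssertionError as error:
    failure = error
print(failure)
```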
0 """
1 Tests for hyperlink.test.common
2 """
3 from unittest import TestCase
4 from .common import HyperlinkTestCase
5
6
7 class _ExpectedException(Exception):
8 """An exception used to test HyperlinkTestCase.assertRaises.
9
10 """
11
12
13 class _UnexpectedException(Exception):
14 """An exception used to test HyperlinkTestCase.assertRaises.
15
16 """
17
18
19 class TestHyperlink(TestCase):
20 """Tests for HyperlinkTestCase"""
21
22 def setUp(self):
23 self.hyperlink_test = HyperlinkTestCase("run")
24
25 def test_assertRaisesWithCallable(self):
26 """HyperlinkTestCase.assertRaises does not raise an AssertionError
27 when given a callable that, when called with the provided
28 arguments, raises the expected exception.
29
30 """
31 called_with = []
32
33 def raisesExpected(*args, **kwargs):
34 called_with.append((args, kwargs))
35 raise _ExpectedException
36
37 self.hyperlink_test.assertRaises(_ExpectedException,
38 raisesExpected, 1, keyword=True)
39 self.assertEqual(called_with, [((1,), {"keyword": True})])
40
41 def test_assertRaisesWithCallableUnexpectedException(self):
42 """When given a callable that raises an unexpected exception,
43 HyperlinkTestCase.assertRaises raises that exception.
44
45 """
46
47 def doesNotRaiseExpected(*args, **kwargs):
48 raise _UnexpectedException
49
50 try:
51 self.hyperlink_test.assertRaises(_ExpectedException,
52 doesNotRaiseExpected)
53 except _UnexpectedException:
54 pass
55
56 def test_assertRaisesWithCallableDoesNotRaise(self):
57 """HyperlinkTestCase.assertRaises raises an AssertionError when given
58 a callable that, when called, does not raise any exception.
59
60 """
61
62 def doesNotRaise(*args, **kwargs):
63 return True
64
65 try:
66 self.hyperlink_test.assertRaises(_ExpectedException,
67 doesNotRaise)
68 except AssertionError:
69 pass
70
71 def test_assertRaisesContextManager(self):
72 """HyperlinkTestCase.assertRaises does not raise an AssertionError
73 when used as a context manager with a suite that raises the
74 expected exception. The context manager stores the exception
75 instance under its `exception` instance variable.
76
77 """
78 with self.hyperlink_test.assertRaises(_ExpectedException) as cm:
79 raise _ExpectedException
80
81 self.assertTrue(isinstance(cm.exception, _ExpectedException))
82
83 def test_assertRaisesContextManagerUnexpectedException(self):
84 """When used as a context manager with a block that raises an
85 unexpected exception, HyperlinkTestCase.assertRaises raises
86 that unexpected exception.
87
88 """
89 try:
90 with self.hyperlink_test.assertRaises(_ExpectedException):
91 raise _UnexpectedException
92 except _UnexpectedException:
93 pass
94
95 def test_assertRaisesContextManagerDoesNotRaise(self):
96 """HyperlinkTestCase.assertRaises raises an AssertionError when used
97 as a context manager with a block that does not raise any
98 exception.
99
100 """
101 try:
102 with self.hyperlink_test.assertRaises(_ExpectedException):
103 pass
104 except AssertionError:
105 pass
0 # -*- coding: utf-8 -*-
1 from __future__ import unicode_literals
2
3
4 from .. import _url
5 from .common import HyperlinkTestCase
6 from .._url import register_scheme, URL
7
8
9 class TestSchemeRegistration(HyperlinkTestCase):
10
11 def setUp(self):
12 self._orig_scheme_port_map = dict(_url.SCHEME_PORT_MAP)
13 self._orig_no_netloc_schemes = set(_url.NO_NETLOC_SCHEMES)
14
15 def tearDown(self):
16 _url.SCHEME_PORT_MAP = self._orig_scheme_port_map
17 _url.NO_NETLOC_SCHEMES = self._orig_no_netloc_schemes
18
19 def test_register_scheme_basic(self):
20 register_scheme('deltron', uses_netloc=True, default_port=3030)
21
22 u1 = URL.from_text('deltron://example.com')
23 assert u1.scheme == 'deltron'
24 assert u1.port == 3030
25 assert u1.uses_netloc is True
26
27 # test netloc works even when the original gives no indication
28 u2 = URL.from_text('deltron:')
29 u2 = u2.replace(host='example.com')
30 assert u2.to_text() == 'deltron://example.com'
31
32 # test default port means no emission
33 u3 = URL.from_text('deltron://example.com:3030')
34 assert u3.to_text() == 'deltron://example.com'
35
36 register_scheme('nonetron', default_port=3031)
37 u4 = URL(scheme='nonetron')
38 u4 = u4.replace(host='example.com')
39 assert u4.to_text() == 'nonetron://example.com'
40
41 def test_register_no_netloc_scheme(self):
42 register_scheme('noloctron', uses_netloc=False)
43 u4 = URL(scheme='noloctron')
44 u4 = u4.replace(path=("example", "path"))
45 assert u4.to_text() == 'noloctron:example/path'
46
47 def test_register_no_netloc_with_port(self):
48 with self.assertRaises(ValueError):
49 register_scheme('badnetlocless', uses_netloc=False, default_port=7)
50
51 def test_invalid_uses_netloc(self):
52 with self.assertRaises(ValueError):
53 register_scheme('badnetloc', uses_netloc=None)
54 with self.assertRaises(ValueError):
55 register_scheme('badnetloc', uses_netloc=object())
56
57 def test_register_invalid_uses_netloc(self):
58 with self.assertRaises(ValueError):
59 register_scheme('lol', uses_netloc=lambda: 'nope')
60
61 def test_register_invalid_port(self):
62 with self.assertRaises(ValueError):
63 register_scheme('nope', default_port=lambda: 'lol')
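The tests above pin down which argument combinations a scheme registry should reject. The sketch below restates those rules as a stand-alone function. It is inferred from the tests only and is not the library's implementation: the names `register`, `PORTS`, and `NO_NETLOC` and the error messages are all hypothetical.

```python
PORTS = {}         # hypothetical registry: scheme -> default port (or None)
NO_NETLOC = set()  # hypothetical registry: schemes without an authority part


def register(scheme, uses_netloc=True, default_port=None):
    """Record a scheme, rejecting the combinations the tests above reject."""
    scheme = scheme.lower()
    if default_port is not None:
        try:
            default_port = int(default_port)
        except (TypeError, ValueError):
            raise ValueError("default_port expected an integer or None, "
                             "not %r" % (default_port,))
    if uses_netloc is True:
        PORTS[scheme] = default_port
    elif uses_netloc is False:
        if default_port is not None:
            raise ValueError("a scheme without a netloc cannot have a port")
        NO_NETLOC.add(scheme)
    else:
        # None, object(), callables, ... are all rejected.
        raise ValueError("uses_netloc expected True or False, "
                         "not %r" % (uses_netloc,))


register("deltron", uses_netloc=True, default_port=3030)
register("noloctron", uses_netloc=False)

rejected = []
for kwargs in ({"uses_netloc": None},
               {"uses_netloc": False, "default_port": 7},
               {"default_port": lambda: "lol"}):
    try:
        register("bad", **kwargs)
    except ValueError as error:
        rejected.append(str(error))

print(PORTS, NO_NETLOC, len(rejected))
```

Because such registries are module-level state, the `setUp`/`tearDown` pair in the test class above snapshots and restores them so that one test's registrations do not leak into the next.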
0 # -*- coding: utf-8 -*-
1
2 # Copyright (c) Twisted Matrix Laboratories.
3 # See LICENSE for details.
4
5 from __future__ import unicode_literals
6
7 import socket
8
9 from .common import HyperlinkTestCase
10 from .. import URL, URLParseError
11 # automatically import the py27 windows implementation when appropriate
12 from .. import _url
13 from .._url import inet_pton, SCHEME_PORT_MAP, parse_host
14
15 unicode = type(u'')
16
17
18 BASIC_URL = "http://www.foo.com/a/nice/path/?zot=23&zut"
19
20 # Examples from RFC 3986 section 5.4, Reference Resolution Examples
21 relativeLinkBaseForRFC3986 = 'http://a/b/c/d;p?q'
22 relativeLinkTestsForRFC3986 = [
23 # "Normal"
24 # ('g:h', 'g:h'), # can't click on a scheme-having url without an abs path
25 ('g', 'http://a/b/c/g'),
26 ('./g', 'http://a/b/c/g'),
27 ('g/', 'http://a/b/c/g/'),
28 ('/g', 'http://a/g'),
29 ('//g', 'http://g'),
30 ('?y', 'http://a/b/c/d;p?y'),
31 ('g?y', 'http://a/b/c/g?y'),
32 ('#s', 'http://a/b/c/d;p?q#s'),
33 ('g#s', 'http://a/b/c/g#s'),
34 ('g?y#s', 'http://a/b/c/g?y#s'),
35 (';x', 'http://a/b/c/;x'),
36 ('g;x', 'http://a/b/c/g;x'),
37 ('g;x?y#s', 'http://a/b/c/g;x?y#s'),
38 ('', 'http://a/b/c/d;p?q'),
39 ('.', 'http://a/b/c/'),
40 ('./', 'http://a/b/c/'),
41 ('..', 'http://a/b/'),
42 ('../', 'http://a/b/'),
43 ('../g', 'http://a/b/g'),
44 ('../..', 'http://a/'),
45 ('../../', 'http://a/'),
46 ('../../g', 'http://a/g'),
47
48 # Abnormal examples
49 # ".." cannot be used to change the authority component of a URI.
50 ('../../../g', 'http://a/g'),
51 ('../../../../g', 'http://a/g'),
52
53 # Only include "." and ".." when they are only part of a larger segment,
54 # not by themselves.
55 ('/./g', 'http://a/g'),
56 ('/../g', 'http://a/g'),
57 ('g.', 'http://a/b/c/g.'),
58 ('.g', 'http://a/b/c/.g'),
59 ('g..', 'http://a/b/c/g..'),
60 ('..g', 'http://a/b/c/..g'),
61 # Unnecessary or nonsensical forms of "." and "..".
62 ('./../g', 'http://a/b/g'),
63 ('./g/.', 'http://a/b/c/g/'),
64 ('g/./h', 'http://a/b/c/g/h'),
65 ('g/../h', 'http://a/b/c/h'),
66 ('g;x=1/./y', 'http://a/b/c/g;x=1/y'),
67 ('g;x=1/../y', 'http://a/b/c/y'),
68 # Separating the reference's query and fragment components from the path.
69 ('g?y/./x', 'http://a/b/c/g?y/./x'),
70 ('g?y/../x', 'http://a/b/c/g?y/../x'),
71 ('g#s/./x', 'http://a/b/c/g#s/./x'),
72 ('g#s/../x', 'http://a/b/c/g#s/../x')
73 ]
74
75
76 ROUNDTRIP_TESTS = (
77 "http://localhost",
78 "http://localhost/",
79 "http://127.0.0.1/",
80 "http://[::127.0.0.1]/",
81 "http://[::1]/",
82 "http://localhost/foo",
83 "http://localhost/foo/",
84 "http://localhost/foo!!bar/",
85 "http://localhost/foo%20bar/",
86 "http://localhost/foo%2Fbar/",
87 "http://localhost/foo?n",
88 "http://localhost/foo?n=v",
89 "http://localhost/foo?n=/a/b",
90 "http://example.com/foo!@$bar?b!@z=123",
91 "http://localhost/asd?a=asd%20sdf/345",
92 "http://(%2525)/(%2525)?(%2525)&(%2525)=(%2525)#(%2525)",
93 "http://(%C3%A9)/(%C3%A9)?(%C3%A9)&(%C3%A9)=(%C3%A9)#(%C3%A9)",
94 "?sslrootcert=/Users/glyph/Downloads/rds-ca-2015-root.pem&sslmode=verify",
95
96 # from boltons.urlutils' tests
97
98 'http://googlewebsite.com/e-shops.aspx',
99 'http://example.com:8080/search?q=123&business=Nothing%20Special',
100 'http://hatnote.com:9000/?arg=1&arg=2&arg=3',
101 'https://xn--bcher-kva.ch',
102 'http://xn--ggbla1c4e.xn--ngbc5azd/',
103 'http://tools.ietf.org/html/rfc3986#section-3.4',
104 # 'http://wiki:pedia@hatnote.com',
105 'ftp://ftp.rfc-editor.org/in-notes/tar/RFCs0001-0500.tar.gz',
106 'http://[1080:0:0:0:8:800:200C:417A]/index.html',
107 'ssh://192.0.2.16:2222/',
108 'https://[::101.45.75.219]:80/?hi=bye',
109 'ldap://[::192.9.5.5]/dc=example,dc=com??sub?(sn=Jensen)',
110 'mailto:me@example.com?to=me@example.com&body=hi%20http://wikipedia.org',
111 'news:alt.rec.motorcycle',
112 'tel:+1-800-867-5309',
113 'urn:oasis:member:A00024:x',
114 ('magnet:?xt=urn:btih:1a42b9e04e122b97a5254e3df77ab3c4b7da725f&dn=Puppy%'
115 '20Linux%20precise-5.7.1.iso&tr=udp://tracker.openbittorrent.com:80&'
116 'tr=udp://tracker.publicbt.com:80&tr=udp://tracker.istole.it:6969&'
117 'tr=udp://tracker.ccc.de:80&tr=udp://open.demonii.com:1337'),
118
119 # percent-encoded delimiters in percent-encodable fields
120
121 'https://%3A@example.com/', # colon in username
122 'https://%40@example.com/', # at sign in username
123 'https://%2f@example.com/', # slash in username
124 'https://a:%3a@example.com/', # colon in password
125 'https://a:%40@example.com/', # at sign in password
126 'https://a:%2f@example.com/', # slash in password
127 'https://a:%3f@example.com/', # question mark in password
128 'https://example.com/%2F/', # slash in path
129 'https://example.com/%3F/', # question mark in path
130 'https://example.com/%23/', # hash in path
131 'https://example.com/?%23=b', # hash in query param name
132 'https://example.com/?%3D=b', # equals in query param name
133 'https://example.com/?%26=b', # ampersand in query param name
134 'https://example.com/?a=%23', # hash in query param value
135 'https://example.com/?a=%26', # ampersand in query param value
136 'https://example.com/?a=%3D', # equals in query param value
137 # double-encoded percent sign in all percent-encodable positions:
138 "http://(%2525):(%2525)@example.com/(%2525)/?(%2525)=(%2525)#(%2525)",
139 # colon in first part of schemeless relative url
140 'first_seg_rel_path__colon%3Anotok/second_seg__colon%3Aok',
141 )
142
143
144 class TestURL(HyperlinkTestCase):
145 """
146 Tests for L{URL}.
147 """
148
149 def assertUnicoded(self, u):
150 """
151 The given L{URL}'s components should be L{unicode}.
152
153 @param u: The L{URL} to test.
154 """
155 self.assertTrue(isinstance(u.scheme, unicode) or u.scheme is None,
156 repr(u))
157 self.assertTrue(isinstance(u.host, unicode) or u.host is None,
158 repr(u))
159 for seg in u.path:
160 self.assertEqual(type(seg), unicode, repr(u))
161 for (k, v) in u.query:
162 self.assertEqual(type(k), unicode, repr(u))
163 self.assertTrue(v is None or isinstance(v, unicode), repr(u))
164 self.assertEqual(type(u.fragment), unicode, repr(u))
165
166 def assertURL(self, u, scheme, host, path, query,
167 fragment, port, userinfo=''):
168 """
169 The given L{URL} should have the given components.
170
171 @param u: The actual L{URL} to examine.
172
173 @param scheme: The expected scheme.
174
175 @param host: The expected host.
176
177 @param path: The expected path.
178
179 @param query: The expected query.
180
181 @param fragment: The expected fragment.
182
183 @param port: The expected port.
184
185 @param userinfo: The expected userinfo.
186 """
187 actual = (u.scheme, u.host, u.path, u.query,
188 u.fragment, u.port, u.userinfo)
189 expected = (scheme, host, tuple(path), tuple(query),
190 fragment, port, userinfo)
191 self.assertEqual(actual, expected)
192
193 def test_initDefaults(self):
194 """
195 L{URL} should have appropriate default values.
196 """
197 def check(u):
198 self.assertUnicoded(u)
199 self.assertURL(u, 'http', '', [], [], '', 80, '')
200
201 check(URL('http', ''))
202 check(URL('http', '', [], []))
203 check(URL('http', '', [], [], ''))
204
205 def test_init(self):
206 """
207 L{URL} should accept L{unicode} parameters.
208 """
209 u = URL('s', 'h', ['p'], [('k', 'v'), ('k', None)], 'f')
210 self.assertUnicoded(u)
211 self.assertURL(u, 's', 'h', ['p'], [('k', 'v'), ('k', None)],
212 'f', None)
213
214 self.assertURL(URL('http', '\xe0', ['\xe9'],
215 [('\u03bb', '\u03c0')], '\u22a5'),
216 'http', '\xe0', ['\xe9'],
217 [('\u03bb', '\u03c0')], '\u22a5', 80)
218
219 def test_initPercent(self):
220 """
221 L{URL} should accept (and not interpret) percent characters.
222 """
223 u = URL('s', '%68', ['%70'], [('%6B', '%76'), ('%6B', None)],
224 '%66')
225 self.assertUnicoded(u)
226 self.assertURL(u,
227 's', '%68', ['%70'],
228 [('%6B', '%76'), ('%6B', None)],
229 '%66', None)
230
231 def test_repr(self):
232 """
233 L{URL.__repr__} will display the canonical form of the URL, wrapped in
234 a L{URL.from_text} invocation, so that it is C{eval}-able but still easy
235 to read.
236 """
237 self.assertEqual(
238 repr(URL(scheme='http', host='foo', path=['bar'],
239 query=[('baz', None), ('k', 'v')],
240 fragment='frob')),
241 "URL.from_text(%s)" % (repr(u"http://foo/bar?baz&k=v#frob"),)
242 )
243
244 def test_from_text(self):
245 """
246 Round-tripping L{URL.from_text} with C{str} results in an equivalent
247 URL.
248 """
249 urlpath = URL.from_text(BASIC_URL)
250 self.assertEqual(BASIC_URL, urlpath.to_text())
251
252 def test_roundtrip(self):
253 """
254 L{URL.to_text} should invert L{URL.from_text}.
255 """
256 for test in ROUNDTRIP_TESTS:
257 result = URL.from_text(test).to_text(with_password=True)
258 self.assertEqual(test, result)
259
260 def test_roundtrip_double_iri(self):
261 for test in ROUNDTRIP_TESTS:
262 url = URL.from_text(test)
263 iri = url.to_iri()
264 double_iri = iri.to_iri()
265 assert iri == double_iri
266
267 iri_text = iri.to_text(with_password=True)
268 double_iri_text = double_iri.to_text(with_password=True)
269 assert iri_text == double_iri_text
270 return
271
272 def test_equality(self):
273 """
274 Two URLs decoded using L{URL.from_text} will be equal (C{==}) if they
275 decoded the same URL string, and unequal (C{!=}) if they decoded different
276 strings.
277 """
278 urlpath = URL.from_text(BASIC_URL)
279 self.assertEqual(urlpath, URL.from_text(BASIC_URL))
280 self.assertNotEqual(
281 urlpath,
282 URL.from_text('ftp://www.anotherinvaliddomain.com/'
283 'foo/bar/baz/?zot=21&zut')
284 )
285
286 def test_fragmentEquality(self):
287 """
288 A URL created with the empty string for a fragment compares equal
289 to a URL created with an unspecified fragment.
290 """
291 self.assertEqual(URL(fragment=''), URL())
292 self.assertEqual(URL.from_text(u"http://localhost/#"),
293 URL.from_text(u"http://localhost/"))
294
295 def test_child(self):
296 """
297 L{URL.child} appends a new path segment, but does not affect the query
298 or fragment.
299 """
300 urlpath = URL.from_text(BASIC_URL)
301 self.assertEqual("http://www.foo.com/a/nice/path/gong?zot=23&zut",
302 urlpath.child('gong').to_text())
303 self.assertEqual("http://www.foo.com/a/nice/path/gong%2F?zot=23&zut",
304 urlpath.child('gong/').to_text())
305 self.assertEqual(
306 "http://www.foo.com/a/nice/path/gong%2Fdouble?zot=23&zut",
307 urlpath.child('gong/double').to_text()
308 )
309 self.assertEqual(
310 "http://www.foo.com/a/nice/path/gong%2Fdouble%2F?zot=23&zut",
311 urlpath.child('gong/double/').to_text()
312 )
313
314 def test_multiChild(self):
315 """
316 L{URL.child} receives multiple segments as C{*args} and appends each in
317 turn.
318 """
319 url = URL.from_text('http://example.com/a/b')
320 self.assertEqual(url.child('c', 'd', 'e').to_text(),
321 'http://example.com/a/b/c/d/e')
322
323 def test_childInitRoot(self):
324 """
325 L{URL.child} of a L{URL} without a path produces a L{URL} with a single
326 path segment.
327 """
328 childURL = URL(host=u"www.foo.com").child(u"c")
329 self.assertTrue(childURL.rooted)
330 self.assertEqual("http://www.foo.com/c", childURL.to_text())
331
332 def test_sibling(self):
333 """
334 L{URL.sibling} of a L{URL} replaces the last path segment, but does not
335 affect the query or fragment.
336 """
337 urlpath = URL.from_text(BASIC_URL)
338 self.assertEqual(
339 "http://www.foo.com/a/nice/path/sister?zot=23&zut",
340 urlpath.sibling('sister').to_text()
341 )
342 # Use a URL without a trailing '/' to check child removal.
343 url_text = "http://www.foo.com/a/nice/path?zot=23&zut"
344 urlpath = URL.from_text(url_text)
345 self.assertEqual(
346 "http://www.foo.com/a/nice/sister?zot=23&zut",
347 urlpath.sibling('sister').to_text()
348 )
349
350 def test_click(self):
351 """
352 L{URL.click} interprets the given string as a relative URI-reference
353 and returns a new L{URL} interpreting C{self} as the base absolute URI.
354 """
355 urlpath = URL.from_text(BASIC_URL)
356 # A null uri should be valid (return here).
357 self.assertEqual("http://www.foo.com/a/nice/path/?zot=23&zut",
358 urlpath.click("").to_text())
359 # A simple relative path removes the query.
360 self.assertEqual("http://www.foo.com/a/nice/path/click",
361 urlpath.click("click").to_text())
362 # An absolute path replaces the path and query.
363 self.assertEqual("http://www.foo.com/click",
364 urlpath.click("/click").to_text())
365 # Replace just the query.
366 self.assertEqual("http://www.foo.com/a/nice/path/?burp",
367 urlpath.click("?burp").to_text())
368 # One full url to another should not generate '//' between authority
369 # and path.
370 self.assertTrue("//foobar" not in
371 urlpath.click('http://www.foo.com/foobar').to_text())
372
373 # From a url with no query clicking a url with a query, the query
374 # should be handled properly.
375 u = URL.from_text('http://www.foo.com/me/noquery')
376 self.assertEqual('http://www.foo.com/me/17?spam=158',
377 u.click('/me/17?spam=158').to_text())
378
379 # Check that everything from the path onward is removed when the click
380 # link has no path.
381 u = URL.from_text('http://localhost/foo?abc=def')
382 self.assertEqual(u.click('http://www.python.org').to_text(),
383 'http://www.python.org')
384
385 # https://twistedmatrix.com/trac/ticket/8184
386 u = URL.from_text('http://hatnote.com/a/b/../c/./d/e/..')
387 res = 'http://hatnote.com/a/c/d/'
388 self.assertEqual(u.click('').to_text(), res)
389
390 # test click default arg is same as empty string above
391 self.assertEqual(u.click().to_text(), res)
392
393 # test click on a URL instance
394 u = URL.fromText('http://localhost/foo/?abc=def')
395 u2 = URL.from_text('bar')
396 u3 = u.click(u2)
397 self.assertEqual(u3.to_text(), 'http://localhost/foo/bar')
398
399 def test_clickRFC3986(self):
400 """
401 L{URL.click} should correctly resolve the examples in RFC 3986.
402 """
403 base = URL.from_text(relativeLinkBaseForRFC3986)
404 for (ref, expected) in relativeLinkTestsForRFC3986:
405 self.assertEqual(base.click(ref).to_text(), expected)
406
407 def test_clickSchemeRelPath(self):
408 """
409 L{URL.click} should not accept schemes with relative paths.
410 """
411 base = URL.from_text(relativeLinkBaseForRFC3986)
412 self.assertRaises(NotImplementedError, base.click, 'g:h')
413 self.assertRaises(NotImplementedError, base.click, 'http:h')
414
415 def test_cloneUnchanged(self):
416 """
417 Verify that L{URL.replace} doesn't change any of the arguments it
418 is passed.
419 """
420 urlpath = URL.from_text('https://x:1/y?z=1#A')
421 self.assertEqual(urlpath.replace(urlpath.scheme,
422 urlpath.host,
423 urlpath.path,
424 urlpath.query,
425 urlpath.fragment,
426 urlpath.port),
427 urlpath)
428 self.assertEqual(urlpath.replace(), urlpath)
429
430 def test_clickCollapse(self):
431 """
432 L{URL.click} collapses C{.} and C{..} according to RFC 3986 section
433 5.2.4.
434 """
435 tests = [
436 ['http://localhost/', '.', 'http://localhost/'],
437 ['http://localhost/', '..', 'http://localhost/'],
438 ['http://localhost/a/b/c', '.', 'http://localhost/a/b/'],
439 ['http://localhost/a/b/c', '..', 'http://localhost/a/'],
440 ['http://localhost/a/b/c', './d/e', 'http://localhost/a/b/d/e'],
441 ['http://localhost/a/b/c', '../d/e', 'http://localhost/a/d/e'],
442 ['http://localhost/a/b/c', '/./d/e', 'http://localhost/d/e'],
443 ['http://localhost/a/b/c', '/../d/e', 'http://localhost/d/e'],
444 ['http://localhost/a/b/c/', '../../d/e/',
445 'http://localhost/a/d/e/'],
446 ['http://localhost/a/./c', '../d/e', 'http://localhost/d/e'],
447 ['http://localhost/a/./c/', '../d/e', 'http://localhost/a/d/e'],
448 ['http://localhost/a/b/c/d', './e/../f/../g',
449 'http://localhost/a/b/c/g'],
450 ['http://localhost/a/b/c', 'd//e', 'http://localhost/a/b/d//e'],
451 ]
452 for start, click, expected in tests:
453 actual = URL.from_text(start).click(click).to_text()
454 self.assertEqual(
455 actual,
456 expected,
457 "{start}.click({click}) => {actual} not {expected}".format(
458 start=start,
459 click=repr(click),
460 actual=actual,
461 expected=expected,
462 )
463 )
464
465 def test_queryAdd(self):
466 """
467 L{URL.add} adds query parameters.
468 """
469 self.assertEqual(
470 "http://www.foo.com/a/nice/path/?foo=bar",
471 URL.from_text("http://www.foo.com/a/nice/path/")
472 .add(u"foo", u"bar").to_text())
473 self.assertEqual(
474 "http://www.foo.com/?foo=bar",
475 URL(host=u"www.foo.com").add(u"foo", u"bar")
476 .to_text())
477 urlpath = URL.from_text(BASIC_URL)
478 self.assertEqual(
479 "http://www.foo.com/a/nice/path/?zot=23&zut&burp",
480 urlpath.add(u"burp").to_text())
481 self.assertEqual(
482 "http://www.foo.com/a/nice/path/?zot=23&zut&burp=xxx",
483 urlpath.add(u"burp", u"xxx").to_text())
484 self.assertEqual(
485 "http://www.foo.com/a/nice/path/?zot=23&zut&burp=xxx&zing",
486 urlpath.add(u"burp", u"xxx").add(u"zing").to_text())
487 # Note the inversion!
488 self.assertEqual(
489 "http://www.foo.com/a/nice/path/?zot=23&zut&zing&burp=xxx",
490 urlpath.add(u"zing").add(u"burp", u"xxx").to_text())
491 # Note the two values for the same name.
492 self.assertEqual(
493 "http://www.foo.com/a/nice/path/?zot=23&zut&burp=xxx&zot=32",
494 urlpath.add(u"burp", u"xxx").add(u"zot", '32')
495 .to_text())
496
497 def test_querySet(self):
498 """
499 L{URL.set} replaces query parameters by name.
500 """
501 urlpath = URL.from_text(BASIC_URL)
502 self.assertEqual(
503 "http://www.foo.com/a/nice/path/?zot=32&zut",
504 urlpath.set(u"zot", '32').to_text())
505 # Replace name without value with name/value and vice-versa.
506 self.assertEqual(
507 "http://www.foo.com/a/nice/path/?zot&zut=itworked",
508 urlpath.set(u"zot").set(u"zut", u"itworked").to_text()
509 )
510 # Q: what happens when the query has two values and we replace?
511 # A: we replace both values with a single one
512 self.assertEqual(
513 "http://www.foo.com/a/nice/path/?zot=32&zut",
514 urlpath.add(u"zot", u"xxx").set(u"zot", '32').to_text()
515 )
516
517 def test_queryRemove(self):
518 """
519 L{URL.remove} removes all instances of a query parameter.
520 """
521 url = URL.from_text(u"https://example.com/a/b/?foo=1&bar=2&foo=3")
522 self.assertEqual(
523 url.remove(u"foo"),
524 URL.from_text(u"https://example.com/a/b/?bar=2")
525 )
526
527 def test_parseEqualSignInParamValue(self):
528 """
529 Every C{=}-sign after the first in a query parameter is simply included
530 in the value of the parameter.
531 """
532 u = URL.from_text('http://localhost/?=x=x=x')
533 self.assertEqual(u.get(''), ['x=x=x'])
534 self.assertEqual(u.to_text(), 'http://localhost/?=x%3Dx%3Dx')
535 u = URL.from_text('http://localhost/?foo=x=x=x&bar=y')
536 self.assertEqual(u.query, (('foo', 'x=x=x'), ('bar', 'y')))
537 self.assertEqual(u.to_text(), 'http://localhost/?foo=x%3Dx%3Dx&bar=y')
538
539 def test_empty(self):
540 """
541 An empty L{URL} should serialize as the empty string.
542 """
543 self.assertEqual(URL().to_text(), '')
544
545 def test_justQueryText(self):
546 """
547 An L{URL} with query text should serialize as just query text.
548 """
549 u = URL(query=[(u"hello", u"world")])
550 self.assertEqual(u.to_text(), '?hello=world')
551
552 def test_identicalEqual(self):
553 """
554 L{URL} compares equal to itself.
555 """
556 u = URL.from_text('http://localhost/')
557 self.assertEqual(u, u)
558
559 def test_similarEqual(self):
560 """
561 URLs with equivalent components should compare equal.
562 """
563 u1 = URL.from_text('http://u@localhost:8080/p/a/t/h?q=p#f')
564 u2 = URL.from_text('http://u@localhost:8080/p/a/t/h?q=p#f')
565 self.assertEqual(u1, u2)
566
567 def test_differentNotEqual(self):
568 """
569 L{URL}s that refer to different resources are both unequal (C{!=}) and
570 also not equal (not C{==}).
571 """
572 u1 = URL.from_text('http://localhost/a')
573 u2 = URL.from_text('http://localhost/b')
574 self.assertFalse(u1 == u2, "%r != %r" % (u1, u2))
575 self.assertNotEqual(u1, u2)
576
577 def test_otherTypesNotEqual(self):
578 """
579 L{URL} is not equal (C{==}) to other types.
580 """
581 u = URL.from_text('http://localhost/')
582 self.assertFalse(u == 42, "URL must not equal a number.")
583 self.assertFalse(u == object(), "URL must not equal an object.")
584 self.assertNotEqual(u, 42)
585 self.assertNotEqual(u, object())
586
587 def test_identicalNotUnequal(self):
588 """
589 Identical L{URL}s are not unequal (C{!=}) to each other.
590 """
591 u = URL.from_text('http://u@localhost:8080/p/a/t/h?q=p#f')
592 self.assertFalse(u != u, "%r == itself" % u)
593
594 def test_similarNotUnequal(self):
595 """
596 Structurally similar L{URL}s are not unequal (C{!=}) to each other.
597 """
598 u1 = URL.from_text('http://u@localhost:8080/p/a/t/h?q=p#f')
599 u2 = URL.from_text('http://u@localhost:8080/p/a/t/h?q=p#f')
600 self.assertFalse(u1 != u2, "%r == %r" % (u1, u2))
601
602 def test_differentUnequal(self):
603 """
604 Structurally different L{URL}s are unequal (C{!=}) to each other.
605 """
606 u1 = URL.from_text('http://localhost/a')
607 u2 = URL.from_text('http://localhost/b')
608 self.assertTrue(u1 != u2, "%r == %r" % (u1, u2))
609
610 def test_otherTypesUnequal(self):
611 """
612 L{URL} is unequal (C{!=}) to other types.
613 """
614 u = URL.from_text('http://localhost/')
615 self.assertTrue(u != 42, "URL must differ from a number.")
616 self.assertTrue(u != object(), "URL must differ from an object.")
617
618 def test_asURI(self):
619 """
620 L{URL.asURI} produces a URI which converts any URI unicode encoding
621 into pure US-ASCII and returns a new L{URL}.
622 """
623 unicodey = ('http://\N{LATIN SMALL LETTER E WITH ACUTE}.com/'
624 '\N{LATIN SMALL LETTER E}\N{COMBINING ACUTE ACCENT}'
625 '?\N{LATIN SMALL LETTER A}\N{COMBINING ACUTE ACCENT}='
626 '\N{LATIN SMALL LETTER I}\N{COMBINING ACUTE ACCENT}'
627 '#\N{LATIN SMALL LETTER U}\N{COMBINING ACUTE ACCENT}')
628 iri = URL.from_text(unicodey)
629 uri = iri.asURI()
630 self.assertEqual(iri.host, '\N{LATIN SMALL LETTER E WITH ACUTE}.com')
631 self.assertEqual(iri.path[0],
632 '\N{LATIN SMALL LETTER E}\N{COMBINING ACUTE ACCENT}')
633 self.assertEqual(iri.to_text(), unicodey)
634 expectedURI = 'http://xn--9ca.com/%C3%A9?%C3%A1=%C3%AD#%C3%BA'
635 actualURI = uri.to_text()
636 self.assertEqual(actualURI, expectedURI,
637 '%r != %r' % (actualURI, expectedURI))
638
639 def test_asIRI(self):
640 """
641 L{URL.asIRI} decodes any percent-encoded text in the URI, making it
642 more suitable for reading by humans, and returns a new L{URL}.
643 """
644 asciiish = 'http://xn--9ca.com/%C3%A9?%C3%A1=%C3%AD#%C3%BA'
645 uri = URL.from_text(asciiish)
646 iri = uri.asIRI()
647 self.assertEqual(uri.host, 'xn--9ca.com')
648 self.assertEqual(uri.path[0], '%C3%A9')
649 self.assertEqual(uri.to_text(), asciiish)
650 expectedIRI = ('http://\N{LATIN SMALL LETTER E WITH ACUTE}.com/'
651 '\N{LATIN SMALL LETTER E WITH ACUTE}'
652 '?\N{LATIN SMALL LETTER A WITH ACUTE}='
653 '\N{LATIN SMALL LETTER I WITH ACUTE}'
654 '#\N{LATIN SMALL LETTER U WITH ACUTE}')
655 actualIRI = iri.to_text()
656 self.assertEqual(actualIRI, expectedIRI,
657 '%r != %r' % (actualIRI, expectedIRI))
658
659 def test_badUTF8AsIRI(self):
660 """
661 Bad UTF-8 in a path segment, query parameter, or fragment results in
662 that portion of the URI remaining percent-encoded in the IRI.
663 """
664 urlWithBinary = 'http://xn--9ca.com/%00%FF/%C3%A9'
665 uri = URL.from_text(urlWithBinary)
666 iri = uri.asIRI()
667 expectedIRI = ('http://\N{LATIN SMALL LETTER E WITH ACUTE}.com/'
668 '%00%FF/'
669 '\N{LATIN SMALL LETTER E WITH ACUTE}')
670 actualIRI = iri.to_text()
671 self.assertEqual(actualIRI, expectedIRI,
672 '%r != %r' % (actualIRI, expectedIRI))
673
674 def test_alreadyIRIAsIRI(self):
675 """
676 A L{URL} composed of non-ASCII text will result in non-ASCII text.
677 """
678 unicodey = ('http://\N{LATIN SMALL LETTER E WITH ACUTE}.com/'
679 '\N{LATIN SMALL LETTER E}\N{COMBINING ACUTE ACCENT}'
680 '?\N{LATIN SMALL LETTER A}\N{COMBINING ACUTE ACCENT}='
681 '\N{LATIN SMALL LETTER I}\N{COMBINING ACUTE ACCENT}'
682 '#\N{LATIN SMALL LETTER U}\N{COMBINING ACUTE ACCENT}')
683 iri = URL.from_text(unicodey)
684 alsoIRI = iri.asIRI()
685 self.assertEqual(alsoIRI.to_text(), unicodey)
686
687 def test_alreadyURIAsURI(self):
688 """
689 A L{URL} composed of encoded text will remain encoded.
690 """
691 expectedURI = 'http://xn--9ca.com/%C3%A9?%C3%A1=%C3%AD#%C3%BA'
692 uri = URL.from_text(expectedURI)
693 actualURI = uri.asURI().to_text()
694 self.assertEqual(actualURI, expectedURI)
695
696 def test_userinfo(self):
697 """
698 L{URL.from_text} will parse the C{userinfo} portion of the URI
699 separately from the host and port.
700 """
701 url = URL.from_text(
702 'http://someuser:somepassword@example.com/some-segment@ignore'
703 )
704 self.assertEqual(url.authority(True),
705 'someuser:somepassword@example.com')
706 self.assertEqual(url.authority(False), 'someuser:@example.com')
707 self.assertEqual(url.userinfo, 'someuser:somepassword')
708 self.assertEqual(url.user, 'someuser')
709 self.assertEqual(url.to_text(),
710 'http://someuser:@example.com/some-segment@ignore')
711 self.assertEqual(
712 url.replace(userinfo=u"someuser").to_text(),
713 'http://someuser@example.com/some-segment@ignore'
714 )
715
716 def test_portText(self):
717 """
718 L{URL.from_text} parses custom port numbers as integers.
719 """
720 portURL = URL.from_text(u"http://www.example.com:8080/")
721 self.assertEqual(portURL.port, 8080)
722 self.assertEqual(portURL.to_text(), u"http://www.example.com:8080/")
723
724 def test_mailto(self):
725 """
726 Although L{URL} instances are mainly for dealing with HTTP, other
727 schemes (such as C{mailto:}) should work as well. For example,
728 L{URL.from_text}/L{URL.to_text} round-trips cleanly for a C{mailto:} URL
729 representing an email address.
730 """
731 self.assertEqual(URL.from_text(u"mailto:user@example.com").to_text(),
732 u"mailto:user@example.com")
733
734 def test_queryIterable(self):
735 """
736 When a L{URL} is created with a C{query} argument, the C{query}
737 argument is converted into an N-tuple of 2-tuples.
738 """
739 url = URL(query=[['alpha', 'beta']])
740 self.assertEqual(url.query, (('alpha', 'beta'),))
741
742 def test_pathIterable(self):
743 """
744 When a L{URL} is created with a C{path} argument, the C{path} is
745 converted into a tuple.
746 """
747 url = URL(path=['hello', 'world'])
748 self.assertEqual(url.path, ('hello', 'world'))
749
750 def test_invalidArguments(self):
751 """
752 Passing an argument of the wrong type to any of the constructor
753 arguments of L{URL} will raise a descriptive L{TypeError}.
754
755 L{URL} typechecks very aggressively to ensure that its constituent
756 parts are all properly immutable and to prevent confusing errors when
757 bad data crops up in a method call long after the code that called the
758 constructor is off the stack.
759 """
760 class Unexpected(object):
761 def __str__(self):
762 return "wrong"
763
764 def __repr__(self):
765 return "<unexpected>"
766
767 defaultExpectation = "unicode" if bytes is str else "str"
768
769 def assertRaised(raised, expectation, name):
770 self.assertEqual(str(raised.exception),
771 "expected {0} for {1}, got {2}".format(
772 expectation,
773 name, "<unexpected>"))
774
775 def check(param, expectation=defaultExpectation):
776 with self.assertRaises(TypeError) as raised:
777 URL(**{param: Unexpected()})
778
779 assertRaised(raised, expectation, param)
780
781 check("scheme")
782 check("host")
783 check("fragment")
784 check("rooted", "bool")
785 check("userinfo")
786 check("port", "int or NoneType")
787
788 with self.assertRaises(TypeError) as raised:
789 URL(path=[Unexpected()])
790
791 assertRaised(raised, defaultExpectation, "path segment")
792
793 with self.assertRaises(TypeError) as raised:
794 URL(query=[(u"name", Unexpected())])
795
796 assertRaised(raised, defaultExpectation + " or NoneType",
797 "query parameter value")
798
799 with self.assertRaises(TypeError) as raised:
800 URL(query=[(Unexpected(), u"value")])
801
802 assertRaised(raised, defaultExpectation, "query parameter name")
803 # No custom error message for this one, just want to make sure
804 # non-2-tuples don't get through.
805
806 with self.assertRaises(TypeError):
807 URL(query=[Unexpected()])
808
809 with self.assertRaises(ValueError):
810 URL(query=[('k', 'v', 'vv')])
811
812 with self.assertRaises(ValueError):
813 URL(query=[('k',)])
814
815 url = URL.from_text("https://valid.example.com/")
816 with self.assertRaises(TypeError) as raised:
817 url.child(Unexpected())
818 assertRaised(raised, defaultExpectation, "path segment")
819 with self.assertRaises(TypeError) as raised:
820 url.sibling(Unexpected())
821 assertRaised(raised, defaultExpectation, "path segment")
822 with self.assertRaises(TypeError) as raised:
823 url.click(Unexpected())
824 assertRaised(raised, defaultExpectation, "relative URL")
825
826 def test_technicallyTextIsIterableBut(self):
827 """
828 Technically, L{str} (or L{unicode}, as appropriate) is iterable, but
829 C{URL(path="foo")} resulting in C{URL.from_text("f/o/o")} is never what
830 you want.
831 """
832 with self.assertRaises(TypeError) as raised:
833 URL(path='foo')
834 self.assertEqual(
835 str(raised.exception),
836 "expected iterable of text for path, not: {0}"
837 .format(repr('foo'))
838 )
839
840 def test_netloc(self):
841 url = URL(scheme='https')
842 self.assertEqual(url.uses_netloc, True)
843
844 url = URL(scheme='git+https')
845 self.assertEqual(url.uses_netloc, True)
846
847 url = URL(scheme='mailto')
848 self.assertEqual(url.uses_netloc, False)
849
850 url = URL(scheme='ztp')
851 self.assertEqual(url.uses_netloc, None)
852
853 url = URL.from_text('ztp://test.com')
854 self.assertEqual(url.uses_netloc, True)
855
856 url = URL.from_text('ztp:test:com')
857 self.assertEqual(url.uses_netloc, False)
858
859 def test_ipv6_with_port(self):
860 t = 'https://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:80/'
861 url = URL.from_text(t)
862 assert url.host == '2001:0db8:85a3:0000:0000:8a2e:0370:7334'
863 assert url.port == 80
864 assert SCHEME_PORT_MAP[url.scheme] != url.port
865
866 def test_basic(self):
867 text = 'https://user:pass@example.com/path/to/here?k=v#nice'
868 url = URL.from_text(text)
869 assert url.scheme == 'https'
870 assert url.userinfo == 'user:pass'
871 assert url.host == 'example.com'
872 assert url.path == ('path', 'to', 'here')
873 assert url.fragment == 'nice'
874
875 text = 'https://user:pass@127.0.0.1/path/to/here?k=v#nice'
876 url = URL.from_text(text)
877 assert url.scheme == 'https'
878 assert url.userinfo == 'user:pass'
879 assert url.host == '127.0.0.1'
880 assert url.path == ('path', 'to', 'here')
881
882 text = 'https://user:pass@[::1]/path/to/here?k=v#nice'
883 url = URL.from_text(text)
884 assert url.scheme == 'https'
885 assert url.userinfo == 'user:pass'
886 assert url.host == '::1'
887 assert url.path == ('path', 'to', 'here')
888
889 def test_invalid_url(self):
890 self.assertRaises(URLParseError, URL.from_text, '#\n\n')
891
892 def test_invalid_authority_url(self):
893 self.assertRaises(URLParseError, URL.from_text, 'http://abc:\n\n/#')
894
895 def test_invalid_ipv6(self):
896 invalid_ipv6_ips = ['2001::0234:C1ab::A0:aabc:003F',
897 '2001::1::3F',
898 ':',
899 '::::',
900 '::256.0.0.1']
901 for ip in invalid_ipv6_ips:
902 url_text = 'http://[' + ip + ']'
903 self.assertRaises(socket.error, inet_pton,
904 socket.AF_INET6, ip)
905 self.assertRaises(URLParseError, URL.from_text, url_text)
906
907 def test_invalid_port(self):
908 self.assertRaises(URLParseError, URL.from_text, 'ftp://portmouth:smash')
909 self.assertRaises(ValueError, URL.from_text,
910 'http://reader.googlewebsite.com:neverforget')
911
912 def test_idna(self):
913 u1 = URL.from_text('http://bücher.ch')
914 self.assertEqual(u1.host, 'bücher.ch')
915 self.assertEqual(u1.to_text(), 'http://bücher.ch')
916 self.assertEqual(u1.to_uri().to_text(), 'http://xn--bcher-kva.ch')
917
918 u2 = URL.from_text('https://xn--bcher-kva.ch')
919 self.assertEqual(u2.host, 'xn--bcher-kva.ch')
920 self.assertEqual(u2.to_text(), 'https://xn--bcher-kva.ch')
921 self.assertEqual(u2.to_iri().to_text(), u'https://bücher.ch')
922
923 def test_netloc_slashes(self):
924 # basic sanity checks
925 url = URL.from_text('mailto:mahmoud@hatnote.com')
926 self.assertEqual(url.scheme, 'mailto')
927 self.assertEqual(url.to_text(), 'mailto:mahmoud@hatnote.com')
928
929 url = URL.from_text('http://hatnote.com')
930 self.assertEqual(url.scheme, 'http')
931 self.assertEqual(url.to_text(), 'http://hatnote.com')
932
933 # test that unrecognized schemes stay consistent with '//'
934 url = URL.from_text('newscheme:a:b:c')
935 self.assertEqual(url.scheme, 'newscheme')
936 self.assertEqual(url.to_text(), 'newscheme:a:b:c')
937
938 url = URL.from_text('newerscheme://a/b/c')
939 self.assertEqual(url.scheme, 'newerscheme')
940 self.assertEqual(url.to_text(), 'newerscheme://a/b/c')
941
942 # test that reasonable guesses are made
943 url = URL.from_text('git+ftp://gitstub.biz/glyph/lefkowitz')
944 self.assertEqual(url.scheme, 'git+ftp')
945 self.assertEqual(url.to_text(),
946 'git+ftp://gitstub.biz/glyph/lefkowitz')
947
948 url = URL.from_text('what+mailto:freerealestate@enotuniq.org')
949 self.assertEqual(url.scheme, 'what+mailto')
950 self.assertEqual(url.to_text(),
951 'what+mailto:freerealestate@enotuniq.org')
952
953 url = URL(scheme='ztp', path=('x', 'y', 'z'), rooted=True)
954 self.assertEqual(url.to_text(), 'ztp:/x/y/z')
955
956 # also works when the input doesn't include '//'
957 url = URL(scheme='git+ftp', path=('x', 'y', 'z', ''),
958 rooted=True, uses_netloc=True)
959 # broken bc urlunsplit
960 self.assertEqual(url.to_text(), 'git+ftp:///x/y/z/')
961
962 # really why would this ever come up but ok
963 url = URL.from_text('file:///path/to/heck')
964 url2 = url.replace(scheme='mailto')
965 self.assertEqual(url2.to_text(), 'mailto:/path/to/heck')
966
967 url_text = 'unregisteredscheme:///a/b/c'
968 url = URL.from_text(url_text)
969 no_netloc_url = url.replace(uses_netloc=False)
970 self.assertEqual(no_netloc_url.to_text(), 'unregisteredscheme:/a/b/c')
971 netloc_url = url.replace(uses_netloc=True)
972 self.assertEqual(netloc_url.to_text(), url_text)
973
974 return
975
976 def test_wrong_constructor(self):
977 with self.assertRaises(ValueError):
978 # whole URL not allowed
979 URL(BASIC_URL)
980 with self.assertRaises(ValueError):
981 # explicitly bad scheme not allowed
982 URL('HTTP_____more_like_imHoTTeP')
983
984 def test_encoded_userinfo(self):
985 url = URL.from_text('http://user:pass@example.com')
986 assert url.userinfo == 'user:pass'
987 url = url.replace(userinfo='us%20her:pass')
988 iri = url.to_iri()
989 assert iri.to_text(with_password=True) == 'http://us her:pass@example.com'
990 assert iri.to_text(with_password=False) == 'http://us her:@example.com'
991 assert iri.to_uri().to_text(with_password=True) == 'http://us%20her:pass@example.com'
992
993 def test_hash(self):
994 url_map = {}
995 url1 = URL.from_text('http://blog.hatnote.com/ask?utm_source=geocity')
996 assert hash(url1) == hash(url1) # sanity
997
998 url_map[url1] = 1
999
1000 url2 = URL.from_text('http://blog.hatnote.com/ask')
1001 url2 = url2.set('utm_source', 'geocity')
1002
1003 url_map[url2] = 2
1004
1005 assert len(url_map) == 1
1006 assert list(url_map.values()) == [2]
1007
1008 assert hash(URL()) == hash(URL()) # slightly more sanity
1009
1010 def test_dir(self):
1011 url = URL()
1012 res = dir(url)
1013
1014 assert len(res) > 15
1015 # twisted compat
1016 assert 'fromText' not in res
1017 assert 'asText' not in res
1018 assert 'asURI' not in res
1019 assert 'asIRI' not in res
1020
1021 def test_twisted_compat(self):
1022 url = URL.fromText(u'http://example.com/a%20té%C3%A9st')
1023 assert url.asText() == 'http://example.com/a%20té%C3%A9st'
1024 assert url.asURI().asText() == 'http://example.com/a%20t%C3%A9%C3%A9st'
1025 # TODO: assert url.asIRI().asText() == u'http://example.com/a%20téést'
1026
1027 def test_set_ordering(self):
1028 # TODO
1029 url = URL.from_text('http://example.com/?a=b&c')
1030 url = url.set(u'x', u'x')
1031 url = url.add(u'x', u'y')
1032 assert url.to_text() == u'http://example.com/?a=b&x=x&c&x=y'
1033 # Would expect:
1034 # assert url.to_text() == u'http://example.com/?a=b&c&x=x&x=y'
1035
1036 def test_schemeless_path(self):
1037 "See issue #4"
1038 u1 = URL.from_text("urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob")
1039 u2 = URL.from_text(u1.to_text())
1040 assert u1 == u2 # sanity testing roundtripping
1041
1042 u3 = URL.from_text(u1.to_iri().to_text())
1043 assert u1 == u3
1044 assert u2 == u3
1045
1046 # test that colons are ok past the first segment
1047 u4 = URL.from_text("first-segment/urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob")
1048 u5 = u4.to_iri()
1049 assert u5.to_text() == u'first-segment/urn:ietf:wg:oauth:2.0:oob'
1050
1051 u6 = URL.from_text(u5.to_text()).to_uri()
1052 assert u5 == u6 # colons stay decoded bc they're not in the first seg
1053
1054 def test_emoji_domain(self):
1055 "See issue #7, affecting only narrow builds (2.6-3.3)"
1056 url = URL.from_text('https://xn--vi8hiv.ws')
1057 iri = url.to_iri()
1058 iri.to_text()
1059 # as long as we don't get ValueErrors, we're good
1060
1061 def test_delim_in_param(self):
1062 "Per issue #6 and #8"
1063 self.assertRaises(ValueError, URL, scheme=u'http', host=u'a/c')
1064 self.assertRaises(ValueError, URL, path=(u"?",))
1065 self.assertRaises(ValueError, URL, path=(u"#",))
1066 self.assertRaises(ValueError, URL, query=((u"&", u"test"),))
1067
1068 def test_empty_paths_eq(self):
1069 u1 = URL.from_text('http://example.com/')
1070 u2 = URL.from_text('http://example.com')
1071
1072 assert u1 == u2
1073
1074 u1 = URL.from_text('http://example.com')
1075 u2 = URL.from_text('http://example.com')
1076
1077 assert u1 == u2
1078
1079 u1 = URL.from_text('http://example.com')
1080 u2 = URL.from_text('http://example.com/')
1081
1082 assert u1 == u2
1083
1084 u1 = URL.from_text('http://example.com/')
1085 u2 = URL.from_text('http://example.com/')
1086
1087 assert u1 == u2
1088
1089 def test_from_text_type(self):
1090 assert URL.from_text(u'#ok').fragment == u'ok' # sanity
1091 self.assertRaises(TypeError, URL.from_text, b'bytes://x.y.z')
1092 self.assertRaises(TypeError, URL.from_text, object())
1093
1094 def test_from_text_bad_authority(self):
1095 # bad ipv6 brackets
1096 self.assertRaises(URLParseError, URL.from_text, 'http://[::1/')
1097 self.assertRaises(URLParseError, URL.from_text, 'http://::1]/')
1098 self.assertRaises(URLParseError, URL.from_text, 'http://[[::1]/')
1099 self.assertRaises(URLParseError, URL.from_text, 'http://[::1]]/')
1100
1101 # empty port
1102 self.assertRaises(URLParseError, URL.from_text, 'http://127.0.0.1:')
1103 # non-integer port
1104 self.assertRaises(URLParseError, URL.from_text, 'http://127.0.0.1:hi')
1105 # extra port colon (makes for an invalid host)
1106 self.assertRaises(URLParseError, URL.from_text, 'http://127.0.0.1::80')
1107
1108 def test_normalize(self):
1109 url = URL.from_text('HTTP://Example.com/A%61/./../A%61?B%62=C%63#D%64')
1110 assert url.get('Bb') == []
1111 assert url.get('B%62') == ['C%63']
1112 assert len(url.path) == 4
1113
1114 # test that most expected normalizations happen
1115 norm_url = url.normalize()
1116
1117 assert norm_url.scheme == 'http'
1118 assert norm_url.host == 'example.com'
1119 assert norm_url.path == ('Aa',)
1120 assert norm_url.get('Bb') == ['Cc']
1121 assert norm_url.fragment == 'Dd'
1122 assert norm_url.to_text() == 'http://example.com/Aa?Bb=Cc#Dd'
1123
1124 # test that flags work
1125 noop_norm_url = url.normalize(scheme=False, host=False,
1126 path=False, query=False, fragment=False)
1127 assert noop_norm_url == url
1128
1129 # test that empty paths get at least one slash
1130 slashless_url = URL.from_text('http://example.io')
1131 slashful_url = slashless_url.normalize()
1132 assert slashful_url.to_text() == 'http://example.io/'
1133
1134 # test case normalization for percent encoding
1135 delimited_url = URL.from_text('/a%2fb/cd%3f?k%3d=v%23#test')
1136 norm_delimited_url = delimited_url.normalize()
1137 assert norm_delimited_url.to_text() == '/a%2Fb/cd%3F?k%3D=v%23#test'
1138
1139 # test invalid percent encoding during normalize
1140 assert URL(path=('', '%te%sts')).normalize().to_text() == '/%te%sts'
0 Metadata-Version: 1.1
1 Name: hyperlink
2 Version: 17.3.1
3 Summary: A featureful, correct URL for Python.
4 Home-page: https://github.com/python-hyper/hyperlink
5 Author: Mahmoud Hashemi and Glyph Lefkowitz
6 Author-email: mahmoud@hatnote.com
7 License: MIT
8 Description: The humble, but powerful, URL runs everything around us. Chances
9 are you've used several just to read this text.
10
11 Hyperlink is a featureful, pure-Python implementation of the URL, with
12 an emphasis on correctness. MIT licensed.
13
14 See the docs at http://hyperlink.readthedocs.io.
15
16 Platform: any
17 Classifier: Topic :: Utilities
18 Classifier: Intended Audience :: Developers
19 Classifier: Topic :: Software Development :: Libraries
20 Classifier: Development Status :: 5 - Production/Stable
21 Classifier: Programming Language :: Python :: 2.6
22 Classifier: Programming Language :: Python :: 2.7
23 Classifier: Programming Language :: Python :: 3.4
24 Classifier: Programming Language :: Python :: 3.5
25 Classifier: Programming Language :: Python :: 3.6
26 Classifier: Programming Language :: Python :: Implementation :: PyPy
0 .tox-coveragerc
1 CHANGELOG.md
2 LICENSE
3 MANIFEST.in
4 README.md
5 pytest.ini
6 requirements-test.txt
7 setup.cfg
8 setup.py
9 tox.ini
10 docs/Makefile
11 docs/api.rst
12 docs/conf.py
13 docs/design.rst
14 docs/faq.rst
15 docs/hyperlink_logo_proto.png
16 docs/hyperlink_logo_v1.png
17 docs/index.rst
18 docs/make.bat
19 docs/_templates/page.html
20 hyperlink/__init__.py
21 hyperlink/_url.py
22 hyperlink.egg-info/PKG-INFO
23 hyperlink.egg-info/SOURCES.txt
24 hyperlink.egg-info/dependency_links.txt
25 hyperlink.egg-info/not-zip-safe
26 hyperlink.egg-info/top_level.txt
27 hyperlink/test/__init__.py
28 hyperlink/test/common.py
29 hyperlink/test/test_common.py
30 hyperlink/test/test_scheme_registration.py
31 hyperlink/test/test_url.py
0 [pytest]
1 doctest_optionflags = ALLOW_UNICODE
0 pytest==2.9.2
1 pytest-cov==2.3.0
2 tox==2.6.0
0 [wheel]
1 universal = 1
2
3 [egg_info]
4 tag_build =
5 tag_date = 0
6
0 """The humble, but powerful, URL runs everything around us. Chances
1 are you've used several just to read this text.
2
3 Hyperlink is a featureful, pure-Python implementation of the URL, with
4 an emphasis on correctness. MIT licensed.
5
6 See the docs at http://hyperlink.readthedocs.io.
7 """
8
9 from setuptools import setup
10
11
12 __author__ = 'Mahmoud Hashemi and Glyph Lefkowitz'
13 __version__ = '17.3.1'
14 __contact__ = 'mahmoud@hatnote.com'
15 __url__ = 'https://github.com/python-hyper/hyperlink'
16 __license__ = 'MIT'
17
18
19 setup(name='hyperlink',
20 version=__version__,
21 description="A featureful, correct URL for Python.",
22 long_description=__doc__,
23 author=__author__,
24 author_email=__contact__,
25 url=__url__,
26 packages=['hyperlink', 'hyperlink.test'],
27 include_package_data=True,
28 zip_safe=False,
29 license=__license__,
30 platforms='any',
31 classifiers=[
32 'Topic :: Utilities',
33 'Intended Audience :: Developers',
34 'Topic :: Software Development :: Libraries',
35 'Development Status :: 5 - Production/Stable',
36 'Programming Language :: Python :: 2.6',
37 'Programming Language :: Python :: 2.7',
38 'Programming Language :: Python :: 3.4',
39 'Programming Language :: Python :: 3.5',
40 'Programming Language :: Python :: 3.6',
41 'Programming Language :: Python :: Implementation :: PyPy', ]
42 )
43
44 """
45 A brief checklist for release:
46
47 * tox
48 * git commit (if applicable)
49 * Bump setup.py version off of -dev
50 * git commit -a -m "bump version for x.y.z release"
51 * python setup.py sdist bdist_wheel upload
52 * bump docs/conf.py version
53 * git commit
54 * git tag -a vx.y.z -m "brief summary"
55 * write CHANGELOG
56 * git commit
57 * bump setup.py version onto n+1 dev
58 * git commit
59 * git push
60
61 """
0 [tox]
1 envlist = py26,py27,py34,py35,py36,pypy,coverage-report,packaging
2
3 [testenv]
4 changedir = .tox
5 deps = -rrequirements-test.txt
6 commands = coverage run --parallel --rcfile {toxinidir}/.tox-coveragerc -m pytest --doctest-modules {envsitepackagesdir}/hyperlink {posargs}
7
8 # Uses default basepython otherwise reporting doesn't work on Travis where
9 # Python 3.6 is only available in 3.6 jobs.
10 [testenv:coverage-report]
11 changedir = .tox
12 deps = coverage
13 commands = coverage combine --rcfile {toxinidir}/.tox-coveragerc
14 coverage report --rcfile {toxinidir}/.tox-coveragerc
15 coverage html --rcfile {toxinidir}/.tox-coveragerc -d {toxinidir}/htmlcov
16
17
18 [testenv:packaging]
19 changedir = {toxinidir}
20 deps =
21 check-manifest==0.35
22 readme_renderer==17.2
23 commands =
24 check-manifest
25 python setup.py check --metadata --restructuredtext --strict