Commit c90e5fc5645f18b939a97b739c7c4f53b8f1d0ee - natsort

Update upstream source from tag 'upstream/6.0.0'

Update to upstream version '6.0.0'
with Debian dir 7a2e9f60755e7b601aad22f648a4bc5534ed5ac7

Agustin Henze, 5 years ago

95 changed file(s) with 5237 addition(s) and 5195 deletion(s).

-1

.coveragerc

13	13	if __name__ == .__main__.:
14	14
15	15	ignore_errors = True
16

+23

-0

.github/ISSUE_TEMPLATE/bug_report.md

	0	---
	1	name: Bug report
	2	about: Report unexpected behavior, a crash, or incorrect results
	3
	4	---
	5
	6	Describe the bug
	7	A clear and concise description of what the bug is.
	8
	9	Expected behavior
	10	A clear and concise description of what you expected to happen.
	11
	12	Environment (please complete the following information):
	13	- Python Version: [e.g. 3.6]
	14	- OS [e.g. Windows, Fedora]
	15	- If the bug involves `LOCALE` or `humansorted`:
	16	- Is `PyICU` installed?
	17	- Do you have a locale set? If so, to what?
	18
	19	To Reproduce
	20	Include a Minimum, Complete, Verifiable Example. If there is a traceback (or error message), please include the entire traceback (or error message), even if you think it is too big.
	21
	22	See https://stackoverflow.com/help/mcve for an explanation.

+14

-0

.github/ISSUE_TEMPLATE/feature_request.md

	0	---
	1	name: Feature request
	2	about: Suggest or request an enhancement
	3
	4	---
	5
	6	Describe the feature or enhancement
	7	Be as descriptive and precise as possible.
	8
	9	Provide a concrete example of how the feature or enhancement will improve `natsort`
	10	Code examples are an excellent way to show how this feature or enhancement will help. To make your case stronger, show the current workaround due to the lack of the feature. What is the return-on-investment for including the feature or enhancement?
	11
	12	Would you be willing to submit a Pull Request for this feature?
	13	Extra help is always welcome.

-0

.github/ISSUE_TEMPLATE/question.md

	0	---
	1	name: Question
	2	about: Inquiry about natsort
	3
	4	---
	5
	6	- [ ] I have read the [`natsort` documentation](https://natsort.readthedocs.io/en/master/) and the [README](https://github.com/SethMMorton/natsort#natsort), and my question is still not answered

-18

.travis.yml

	0	dist: xenial
	1	sudo: false
0	2	language: python
	3	cache: pip
1	4
2	5	jobs:
3	6	include:
4	7	- python: "2.7"
5		dist: trusty
6		sudo: false
7	8	env: WITH_EXTRAS=""
8	9	- python: "2.7"
9		dist: trusty
10		sudo: false
11	10	env: WITH_EXTRAS="fast,icu"
12	11	addons:
13	12	apt:

16	15	- language-pack-de
17	16	- language-pack-en
18	17	- python: "3.4"
19		dist: trusty
20		sudo: false
21	18	env: WITH_EXTRAS=""
22	19	- python: "3.5"
23		dist: trusty
24		sudo: false
25	20	env: WITH_EXTRAS=""
26	21	- python: "3.6"
27		dist: trusty
28		sudo: false
29	22	env: WITH_EXTRAS=""
30	23	- python: "3.6"
31		dist: trusty
32		sudo: false
33	24	env: WITH_EXTRAS="fast,icu"
34	25	addons:
35	26	apt:

38	29	- language-pack-de
39	30	- language-pack-en
40	31	- python: "3.7"
41		dist: xenial
42		sudo: true
43	32	env: WITH_EXTRAS=""
44	33	- stage: code-quality
45	34	python: "3.6"
46		dist: trusty
47		sudo: false
48		install: pip install flake8 flake8-import-order flake8-bugbear pep8-naming
49		script: flake8
	35	install: pip install flake8 flake8-import-order flake8-bugbear pep8-naming twine check-manifest
	36	script:
	37	- flake8
	38	- check-manifest --ignore ".github,.md,.coveragerc"
	39	- python setup.py sdist
	40	- twine check dist/*
50	41
51	42	install:
52	43	- pip install -U pip

+382

-0

CHANGELOG.rst

	0	02-04-2019 v. 6.0.0
	1	+++++++++++++++++++
	2
	3	- Drop support for Python 2.6 and 3.3 (thanks @jdufresne) (issue #70)
	4	- Remove deprecated APIs (kwargs number_type, signed, exp, as_path, py3_safe; enums ns.TYPESAFE, ns.DIGIT, ns.VERSION; functions versorted, index_versorted) (issue #81)
	5	- Remove pipenv as a dependency for building (issue #86)
	6	- Simplify Travis-CI configuration (thanks @jdufresne) (issue #88)
	7	- Fix README rendering in PyPI (thanks @altendky) (issue #89)
	8
	9	11-18-2018 v. 5.5.0
	10	+++++++++++++++++++
	11
	12	- Formally deprecated old or misleading APIs (issue #83)
	13	- Documentation, packaging, and CI cleanup (thanks @jdufresne) (issues #69, #71-#80)
	14	- Consolidate API documentation into a single page (issue #82)
	15	- Add a CHANGELOG.rst to the top-level of the repository (issue #85)
	16	- Add back support for very old versions of setuptools (issue #84)
	17
	18	09-09-2018 v. 5.4.1
	19	+++++++++++++++++++
	20
	21	- Fix error in a newly added test (issues #65, #67)
	22	- Changed code format and quality checking infrastructure (issue #68)
	23
	24	09-06-2018 v. 5.4.0
	25	+++++++++++++++++++
	26
	27	- Re-expose ``natsort_key`` as "public" and remove the
	28	associated ``DeprecationWarning``
	29	- Add better developer documentation
	30	- Refactor tests (issue #66)
	31	- Bump allowed ``fastnumbers`` version
	32
	33	07-07-2018 v. 5.3.3
	34	+++++++++++++++++++
	35
	36	- Update docs with a FAQ and quick how-it-works (issue #60)
	37	- Fix a StopIteration error in the testing code
	38	- Enable Python 3.7 support in Travis-CI (issue #61)
	39
	40	05-17-2018 v. 5.3.2
	41	+++++++++++++++++++
	42
	43	- Fix bug that prevented install on old versions of setuptools (issues #55, #56)
	44	- Revert layout from src/natsort/ back to natsort/ to make user
	45	testing simpler (issues #57, #58)
	46
	47	05-14-2018 v. 5.3.1
	48	+++++++++++++++++++
	49
	50	- No bugfixes or features, just infrastructure and installation updates
	51	- Move to defining dependencies with Pipfile
	52	- Development layout is now src/natsort/ instead of natsort/
	53	- Add bumpversion infrastructure
	54	- Extras can be installed by "[]" notation
	55
	56	04-20-2018 v. 5.3.0
	57	+++++++++++++++++++
	58
	59	- Fix bug in assessing ``fastnumbers`` version at import-time (thanks @hholzgra) (issues #51, #53)
	60	- Add ability to consider unicode-decimal numbers as numbers (issues #52, #54)
	61
	62	02-14-2018 v. 5.2.0
	63	+++++++++++++++++++
	64
	65	- Add ``ns.NUMAFTER`` to cause numbers to be placed after non-numbers (issues #48, #49)
	66	- Add ``natcmp`` function (Python 2 only) (thanks @rinslow) (issue #47)
	67
	68	11-11-2017 v. 5.1.1
	69	+++++++++++++++++++
	70
	71	- Added additional unicode number support for Python 3.7
	72	- Added information on how to install and test (issue #46)
	73
	74	08-19-2017 v. 5.1.0
	75	+++++++++++++++++++
	76
	77	- Fixed ``StopIteration`` warning on Python 3.6+ (thanks @lykinsbd) (issues #42, #43)
	78	- All Unicode input is now normalized (issues #44, #45)
	79
	80	04-30-2017 v. 5.0.3
	81	+++++++++++++++++++
	82
	83	- Improved development infrastructure
	84	- Migrated documentation to ReadTheDocs
	85
	86	01-02-2017 v. 5.0.2
	87	+++++++++++++++++++
	88
	89	- Added additional unicode number support for Python 3.6
	90	- Renamed several internal functions and variables to improve clarity
	91	- Improved documentation examples
	92	- Added a "how does it work?" section to the documentation
	93
	94	06-04-2016 v. 5.0.1
	95	+++++++++++++++++++
	96
	97	- The ``ns`` enum attributes can now be imported from the top-level
	98	namespace
	99	- Fixed a bug with the ``from natsort import *`` mechanism
	100	- Fixed bug with using ``natsort`` with ``python -OO`` (issues #38, #39)
	101
	102	05-08-2016 v. 5.0.0
	103	+++++++++++++++++++
	104
	105	- ``ns.LOCALE``/``humansorted`` now accounts for thousands separators (issue #36)
	106	- Refactored entire codebase to be more functional (as in use functions as
	107	units). Previously, the code was rather monolithic and difficult to follow. The
	108	goal is that with the code existing in smaller units, contributing will
	109	be easier (issue #37)
	110	- Deprecated ``ns.TYPESAFE`` option as it is now always on (due to a new
	111	iterator-based algorithm, the typesafe function is now cheap)
	112	- Increased speed of execution (came for free with the new functional approach
	113	because the new factory function paradigm eliminates most ``if`` branches
	114	during execution)
	115
	116	- For most cases, the code is 30-40% faster than version 4.0.4
	117	- If using ``ns.LOCALE`` or ``humansorted``, the code is 1100% faster than
	118	version 4.0.4
	119
	120	- Improved clarity of documentation with regard to locale-aware sorting
	121	- Added a new ``chain_functions`` function for convenience in creating
	122	a complex user-given ``key`` from several existing functions
	123
	124	11-01-2015 v. 4.0.4
	125	+++++++++++++++++++
	126
	127	- Improved coverage of unit tests
	128	- Unit tests use new and improved hypothesis library
	129	- Fixed compatibility issues with Python 3.5
	130
	131	06-25-2015 v. 4.0.3
	132	+++++++++++++++++++
	133
	134	- Fixed bad install on last release (sorry guys!) (issue #30)
	135
	136	06-24-2015 v. 4.0.2
	137	+++++++++++++++++++
	138
	139	- Added back Python 2.6 and Python 3.2 compatibility. Unit testing is now
	140	performed for these versions (thanks @dpetzold) (issue #29)
	141	- Consolidated under-the-hood compatibility functionality
	142
	143	06-04-2015 v. 4.0.1
	144	+++++++++++++++++++
	145
	146	- Added support for sorting NaN by internally converting to -Infinity
	147	or +Infinity (issue #27)
	148
	149	05-17-2015 v. 4.0.0
	150	+++++++++++++++++++
	151
	152	- Made default behavior of 'natsort' search for unsigned ints,
	153	rather than signed floats. This is a backwards-incompatible
	154	change but in 99% of use cases it should not require any
	155	end-user changes (issue #20)
	156	- Improved handling of locale-aware sorting on systems where the
	157	underlying locale library is broken (issue #34)
	158	- Greatly improved all unit tests by adding the hypothesis library
	159
	160	04-06-2015 v. 3.5.6
	161	+++++++++++++++++++
	162
	163	- Added 'UNGROUPLETTERS' algorithm to get the case-grouping behavior of
	164	an ordinal sort when using 'LOCALE' (issue #23)
	165	- Added convenience functions 'decoder', 'as_ascii', and 'as_utf8' for
	166	dealing with bytes types
	167
	168	04-04-2015 v. 3.5.5
	169	+++++++++++++++++++
	170
	171	- Added 'realsorted' and 'index_realsorted' functions for
	172	forward-compatibility with >= 4.0.0
	173	- Made explanation of when to use "TYPESAFE" more clear in the docs
	174
	175	04-02-2015 v. 3.5.4
	176	+++++++++++++++++++
	177
	178	- Fixed bug where a 'TypeError' was raised if a string containing a leading
	179	number was sorted with alpha-only strings when 'LOCALE' is used (issue #22)
	180
	181	03-26-2015 v. 3.5.3
	182	+++++++++++++++++++
	183
	184	- Fixed bug where '--reverse-filter' option in shell script was not
	185	getting checked for correctness
	186	- Documentation updates to better describe locale bug, and illustrate
	187	upcoming default behavior change
	188	- Internal improvements, including making test suite more granular
	189
	190	01-13-2015 v. 3.5.2
	191	+++++++++++++++++++
	192
	193	- Enhancement that will convert a 'pathlib.Path' object to a 'str' if
	194	'ns.PATH' is enabled (issue #16)
	195
	196	09-25-2014 v. 3.5.1
	197	+++++++++++++++++++
	198
	199	- Fixed bug that caused list/tuples to fail when using 'ns.LOWERCASEFIRST'
	200	or 'ns.IGNORECASE' (issue #15)
	201	- Refactored modules so that only the public API was in natsort.py and
	202	ns_enum.py
	203	- Refactored all import statements to be absolute, not relative
	204
	205
	206	09-02-2014 v. 3.5.0
	207	+++++++++++++++++++
	208
	209	- Added the 'alg' argument to the 'natsort' functions. This argument
	210	accepts an enum that is used to indicate the options the user wishes
	211	to use. The 'number_type', 'signed', 'exp', 'as_path', and 'py3_safe'
	212	options are being deprecated and will become (undocumented)
	213	keyword-only options in natsort version 4.0.0
	214	- The user can now modify how 'natsort' handles the case of non-numeric
	215	characters (issue #14)
	216	- The user can now instruct 'natsort' to use locale-aware sorting, which
	217	allows 'natsort' to perform true "human sorting" (issue #14)
	218
	219	- The `humansorted` convenience function has been included to make this
	220	easier
	221
	222	- Updated shell script with locale functionality
	223
	224	08-12-2014 v. 3.4.1
	225	+++++++++++++++++++
	226
	227	- 'natsort' will now use the 'fastnumbers' module if it is installed. This
	228	gives up to an extra 30% boost in speed over the previous performance
	229	enhancements
	230	- Made documentation point to more 'natsort' resources, and also added a
	231	new example in the examples section
	232
	233	07-19-2014 v. 3.4.0
	234	+++++++++++++++++++
	235
	236	- Fixed a bug that caused user's options to the 'natsort_key' to not be
	237	passed on to recursive calls of 'natsort_key' (issue #12)
	238	- Added a 'natsort_keygen' function that will generate a wrapped version
	239	of 'natsort_key' that is easier to call. 'natsort_key' is now set to
	240	deprecate at natsort version 4.0.0
	241	- Added an 'as_path' option to 'natsorted' & co. that will try to treat
	242	input strings as filepaths. This will help yield correct results for
	243	OS-generated inputs like
	244	``['/p/q/o.x', '/p/q (1)/o.x', '/p/q (10)/o.x', '/p/q/o (1).x']`` (issue #3)
	245	- Massive performance enhancements for string input (1.8x-2.0x), at the expense
	246	of reduction in speed for numeric input (~2.0x)
	247
	248	- This is a good compromise because the most common input will be strings,
	249	not numbers, and sorting numbers still only takes 0.6x the time of sorting
	250	strings. If you are sorting only numbers, you would use 'sorted' anyway
	251
	252	- Added the 'order_by_index' function to help in using the output of
	253	'index_natsorted' and 'index_versorted'
	254	- Added the 'reverse' option to 'natsorted' & co. to make its API more
	255	similar to the builtin 'sorted'
	256	- Added more unit tests
	257	- Added auxiliary test code that helps in profiling and stress-testing
	258	- Reworked the documentation, moving most of it to PyPI's hosting platform
	259	- Added support for coveralls.io
	260	- Entire codebase is now PyFlakes and PEP8 compliant
	261
	262	06-28-2014 v. 3.3.0
	263	+++++++++++++++++++
	264
	265	- Added a 'versorted' method for more convenient sorting of versions (issue #11)
	266	- Updated command-line tool --number_type option with 'version' and 'ver'
	267	to make it more clear how to sort version numbers
	268	- Moved unit-testing mechanism from being docstring-based to actual unit tests
	269	in actual functions (issue #10)
	270
	271	- This has provided the ability to determine the coverage of the unit tests (99%)
	272	- This also makes the pydoc documentation a bit more clear
	273
	274	- Made docstrings for public functions mirror the README API
	275	- Connected natsort development to Travis-CI to help ensure quality releases
	276
	277	06-20-2014 v. 3.2.1
	278	+++++++++++++++++++
	279
	280	- Re-"Fixed" unorderable types issue on Python 3.x - this workaround
	281	is for when the problem occurs in the middle of the string (issue #7 again)
	282
	283	05-07-2014 v. 3.2.0
	284	+++++++++++++++++++
	285
	286	- "Fixed" unorderable types issue on Python 3.x with a workaround that
	287	attempts to replicate the Python 2.x behavior by putting all the numbers
	288	(or strings that begin with numbers) first (issue #7)
	289	- Now explicitly excluding __pycache__ from releases by adding a prune statement
	290	to MANIFEST.in
	291
	292	05-05-2014 v. 3.1.2
	293	+++++++++++++++++++
	294
	295	- Added setup.cfg to support universal wheels (issue #6)
	296	- Added Python 3.0 and Python 3.1 as requiring the argparse module
	297
	298	03-01-2014 v. 3.1.1
	299	+++++++++++++++++++
	300
	301	- Added ability to sort lists of lists (issue #5)
	302	- Cleaned up import statements
	303
	304	01-20-2014 v. 3.1.0
	305	+++++++++++++++++++
	306
	307	- Added the ``signed`` and ``exp`` options to allow finer tuning of the sorting
	308	- Entire codebase now works for both Python 2 and Python 3 without needing to run
	309	``2to3``
	310	- Updated all doctests
	311	- Further simplified the ``natsort`` base code by removing unneeded functions.
	312	- Simplified documentation where possible
	313	- Improved the shell script code
	314
	315	- Made the documentation less "path"-centric to make it clear it is not just
	316	for sorting file paths
	317	- Removed the filesystem-based options because these can be achieved better
	318	through a pipeline
	319	- Added doctests
	320	- Added new options that correspond to ``signed`` and ``exp``
	321	- The user can now specify multiple numbers to exclude or multiple ranges
	322	to filter by
	323
	324	10-01-2013 v. 3.0.2
	325	+++++++++++++++++++
	326
	327	- Made float, int, and digit searching algorithms all share the same base function
	328	- Fixed some outdated comments
	329	- Made the ``__version__`` variable available when importing the module
	330
	331	8-15-2013 v. 3.0.1
	332	++++++++++++++++++
	333
	334	- Added support for unicode strings (issue #2)
	335	- Removed extraneous ``string2int`` function
	336	- Fixed empty string removal function
	337
	338	7-13-2013 v. 3.0.0
	339	++++++++++++++++++
	340
	341	- Added a ``number_type`` argument to the sorting functions to specify how
	342	liberal to be when deciding what a number is
	343	- Reworked the documentation
	344
	345	6-25-2013 v. 2.2.0
	346	++++++++++++++++++
	347
	348	- Added ``key`` attribute to ``natsorted`` and ``index_natsorted`` so that
	349	it mimics the functionality of the built-in ``sorted`` (issue #1)
	350	- Added tests to reflect the new functionality, as well as tests demonstrating
	351	how to get similar functionality using ``natsort_key``
	352
	353	12-5-2012 v. 2.1.0
	354	++++++++++++++++++
	355
	356	- Reorganized package
	357	- Now using a platform independent shell script generator (entry_points
	358	from distribute)
	359	- Can now execute natsort from command line with ``python -m natsort``
	360	as well
	361
	362	11-30-2012 v. 2.0.2
	363	+++++++++++++++++++
	364
	365	- Added the use_2to3 option to setup.py
	366	- Added distribute_setup.py to the distribution
	367	- Added dependency to the argparse module (for python2.6)
	368
	369	11-21-2012 v. 2.0.1
	370	+++++++++++++++++++
	371
	372	- Reorganized directory structure
	373	- Added tests into the natsort.py file itself
	374
	375	11-16-2012, v. 2.0.0
	376	++++++++++++++++++++
	377
	378	- Updated sorting algorithm to support floats (including exponentials) and
	379	basic version number support
	380	- Added better README documentation
	381	- Added doctests

-3

CODE_OF_CONDUCT.md

39	39
40	40	## Attribution
41	41
42		This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
	42	This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [https://www.contributor-covenant.org/version/1/4/code-of-conduct.html][version]
43	43
44		[homepage]: http://contributor-covenant.org
45		[version]: http://contributor-covenant.org/version/1/4/
	44	[homepage]: https://www.contributor-covenant.org/
	45	[version]: https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

-1

CONTRIBUTING.md

1	1
2	2	If you have an idea for how to improve `natsort`, please contribute! It can
3	3	be as simple as a bug fix or documentation update, or as complicated as a more
4		robust algorithm.
	4	robust algorithm. Contributions that change the public API of
	5	`natsort` will have to ensure that the library does not become
	6	less usable after the contribution and is backwards-compatible (unless there is
	7	a good reason not to be).
5	8
6	9	I do not have strong opinions on how one should contribute, so
7	10	I have copy/pasted some text verbatim from the

-5

~~ISSUE_TEMPLATE.md~~

0		## Minimum, Complete, Verifiable Example
1
2		See https://stackoverflow.com/help/mcve for explanation.
3
4		## Error message, Traceback, Desired behavior, Suggestion, Request, or Question

-11

MANIFEST.in

0		include README.rst
1	0	include LICENSE
2		include *.md
3		include *.sh
4		include Pipfile
5		include setup.py
6		include setup.cfg
	1	include CHANGELOG.rst
	2	include clean.sh
	3	include dev-requirements.txt
7	4	include tox.ini
8		include .travis.yml
9		include .coveragerc
10		include .gitignore
11		include .bumpversion.cfg
12	5	graft docs
13	6	graft natsort
14		graft test_natsort
	7	graft tests
15	8	global-exclude .py[cod] __pycache__ .so

-10

~~Pipfile~~

0		[dev-packages]
1		coverage = "*"
2		pytest = ">=3.5"
3		pytest-cov = "*"
4		pytest-mock = ">=1.1"
5		hypothesis = ">=3.8.0"
6		pytest-faulthandler = {version = "*", platform_python_implementation = "== 'CPython'"}
7
8		# These packages are standard on newer python versions.
9		pathlib = {version = "*", python_version = "< '3.4'"}

+129

-52

README.rst

22	22
23	23	- Source Code: https://github.com/SethMMorton/natsort
24	24	- Downloads: https://pypi.org/project/natsort/
25		- Documentation: http://natsort.readthedocs.io/
26
27		- `Examples and Recipes <http://natsort.readthedocs.io/en/master/examples.html>`_
28		- `How Does Natsort Work? <http://natsort.readthedocs.io/en/master/howitworks.html>`_
29		- `API <http://natsort.readthedocs.io/en/master/api.html>`_
	25	- Documentation: https://natsort.readthedocs.io/
	26
	27	- `Examples and Recipes <https://natsort.readthedocs.io/en/master/examples.html>`_
	28	- `How Does Natsort Work? <https://natsort.readthedocs.io/en/master/howitworks.html>`_
	29	- `API <https://natsort.readthedocs.io/en/master/api.html>`_
30	30
31	31	- `FAQ`_
32	32	- `Optional Dependencies`_
33	33
34	34	- `fastnumbers <https://pypi.org/project/fastnumbers>`_ >= 2.0.0
35	35	- `PyICU <https://pypi.org/project/PyICU>`_ >= 1.0.0
	36
	37	NOTE: Please see the `Deprecation Schedule`_ section for changes in
	38	``natsort`` version 6.0.0 and in the upcoming version 7.0.0.
36	39
37	40	Quick Description
38	41	-----------------

41	44	sort algorithm sorts lexicographically, so you might not get the results that you
42	45	expect:
43	46
44		.. code-block:: python
	47	.. code-block:: pycon
45	48
46	49	>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
47	50	>>> sorted(a)

56	59	sorting based on meaning and not computer code point).
57	60	Using ``natsorted`` is simple:
58	61
59		.. code-block:: python
	62	.. code-block:: pycon
60	63
61	64	>>> from natsort import natsorted
62	65	>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']

65	68
66	69	``natsorted`` identifies numbers anywhere in a string and sorts them
67	70	naturally. Below are some other things you can do with ``natsort``
68		(also see the `examples <http://natsort.readthedocs.io/en/master/examples.html>`_
	71	(also see the `examples <https://natsort.readthedocs.io/en/master/examples.html>`_
69	72	for a quick start guide, or the
70		`api <http://natsort.readthedocs.io/en/master/api.html>`_ for complete details).
	73	`api <https://natsort.readthedocs.io/en/master/api.html>`_ for complete details).
71	74
72	75	Note: ``natsorted`` is designed to be a drop-in replacement for the built-in
73	76	``sorted`` function. Like ``sorted``, ``natsorted`` `does not sort in-place`.
74	77	To sort a list and assign the output to the same variable, you must
75	78	explicitly assign the output to a variable:
76	79
77		.. code-block:: python
	80	.. code-block:: pycon
78	81
79	82	>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
80	83	>>> natsorted(a)

94	97	Sorting Versions
95	98	++++++++++++++++
96	99
97		This is handled properly by default (as of ``natsort`` version >= 4.0.0):
98
99		.. code-block:: python
	100	``natsort`` does not actually comprehend version numbers.
	101	It just so happens that the most common versioning schemes are designed to
	102	work with standard natural sorting techniques; these schemes include
	103	``MAJOR.MINOR``, ``MAJOR.MINOR.PATCH``, ``YEAR.MONTH.DAY``. If your data
	104	conforms to a scheme like this, then it will work out-of-the-box with
	105	``natsorted`` (as of ``natsort`` version >= 4.0.0):
	106
	107	.. code-block:: pycon
100	108
101	109	>>> a = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
102	110	>>> natsorted(a)
103	111	['version-1.9', 'version-1.10', 'version-1.11', 'version-2.0']
104	112
105		If you need to sort release candidates, please see
106		`this useful hack <http://natsort.readthedocs.io/en/master/examples.html#rc-sorting>`_.
	113	If you need to sort versions that use a more complicated scheme, please see
	114	`these examples <https://natsort.readthedocs.io/en/master/examples.html#rc-sorting>`_.
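
Reviewer note on the paragraph added in this hunk: the sketch below is a minimal standard-library approximation, not ``natsort``'s implementation. The helper name ``natural_key`` is hypothetical. It shows why ``MAJOR.MINOR``-style strings sort correctly once digit runs are compared as integers rather than as text.

```python
import re


def natural_key(text):
    """Split text into digit and non-digit runs; digit runs become ints."""
    # re.split with a capturing group keeps the digit runs at odd indices.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


versions = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
result = sorted(versions, key=natural_key)
print(result)
# ['version-1.9', 'version-1.10', 'version-1.11', 'version-2.0']
```

A plain ``sorted(versions)`` would place ``version-1.10`` before ``version-1.9``, because it compares character by character. This simple key only works when all inputs share the same text/number layout; schemes with suffixes such as release candidates need the extra handling described in the linked examples.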
107	115
108	116	Sorting by Real Numbers (i.e. Signed Floats)
109	117	++++++++++++++++++++++++++++++++++++++++++++
110	118
111		This is useful in scientific data analysis and was
	119	This is useful in scientific data analysis (and was
112	120	the default behavior of ``natsorted`` for ``natsort``
113		version < 4.0.0. Use the ``realsorted`` function:
114
115		.. code-block:: python
	121	version < 4.0.0). Use the ``realsorted`` function:
	122
	123	.. code-block:: pycon
116	124
117	125	>>> from natsort import realsorted, ns
118	126	>>> # Note that when interpreting as signed floats, the below numbers are

133	141	separator is accounted for in the number.
134	142	This can be achieved with the ``humansorted`` function:
135	143
136		.. code-block:: python
	144	.. code-block:: pycon
137	145
138	146	>>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
139	147	>>> natsorted(a)

149	157
150	158	You may find you need to explicitly set the locale to get this to work
151	159	(as shown in the example).
152		Please see `locale issues <http://natsort.readthedocs.io/en/master/locale_issues.html>`_ and the
	160	Please see `locale issues <https://natsort.readthedocs.io/en/master/locale_issues.html>`_ and the
153	161	`Optional Dependencies`_ section below before using the ``humansorted`` function.
154	162
155	163	Further Customizing Natsort

159	167	``ns.LOCALE``, and ``ns.IGNORECASE``), you can combine the options using the
160	168	bitwise OR operator (``|``). For example,
161	169
162		.. code-block:: python
	170	.. code-block:: pycon
163	171
164	172	>>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
165	173	>>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE)

174	182	True
175	183
176	184	All of the available customizations can be found in the documentation for
177		`the ns enum <http://natsort.readthedocs.io/en/master/ns_class.html>`_.
	185	`the ns enum <https://natsort.readthedocs.io/en/master/api.html#natsort.ns>`_.
178	186
179	187	You can also add your own custom transformation functions with the ``key`` argument.
180	188	These can be used with ``alg`` if you wish.
181	189
182		.. code-block:: python
	190	.. code-block:: pycon
183	191
184	192	>>> a = ['apple2.50', '2.3apple']
185	193	>>> natsorted(a, key=lambda x: x.replace('apple', ''), alg=ns.REAL)

191	199	You can mix and match ``int``, ``float``, and ``str`` (or ``unicode``) types
192	200	when you sort:
193	201
194		.. code-block:: python
	202	.. code-block:: pycon
195	203
196	204	>>> a = ['4.5', 6, 2.0, '5', 'a']
197	205	>>> natsorted(a)

205	213	``natsort`` does not officially support the `bytes` type on Python 3, but
206	214	convenience functions are provided that help you decode to `str` first:
207	215
208		.. code-block:: python
	216	.. code-block:: pycon
209	217
210	218	>>> from natsort import as_utf8
211	219	>>> a = [b'a', 14.0, 'b']

228	236	generate a custom sorting key to sort in-place using the ``list.sort``
229	237	method.
230	238
231		.. code-block:: python
	239	.. code-block:: pycon
232	240
233	241	>>> from natsort import natsort_keygen
234	242	>>> natsort_key = natsort_keygen()

247	255
248	256	- recursively descend into lists of lists
249	257	- automatic unicode normalization of input data
250		- `controlling the case-sensitivity <http://natsort.readthedocs.io/en/master/examples.html#case-sort>`_
251		- `sorting file paths correctly <http://natsort.readthedocs.io/en/master/examples.html#path-sort>`_
252		- `allow custom sorting keys <http://natsort.readthedocs.io/en/master/examples.html#custom-sort>`_
	258	- `controlling the case-sensitivity <https://natsort.readthedocs.io/en/master/examples.html#case-sort>`_
	259	- `sorting file paths correctly <https://natsort.readthedocs.io/en/master/examples.html#path-sort>`_
	260	- `allow custom sorting keys <https://natsort.readthedocs.io/en/master/examples.html#custom-sort>`_
253	261
254	262	FAQ
255	263	---

260	268	exactly what is being done with their input using this key - it is highly recommended
261	269	to `look at this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_
262	270	for how to debug, and also to review the
263		`How Does Natsort Work? <http://natsort.readthedocs.io/en/master/howitworks.html>`_
	271	`How Does Natsort Work? <https://natsort.readthedocs.io/en/master/howitworks.html>`_
264	272	page for why ``natsort`` is doing that to your data.
265	273
266	274	If you are trying to sort custom classes and running into trouble, please take a look at

271	279	use the ``natsort`` key as part of your rich comparison operator definition.
272	280
273	281	How does ``natsort`` work?
274		If you don't want to read `How Does Natsort Work? <http://natsort.readthedocs.io/en/master/howitworks.html>`_,
	282	If you don't want to read `How Does Natsort Work? <https://natsort.readthedocs.io/en/master/howitworks.html>`_,
275	283	here is a quick primer.
276	284
277	285	``natsort`` provides a `key function <https://docs.python.org/3/howto/sorting.html#key-functions>`_

281	289	key generator ``natsort.natsort_keygen()``. ``natsort.natsorted()`` is essentially
282	290	a wrapper for the following code:
283	291
284		.. code-block:: python
	292	.. code-block:: pycon
285	293
286	294	>>> from natsort import natsort_keygen
287	295	>>> natsort_key = natsort_keygen()

315	323	------------
316	324
317	325	``natsort`` comes with a shell script called ``natsort``, or can also be called
318		from the command line with ``python -m natsort``.
	326	from the command line with ``python -m natsort``.
319	327
320	328	Requirements
321	329	------------
322	330
323		``natsort`` requires Python version 2.6 or greater or Python 3.3 or greater.
324		It may run on (but is not tested against) Python 3.2.
	331	``natsort`` requires Python version 2.7 or Python 3.4 or greater.
325	332
326	333	Optional Dependencies
327	334	---------------------

343	350
344	351	It is recommended that you install `PyICU <https://pypi.org/project/PyICU>`_
345	352	if you wish to sort in a locale-dependent manner, see
346		http://natsort.readthedocs.io/en/master/locale_issues.html for an explanation why.
	353	https://natsort.readthedocs.io/en/master/locale_issues.html for an explanation why.
347	354
348	355	Installation
349	356	------------
350	357
351	358	Use ``pip``!
352	359
353		.. code-block:: sh
	360	.. code-block:: console
354	361
355	362	$ pip install natsort
356	363

360	367	`fastnumbers <https://pypi.org/project/fastnumbers>`_ and ``icu`` for
361	368	`PyICU <https://pypi.org/project/PyICU>`_.
362	369
363		.. code-block:: sh
	370	.. code-block:: console
364	371
365	372	# Install both optional dependencies.
366	373	$ pip install natsort[fast,icu]

376	383	After installing ``tox``, running tests is as simple as executing the following in the
377	384	``natsort`` directory:
378	385
379		.. code-block:: sh
	386	.. code-block:: console
380	387
381	388	$ tox
382	389
383	390	``tox`` will create a virtual environment for your tests and install all the
384	391	needed testing requirements for you. You can specify a particular python version
385		with the ``-e`` flag, e.g. ``tox -e py36``.
386
387		If you do not wish to use ``tox``, you can install the testing dependencies and run the
388		tests manually using `pytest <https://docs.pytest.org/en/latest/>`_ - ``natsort``
389		contains a ``Pipfile`` for use with `pipenv <https://github.com/pypa/pipenv>`_ that
390		makes it easy for you to install the testing dependencies:
391
392		.. code-block:: sh
393
394		$ pipenv install --skip-lock --dev
395		$ pipenv run python -m pytest
	392	with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis is done with ``tox -e flake8``.
	393	You can see all available testing environments with ``tox --listenvs``.
	394
	395	If you do not wish to use ``tox``, you can install the testing dependencies with the
	396	``dev-requirements.txt`` file and then run the tests manually using
	397	`pytest <https://docs.pytest.org/en/latest/>`_.
	398
	399	.. code-block:: console
	400
	401	$ pip install -r dev-requirements.txt
	402	$ python -m pytest
396	403
397	404	Note that above I invoked ``python -m pytest`` instead of just ``pytest`` - this is because
398	405	`the former puts the CWD on sys.path <https://docs.pytest.org/en/latest/usage.html#calling-pytest-through-python-m-pytest>`_.
399	406
	407	How to Build Documentation
	408	--------------------------
	409
	410	If you want to build the documentation for ``natsort``, it is recommended to use ``tox``:
	411
	412	.. code-block:: console
	413
	414	$ tox -e docs
	415
	416	This will place the documentation in ``build/sphinx/html``. If you do not
	417	wish to use ``tox``, you can do the following:
	418
	419	.. code-block:: console
	420
	421	$ pip install sphinx sphinx_rtd_theme
	422	$ python setup.py build_sphinx
	423
	424	Deprecation Schedule
	425	--------------------
	426
	427	Dropping Python 2.7 Support
	428	+++++++++++++++++++++++++++
	429
	430	``natsort`` version 7.0.0 will drop support for Python 2.7.
	431
	432	The version 6.X branch will remain as a "long term support" branch where bug fixes
	433	are applied so that users who cannot update from Python 2.7 will not be forced to
	434	use a buggy ``natsort`` version. Once version 7.0.0 is released, new features
	435	will not be added to version 6.X, only bug fixes.
	436
	437	Deprecated APIs
	438	+++++++++++++++
	439
	440	In ``natsort`` version 6.0.0, the following APIs and functions were removed:
	441
	442	- ``number_type`` keyword argument (deprecated since 3.4.0)
	443	- ``signed`` keyword argument (deprecated since 3.4.0)
	444	- ``exp`` keyword argument (deprecated since 3.4.0)
	445	- ``as_path`` keyword argument (deprecated since 3.4.0)
	446	- ``py3_safe`` keyword argument (deprecated since 3.4.0)
	447	- ``ns.TYPESAFE`` (deprecated since version 5.0.0)
	448	- ``ns.DIGIT`` (deprecated since version 5.0.0)
	449	- ``ns.VERSION`` (deprecated since version 5.0.0)
	450	- ``versorted()`` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
	451	- ``index_versorted()`` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
	452
	453	In general, if you want to determine if you are using deprecated APIs you can run your
	454	code with the following flag:
	455
	456	.. code-block:: console
	457
	458	$ python -Wdefault::DeprecationWarning my-code.py
	459
	460	By default ``DeprecationWarnings`` are not shown, but this will cause them to be shown.
	461	Alternatively, you can just set the environment variable ``PYTHONWARNINGS`` to
	462	"default::DeprecationWarning" and then run your code.
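
The same filter can also be installed from inside Python with the standard-library ``warnings`` module. The sketch below is only an illustration: ``legacy_call`` is a hypothetical stand-in for any code that emits a ``DeprecationWarning`` and is not part of ``natsort``.

```python
import warnings


def legacy_call():
    # Hypothetical stand-in for code that uses a deprecated API.
    warnings.warn("legacy_call is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# Roughly what ``-Wdefault::DeprecationWarning`` does for the whole process.
warnings.simplefilter("default", DeprecationWarning)

# Record warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("default", DeprecationWarning)
    result = legacy_call()

shown = [w for w in caught if issubclass(w.category, DeprecationWarning)]
```

Outside of the ``catch_warnings`` block, the warning would simply be printed to standard error once per call site.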
	463
	464	Dropped Pipenv for Development
	465	++++++++++++++++++++++++++++++
	466
	467	``natsort`` version 6.0.0 no longer uses `Pipenv <https://pipenv.readthedocs.io/en/latest/>`_
	468	to install development dependencies.
	469
	470	Dropped Python 2.6 and 3.3 Support
	471	++++++++++++++++++++++++++++++++++
	472
	473	``natsort`` version 6.0.0 dropped support for Python 2.6 and Python 3.3.
	474
400	475	Author
401	476	------
402	477

405	480	History
406	481	-------
407	482
408		Please visit the `changelog <http://natsort.readthedocs.io/en/master/changelog.html>`_.
	483	Please visit the changelog
	484	`on GitHub <https://github.com/SethMMorton/natsort/blob/master/CHANGELOG.rst>`_ or
	485	`in the documentation <https://natsort.readthedocs.io/en/master/changelog.html>`_.

-0

dev-requirements.txt

	0	coverage
	1	pytest >= 3.5
	2	pytest-cov
	3	pytest-mock >= 1.1
	4	hypothesis >= 3.8.0
	5	pytest-faulthandler; platform_python_implementation == 'CPython'
	6	semver
	7	# These packages are standard on newer python versions.
	8	pathlib; python_version < '3.4'

+97

-0

docs/api.rst

	0	.. default-domain:: py
	1	.. currentmodule:: natsort
	2
	3	.. _api:
	4
	5	natsort API
	6	===========
	7
	8	.. contents::
	9	:local:
	10
	11	Standard API
	12	------------
	13
	14	:func:`~natsort.natsorted`
	15	++++++++++++++++++++++++++
	16
	17	.. autofunction:: natsorted
	18
	19	The :class:`~natsort.ns` enum
	20	+++++++++++++++++++++++++++++
	21
	22	.. autodata:: ns
	23	:annotation:
	24
	25	:func:`~natsort.natsort_key`
	26	++++++++++++++++++++++++++++
	27
	28	.. autofunction:: natsort_key
	29
	30	:func:`~natsort.natsort_keygen`
	31	+++++++++++++++++++++++++++++++
	32
	33	.. autofunction:: natsort_keygen
	34
	35	Convenience Functions
	36	---------------------
	37
	38	:func:`~natsort.realsorted`
	39	+++++++++++++++++++++++++++
	40
	41	.. autofunction:: realsorted
	42
	43	:func:`~natsort.humansorted`
	44	++++++++++++++++++++++++++++
	45
	46	.. autofunction:: humansorted
	47
	48	:func:`~natsort.index_natsorted`
	49	++++++++++++++++++++++++++++++++
	50
	51	.. autofunction:: index_natsorted
	52
	53	:func:`~natsort.index_realsorted`
	54	+++++++++++++++++++++++++++++++++
	55
	56	.. autofunction:: index_realsorted
	57
	58	:func:`~natsort.index_humansorted`
	59	++++++++++++++++++++++++++++++++++
	60
	61	.. autofunction:: index_humansorted
	62
	63	:func:`~natsort.order_by_index`
	64	+++++++++++++++++++++++++++++++
	65
	66	.. autofunction:: order_by_index
	67
	68	.. _bytes_help:
	69
	70	Help With Bytes On Python 3
	71	+++++++++++++++++++++++++++
	72
	73	The official stance of :mod:`natsort` is to not support `bytes` for
	74	sorting; there is just too much that can go wrong when trying to automate
	75	conversion between `bytes` and `str`. But rather than completely give up
	76	on `bytes`, :mod:`natsort` provides three functions that make it easy to
	77	quickly decode `bytes` to `str` so that sorting is possible.
	78
	79	.. autofunction:: decoder
	80
	81	.. autofunction:: as_ascii
	82
	83	.. autofunction:: as_utf8
	84
	85	.. _function_help:
	86
	87	Help With Creating Function Keys
	88	++++++++++++++++++++++++++++++++
	89
	90	If you need to create a complicated key argument to (for example)
	91	:func:`natsorted` that is actually multiple functions called one after the other,
	92	the following function can help you easily perform this action. It is
	93	used internally to :mod:`natsort`, and has been exposed publicly for
	94	the convenience of the user.
	95
	96	.. autofunction:: chain_functions
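
To illustrate what "multiple functions called one after the other" means, here is a minimal standard-library sketch. ``compose_in_order`` is a hypothetical helper written for this example; it is not the ``natsort`` implementation and makes no claim about how ``chain_functions`` is written internally.

```python
from functools import reduce


def compose_in_order(functions):
    """Return a callable that applies each function in turn (hypothetical helper)."""
    functions = list(functions)

    def chained(value):
        # Feed the output of each function into the next one.
        return reduce(lambda acc, func: func(acc), functions, value)

    return chained


# Lower-case the input first, then replace underscores with spaces.
key = compose_in_order([str.lower, lambda s: s.replace("_", " ")])
normalized = key("Num_10")
```

A callable built this way can be passed anywhere a single ``key`` function is expected.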

-0

docs/changelog.rst

	0	.. _changelog:
	1
	2	Changelog
	3	---------
	4
	5	.. include:: ../CHANGELOG.rst

+275

-0

docs/conf.py

	0	# -*- coding: utf-8 -*-
	1	#
	2	# natsort documentation build configuration file, created by
	3	# sphinx-quickstart on Thu Jul 17 21:01:29 2014.
	4	#
	5	# This file is execfile()d with the current directory set to its
	6	# containing dir.
	7	#
	8	# Note that not all possible configuration values are present in this
	9	# autogenerated file.
	10	#
	11	# All configuration values have a default; values that are commented out
	12	# serve to show the default.
	13
	14	import os
	15
	16	# If extensions (or modules to document with autodoc) are in another directory,
	17	# add these directories to sys.path here. If the directory is relative to the
	18	# documentation root, use os.path.abspath to make it absolute, like shown here.
	19	# sys.path.insert(0, os.path.abspath('.'))
	20
	21	# -- General configuration ------------------------------------------------
	22
	23	# If your documentation needs a minimal Sphinx version, state it here.
	24	# needs_sphinx = '1.0'
	25
	26	# Add any Sphinx extension module names here, as strings. They can be
	27	# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
	28	# ones.
	29	extensions = [
	30	'sphinx.ext.autodoc',
	31	'sphinx.ext.autosummary',
	32	'sphinx.ext.intersphinx',
	33	'sphinx.ext.mathjax',
	34	'sphinx.ext.napoleon',
	35	]
	36
	37	# Add any paths that contain templates here, relative to this directory.
	38	templates_path = ['_templates']
	39
	40	# The suffix of source filenames.
	41	source_suffix = '.rst'
	42
	43	# The encoding of source files.
	44	# source_encoding = 'utf-8-sig'
	45
	46	# The master toctree document.
	47	master_doc = 'index'
	48
	49	# General information about the project.
	50	project = u'natsort'
	51	# noinspection PyShadowingBuiltins
	52	copyright = u'2014, Seth M. Morton'
	53
	54	# The version info for the project you're documenting, acts as replacement for
	55	# |version| and |release|, also used in various other places throughout the
	56	# built documents.
	57	#
	58	# The full version, including alpha/beta/rc tags.
	59	release = '6.0.0'
	60	# The short X.Y version.
	61	version = '.'.join(release.split('.')[0:2])
	62
	63	# The language for content autogenerated by Sphinx. Refer to documentation
	64	# for a list of supported languages.
	65	# language = None
	66
	67	# There are two options for replacing |today|: either, you set today to some
	68	# non-false value, then it is used:
	69	# today = ''
	70	# Else, today_fmt is used as the format for a strftime call.
	71	# today_fmt = '%B %d, %Y'
	72
	73	# List of patterns, relative to source directory, that match files and
	74	# directories to ignore when looking for source files.
	75	# exclude_patterns = ['solar/*']
	76
	77	# The reST default role (used for this markup: `text`) to use for all
	78	# documents.
	79	# default_role = None
	80
	81	# If true, '()' will be appended to :func: etc. cross-reference text.
	82	# add_function_parentheses = True
	83
	84	# If true, the current module name will be prepended to all description
	85	# unit titles (such as .. function::).
	86	# add_module_names = True
	87
	88	# If true, sectionauthor and moduleauthor directives will be shown in the
	89	# output. They are ignored by default.
	90	# show_authors = False
	91
	92	# The name of the Pygments (syntax highlighting) style to use.
	93	pygments_style = 'sphinx'
	94	highlight_language = 'python'
	95
	96	# A list of ignored prefixes for module index sorting.
	97	# modindex_common_prefix = []
	98
	99	# If true, keep warnings as "system message" paragraphs in the built documents.
	100	# keep_warnings = False
	101
	102
	103	# -- Options for HTML output ----------------------------------------------
	104
	105	# The theme to use for HTML and HTML Help pages. See the documentation for
	106	# a list of builtin themes.
	107	on_rtd = os.environ.get('READTHEDOCS') == 'True'
	108	if on_rtd:
	109	    html_theme = 'default'
	110	else:
	111	    import sphinx_rtd_theme
	112
	113	    html_theme = 'sphinx_rtd_theme'
	114	# html_theme = 'solar'
	115
	116	# Theme options are theme-specific and customize the look and feel of a theme
	117	# further. For a list of options available for each theme, see the
	118	# documentation.
	119	# html_theme_options = {}
	120
	121	# Add any paths that contain custom themes here, relative to this directory.
	122	html_theme_path = ['.']
	123
	124	# The name for this set of Sphinx documents. If None, it defaults to
	125	# "<project> v<release> documentation".
	126	# html_title = None
	127
	128	# A shorter title for the navigation bar. Default is the same as html_title.
	129	# html_short_title = None
	130
	131	# The name of an image file (relative to this directory) to place at the top
	132	# of the sidebar.
	133	# html_logo = None
	134
	135	# The name of an image file (within the static path) to use as favicon of the
	136	# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
	137	# pixels large.
	138	# html_favicon = None
	139
	140	# Add any paths that contain custom static files (such as style sheets) here,
	141	# relative to this directory. They are copied after the builtin static files,
	142	# so a file named "default.css" will overwrite the builtin "default.css".
	143	# html_static_path = ['_static']
	144
	145	# Add any extra paths that contain custom files (such as robots.txt or
	146	# .htaccess) here, relative to this directory. These files are copied
	147	# directly to the root of the documentation.
	148	# html_extra_path = []
	149
	150	# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
	151	# using the given strftime format.
	152	# html_last_updated_fmt = '%b %d, %Y'
	153
	154	# If true, SmartyPants will be used to convert quotes and dashes to
	155	# typographically correct entities.
	156	# html_use_smartypants = True
	157
	158	# Custom sidebar templates, maps document names to template names.
	159	# html_sidebars = {}
	160
	161	# Additional templates that should be rendered to pages, maps page names to
	162	# template names.
	163	# html_additional_pages = {}
	164
	165	# If false, no module index is generated.
	166	# html_domain_indices = True
	167
	168	# If false, no index is generated.
	169	# html_use_index = True
	170
	171	# If true, the index is split into individual pages for each letter.
	172	# html_split_index = False
	173
	174	# If true, links to the reST sources are added to the pages.
	175	# html_show_sourcelink = True
	176
	177	# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
	178	# html_show_sphinx = True
	179
	180	# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
	181	# html_show_copyright = True
	182
	183	# If true, an OpenSearch description file will be output, and all pages will
	184	# contain a <link> tag referring to it. The value of this option must be the
	185	# base URL from which the finished HTML is served.
	186	# html_use_opensearch = ''
	187
	188	# This is the file name suffix for HTML files (e.g. ".xhtml").
	189	# html_file_suffix = None
	190
	191	# Output file base name for HTML help builder.
	192	htmlhelp_basename = 'natsortdoc'
	193
	194	# -- Options for LaTeX output ---------------------------------------------
	195
	196	latex_elements = {
	197	# The paper size ('letterpaper' or 'a4paper').
	198	# 'papersize': 'letterpaper',
	199
	200	# The font size ('10pt', '11pt' or '12pt').
	201	# 'pointsize': '10pt',
	202
	203	# Additional stuff for the LaTeX preamble.
	204	# 'preamble': '',
	205	}
	206
	207	# Grouping the document tree into LaTeX files. List of tuples
	208	# (source start file, target name, title,
	209	# author, documentclass [howto, manual, or own class]).
	210	latex_documents = [
	211	('index', 'natsort.tex', u'natsort Documentation',
	212	u'Seth M. Morton', 'manual'),
	213	]
	214
	215	# The name of an image file (relative to this directory) to place at the top of
	216	# the title page.
	217	# latex_logo = None
	218
	219	# For "manual" documents, if this is true, then toplevel headings are parts,
	220	# not chapters.
	221	# latex_use_parts = False
	222
	223	# If true, show page references after internal links.
	224	# latex_show_pagerefs = False
	225
	226	# If true, show URL addresses after external links.
	227	# latex_show_urls = False
	228
	229	# Documents to append as an appendix to all manuals.
	230	# latex_appendices = []
	231
	232	# If false, no module index is generated.
	233	# latex_domain_indices = True
	234
	235
	236	# -- Options for manual page output ---------------------------------------
	237
	238	# One entry per manual page. List of tuples
	239	# (source start file, name, description, authors, manual section).
	240	man_pages = [
	241	('index', 'natsort', u'natsort Documentation',
	242	[u'Seth M. Morton'], 1)
	243	]
	244
	245	# If true, show URL addresses after external links.
	246	# man_show_urls = False
	247
	248
	249	# -- Options for Texinfo output -------------------------------------------
	250
	251	# Grouping the document tree into Texinfo files. List of tuples
	252	# (source start file, target name, title, author,
	253	# dir menu entry, description, category)
	254	texinfo_documents = [
	255	('index', 'natsort', u'natsort Documentation',
	256	u'Seth M. Morton', 'natsort', 'One line description of project.',
	257	'Miscellaneous'),
	258	]
	259
	260	# Documents to append as an appendix to all manuals.
	261	# texinfo_appendices = []
	262
	263	# If false, no module index is generated.
	264	# texinfo_domain_indices = True
	265
	266	# How to display URL addresses: 'footnote', 'no', or 'inline'.
	267	# texinfo_show_urls = 'footnote'
	268
	269	# If true, do not generate a @detailmenu in the "Top" node's menu.
	270	# texinfo_no_detailmenu = False
	271
	272
	273	# Example configuration for intersphinx: refer to the Python standard library.
	274	intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}

+385

-0

docs/examples.rst

	0	.. default-domain:: py
	1	.. currentmodule:: natsort
	2
	3	.. _examples:
	4
	5	Examples and Recipes
	6	====================
	7
	8	If you want more detailed examples than given on this page, please see
	9	https://github.com/SethMMorton/natsort/tree/master/tests.
	10
	11	.. contents::
	12	:local:
	13
	14	Basic Usage
	15	-----------
	16
	17	In the most basic use case, simply import :func:`~natsorted` and use
	18	it as you would :func:`sorted`:
	19
	20	.. code-block:: pycon
	21
	22	>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
	23	>>> sorted(a)
	24	['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in']
	25	>>> from natsort import natsorted, ns
	26	>>> natsorted(a)
	27	['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
	28
	29	Sort Version Numbers
	30	--------------------
	31
	32	As of :mod:`natsort` version >= 4.0.0, :func:`~natsorted` will work for
	33	well-behaved version numbers, like ``MAJOR.MINOR.PATCH``.
	34
	35	.. _rc_sorting:
	36
	37	Sorting More Expressive Versioning Schemes
	38	++++++++++++++++++++++++++++++++++++++++++
	39
	40	By default, if you wish to sort versions that are not as simple as
	41	``MAJOR.MINOR.PATCH`` (or similar), you may not get the results you expect:
	42
	43	.. code-block:: pycon
	44
	45	>>> a = ['1.2', '1.2rc1', '1.2beta2', '1.2beta1', '1.2alpha', '1.2.1', '1.1', '1.3']
	46	>>> natsorted(a)
	47	['1.1', '1.2', '1.2.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.3']
	48
	49	To make the '1.2' pre-releases come before '1.2.1', you need to use the following
	50	recipe:
	51
	52	.. code-block:: pycon
	53
	54	>>> natsorted(a, key=lambda x: x.replace('.', '~'))
	55	['1.1', '1.2', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2.1', '1.3']
	56
	57	If you also want '1.2' after all the alpha, beta, and rc candidates, you can
	58	modify the above recipe:
	59
	60	.. code-block:: pycon
	61
	62	>>> natsorted(a, key=lambda x: x.replace('.', '~')+'z')
	63	['1.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2', '1.2.1', '1.3']
	64
	65	Please see `this issue <https://github.com/SethMMorton/natsort/issues/13>`_ to
	66	see why this works.
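
As a rough illustration of why the ``~`` trick changes the order, note that ``'.'`` sorts before every ASCII letter while ``'~'`` sorts after them. The sketch below uses ``simple_natural_key``, a deliberately simplified, hypothetical key function written for this example; the real :mod:`natsort` key handles many more cases.

```python
import re


def simple_natural_key(text):
    """Split text into alternating string and integer chunks (simplified sketch)."""
    tokens = re.split(r"(\d+)", text)
    # With one capturing group, odd positions always hold the digit runs.
    return [int(tok) if i % 2 else tok for i, tok in enumerate(tokens)]


versions = ['1.2', '1.2rc1', '1.2beta2', '1.2beta1', '1.2alpha', '1.2.1', '1.1', '1.3']

# '.' (46) < 'a' (97) < '~' (126) in the ASCII table.
plain = sorted(versions, key=simple_natural_key)
tilde = sorted(versions, key=lambda v: simple_natural_key(v.replace('.', '~')))
```

With the plain key, ``'.1'`` compares lower than ``'alpha'``, so ``'1.2.1'`` lands before the pre-releases; after the replacement, ``'~1'`` compares higher than any letter, so it lands after them.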
	67
	68	Sorting Rigorously Defined Versioning Schemes (e.g. SemVer or PEP 440)
	69	""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
	70
	71	If you know you are using a versioning scheme that follows a well-defined format
	72	for which there is third-party module support, you should use those modules
	73	to assist in sorting. Some examples might be
	74	`PEP 440 <https://packaging.pypa.io/en/latest/version>`_ or
	75	`SemVer <https://python-semver.readthedocs.io/en/latest/api.html>`_.
	76
	77	If we are being honest, using these methods to parse a version means you don't
	78	need to use :mod:`natsort` - you should probably just use :func:`sorted` directly.
	79	Here's an example with SemVer:
	80
	81	.. code-block:: pycon
	82
	83	>>> from semver import parse_version_info
	84	>>> a = ['3.4.5-pre.1', '3.4.5', '3.4.5-pre.2+build.4']
	85	>>> sorted(a, key=parse_version_info)
	86	['3.4.5-pre.1', '3.4.5-pre.2+build.4', '3.4.5']
	87
	88	.. _path_sort:
	89
	90	Sort OS-Generated Paths
	91	-----------------------
	92
	93	In some cases when sorting file paths with OS-Generated names, the default
	94	:func:`~natsorted` algorithm may not be sufficient. In cases like these,
	95	you may need to use the ``ns.PATH`` option:
	96
	97	.. code-block:: pycon
	98
	99	>>> a = ['./folder/file (1).txt',
	100	... './folder/file.txt',
	101	... './folder (1)/file.txt',
	102	... './folder (10)/file.txt']
	103	>>> natsorted(a)
	104	['./folder (1)/file.txt', './folder (10)/file.txt', './folder/file (1).txt', './folder/file.txt']
	105	>>> natsorted(a, alg=ns.PATH)
	106	['./folder/file.txt', './folder/file (1).txt', './folder (1)/file.txt', './folder (10)/file.txt']
	107
	108	Locale-Aware Sorting (Human Sorting)
	109	------------------------------------
	110
	111	.. note::
	112	Please read :ref:`locale_issues` before using ``ns.LOCALE``, :func:`humansorted`,
	113	or :func:`index_humansorted`.
	114
	115	You can instruct :mod:`natsort` to use locale-aware sorting with the
	116	``ns.LOCALE`` option. In addition to making this understand non-ASCII
	117	characters, it will also properly interpret non-'.' decimal separators
	118	and also properly order case. It may be more convenient to just use
	119	the :func:`humansorted` function:
	120
	121	.. code-block:: pycon
	122
	123	>>> from natsort import humansorted
	124	>>> import locale
	125	>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
	126	'en_US.UTF-8'
	127	>>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
	128	>>> natsorted(a, alg=ns.LOCALE)
	129	['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
	130	>>> humansorted(a)
	131	['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
	132
	133	You may find that if you do not explicitly set the locale your results may not
	134	be as you expect... I have found that it depends on the system you are on.
	135	If you use `PyICU <https://pypi.org/project/PyICU>`_ (see below) then
	136	you should not need to do this.
	137
	138	.. _case_sort:
	139
	140	Controlling Case When Sorting
	141	-----------------------------
	142
	143	For non-numbers, by default :mod:`natsort` uses ordinal sorting (i.e.
	144	it sorts by the character's value in the ASCII table). For example:
	145
	146	.. code-block:: pycon
	147
	148	>>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
	149	>>> natsorted(a)
	150	['Apple', 'Banana', 'Corn', 'apple', 'banana', 'corn']
	151
	152	There are times when you wish to ignore case when sorting;
	153	you can easily do this with the ``ns.IGNORECASE`` option:
	154
	155	.. code-block:: pycon
	156
	157	>>> natsorted(a, alg=ns.IGNORECASE)
	158	['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn']
	159
	160	Note that since Python's sorting is stable, the order of equivalent
	161	elements after lowering the case is the same order they appear in the
	162	original list.
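
The effect of stability can be seen with the built-in :func:`sorted` alone. This is only a sketch of the principle using ``str.lower`` as the key; it makes no claim about how ``ns.IGNORECASE`` is implemented internally.

```python
a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']

# 'corn' precedes 'Corn' in the input, and both map to the same key,
# so a stable sort keeps them in that order.
by_lower = sorted(a, key=str.lower)
```

Here ``by_lower`` keeps ``'corn'`` ahead of ``'Corn'`` because that is how they appear in ``a``.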
	163
	164	Upper-case letters appear first in the ASCII table, but many natural
	165	sorting methods place lower-case first. To do this, use
	166	``ns.LOWERCASEFIRST``:
	167
	168	.. code-block:: pycon
	169
	170	>>> natsorted(a, alg=ns.LOWERCASEFIRST)
	171	['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']
	172
	173	It may be undesirable to have the upper-case letters grouped together
	174	and the lower-case letters grouped together; most would expect all
	175	"a"s to be together regardless of case, and all "b"s, and so on. To
	176	achieve this, use ``ns.GROUPLETTERS``:
	177
	178	.. code-block:: pycon
	179
	180	>>> natsorted(a, alg=ns.GROUPLETTERS)
	181	['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']
	182
	183	You might combine this with ``ns.LOWERCASEFIRST`` to get what most
	184	would expect to be "natural" sorting:
	185
	186	.. code-block:: pycon
	187
	188	>>> natsorted(a, alg=ns.G | ns.LF)
	189	['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
	190
	191	Customizing Float Definition
	192	----------------------------
	193
	194	You can make :func:`~natsorted` search for any float that would be
	195	a valid Python float literal, such as 5, 0.4, -4.78, +4.2E-34, etc.
	196	using the ``ns.FLOAT`` key. You can disable the exponential component
	197	of the number with ``ns.NOEXP``.
	198
	199	.. code-block:: pycon
	200
	201	>>> a = ['a50', 'a51.', 'a+50.4', 'a5.034e1', 'a+50.300']
	202	>>> natsorted(a, alg=ns.FLOAT)
	203	['a50', 'a5.034e1', 'a51.', 'a+50.300', 'a+50.4']
	204	>>> natsorted(a, alg=ns.FLOAT | ns.SIGNED)
	205	['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
	206	>>> natsorted(a, alg=ns.FLOAT | ns.SIGNED | ns.NOEXP)
	207	['a5.034e1', 'a50', 'a+50.300', 'a+50.4', 'a51.']
	208
	209	For convenience, the ``ns.REAL`` option is provided which is a shortcut
	210	for ``ns.FLOAT | ns.SIGNED`` and can be used to sort on real numbers.
	211	This can be easily accessed with the :func:`~realsorted` convenience
	212	function. Please note that the behavior of the :func:`~realsorted` function
	213	was the default behavior of :func:`~natsorted` for :mod:`natsort`
	214	version < 4.0.0:
	215
	216	.. code-block:: pycon
	217
	218	>>> natsorted(a, alg=ns.REAL)
	219	['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
	220	>>> from natsort import realsorted
	221	>>> realsorted(a)
	222	['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
	223
	224	.. _custom_sort:
	225
	226	Using a Custom Sorting Key
	227	--------------------------
	228
	229	Like the built-in ``sorted`` function, ``natsorted`` can accept a custom
	230	sort key so that:
	231
	232	.. code-block:: pycon
	233
	234	>>> from operator import attrgetter, itemgetter
	235	>>> a = [['a', 'num4'], ['b', 'num8'], ['c', 'num2']]
	236	>>> natsorted(a, key=itemgetter(1))
	237	[['c', 'num2'], ['a', 'num4'], ['b', 'num8']]
	238	>>> class Foo:
	239	...     def __init__(self, bar):
	240	...         self.bar = bar
	241	...     def __repr__(self):
	242	...         return "Foo('{}')".format(self.bar)
	243	>>> b = [Foo('num3'), Foo('num5'), Foo('num2')]
	244	>>> natsorted(b, key=attrgetter('bar'))
	245	[Foo('num2'), Foo('num3'), Foo('num5')]
	246
	247	Generating a Natsort Key
	248	------------------------
	249
	250	If you need to sort a list in-place, you cannot use :func:`~natsorted`; you
	251	need to pass a key to the :meth:`list.sort` method. The function
	252	:func:`~natsort_keygen` is a convenient way to generate these keys for you:
	253
	254	.. code-block:: pycon
	255
	256	>>> from natsort import natsort_keygen
	257	>>> a = ['a50', 'a51.', 'a50.4', 'a5.034e1', 'a50.300']
	258	>>> natsort_key = natsort_keygen(alg=ns.FLOAT)
	259	>>> a.sort(key=natsort_key)
	260	>>> a
	261	['a50', 'a50.300', 'a5.034e1', 'a50.4', 'a51.']
	262
	263	:func:`~natsort_keygen` has the same API as :func:`~natsorted` (minus the
	264	`reverse` option).
	265
	266	Natural Sorting with ``cmp`` (Python 2 only)
	267	--------------------------------------------
	268
	269	.. note::
	270	This is a Python2-only feature! The :func:`natcmp` function is not
	271	exposed on Python3. Because this documentation is built with
	272	Python3, you will not find :func:`natcmp` in the API.
	273
	274	If you are using a legacy codebase that requires you to use :func:`cmp` instead
	275	of a key-function, you can use :func:`~natcmp`.
	276
	277	.. code-block:: pycon
	278
	279	>>> import sys
	280	>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
	281	>>> if sys.version_info[0] == 2:
	282	...     from natsort import natcmp
	283	...     sorted(a, cmp=natcmp)
	284	... else:
	285	...     natsorted(a)  # so doctests don't fail
	286	['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
	287
	288	:func:`natcmp` also accepts an ``alg`` argument so you can customize your
	289	sorting experience.
	290
	291	Sorting Multiple Lists According to a Single List
	292	-------------------------------------------------
	293
	294	Sometimes you have multiple lists, and you want to sort one of those
	295	lists and reorder the other lists according to how the first was sorted.
	296	To achieve this you could use the :func:`~index_natsorted` in combination
	297	with the convenience function
	298	:func:`~order_by_index`:
	299
	300	.. code-block:: pycon
	301
	302	>>> from natsort import index_natsorted, order_by_index
	303	>>> a = ['a2', 'a9', 'a1', 'a4', 'a10']
	304	>>> b = [4, 5, 6, 7, 8]
	305	>>> c = ['hi', 'lo', 'ah', 'do', 'up']
	306	>>> index = index_natsorted(a)
	307	>>> order_by_index(a, index)
	308	['a1', 'a2', 'a4', 'a9', 'a10']
	309	>>> order_by_index(b, index)
	310	[6, 4, 7, 5, 8]
	311	>>> order_by_index(c, index)
	312	['ah', 'hi', 'do', 'lo', 'up']
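
The idea behind sorting by index can be sketched with the standard library alone. The names ``simple_natural_key``, ``index_sorted`` and ``reorder`` below are hypothetical helpers written for this illustration; they are simplified stand-ins and not the :mod:`natsort` implementations.

```python
import re


def simple_natural_key(text):
    """Split text into alternating string and integer chunks (simplified sketch)."""
    tokens = re.split(r"(\d+)", text)
    return [int(tok) if i % 2 else tok for i, tok in enumerate(tokens)]


def index_sorted(seq, key):
    """Return the indices that would sort ``seq`` according to ``key``."""
    return sorted(range(len(seq)), key=lambda i: key(seq[i]))


def reorder(seq, index):
    """Pick the elements of ``seq`` in the order given by ``index``."""
    return [seq[i] for i in index]


a = ['a2', 'a9', 'a1', 'a4', 'a10']
b = [4, 5, 6, 7, 8]

index = index_sorted(a, simple_natural_key)
sorted_a = reorder(a, index)
sorted_b = reorder(b, index)
```

Because the index list is computed once from ``a``, any number of parallel lists can be reordered consistently with it.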
	313
	314	Returning Results in Reverse Order
	315	----------------------------------
	316
	317	Just like the :func:`sorted` built-in function, you can supply the
	318	``reverse`` option to return the results in reverse order:
	319
	320	.. code-block:: pycon
	321
	322	>>> a = ['a2', 'a9', 'a1', 'a4', 'a10']
	323	>>> natsorted(a, reverse=True)
	324	['a10', 'a9', 'a4', 'a2', 'a1']
	325
	326	Sorting Bytes on Python 3
	327	-------------------------
	328
	329	Python 3 is rather strict about comparing strings and bytes, and this
	330	can make it difficult to deal with collections of both. Because of the
	331	challenge of guessing which encoding should be used to decode a bytes
	332	array to a string, :mod:`natsort` does not try to guess and automatically
	333	convert for you; in fact, the official stance of :mod:`natsort` is to
	334	not support sorting bytes. Instead, some decoding convenience functions
	335	have been provided to you (see :ref:`bytes_help`) that allow you to
	336	provide a codec for decoding bytes through the ``key`` argument that
	337	will allow :mod:`natsort` to convert byte arrays to strings for sorting;
	338	these functions know not to raise an error if the input is not a byte
	339	array, so you can use the key on any arbitrary collection of data.
	340
	341	.. code-block:: pycon
	342
	343	>>> from natsort import as_ascii
	344	>>> a = [b'a', 14.0, 'b']
	345	>>> # On Python 2, natsorted(a) would work as expected.
	346	>>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
	347	>>> natsorted(a, key=as_ascii) == [14.0, b'a', 'b']
	348	True
	349
	350	Additionally, regular expressions cannot be run on byte arrays, making it
	351	so that :mod:`natsort` cannot parse them for numbers. As a result, if you
	352	run :mod:`natsort` on a list of bytes, you will get results that are like
	353	Python's default sorting behavior. Of course, you can use the decoding
	354	functions to solve this:
	355
	356	.. code-block:: pycon
	357
	358	>>> from natsort import as_utf8
	359	>>> a = [b'a56', b'a5', b'a6', b'a40']
	360	>>> natsorted(a) # doctest: +SKIP
	361	[b'a40', b'a5', b'a56', b'a6']
	362	>>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
	363	True
	364
	365	If you need a codec different from ASCII or UTF-8, you can use
	366	:func:`decoder` to generate a custom key:
	367
	368	.. code-block:: pycon
	369
	370	>>> from natsort import decoder
	371	>>> a = [b'a56', b'a5', b'a6', b'a40']
	372	>>> natsorted(a, key=decoder('latin1')) == [b'a5', b'a6', b'a40', b'a56']
	373	True
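
To make the "do not raise on non-bytes input" behaviour concrete, here is a minimal Python 3 sketch of a decoding key factory. ``make_decoder`` is a hypothetical name chosen for this example; it only mirrors the behaviour described above and is not the actual :func:`decoder` source.

```python
def make_decoder(encoding):
    """Return a key function that decodes bytes and passes anything else through."""

    def decode(value):
        if isinstance(value, bytes):
            return value.decode(encoding)
        # Strings, numbers, etc. are returned unchanged instead of raising.
        return value

    return decode


as_latin1 = make_decoder('latin1')
mixed = [as_latin1(x) for x in [b'a5', 14.0, 'b']]
```

Passing non-bytes values through untouched is what lets such a key be used on arbitrary mixed collections.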
	374
	375	Sorting a Pandas DataFrame
	376	--------------------------
	377
	378	As of Pandas version 0.16.0, the sorting methods do not accept a ``key`` argument,
	379	so you cannot simply pass :func:`natsort_keygen` to a Pandas DataFrame and sort.
	380	This request has been made to the Pandas devs; see
	381	`issue 3942 <https://github.com/pydata/pandas/issues/3942>`_ if you are interested.
	382	If you need to sort a Pandas DataFrame, please check out
	383	`this answer on StackOverflow <https://stackoverflow.com/a/29582718/1399279>`_
	384	for ways to do this without the ``key`` argument to ``sort``.

+1113

-0

docs/howitworks.rst

	0	.. default-domain:: py
	1	.. currentmodule:: natsort
	2
	3	.. _howitworks:
	4
	5	How Does Natsort Work?
	6	======================
	7
	8	.. contents::
	9	:local:
	10
	11	:mod:`natsort` works by breaking strings into smaller sub-components (numbers
	12	or everything else), and returning these components in a tuple. Sorting
	13	tuples in Python is well-defined, and this fact is used to sort the input
	14	strings properly. But how does one break a string into sub-components?
	15	And what does one do to those components once they are split? Below I
	16	will explain the algorithm that was chosen for the :mod:`natsort` module,
	17	and some of the thinking that went into those design decisions. I will
	18	also mention some of the stumbling blocks I ran into because
	19	`getting sorting right is surprisingly hard`_.
	20
	21	If you are impatient, you can skip to :ref:`tldr1` for the algorithm
	22	in the simplest case, and :ref:`tldr2`
	23	to see what extra code is needed to handle special cases.
	24
	25	First, How Does Natural Sorting Work At a High Level?
	26	-----------------------------------------------------
	27
	28	If I want to compare '2 ft 7 in' to '2 ft 11 in', I might do the following:
	29
	30	.. code-block:: pycon
	31
	32	>>> '2 ft 7 in' < '2 ft 11 in'
	33	False
	34
	35	We as humans know that the above should be true, but why does Python think it
	36	is false? Here is how it is performing the comparison:
	37
	38	.. code-block:: none
	39
	40	'2' <=> '2' ==> equal, so keep going
	41	' ' <=> ' ' ==> equal, so keep going
	42	'f' <=> 'f' ==> equal, so keep going
	43	't' <=> 't' ==> equal, so keep going
	44	' ' <=> ' ' ==> equal, so keep going
	45	'7' <=> '1' ==> different, use result of '7' < '1'
	46
	47	'7' evaluates as greater than '1' so the statement is false. When sorting, if
	48	a value is less than another it is placed first, so in our above example
	49	'2 ft 11 in' would end up before '2 ft 7 in', which is not correct. What to do?
	50
	51	The best way to handle this is to break the string into sub-components
	52	of numbers and non-numbers, and then convert the numeric parts into
	53	:func:`float` or :func:`int` types. This will force Python to
	54	actually understand the context of what it is sorting and then "do the
	55	right thing." Luckily, it handles sorting lists of strings right out-of-the-box,
	56	so the only hard part is actually making this string-to-list transformation
	57	and then Python will handle the rest.
	58
	59	.. code-block:: none
	60
	61	'2 ft 7 in' ==> (2, ' ft ', 7, ' in')
	62	'2 ft 11 in' ==> (2, ' ft ', 11, ' in')
	63
	64	When Python compares the two, it roughly follows the below logic:
	65
	66	.. code-block:: none
	67
	68	2 <=> 2 ==> equal, so keep going
	69	' ft ' <=> ' ft ' ==> a string is a special type of sequence - evaluate each character individually
	70	||
	71	-->
	72	' ' <=> ' ' ==> equal, so keep going
	73	'f' <=> 'f' ==> equal, so keep going
	74	't' <=> 't' ==> equal, so keep going
	75	' ' <=> ' ' ==> equal, so keep going
	76	<== Back to parent sequence
	77	7 <=> 11 ==> different, use the result of 7 < 11
	78
	79	Clearly, seven is less than eleven, so our comparison is as we expect, and we
	80	would get the sorting order we wanted.
	81
	82	At its heart, :mod:`natsort` is simply a tool to break strings into tuples,
	83	turning numbers in strings (i.e. ``'79'``) into ints and floats as it does this.
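
As a minimal sketch of that idea, the transformation and the comparison can be written in a few lines. The helper name ``to_tuple`` is made up for illustration, and it only understands unsigned integers.

```python
import re


def to_tuple(text):
    """Split text into a tuple of strings and ints (unsigned integers only)."""
    parts = re.split(r'([0-9]+)', text)
    # With one capture group, the odd positions hold the digit runs.
    # Empty strings left at the edges by re.split are dropped.
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts) if p)


short = to_tuple('2 ft 7 in')   # (2, ' ft ', 7, ' in')
tall = to_tuple('2 ft 11 in')   # (2, ' ft ', 11, ' in')
print(short < tall)             # True, because 7 < 11 decides the comparison
```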
	84
	85	Natsort's Approach
	86	------------------
	87
	88	.. contents::
	89	:local:
	90
	91	Decomposing Strings Into Sub-Components
	92	+++++++++++++++++++++++++++++++++++++++
	93
	94	The first major hurdle to overcome is to decompose the string into sub-components.
	95	Remarkably, this turns out to be the easy part, owing mostly to Python's easy access
	96	to regular expressions. Breaking an arbitrary string based on a pattern is pretty
	97	straightforward.
	98
	99	.. code-block:: pycon
	100
	101	>>> import re
	102	>>> re.split(r'(\d+)', '2 ft 11 in')
	103	['', '2', ' ft ', '11', ' in']
	104
	105	Clear (assuming you can read regular expressions) and concise.
	106
	107	The reason I began developing :mod:`natsort` in the first place was because I
	108	needed to handle the natural sorting of strings containing real numbers, not just
	109	unsigned integers as the above example contains. By real numbers, I mean those like
	110	``-45.4920E-23``. :mod:`natsort` can handle just about any number definition;
	111	to that end, here are all the regular expressions used in :mod:`natsort`:
	112
	113	.. code-block:: pycon
	114
	115	>>> unsigned_int = r'([0-9]+)'
	116	>>> signed_int = r'([-+]?[0-9]+)'
	117	>>> unsigned_float = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'
	118	>>> signed_float = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'
	119	>>> unsigned_float_no_exponent = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+))'
	120	>>> signed_float_no_exponent = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+))'
	121
	122	Note that ``"inf"`` and ``"nan"`` are deliberately omitted from the float definition because you
	123	wouldn't want (for example) ``"banana"`` to be converted into ``['ba', 'nan', 'a']``.
	124	Let's see an example:
	125
	126	.. code-block:: pycon
	127
	128	>>> re.split(signed_float, 'The mass of 3 electrons is 2.732815068E-30 kg')
	129	['The mass of ', '3', ' electrons is ', '2.732815068E-30', ' kg']
	130
	131	.. note::
	132
	133	It is a bit of a lie to say the above are the complete regular expressions. In the
	134	actual code there is also handling for non-ASCII unicode characters (such as ⑦),
	135	but I will ignore that aspect of :mod:`natsort` in this discussion.
	136
	137	Now, when the user wants to change the definition of a number, it is as easy as changing
	138	the pattern supplied to the regular expression engine.
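
To illustrate, here is a small sketch that splits the same string with two of the patterns listed earlier. The sample string is an assumption chosen to make the difference obvious.

```python
import re

unsigned_int = r'([0-9]+)'
signed_float = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'

text = 'temp -3.5e2 K'

# Only the pattern changes; the splitting code stays the same.
as_ints = re.split(unsigned_int, text)
as_floats = re.split(signed_float, text)

print(as_ints)    # ['temp -', '3', '.', '5', 'e', '2', ' K']
print(as_floats)  # ['temp ', '-3.5e2', ' K']
```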
	139
	140	Choosing the right default is hard, though (well, in this case it shouldn't have been
	141	but I was rather thick-headed).
	142	In retrospect, it should have been obvious that since essentially all the code examples
	143	I had/have seen for natural sorting were for unsigned integers, I should have made the default
	144	definition of a number an unsigned integer. But, in the brash days of my youth I assumed
	145	that since my use case was real numbers, everyone else would be happier sorting by real numbers;
	146	so, I made the default definition of a number a signed float with exponent.
	147	`This astonished`_ `a lot`_ `of people`_
	148	(`and some people aren't very nice when they are astonished`_).
	149	Starting with :mod:`natsort` version 4.0.0 the default number definition was
	150	changed to an unsigned integer which satisfies the "least astonishment" principle, and
	151	I have not heard a complaint since.
	152
	153	Coercing Strings Containing Numbers Into Numbers
	154	++++++++++++++++++++++++++++++++++++++++++++++++
	155
	156	There has been some debate on Stack Overflow as to what method is best to
	157	coerce a string to a number if it can be coerced, and leaving it alone otherwise
	158	(see `this one for coercion`_ and `this one for checking`_ for some high traffic questions),
	159	but it mostly boils down to two different solutions, shown here:
	160
	161	.. code-block:: pycon
	162
	163	>>> def coerce_try_except(x):
	164	...     try:
	165	...         return int(x)
	166	...     except ValueError:
	167	...         return x
	168	...
	169	>>> def coerce_regex(x):
	170	...     # Note that precompiling the regex is more performant,
	171	...     # but I do not show that here for clarity's sake.
	172	...     return int(x) if re.match(r'[-+]?\d+$', x) else x
	173	...
	174
	175	Here are some timing results run on my machine:
	176
	177	.. code-block:: pycon
	178
	179	In [0]: numbers = list(map(str, range(100))) # A list of numbers as strings
	180
	181	In [1]: not_numbers = ['banana' + x for x in numbers]
	182
	183	In [2]: %timeit [coerce_try_except(x) for x in numbers]
	184	10000 loops, best of 3: 51.1 µs per loop
	185
	186	In [3]: %timeit [coerce_try_except(x) for x in not_numbers]
	187	1000 loops, best of 3: 289 µs per loop
	188
	189	In [4]: %timeit [coerce_regex(x) for x in not_numbers]
	190	10000 loops, best of 3: 67.6 µs per loop
	191
	192	In [5]: %timeit [coerce_regex(x) for x in numbers]
	193	10000 loops, best of 3: 123 µs per loop
	194
	195	What can we learn from this? The ``try: except`` method (arguably the most "pythonic"
	196	of the solutions) is best for numeric input, but performs over 5X slower for non-numeric
	197	input. Conversely, the regular expression method is more than twice as slow as
	198	``try: except`` for numeric input, but over 4X faster for non-numeric input, and it
	199	handles non-numeric input more efficiently than input that can be converted to an
	200	``int``. Further, even the regular expression method's worst case is more than twice
	201	as fast as the worst case for ``try: except``.
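
If you want to repeat this kind of measurement without IPython, a rough sketch using the standard :mod:`timeit` module could look like the following. The loop count is an arbitrary choice, the regular expression is precompiled here, and the absolute numbers will differ from machine to machine.

```python
import re
import timeit


def coerce_try_except(x):
    try:
        return int(x)
    except ValueError:
        return x


INT_RE = re.compile(r'[-+]?\d+$')  # precompiled once, outside the function


def coerce_regex(x):
    return int(x) if INT_RE.match(x) else x


numbers = [str(i) for i in range(100)]
not_numbers = ['banana' + n for n in numbers]

loops = 1000
for func in (coerce_try_except, coerce_regex):
    for label, data in (('numbers', numbers), ('not_numbers', not_numbers)):
        total = timeit.timeit(lambda: [func(x) for x in data], number=loops)
        # total is in seconds for all loops; convert to microseconds per loop.
        print(f'{func.__name__} on {label}: {total / loops * 1e6:.1f} us per loop')
```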
	202
	203	Why do I care? Shouldn't I just pick a method and not worry about it? Probably. However,
	204	I am very conscious about the performance of :mod:`natsort`, and want it to be a true
	205	drop-in replacement for :func:`sorted` without having to incur a performance penalty.
	206	For the purposes of :mod:`natsort`, there is no clear winner between the two algorithms -
	207	the data being passed to this function will likely be a mix of numeric and non-numeric
	208	string content. Do I use the ``try: except`` method and hope the speed gains on
	209	numbers will offset the non-number performance, or do I use regular expressions and
	210	take the more stable performance?
	211
	212	It turns out that within the context of :mod:`natsort`, some assumptions can be
	213	made that make a hybrid approach attractive. Because all strings are pre-split
	214	into numeric and non-numeric content before being passed to this coercion function,
	215	the assumption can be made that *if a string begins with a digit or a sign, it
	216	can be coerced into a number*.
	217
	218	.. code-block:: pycon
	219
	220	>>> def coerce_to_int(x):
	221	...     if x[0] in '0123456789+-':
	222	...         try:
	223	...             return int(x)
	224	...         except ValueError:
	225	...             return x
	226	...     else:
	227	...         return x
	228	...
	229
	230	So how does this perform compared to the standard coercion methods?
	231
	232	.. code-block:: pycon
	233
	234	In [6]: %timeit [coerce_to_int(x) for x in numbers]
	235	10000 loops, best of 3: 71.6 µs per loop
	236
	237	In [7]: %timeit [coerce_to_int(x) for x in not_numbers]
	238	10000 loops, best of 3: 26.4 µs per loop
	239
	240	For numeric input, the hybrid method eliminates most of the time the regular
	241	expression spent checking that the input is in fact a number before passing it to
	242	:func:`int`, and for non-numeric input it eliminates the time wasted in the exception stack.
	243
	244	That's as fast as we can get, right? In pure Python, probably. At least, it's
	245	close. But because I am crazy and a glutton for punishment, I decided to see
	246	if I could get any faster writing a C extension. It's called
	247	`fastnumbers`_ and contains a C implementation of the above coercion functions
	248	called :func:`fast_int`. How does it fare? Pretty well.
	249
	250	.. code-block:: pycon
	251
	252	In [8]: %timeit [fast_int(x) for x in numbers]
	253	10000 loops, best of 3: 30.9 µs per loop
	254
	255	In [9]: %timeit [fast_int(x) for x in not_numbers]
	256	10000 loops, best of 3: 30 µs per loop
	257
	258	During development of :mod:`natsort`, I wanted to ensure that using it did not
	259	get in the way of a user's program by introducing a performance penalty to their code.
	260	To that end, I do not feel like my adventures down the rabbit hole of optimization
	261	of coercion functions were a waste; I can confidently look users in the eye and
	262	say I considered every option in ensuring :mod:`natsort` is as efficient as possible.
	263	This is why if `fastnumbers`_ is installed it will be used for this step,
	264	and otherwise the hybrid method will be used.
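
The selection between the two might be sketched as below. This only illustrates the idea of an optional dependency with a pure-Python fallback; it is not a copy of how :mod:`natsort` wires this up internally.

```python
try:
    # Optional C extension; used only when it is installed.
    from fastnumbers import fast_int as coerce_to_int
except ImportError:
    def coerce_to_int(x):
        """Pure-Python hybrid fallback."""
        if x[0] in '0123456789+-':
            try:
                return int(x)
            except ValueError:
                return x
        return x


print([coerce_to_int(s) for s in ['12', ' apples', '-3']])
```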
	265
	266	.. note::
	267
	268	Modifying the hybrid coercion function for floats is straightforward.
	269
	270	.. code-block:: pycon
	271
	272	>>> def coerce_to_float(x):
	273	...     if x[0] in '.0123456789+-' or x.lower().lstrip()[:3] in ('nan', 'inf'):
	274	...         try:
	275	...             return float(x)
	276	...         except ValueError:
	277	...             return x
	278	...     else:
	279	...         return x
	280	...
	281
	282	.. _tldr1:
	283
	284	TL;DR 1 - The Simple "No Special Cases" Algorithm
	285	+++++++++++++++++++++++++++++++++++++++++++++++++
	286
	287	At this point, our :mod:`natsort` algorithm is essentially the following:
	288
	289	.. code-block:: pycon
	290
	291	>>> import re
	292	>>> def natsort_key(x, as_float=False, signed=False):
	293	...     if as_float:
	294	...         regex = signed_float if signed else unsigned_float
	295	...     else:
	296	...         regex = signed_int if signed else unsigned_int
	297	...     split_input = re.split(regex, x)
	298	...     split_input = filter(None, split_input) # removes null strings
	299	...     coerce = coerce_to_float if as_float else coerce_to_int
	300	...     return tuple(coerce(s) for s in split_input)
	301	...
	302
	303	I have written the above for clarity and not performance.
	304	This pretty much matches `most natural sort solutions for python on Stack Overflow`_
	305	(except the above includes customization of the definition of a number).
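
To see the effect on real numbers, here is a compact, self-contained variant of the same idea. The name ``float_key`` and the sample data are assumptions for illustration, and it hard-codes the signed float definition.

```python
import re

SIGNED_FLOAT = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'


def float_key(text):
    parts = re.split(SIGNED_FLOAT, text)
    # Odd positions are the captured numbers; empty strings are dropped.
    return tuple(float(p) if i % 2 else p for i, p in enumerate(parts) if p)


readings = ['x = 1e3', 'x = -2.5', 'x = 40']
print(sorted(readings))                 # ['x = -2.5', 'x = 1e3', 'x = 40']
print(sorted(readings, key=float_key))  # ['x = -2.5', 'x = 40', 'x = 1e3']
```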
	306
	307	Special Cases Everywhere!
	308	-------------------------
	309
	310	.. contents::
	311	:local:
	312
	313	.. image:: special_cases_everywhere.jpg
	314
	315	If what I described in :ref:`TL;DR 1 <tldr1>` were
	316	all that :mod:`natsort` needed to
	317	do then there probably wouldn't be much need for a third-party module, right?
	318	Probably. But it turns out that in real-world data there are a lot of
	319	special cases that need to be handled, and in true `80%/20%`_ fashion, the
	320	majority of the code in :mod:`natsort` is devoted to handling special cases
	321	like those described below.
	322
	323	Sorting Filesystem Paths
	324	++++++++++++++++++++++++
	325
	326	`The first major special case I encountered was sorting filesystem paths`_
	327	(if you go to the link, you will see I didn't handle it well for a year...
	328	this was before I fully realized how much functionality I could really add
	329	to :mod:`natsort`). Let's apply the :func:`natsort_key` from above to some
	330	filesystem paths that you might see being auto-generated from your operating
	331	system:
	332
	333	.. code-block:: pycon
	334
	335	>>> paths = ['/p/Folder (10)/file.tar.gz',
	336	... '/p/Folder/file.tar.gz',
	337	... '/p/Folder (1)/file (1).tar.gz',
	338	... '/p/Folder (1)/file.tar.gz']
	339	>>> sorted(paths, key=natsort_key)
	340	['/p/Folder (1)/file (1).tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (10)/file.tar.gz', '/p/Folder/file.tar.gz']
	341
	342	Well that's not right! What is ``'/p/Folder/file.tar.gz'`` doing at the end?
	343	It has to do with the numerical ASCII code assigned to the space and
	344	``/`` characters in the `ASCII table`_. According to the `ASCII table`_, the
	345	space character (number 32) comes before the ``/`` character (number 47). If
	346	we remove the common prefix in all of the above strings (``'/p/Folder'``), we
	347	can see why this happens:
	348
	349	.. code-block:: pycon
	350
	351	>>> ' (1)/file.tar.gz' < '/file.tar.gz'
	352	True
	353	>>> ' ' < '/'
	354	True
	355
	356	This isn't very convenient... how do we solve it? We can split the path
	357	across the path separators and then sort. A convenient way to do this is
	358	with the :data:`Path.parts <pathlib.PurePath.parts>` property from
	359	:mod:`pathlib`:
	360
	361	.. code-block:: pycon
	362
	363	>>> import pathlib
	364	>>> sorted(paths, key=lambda x: tuple(natsort_key(s) for s in pathlib.Path(x).parts))
	365	['/p/Folder/file.tar.gz', '/p/Folder (1)/file (1).tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (10)/file.tar.gz']
	366
	367	Almost! It seems like there is some funny business going on in the final
	368	filename component as well. We can solve that nicely and quickly with
	369	:data:`Path.suffixes <pathlib.PurePath.suffixes>` and :data:`Path.stem
	370	<pathlib.PurePath.stem>`.
	371
	372	.. code-block:: pycon
	373
	374	>>> def decompose_path_into_components(x):
	375	...     path_split = list(pathlib.Path(x).parts)
	376	...     # Remove the final filename component from the path.
	377	...     final_component = pathlib.Path(path_split.pop())
	378	...     # Split off all the extensions.
	379	...     suffixes = final_component.suffixes
	380	...     stem = final_component.name.replace(''.join(suffixes), '')
	381	...     # Remove the '.' prefix of each extension, and make that
	382	...     # final component a list of the stem and each suffix.
	383	...     final_component = [stem] + [x[1:] for x in suffixes]
	384	...     # Replace the split final filename component.
	385	...     path_split.extend(final_component)
	386	...     return path_split
	387	...
	388	>>> def natsort_key_with_path_support(x):
	389	...     return tuple(natsort_key(s) for s in decompose_path_into_components(x))
	390	...
	391	>>> sorted(paths, key=natsort_key_with_path_support)
	392	['/p/Folder/file.tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (1)/file (1).tar.gz', '/p/Folder (10)/file.tar.gz']
	393
	394	This works because in addition to breaking the input by path separators, the final
	395	filename component is separated from its extensions as well [#f1]_. Then, each of these
	396	separated components is sent to the :mod:`natsort` algorithm, so the result is
	397	a tuple of tuples. Once that is done, we can see how comparisons can be done in
	398	the expected manner.
	399
	400	.. code-block:: pycon
	401
	402	>>> a = natsort_key_with_path_support('/p/Folder (1)/file (1).tar.gz')
	403	>>> a
	404	(('/',), ('p',), ('Folder (', 1, ')'), ('file (', 1, ')'), ('tar',), ('gz',))
	405	>>>
	406	>>> b = natsort_key_with_path_support('/p/Folder/file.tar.gz')
	407	>>> b
	408	(('/',), ('p',), ('Folder',), ('file',), ('tar',), ('gz',))
	409	>>>
	410	>>> a > b
	411	True
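
For reference, this is what the :mod:`pathlib` attributes used in this section return for one of the sample paths. The sketch uses ``PurePosixPath`` (an assumption, so that the result does not depend on the operating system running it).

```python
import pathlib

# PurePosixPath gives identical results on every operating system.
p = pathlib.PurePosixPath('/p/Folder (1)/file (1).tar.gz')

print(p.parts)     # ('/', 'p', 'Folder (1)', 'file (1).tar.gz')
print(p.suffixes)  # ['.tar', '.gz']
print(p.stem)      # 'file (1).tar'
```

Note that ``stem`` strips only the last suffix, so getting the bare ``'file (1)'`` requires removing all of the joined suffixes from ``name``.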
	412
	413	Comparing Different Types on Python 3
	414	+++++++++++++++++++++++++++++++++++++
	415
	416	`The second major special case I encountered was sorting of different types`_.
	417	If you are on Python 2 (i.e. legacy Python), this mostly doesn't matter too
	418	much since it uses an arbitrary heuristic to allow traditionally un-comparable
	419	types to be compared (such as comparing ``'a'`` to ``1``). However, on Python 3
	420	(i.e. Python) it simply won't let you perform such nonsense, raising a
	421	:exc:`TypeError` instead.
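
A short sketch of that behavior on Python 3 is given below. The helper ``comparable`` is hypothetical and exists only to turn the exception into a boolean.

```python
def comparable(a, b):
    """Return True if ``a < b`` can be evaluated without a TypeError."""
    try:
        a < b
    except TypeError:
        return False
    return True


print(comparable('a', 1))                          # False
# Tuples inherit the problem once mismatched elements line up...
print(comparable((12.0, ' apples'), ('apples',)))  # False
# ...but not if an earlier element already decides the comparison.
print(comparable((1, 'a'), (2, 3)))                # True
```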
	422
	423	You can imagine that a module that breaks strings into tuples of numbers and
	424	strings is walking a dangerous line if it does not have special handling for
	425	comparing numbers and strings. My imagination was not so great at first.
	426	Let's take a look at all the ways this can fail with real-world data.
	427
	428	.. code-block:: pycon
	429
	430	>>> def natsort_key_with_poor_real_number_support(x):
	431	...     split_input = re.split(signed_float, x)
	432	...     split_input = filter(None, split_input) # removes null strings
	433	...     return tuple(coerce_to_float(s) for s in split_input)
	434	>>>
	435	>>> sorted([5, '4'], key=natsort_key_with_poor_real_number_support)
	436	Traceback (most recent call last):
	437	...
	438	TypeError: ...
	439	>>>
	440	>>> sorted(['12 apples', 'apples'], key=natsort_key_with_poor_real_number_support)
	441	Traceback (most recent call last):
	442	...
	443	TypeError: ...
	444	>>>
	445	>>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_poor_real_number_support)
	446	Traceback (most recent call last):
	447	...
	448	TypeError: ...
	449
	450	Let's break these down.
	451
	452	#. The integer ``5`` is sent to ``re.split`` which expects only strings
	453	or bytes, which is a no-no.
	454	#. ``natsort_key_with_poor_real_number_support('12 apples') < natsort_key_with_poor_real_number_support('apples')``
	455	is the same as ``(12.0, ' apples') < ('apples',)``, and thus a number gets
	456	compared to a string [#f2]_ which also is a no-no.
	457	#. This one scores big on the astonishment scale, especially if one accidentally
	458	uses signed integers or real numbers when they mean to use unsigned integers.
	459	``natsort_key_with_poor_real_number_support('version5.3.0') < natsort_key_with_poor_real_number_support('version5.3rc1')``
	460	is the same as ``('version', 5.3, 0.0) < ('version', 5.3, 'rc', 1.0)``, so in the
	461	third element a number gets compared to a string, once again the same
	462	old no-no. (The same would happen with ``'version5-3'`` and ``'version5-a'``,
	463	which would become ``('version', 5, -3)`` and ``('version', 5, '-a')``).
	464
	465	As you might expect, the solution to the first issue is to wrap the ``re.split``
	466	call in a ``try: except:`` block and handle the number specially if a
	467	:exc:`TypeError` is raised. The second and third cases could be handled
	468	in a "special case" manner, meaning only respond and do something different
	469	if these problems are detected. But a less error-prone method is to ensure
	470	that the data is correct-by-construction, and this can be done by ensuring
	471	that the returned tuples always start with a string, and then alternate
	472	in a string-number-string-number-string pattern; this can be achieved by
	473	adding an empty string wherever the pattern is not followed [#f3]_. This ends
	474	up working out pretty nicely because empty strings are always "less" than
	475	any non-empty string, and we typically want numbers to come before strings.
	476
	477	Let's take a look at how this works out.
	478
	479	.. code-block:: pycon
	480
	481	>>> from natsort.utils import sep_inserter
	482	>>> list(sep_inserter(iter(['apples']), ''))
	483	['apples']
	484	>>>
	485	>>> list(sep_inserter(iter([12, ' apples']), ''))
	486	['', 12, ' apples']
	487	>>>
	488	>>> list(sep_inserter(iter(['version', 5, -3]), ''))
	489	['version', 5, '', -3]
	490	>>>
	491	>>> from natsort import natsort_keygen, ns
	492	>>> natsort_key_with_good_real_number_support = natsort_keygen(alg=ns.REAL)
	493	>>>
	494	>>> sorted([5, '4'], key=natsort_key_with_good_real_number_support)
	495	['4', 5]
	496	>>>
	497	>>> sorted(['12 apples', 'apples'], key=natsort_key_with_good_real_number_support)
	498	['12 apples', 'apples']
	499	>>>
	500	>>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_good_real_number_support)
	501	['version5.3.0', 'version5.3rc1']
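
To make the insertion rule concrete, here is a pure-Python sketch of a separator inserter. The name ``insert_separators`` is hypothetical and this is not the implementation behind ``sep_inserter``; it only reproduces the three outputs shown above, under the assumption that anything that is not a :class:`str` is a number.

```python
def insert_separators(components, sep=''):
    """Yield components, adding sep so strings and numbers alternate."""
    # Pretend a number came first so a leading number gets a separator.
    previous_was_number = True
    for item in components:
        is_number = not isinstance(item, str)
        if is_number and previous_was_number:
            yield sep
        yield item
        previous_was_number = is_number


print(list(insert_separators(['apples'])))           # ['apples']
print(list(insert_separators([12, ' apples'])))      # ['', 12, ' apples']
print(list(insert_separators(['version', 5, -3])))   # ['version', 5, '', -3]

# The padded tuples compare without a TypeError: '' sorts before 'apples'.
print(('', 12, ' apples') < ('apples',))             # True
```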
	502
	503	How the "good" version works will be given in `TL;DR 2 - Handling Crappy, Real-World Input`_.
	504
	505	Handling NaN
	506	++++++++++++
	507
	508	`A rather unexpected special case I encountered was sorting collections containing NaN`_.
	509	Let's see what happens when you try to sort a plain old list of numbers when there
	510	is a NaN floating around in there.
	511
	512	.. code-block:: pycon
	513
	514	>>> danger = [7, float('nan'), 22.7, 19, -14, 59.123, 4]
	515	>>> sorted(danger)
	516	[7, nan, -14, 4, 19, 22.7, 59.123]
	517
	518	Clearly that isn't correct, and for once it isn't my fault!
	519	`It's hard to compare floating point numbers`_. By definition, NaN is unorderable
	520	to any other number, and is never equal to any other number, including itself.
	521
	522	.. code-block:: pycon
	523
	524	>>> nan = float('nan')
	525	>>> 5 > nan
	526	False
	527	>>> 5 < nan
	528	False
	529	>>> 5 == nan
	530	False
	531	>>> 5 != nan
	532	True
	533	>>> nan == nan
	534	False
	535	>>> nan != nan
	536	True
	537
	538	The implication of all this for us is that if there is an NaN in the
	539	data-set we are trying to sort, the data-set will end up being sorted in
	540	two separate yet individually sorted sequences - the one before the NaN,
	541	and the one after. This is because the ``<`` operation that is used
	542	to sort always returns :const:`False` with NaN.
	543
	544	Because :mod:`natsort` aims to sort sequences in a way that does not surprise
	545	the user, keeping this behavior is not acceptable (I don't require my users
	546	to know how NaN will behave in a sorting algorithm). The simplest way to
	547	satisfy the "least astonishment" principle is to substitute NaN with
	548	some other value. But what value is least astonishing? I chose to replace
	549	NaN with :math:`-\infty` so that these poorly behaved elements always
	550	end up at the front where the users will most likely be alerted to their presence.
	551
	552	.. code-block:: pycon
	553
	554	>>> def fix_nan(x):
	555	...     if x != x: # only true for NaN
	556	...         return float('-inf')
	557	...     else:
	558	...         return x
	559	...
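
Used directly as a sort key on the earlier list of plain numbers, a function like this gives the following result (a standalone sketch that repeats the definition so it can be run on its own):

```python
import math


def fix_nan(x):
    # NaN is the only value that is not equal to itself.
    return float('-inf') if x != x else x


danger = [7, float('nan'), 22.7, 19, -14, 59.123, 4]
result = sorted(danger, key=fix_nan)
print(result)  # [nan, -14, 4, 7, 19, 22.7, 59.123]
```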
	560
	561	Let's check out :ref:`TL;DR 2 <tldr2>` to see how this can be
	562	incorporated into the simple key function from :ref:`TL;DR 1 <tldr1>`.
	563
	564	.. _tldr2:
	565
	566	TL;DR 2 - Handling Crappy, Real-World Input
	567	+++++++++++++++++++++++++++++++++++++++++++
	568
	569	Let's see how our elegant key function from :ref:`TL;DR 1 <tldr1>` has
	570	become bastardized in order to support handling mixed real-world data
	571	and user customizations.
	572
	573	>>> def natsort_key(x, as_float=False, signed=False, as_path=False):
	574	...     if as_float:
	575	...         regex = signed_float if signed else unsigned_float
	576	...     else:
	577	...         regex = signed_int if signed else unsigned_int
	578	...     try:
	579	...         if as_path:
	580	...             x = decompose_path_into_components(x) # Decomposes into list of strings
	581	...         # If this raises a TypeError, input is not a string.
	582	...         split_input = re.split(regex, x)
	583	...     except TypeError:
	584	...         try:
	585	...             # Does this need to be applied recursively (list-of-list)?
	586	...             return tuple(map(natsort_key, x))
	587	...         except TypeError:
	588	...             # Must be a number
	589	...             ret = ('', fix_nan(x)) # Maintain string-number-string pattern
	590	...             return (ret,) if as_path else ret # as_path returns tuple-of-tuples
	591	...     else:
	592	...         split_input = filter(None, split_input) # removes null strings
	593	...         # Note that the coerce_to_int/coerce_to_float functions
	594	...         # are also modified to use the fix_nan function.
	595	...         if as_float:
	596	...             coerced_input = (coerce_to_float(s) for s in split_input)
	597	...         else:
	598	...             coerced_input = (coerce_to_int(s) for s in split_input)
	599	...         return tuple(sep_inserter(coerced_input, ''))
	600	...
	601
	602	And this doesn't even show handling :class:`bytes` type! Notice that we have
	603	to do non-obvious things like modify the return form of numbers when ``as_path``
	604	is given, just to avoid comparing strings and numbers for the case in which a user provides
	605	input like ``['/home/me', 42]``.
	606
	607	Let's take it out for a spin!
	608
	609	.. code-block:: pycon
	610
	611	>>> danger = [7, float('nan'), 22.7, '19', '-14', '59.123', 4]
	612	>>> sorted(danger, key=lambda x: natsort_key(x, as_float=True, signed=True))
	613	[nan, '-14', 4, 7, '19', 22.7, '59.123']
	614	>>>
	615	>>> paths = ['/p/Folder (1)/file.tar.gz',
	616	... '/p/Folder/file.tar.gz',
	617	... 123456]
	618	>>> sorted(paths, key=lambda x: natsort_key(x, as_path=True))
	619	[123456, '/p/Folder/file.tar.gz', '/p/Folder (1)/file.tar.gz']
	620
	621	Here Be Dragons: Adding Locale Support
	622	--------------------------------------
	623
	624	.. contents::
	625	:local:
	626
	627	Probably the most challenging special case I had to handle was getting
	628	:mod:`natsort` to handle sorting the non-numerical parts of input
	629	correctly, and also allowing it to sort the numerical bits in different
	630	locales. This was in no way what I originally set out to do with this
	631	library, so I was `caught a bit off guard when the request was initially made`_.
	632	I discovered the :mod:`locale` library, and assumed that if it's part of Python's
	633	StdLib there can't be too many dragons, right?
	634
	635	.. admonition:: INCOMPLETE LIST OF DRAGONS
	636
	637	- https://github.com/SethMMorton/natsort/issues/21
	638	- https://github.com/SethMMorton/natsort/issues/22
	639	- https://github.com/SethMMorton/natsort/issues/23
	640	- https://github.com/SethMMorton/natsort/issues/36
	641	- https://github.com/SethMMorton/natsort/issues/44
	642	- https://bugs.python.org/issue2481
	643	- https://bugs.python.org/issue23195
	644	- https://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
	645	- https://stackoverflow.com/questions/22203550/sort-dictionary-by-key-using-locale-collation
	646	- https://stackoverflow.com/questions/33459384/unicode-character-not-in-range-when-calling-locale-strxfrm
	647	- https://stackoverflow.com/questions/36431810/sort-numeric-lines-with-thousand-separators
	648	- https://stackoverflow.com/questions/45734562/how-can-i-get-a-reasonable-string-sorting-with-python
	649
	650	These can be summed up as follows:
	651
	652	#. :mod:`locale` is a thin wrapper over your operating system's locale
	653	library, so if that is broken (like it is on BSD and OSX) then
	654	:mod:`locale` is broken in Python.
	655	#. Because of a bug in legacy Python (i.e. Python 2), there is no uniform way to use
	656	the :mod:`locale` sorting functionality between legacy Python and Python 3.
	657	#. People have differing opinions of how capitalization should affect word order.
	658	#. There is no built-in way to handle locale-dependent thousands separators
	659	and decimal points robustly.
	660	#. Proper handling of Unicode is complicated.
	661	#. Proper handling of :mod:`locale` is complicated.
	662
	663	Easily over half of the code in :mod:`natsort` is in some way dealing with some
	664	aspect of :mod:`locale` or basic case handling. It would have been
	665	impossible to get right without a `really good`_ `testing strategy`_.
	666
	667	Don't expect any more TL;DR's... if you want to see how all this is fully
	668	incorporated into the :mod:`natsort` algorithm then please take a look
	669	`at the code`_. However, I will hint at how specific steps are taken in
	670	each section.
	671
	672	Let's see how we can handle some of the dragons, one-by-one.
	673
	674	Basic Case Control Support
	675	++++++++++++++++++++++++++
	676
	677	Without even thinking about the mess that is adding :mod:`locale` support,
	678	:mod:`natsort` can introduce support for controlling how case is interpreted.
	679
	680	First, let's take a look at how it is sorted by default (due to
	681	where characters lie on the `ASCII table`_).
	682
	683	.. code-block:: pycon
	684
	685	>>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
	686	>>> sorted(a)
	687	['Apple', 'Banana', 'Corn', 'apple', 'banana', 'corn']
	688
	689	All uppercase letters come before lowercase letters in the `ASCII table`_,
	690	so all capitalized words appear first. Not everyone agrees that this
	691	is the correct order. Some believe that the capitalized words should
	692	be last (``['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']``).
	693	Some believe that both the lowercase and uppercase versions
	694	should appear together (``['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']``).
	695	Some believe that both should be true ☹. Some people don't care at all [#f4]_.
	696
	697	Solving the first case (I call it LOWERCASEFIRST) is actually pretty
	698	easy... just call the :meth:`str.swapcase` method on the input.
	699
	700	.. code-block:: pycon
	701
	702	>>> sorted(a, key=lambda x: x.swapcase())
	703	['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']
	704
	705	The last (I call it IGNORECASE) should be super easy, right?
	706	Simply call :meth:`str.lower` on the input. This will work but may
	707	not always give the correct answer on non-latin character sets. It's
	708	a good thing that in Python 3.3
	709	:meth:`str.casefold` was introduced, which does a better job of removing
	710	all case information from unicode characters in
	711	non-latin alphabets.
	712
	713	.. code-block:: pycon
	714
	715	>>> def remove_case(x):
	716	...     try:
	717	...         return x.casefold()
	718	...     except AttributeError: # Legacy Python backwards compatibility
	719	...         return x.lower()
	720	...
	721	>>> sorted(a, key=remove_case)
	722	['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn']
	723
	724	The middle case (I call it GROUPLETTERS) is less straightforward.
	725	The most efficient way to handle this is to duplicate each character
	726	with its lowercase version and then the original character.
	727
	728	.. code-block:: pycon
	729
	730	>>> import itertools
	731	>>> def groupletters(x):
	732	...     return ''.join(itertools.chain.from_iterable((remove_case(y), y) for y in x))
	733	...
	734	>>> groupletters('Apple')
	735	'aAppppllee'
	736	>>> groupletters('apple')
	737	'aappppllee'
	738	>>> sorted(a, key=groupletters)
	739	['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']
	740
	741	The effect of this is that both ``'Apple'`` and ``'apple'`` are
	742	placed adjacent to each other because their transformations both begin
	743	with ``'a'``, and then the second character can be used to order them
	744	appropriately with respect to each other.
	745
	746	There's a problem with this, though. Within the context of :mod:`natsort`
	747	we are trying to correctly sort numbers and those should be left alone.
	748
	749	.. code-block:: pycon
	750
	751	>>> a = ['Apple5', 'apple', 'Apple4E10', 'Banana']
	752	>>> sorted(a, key=lambda x: natsort_key(x, as_float=True))
	753	['Apple5', 'Apple4E10', 'Banana', 'apple']
	754	>>> sorted(a, key=lambda x: natsort_key(groupletters(x), as_float=True))
	755	['Apple4E10', 'Apple5', 'apple', 'Banana']
	756	>>> groupletters('Apple4E10')
	757	'aAppppllee44eE1100'
	758
	759	We messed up the numbers! Looks like :func:`groupletters` needs to be applied
	760	after the strings are broken into their components. I'm not going to show
	761	how this is done here, but basically it requires applying the function in
	762	the ``else:`` block of :func:`coerce_to_int`/:func:`coerce_to_float`.
	763
	764	.. code-block:: pycon
	765
	766	>>> better_groupletters = natsort_keygen(alg=ns.GROUPLETTERS | ns.REAL)
	767	>>> better_groupletters('Apple4E10')
	768	('aAppppllee', 40000000000.0)
	769	>>> sorted(a, key=better_groupletters)
	770	['Apple5', 'Apple4E10', 'apple', 'Banana']
	771
	772	Of course, applying both LOWERCASEFIRST and GROUPLETTERS is just
	773	a matter of turning on both functions.
	774
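As a rough illustration of the idea described above (this is not the code :mod:`natsort` actually uses), the sketch below splits the input into digit and non-digit runs and applies the case transformations only to the non-digit runs. The helper name ``sketch_key`` and its ``lowercase_first`` parameter are hypothetical, and only unsigned integers are handled.

```python
import re
from itertools import chain


def groupletters(text):
    # Pair each character with its case-folded form so 'A' and 'a' sort together.
    return "".join(chain.from_iterable((ch.casefold(), ch) for ch in text))


def sketch_key(text, lowercase_first=False):
    # Hypothetical helper: re.split with a capturing group returns the
    # digit runs at the odd indices and everything else at the even ones.
    parts = re.split(r"([0-9]+)", text)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            out.append(int(part))  # numbers are left alone
        elif part:
            if lowercase_first:
                part = part.swapcase()
            out.append(groupletters(part))
    return tuple(out)


data = ["Apple5", "apple", "Apple10", "Banana"]
grouped = sorted(data, key=sketch_key)
grouped_lower_first = sorted(data, key=lambda x: sketch_key(x, lowercase_first=True))
```

Because the transformation never touches the digit runs, ``'Apple10'`` still sorts after ``'Apple5'``, which is the property the naive version lost.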
	775	Basic Unicode Support
	776	+++++++++++++++++++++
	777
	778	Unicode is hard and complicated. Here's an example.
	779
	780	.. code-block:: pycon
	781
	782	>>> b = [b'\x66', b'\x65', b'\xc3\xa9', b'\x65\xcc\x81', b'\x61', b'\x7a']
	783	>>> a = [x.decode('utf8') for x in b]
	784	>>> a # doctest: +SKIP
	785	['f', 'e', 'é', 'é', 'a', 'z']
	786	>>> sorted(a) # doctest: +SKIP
	787	['a', 'e', 'é', 'f', 'z', 'é']
	788
	789
	790	There is more than one way to represent the character 'é' in Unicode.
	791	In fact, many characters have multiple representations. This is a challenge
	792	because comparing the two representations would return ``False`` even though
	793	they look the same.
	794
	795	.. code-block:: pycon
	796
	797	>>> a[2] == a[3]
	798	False
	799
	800	Alas, since characters are compared based on the numerical value of their
	801	representation, sorting Unicode often gives unexpected results (like seeing
	802	'é' come both before and after 'z').
	803
	804	The original approach that :mod:`natsort` took with respect to non-ASCII
	805	Unicode characters was to say "just use
	806	the :mod:`locale` or :mod:`PyICU` library" and then cross its fingers
	807	and hope those libraries take care of it. As you will find in the following
	808	sections, that comes with its own baggage, and turned out to not always work anyway
	809	(see https://stackoverflow.com/q/45734562/1399279). A more robust approach is to
	810	handle the Unicode out-of-the-box without invoking a heavy-handed library
	811	like :mod:`locale` or :mod:`PyICU`. To do this, we must use normalization.
	812
	813	To fully understand Unicode normalization, `check out some official Unicode documentation`_.
	814	Just kidding... that's too much text. The following StackOverflow answers do
	815	a good job at explaining Unicode normalization in simple terms:
	816	https://stackoverflow.com/a/7934397/1399279 and
	817	https://stackoverflow.com/a/7931547/1399279. Put simply, normalization
	818	ensures that Unicode characters with multiple representations are in
	819	some canonical and consistent representation so that (for example) comparisons
	820	of the characters can be performed in a sane way. The following discussion
	821	assumes you at least read the StackOverflow answers.
	822
	823	Looking back at our 'é' example, we can see that the two versions were
	824	constructed with the byte strings ``b'\xc3\xa9'`` and ``b'\x65\xcc\x81'``.
	825	The former representation is actually
	826	`LATIN SMALL LETTER E WITH ACUTE <https://www.fileformat.info/info/unicode/char/e9/index.htm>`_
	827	and is a single character in the Unicode standard. This is known as the
	828	composed form and corresponds to the 'NFC' normalization scheme.
	829	The latter representation is actually the letter 'e' followed by
	830	`COMBINING ACUTE ACCENT <https://www.fileformat.info/info/unicode/char/0301/index.htm>`_
	831	and so is two characters in the Unicode standard. This is known as the
	832	decomposed form and corresponds to the 'NFD' normalization scheme.
	833	Since the first character in the decomposed form is actually the letter 'e',
	834	when compared to other ASCII characters it fits where you might expect.
	835	Unfortunately, all Unicode composed form characters come after the
	836	ASCII characters and so they always will be placed after 'z' when sorting.
	837
	838	It seems that most Unicode data is stored and shared in the composed form
	839	which makes it challenging to sort. This can be solved by normalizing all
	840	incoming Unicode data to the decomposed form ('NFD') and then sorting.
	841
	842	.. code-block:: pycon
	843
	844	>>> import unicodedata
	845	>>> c = [unicodedata.normalize('NFD', x) for x in a]
	846	>>> c # doctest: +SKIP
	847	['f', 'e', 'é', 'é', 'a', 'z']
	848	>>> sorted(c) # doctest: +SKIP
	849	['a', 'e', 'é', 'é', 'f', 'z']
	850
	851	Huzzah! Sane sorting without having to resort to :mod:`locale`!
	852
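The same idea can be written with explicit code points, which makes the two representations easier to tell apart than the rendered glyphs. This is only a small sketch using the standard :mod:`unicodedata` module; the variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT (two code points)

# The two strings render identically but do not compare equal...
same_before = composed == decomposed
# ...until both are brought to the same normalization form.
same_after = unicodedata.normalize("NFD", composed) == decomposed

words = ["z", composed, "f", decomposed, "a", "e"]
# Normalizing inside the key leaves the original strings untouched.
ordered = sorted(words, key=lambda s: unicodedata.normalize("NFD", s))
```

Using the normalization as a sort key, rather than rewriting the data, keeps the input exactly as the user supplied it.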
	853	Using Locale to Compare Strings
	854	+++++++++++++++++++++++++++++++
	855
	856	The :mod:`locale` module is actually pretty cool, and provides lowly
	857	spare-time programmers like myself a way to handle the daunting task
	858	of proper locale-dependent support of their libraries and utilities.
	859	Having said that, it can be a bit of a bear to get right,
	860	`although they do point out in the documentation that it will be painful to use`_.
	861	Aside from the caveats spelled out in that link, it turns out that just
	862	comparing strings with :mod:`locale` in a cross-platform and
	863	cross-python-version manner is not as straightforward as one might hope.
	864
	865	First, how to use :mod:`locale` to compare strings? It's actually
	866	pretty straightforward. Simply run the input through the :mod:`locale`
	867	transformation function :func:`locale.strxfrm`.
	868
	869	.. code-block:: pycon
	870
	871	>>> import locale, sys
	872	>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
	873	'en_US.UTF-8'
	874	>>> a = ['a', 'b', 'ä']
	875	>>> sorted(a)
	876	['a', 'b', 'ä']
	877	>>> # The below fails on OSX, so don't run doctest on darwin.
	878	>>> is_osx = sys.platform == 'darwin'
	879	>>> sorted(a, key=locale.strxfrm) if not is_osx else ['a', 'ä', 'b']
	880	['a', 'ä', 'b']
	881	>>>
	882	>>> a = ['apple', 'Banana', 'banana', 'Apple']
	883	>>> sorted(a, key=locale.strxfrm) if not is_osx else ['apple', 'Apple', 'banana', 'Banana']
	884	['apple', 'Apple', 'banana', 'Banana']
	885
	886	It turns out that locale-aware sorting groups letters in the same
	887	way as turning on GROUPLETTERS and LOWERCASEFIRST.
	888	The trick is that you have to apply :func:`locale.strxfrm` only to non-numeric
	889	characters; otherwise, numbers won't be parsed properly. Therefore, it must
	890	be applied as part of the :func:`coerce_to_int`/:func:`coerce_to_float`
	891	functions in a manner similar to :func:`groupletters`.
	892
	893	As you might have guessed, there is a small problem.
	894	It turns out that there is a bug in the legacy Python implementation of
	895	:func:`locale.strxfrm` that causes it to outright fail for :func:`unicode`
	896	input (https://bugs.python.org/issue2481). :func:`locale.strcoll` works,
	897	but is intended for use with ``cmp``, which does not exist in current Python
	898	implementations. Luckily, the :func:`functools.cmp_to_key` function
	899	makes :func:`locale.strcoll` behave like :func:`locale.strxfrm`.
	900
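A minimal sketch of that workaround is shown below. It relies only on the standard :func:`functools.cmp_to_key` and :func:`locale.strcoll` functions; the resulting order depends on whichever ``LC_COLLATE`` setting is active, so no expected output is given.

```python
import functools
import locale

# cmp_to_key wraps the two-argument comparison locale.strcoll in a key
# object, so it can be used wherever a key function is expected.
strcoll_key = functools.cmp_to_key(locale.strcoll)

words = ["apple", "Banana", "banana", "Apple"]
# Both calls should agree on a correctly working locale implementation.
by_strcoll = sorted(words, key=strcoll_key)
by_strxfrm = sorted(words, key=locale.strxfrm)
```

The trade-off is speed: a comparison-based key calls :func:`locale.strcoll` on every comparison, whereas :func:`locale.strxfrm` transforms each string only once.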
	901	Handling Broken Locale On OSX
	902	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	903
	904	But what if the underlying locale implementation that :mod:`locale`
	905	relies upon is simply broken? It turns out that the locale library on
	906	OSX (and other BSD systems) is broken (and for some reason has never been
	907	fixed?), and so :mod:`locale` does not work as expected.
	908
	909	How do I define "doesn't work as expected"?
	910
	911	.. code-block:: pycon
	912
	913	>>> a = ['apple', 'Banana', 'banana', 'Apple']
	914	>>> sorted(a)
	915	['Apple', 'Banana', 'apple', 'banana']
	916	>>>
	917	>>> sorted(a, key=locale.strxfrm) if is_osx else sorted(a)
	918	['Apple', 'Banana', 'apple', 'banana']
	919
	920	IT'S SORTING AS IF :func:`locale.strxfrm` WAS NEVER USED!! (and it's worse
	921	once non-ASCII characters get thrown into the mix.) I'm really not
	922	sure why this is considered OK for the OSX/BSD maintainers to not fix,
	923	but it's more than frustrating for poor developers who have been dragged
	924	into the locale game kicking and screaming. <deep breath>.
	925
	926	So, how to deal with this situation? There are two ways to do so.
	927
	928	#. Detect if :mod:`locale` is sorting incorrectly (i.e. ``dumb``) by seeing
	929	if ``'A'`` is sorted before ``'a'`` (incorrect) or not.
	930
	931	.. code-block:: pycon
	932
	933	>>> # This is genuinely the name of this function.
	934	>>> # See natsort.compat.locale.py
	935	>>> def dumb_sort():
	936	... return locale.strxfrm('A') < locale.strxfrm('a')
	937	...
	938
	939	If a ``dumb`` locale implementation is found, then automatically
	940	turn on LOWERCASEFIRST and GROUPLETTERS.
	941	#. Use an alternate library if installed. `ICU <http://site.icu-project.org/>`_
	942	is a great and powerful library that has a pretty decent Python port
	943	called (you guessed it) `PyICU <https://pypi.org/project/PyICU/>`_.
	944	If a user has this library installed on their computer, :mod:`natsort`
	945	chooses to use that instead of :mod:`locale`. With a little bit of
	946	planning, one can write a set of wrapper functions that call
	947	the correct library under the hood such that the business logic never
	948	has to know what library is being used (see `natsort.compat.locale.py`_).
	949
	950	Let me tell you, this little complication really makes a challenge of testing
	951	the code, since one must set up different environments on different operating
	952	systems in order to test all possible code paths. Not to mention that
	953	certain checks will fail for certain operating systems and environments
	954	so one must be diligent in either writing the tests not to fail, or ignoring
	955	those tests when on offending environments.
	956
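The first strategy can be sketched roughly as follows. The names ``locale_is_dumb``, ``fallback_text_key`` and ``choose_text_key`` are hypothetical and are not the functions found in `natsort.compat.locale.py`_; the fallback simply imitates LOWERCASEFIRST plus GROUPLETTERS on plain strings.

```python
import locale
from itertools import chain


def locale_is_dumb():
    # Same check as in the text: a working locale puts 'a' before 'A'.
    return locale.strxfrm("A") < locale.strxfrm("a")


def fallback_text_key(text):
    # Hypothetical stand-in for LOWERCASEFIRST + GROUPLETTERS:
    # swap the case, then pair each character with its case-folded form.
    swapped = text.swapcase()
    return "".join(chain.from_iterable((ch.casefold(), ch) for ch in swapped))


def choose_text_key():
    # Pick the transformation once, so callers never need to know which is used.
    return fallback_text_key if locale_is_dumb() else locale.strxfrm


words = ["apple", "Banana", "banana", "Apple"]
fallback_order = sorted(words, key=fallback_text_key)
text_key = choose_text_key()
```

For this ASCII-only input the fallback reproduces the order that ``locale.strxfrm`` gave for ``en_US.UTF-8`` earlier in this section.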
	957	Handling Locale-Aware Numbers
	958	+++++++++++++++++++++++++++++
	959
	960	`Thousands separator support`_ is a problem that I knew would someday be
	961	requested but had decided to push off until a rainy day. One day it finally
	962	rained, and I decided to tackle the problem.
	963
	964	So what is the problem? Consider the number ``1,234,567`` (assuming the
	965	``','`` is the thousands separator). Try to run that through :func:`int`
	966	and you will get a :exc:`ValueError`. To handle this properly the thousands
	967	separators must be removed.
	968
	969	.. code-block:: pycon
	970
	971	>>> float('1,234,567'.replace(',', ''))
	972	1234567.0
	973
	974	What if, in our current locale, the thousands separator is ``'.'`` and
	975	the ``','`` is the decimal separator (like for the German locale de_DE)?
	976
	977	.. code-block:: pycon
	978
	979	>>> float('1.234.567'.replace('.', '').replace(',', '.'))
	980	1234567.0
	981	>>> float('1.234.567,89'.replace('.', '').replace(',', '.'))
	982	1234567.89
	983
	984	This is pretty much what :func:`locale.atoi` and :func:`locale.atof` do
	985	under the hood. So what's the problem? Why doesn't :mod:`natsort` just
	986	use this method under its hood?
	987	Well, let's take a look at what would happen if we send some possible
	988	:mod:`natsort` input through the above transformations:
	989
	990	.. code-block:: pycon
	991
	992	>>> natsort_key('1,234 apples, please.'.replace(',', ''))
	993	('', 1234, ' apples please.')
	994	>>> natsort_key('Sir, €1.234,50 please.'.replace('.', '').replace(',', '.'), as_float=True)
	995	('Sir. €', 1234.5, ' please')
	996
	997	Any character matching the thousands separator was dropped, and anything
	998	matching the decimal separator was changed to ``'.'``! If these characters
	999	were critical to how your data was ordered, this would break :mod:`natsort`.
	1000
	1001	The first solution one might consider would be to first decompose the
	1002	input into sub-components (like we did for the GROUPLETTERS method
	1003	above) and then only apply these transformations on the number components.
	1004	This is a chicken-and-egg problem, though, because *we cannot appropriately
	1005	separate out the numbers because of the thousands separators and
	1006	non-'.' decimal separators* (well, at least not without making multiple
	1007	passes over the data which I do not consider to be a valid option).
	1008
	1009	Regular expressions to the rescue! With regular expressions, we can
	1010	remove the thousands separators and change the decimal separator only
	1011	when they are actually within a number. Once the input has been
	1012	pre-processed with this regular expression, all the infrastructure
	1013	shown previously will work.
	1014
	1015	Beware, these regular expressions will make your eyes bleed.
	1016
	1017	.. code-block:: pycon
	1018
	1019	>>> decimal = ',' # Assume German locale, so decimal separator is ','
	1020	>>> # Look-behind assertions cannot accept range modifiers, so instead of e.g.
	1021	>>> # (?<!\.[0-9]{1,3}) I have to repeat the look-behind for 1, 2, and 3.
	1022	>>> nodecimal = r'(?<!{dec}[0-9])(?<!{dec}[0-9]{{2}})(?<!{dec}[0-9]{{3}})'.format(dec=decimal)
	1023	>>> strip_thousands = r'''
	1024	... (?<=[0-9]{{1}}) # At least 1 number
	1025	... (?<![0-9]{{4}}) # No more than 3 numbers
	1026	... {nodecimal} # Cannot follow decimal
	1027	... {thou} # The thousands separator
	1028	... (?=[0-9]{{3}} # Three numbers must follow
	1029	... ([^0-9]|$) # But a non-number after that
	1030	... )
	1031	... '''.format(nodecimal=nodecimal, thou=r'\.') # Thousands separator is '.' in German locale (escaped for the regex).
	1032	...
	1033	>>> re.sub(strip_thousands, '', 'Sir, €1.234,50 please.', flags=re.X)
	1034	'Sir, €1234,50 please.'
	1035	>>>
	1036	>>> # The decimal separator must be either preceded or followed
	1037	>>> # by a number. This option only needs to be performed in the
	1038	>>> # case when the decimal separator for the locale is not '.'.
	1039	>>> switch_decimal = r'(?<=[0-9]){decimal}|{decimal}(?=[0-9])'
	1040	>>> switch_decimal = switch_decimal.format(decimal=decimal)
	1041	>>> re.sub(switch_decimal, '.', 'Sir, €1234,50 please.', flags=re.X)
	1042	'Sir, €1234.50 please.'
	1043	>>>
	1044	>>> natsort_key('Sir, €1234.50 please.', as_float=True)
	1045	('Sir, €', 1234.5, ' please.')
	1046
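The two substitutions above can be folded into one helper that takes the separators as arguments. This is a sketch under the same assumptions as the example, not the implementation in :mod:`natsort`; the function name ``normalize_number_separators`` is hypothetical, and both separators are passed through :func:`re.escape` so that a ``'.'`` is matched literally.

```python
import re


def normalize_number_separators(text, thousands, decimal):
    thou = re.escape(thousands)
    dec = re.escape(decimal)
    # A separator preceded by "<decimal><1 to 3 digits>" sits inside the
    # fractional part, so it cannot be a thousands separator.
    nodecimal = "".join(f"(?<!{dec}[0-9]{{{n}}})" for n in (1, 2, 3))
    strip_thousands = (
        r"(?<=[0-9])"       # at least one digit before the separator
        r"(?<![0-9]{4})"    # but no more than three
        + nodecimal
        + thou
        + r"(?=[0-9]{3}([^0-9]|$))"  # exactly three digits must follow
    )
    text = re.sub(strip_thousands, "", text)
    if decimal != ".":
        # Only rewrite the decimal separator when it touches a digit.
        switch_decimal = f"(?<=[0-9]){dec}|{dec}(?=[0-9])"
        text = re.sub(switch_decimal, ".", text)
    return text


german = normalize_number_separators("Sir, €1.234,50 please.", ".", ",")
english = normalize_number_separators("1,234 apples, please.", ",", ".")
```

Unlike the blanket ``str.replace`` calls shown earlier, punctuation that is not part of a number (the comma after ``Sir`` and the trailing period) is left in place.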
	1047	Final Thoughts
	1048	--------------
	1049
	1050	My hope is that users of :mod:`natsort` never have to think about or worry
	1051	about all the bookkeeping or any of the details described above, and that using
	1052	:mod:`natsort` seems to magically "just work". For those of you who
	1053	took the time to read this engineering description, I hope it has enlightened
	1054	you to some of the issues that can be encountered when code is released
	1055	into the wild and has to accept "real-world data", or to what happens
	1056	to developers who naïvely make bold assumptions that are counter to
	1057	what the rest of the world assumes.
	1058
	1059	.. rubric:: Footnotes
	1060
	1061	.. [#f1]
	1062	To anyone looking through the actual code, you will note that I don't
	1063	actually use :mod:`pathlib` to split the paths... I wrote my own version
	1064	to avoid adding an external dependency of :mod:`pathlib` on Python < 3.4.
	1065	.. [#f2]
	1066	*"But if you hadn't removed the leading empty string from re.split this
	1067	wouldn't have happened!!"* I can hear you saying. Well, that's true. I don't
	1068	have a great reason for having done that except that in an earlier
	1069	non-optimal incarnation of the algorithm I needed to do it, and it kind of
	1070	stuck, and it made other parts of the code easier if the assumption that
	1071	there were no empty strings was valid.
	1072	.. [#f3]
	1073	I'm not going to show how this is implemented in this document,
	1074	but if you are interested you can look at the code to
	1075	:func:`sep_inserter` in `util.py`_.
	1076	.. [#f4]
	1077	Handling each of these is straightforward, but coupled with the rapidly
	1078	fracturing execution paths presented in :ref:`TL;DR 2 <tldr2>` one can imagine
	1079	this will get out of hand quickly. If you take a look at `natsort.py`_ and
	1080	`util.py`_ you can observe that to avoid this I take a more functional approach
	1081	to constructing the :mod:`natsort` algorithm as opposed to the procedural approach
	1082	illustrated in :ref:`TL;DR 1 <tldr1>` and :ref:`TL;DR 2 <tldr2>`.
	1083
	1084	.. _ASCII table: https://www.asciitable.com/
	1085	.. _getting sorting right is surprisingly hard: http://www.compciv.org/guides/python/fundamentals/sorting-collections-with-sorted/
	1086	.. _This astonished: https://github.com/SethMMorton/natsort/issues/19
	1087	.. _a lot: https://stackoverflow.com/questions/29548742/python-natsort-sort-strings-recursively
	1088	.. _of people: https://stackoverflow.com/questions/24045348/sort-set-of-numbers-in-the-form-xx-yy-in-python
	1089	.. _and some people aren't very nice when they are astonished:
	1090	https://github.com/xolox/python-naturalsort/blob/ed3e6b6ffaca3bdea3b76e08acbb8bd2a5fee463/README.rst#why-another-natsort-module
	1091	.. _fastnumbers: https://github.com/SethMMorton/fastnumbers
	1092	.. _as part of my testing: https://github.com/SethMMorton/natsort/blob/master/test_natsort/slow_splitters.py
	1093	.. _this one for coercion: https://stackoverflow.com/questions/736043/checking-if-a-string-can-be-converted-to-float-in-python
	1094	.. _this one for checking: https://stackoverflow.com/questions/354038/how-do-i-check-if-a-string-is-a-number-float
	1095	.. _most natural sort solutions for python on Stack Overflow: https://stackoverflow.com/q/4836710/1399279
	1096	.. _80%/20%: https://en.wikipedia.org/wiki/Pareto_principle
	1097	.. _The first major special case I encountered was sorting filesystem paths: https://github.com/SethMMorton/natsort/issues/3
	1098	.. _The second major special case I encountered was sorting of different types: https://github.com/SethMMorton/natsort/issues/7
	1099	.. _A rather unexpected special case I encountered was sorting collections containing NaN:
	1100	https://github.com/SethMMorton/natsort/issues/27
	1101	.. _It's hard to compare floating point numbers: http://www.drdobbs.com/cpp/its-hard-to-compare-floating-point-numbe/240149806
	1102	.. _caught a bit off guard when the request was initially made: https://github.com/SethMMorton/natsort/issues/14
	1103	.. _at the code: https://github.com/SethMMorton/natsort/tree/master/natsort
	1104	.. _natsort.py: https://github.com/SethMMorton/natsort/blob/master/natsort/natsort.py
	1105	.. _util.py: https://github.com/SethMMorton/natsort/blob/master/natsort/util.py
	1106	.. _although they do point out in the documentation that it will be painful to use:
	1107	https://docs.python.org/3/library/locale.html#background-details-hints-tips-and-caveats
	1108	.. _natsort.compat.locale.py: https://github.com/SethMMorton/natsort/blob/master/natsort/compat/locale.py
	1109	.. _Thousands separator support: https://github.com/SethMMorton/natsort/issues/36
	1110	.. _really good: https://hypothesis.readthedocs.io/en/latest/
	1111	.. _testing strategy: https://docs.pytest.org/en/latest/
	1112	.. _check out some official Unicode documentation: https://unicode.org/reports/tr15/

+28

-0

docs/index.rst

	0	.. natsort documentation master file, created by
	1	sphinx-quickstart on Thu Jul 17 21:01:29 2014.
	2	You can adapt this file completely to your liking, but it should at least
	3	contain the root `toctree` directive.
	4
	5	natsort: Simple yet flexible natural sorting in Python.
	6	=======================================================
	7
	8	Contents:
	9
	10	.. toctree::
	11	:maxdepth: 2
	12	:numbered:
	13
	14	intro.rst
	15	howitworks.rst
	16	examples.rst
	17	api.rst
	18	locale_issues.rst
	19	shell.rst
	20	changelog.rst
	21
	22	Indices and tables
	23	==================
	24
	25	* :ref:`genindex`
	26	* :ref:`modindex`
	27	* :ref:`search`

+469

-0

docs/intro.rst

	0	.. default-domain:: py
	1	.. module:: natsort
	2
	3	The :mod:`natsort` module
	4	=========================
	5
	6	Simple yet flexible natural sorting in Python.
	7
	8	- Source Code: https://github.com/SethMMorton/natsort
	9	- Downloads: https://pypi.org/project/natsort/
	10	- Documentation: https://natsort.readthedocs.io/
	11	- Optional Dependencies:
	12
	13	- `fastnumbers <https://pypi.org/project/fastnumbers>`_ >= 2.0.0
	14	- `PyICU <https://pypi.org/project/PyICU>`_ >= 1.0.0
	15
	16	NOTE: Please see the `Deprecation Schedule`_ section for changes in
	17	:mod:`natsort` version 6.0.0 and in the upcoming version 7.0.0.
	18
	19	:mod:`natsort` is a general utility for sorting lists naturally; the definition
	20	of "naturally" is not well-defined, but the most common definition is that numbers
	21	contained within the string should be sorted as numbers and not as you would
	22	other characters. If you need to present sorted output to a user, you probably
	23	want to sort it naturally.
	24
	25	:mod:`natsort` was initially created for sorting scientific output filenames that
	26	contained signed floating point numbers in the names. There was a lack of
	27	algorithms out there that could perform a natural sort on `floats` but
	28	plenty for `ints`; check out
	29	`this StackOverflow question <https://stackoverflow.com/q/4836710/1399279>`_
	30	and its answers and links therein,
	31	`this ActiveState forum <https://code.activestate.com/recipes/285264-natural-string-sorting/>`_,
	32	and of course `this great article on natural sorting <https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/>`_
	33	from CodingHorror.com for examples of what I mean.
	34	:mod:`natsort` was created to fill in this gap, but has since expanded to handle
	35	just about any definition of a number, as well as other sorting customizations.
	36
	37	Quick Description
	38	-----------------
	39
	40	When you try to sort a list of strings that contain numbers, the normal python
	41	sort algorithm sorts lexicographically, so you might not get the results that you
	42	expect:
	43
	44	.. code-block:: pycon
	45
	46	>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
	47	>>> sorted(a)
	48	['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in']
	49
	50	Notice that it has the order ('1', '10', '2') - this is because the list is
	51	being sorted in lexicographical order, which sorts numbers like you would
	52	letters (i.e. 'b', 'ba', 'c').
	53
	54	:mod:`natsort` provides a function :func:`~natsorted` that helps sort lists
	55	"naturally" ("naturally" is rather ill-defined, but in general it means
	56	sorting based on meaning and not computer code point).
	57	Using :func:`~natsorted` is simple:
	58
	59	.. code-block:: pycon
	60
	61	>>> from natsort import natsorted
	62	>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
	63	>>> natsorted(a)
	64	['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
	65
	66	:func:`~natsorted` identifies numbers anywhere in a string and sorts them
	67	naturally. Below are some other things you can do with :mod:`natsort`
	68	(please see the :ref:`examples` for a quick start guide, or the :ref:`api`
	69	for more details).
	70
	71	.. note::
	72
	73	:func:`~natsorted` is designed to be a drop-in replacement for the built-in
	74	:func:`sorted` function. Like :func:`sorted`, :func:`~natsorted`
	75	`does not sort in-place`. To sort a list and assign the output to the
	76	same variable, you must explicitly assign the output to a variable:
	77
	78	.. code-block:: pycon
	79
	80	>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
	81	>>> natsorted(a)
	82	['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
	83	>>> print(a) # 'a' was not sorted; "natsorted" simply returned a sorted list
	84	['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
	85	>>> a = natsorted(a) # Now 'a' will be sorted because the sorted list was assigned to 'a'
	86	>>> print(a)
	87	['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
	88
	89	Please see `Generating a Reusable Sorting Key and Sorting In-Place`_ for
	90	an alternate way to sort in-place naturally.
	91
	92	Examples
	93	--------
	94
	95	Sorting Versions
	96	++++++++++++++++
	97
	98	:mod:`natsort` does not actually comprehend version numbers (and never has).
	99	It just so happens that the most common versioning schemes are designed to
	100	work with standard natural sorting techniques; these schemes include
	101	``MAJOR.MINOR``, ``MAJOR.MINOR.PATCH``, ``YEAR.MONTH.DAY``. If your data
	102	conforms to a scheme like this, then it will work out-of-the-box with
	103	``natsorted`` (as of ``natsort`` version >= 4.0.0):
	104
	105	.. code-block:: pycon
	106
	107	>>> a = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
	108	>>> natsorted(a)
	109	['version-1.9', 'version-1.10', 'version-1.11', 'version-2.0']
	110
	111	If you need to sort versions that use a more complicated scheme, please see
	112	:ref:`rc_sorting` for examples.
	113
	114	Sorting by Real Numbers (i.e. Signed Floats)
	115	++++++++++++++++++++++++++++++++++++++++++++
	116
	117	This is useful in scientific data analysis and was
	118	the default behavior of :func:`~natsorted` for :mod:`natsort`
	119	version < 4.0.0. Use the :func:`~realsorted` function:
	120
	121	.. code-block:: pycon
	122
	123	>>> from natsort import realsorted, ns
	124	>>> # Note that when interpreting as signed floats, the below numbers are
	125	>>> # +5.10, -3.00, +5.30, +2.00
	126	>>> a = ['position5.10.data', 'position-3.data', 'position5.3.data', 'position2.data']
	127	>>> natsorted(a)
	128	['position2.data', 'position5.3.data', 'position5.10.data', 'position-3.data']
	129	>>> natsorted(a, alg=ns.REAL)
	130	['position-3.data', 'position2.data', 'position5.10.data', 'position5.3.data']
	131	>>> realsorted(a) # shortcut for natsorted with alg=ns.REAL
	132	['position-3.data', 'position2.data', 'position5.10.data', 'position5.3.data']
	133
	134	Locale-Aware Sorting (or "Human Sorting")
	135	+++++++++++++++++++++++++++++++++++++++++
	136
	137	This is where the non-numeric characters are ordered based on their meaning,
	138	not on their ordinal value, and a locale-dependent thousands separator and decimal
	139	separator are accounted for in the number.
	140	This can be achieved with the :func:`~humansorted` function:
	141
	142	.. code-block:: pycon
	143
	144	>>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
	145	>>> natsorted(a)
	146	['Apple', 'Banana', 'apple14,689', 'apple15', 'banana']
	147	>>> import locale
	148	>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
	149	'en_US.UTF-8'
	150	>>> natsorted(a, alg=ns.LOCALE)
	151	['apple15', 'apple14,689', 'Apple', 'banana', 'Banana']
	152	>>> from natsort import humansorted
	153	>>> humansorted(a)
	154	['apple15', 'apple14,689', 'Apple', 'banana', 'Banana']
	155
	156	You may find you need to explicitly set the locale to get this to work
	157	(as shown in the example).
	158	Please see :ref:`locale_issues` and the Installation section
	159	below before using the :func:`~humansorted` function.
	160
	161	Further Customizing Natsort
	162	+++++++++++++++++++++++++++
	163
	164	If you need to combine multiple algorithm modifiers (such as ``ns.REAL``,
	165	``ns.LOCALE``, and ``ns.IGNORECASE``), you can combine the options using the
	166	bitwise OR operator (``|``). For example,
	167
	168	.. code-block:: pycon
	169
	170	>>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
	171	>>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE)
	172	['Apple', 'apple15', 'apple14,689', 'Banana', 'banana']
	173	>>> # The ns enum provides long and short forms for each option.
	174	>>> ns.LOCALE == ns.L
	175	True
	176	>>> # You can customize the convenience functions, too.
	177	>>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE) == realsorted(a, alg=ns.L | ns.IC)
	178	True
	179	>>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE) == humansorted(a, alg=ns.R | ns.IC)
	180	True
	181
	182	All of the available customizations can be found in the documentation for
	183	the :class:`~natsort.ns` enum.
	184
	185	You can also add your own custom transformation functions with the ``key`` argument.
	186	These can be used with ``alg`` if you wish:
	187
	188	.. code-block:: pycon
	189
	190	>>> a = ['apple2.50', '2.3apple']
	191	>>> natsorted(a, key=lambda x: x.replace('apple', ''), alg=ns.REAL)
	192	['2.3apple', 'apple2.50']
	193
	194	Sorting Mixed Types
	195	+++++++++++++++++++
	196
	197	You can mix and match ``int``, ``float``, and ``str`` (or ``unicode``) types
	198	when you sort:
	199
	200	.. code-block:: pycon
	201
	202	>>> a = ['4.5', 6, 2.0, '5', 'a']
	203	>>> natsorted(a)
	204	[2.0, '4.5', '5', 6, 'a']
	205	>>> # On Python 2, sorted(a) would return [2.0, 6, '4.5', '5', 'a']
	206	>>> # On Python 3, sorted(a) would raise an "unorderable types" TypeError
	207
	208	Handling Bytes on Python 3
	209	++++++++++++++++++++++++++
	210
	211	:mod:`natsort` does not officially support the `bytes` type on Python 3, but
	212	convenience functions are provided that help you decode to `str` first:
	213
	214	.. code-block:: pycon
	215
	216	>>> from natsort import as_utf8
	217	>>> a = [b'a', 14.0, 'b']
	218	>>> # On Python 2, natsorted(a) would work as expected.
	219	>>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
	220	>>> natsorted(a, key=as_utf8) == [14.0, b'a', 'b']
	221	True
	222	>>> a = [b'a56', b'a5', b'a6', b'a40']
	223	>>> # On Python 2, natsorted(a) would work as expected.
	224	>>> # On Python 3, natsorted(a) would return the same results as sorted(a)
	225	>>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
	226	True
	227
	228	Generating a Reusable Sorting Key and Sorting In-Place
	229	++++++++++++++++++++++++++++++++++++++++++++++++++++++
	230
	231	Under the hood, :func:`~natsorted` works by generating a custom sorting
	232	key using :func:`~natsort_keygen` and then passes that to the built-in
	233	:func:`sorted`. You can use the :func:`~natsort_keygen` function yourself to
	234	generate a custom sorting key to sort in-place using the :meth:`list.sort`
	235	method.
	236
	237	.. code-block:: pycon
	238
	239	>>> from natsort import natsort_keygen
	240	>>> natsort_key = natsort_keygen()
	241	>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
	242	>>> natsorted(a) == sorted(a, key=natsort_key)
	243	True
	244	>>> a.sort(key=natsort_key)
	245	>>> a
	246	['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
	247
	248	All of the algorithm customizations mentioned in the `Further Customizing Natsort`_
	249	section can also be applied to :func:`~natsort_keygen` through the alg keyword option.
	250
	251	Other Useful Things
	252	+++++++++++++++++++
	253
	254	- recursively descend into lists of lists
	255	- automatic unicode normalization of input data
	256	- controlling the case-sensitivity (see :ref:`case_sort`)
	257	- sorting file paths correctly (see :ref:`path_sort`)
	258	- allow custom sorting keys (see :ref:`custom_sort`)
	259
	260	FAQ
	261	---
	262
	263	How do I debug :func:`~natsorted`?
	264	The best way to debug :func:`~natsorted` is to generate a key using :func:`~natsort_keygen`
	265	with the same options being passed to :func:`~natsorted`. You can then inspect
	266	exactly what is being done with your input using this key. It is highly recommended
	267	to read `this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_,
	268	and also to review the :ref:`howitworks` page to understand why
	269	:mod:`natsort` is doing that to your data.
	270
	271	If you are trying to sort custom classes and running into trouble, please take a look at
	272	https://github.com/SethMMorton/natsort/issues/60. In short,
	273	custom classes are not likely to be sorted correctly if one relies
	274	on the behavior of ``__lt__`` and the other rich comparison operators in their
	275	custom class - it is better to use a ``key`` function with :mod:`natsort`, or
	276	use the :mod:`natsort` key as part of your rich comparison operator definition.
	277
	278	How does :mod:`natsort` work?
	279	If you don't want to read :ref:`howitworks`, here is a quick primer.
	280
	281	:mod:`natsort` provides a :term:`key function` that can be passed to
	282	:meth:`list.sort` or :func:`sorted` in order to modify the default sorting
	283	behavior. This key is generated on-demand with the key generator
	284	:func:`natsort.natsort_keygen`. :func:`natsort.natsorted` is essentially a
	285	wrapper for the following code:
	286
	287	.. code-block:: pycon
	288
	289	>>> from natsort import natsort_keygen
	290	>>> natsort_key = natsort_keygen()
	291	>>> sorted(['1', '10', '2'], key=natsort_key)
	292	['1', '2', '10']
	293
	294	Users can further customize :mod:`natsort` sorting behavior with the ``key``
	295	and/or ``alg`` options (see details in the `Further Customizing Natsort`_
	296	section).
	297
	298	The key generated by :func:`natsort.natsort_keygen` always returns a :class:`tuple`. It
	299	does so in the following way (some details omitted for clarity):
	300
	301	1. Assume the input is a string, and attempt to split it into numbers and
	302	non-numbers using regular expressions. Numbers are then converted into
	303	either :class:`int` or :class:`float`.
	304	2. If the above fails because the input is not a string, assume the input
	305	is some other sequence (e.g. :class:`list` or :class:`tuple`), and recursively
	306	apply the key to each element of the sequence.
	307	3. If the above fails because the input is not iterable, assume the input
	308	is an :class:`int` or :class:`float`, and just return the input in a :class:`tuple`.
	309
	310	Because a :class:`tuple` is always returned, a :exc:`TypeError` should not be common
	311	unless one tries to do something odd like sort an :class:`int` against a :class:`list`.
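The three steps above can be sketched in plain Python using only the standard library. This is a simplified illustration under stated assumptions, not natsort's actual implementation: the name ``sketch_key`` is hypothetical, only unsigned integers are recognized, and details such as unicode normalization and the handling of inputs that mix leading numbers and leading text are omitted.

```python
import re

# The capture group makes re.split keep the digit runs in its output.
_INT_RE = re.compile(r"(\d+)")


def sketch_key(value):
    """Return a tuple that sorts 'naturally' (illustrative only)."""
    try:
        # Step 1: assume the input is a string and split out the numbers.
        parts = _INT_RE.split(value)
    except TypeError:
        try:
            # Step 2: not a string, so treat it as a sequence and recurse.
            return tuple(sketch_key(item) for item in value)
        except TypeError:
            # Step 3: not iterable either, so assume a plain number.
            return (value,)
    # Drop empty strings left by the split and convert digit runs to int.
    return tuple(int(p) if p.isdecimal() else p for p in parts if p)
```

With this sketch, ``sorted(['1', '10', '2'], key=sketch_key)`` returns ``['1', '2', '10']``, and a non-iterable input such as ``3.5`` is wrapped as ``(3.5,)``.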
	312
	313	:mod:`natsort` gave me results I didn't expect, and it's a terrible library!
	314	Did you try to debug using the above advice? If so, and you still cannot figure out
	315	the error, then please `file an issue <https://github.com/SethMMorton/natsort/issues/new>`_.
	316
	317	Shell script
	318	------------
	319
	320	:mod:`natsort` comes with a shell script called ``natsort``, which can also be
	321	invoked from the command line with ``python -m natsort``.
	322
	323	Requirements
	324	------------
	325
	326	:mod:`natsort` requires Python version 2.7 or Python 3.4 or greater.
	327
	328	Optional Dependencies
	329	---------------------
	330
	331	fastnumbers
	332	+++++++++++
	333
	334	The most efficient sorting can occur if you install the
	335	`fastnumbers <https://pypi.org/project/fastnumbers>`_ package
	336	(version >=2.0.0); it helps with the string to number conversions.
	337	:mod:`natsort` will still run (efficiently) without the package, but if you need
	338	to squeeze out that extra juice it is recommended you include this as a dependency.
	339	:mod:`natsort` will not require (or check) that
	340	`fastnumbers <https://pypi.org/project/fastnumbers>`_ is installed
	341	at installation.
	342
	343	PyICU
	344	+++++
	345
	346	It is recommended that you install `PyICU <https://pypi.org/project/PyICU>`_
	347	if you wish to sort in a locale-dependent manner, see :ref:`locale_issues` for
	348	an explanation why.
	349
	350	Installation
	351	------------
	352
	353	Use ``pip``!
	354
	355	.. code-block:: sh
	356
	357	$ pip install natsort
	358
	359	If you want to install the `Optional Dependencies`_, you can use the
	360	`"extras" notation <https://packaging.python.org/tutorials/installing-packages/#installing-setuptools-extras>`_
	361	at installation time to install those dependencies as well - use ``fast`` for
	362	`fastnumbers <https://pypi.org/project/fastnumbers>`_ and ``icu`` for
	363	`PyICU <https://pypi.org/project/PyICU>`_.
	364
	365	.. code-block:: sh
	366
	367	# Install both optional dependencies.
	368	$ pip install natsort[fast,icu]
	369	# Install just fastnumbers
	370	$ pip install natsort[fast]
	371
	372	How to Run Tests
	373	----------------
	374
	375	Please note that :mod:`natsort` is NOT set-up to support ``python setup.py test``.
	376
	377	The recommended way to run tests is with `tox <https://tox.readthedocs.io/en/latest/>`_.
	378	After installing ``tox``, running tests is as simple as executing the following in the
	379	``natsort`` directory:
	380
	381	.. code-block:: sh
	382
	383	$ tox
	384
	385	``tox`` will create a virtual environment for your tests and install all the
	386	needed testing requirements for you. You can specify a particular python version
	387	with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis is done with ``tox -e flake8``.
	388	You can see all available testing environments with ``tox --listenvs``.
	389
	390	If you do not wish to use ``tox``, you can install the testing dependencies with the
	391	``dev-requirements.txt`` file and then run the tests manually using
	392	`pytest <https://docs.pytest.org/en/latest/>`_.
	393
	394	.. code-block:: console
	395
	396	$ pip install -r dev-requirements.txt
	397	$ python -m pytest
	398
	399	Note that above I invoked ``python -m pytest`` instead of just ``pytest`` - this is because
	400	`the former puts the CWD on sys.path <https://docs.pytest.org/en/latest/usage.html#calling-pytest-through-python-m-pytest>`_.
	401
	402	How to Build Documentation
	403	--------------------------
	404
	405	If you want to build the documentation for :mod:`natsort`, it is recommended to use ``tox``:
	406
	407	.. code-block:: console
	408
	409	$ tox -e docs
	410
	411	This will place the documentation in ``build/sphinx/html``. If you do not
	412	wish to use ``tox``, you can do the following:
	413
	414	.. code-block:: console
	415
	416	$ pip install sphinx sphinx_rtd_theme
	417	$ python setup.py build_sphinx
	418
	419	Deprecation Schedule
	420	--------------------
	421
	422	Dropping Python 2.7 Support
	423	+++++++++++++++++++++++++++
	424
	425	:mod:`natsort` version 7.0.0 will drop support for Python 2.7.
	426
	427	The version 6.X branch will remain as a "long term support" branch where bug fixes
	428	are applied so that users who cannot update from Python 2.7 will not be forced to
	429	use a buggy :mod:`natsort` version. Once version 7.0.0 is released, new features
	430	will not be added to version 6.X, only bug fixes.
	431
	432	Deprecated APIs
	433	+++++++++++++++
	434
	435	In :mod:`natsort` version 6.0.0, the following APIs and functions were removed:
	436
	437	- ``number_type`` keyword argument (deprecated since 3.4.0)
	438	- ``signed`` keyword argument (deprecated since 3.4.0)
	439	- ``exp`` keyword argument (deprecated since 3.4.0)
	440	- ``as_path`` keyword argument (deprecated since 3.4.0)
	441	- ``py3_safe`` keyword argument (deprecated since 3.4.0)
	442	- ``ns.TYPESAFE`` (deprecated since version 5.0.0)
	443	- ``ns.DIGIT`` (deprecated since version 5.0.0)
	444	- ``ns.VERSION`` (deprecated since version 5.0.0)
	445	- :func:`~natsort.versorted` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
	446	- :func:`~natsort.index_versorted` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
	447
	448	In general, if you want to determine if you are using deprecated APIs you can run your
	449	code with the following flag:
	450
	451	.. code-block:: console
	452
	453	$ python -Wdefault::DeprecationWarning my-code.py
	454
	455	By default :exc:`DeprecationWarning` messages are not shown, but this flag will cause them to be shown.
	456	Alternatively, you can just set the environment variable ``PYTHONWARNINGS`` to
	457	"default::DeprecationWarning" and then run your code.
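The same filter can also be enabled from inside Python with the standard :mod:`warnings` module, which may be convenient in a test suite. The following is a minimal sketch; ``legacy_call`` and ``collect_deprecations`` are hypothetical names used only for illustration and are not part of :mod:`natsort`.

```python
import warnings


def legacy_call():
    """Hypothetical function standing in for a deprecated API."""
    warnings.warn("legacy_call is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def collect_deprecations(func):
    """Run func and return the DeprecationWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Same filter as -Wdefault::DeprecationWarning, but only active
        # inside this block; the previous filters are restored on exit.
        warnings.simplefilter("default", DeprecationWarning)
        func()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


messages = collect_deprecations(legacy_call)
```

Here ``messages`` holds the text of each deprecation warning raised while ``legacy_call`` ran, so a test can assert that the list is empty.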
	458
	459	Dropped Pipenv for Development
	460	++++++++++++++++++++++++++++++
	461
	462	:mod:`natsort` version 6.0.0 no longer uses `Pipenv <https://pipenv.readthedocs.io/en/latest/>`_
	463	to install development dependencies.
	464
	465	Dropped Python 2.6 and 3.3 Support
	466	++++++++++++++++++++++++++++++++++
	467
	468	:mod:`natsort` version 6.0.0 dropped support for Python 2.6 and Python 3.3.

+97

-0

docs/locale_issues.rst

	0	.. default-domain:: py
	1	.. currentmodule:: natsort
	2
	3	.. _locale_issues:
	4
	5	Possible Issues with :func:`~natsort.humansorted` or ``ns.LOCALE``
	6	==================================================================
	7
	8	Being Locale-Aware Means Both Numbers and Non-Numbers
	9	-----------------------------------------------------
	10
	11	In addition to modifying how characters are sorted, ``ns.LOCALE`` will take into
	12	account locale-dependent thousands separators (and locale-dependent decimal
	13	separators if ``ns.FLOAT`` is enabled). This means that if you are in a
	14	locale that uses commas as the thousands separator, a number like
	15	``123,456`` will be interpreted as ``123456``. If this is not what you want,
	16	you may consider using ``ns.LOCALEALPHA`` which will only enable locale-aware
	17	sorting for non-numbers (similarly, ``ns.LOCALENUM`` enables locale-aware
	18	sorting only for numbers).
	19
	20	Regenerate Key With :func:`~natsort.natsort_keygen` After Changing Locale
	21	-------------------------------------------------------------------------
	22
	23	When :func:`~natsort.natsort_keygen` is called it returns a key function that
	24	hard-codes the provided settings. This means that the key returned when
	25	``ns.LOCALE`` is used contains the settings specified by the locale
	26	loaded at the time the key is generated. If you change the locale,
	27	you should regenerate the key to account for the new locale.
	28
	29	Corollary: Do Not Reuse :func:`~natsort.natsort_keygen` After Changing Locale
	30	+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
	31
	32	If you change locale, the old function will not work as expected.
	33	The :mod:`locale` library works with a global state. When
	34	:func:`~natsort.natsort_keygen` is called it does the best job that it can to
	35	make the returned function as static as possible and independent of the global
	36	state, but the :func:`locale.strxfrm` function must access this global state to
	37	work; therefore, if you change locale and use ``ns.LOCALE`` then you should
	38	discard the old key.
	39
	40	.. note:: If you use `PyICU`_ then you may be able to reuse keys after changing
	41	locale.
	42
	43	The :mod:`locale` Module From the StdLib Has Issues
	44	---------------------------------------------------
	45
	46	:mod:`natsort` will use `PyICU`_ for :func:`~natsort.humansorted` or
	47	``ns.LOCALE`` if it is installed. If not, it will fall back on the
	48	:mod:`locale` library from the Python stdlib. If you do not have `PyICU`_
	49	installed, please keep the following known problems and issues in mind.
	50
	51	.. note:: Remember, if you have `PyICU`_ installed you shouldn't need to worry
	52	about any of these.
	53
	54	Explicitly Set the Locale Before Using :func:`~natsort.humansorted` or ``ns.LOCALE``
	55	++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
	56
	57	I have found that unless you explicitly set a locale, the sorted order may not
	58	be what you expect. Setting this is straightforward
	59	(in the below example I use 'en_US.UTF-8', but you should use your
	60	locale):
	61
	62	.. code-block:: pycon
	63
	64	>>> import locale
	65	>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
	66	'en_US.UTF-8'
	67
	68	.. _bug_note:
	69
	70	The :mod:`locale` Module Is Broken on Mac OS X
	71	++++++++++++++++++++++++++++++++++++++++++++++
	72
	73	It's not Python's fault, but the OS... the locale library for BSD-based systems
	74	(of which Mac OS X is one) is broken. See the following links:
	75
	76	- https://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
	77	- https://bugs.python.org/issue23195
	78	- https://github.com/SethMMorton/natsort/issues/21 (contains instructions on installing)
	79	- https://stackoverflow.com/questions/33459384/unicode-character-not-in-range-when-calling-locale-strxfrm
	80	- https://github.com/SethMMorton/natsort/issues/34
	81
	82	Of course, installing `PyICU`_ fixes this, but if you don't want to or cannot
	83	install this there is some hope.
	84
	85	1. As of ``natsort`` version 4.0.0, ``natsort`` is configured
	86	to compensate for a broken ``locale`` library. When sorting non-numbers
	87	it will handle case as you expect, but it will still not be able to
	88	comprehend non-ASCII characters properly. Additionally, it has
	89	a built-in lookup table of thousands separators that are incorrect
	90	on OS X/BSD (but it is possible that it is not complete... please file an
	91	issue if you see it is not complete).
	92	2. Use "\*.ISO8859-1" locale (i.e. 'en_US.ISO8859-1') rather than "\*.UTF-8"
	93	locale. I have found that these have fewer issues than "UTF-8", but
	94	your mileage may vary.
	95
	96	.. _PyICU: https://pypi.org/project/PyICU

+158

-0

docs/shell.rst

	0	.. default-domain:: py
	1	.. currentmodule:: natsort
	2
	3	.. _shell:
	4
	5	Shell Script
	6	============
	7
	8	The ``natsort`` shell script is automatically installed when you install
	9	:mod:`natsort` with pip.
	10
	11	Below is the usage and some usage examples for the ``natsort`` shell script.
	12
	13	Usage
	14	-----
	15
	16	.. code-block:: none
	17
	18	usage: natsort [-h] [--version] [-p] [-f LOW HIGH] [-F LOW HIGH] [-e EXCLUDE]
	19	[-r] [-t {digit,int,float,version,ver}] [--nosign] [--noexp]
	20	[--locale]
	21	[entries [entries ...]]
	22
	23	Performs a natural sort on entries given on the command-line.
	24	A natural sort sorts numerically then alphabetically, and will sort
	25	by numbers in the middle of an entry.
	26
	27	positional arguments:
	28	entries The entries to sort. Taken from stdin if nothing is
	29	given on the command line.
	30
	31	optional arguments:
	32	-h, --help show this help message and exit
	33	--version show program's version number and exit
	34	-p, --paths Interpret the input as file paths. This is not
	35	strictly necessary to sort all file paths, but in
	36	cases where there are OS-generated file paths like
	37	"Folder/" and "Folder (1)/", this option is needed to
	38	make the paths sorted in the order you expect
	39	("Folder/" before "Folder (1)/").
	40	-f LOW HIGH, --filter LOW HIGH
	41	Used for keeping only the entries that have a number
	42	falling in the given range.
	43	-F LOW HIGH, --reverse-filter LOW HIGH
	44	Used for excluding the entries that have a number
	45	falling in the given range.
	46	-e EXCLUDE, --exclude EXCLUDE
	47	Used to exclude an entry that contains a specific
	48	number.
	49	-r, --reverse Returns in reversed order.
	50	-t {digit,int,float,version,ver,real,f,i,r,d},
	51	--number-type {digit,int,float,version,ver,real,f,i,r,d},
	52	--number_type {digit,int,float,version,ver,real,f,i,r,d}
	53	Choose the type of number to search for. "float" will
	54	search for floating-point numbers. "int" will only
	55	search for integers. "digit", "version", and "ver" are
	56	synonyms for "int"."real" is a shortcut for "float"
	57	with --sign. "i" and "d" are synonyms for "int", "f"
	58	is a synonym for "float", and "r" is a synonym for
	59	"real".The default is int.
	60	--nosign Do not consider "+" or "-" as part of a number, i.e.
	61	do not take sign into consideration. This is the
	62	default.
	63	-s, --sign Consider "+" or "-" as part of a number, i.e. take
	64	sign into consideration. The default is unsigned.
	65	--noexp Do not consider an exponential as part of a number,
	66	i.e. 1e4, would be considered as 1, "e", and 4, not as
	67	10000. This only effects the --number-type=float.
	68	-l, --locale Causes natsort to use locale-aware sorting. You will
	69	get the best results if you install PyICU.
	70
	71	Description
	72	-----------
	73
	74	``natsort`` was originally written to aid in computational chemistry
	75	research so that it would be easy to analyze large sets of output files
	76	named after the parameter used:
	77
	78	.. code-block:: console
	79
	80	$ ls *.out
	81	mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out
	82
	83	(Obviously, in reality there would be more files, but you get the idea.) Notice
	84	that the shell sorts in lexicographical order. This is the behavior of programs like
	85	``find`` as well as ``ls``. The problem is that passing these files to an
	86	analysis program causes them not to appear in numerical order, which can lead
	87	to bad analysis. To remedy this, use ``natsort``:
	88
	89	.. code-block:: console
	90
	91	$ natsort *.out
	92	mode744.43.out
	93	mode943.54.out
	94	mode1000.35.out
	95	mode1243.34.out
	96	$ natsort -t r *.out | xargs your_program
	97
	98	``-t r`` is short for ``--number-type real``. You can also place natsort in
	99	the middle of a pipe:
	100
	101	.. code-block:: console
	102
	103	$ find . -name "*.out" | natsort -t r | xargs your_program
	104
	105	To sort version numbers, use the default ``--number-type``:
	106
	107	.. code-block:: console
	108
	109	$ ls *
	110	prog-1.10.zip prog-1.9.zip prog-2.0.zip
	111	$ natsort *
	112	prog-1.9.zip
	113	prog-1.10.zip
	114	prog-2.0.zip
	115
	116	In general, all ``natsort`` shell script options mirror the :func:`~natsorted` API,
	117	with the notable exception of the ``--filter``, ``--reverse-filter``, and ``--exclude``
	118	options. These three options are used as follows:
	119
	120	.. code-block:: console
	121
	122	$ ls *.out
	123	mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out
	124	$ natsort -t r *.out -f 900 1100 # Select only numbers between 900-1100
	125	mode943.54.out
	126	mode1000.35.out
	127	$ natsort -t r *.out -F 900 1100 # Select only numbers NOT between 900-1100
	128	mode744.43.out
	129	mode1243.34.out
	130	$ natsort -t r *.out -e 1000.35 # Exclude 1000.35 from search
	131	mode744.43.out
	132	mode943.54.out
	133	mode1243.34.out
	134
	135	If you are sorting paths with OS-generated filenames, you may require the
	136	``--paths``/``-p`` option:
	137
	138	.. code-block:: console
	139
	140	$ find . ! -path . -type f
	141	./folder/file (1).txt
	142	./folder/file.txt
	143	./folder (1)/file.txt
	144	./folder (10)/file.txt
	145	./folder (2)/file.txt
	146	$ find . ! -path . -type f | natsort
	147	./folder (1)/file.txt
	148	./folder (2)/file.txt
	149	./folder (10)/file.txt
	150	./folder/file (1).txt
	151	./folder/file.txt
	152	$ find . ! -path . -type f | natsort -p
	153	./folder/file.txt
	154	./folder/file (1).txt
	155	./folder (1)/file.txt
	156	./folder (2)/file.txt
	157	./folder (10)/file.txt
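Why ``-p`` changes the order can be illustrated with a small standard-library sketch. This is an approximation of the idea (split the path into components and separate the extension before applying a natural key), not the code that ``natsort`` actually runs; ``natural`` and ``path_key`` are hypothetical helpers introduced here.

```python
import re
from pathlib import PurePosixPath


def natural(text):
    """Split text into a tuple of strings and ints (unsigned ints only)."""
    return tuple(
        int(tok) if tok.isdecimal() else tok
        for tok in re.split(r"(\d+)", text)
        if tok
    )


def path_key(path):
    """Key that compares a path one component at a time."""
    p = PurePosixPath(path)
    # Each directory becomes its own tuple, so "folder" is compared with
    # "folder (1)" as a whole name instead of "folder/" against "folder (".
    dirs = tuple(natural(part) for part in p.parent.parts)
    # The extension is split off so "file" sorts before "file (1)".
    return dirs + (natural(p.stem), natural(p.suffix))


paths = [
    "./folder/file (1).txt",
    "./folder/file.txt",
    "./folder (1)/file.txt",
    "./folder (10)/file.txt",
    "./folder (2)/file.txt",
]
ordered = sorted(paths, key=path_key)
```

For these five paths, ``ordered`` matches the ``natsort -p`` listing above. Without the per-component split, the ``/`` after ``folder`` is compared against the space in ``folder (1)``, and since the space sorts first, ``folder (1)`` lands ahead of ``folder``.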

-26

~~docs/source/api.rst~~

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		.. _api:
4
5		natsort API
6		===========
7
8		.. toctree::
9		:maxdepth: 2
10
11		natsort_keygen.rst
12		natsort_key.rst
13		natsorted.rst
14		versorted.rst
15		humansorted.rst
16		realsorted.rst
17		index_natsorted.rst
18		index_versorted.rst
19		index_humansorted.rst
20		index_realsorted.rst
21		order_by_index.rst
22		ns_class.rst
23		bytes.rst
24		chain.rst
25		locale_issues.rst

-20

~~docs/source/bytes.rst~~

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		.. _bytes_help:
4
5		Help With Bytes On Python 3
6		===========================
7
8		The official stance of :mod:`natsort` is to not support `bytes` for
9		sorting; there is just too much that can go wrong when trying to automate
10		conversion between `bytes` and `str`. But rather than completely give up
11		on `bytes`, :mod:`natsort` provides three functions that make it easy to
12		quickly decode `bytes` to `str` so that sorting is possible.
13
14		.. autofunction:: decoder
15
16		.. autofunction:: as_ascii
17
18		.. autofunction:: as_utf8
19

-16

~~docs/source/chain.rst~~

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		.. _function_help:
4
5		Help With Creating Function Keys
6		================================
7
8		If you need to create a complicated key argument to (for example)
9		:func:`natsorted` that is actually multiple functions called one after the other,
10		the following function can help you easily perform this action. It is
11		used internally to :mod:`natsort`, and has been exposed publicly for
12		the convenience of the user.
13
14		.. autofunction:: chain_functions
15

-369

~~docs/source/changelog.rst~~

0		.. _changelog:
1
2		Changelog
3		---------
4
5		09-09-2018 v. 5.4.1
6		+++++++++++++++++++
7
8		- Fix error in a newly added test.
9		- Changed code format and quality checking infrastructure.
10
11		09-06-2018 v. 5.4.0
12		+++++++++++++++++++
13
14		- Re-expose ``natsort_key`` as "public" and remove the
15		associated ``DeprecationWarning``.
16		- Add better developer documentation.
17		- Refactor tests.
18		- Bump allowed ``fastnumbers`` version.
19
20		07-07-2018 v. 5.3.3
21		+++++++++++++++++++
22
23		- Update docs with a FAQ and quick how-it-works.
24		- Fix a StopIteration error in the testing code.
25		- Enable Python 3.7 support in Travis-CI.
26
27		05-17-2018 v. 5.3.2
28		+++++++++++++++++++
29
30		- Fix bug that prevented install on old versions of setuptools.
31		- Revert layout from src/natsort/ back to natsort/ to make user
32		testing simpler.
33
34		05-14-2018 v. 5.3.1
35		+++++++++++++++++++
36
37		- No bugfixes or features, just infrastructure and installation updates.
38		- Move to defining dependencies with Pipfile.
39		- Development layout is now src/natsort/ instead of natsort/.
40		- Add bumpversion infrastructure.
41		- Extras can be installed by "[]" notation.
42
43		04-20-2018 v. 5.3.0
44		+++++++++++++++++++
45
46		- Fix bug in assessing ``fastnumbers`` version at import-time.
47		- Add ability to consider unicode-decimal numbers as numbers.
48
49		02-14-2018 v. 5.2.0
50		+++++++++++++++++++
51
52		- Add ``ns.NUMAFTER`` to cause numbers to be placed after non-numbers.
53		- Add ``natcmp`` function (Python 2 only).
54
55		11-11-2017 v. 5.1.1
56		+++++++++++++++++++
57
58		- Added additional unicode number support for Python 3.7.
59		- Added information on how to install and test.
60
61		08-19-2017 v. 5.1.0
62		+++++++++++++++++++
63
64		- Fixed ``StopIteration`` warning on Python 3.6+.
65		- All Unicode input is now normalized.
66
67		04-30-2017 v. 5.0.3
68		+++++++++++++++++++
69
70		- Improved development infrastructure.
71		- Migrated documentation to ReadTheDocs.
72
73		01-02-2017 v. 5.0.2
74		+++++++++++++++++++
75
76		- Added additional unicode number support for Python 3.6.
77		- Renamed several internal functions and variables to improve clarity.
78		- Improved documentation examples.
79		- Added a "how does it work?" section to the documentation.
80
81		06-04-2016 v. 5.0.1
82		+++++++++++++++++++
83
84		- The ``ns`` enum attributes can now be imported from the top-level
85		namespace.
86		- Fixed a bug with the ``from natsort import *`` mechanism.
87		- Fixed bug with using ``natsort`` with ``python -OO``.
88
89		05-08-2016 v. 5.0.0
90		+++++++++++++++++++
91
92		- ``ns.LOCALE``/``humansorted`` now accounts for thousands separators.
93		- Refactored entire codebase to be more functional (as in use functions as
94		units). Previously, the code was rather monolithic and difficult to follow. The
95		goal is that with the code existing in smaller units, contributing will
96		be easier.
97		- Deprecated ``ns.TYPESAFE`` option as it is now always on (due to a new
98		iterator-based algorithm, the typesafe function is now cheap).
99		- Increased speed of execution (came for free with the new functional approach
100		because the new factory function paradigm eliminates most ``if`` branches
101		during execution).
102
103		- In most cases, the code is 30-40% faster than version 4.0.4.
104		- If using ``ns.LOCALE`` or ``humansorted``, the code is 1100% faster than
105		version 4.0.4.
106
107		- Improved clarity of documentation with regards to locale-aware sorting.
108		- Added a new ``chain_functions`` function for convenience in creating
109		a complex user-given ``key`` from several existing functions.
110
111		11-01-2015 v. 4.0.4
112		+++++++++++++++++++
113
114		- Improved coverage of unit tests.
115		- Unit tests use new and improved hypothesis library.
116		- Fixed compatibility issues with Python 3.5
117
118		06-25-2015 v. 4.0.3
119		+++++++++++++++++++
120
121		- Fixed bad install on last release (sorry guys!).
122
123		06-24-2015 v. 4.0.2
124		+++++++++++++++++++
125
126		- Added back Python 2.6 and Python 3.2 compatibility. Unit testing is now
127		performed for these versions.
128		- Consolidated under-the-hood compatibility functionality.
129
130		06-04-2015 v. 4.0.1
131		+++++++++++++++++++
132
133		- Added support for sorting NaN by internally converting to -Infinity
134		or +Infinity
135
136		05-17-2015 v. 4.0.0
137		+++++++++++++++++++
138
139		- Made default behavior of 'natsort' search for unsigned ints,
140		rather than signed floats. This is a backwards-incompatible
141		change but in 99% of use cases it should not require any
142		end-user changes.
143		- Improved handling of locale-aware sorting on systems where the
144		underlying locale library is broken.
145		- Greatly improved all unit tests by adding the hypothesis library.
146
147		04-06-2015 v. 3.5.6
148		+++++++++++++++++++
149
150		- Added 'UNGROUPLETTERS' algorithm to get the case-grouping behavior of
151		an ordinal sort when using 'LOCALE'.
152		- Added convenience functions 'decoder', 'as_ascii', and 'as_utf8' for
153		dealing with bytes types.
154
155		04-04-2015 v. 3.5.5
156		+++++++++++++++++++
157
158		- Added 'realsorted' and 'index_realsorted' functions for
159		forward-compatibility with >= 4.0.0.
160		- Made explanation of when to use "TYPESAFE" more clear in the docs.
161
162		04-02-2015 v. 3.5.4
163		+++++++++++++++++++
164
165		- Fixed bug where a 'TypeError' was raised if a string containing a leading
166		number was sorted with alpha-only strings when 'LOCALE' is used.
167
168		03-26-2015 v. 3.5.3
169		+++++++++++++++++++
170
171		- Fixed bug where '--reverse-filter' option in shell script was not
172		getting checked for correctness.
173		- Documentation updates to better describe locale bug, and illustrate
174		upcoming default behavior change.
175		- Internal improvements, including making test suite more granular.
176
177		01-13-2015 v. 3.5.2
178		+++++++++++++++++++
179
180		- Enhancement that will convert a 'pathlib.Path' object to a 'str' if
181		'ns.PATH' is enabled.
182
183		09-25-2014 v. 3.5.1
184		+++++++++++++++++++
185
186		- Fixed bug that caused list/tuples to fail when using 'ns.LOWERCASEFIRST'
187		or 'ns.IGNORECASE'.
188		- Refactored modules so that only the public API was in natsort.py and
189		ns_enum.py.
190		- Refactored all import statements to be absolute, not relative.
191
192
193		09-02-2014 v. 3.5.0
194		+++++++++++++++++++
195
196		- Added the 'alg' argument to the 'natsort' functions. This argument
197		accepts an enum that is used to indicate the options the user wishes
198		to use. The 'number_type', 'signed', 'exp', 'as_path', and 'py3_safe'
199		options are being deprecated and will become (undocumented)
200		keyword-only options in natsort version 4.0.0.
201		- The user can now modify how 'natsort' handles the case of non-numeric
202		characters.
203		- The user can now instruct 'natsort' to use locale-aware sorting, which
204		allows 'natsort' to perform true "human sorting".
205
206		- The `humansorted` convenience function has been included to make this
207		easier.
208
209		- Updated shell script with locale functionality.
210
211		08-12-2014 v. 3.4.1
212		+++++++++++++++++++
213
214		- 'natsort' will now use the 'fastnumbers' module if it is installed. This
215		gives up to an extra 30% boost in speed over the previous performance
216		enhancements.
217		- Made documentation point to more 'natsort' resources, and also added a
218		new example in the examples section.
219
220		07-19-2014 v. 3.4.0
221		+++++++++++++++++++
222
223		- Fixed a bug that caused user's options to the 'natsort_key' to not be
224		passed on to recursive calls of 'natsort_key'.
225		- Added a 'natsort_keygen' function that will generate a wrapped version
226		of 'natsort_key' that is easier to call. 'natsort_key' is now set to
227		deprecate at natsort version 4.0.0.
228		- Added an 'as_path' option to 'natsorted' & co. that will try to treat
229		input strings as filepaths. This will help yield correct results for
230		OS-generated inputs like
231		``['/p/q/o.x', '/p/q (1)/o.x', '/p/q (10)/o.x', '/p/q/o (1).x']``.
232		- Massive performance enhancements for string input (1.8x-2.0x), at the expense
233		of reduction in speed for numeric input (~2.0x).
234
235		- This is a good compromise because the most common input will be strings,
236		not numbers, and sorting numbers still only takes 0.6x the time of sorting
237		strings. If you are sorting only numbers, you would use 'sorted' anyway.
238
239		- Added the 'order_by_index' function to help in using the output of
240		'index_natsorted' and 'index_versorted'.
241		- Added the 'reverse' option to 'natsorted' & co. to make its API more
242		similar to the builtin 'sorted'.
243		- Added more unit tests.
244		- Added auxiliary test code that helps in profiling and stress-testing.
245		- Reworked the documentation, moving most of it to PyPI's hosting platform.
246		- Added support for coveralls.io.
247		- Entire codebase is now PyFlakes and PEP8 compliant.
248
249		06-28-2014 v. 3.3.0
250		+++++++++++++++++++
251
252		- Added a 'versorted' method for more convenient sorting of versions.
253		- Updated command-line tool --number_type option with 'version' and 'ver'
254		to make it more clear how to sort version numbers.
255		- Moved unit-testing mechanism from being docstring-based to actual unit tests
256		in actual functions.
257
258		- This has provided the ability to determine the coverage of the unit tests (99%).
259		- This also makes the pydoc documentation a bit more clear.
260
261		- Made docstrings for public functions mirror the README API.
262		- Connected natsort development to Travis-CI to help ensure quality releases.
263
264		06-20-2014 v. 3.2.1
265		+++++++++++++++++++
266
267		- Re-"Fixed" unorderable types issue on Python 3.x - this workaround
268		is for when the problem occurs in the middle of the string.
269
270		05-07-2014 v. 3.2.0
271		+++++++++++++++++++
272
273		- "Fixed" unorderable types issue on Python 3.x with a workaround that
274		attempts to replicate the Python 2.x behavior by putting all the numbers
275		(or strings that begin with numbers) first.
276		- Now explicitly excluding __pycache__ from releases by adding a prune statement
277		to MANIFEST.in.
278
279		05-05-2014 v. 3.1.2
280		+++++++++++++++++++
281
282		- Added setup.cfg to support universal wheels.
283		- Added Python 3.0 and Python 3.1 as requiring the argparse module.
284
285		03-01-2014 v. 3.1.1
286		+++++++++++++++++++
287
288		- Added ability to sort lists of lists.
289		- Cleaned up import statements.
290
291		01-20-2014 v. 3.1.0
292		+++++++++++++++++++
293
294		- Added the ``signed`` and ``exp`` options to allow finer tuning of the sorting
295		- Entire codebase now works for both Python 2 and Python 3 without needing to run
296		``2to3``.
297		- Updated all doctests.
298		- Further simplified the ``natsort`` base code by removing unneeded functions.
299		- Simplified documentation where possible.
300		- Improved the shell script code
301
302		- Made the documentation less "path"-centric to make it clear it is not just
303		for sorting file paths.
304		- Removed the filesystem-based options because these can be achieved better
305		through a pipeline.
306		- Added doctests.
307		- Added new options that correspond to ``signed`` and ``exp``.
308		- The user can now specify multiple numbers to exclude or multiple ranges
309		to filter by.
310
311		10-01-2013 v. 3.0.2
312		+++++++++++++++++++
313
314		- Made float, int, and digit searching algorithms all share the same base function.
315		- Fixed some outdated comments.
316		- Made the ``__version__`` variable available when importing the module.
317
318		8-15-2013 v. 3.0.1
319		++++++++++++++++++
320
321		- Added support for unicode strings.
322		- Removed extraneous ``string2int`` function.
323		- Fixed empty string removal function.
324
325		7-13-2013 v. 3.0.0
326		++++++++++++++++++
327
328		- Added a ``number_type`` argument to the sorting functions to specify how
329		liberal to be when deciding what a number is.
330		- Reworked the documentation.
331
332		6-25-2013 v. 2.2.0
333		++++++++++++++++++
334
335		- Added ``key`` attribute to ``natsorted`` and ``index_natsorted`` so that
336		it mimics the functionality of the built-in ``sorted``
337		- Added tests to reflect the new functionality, as well as tests demonstrating
338		how to get similar functionality using ``natsort_key``.
339
340		12-5-2012 v. 2.1.0
341		++++++++++++++++++
342
343		- Reorganized package.
344		- Now using a platform independent shell script generator (entry_points
345		from distribute).
346		- Can now execute natsort from command line with ``python -m natsort``
347		as well.
348
349		11-30-2012 v. 2.0.2
350		+++++++++++++++++++
351
352		- Added the use_2to3 option to setup.py.
353		- Added distribute_setup.py to the distribution.
354		- Added dependency to the argparse module (for python2.6).
355
356		11-21-2012 v. 2.0.1
357		+++++++++++++++++++
358
359		- Reorganized directory structure.
360		- Added tests into the natsort.py file itself.
361
362		11-16-2012, v. 2.0.0
363		++++++++++++++++++++
364
365		- Updated sorting algorithm to support floats (including exponentials) and
366		basic version number support.
367		- Added better README documentation.
368		- Added doctests.

-275

~~docs/source/conf.py~~

0		# -- coding: utf-8 --
1		#
2		# natsort documentation build configuration file, created by
3		# sphinx-quickstart on Thu Jul 17 21:01:29 2014.
4		#
5		# This file is execfile()d with the current directory set to its
6		# containing dir.
7		#
8		# Note that not all possible configuration values are present in this
9		# autogenerated file.
10		#
11		# All configuration values have a default; values that are commented out
12		# serve to show the default.
13
14		import os
15
16		# If extensions (or modules to document with autodoc) are in another directory,
17		# add these directories to sys.path here. If the directory is relative to the
18		# documentation root, use os.path.abspath to make it absolute, like shown here.
19		# sys.path.insert(0, os.path.abspath('.'))
20
21		# -- General configuration ------------------------------------------------
22
23		# If your documentation needs a minimal Sphinx version, state it here.
24		# needs_sphinx = '1.0'
25
26		# Add any Sphinx extension module names here, as strings. They can be
27		# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
28		# ones.
29		extensions = [
30		'sphinx.ext.autodoc',
31		'sphinx.ext.autosummary',
32		'sphinx.ext.intersphinx',
33		'sphinx.ext.mathjax',
34		'sphinx.ext.napoleon',
35		]
36
37		# Add any paths that contain templates here, relative to this directory.
38		templates_path = ['_templates']
39
40		# The suffix of source filenames.
41		source_suffix = '.rst'
42
43		# The encoding of source files.
44		# source_encoding = 'utf-8-sig'
45
46		# The master toctree document.
47		master_doc = 'index'
48
49		# General information about the project.
50		project = u'natsort'
51		# noinspection PyShadowingBuiltins
52		copyright = u'2014, Seth M. Morton'
53
54		# The version info for the project you're documenting, acts as replacement for
55		# |version| and |release|, also used in various other places throughout the
56		# built documents.
57		#
58		# The full version, including alpha/beta/rc tags.
59		release = '5.4.1'
60		# The short X.Y version.
61		version = '.'.join(release.split('.')[0:2])
62
63		# The language for content autogenerated by Sphinx. Refer to documentation
64		# for a list of supported languages.
65		# language = None
66
67		# There are two options for replacing |today|: either, you set today to some
68		# non-false value, then it is used:
69		# today = ''
70		# Else, today_fmt is used as the format for a strftime call.
71		# today_fmt = '%B %d, %Y'
72
73		# List of patterns, relative to source directory, that match files and
74		# directories to ignore when looking for source files.
75		# exclude_patterns = ['solar/*']
76
77		# The reST default role (used for this markup: `text`) to use for all
78		# documents.
79		# default_role = None
80
81		# If true, '()' will be appended to :func: etc. cross-reference text.
82		# add_function_parentheses = True
83
84		# If true, the current module name will be prepended to all description
85		# unit titles (such as .. function::).
86		# add_module_names = True
87
88		# If true, sectionauthor and moduleauthor directives will be shown in the
89		# output. They are ignored by default.
90		# show_authors = False
91
92		# The name of the Pygments (syntax highlighting) style to use.
93		pygments_style = 'sphinx'
94		highlight_language = 'python'
95
96		# A list of ignored prefixes for module index sorting.
97		# modindex_common_prefix = []
98
99		# If true, keep warnings as "system message" paragraphs in the built documents.
100		# keep_warnings = False
101
102
103		# -- Options for HTML output ----------------------------------------------
104
105		# The theme to use for HTML and HTML Help pages. See the documentation for
106		# a list of builtin themes.
107		on_rtd = os.environ.get('READTHEDOCS') == 'True'
108		if on_rtd:
109		    html_theme = 'default'
110		else:
111		    import sphinx_rtd_theme
112
113		    html_theme = 'sphinx_rtd_theme'
114		# html_theme = 'solar'
115
116		# Theme options are theme-specific and customize the look and feel of a theme
117		# further. For a list of options available for each theme, see the
118		# documentation.
119		# html_theme_options = {}
120
121		# Add any paths that contain custom themes here, relative to this directory.
122		html_theme_path = ['.']
123
124		# The name for this set of Sphinx documents. If None, it defaults to
125		# "<project> v<release> documentation".
126		# html_title = None
127
128		# A shorter title for the navigation bar. Default is the same as html_title.
129		# html_short_title = None
130
131		# The name of an image file (relative to this directory) to place at the top
132		# of the sidebar.
133		# html_logo = None
134
135		# The name of an image file (within the static path) to use as favicon of the
136		# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
137		# pixels large.
138		# html_favicon = None
139
140		# Add any paths that contain custom static files (such as style sheets) here,
141		# relative to this directory. They are copied after the builtin static files,
142		# so a file named "default.css" will overwrite the builtin "default.css".
143		# html_static_path = ['_static']
144
145		# Add any extra paths that contain custom files (such as robots.txt or
146		# .htaccess) here, relative to this directory. These files are copied
147		# directly to the root of the documentation.
148		# html_extra_path = []
149
150		# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
151		# using the given strftime format.
152		# html_last_updated_fmt = '%b %d, %Y'
153
154		# If true, SmartyPants will be used to convert quotes and dashes to
155		# typographically correct entities.
156		# html_use_smartypants = True
157
158		# Custom sidebar templates, maps document names to template names.
159		# html_sidebars = {}
160
161		# Additional templates that should be rendered to pages, maps page names to
162		# template names.
163		# html_additional_pages = {}
164
165		# If false, no module index is generated.
166		# html_domain_indices = True
167
168		# If false, no index is generated.
169		# html_use_index = True
170
171		# If true, the index is split into individual pages for each letter.
172		# html_split_index = False
173
174		# If true, links to the reST sources are added to the pages.
175		# html_show_sourcelink = True
176
177		# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
178		# html_show_sphinx = True
179
180		# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
181		# html_show_copyright = True
182
183		# If true, an OpenSearch description file will be output, and all pages will
184		# contain a <link> tag referring to it. The value of this option must be the
185		# base URL from which the finished HTML is served.
186		# html_use_opensearch = ''
187
188		# This is the file name suffix for HTML files (e.g. ".xhtml").
189		# html_file_suffix = None
190
191		# Output file base name for HTML help builder.
192		htmlhelp_basename = 'natsortdoc'
193
194		# -- Options for LaTeX output ---------------------------------------------
195
196		latex_elements = {
197		# The paper size ('letterpaper' or 'a4paper').
198		# 'papersize': 'letterpaper',
199
200		# The font size ('10pt', '11pt' or '12pt').
201		# 'pointsize': '10pt',
202
203		# Additional stuff for the LaTeX preamble.
204		# 'preamble': '',
205		}
206
207		# Grouping the document tree into LaTeX files. List of tuples
208		# (source start file, target name, title,
209		# author, documentclass [howto, manual, or own class]).
210		latex_documents = [
211		('index', 'natsort.tex', u'natsort Documentation',
212		u'Seth M. Morton', 'manual'),
213		]
214
215		# The name of an image file (relative to this directory) to place at the top of
216		# the title page.
217		# latex_logo = None
218
219		# For "manual" documents, if this is true, then toplevel headings are parts,
220		# not chapters.
221		# latex_use_parts = False
222
223		# If true, show page references after internal links.
224		# latex_show_pagerefs = False
225
226		# If true, show URL addresses after external links.
227		# latex_show_urls = False
228
229		# Documents to append as an appendix to all manuals.
230		# latex_appendices = []
231
232		# If false, no module index is generated.
233		# latex_domain_indices = True
234
235
236		# -- Options for manual page output ---------------------------------------
237
238		# One entry per manual page. List of tuples
239		# (source start file, name, description, authors, manual section).
240		man_pages = [
241		('index', 'natsort', u'natsort Documentation',
242		[u'Seth M. Morton'], 1)
243		]
244
245		# If true, show URL addresses after external links.
246		# man_show_urls = False
247
248
249		# -- Options for Texinfo output -------------------------------------------
250
251		# Grouping the document tree into Texinfo files. List of tuples
252		# (source start file, target name, title, author,
253		# dir menu entry, description, category)
254		texinfo_documents = [
255		('index', 'natsort', u'natsort Documentation',
256		u'Seth M. Morton', 'natsort', 'One line description of project.',
257		'Miscellaneous'),
258		]
259
260		# Documents to append as an appendix to all manuals.
261		# texinfo_appendices = []
262
263		# If false, no module index is generated.
264		# texinfo_domain_indices = True
265
266		# How to display URL addresses: 'footnote', 'no', or 'inline'.
267		# texinfo_show_urls = 'footnote'
268
269		# If true, do not generate a @detailmenu in the "Top" node's menu.
270		# texinfo_no_detailmenu = False
271
272
273		# Example configuration for intersphinx: refer to the Python standard library.
274		intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}

-366

~~docs/source/examples.rst~~

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		.. _examples:
4
5		Examples and Recipes
6		====================
7
8		If you want more detailed examples than given on this page, please see
9		https://github.com/SethMMorton/natsort/tree/master/test_natsort.
10
11		.. contents::
12		:local:
13
14		Basic Usage
15		-----------
16
17		In the most basic use case, simply import :func:`~natsorted` and use
18		it as you would :func:`sorted`:
19
20		.. code-block:: python
21
22		>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
23		>>> sorted(a)
24		['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in']
25		>>> from natsort import natsorted, ns
26		>>> natsorted(a)
27		['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
28
29		Sort Version Numbers
30		--------------------
31
32		As of :mod:`natsort` version >= 4.0.0, :func:`~natsorted` will now properly
33		sort version numbers. The old function :func:`~versorted` exists for
34		backwards compatibility but new development should use :func:`~natsorted`.
35
36		.. _rc_sorting:
37
38		Sorting with Alpha, Beta, and Release Candidates
39		++++++++++++++++++++++++++++++++++++++++++++++++
40
41		By default, if you wish to sort versions with a non-strict versioning
42		scheme, you may not get the results you expect:
43
44		.. code-block:: python
45
46		>>> a = ['1.2', '1.2rc1', '1.2beta2', '1.2beta1', '1.2alpha', '1.2.1', '1.1', '1.3']
47		>>> natsorted(a)
48		['1.1', '1.2', '1.2.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.3']
49
50		To make the '1.2' pre-releases come before '1.2.1', you need to use the following
51		recipe:
52
53		.. code-block:: python
54
55		>>> natsorted(a, key=lambda x: x.replace('.', '~'))
56		['1.1', '1.2', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2.1', '1.3']
57
58		If you also want '1.2' after all the alpha, beta, and rc candidates, you can
59		modify the above recipe:
60
61		.. code-block:: python
62
63		>>> natsorted(a, key=lambda x: x.replace('.', '~')+'z')
64		['1.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2', '1.2.1', '1.3']
65
66		Please see `this issue <https://github.com/SethMMorton/natsort/issues/13>`_ to
67		see why this works.
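
The recipes above depend on character ordering: ``'.'`` sorts before every ASCII letter, while ``'~'`` sorts after them. Here is a minimal standard-library sketch of that effect. It assumes a simplified key; ``simple_natural_key`` is a hypothetical helper for illustration and is not the :mod:`natsort` implementation.

```python
import re


def simple_natural_key(text):
    # Hypothetical helper (not natsort's code): split on runs of digits and
    # turn each run into an int so that numbers compare numerically.
    return tuple(
        int(part) if part.isdecimal() else part
        for part in re.split(r'(\d+)', text)
        if part
    )


a = ['1.2', '1.2rc1', '1.2beta2', '1.2beta1', '1.2alpha', '1.2.1', '1.1', '1.3']

# '.' (0x2E) is smaller than any letter, so '1.2.1' lands before '1.2alpha'.
default_order = sorted(a, key=simple_natural_key)

# '~' (0x7E) is larger than any ASCII letter, so the pre-releases move ahead
# of '1.2.1'.
tilde_order = sorted(a, key=lambda x: simple_natural_key(x.replace('.', '~')))

# Appending 'z' gives the bare release a suffix that sorts after 'alpha',
# 'beta', and 'rc' but still before '~'.
final_order = sorted(
    a, key=lambda x: simple_natural_key(x.replace('.', '~') + 'z')
)

print(default_order)
print(tilde_order)
print(final_order)
```

The three printed lists should match the three outputs shown in this section, assuming every input starts with a digit as it does here.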
68
69		.. _path_sort:
70
71		Sort OS-Generated Paths
72		-----------------------
73
74		In some cases when sorting file paths with OS-Generated names, the default
75		:func:`~natsorted` algorithm may not be sufficient. In cases like these,
76		you may need to use the ``ns.PATH`` option:
77
78		.. code-block:: python
79
80		>>> a = ['./folder/file (1).txt',
81		... './folder/file.txt',
82		... './folder (1)/file.txt',
83		... './folder (10)/file.txt']
84		>>> natsorted(a)
85		['./folder (1)/file.txt', './folder (10)/file.txt', './folder/file (1).txt', './folder/file.txt']
86		>>> natsorted(a, alg=ns.PATH)
87		['./folder/file.txt', './folder/file (1).txt', './folder (1)/file.txt', './folder (10)/file.txt']
88
89		Locale-Aware Sorting (Human Sorting)
90		------------------------------------
91
92		.. note::
93		Please read :ref:`locale_issues` before using ``ns.LOCALE``, :func:`humansorted`,
94		or :func:`index_humansorted`.
95
96		You can instruct :mod:`natsort` to use locale-aware sorting with the
97		``ns.LOCALE`` option. In addition to making this understand non-ASCII
98		characters, it will also properly interpret non-'.' decimal separators
99		and also properly order case. It may be more convenient to just use
100		the :func:`humansorted` function:
101
102		.. code-block:: python
103
104		>>> from natsort import humansorted
105		>>> import locale
106		>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
107		'en_US.UTF-8'
108		>>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
109		>>> natsorted(a, alg=ns.LOCALE)
110		['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
111		>>> humansorted(a)
112		['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
113
114		You may find that if you do not explicitly set the locale your results may not
115		be as you expect... I have found that it depends on the system you are on.
116		If you use `PyICU <https://pypi.org/project/PyICU>`_ (see below) then
117		you should not need to do this.
118
119		.. _case_sort:
120
121		Controlling Case When Sorting
122		-----------------------------
123
124		For non-numbers, by default :mod:`natsort` uses ordinal sorting (i.e.
125		it sorts by the character's value in the ASCII table). For example:
126
127		.. code-block:: python
128
129		>>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
130		>>> natsorted(a)
131		['Apple', 'Banana', 'Corn', 'apple', 'banana', 'corn']
132
133		There are times when you wish to ignore the case when sorting;
134		you can easily do this with the ``ns.IGNORECASE`` option:
135
136		.. code-block:: python
137
138		>>> natsorted(a, alg=ns.IGNORECASE)
139		['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn']
140
141		Note that since Python's sorting is stable, the order of equivalent
142		elements after lowering the case is the same order they appear in the
143		original list.
144
145		Upper-case letters appear first in the ASCII table, but many natural
146		sorting methods place lower-case first. To do this, use
147		``ns.LOWERCASEFIRST``:
148
149		.. code-block:: python
150
151		>>> natsorted(a, alg=ns.LOWERCASEFIRST)
152		['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']
153
154		It may be undesirable to have the upper-case letters grouped together
155		and the lower-case letters grouped together; most would expect all
156		"a"s to be together regardless of case, and all "b"s, and so on. To
157		achieve this, use ``ns.GROUPLETTERS``:
158
159		.. code-block:: python
160
161		>>> natsorted(a, alg=ns.GROUPLETTERS)
162		['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']
163
164		You might combine this with ``ns.LOWERCASEFIRST`` to get what most
165		would expect to be "natural" sorting:
166
167		.. code-block:: python
168
169		>>> natsorted(a, alg=ns.G | ns.LF)
170		['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
171
172		Customizing Float Definition
173		----------------------------
174
175		You can make :func:`~natsorted` search for any float that would be
176		a valid Python float literal, such as 5, 0.4, -4.78, +4.2E-34, etc.
177		using the ``ns.FLOAT`` key. You can disable the exponential component
178		of the number with ``ns.NOEXP``.
179
180		.. code-block:: python
181
182		>>> a = ['a50', 'a51.', 'a+50.4', 'a5.034e1', 'a+50.300']
183		>>> natsorted(a, alg=ns.FLOAT)
184		['a50', 'a5.034e1', 'a51.', 'a+50.300', 'a+50.4']
185		>>> natsorted(a, alg=ns.FLOAT | ns.SIGNED)
186		['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
187		>>> natsorted(a, alg=ns.FLOAT | ns.SIGNED | ns.NOEXP)
188		['a5.034e1', 'a50', 'a+50.300', 'a+50.4', 'a51.']
189
190		For convenience, the ``ns.REAL`` option is provided which is a shortcut
191		for ``ns.FLOAT | ns.SIGNED`` and can be used to sort on real numbers.
192		This can be easily accessed with the :func:`~realsorted` convenience
193		function. Please note that the behavior of the :func:`~realsorted` function
194		was the default behavior of :func:`~natsorted` for :mod:`natsort`
195		version < 4.0.0:
196
197		.. code-block:: python
198
199		>>> natsorted(a, alg=ns.REAL)
200		['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
201		>>> from natsort import realsorted
202		>>> realsorted(a)
203		['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
204
205		.. _custom_sort:
206
207		Using a Custom Sorting Key
208		--------------------------
209
210		Like the built-in ``sorted`` function, ``natsorted`` can accept a custom
211		sort key so that:
212
213		.. code-block:: python
214
215		>>> from operator import attrgetter, itemgetter
216		>>> a = [['a', 'num4'], ['b', 'num8'], ['c', 'num2']]
217		>>> natsorted(a, key=itemgetter(1))
218		[['c', 'num2'], ['a', 'num4'], ['b', 'num8']]
219		>>> class Foo:
220		...     def __init__(self, bar):
221		...         self.bar = bar
222		...     def __repr__(self):
223		...         return "Foo('{0}')".format(self.bar)
224		>>> b = [Foo('num3'), Foo('num5'), Foo('num2')]
225		>>> natsorted(b, key=attrgetter('bar'))
226		[Foo('num2'), Foo('num3'), Foo('num5')]
227
228		Generating a Natsort Key
229		------------------------
230
231		If you need to sort a list in-place, you cannot use :func:`~natsorted`; you
232		need to pass a key to the :meth:`list.sort` method. The function
233		:func:`~natsort_keygen` is a convenient way to generate these keys for you:
234
235		.. code-block:: python
236
237		>>> from natsort import natsort_keygen
238		>>> a = ['a50', 'a51.', 'a50.4', 'a5.034e1', 'a50.300']
239		>>> natsort_key = natsort_keygen(alg=ns.FLOAT)
240		>>> a.sort(key=natsort_key)
241		>>> a
242		['a50', 'a50.300', 'a5.034e1', 'a50.4', 'a51.']
243
244		:func:`~natsort_keygen` has the same API as :func:`~natsorted` (minus the
245		`reverse` option).
246
247		Natural Sorting with ``cmp`` (Python 2 only)
248		--------------------------------------------
249
250		.. note::
251		This is a Python2-only feature! The :func:`natcmp` function is not
252		exposed on Python3. Because this documentation is built with
253		Python3, you will not find :func:`natcmp` in the API.
254
255		If you are using a legacy codebase that requires you to use :func:`cmp` instead
256		of a key-function, you can use :func:`~natcmp`.
257
258		.. code-block:: python
259
260		>>> import sys
261		>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
262		>>> if sys.version_info[0] == 2:
263		...     from natsort import natcmp
264		...     sorted(a, cmp=natcmp)
265		... else:
266		...     natsorted(a) # so docstrings don't fail
267		['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
268
269		:func:`natcmp` also accepts an ``alg`` argument so you can customize your
270		sorting experience.
271
272		Sorting Multiple Lists According to a Single List
273		-------------------------------------------------
274
275		Sometimes you have multiple lists, and you want to sort one of those
276		lists and reorder the other lists according to how the first was sorted.
277		To achieve this you could use the :func:`~index_natsorted` in combination
278		with the convenience function
279		:func:`~order_by_index`:
280
281		.. code-block:: python
282
283		>>> from natsort import index_natsorted, order_by_index
284		>>> a = ['a2', 'a9', 'a1', 'a4', 'a10']
285		>>> b = [4, 5, 6, 7, 8]
286		>>> c = ['hi', 'lo', 'ah', 'do', 'up']
287		>>> index = index_natsorted(a)
288		>>> order_by_index(a, index)
289		['a1', 'a2', 'a4', 'a9', 'a10']
290		>>> order_by_index(b, index)
291		[6, 4, 7, 5, 8]
292		>>> order_by_index(c, index)
293		['ah', 'hi', 'do', 'lo', 'up']
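
The idea behind sorting several lists by one of them can be sketched with the standard library alone: compute the order of indexes that would sort the first list, then apply that order to each list. The helpers ``simple_natural_key``, ``sorted_indexes``, and ``reorder`` below are hypothetical names used for illustration; they are not the :mod:`natsort` functions.

```python
import re


def simple_natural_key(text):
    # Hypothetical helper: digit runs become ints, everything else stays text.
    return tuple(
        int(part) if part.isdecimal() else part
        for part in re.split(r'(\d+)', text)
        if part
    )


def sorted_indexes(seq, key):
    # Return the positions of seq in the order that would sort it.
    return sorted(range(len(seq)), key=lambda i: key(seq[i]))


def reorder(seq, index):
    # Pick the elements of seq in the order given by index.
    return [seq[i] for i in index]


a = ['a2', 'a9', 'a1', 'a4', 'a10']
b = [4, 5, 6, 7, 8]
c = ['hi', 'lo', 'ah', 'do', 'up']

index = sorted_indexes(a, simple_natural_key)
print(index)
print(reorder(a, index))
print(reorder(b, index))
print(reorder(c, index))
```

Keeping the index separate from the data means the same ordering can be reused for any number of parallel lists without sorting again.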
294
295		Returning Results in Reverse Order
296		----------------------------------
297
298		Just like the :func:`sorted` built-in function, you can supply the
299		``reverse`` option to return the results in reverse order:
300
301		.. code-block:: python
302
303		>>> a = ['a2', 'a9', 'a1', 'a4', 'a10']
304		>>> natsorted(a, reverse=True)
305		['a10', 'a9', 'a4', 'a2', 'a1']
306
307		Sorting Bytes on Python 3
308		-------------------------
309
310		Python 3 is rather strict about comparing strings and bytes, and this
311		can make it difficult to deal with collections of both. Because of the
312		challenge of guessing which encoding should be used to decode a bytes
313		array to a string, :mod:`natsort` does not try to guess and automatically
314		convert for you; in fact, the official stance of :mod:`natsort` is to
315		not support sorting bytes. Instead, some decoding convenience functions
316		have been provided to you (see :ref:`bytes_help`) that allow you to
317		provide a codec for decoding bytes through the ``key`` argument that
318		will allow :mod:`natsort` to convert byte arrays to strings for sorting;
319		these functions know not to raise an error if the input is not a byte
320		array, so you can use the key on any arbitrary collection of data.
321
322		.. code-block:: python
323
324		>>> from natsort import as_ascii
325		>>> a = [b'a', 14.0, 'b']
326		>>> # On Python 2, natsorted(a) would work as expected.
327		>>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
328		>>> natsorted(a, key=as_ascii) == [14.0, b'a', 'b']
329		True
330
331		Additionally, regular expressions cannot be run on byte arrays, making it
332		so that :mod:`natsort` cannot parse them for numbers. As a result, if you
333		run :mod:`natsort` on a list of bytes, you will get results that are like
334		Python's default sorting behavior. Of course, you can use the decoding
335		functions to solve this:
336
337		.. code-block:: python
338
339		>>> from natsort import as_utf8
340		>>> a = [b'a56', b'a5', b'a6', b'a40']
341		>>> natsorted(a) # doctest: +SKIP
342		[b'a40', b'a5', b'a56', b'a6']
343		>>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
344		True
345
346		If you need a codec different from ASCII or UTF-8, you can use
347		:func:`decoder` to generate a custom key:
348
349		.. code-block:: python
350
351		>>> from natsort import decoder
352		>>> a = [b'a56', b'a5', b'a6', b'a40']
353		>>> natsorted(a, key=decoder('latin1')) == [b'a5', b'a6', b'a40', b'a56']
354		True
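
The decode-through-a-key idea can be sketched without :mod:`natsort`. The example below assumes a hypothetical ``make_decoder`` factory that decodes bytes with a given codec and passes every other value through unchanged, plus a simplified ``simple_natural_key``; neither is the library's own code.

```python
import re


def make_decoder(codec):
    # Hypothetical factory: returns a function that decodes bytes with the
    # given codec and leaves non-bytes input untouched.
    def decode(value):
        return value.decode(codec) if isinstance(value, bytes) else value
    return decode


def simple_natural_key(text):
    # Hypothetical helper: digit runs become ints, everything else stays text.
    return tuple(
        int(part) if part.isdecimal() else part
        for part in re.split(r'(\d+)', text)
        if part
    )


to_text = make_decoder('latin1')
a = [b'a56', b'a5', b'a6', b'a40']

# Plain sorting compares the raw bytes one at a time.
plain = sorted(a)

# Decoding inside the key lets the numbers be found, while the result still
# contains the original bytes objects.
natural = sorted(a, key=lambda x: simple_natural_key(to_text(x)))

print(plain)
print(natural)
```

Because the decoding happens only inside the key, the caller chooses the codec and the sorted output keeps its original type.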
355
356		Sorting a Pandas DataFrame
357		--------------------------
358
359		As of Pandas version 0.16.0, the sorting methods do not accept a ``key`` argument,
360		so you cannot simply pass :func:`natsort_keygen` to a Pandas DataFrame and sort.
361		This request has been made to the Pandas devs; see
362		`issue 3942 <https://github.com/pydata/pandas/issues/3942>`_ if you are interested.
363		If you need to sort a Pandas DataFrame, please check out
364		`this answer on StackOverflow <http://stackoverflow.com/a/29582718/1399279>`_
365		for ways to do this without the ``key`` argument to ``sort``.

-1113

~~docs/source/howitworks.rst~~

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		.. _howitworks:
4
5		How Does Natsort Work?
6		======================
7
8		.. contents::
9		:local:
10
11		:mod:`natsort` works by breaking strings into smaller sub-components (numbers
12		or everything else), and returning these components in a tuple. Sorting
13		tuples in Python is well-defined, and this fact is used to sort the input
14		strings properly. But how does one break a string into sub-components?
15		And what does one do to those components once they are split? Below I
16		will explain the algorithm that was chosen for the :mod:`natsort` module,
17		and some of the thinking that went into those design decisions. I will
18		also mention some of the stumbling blocks I ran into because
19		`getting sorting right is surprisingly hard`_.
20
21		If you are impatient, you can skip to :ref:`tldr1` for the algorithm
22		in the simplest case, and :ref:`tldr2`
23		to see what extra code is needed to handle special cases.
24
25		First, How Does Natural Sorting Work At a High Level?
26		-----------------------------------------------------
27
28		If I want to compare '2 ft 7 in' to '2 ft 11 in', I might do the following:
29
30		.. code-block:: python
31
32		>>> '2 ft 7 in' < '2 ft 11 in'
33		False
34
35		We as humans know that the above should be true, but why does Python think it
36		is false? Here is how it is performing the comparison::
37
38		'2' <=> '2' ==> equal, so keep going
39		' ' <=> ' ' ==> equal, so keep going
40		'f' <=> 'f' ==> equal, so keep going
41		't' <=> 't' ==> equal, so keep going
42		' ' <=> ' ' ==> equal, so keep going
43		'7' <=> '1' ==> different, use result of '7' < '1'
44
45		'7' evaluates as greater than '1' so the statement is false. When sorting, if
46		a value is less than another it is placed first, so in our above example
47		'2 ft 11 in' would end up before '2 ft 7 in', which is not correct. What to do?
48
49		The best way to handle this is to break the string into sub-components
50		of numbers and non-numbers, and then convert the numeric parts into
51		:func:`float` or :func:`int` types. This will force Python to
52		actually understand the context of what it is sorting and then "do the
53		right thing." Luckily, it handles sorting lists of strings right out-of-the-box,
54		so the only hard part is actually making this string-to-list transformation
55		and then Python will handle the rest.
56
57		::
58
59		'2 ft 7 in' ==> (2, ' ft ', 7, ' in')
60		'2 ft 11 in' ==> (2, ' ft ', 11, ' in')
61
62		When Python compares the two, it roughly follows the below logic::
63
64		2 <=> 2 ==> equal, so keep going
65		' ft ' <=> ' ft ' ==> a string is a special type of sequence - evaluate each character individually
66		||
67		-->
68		' ' <=> ' ' ==> equal, so keep going
69		'f' <=> 'f' ==> equal, so keep going
70		't' <=> 't' ==> equal, so keep going
71		' ' <=> ' ' ==> equal, so keep going
72		<== Back to parent sequence
73		7 <=> 11 ==> different, use the result of 7 < 11
74
75		Clearly, seven is less than eleven, so our comparison is as we expect, and we
76		would get the sorting order we wanted.
77
78		At its heart, :mod:`natsort` is simply a tool to break strings into tuples,
79		turning numbers in strings (i.e. ``'79'``) into ints and floats as it does this.
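
The string-to-tuple transformation described above can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the :mod:`natsort` code: ``simple_natural_key`` is a hypothetical helper that handles only unsigned integers, and it would raise ``TypeError`` on Python 3 if one input starts with a number and another with text.

```python
import re


def simple_natural_key(text):
    # Hypothetical helper: split on runs of digits, drop empty pieces, and
    # convert each digit run to an int.
    return tuple(
        int(part) if part.isdecimal() else part
        for part in re.split(r'(\d+)', text)
        if part
    )


short = '2 ft 7 in'
tall = '2 ft 11 in'

print(simple_natural_key(short))
print(simple_natural_key(tall))

# Character-by-character comparison gets this wrong...
print(short < tall)
# ...while comparing the tuples compares 7 with 11 as numbers.
print(simple_natural_key(short) < simple_natural_key(tall))

a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
print(sorted(a, key=simple_natural_key))
```

The key is passed to the built-in ``sorted``, so tuple comparison does all of the ordering work once the strings have been split.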
80
81		Natsort's Approach
82		------------------
83
84		.. contents::
85		:local:
86
87		Decomposing Strings Into Sub-Components
88		+++++++++++++++++++++++++++++++++++++++
89
90		The first major hurdle to overcome is to decompose the string into sub-components.
91		Remarkably, this turns out to be the easy part, owing mostly to Python's easy access
92		to regular expressions. Breaking an arbitrary string based on a pattern is pretty
93		straightforward.
94
95		.. code-block:: python
96
97		>>> import re
98		>>> re.split(r'(\d+)', '2 ft 11 in')
99		['', '2', ' ft ', '11', ' in']
100
101		Clear (assuming you can read regular expressions) and concise.
102
103		The reason I began developing :mod:`natsort` in the first place was because I
104		needed to handle the natural sorting of strings containing real numbers, not just
105		unsigned integers as the above example contains. By real numbers, I mean those like
106		``-45.4920E-23``. :mod:`natsort` can handle just about any number definition;
107		to that end, here are all the regular expressions used in :mod:`natsort`:
108
109		.. code-block:: python
110
111		>>> unsigned_int = r'([0-9]+)'
112		>>> signed_int = r'([-+]?[0-9]+)'
113		>>> unsigned_float = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'
114		>>> signed_float = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'
115		>>> unsigned_float_no_exponent = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+))'
116		>>> signed_float_no_exponent = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+))'
117
118		Note that ``"inf"`` and ``"nan"`` are deliberately omitted from the float definition because you
119		wouldn't want (for example) ``"banana"`` to be converted into ``['ba', 'nan', 'a']``.
120		Let's see an example:
121
122		.. code-block:: python
123
124		>>> re.split(signed_float, 'The mass of 3 electrons is 2.732815068E-30 kg')
125		['The mass of ', '3', ' electrons is ', '2.732815068E-30', ' kg']
126
127		.. note::
128
129		It is a bit of a lie to say the above are the complete regular expressions. In the
130		actual code there is also handling for non-ASCII unicode characters (such as ⑦),
131		but I will ignore that aspect of :mod:`natsort` in this discussion.
132
133		Now, when the user wants to change the definition of a number, it is as easy as changing
134		the pattern supplied to the regular expression engine.
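
To illustrate how swapping the pattern changes what counts as a number, the short sketch below splits the same string with two of the expressions listed earlier. The sample string ``'a-5.3e2b'`` is an arbitrary choice for illustration.

```python
import re

# Two of the patterns listed above.
unsigned_int = r'([0-9]+)'
signed_float = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'

text = 'a-5.3e2b'

# With unsigned integers, the sign, decimal point, and exponent marker are
# treated as ordinary text between three separate numbers.
int_parts = re.split(unsigned_int, text)

# With signed floats, the whole '-5.3e2' is captured as a single number.
float_parts = re.split(signed_float, text)

print(int_parts)
print(float_parts)
```

Only the pattern differs between the two calls; the splitting step itself stays the same.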
135
136		Choosing the right default is hard, though (well, in this case it shouldn't have been
137		but I was rather thick-headed).
138		In retrospect, it should have been obvious that since essentially all the code examples
139		I had/have seen for natural sorting were for unsigned integers, I should have made the default
140		definition of a number an unsigned integer. But, in the brash days of my youth I assumed
141		that since my use case was real numbers, everyone else would be happier sorting by real numbers;
142		so, I made the default definition of a number a signed float with exponent.
143		`This astonished`_ `a lot`_ `of people`_
144		(`and some people aren't very nice when they are astonished`_).
145		Starting with :mod:`natsort` version 4.0.0 the default number definition was
146		changed to an unsigned integer which satisfies the "least astonishment" principle, and
147		I have not heard a complaint since.
148
149		Coercing Strings Containing Numbers Into Numbers
150		++++++++++++++++++++++++++++++++++++++++++++++++
151
152		There has been some debate on Stack Overflow as to what method is best to
153		coerce a string to a number if it can be coerced, and leaving it alone otherwise
154		(see `this one for coercion`_ and `this one for checking`_ for some high traffic questions),
155		but it mostly boils down to two different solutions, shown here:
156
157		.. code-block:: python
158
159		>>> def coerce_try_except(x):
160		...     try:
161		...         return int(x)
162		...     except ValueError:
163		...         return x
164		...
165		>>> def coerce_regex(x):
166		...     # Note that precompiling the regex is more performant,
167		...     # but I do not show that here for clarity's sake.
168		...     return int(x) if re.match(r'[-+]?\d+$', x) else x
169		...
170
171		Here are some timing results run on my machine:
172
173		::
174
175		In [0]: numbers = list(map(str, range(100))) # A list of numbers as strings
176
177		In [1]: not_numbers = ['banana' + x for x in numbers]
178
179		In [2]: %timeit [coerce_try_except(x) for x in numbers]
180		10000 loops, best of 3: 51.1 µs per loop
181
182		In [3]: %timeit [coerce_try_except(x) for x in not_numbers]
183		1000 loops, best of 3: 289 µs per loop
184
185		In [4]: %timeit [coerce_regex(x) for x in not_numbers]
186		10000 loops, best of 3: 67.6 µs per loop
187
188		In [5]: %timeit [coerce_regex(x) for x in numbers]
189		10000 loops, best of 3: 123 µs per loop
190
191		What can we learn from this? The ``try: except`` method (arguably the most "pythonic"
192		of the solutions) is best for numeric input, but performs over 5X slower for non-numeric
193		input. Conversely, the regular expression method is more than twice as slow as
194		``try: except`` for numeric input but over 4X faster for non-numeric input, and it is
195		more efficient for non-numeric input than for input that can be converted to an ``int``.
196		Further, even in its worst case the regular expression method is at least twice as fast
197		as the worst case for the ``try: except``.
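If you want to repeat this comparison outside of IPython, something like the following sketch should do it with the standard :mod:`timeit` module. The helper name ``seconds_per_pass`` and the number of passes are assumptions of mine, and unlike the version above the regular expression is precompiled. Absolute timings will differ from machine to machine, so I show no expected output.

```python
import re
import timeit

# Precompiled pattern (the version in the text compiles on every call).
INT_PATTERN = re.compile(r'[-+]?\d+$')


def coerce_try_except(x):
    try:
        return int(x)
    except ValueError:
        return x


def coerce_regex(x):
    return int(x) if INT_PATTERN.match(x) else x


numbers = [str(i) for i in range(100)]
not_numbers = ['banana' + x for x in numbers]


def seconds_per_pass(func, data, passes=200):
    """Average time for one pass of func over data (hypothetical helper)."""
    total = timeit.timeit(lambda: [func(x) for x in data], number=passes)
    return total / passes


for func in (coerce_try_except, coerce_regex):
    for label, data in (('numbers', numbers), ('not_numbers', not_numbers)):
        micro = seconds_per_pass(func, data) * 1e6
        print('{:<18} {:<12} {:8.1f} µs per pass'.format(func.__name__, label, micro))
```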
198
199		Why do I care? Shouldn't I just pick a method and not worry about it? Probably. However,
200		I am very conscious about the performance of :mod:`natsort`, and want it to be a true
201		drop-in replacement for :func:`sorted` without having to incur a performance penalty.
202		For the purposes of :mod:`natsort`, there is no clear winner between the two algorithms -
203		the data being passed to this function will likely be a mix of numeric and non-numeric
204		string content. Do I use the ``try: except`` method and hope the speed gains on
205		numbers will offset the non-number performance, or do I use regular expressions and
206		take the more stable performance?
207
208		It turns out that within the context of :mod:`natsort`, some assumptions can be
209		made that make a hybrid approach attractive. Because all strings are pre-split
210		into numeric and non-numeric content before being passed to this coercion function,
211		the assumption can be made that *if a string begins with a digit or a sign, it
212		can be coerced into a number*.
213
214		.. code-block:: python
215
216		>>> def coerce_to_int(x):
217		...     if x[0] in '0123456789+-':
218		...         try:
219		...             return int(x)
220		...         except ValueError:
221		...             return x
222		...     else:
223		...         return x
224		...
225
226		So how does this perform compared to the standard coercion methods?
227
228		::
229
230		In [6]: %timeit [coerce_to_int(x) for x in numbers]
231		10000 loops, best of 3: 71.6 µs per loop
232
233		In [7]: %timeit [coerce_to_int(x) for x in not_numbers]
234		10000 loops, best of 3: 26.4 µs per loop
235
236		The hybrid method eliminates most of the time wasted on numbers checking that it
237		is in fact a number before passing to :func:`int`, and eliminates the time wasted
238		in the exception stack for input that is not a number.
239
240		That's as fast as we can get, right? In pure Python, probably. At least, it's
241		close. But because I am crazy and a glutton for punishment, I decided to see
242		if I could get any faster writing a C extension. It's called
243		`fastnumbers`_ and contains a C implementation of the above coercion functions
244		called :func:`fast_int`. How does it fare? Pretty well.
245
246		::
247
248		In [8]: %timeit [fast_int(x) for x in numbers]
249		10000 loops, best of 3: 30.9 µs per loop
250
251		In [9]: %timeit [fast_int(x) for x in not_numbers]
252		10000 loops, best of 3: 30 µs per loop
253
254		During development of :mod:`natsort`, I wanted to ensure that using it did not
255		get in the way of a user's program by introducing a performance penalty to their code.
256		To that end, I do not feel like my adventures down the rabbit hole of optimization
257		of coercion functions were a waste; I can confidently look users in the eye and
258		say I considered every option in ensuring :mod:`natsort` is as efficient as possible.
259		This is why if `fastnumbers`_ is installed it will be used for this step,
260		and otherwise the hybrid method will be used.
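The selection between the two could look roughly like the sketch below. This is not the actual :mod:`natsort` source; the alias ``to_int`` and the sample ``parts`` list are assumptions for illustration, and I am assuming that ``fast_int`` returns its input unchanged when it cannot be converted.

```python
def coerce_to_int(x):
    # Pure-Python hybrid method, used as the fallback.
    if x[0] in '0123456789+-':
        try:
            return int(x)
        except ValueError:
            return x
    else:
        return x


try:
    # Prefer the C implementation when it is available.
    from fastnumbers import fast_int as to_int
except ImportError:
    to_int = coerce_to_int

# Hypothetical pre-split input.
parts = ['version', '12', 'rc', '3']
coerced = [to_int(p) for p in parts]
print(coerced)  # ['version', 12, 'rc', 3]
```

Either way the rest of the code only ever sees ``to_int``, so it does not need to know which implementation was chosen.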
261
262		.. note::
263
264		Modifying the hybrid coercion function for floats is straightforward.
265
266		.. code-block:: python
267
268		>>> def coerce_to_float(x):
269		...     if x[0] in '.0123456789+-' or x.lower().lstrip()[:3] in ('nan', 'inf'):
270		...         try:
271		...             return float(x)
272		...         except ValueError:
273		...             return x
274		...     else:
275		...         return x
276		...
277
278		.. _tldr1:
279
280		TL;DR 1 - The Simple "No Special Cases" Algorithm
281		+++++++++++++++++++++++++++++++++++++++++++++++++
282
283		At this point, our :mod:`natsort` algorithm is essentially the following:
284
285		.. code-block:: python
286
287		>>> import re
288		>>> def natsort_key(x, as_float=False, signed=False):
289		...     if as_float:
290		...         regex = signed_float if signed else unsigned_float
291		...     else:
292		...         regex = signed_int if signed else unsigned_int
293		...     split_input = re.split(regex, x)
294		...     split_input = filter(None, split_input) # removes null strings
295		...     coerce = coerce_to_float if as_float else coerce_to_int
296		...     return tuple(coerce(s) for s in split_input)
297		...
298
299		I have written the above for clarity and not performance.
300		This pretty much matches `most natural sort solutions for python on Stack Overflow`_
301		(except the above includes customization of the definition of a number).
302
303		Special Cases Everywhere!
304		-------------------------
305
306		.. contents::
307		:local:
308
309		.. image:: special_cases_everywhere.jpg
310
311		If what I described in :ref:`TL;DR 1 <tldr1>` were
312		all that :mod:`natsort` needed to
313		do then there probably wouldn't be much need for a third-party module, right?
314		Probably. But it turns out that in real-world data there are a lot of
315		special cases that need to be handled, and in true `80%/20%`_ fashion, the
316		majority of the code in :mod:`natsort` is devoted to handling special cases
317		like those described below.
318
319		Sorting Filesystem Paths
320		++++++++++++++++++++++++
321
322		`The first major special case I encountered was sorting filesystem paths`_
323		(if you go to the link, you will see I didn't handle it well for a year...
324		this was before I fully realized how much functionality I could really add
325		to :mod:`natsort`). Let's apply the :func:`natsort_key` from above to some
326		filesystem paths that you might see being auto-generated from your operating
327		system:
328
329		.. code-block:: python
330
331		>>> paths = ['/p/Folder (10)/file.tar.gz',
332		... '/p/Folder/file.tar.gz',
333		... '/p/Folder (1)/file (1).tar.gz',
334		... '/p/Folder (1)/file.tar.gz']
335		>>> sorted(paths, key=natsort_key)
336		['/p/Folder (1)/file (1).tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (10)/file.tar.gz', '/p/Folder/file.tar.gz']
337
338		Well that's not right! What is ``'/p/Folder/file.tar.gz'`` doing at the end?
339		It has to do with the numerical ASCII code assigned to the space and
340		``/`` characters in the `ASCII table`_. According to the `ASCII table`_, the
341		space character (number 32) comes before the ``/`` character (number 47). If
342		we remove the common prefix in all of the above strings (``'/p/Folder'``), we
343		can see why this happens:
344
345		.. code-block:: python
346
347		>>> ' (1)/file.tar.gz' < '/file.tar.gz'
348		True
349		>>> ' ' < '/'
350		True
351
352		This isn't very convenient... how do we solve it? We can split the path
353		across the path separators and then sort. A convenient way to do this is
354		with the `Path.parts`_ attribute from :mod:`pathlib`:
355
356		.. code-block:: python
357
358		>>> import pathlib
359		>>> sorted(paths, key=lambda x: tuple(natsort_key(s) for s in pathlib.Path(x).parts))
360		['/p/Folder/file.tar.gz', '/p/Folder (1)/file (1).tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (10)/file.tar.gz']
361
362		Almost! It seems like there is some funny business going on in the final
363		filename component as well. We can solve that nicely and quickly with `Path.suffixes`_
364		and `Path.stem`_.
365
366		.. code-block:: python
367
368		>>> def decompose_path_into_components(x):
369		...     path_split = list(pathlib.Path(x).parts)
370		...     # Remove the final filename component from the path.
371		...     final_component = pathlib.Path(path_split.pop())
372		...     # Split off all the extensions.
373		...     suffixes = final_component.suffixes
374		...     stem = final_component.name.replace(''.join(suffixes), '')
375		...     # Remove the '.' prefix of each extension, and make that
376		...     # final component a list of the stem and each suffix.
377		...     final_component = [stem] + [x[1:] for x in suffixes]
378		...     # Replace the split final filename component.
379		...     path_split.extend(final_component)
380		...     return path_split
381		...
382		>>> def natsort_key_with_path_support(x):
383		...     return tuple(natsort_key(s) for s in decompose_path_into_components(x))
384		...
385		>>> sorted(paths, key=natsort_key_with_path_support)
386		['/p/Folder/file.tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (1)/file (1).tar.gz', '/p/Folder (10)/file.tar.gz']
387
388		This works because in addition to breaking the input by path separators, the final
389		filename component is separated from its extensions as well [#f1]_. Then, each of these
390		separated components is sent to the :mod:`natsort` algorithm, so the result is
391		a tuple of tuples. Once that is done, we can see how comparisons can be done in
392		the expected manner.
393
394		.. code-block:: python
395
396		>>> a = natsort_key_with_path_support('/p/Folder (1)/file (1).tar.gz')
397		>>> a
398		(('/',), ('p',), ('Folder (', 1, ')'), ('file (', 1, ')'), ('tar',), ('gz',))
399		>>>
400		>>> b = natsort_key_with_path_support('/p/Folder/file.tar.gz')
401		>>> b
402		(('/',), ('p',), ('Folder',), ('file',), ('tar',), ('gz',))
403		>>>
404		>>> a > b
405		True
406
407		Comparing Different Types on Python 3
408		+++++++++++++++++++++++++++++++++++++
409
410		`The second major special case I encountered was sorting of different types`_.
411		If you are on Python 2 (i.e. legacy Python), this mostly doesn't matter too
412		much since it uses an arbitrary heuristic to allow traditionally un-comparable
413		types to be compared (such as comparing ``'a'`` to ``1``). However, on Python 3
414		(i.e. Python) it simply won't let you perform such nonsense, raising a
415		:exc:`TypeError` instead.
416
417		You can imagine that a module that breaks strings into tuples of numbers and
418		strings is walking a dangerous line if it does not have special handling for
419		comparing numbers and strings. My imagination was not so great at first.
420		Let's take a look at all the ways this can fail with real-world data.
421
422		.. code-block:: python
423
424		>>> def natsort_key_with_poor_real_number_support(x):
425		...     split_input = re.split(signed_float, x)
426		...     split_input = filter(None, split_input) # removes null strings
427		...     return tuple(coerce_to_float(s) for s in split_input)
428		>>>
429		>>> sorted([5, '4'], key=natsort_key_with_poor_real_number_support)
430		Traceback (most recent call last):
431		...
432		TypeError: ...
433		>>>
434		>>> sorted(['12 apples', 'apples'], key=natsort_key_with_poor_real_number_support)
435		Traceback (most recent call last):
436		...
437		TypeError: ...
438		>>>
439		>>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_poor_real_number_support)
440		Traceback (most recent call last):
441		...
442		TypeError: ...
443
444		Let's break these down.
445
446		#. The integer ``5`` is sent to ``re.split`` which expects only strings
447		or bytes, which is a no-no.
448		#. ``natsort_key_with_poor_real_number_support('12 apples') < natsort_key_with_poor_real_number_support('apples')``
449		is the same as ``(12.0, ' apples') < ('apples',)``, and thus a number gets
450		compared to a string [#f2]_ which also is a no-no.
451		#. This one scores big on the astonishment scale, especially if one accidentally
452		uses signed integers or real numbers when they mean to use unsigned integers.
453		``natsort_key_with_poor_real_number_support('version5.3.0') < natsort_key_with_poor_real_number_support('version5.3rc1')``
454		is the same as ``('version', 5.3, 0.0) < ('version', 5.3, 'rc', 1.0)``, so in the
455		third element a number gets compared to a string, once again the same
456		old no-no. (The same would happen with ``'version5-3'`` and ``'version5-a'``,
457		which would become ``('version', 5, -3)`` and ``('version', 5, '-a')``).
458
459		As you might expect, the solution to the first issue is to wrap the ``re.split``
460		call in a ``try: except:`` block and handle the number specially if a
461		:exc:`TypeError` is raised. The second and third cases could be handled
462		in a "special case" manner, meaning only respond and do something different
463		if these problems are detected. But a less error-prone method is to ensure
464		that the data is correct-by-construction, and this can be done by ensuring
465		that the returned tuples always start with a string, and then alternate
466		in a string-number-string-number-string pattern; this can be achieved by
467		adding an empty string wherever the pattern is not followed [#f3]_. This ends
468		up working out pretty nicely because empty strings are always "less" than
469		any non-empty string, and we typically want numbers to come before strings.
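To make the idea concrete, here is a minimal pure-Python sketch of such a separator inserter. The name ``insert_separators`` is hypothetical and this is not the implementation that ships in :mod:`natsort`; it only assumes that numbers are ``int`` or ``float`` and everything else is a string.

```python
def insert_separators(items, sep=''):
    """Yield items, adding sep wherever two numbers would be adjacent
    or a number would come first, so the output always follows a
    string-number-string-number pattern."""
    # Pretend a number came before the first item so that a leading
    # number gets a separator placed in front of it.
    previous_was_number = True
    for item in items:
        is_number = isinstance(item, (int, float))
        if is_number and previous_was_number:
            yield sep
        yield item
        previous_was_number = is_number


print(list(insert_separators(['apples'])))           # ['apples']
print(list(insert_separators([12, ' apples'])))      # ['', 12, ' apples']
print(list(insert_separators(['version', 5, -3])))   # ['version', 5, '', -3]
```

Because every tuple produced this way has strings in the even positions and numbers in the odd positions, element-by-element comparison never pits a string against a number.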
470
471		Let's take a look at how this works out.
472
473		.. code-block:: python
474
475		>>> from natsort.utils import sep_inserter
476		>>> list(sep_inserter(iter(['apples']), ''))
477		['apples']
478		>>>
479		>>> list(sep_inserter(iter([12, ' apples']), ''))
480		['', 12, ' apples']
481		>>>
482		>>> list(sep_inserter(iter(['version', 5, -3]), ''))
483		['version', 5, '', -3]
484		>>>
485		>>> from natsort import natsort_keygen, ns
486		>>> natsort_key_with_good_real_number_support = natsort_keygen(alg=ns.REAL)
487		>>>
488		>>> sorted([5, '4'], key=natsort_key_with_good_real_number_support)
489		['4', 5]
490		>>>
491		>>> sorted(['12 apples', 'apples'], key=natsort_key_with_good_real_number_support)
492		['12 apples', 'apples']
493		>>>
494		>>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_good_real_number_support)
495		['version5.3.0', 'version5.3rc1']
496
497		How the "good" version works will be given in `TL;DR 2 - Handling Crappy, Real-World Input`_.
498
499		Handling NaN
500		++++++++++++
501
502		`A rather unexpected special case I encountered was sorting collections containing NaN`_.
503		Let's see what happens when you try to sort a plain old list of numbers when there
504		is a NaN floating around in there.
505
506		.. code-block:: python
507
508		>>> danger = [7, float('nan'), 22.7, 19, -14, 59.123, 4]
509		>>> sorted(danger)
510		[7, nan, -14, 4, 19, 22.7, 59.123]
511
512		Clearly that isn't correct, and for once it isn't my fault!
513		`It's hard to compare floating point numbers`_. By definition, NaN is unorderable
514		to any other number, and is never equal to any other number, including itself.
515
516		.. code-block:: python
517
518		>>> nan = float('nan')
519		>>> 5 > nan
520		False
521		>>> 5 < nan
522		False
523		>>> 5 == nan
524		False
525		>>> 5 != nan
526		True
527		>>> nan == nan
528		False
529		>>> nan != nan
530		True
531
532		The implication of all this for us is that if there is an NaN in the
533		data-set we are trying to sort, the data-set will end up being sorted in
534		two separate yet individually sorted sequences - the one before the NaN,
535		and the one after. This is because the ``<`` operation that is used
536		to sort always returns :const:`False` with NaN.
537
538		Because :mod:`natsort` aims to sort sequences in a way that does not surprise
539		the user, keeping this behavior is not acceptable (I don't require my users
540		to know how NaN will behave in a sorting algorithm). The simplest way to
541		satisfy the "least astonishment" principle is to substitute NaN with
542		some other value. But what value is least astonishing? I chose to replace
543		NaN with :math:`-\infty` so that these poorly behaved elements always
544		end up at the front where the users will most likely be alerted to their presence.
545
546		.. code-block:: python
547
548		>>> def fix_nan(x):
549		...     if x != x: # only true for NaN
550		...         return float('-inf')
551		...     else:
552		...         return x
553		...
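As a quick check, the same substitution can be used directly as a sort key on the plain list of numbers from earlier. This is only a sketch of the idea; the variable name ``result`` is mine.

```python
def fix_nan(x):
    # NaN is the only value that is not equal to itself.
    return float('-inf') if x != x else x


danger = [7, float('nan'), 22.7, 19, -14, 59.123, 4]

# The key is only used for ordering; the original elements are returned.
result = sorted(danger, key=fix_nan)
print(result)  # [nan, -14, 4, 7, 19, 22.7, 59.123]
```

The NaN now sits at the front and the remaining numbers form a single sorted sequence.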
554
555		Let's check out :ref:`TL;DR 2 <tldr2>` to see how this can be
556		incorporated into the simple key function from :ref:`TL;DR 1 <tldr1>`.
557
558		.. _tldr2:
559
560		TL;DR 2 - Handling Crappy, Real-World Input
561		+++++++++++++++++++++++++++++++++++++++++++
562
563		Let's see how our elegant key function from :ref:`TL;DR 1 <tldr1>` has
564		become bastardized in order to support handling mixed real-world data
565		and user customizations.
566
567		>>> def natsort_key(x, as_float=False, signed=False, as_path=False):
568		...     if as_float:
569		...         regex = signed_float if signed else unsigned_float
570		...     else:
571		...         regex = signed_int if signed else unsigned_int
572		...     try:
573		...         if as_path:
574		...             x = decompose_path_into_components(x) # Decomposes into list of strings
575		...         # If this raises a TypeError, input is not a string.
576		...         split_input = re.split(regex, x)
577		...     except TypeError:
578		...         try:
579		...             # Does this need to be applied recursively (list-of-list)?
580		...             return tuple(map(natsort_key, x))
581		...         except TypeError:
582		...             # Must be a number
583		...             ret = ('', fix_nan(x)) # Maintain string-number-string pattern
584		...             return (ret,) if as_path else ret # as_path returns tuple-of-tuples
585		...     else:
586		...         split_input = filter(None, split_input) # removes null strings
587		...         # Note that the coerce_to_int/coerce_to_float functions
588		...         # are also modified to use the fix_nan function.
589		...         if as_float:
590		...             coerced_input = (coerce_to_float(s) for s in split_input)
591		...         else:
592		...             coerced_input = (coerce_to_int(s) for s in split_input)
593		...         return tuple(sep_inserter(coerced_input, ''))
594		...
595
596		And this doesn't even show handling :class:`bytes` type! Notice that we have
597		to do non-obvious things like modify the return form of numbers when ``as_path``
598		is given, just to avoid comparing strings and numbers for the case in which a user provides
599		input like ``['/home/me', 42]``.
600
601		Let's take it out for a spin!
602
603		.. code-block:: python
604
605		>>> danger = [7, float('nan'), 22.7, '19', '-14', '59.123', 4]
606		>>> sorted(danger, key=lambda x: natsort_key(x, as_float=True, signed=True))
607		[nan, '-14', 4, 7, '19', 22.7, '59.123']
608		>>>
609		>>> paths = ['/p/Folder (1)/file.tar.gz',
610		... '/p/Folder/file.tar.gz',
611		... 123456]
612		>>> sorted(paths, key=lambda x: natsort_key(x, as_path=True))
613		[123456, '/p/Folder/file.tar.gz', '/p/Folder (1)/file.tar.gz']
614
615		Here Be Dragons: Adding Locale Support
616		--------------------------------------
617
618		.. contents::
619		:local:
620
621		Probably the most challenging special case I had to handle was getting
622		:mod:`natsort` to handle sorting the non-numerical parts of input
623		correctly, and also allowing it to sort the numerical bits in different
624		locales. This was in no way what I originally set out to do with this
625		library, so I was `caught a bit off guard when the request was initially made`_.
626		I discovered the :mod:`locale` library, and assumed that if it's part of Python's
627		StdLib there can't be too many dragons, right?
628
629		.. admonition:: INCOMPLETE LIST OF DRAGONS
630
631		- https://github.com/SethMMorton/natsort/issues/21
632		- https://github.com/SethMMorton/natsort/issues/22
633		- https://github.com/SethMMorton/natsort/issues/23
634		- https://github.com/SethMMorton/natsort/issues/36
635		- https://github.com/SethMMorton/natsort/issues/44
636		- https://bugs.python.org/issue2481
637		- https://bugs.python.org/issue23195
638		- https://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
639		- https://stackoverflow.com/questions/22203550/sort-dictionary-by-key-using-locale-collation
640		- https://stackoverflow.com/questions/33459384/unicode-character-not-in-range-when-calling-locale-strxfrm
641		- https://stackoverflow.com/questions/36431810/sort-numeric-lines-with-thousand-separators
642		- https://stackoverflow.com/questions/45734562/how-can-i-get-a-reasonable-string-sorting-with-python
643
644		These can be summed up as follows:
645
646		#. :mod:`locale` is a thin wrapper over your operating system's locale
647		library, so if that is broken (like it is on BSD and OSX) then
648		:mod:`locale` is broken in Python.
649		#. Because of a bug in legacy Python (i.e. Python 2), there is no uniform way to use
650		the :mod:`locale` sorting functionality between legacy Python and Python 3.
651		#. People have differing opinions of how capitalization should affect word order.
652		#. There is no built-in way to handle locale-dependent thousands separators
653		and decimal points robustly.
654		#. Proper handling of Unicode is complicated.
655		#. Proper handling of :mod:`locale` is complicated.
656
657		Easily over half of the code in :mod:`natsort` is in some way dealing with some
658		aspect of :mod:`locale` or basic case handling. It would have been
659		impossible to get right without a `really good`_ `testing strategy`_.
660
661		Don't expect any more TL;DR's... if you want to see how all this is fully
662		incorporated into the :mod:`natsort` algorithm then please take a look
663		`at the code`_. However, I will hint at how specific steps are taken in
664		each section.
665
666		Let's see how we can handle some of the dragons, one-by-one.
667
668		Basic Case Control Support
669		++++++++++++++++++++++++++
670
671		Without even thinking about the mess that is adding :mod:`locale` support,
672		:mod:`natsort` can introduce support for controlling how case is interpreted.
673
674		First, let's take a look at how it is sorted by default (due to
675		where characters lie on the `ASCII table`_).
676
677		.. code-block:: python
678
679		>>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
680		>>> sorted(a)
681		['Apple', 'Banana', 'Corn', 'apple', 'banana', 'corn']
682
683		All uppercase letters come before lowercase letters in the `ASCII table`_,
684		so all capitalized words appear first. Not everyone agrees that this
685		is the correct order. Some believe that the capitalized words should
686		be last (``['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']``).
687		Some believe that both the lowercase and uppercase versions
688		should appear together (``['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']``).
689		Some believe that both should be true ☹. Some people don't care at all [#f4]_.
690
691		Solving the first case (I call it LOWERCASEFIRST) is actually pretty
692		easy... just call the :meth:`str.swapcase` method on the input.
693
694		.. code-block:: python
695
696		>>> sorted(a, key=lambda x: x.swapcase())
697		['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']
698
699		The last (I call it IGNORECASE) should be super easy, right?
700		Simply call :meth:`str.lower` on the input. This will work but may
701		not always give the correct answer on non-latin character sets. It's
702		a good thing that in Python 3.3
703		:meth:`str.casefold` was introduced, which does a better job of removing
704		all case information from unicode characters in
705		non-latin alphabets.
706
707		.. code-block:: python
708
709		>>> def remove_case(x):
710		...     try:
711		...         return x.casefold()
712		...     except AttributeError: # Legacy Python backwards compatibility
713		...         return x.lower()
714		...
715		>>> sorted(a, key=remove_case)
716		['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn']
717
718		The middle case (I call it GROUPLETTERS) is less straightforward.
719		The most efficient way to handle this is to duplicate each character
720		with its lowercase version and then the original character.
721
722		.. code-block:: python
723
724		>>> import itertools
725		>>> def groupletters(x):
726		...     return ''.join(itertools.chain.from_iterable((remove_case(y), y) for y in x))
727		...
728		>>> groupletters('Apple')
729		'aAppppllee'
730		>>> groupletters('apple')
731		'aappppllee'
732		>>> sorted(a, key=groupletters)
733		['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']
734
735		The effect of this is that both ``'Apple'`` and ``'apple'`` are
736		placed adjacent to each other because their transformations both begin
737		with ``'a'``, and then the second character can be used to order them
738		appropriately with respect to each other.
739
740		There's a problem with this, though. Within the context of :mod:`natsort`
741		we are trying to correctly sort numbers and those should be left alone.
742
743		.. code-block:: python
744
745		>>> a = ['Apple5', 'apple', 'Apple4E10', 'Banana']
746		>>> sorted(a, key=lambda x: natsort_key(x, as_float=True))
747		['Apple5', 'Apple4E10', 'Banana', 'apple']
748		>>> sorted(a, key=lambda x: natsort_key(groupletters(x), as_float=True))
749		['Apple4E10', 'Apple5', 'apple', 'Banana']
750		>>> groupletters('Apple4E10')
751		'aAppppllee44eE1100'
752
753		We messed up the numbers! Looks like :func:`groupletters` needs to be applied
754		after the strings are broken into their components. I'm not going to show
755		how this is done here, but basically it requires applying the function in
756		the ``else:`` block of :func:`coerce_to_int`/:func:`coerce_to_float`.
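For the curious, a minimal sketch of that idea follows. It is not the :mod:`natsort` implementation; the names ``coerce_to_int_groupletters`` and ``groupletters_key`` are hypothetical, only unsigned integers are handled, and :meth:`str.casefold` is assumed to be available (Python 3.3+).

```python
import itertools
import re


def groupletters(x):
    # Pair each character with its case-folded version.
    return ''.join(itertools.chain.from_iterable((y.casefold(), y) for y in x))


def coerce_to_int_groupletters(x):
    # Same hybrid coercion as before, except that anything left
    # as a string is passed through groupletters().
    if x[0] in '0123456789+-':
        try:
            return int(x)
        except ValueError:
            return groupletters(x)
    else:
        return groupletters(x)


def groupletters_key(x):
    # Split first, then transform only the non-numeric pieces.
    pieces = filter(None, re.split(r'([0-9]+)', x))
    return tuple(coerce_to_int_groupletters(s) for s in pieces)


print(groupletters_key('Apple5'))  # ('aAppppllee', 5)
words = ['Apple5', 'apple', 'Apple40', 'Banana']
print(sorted(words, key=groupletters_key))  # ['Apple5', 'Apple40', 'apple', 'Banana']
```

The digits are converted before :func:`groupletters` ever sees them, so they are no longer doubled.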
757
758		.. code-block:: python
759
760		>>> better_groupletters = natsort_keygen(alg=ns.GROUPLETTERS | ns.REAL)
761		>>> better_groupletters('Apple4E10')
762		('aAppppllee', 40000000000.0)
763		>>> sorted(a, key=better_groupletters)
764		['Apple5', 'Apple4E10', 'apple', 'Banana']
765
766		Of course, applying both LOWERCASEFIRST and GROUPLETTERS is just
767		a matter of turning on both functions.
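On plain strings (no numbers involved), chaining the two transformations might look like the sketch below. The function name ``lowercasefirst_groupletters`` is an assumption of mine, and the order in which :mod:`natsort` applies the steps internally is not shown here.

```python
import itertools


def groupletters(x):
    # Pair each character with its case-folded version.
    return ''.join(itertools.chain.from_iterable((y.casefold(), y) for y in x))


def lowercasefirst_groupletters(x):
    # Swap the case first so lowercase sorts ahead of uppercase,
    # then group the letters so both cases stay together.
    return groupletters(x.swapcase())


a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
result = sorted(a, key=lowercasefirst_groupletters)
print(result)  # ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
```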
768
769		Basic Unicode Support
770		+++++++++++++++++++++
771
772		Unicode is hard and complicated. Here's an example.
773
774		.. code-block:: python
775
776		>>> b = [b'\x66', b'\x65', b'\xc3\xa9', b'\x65\xcc\x81', b'\x61', b'\x7a']
777		>>> a = [x.decode('utf8') for x in b]
778		>>> a # doctest: +SKIP
779		['f', 'e', 'é', 'é', 'a', 'z']
780		>>> sorted(a) # doctest: +SKIP
781		['a', 'e', 'é', 'f', 'z', 'é']
782
783
784		There is more than one way to represent the character 'é' in Unicode.
785		In fact, many characters have multiple representations. This is a challenge
786		because comparing the two representations would return ``False`` even though
787		they look the same.
788
789		.. code-block:: python
790
791		>>> a[2] == a[3]
792		False
793
794		Alas, since characters are compared based on the numerical value of their
795		representation, sorting Unicode often gives unexpected results (like seeing
796		'é' come both before and after 'z').
797
798		The original approach that :mod:`natsort` took with respect to non-ASCII
799		Unicode characters was to say "just use
800		the :mod:`locale` or :mod:`PyICU` library" and then cross its fingers
801		and hope those libraries take care of it. As you will find in the following
802		sections, that comes with its own baggage, and turned out to not always work anyway
803		(see https://stackoverflow.com/q/45734562/1399279). A more robust approach is to
804		handle the Unicode out-of-the-box without invoking a heavy-handed library
805		like :mod:`locale` or :mod:`PyICU`. To do this, we must use normalization.
806
807		To fully understand Unicode normalization, `check out some official Unicode documentation`_.
808		Just kidding... that's too much text. The following StackOverflow answers do
809		a good job at explaining Unicode normalization in simple terms:
810		https://stackoverflow.com/a/7934397/1399279 and
811		https://stackoverflow.com/a/7931547/1399279. Put simply, normalization
812		ensures that Unicode characters with multiple representations are in
813		some canonical and consistent representation so that (for example) comparisons
814		of the characters can be performed in a sane way. The following discussion
815		assumes you at least read the StackOverflow answers.
816
817		Looking back at our 'é' example, we can see that the two versions were
818		constructed with the byte strings ``b'\xc3\xa9'`` and ``b'\x65\xcc\x81'``.
819		The former representation is actually
820		`LATIN SMALL LETTER E WITH ACUTE <http://www.fileformat.info/info/unicode/char/e9/index.htm>`_
821		and is a single character in the Unicode standard. This is known as the
822		composed form and corresponds to the 'NFC' normalization scheme.
823		The latter representation is actually the letter 'e' followed by
824		`COMBINING ACUTE ACCENT <http://www.fileformat.info/info/unicode/char/0301/index.htm>`_
825		and so is two characters in the Unicode standard. This is known as the
826		decomposed form and corresponds to the 'NFD' normalization scheme.
827		Since the first character in the decomposed form is actually the letter 'e',
828		when compared to other ASCII characters it fits where you might expect.
829		Unfortunately, all Unicode composed form characters come after the
830		ASCII characters and so they always will be placed after 'z' when sorting.
831
832		It seems that most Unicode data is stored and shared in the composed form
833		which makes it challenging to sort. This can be solved by normalizing all
834		incoming Unicode data to the decomposed form ('NFD') and then sorting.
835
836		.. code-block:: python
837
838		>>> import unicodedata
839		>>> c = [unicodedata.normalize('NFD', x) for x in a]
840		>>> c # doctest: +SKIP
841		['f', 'e', 'é', 'é', 'a', 'z']
842		>>> sorted(c) # doctest: +SKIP
843		['a', 'e', 'é', 'é', 'f', 'z']
844
845		Huzzah! Sane sorting without having to resort to :mod:`locale`!
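If you would rather keep the original strings and only normalize for the purpose of ordering, the normalization can be done inside a key function. A minimal sketch, with the name ``nfd_key`` and the sample list being my own assumptions:

```python
import unicodedata


def nfd_key(x):
    # Normalize to 'NFD' only for comparison; the input is untouched.
    return unicodedata.normalize('NFD', x)


# '\u00e9' is the single-character form of 'é'.
words = ['z', '\u00e9', 'e', 'f']
result = sorted(words, key=nfd_key)

# Show the code points, since both forms of 'é' print identically.
print([ascii(w) for w in result])  # ["'e'", "'\\xe9'", "'f'", "'z'"]
```

The returned list still contains the single-character 'é'; only the keys used for comparison were normalized.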
846
847		Using Locale to Compare Strings
848		+++++++++++++++++++++++++++++++
849
850		The :mod:`locale` module is actually pretty cool, and provides lowly
851		spare-time programmers like myself a way to handle the daunting task
852		of proper locale-dependent support of their libraries and utilities.
853		Having said that, it can be a bit of a bear to get right,
854		`although they do point out in the documentation that it will be painful to use`_.
855		Aside from the caveats spelled out in that link, it turns out that just
856		comparing strings with :mod:`locale` in a cross-platform and
857		cross-python-version manner is not as straightforward as one might hope.
858
859		First, how to use :mod:`locale` to compare strings? It's actually
860		pretty straightforward. Simply run the input through the :mod:`locale`
861		transformation function :func:`locale.strxfrm`.
862
863		.. code-block:: python
864
865		>>> import locale, sys
866		>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
867		'en_US.UTF-8'
868		>>> a = ['a', 'b', 'ä']
869		>>> sorted(a)
870		['a', 'b', 'ä']
871		>>> # The below fails on OSX, so don't run doctest on darwin.
872		>>> is_osx = sys.platform == 'darwin'
873		>>> sorted(a, key=locale.strxfrm) if not is_osx else ['a', 'ä', 'b']
874		['a', 'ä', 'b']
875		>>>
876		>>> a = ['apple', 'Banana', 'banana', 'Apple']
877		>>> sorted(a, key=locale.strxfrm) if not is_osx else ['apple', 'Apple', 'banana', 'Banana']
878		['apple', 'Apple', 'banana', 'Banana']
879
880		It turns out that locale-aware sorting groups letters in the same
881		way as turning on GROUPLETTERS and LOWERCASEFIRST.
882		The trick is that you have to apply :func:`locale.strxfrm` only to non-numeric
883		characters; otherwise, numbers won't be parsed properly. Therefore, it must
884		be applied as part of the :func:`coerce_to_int`/:func:`coerce_to_float`
885		functions in a manner similar to :func:`groupletters`.
886
887		As you might have guessed, there is a small problem.
888		It turns out that there is a bug in the legacy Python implementation of
889		:func:`locale.strxfrm` that causes it to outright fail for :func:`unicode`
890		input (https://bugs.python.org/issue2481). :func:`locale.strcoll` works,
891		but is intended for use with ``cmp``, which does not exist in current Python
892		implementations. Luckily, the :func:`functools.cmp_to_key` function
893		makes :func:`locale.strcoll` behave like :func:`locale.strxfrm` (that is, of course,
894		unless you are on Python 2.6 where :func:`functools.cmp_to_key` doesn't exist,
895		in which case you simply copy-paste the implementation from Python 2.7
896		directly into your code ☹).
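A minimal sketch of the :func:`functools.cmp_to_key` approach described above, assuming a Python version where that function exists. The helper name ``locale_sorted`` is hypothetical and is not a :mod:`natsort` function.

```python
import functools
import locale

# cmp_to_key wraps the two-argument comparison function locale.strcoll
# so that it can be passed as a one-argument sort key.
strcoll_key = functools.cmp_to_key(locale.strcoll)


def locale_sorted(strings):
    """Sort strings using the collation rules of the current locale."""
    return sorted(strings, key=strcoll_key)
```

The ordering of non-ASCII or mixed-case input depends on whichever locale has been activated with :func:`locale.setlocale`, so results will differ between systems.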
897
898		Handling Broken Locale On OSX
899		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
900
901		But what if the underlying locale implementation that :mod:`locale`
902		relies upon is simply broken? It turns out that the locale library on
903		OSX (and other BSD systems) is broken (and for some reason has never been
904		fixed?), and so :mod:`locale` does not work as expected.
905
906		How do I define "doesn't work as expected"?
907
908		.. code-block:: python
909
910		>>> a = ['apple', 'Banana', 'banana', 'Apple']
911		>>> sorted(a)
912		['Apple', 'Banana', 'apple', 'banana']
913		>>>
914		>>> sorted(a, key=locale.strxfrm) if is_osx else sorted(a)
915		['Apple', 'Banana', 'apple', 'banana']
916
917		IT'S SORTING AS IF :func:`locale.strxfrm` WAS NEVER USED!! (and it's worse
918		once non-ASCII characters get thrown into the mix.) I'm really not
919		sure why this is considered OK for the OSX/BSD maintainers to not fix,
920		but it's more than frustrating for poor developers who have been dragged
921		into the locale game kicking and screaming. <deep breath>.
922
923		So, how to deal with this situation? There are two ways to do so.
924
925		#. Detect if :mod:`locale` is sorting incorrectly (i.e. ``dumb``) by seeing
926		if ``'A'`` is sorted before ``'a'`` (incorrect) or not.
927
928		.. code-block:: python
929
930		>>> # This is genuinely the name of this function.
931		>>> # See natsort.compat.locale.py
932		>>> def dumb_sort():
933		...     return locale.strxfrm('A') < locale.strxfrm('a')
934		...
935
936		If a ``dumb`` locale implementation is found, then automatically
937		turn on LOWERCASEFIRST and GROUPLETTERS.
938		#. Use an alternate library if installed. `ICU <http://site.icu-project.org/>`_
939		is a great and powerful library that has a pretty decent Python port
940		called (you guessed it) `PyICU <https://pypi.python.org/pypi/PyICU/>`_.
941		If a user has this library installed on their computer, :mod:`natsort`
942		chooses to use that instead of :mod:`locale`. With a little bit of
943		planning, one can write a set of wrapper functions that call
944		the correct library under the hood such that the business logic never
945		has to know what library is being used (see `natsort.compat.locale.py`_).
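The wrapper idea from the second option can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in ``natsort.compat.locale``: the function name ``pick_string_transform`` is hypothetical, and the sketch simply uses the default ICU locale when PyICU is importable.

```python
import locale

try:
    import icu  # PyICU is optional; fall back to locale if it is missing.
except ImportError:
    icu = None


def pick_string_transform():
    """Return a one-argument key function for locale-aware sorting."""
    if icu is not None:
        # Assumption: the default ICU locale is acceptable here.
        collator = icu.Collator.createInstance(icu.Locale())
        return collator.getSortKey
    return locale.strxfrm


# Calling code never needs to know which library was selected.
transform = pick_string_transform()
fruits = sorted(['banana', 'apple', 'cherry'], key=transform)
```

Because the choice is made once inside the wrapper, the business logic only ever sees a key function.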
946
947		Let me tell you, this little complication really makes a challenge of testing
948		the code, since one must set up different environments on different operating
949		systems in order to test all possible code paths. Not to mention that
950		certain checks will fail for certain operating systems and environments
951		so one must be diligent in either writing the tests not to fail, or ignoring
952		those tests when on offending environments.
953
954		Handling Locale-Aware Numbers
955		+++++++++++++++++++++++++++++
956
957		`Thousands separator support`_ is a problem that I knew would someday be
958		requested but had decided to push off until a rainy day. One day it finally
959		rained, and I decided to tackle the problem.
960
961		So what is the problem? Consider the number ``1,234,567`` (assuming the
962		``','`` is the thousands separator). Try to run that through :func:`int`
963		and you will get a :exc:`ValueError`. To handle this properly the thousands
964		separators must be removed.
965
966		.. code-block:: python
967
968		>>> float('1,234,567'.replace(',', ''))
969		1234567.0
970
971		What if, in our current locale, the thousands separator is ``'.'`` and
972		the ``','`` is the decimal separator (like for the German locale de_DE)?
973
974		.. code-block:: python
975
976		>>> float('1.234.567'.replace('.', '').replace(',', '.'))
977		1234567.0
978		>>> float('1.234.567,89'.replace('.', '').replace(',', '.'))
979		1234567.89
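The two snippets above can be generalized into one helper, sketched here under the assumption that the input string contains nothing but a number. The name ``parse_localized_float`` is hypothetical; in real code the separators could be read from the ``'thousands_sep'`` and ``'decimal_point'`` entries of :func:`locale.localeconv`.

```python
def parse_localized_float(text, thousands_sep, decimal_sep):
    """Convert a string holding only a localized number into a float."""
    if thousands_sep:
        # Drop every thousands separator.
        text = text.replace(thousands_sep, "")
    if decimal_sep != ".":
        # float() only understands '.' as the decimal separator.
        text = text.replace(decimal_sep, ".")
    return float(text)


english = parse_localized_float("1,234,567", ",", ".")
german = parse_localized_float("1.234.567,89", ".", ",")
```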
980
981		This is pretty much what :func:`locale.atoi` and :func:`locale.atof` do
982		under the hood. So what's the problem? Why doesn't :mod:`natsort` just
983		use this method under its hood?
984		Well, let's take a look at what would happen if we send some possible
985		:mod:`natsort` input through the above function:
986
987		.. code-block:: python
988
989		>>> natsort_key('1,234 apples, please.'.replace(',', ''))
990		('', 1234, ' apples please.')
991		>>> natsort_key('Sir, €1.234,50 please.'.replace('.', '').replace(',', '.'), as_float=True)
992		('Sir. €', 1234.5, ' please')
993
994		Any character matching the thousands separator was dropped, and anything
995		matching the decimal separator was changed to ``'.'``! If these characters
996		were critical to how your data was ordered, this would break :mod:`natsort`.
997
998		The first solution one might consider would be to first decompose the
999		input into sub-components (like we did for the GROUPLETTERS method
1000		above) and then only apply these transformations on the number components.
1001		This is a chicken-and-egg problem, though, because *we cannot appropriately
1002		separate out the numbers because of the thousands separators and
1003		non-'.' decimal separators* (well, at least not without making multiple
1004		passes over the data which I do not consider to be a valid option).
1005
1006		Regular expressions to the rescue! With regular expressions, we can
1007		remove the thousands separators and change the decimal separator only
1008		when they are actually within a number. Once the input has been
1009		pre-processed with this regular expression, all the infrastructure
1010		shown previously will work.
1011
1012		Beware, these regular expressions will make your eyes bleed.
1013
1014		.. code-block:: python
1015
1016		>>> decimal = ',' # Assume German locale, so decimal separator is ','
1017		>>> # Look-behind assertions cannot accept range modifiers, so instead of e.g.
1018		>>> # (?<!\.[0-9]{1,3}) I have to repeat the look-behind for 1, 2, and 3.
1019		>>> nodecimal = r'(?<!{dec}[0-9])(?<!{dec}[0-9]{{2}})(?<!{dec}[0-9]{{3}})'.format(dec=decimal)
1020		>>> strip_thousands = r'''
1021		... (?<=[0-9]{{1}}) # At least 1 number
1022		... (?<![0-9]{{4}}) # No more than 3 numbers
1023		... {nodecimal} # Cannot follow decimal
1024		... {thou} # The thousands separator
1025		... (?=[0-9]{{3}} # Three numbers must follow
1026		... ([^0-9]|$) # But a non-number after that
1027		... )
1028		... '''.format(nodecimal=nodecimal, thou='.') # Thousands separator is '.' in German locale.
1029		...
1030		>>> re.sub(strip_thousands, '', 'Sir, €1.234,50 please.', flags=re.X)
1031		'Sir, €1234,50 please.'
1032		>>>
1033		>>> # The decimal separator must be preceded or followed by
1034		>>> # a digit. This substitution only needs to be performed in the
1035		>>> # case when the decimal separator for the locale is not '.'.
1036		>>> switch_decimal = r'(?<=[0-9]){decimal}|{decimal}(?=[0-9])'
1037		>>> switch_decimal = switch_decimal.format(decimal=decimal)
1038		>>> re.sub(switch_decimal, '.', 'Sir, €1234,50 please.', flags=re.X)
1039		'Sir, €1234.50 please.'
1040		>>>
1041		>>> natsort_key('Sir, €1234.50 please.', as_float=True)
1042		('Sir, €', 1234.5, ' please.')
1043
1044		Final Thoughts
1045		--------------
1046
1047		My hope is that users of :mod:`natsort` never have to think about or worry
1048		about all the bookkeeping or any of the details described above, and that using
1049		:mod:`natsort` seems to magically "just work". For those of you who
1050		took the time to read this engineering description, I hope it has enlightened
1051		you to some of the issues that can be encountered when code is released
1052		into the wild and has to accept "real-world data", or to what happens
1053		to developers who naïvely make bold assumptions that are counter to
1054		what the rest of the world assumes.
1055
1056		.. rubric:: Footnotes
1057
1058		.. [#f1]
1059		To anyone looking through the actual code, you will note that I don't
1060		actually use :mod:`pathlib` to split the paths... I wrote my own version
1061		to avoid adding an external dependency of :mod:`pathlib` on Python < 3.4.
1062		.. [#f2]
1063		*"But if you hadn't removed the leading empty string from re.split this
1064		wouldn't have happened!!"* I can hear you saying. Well, that's true. I don't
1065		have a great reason for having done that except that in an earlier
1066		non-optimal incarnation of the algorithm I needed to do it, and it kind of
1067		stuck, and it made other parts of the code easier if the assumption that
1068		there were no empty strings was valid.
1069		.. [#f3]
1070		I'm not going to show how this is implemented in this document,
1071		but if you are interested you can look at the code to
1072		:func:`sep_inserter` in `util.py`_.
1073		.. [#f4]
1074		Handling each of these is straightforward, but coupled with the rapidly
1075		fracturing execution paths presented in :ref:`TL;DR 2 <tldr2>` one can imagine
1076		this will get out of hand quickly. If you take a look at `natsort.py`_ and
1077		`util.py`_ you can observe that to avoid this I take a more functional approach
1078		to constructing the :mod:`natsort` algorithm as opposed to the procedural approach
1079		illustrated in :ref:`TL;DR 1 <tldr1>` and :ref:`TL;DR 2 <tldr2>`.
1080
1081		.. _ASCII table: http://www.asciitable.com/
1082		.. _getting sorting right is surprisingly hard: http://www.compciv.org/guides/python/fundamentals/sorting-collections-with-sorted/
1083		.. _This astonished: https://github.com/SethMMorton/natsort/issues/19
1084		.. _a lot: http://stackoverflow.com/questions/29548742/python-natsort-sort-strings-recursively
1085		.. _of people: http://stackoverflow.com/questions/24045348/sort-set-of-numbers-in-the-form-xx-yy-in-python
1086		.. _and some people aren't very nice when they are astonished:
1087		https://github.com/xolox/python-naturalsort/blob/ed3e6b6ffaca3bdea3b76e08acbb8bd2a5fee463/README.rst#why-another-natsort-module
1088		.. _fastnumbers: https://github.com/SethMMorton/fastnumbers
1089		.. _as part of my testing: https://github.com/SethMMorton/natsort/blob/master/test_natsort/slow_splitters.py
1090		.. _this one for coercion: http://stackoverflow.com/questions/736043/checking-if-a-string-can-be-converted-to-float-in-python
1091		.. _this one for checking: http://stackoverflow.com/questions/354038/how-do-i-check-if-a-string-is-a-number-float-in-python
1092		.. _most natural sort solutions for python on Stack Overflow: http://stackoverflow.com/q/4836710/1399279
1093		.. _80%/20%: https://en.wikipedia.org/wiki/Pareto_principle
1094		.. _The first major special case I encountered was sorting filesystem paths: https://github.com/SethMMorton/natsort/issues/3
1095		.. _The second major special case I encountered was sorting of different types: https://github.com/SethMMorton/natsort/issues/7
1096		.. _A rather unexpected special case I encountered was sorting collections containing NaN:
1097		https://github.com/SethMMorton/natsort/issues/27
1098		.. _Path.parts: https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.parts
1099		.. _Path.suffixes: https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.suffixes
1100		.. _Path.stem: https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.stem
1101		.. _It's hard to compare floating point numbers: http://www.drdobbs.com/cpp/its-hard-to-compare-floating-point-numbe/240149806
1102		.. _caught a bit off guard when the request was initially made: https://github.com/SethMMorton/natsort/issues/14
1103		.. _at the code: https://github.com/SethMMorton/natsort/tree/master/natsort
1104		.. _natsort.py: https://github.com/SethMMorton/natsort/blob/master/natsort/natsort.py
1105		.. _util.py: https://github.com/SethMMorton/natsort/blob/master/natsort/util.py
1106		.. _although they do point out in the documentation that it will be painful to use:
1107		https://docs.python.org/3/library/locale.html#background-details-hints-tips-and-caveats
1108		.. _natsort.compat.locale.py: https://github.com/SethMMorton/natsort/blob/master/natsort/compat/locale.py
1109		.. _Thousands separator support: https://github.com/SethMMorton/natsort/issues/36
1110		.. _really good: https://hypothesis.readthedocs.io/en/latest/
1111		.. _testing strategy: http://doc.pytest.org/en/latest/
1112		.. _check out some official Unicode documentation: http://unicode.org/reports/tr15/

-8

~~docs/source/humansorted.rst~~

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:func:`~natsort.humansorted`
4		============================
5
6		.. autofunction:: humansorted
7

-28

~~docs/source/index.rst~~

0		.. natsort documentation master file, created by
1		sphinx-quickstart on Thu Jul 17 21:01:29 2014.
2		You can adapt this file completely to your liking, but it should at least
3		contain the root `toctree` directive.
4
5		natsort: Simple yet flexible natural sorting in Python.
6		=======================================================
7
8		Contents:
9
10		.. toctree::
11		:maxdepth: 2
12		:numbered:
13
14		intro.rst
15		howitworks.rst
16		examples.rst
17		api.rst
18		shell.rst
19		changelog.rst
20
21		Indices and tables
22		==================
23
24		* :ref:`genindex`
25		* :ref:`modindex`
26		* :ref:`search`
27

-8

~~docs/source/index_humansorted.rst~~

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:func:`~natsort.index_humansorted`
4		==================================
5
6		.. autofunction:: index_humansorted
7

-8

~~docs/source/index_natsorted.rst~~

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:func:`~natsort.index_natsorted`
4		================================
5
6		.. autofunction:: index_natsorted
7

-8

~~docs/source/index_realsorted.rst~~

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:func:`~natsort.index_realsorted`
4		=================================
5
6		.. autofunction:: index_realsorted
7

-8

~~docs/source/index_versorted.rst~~

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:func:`~natsort.index_versorted`
4		================================
5
6		.. autofunction:: index_versorted
7

-397

~~docs/source/intro.rst~~

0		.. default-domain:: py
1		.. module:: natsort
2
3		The :mod:`natsort` module
4		=========================
5
6		Simple yet flexible natural sorting in Python.
7
8		- Source Code: https://github.com/SethMMorton/natsort
9		- Downloads: https://pypi.org/project/natsort/
10		- Documentation: http://natsort.readthedocs.io/
11		- Optional Dependencies:
12
13		- `fastnumbers <https://pypi.org/project/fastnumbers>`_ >= 2.0.0
14		- `PyICU <https://pypi.org/project/PyICU>`_ >= 1.0.0
15
16		:mod:`natsort` is a general utility for sorting lists naturally; the meaning
17		of "naturally" is not well-defined, but the most common definition is that numbers
18		contained within the string should be sorted as numbers and not as you would
19		other characters. If you need to present sorted output to a user, you probably
20		want to sort it naturally.
21
22		:mod:`natsort` was initially created for sorting scientific output filenames that
23		contained signed floating point numbers in the names. There was a lack of
24		algorithms out there that could perform a natural sort on `floats` but
25		plenty for `ints`; check out
26		`this StackOverflow question <http://stackoverflow.com/q/4836710/1399279>`_
27		and its answers and links therein,
28		`this ActiveState forum <http://code.activestate.com/recipes/285264-natural-string-sorting/>`_,
29		and of course `this great article on natural sorting <http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/>`_
30		from CodingHorror.com for examples of what I mean.
31		:mod:`natsort` was created to fill in this gap, but has since expanded to handle
32		just about any definition of a number, as well as other sorting customizations.
33
34		Quick Description
35		-----------------
36
37		When you try to sort a list of strings that contain numbers, the normal python
38		sort algorithm sorts lexicographically, so you might not get the results that you
39		expect:
40
41		.. code-block:: python
42
43		>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
44		>>> sorted(a)
45		['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in']
46
47		Notice that it has the order ('1', '10', '2') - this is because the list is
48		being sorted in lexicographical order, which sorts numbers like you would
49		letters (i.e. 'b', 'ba', 'c').
50
51		:mod:`natsort` provides a function :func:`~natsorted` that helps sort lists
52		"naturally" ("naturally" is rather ill-defined, but in general it means
53		sorting based on meaning and not computer code point).
54		Using :func:`~natsorted` is simple:
55
56		.. code-block:: python
57
58		>>> from natsort import natsorted
59		>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
60		>>> natsorted(a)
61		['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
62
63		:func:`~natsorted` identifies numbers anywhere in a string and sorts them
64		naturally. Below are some other things you can do with :mod:`natsort`
65		(please see the :ref:`examples` for a quick start guide, or the :ref:`api`
66		for more details).
67
68		.. note::
69
70		:func:`~natsorted` is designed to be a drop-in replacement for the built-in
71		:func:`sorted` function. Like :func:`sorted`, :func:`~natsorted`
72		`does not sort in-place`. To sort a list and assign the output to the
73		same variable, you must explicitly assign the output to a variable:
74
75		.. code-block:: python
76
77		>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
78		>>> natsorted(a)
79		['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
80		>>> print(a) # 'a' was not sorted; "natsorted" simply returned a sorted list
81		['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
82		>>> a = natsorted(a) # Now 'a' will be sorted because the sorted list was assigned to 'a'
83		>>> print(a)
84		['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
85
86		Please see `Generating a Reusable Sorting Key and Sorting In-Place`_ for
87		an alternate way to sort in-place naturally.
88
89		Examples
90		--------
91
92		Sorting Versions
93		++++++++++++++++
94
95		This is handled properly by default (as of :mod:`natsort` version >= 4.0.0):
96
97		.. code-block:: python
98
99		>>> a = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
100		>>> natsorted(a)
101		['version-1.9', 'version-1.10', 'version-1.11', 'version-2.0']
102
103		If you need to sort release candidates, please see :ref:`rc_sorting` for
104		a useful hack.
105
106		Sorting by Real Numbers (i.e. Signed Floats)
107		++++++++++++++++++++++++++++++++++++++++++++
108
109		This is useful in scientific data analysis and was
110		the default behavior of :func:`~natsorted` for :mod:`natsort`
111		version < 4.0.0. Use the :func:`~realsorted` function:
112
113		.. code-block:: python
114
115		>>> from natsort import realsorted, ns
116		>>> # Note that when interpreting as signed floats, the below numbers are
117		>>> # +5.10, -3.00, +5.30, +2.00
118		>>> a = ['position5.10.data', 'position-3.data', 'position5.3.data', 'position2.data']
119		>>> natsorted(a)
120		['position2.data', 'position5.3.data', 'position5.10.data', 'position-3.data']
121		>>> natsorted(a, alg=ns.REAL)
122		['position-3.data', 'position2.data', 'position5.10.data', 'position5.3.data']
123		>>> realsorted(a) # shortcut for natsorted with alg=ns.REAL
124		['position-3.data', 'position2.data', 'position5.10.data', 'position5.3.data']
125
126		Locale-Aware Sorting (or "Human Sorting")
127		+++++++++++++++++++++++++++++++++++++++++
128
129		This is where the non-numeric characters are ordered based on their meaning,
130		not on their ordinal value, and a locale-dependent thousands separator and decimal
131		separator is accounted for in the number.
132		This can be achieved with the :func:`~humansorted` function:
133
134		.. code-block:: python
135
136		>>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
137		>>> natsorted(a)
138		['Apple', 'Banana', 'apple14,689', 'apple15', 'banana']
139		>>> import locale
140		>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
141		'en_US.UTF-8'
142		>>> natsorted(a, alg=ns.LOCALE)
143		['apple15', 'apple14,689', 'Apple', 'banana', 'Banana']
144		>>> from natsort import humansorted
145		>>> humansorted(a)
146		['apple15', 'apple14,689', 'Apple', 'banana', 'Banana']
147
148		You may find you need to explicitly set the locale to get this to work
149		(as shown in the example).
150		Please see :ref:`locale_issues` and the Installation section
151		below before using the :func:`~humansorted` function.
152
153		Further Customizing Natsort
154		+++++++++++++++++++++++++++
155
156		If you need to combine multiple algorithm modifiers (such as ``ns.REAL``,
157		``ns.LOCALE``, and ``ns.IGNORECASE``), you can combine the options using the
158		bitwise OR operator (``|``). For example,
159
160		.. code-block:: python
161
162		>>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
163		>>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE)
164		['Apple', 'apple15', 'apple14,689', 'Banana', 'banana']
165		>>> # The ns enum provides long and short forms for each option.
166		>>> ns.LOCALE == ns.L
167		True
168		>>> # You can also customize the convenience functions, too.
169		>>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE) == realsorted(a, alg=ns.L | ns.IC)
170		True
171		>>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE) == humansorted(a, alg=ns.R | ns.IC)
172		True
173
174		All of the available customizations can be found in the documentation for
175		the :class:`~natsort.ns` enum.
176
177		You can also add your own custom transformation functions with the ``key`` argument.
178		These can be used with ``alg`` if you wish:
179
180		.. code-block:: python
181
182		>>> a = ['apple2.50', '2.3apple']
183		>>> natsorted(a, key=lambda x: x.replace('apple', ''), alg=ns.REAL)
184		['2.3apple', 'apple2.50']
185
186		Sorting Mixed Types
187		+++++++++++++++++++
188
189		You can mix and match ``int``, ``float``, and ``str`` (or ``unicode``) types
190		when you sort:
191
192		.. code-block:: python
193
194		>>> a = ['4.5', 6, 2.0, '5', 'a']
195		>>> natsorted(a)
196		[2.0, '4.5', '5', 6, 'a']
197		>>> # On Python 2, sorted(a) would return [2.0, 6, '4.5', '5', 'a']
198		>>> # On Python 3, sorted(a) would raise an "unorderable types" TypeError
199
200		Handling Bytes on Python 3
201		++++++++++++++++++++++++++
202
203		:mod:`natsort` does not officially support the `bytes` type on Python 3, but
204		convenience functions are provided that help you decode to `str` first:
205
206		.. code-block:: python
207
208		>>> from natsort import as_utf8
209		>>> a = [b'a', 14.0, 'b']
210		>>> # On Python 2, natsorted(a) would work as expected.
211		>>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
212		>>> natsorted(a, key=as_utf8) == [14.0, b'a', 'b']
213		True
214		>>> a = [b'a56', b'a5', b'a6', b'a40']
215		>>> # On Python 2, natsorted(a) would work as expected.
216		>>> # On Python 3, natsorted(a) would return the same results as sorted(a)
217		>>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
218		True
219
220		Generating a Reusable Sorting Key and Sorting In-Place
221		++++++++++++++++++++++++++++++++++++++++++++++++++++++
222
223		Under the hood, :func:`~natsorted` works by generating a custom sorting
224		key using :func:`~natsort_keygen` and then passes that to the built-in
225		:func:`sorted`. You can use the :func:`~natsort_keygen` function yourself to
226		generate a custom sorting key to sort in-place using the :meth:`list.sort`
227		method.
228
229		.. code-block:: python
230
231		>>> from natsort import natsort_keygen
232		>>> natsort_key = natsort_keygen()
233		>>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
234		>>> natsorted(a) == sorted(a, key=natsort_key)
235		True
236		>>> a.sort(key=natsort_key)
237		>>> a
238		['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
239
240		All of the algorithm customizations mentioned in the `Further Customizing Natsort`_
241		section can also be applied to :func:`~natsort_keygen` through the alg keyword option.
242
243		Other Useful Things
244		+++++++++++++++++++
245
246		- recursively descend into lists of lists
247		- automatic unicode normalization of input data
248		- controlling the case-sensitivity (see :ref:`case_sort`)
249		- sorting file paths correctly (see :ref:`path_sort`)
250		- allow custom sorting keys (see :ref:`custom_sort`)
251
252		FAQ
253		---
254
255		How do I debug :func:`~natsorted`?
256		The best way to debug :func:`~natsorted` is to generate a key using :func:`~natsort_keygen`
257		with the same options being passed to :func:`~natsorted`. One can take a look at
258		exactly what is being done with their input using this key - it is highly recommended
259		to `look at this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_,
260		and also to review the
261		`How Does Natsort Work? <http://natsort.readthedocs.io/en/master/howitworks.html>`_
262		page for why :mod:`natsort` is doing that to your data.
263
264		If you are trying to sort custom classes and running into trouble, please take a look at
265		https://github.com/SethMMorton/natsort/issues/60. In short,
266		custom classes are not likely to be sorted correctly if one relies
267		on the behavior of ``__lt__`` and the other rich comparison operators in their
268		custom class - it is better to use a ``key`` function with :mod:`natsort`, or
269		use the :mod:`natsort` key as part of your rich comparison operator definition.
270
271		How does :mod:`natsort` work?
272		If you don't want to read `How Does Natsort Work? <http://natsort.readthedocs.io/en/master/howitworks.html>`_,
273		here is a quick primer.
274
275		:mod:`natsort` provides a `key function <https://docs.python.org/3/howto/sorting.html#key-functions>`_
276		that can be passed to `list.sort() <https://docs.python.org/3/library/stdtypes.html#list.sort>`_
277		or `sorted() <https://docs.python.org/3/library/functions.html#sorted>`_ in order to
278		modify the default sorting behavior. This key is generated on-demand with the
279		key generator :func:`natsort.natsort_keygen`. :func:`natsort.natsorted` is essentially
280		a wrapper for the following code:
281
282		.. code-block:: python
283
284		>>> from natsort import natsort_keygen
285		>>> natsort_key = natsort_keygen()
286		>>> sorted(['1', '10', '2'], key=natsort_key)
287		['1', '2', '10']
288
289		Users can further customize :mod:`natsort` sorting behavior with the ``key``
290		and/or ``alg`` options (see details in the `Further Customizing Natsort`_
291		section).
292
293		The key generated by :func:`natsort.natsort_keygen` always returns a :class:`tuple`. It
294		does so in the following way (some details omitted for clarity):
295
296		1. Assume the input is a string, and attempt to split it into numbers and
297		non-numbers using regular expressions. Numbers are then converted into
298		either :class:`int` or :class:`float`.
299		2. If the above fails because the input is not a string, assume the input
300		is some other sequence (e.g. :class:`list` or :class:`tuple`), and recursively
301		apply the key to each element of the sequence.
302		3. If the above fails because the input is not iterable, assume the input
303		is an :class:`int` or :class:`float`, and just return the input in a :class:`tuple`.
304
305		Because a :class:`tuple` is always returned, a :exc:`TypeError` should not be common
306		unless one tries to do something odd like sort an :class:`int` against a :class:`list`.
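The three steps above can be sketched in a few lines. This is a deliberately simplified illustration, not the :mod:`natsort` implementation: the name ``simple_natural_key`` is hypothetical, only unsigned integers are recognized, and none of the ``alg`` options or the extra bookkeeping that lets strings and numbers be compared against each other is included.

```python
import re

# Capturing group so that re.split keeps the numbers in the result.
_INT_SPLIT = re.compile(r"(\d+)")


def simple_natural_key(value):
    """Return a tuple key following the three steps described above."""
    try:
        # Step 1: assume a string and split it into numbers and non-numbers.
        parts = _INT_SPLIT.split(value)
    except TypeError:
        try:
            # Step 2: assume some other sequence and recurse into it.
            return tuple(simple_natural_key(item) for item in value)
        except TypeError:
            # Step 3: not iterable, so assume a number and wrap it.
            return (value,)
    # Odd positions hold the captured digits; drop the empty strings.
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts) if p)
```

For example, ``sorted(['1', '10', '2'], key=simple_natural_key)`` returns ``['1', '2', '10']`` because the keys are ``(1,)``, ``(10,)`` and ``(2,)``.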
307
308		:mod:`natsort` gave me results I didn't expect, and it's a terrible library!
309		Did you try to debug using the above advice? If so, and you still cannot figure out
310		the error, then please `file an issue <https://github.com/SethMMorton/natsort/issues/new>`_.
311
312		Shell script
313		------------
314
315		:mod:`natsort` comes with a shell script called :mod:`natsort`, or can also be called
316		from the command line with ``python -m natsort``.
317
318		Requirements
319		------------
320
321		:mod:`natsort` requires Python version 2.6 or greater or Python 3.3 or greater.
322		It may run on (but is not tested against) Python 3.2.
323
324		Optional Dependencies
325		---------------------
326
327		fastnumbers
328		+++++++++++
329
330		The most efficient sorting can occur if you install the
331		`fastnumbers <https://pypi.org/project/fastnumbers>`_ package
332		(version >=2.0.0); it helps with the string to number conversions.
333		:mod:`natsort` will still run (efficiently) without the package, but if you need
334		to squeeze out that extra juice it is recommended you include this as a dependency.
335		:mod:`natsort` will not require (or check) that
336		`fastnumbers <https://pypi.org/project/fastnumbers>`_ is installed
337		at installation.
338
339		PyICU
340		+++++
341
342		It is recommended that you install `PyICU <https://pypi.org/project/PyICU>`_
343		if you wish to sort in a locale-dependent manner, see
344		http://natsort.readthedocs.io/en/master/locale_issues.html for an explanation why.
345
346		Installation
347		------------
348
349		Use ``pip``!
350
351		.. code-block:: sh
352
353		$ pip install natsort
354
355		If you want to install the `Optional Dependencies`_, you can use the
356		`"extras" notation <https://packaging.python.org/tutorials/installing-packages/#installing-setuptools-extras>`_
357		at installation time to install those dependencies as well - use ``fast`` for
358		`fastnumbers <https://pypi.org/project/fastnumbers>`_ and ``icu`` for
359		`PyICU <https://pypi.org/project/PyICU>`_.
360
361		.. code-block:: sh
362
363		# Install both optional dependencies.
364		$ pip install natsort[fast,icu]
365		# Install just fastnumbers
366		$ pip install natsort[fast]
367
368		How to Run Tests
369		----------------
370
371		Please note that :mod:`natsort` is NOT set-up to support ``python setup.py test``.
372
373		The recommended way to run tests is with `tox <https://tox.readthedocs.io/en/latest/>`_.
374		After installing ``tox``, running tests is as simple as executing the following in the
375		``natsort`` directory:
376
377		.. code-block:: sh
378
379		$ tox
380
381		``tox`` will create a virtual environment for your tests and install all the
382		needed testing requirements for you. You can specify a particular python version
383		with the ``-e`` flag, e.g. ``tox -e py36``.
384
385		If you do not wish to use ``tox``, you can install the testing dependencies and run the
386		tests manually using `pytest <https://docs.pytest.org/en/latest/>`_ - ``natsort``
387		contains a ``Pipfile`` for use with `pipenv <https://github.com/pypa/pipenv>`_ that
388		makes it easy for you to install the testing dependencies:
389
390		.. code-block:: sh
391
392		$ pipenv install --skip-lock --dev
393		$ pipenv run python -m pytest
394
395		Note that above I invoked ``python -m pytest`` instead of just ``pytest`` - this is because
396		`the former puts the CWD on sys.path <https://docs.pytest.org/en/latest/usage.html#calling-pytest-through-python-m-pytest>`_.

-96

~~docs/source/locale_issues.rst~~ less more

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		.. _locale_issues:
4
5		Possible Issues with :func:`~natsort.humansorted` or ``ns.LOCALE``
6		==================================================================
7
8		Being Locale-Aware Means Both Numbers and Non-Numbers
9		-----------------------------------------------------
10
11		In addition to modifying how characters are sorted, ``ns.LOCALE`` will take into
12		account locale-dependent thousands separators (and locale-dependent decimal
13		separators if ``ns.FLOAT`` is enabled). This means that if you are in a
14		locale that uses commas as the thousands separator, a number like
15		``123,456`` will be interpreted as ``123456``. If this is not what you want,
16		you may consider using ``ns.LOCALEALPHA`` which will only enable locale-aware
17		sorting for non-numbers (similarly, ``ns.LOCALENUM`` enables locale-aware
18		sorting only for numbers).
19
20		Regenerate Key With :func:`~natsort.natsort_keygen` After Changing Locale
21		-------------------------------------------------------------------------
22
23		When :func:`~natsort.natsort_keygen` is called it returns a key function that
24		hard-codes the provided settings. This means that the key returned when
25		``ns.LOCALE`` is used contains the settings specified by the locale
26		loaded at the time the key is generated. If you change the locale,
27		you should regenerate the key to account for the new locale.
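
To illustrate why a generated key goes stale, here is a minimal standard-library sketch. It does not use ``natsort`` itself: ``make_number_key`` and ``current_sep`` are hypothetical names that stand in for the key factory and for the thousands separator of the active locale.

```python
def make_number_key(thousands_sep):
    """Return a key function that hard-codes the separator given at creation."""

    def key(text):
        # Strip the separator that was captured when the key was generated.
        return int(text.replace(thousands_sep, ""))

    return key


current_sep = ","  # stand-in for the active locale's thousands separator
old_key = make_number_key(current_sep)

current_sep = "."  # the "locale" changes...
new_key = make_number_key(current_sep)  # ...so the key is regenerated

print(old_key("1,234"))
print(new_key("1.234"))
```

The old key still carries the comma it captured, so it cannot parse input written with the new separator; only the regenerated key can.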
28
29		Corollary: Do Not Reuse :func:`~natsort.natsort_keygen` After Changing Locale
30		+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
31
32		If you change locale, the old function will not work as expected.
33		The `locale <https://docs.python.org/3.5/library/locale.html>`_ library works
34		with a global state. When :func:`~natsort.natsort_keygen` is called it does the
35		best job that it can to make the returned function as static as possible and
36		independent of the global state, but the
37		`strxfrm <https://docs.python.org/3.5/library/locale.html#locale.strxfrm>`_
38		function must access this global state to work; therefore, if you change
39		locale and use ``ns.LOCALE`` then you should discard the old key.
40
41		.. note:: If you use `PyICU <https://pypi.python.org/pypi/PyICU>`_ then you
42		may be able to reuse keys after changing locale.
43
44		The `locale <https://docs.python.org/3.5/library/locale.html>`_ Module From the StdLib Has Issues
45		-------------------------------------------------------------------------------------------------
46
47		:mod:`natsort` will use `PyICU <https://pypi.org/project/PyICU>`_ for
48		:func:`~natsort.humansorted` or ``ns.LOCALE`` if it is installed. If not,
49		it will fall back on the `locale <https://docs.python.org/3.5/library/locale.html>`_
50		library from the Python stdlib. If you do not have
51		`PyICU <https://pypi.org/project/PyICU>`_ installed, please keep the
52		following known problems and issues in mind.
53
54		.. note:: Remember, if you have `PyICU <https://pypi.org/project/PyICU>`_
55		installed you shouldn't need to worry about any of these.
56
57		Explicitly Set the Locale Before Using :func:`~natsort.humansorted` or ``ns.LOCALE``
58		++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
59
60		I have found that unless you explicitly set a locale, the sorted order may not
61		be what you expect. Setting this is straightforward
62		(in the below example I use 'en_US.UTF-8', but you should use your
63		locale)::
64
65		>>> import locale
66		>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
67		'en_US.UTF-8'
68
69		.. _bug_note:
70
71		The `locale <https://docs.python.org/3.5/library/locale.html>`_ Module Is Broken on Mac OS X
72		++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
73
74		It's not Python's fault, but the OS... the locale library for BSD-based systems
75		(of which Mac OS X is one) is broken. See the following links:
76
77		- http://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
78		- http://bugs.python.org/issue23195
79		- https://github.com/SethMMorton/natsort/issues/21 (contains instructions on installing)
80		- http://stackoverflow.com/questions/33459384/unicode-character-not-in-range-when-calling-locale-strxfrm
81		- https://github.com/SethMMorton/natsort/issues/34
82
83		Of course, installing `PyICU <https://pypi.org/project/PyICU>`_ fixes this,
84		but if you don't want to or cannot install this there is some hope.
85
86		1. As of ``natsort`` version 4.0.0, ``natsort`` is configured
87		to compensate for a broken ``locale`` library. When sorting non-numbers
88		it will handle case as you expect, but it will still not be able to
89		comprehend non-ASCII characters properly. Additionally, it has
90		a built-in lookup table of thousands separators that are incorrect
91		on OS X/BSD (but it is possible it is not complete... please file an
92		issue if you see it is not complete)
93		2. Use "\*.ISO8859-1" locale (i.e. 'en_US.ISO8859-1') rather than "\*.UTF-8"
94		locale. I have found that these have fewer issues than "UTF-8", but
95		your mileage may vary.

-8

~~docs/source/natsort_key.rst~~ less more

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:func:`~natsort.natsort_key`
4		============================
5
6		.. autofunction:: natsort_key
7

-8

~~docs/source/natsort_keygen.rst~~ less more

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:func:`~natsort.natsort_keygen`
4		===============================
5
6		.. autofunction:: natsort_keygen
7

-8

~~docs/source/natsorted.rst~~ less more

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:func:`~natsort.natsorted`
4		==========================
5
6		.. autofunction:: natsorted
7

-8

~~docs/source/ns_class.rst~~ less more

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:class:`~natsort.ns`
4		====================
5
6		.. autoclass:: ns
7

-8

~~docs/source/order_by_index.rst~~ less more

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:func:`~natsort.order_by_index`
4		===============================
5
6		.. autofunction:: order_by_index
7

-8

~~docs/source/realsorted.rst~~ less more

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:func:`~natsort.realsorted`
4		===========================
5
6		.. autofunction:: realsorted
7

-147

~~docs/source/shell.rst~~ less more

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		.. _shell:
4
5		Shell Script
6		============
7
8		The ``natsort`` shell script is automatically installed when you install
9		:mod:`natsort` with pip.
10
11		Below is the usage and some usage examples for the ``natsort`` shell script.
12
13		Usage
14		-----
15
16		::
17
18		usage: natsort [-h] [--version] [-p] [-f LOW HIGH] [-F LOW HIGH] [-e EXCLUDE]
19		[-r] [-t {digit,int,float,version,ver}] [--nosign] [--noexp]
20		[--locale]
21		[entries [entries ...]]
22
23		Performs a natural sort on entries given on the command-line.
24		A natural sort sorts numerically then alphabetically, and will sort
25		by numbers in the middle of an entry.
26
27		positional arguments:
28		entries The entries to sort. Taken from stdin if nothing is
29		given on the command line.
30
31		optional arguments:
32		-h, --help show this help message and exit
33		--version show program's version number and exit
34		-p, --paths Interpret the input as file paths. This is not
35		strictly necessary to sort all file paths, but in
36		cases where there are OS-generated file paths like
37		"Folder/" and "Folder (1)/", this option is needed to
38		make the paths sorted in the order you expect
39		("Folder/" before "Folder (1)/").
40		-f LOW HIGH, --filter LOW HIGH
41		Used for keeping only the entries that have a number
42		falling in the given range.
43		-F LOW HIGH, --reverse-filter LOW HIGH
44		Used for excluding the entries that have a number
45		falling in the given range.
46		-e EXCLUDE, --exclude EXCLUDE
47		Used to exclude an entry that contains a specific
48		number.
49		-r, --reverse Returns in reversed order.
50		-t {digit,int,float,version,ver,real,f,i,r,d},
51		--number-type {digit,int,float,version,ver,real,f,i,r,d},
52		--number_type {digit,int,float,version,ver,real,f,i,r,d}
53		Choose the type of number to search for. "float" will
54		search for floating-point numbers. "int" will only
55		search for integers. "digit", "version", and "ver" are
56		synonyms for "int"."real" is a shortcut for "float"
57		with --sign. "i" and "d" are synonyms for "int", "f"
58		is a synonym for "float", and "r" is a synonym for
59		"real".The default is int.
60		--nosign Do not consider "+" or "-" as part of a number, i.e.
61		do not take sign into consideration. This is the
62		default.
63		-s, --sign Consider "+" or "-" as part of a number, i.e. take
64		sign into consideration. The default is unsigned.
65		--noexp Do not consider an exponential as part of a number,
66		i.e. 1e4, would be considered as 1, "e", and 4, not as
67		10000. This only effects the --number-type=float.
68		-l, --locale Causes natsort to use locale-aware sorting. You will
69		get the best results if you install PyICU.
70
71		Description
72		-----------
73
74		``natsort`` was originally written to aid in computational chemistry
75		research so that it would be easy to analyze large sets of output files
76		named after the parameter used::
77
78		$ ls *.out
79		mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out
80
81		(Obviously, in reality there would be more files, but you get the idea.) Notice
82		that the shell sorts in lexicographical order. This is the behavior of programs like
83		``find`` as well as ``ls``. The problem is passing these files to an
84		analysis program causes them not to appear in numerical order, which can lead
85		to bad analysis. To remedy this, use ``natsort``::
86
87		$ natsort *.out
88		mode744.43.out
89		mode943.54.out
90		mode1000.35.out
91		mode1243.34.out
92		$ natsort -t r *.out | xargs your_program
93
94		``-t r`` is short for ``--number-type real``. You can also place natsort in
95		the middle of a pipe::
96
97		$ find . -name "*.out" | natsort -t r | xargs your_program
98
99		To sort version numbers, use the default ``--number-type``::
100
101		$ ls *
102		prog-1.10.zip prog-1.9.zip prog-2.0.zip
103		$ natsort *
104		prog-1.9.zip
105		prog-1.10.zip
106		prog-2.0.zip
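
The ordering shown above can be approximated in a few lines of Python. This is only a sketch of the general technique (split on runs of digits and compare them as integers), not the ``natsort`` implementation; ``natural_key`` is a hypothetical helper name.

```python
import re


def natural_key(text):
    """Split text into alternating string and integer chunks."""
    # The capturing group keeps the digit runs in the result of re.split.
    parts = re.split(r"(\d+)", text)
    return [int(part) if part.isdigit() else part for part in parts]


names = ["prog-1.10.zip", "prog-1.9.zip", "prog-2.0.zip"]
result = sorted(names, key=natural_key)
print(result)
```

Because ``re.split`` always alternates text and digit chunks starting with a (possibly empty) text chunk, the lists compare string to string and integer to integer at each position.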
107
108		In general, all ``natsort`` shell script options mirror the :func:`~natsorted` API,
109		with notable exception of the ``--filter``, ``--reverse-filter``, and ``--exclude``
110		options. These three options are used as follows::
111
112		$ ls *.out
113		mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out
114		$ natsort -t r *.out -f 900 1100 # Select only numbers between 900-1100
115		mode943.54.out
116		mode1000.35.out
117		$ natsort -t r *.out -F 900 1100 # Select only numbers NOT between 900-1100
118		mode744.43.out
119		mode1243.34.out
120		$ natsort -t r *.out -e 1000.35 # Exclude 1000.35 from search
121		mode744.43.out
122		mode943.54.out
123		mode1243.34.out
124
125		If you are sorting paths with OS-generated filenames, you may require the
126		``--paths``/``-p`` option::
127
128		$ find . ! -path . -type f
129		./folder/file (1).txt
130		./folder/file.txt
131		./folder (1)/file.txt
132		./folder (10)/file.txt
133		./folder (2)/file.txt
134		$ find . ! -path . -type f | natsort
135		./folder (1)/file.txt
136		./folder (2)/file.txt
137		./folder (10)/file.txt
138		./folder/file (1).txt
139		./folder/file.txt
140		$ find . ! -path . -type f | natsort -p
141		./folder/file.txt
142		./folder/file (1).txt
143		./folder (1)/file.txt
144		./folder (2)/file.txt
145		./folder (10)/file.txt
146

~~docs/source/special_cases_everywhere.jpg~~ less more

Binary diff not shown

-8

~~docs/source/versorted.rst~~ less more

0		.. default-domain:: py
1		.. currentmodule:: natsort
2
3		:func:`~natsort.versorted`
4		==========================
5
6		.. autofunction:: versorted
7

docs/special_cases_everywhere.jpg less more

Binary diff not shown

-7

natsort/__init__.py less more

10	10	index_humansorted,
11	11	index_natsorted,
12	12	index_realsorted,
13		index_versorted,
14	13	natsort_key,
15	14	natsort_keygen,
16	15	natsorted,
17	16	ns,
18	17	order_by_index,
19	18	realsorted,
20		versorted,
21	19	)
22	20	from natsort.utils import chain_functions
23	21
24	22	if float(sys.version[:3]) < 3:
25	23	from natsort.natsort import natcmp
26	24
27		__version__ = "5.4.1"
	25	__version__ = "6.0.0"
28	26
29	27	__all__ = [
30	28	"natsort_key",
31	29	"natsort_keygen",
32	30	"natsorted",
33		"versorted",
34	31	"humansorted",
35	32	"realsorted",
36	33	"index_natsorted",
37		"index_versorted",
38	34	"index_humansorted",
39	35	"index_realsorted",
40	36	"order_by_index",

47	43	]
48	44
49	45	# Add the ns keys to this namespace for convenience.
50		# A dict comprehension is not used for Python 2.6 compatibility.
51		globals().update(dict((k, getattr(ns, k)) for k in dir(ns) if k.isupper()))
	46	globals().update(ns._asdict())
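
The replacement line relies on ``namedtuple._asdict()``, which returns a mapping of field names to values. A small sketch of the same idea, assuming a hypothetical three-field ``Flags`` tuple and a plain dict in place of ``globals()``:

```python
import collections

# Hypothetical stand-in for the ns enum instance.
Flags = collections.namedtuple("Flags", ["DEFAULT", "FLOAT", "SIGNED"])
flags = Flags(DEFAULT=0, FLOAT=1, SIGNED=2)

# A plain dict plays the role of the module namespace here.
namespace = {}
namespace.update(flags._asdict())

print(sorted(namespace.items()))
```

Unlike the removed ``dir()``-based comprehension, this needs no filtering on ``k.isupper()`` because ``_asdict()`` only yields the declared fields.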

-5

natsort/__main__.py less more

23	23	parser.add_argument(
24	24	"--version",
25	25	action="version",
26		version="%(prog)s {0}".format(natsort.__version__),
	26	version="%(prog)s {}".format(natsort.__version__),
27	27	)
28	28	parser.add_argument(
29	29	"-p",

77	77	"--number-type",
78	78	"--number_type",
79	79	dest="number_type",
80		choices=("digit", "int", "float", "version", "ver", "real", "f", "i", "r", "d"),
	80	choices=("int", "float", "real", "f", "i", "r"),
81	81	default="int",
82	82	help='Choose the type of number to search for. "float" will search '
83	83	'for floating-point numbers. "int" will only search for '
84		'integers. "digit", "version", and "ver" are synonyms for "int".'
85		'"real" is a shortcut for "float" with --sign. '
86		'"i" and "d" are synonyms for "int", "f" is a synonym for '
	84	'integers. "real" is a shortcut for "float" with --sign. '
	85	'"i" is a synonym for "int", "f" is a synonym for '
87	86	'"float", and "r" is a synonym for "real".'
88	87	"The default is %(default)s.",
89	88	)

-1

natsort/compat/locale.py less more

6	6
7	7	# Std. lib imports.
8	8	import sys
	9	from functools import cmp_to_key
9	10
10	11	# Local imports.
11		from natsort.compat.py23 import PY_VERSION, cmp_to_key, py23_unichr
	12	from natsort.compat.py23 import PY_VERSION, py23_unichr
12	13
13	14	# This string should be sorted after any other byte string because
14	15	# it contains the max unicode character repeated 20 times.

-37

natsort/compat/py23.py less more

55	55	py23_map = itertools.imap
56	56	py23_filter = itertools.ifilter
57	57
58		# cmp_to_key was not created till 2.7, so require this for 2.6
59		try:
60		from functools import cmp_to_key
61		except ImportError: # pragma: no cover
62
63		def cmp_to_key(mycmp):
64		"""Convert a cmp= function into a key= function"""
65
66		class K(object):
67		__slots__ = ["obj"]
68
69		def __init__(self, obj):
70		self.obj = obj
71
72		def __lt__(self, other):
73		return mycmp(self.obj, other.obj) < 0
74
75		def __gt__(self, other):
76		return mycmp(self.obj, other.obj) > 0
77
78		def __eq__(self, other):
79		return mycmp(self.obj, other.obj) == 0
80
81		def __le__(self, other):
82		return mycmp(self.obj, other.obj) <= 0
83
84		def __ge__(self, other):
85		return mycmp(self.obj, other.obj) >= 0
86
87		def __ne__(self, other):
88		return mycmp(self.obj, other.obj) != 0
89
90		def __hash__(self):
91		raise TypeError("hash not implemented")
92
93		return K
94
95	58
96	59	# This function is intended to decorate other functions that will modify
97	60	# either a string directly, or a function's docstring.
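
With Python 2.6 support dropped, the hand-written fallback above is no longer needed and ``functools.cmp_to_key`` can be imported directly. A brief usage sketch follows; the comparison function ``by_length_then_alpha`` is a hypothetical example, not part of ``natsort``.

```python
from functools import cmp_to_key


def by_length_then_alpha(a, b):
    """Old-style comparison: negative, zero, or positive."""
    if len(a) != len(b):
        return len(a) - len(b)
    # Equivalent of Python 2's cmp(a, b) for the tie-break.
    return (a > b) - (a < b)


words = ["pear", "fig", "apple", "kiwi"]
result = sorted(words, key=cmp_to_key(by_length_then_alpha))
print(result)
```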

+16

-52

natsort/natsort.py less more

13	13	import natsort.compat.locale
14	14	from natsort import utils
15	15	from natsort.compat.py23 import py23_cmp, py23_str, u_format
16		from natsort.ns_enum import ns, ns_DUMB
	16	from natsort.ns_enum import NS_DUMB, ns
17	17
18	18
19	19	@u_format

107	107
108	108
109	109	@u_format
110		def natsort_keygen(key=None, alg=ns.DEFAULT, **_kwargs):
	110	def natsort_keygen(key=None, alg=ns.DEFAULT):
111	111	"""
112	112	Generate a key to sort strings and numbers naturally.
113	113

153	153	[{u}'num-3', {u}'num2', {u}'num5.10', {u}'num5.3']
154	154
155	155	"""
156		# Transform old arguments to the ns enum.
157	156	try:
158		alg = utils.args_to_enum(**_kwargs) | alg
	157	ns.DEFAULT | alg
159	158	except TypeError:
160	159	msg = "natsort_keygen: 'alg' argument must be from the enum 'ns'"
161		raise ValueError(msg + ", got {0}".format(py23_str(alg)))
162
163		# Add the _DUMB option if the locale library is broken.
	160	raise ValueError(msg + ", got {}".format(py23_str(alg)))
	161
	162	# Add the NS_DUMB option if the locale library is broken.
164	163	if alg & ns.LOCALEALPHA and natsort.compat.locale.dumb_sort():
165		alg |= ns_DUMB
	164	alg |= NS_DUMB
166	165
167	166	# Set some variables that will be passed to the factory functions
168	167	if alg & ns.NUMAFTER:

219	218
220	219
221	220	@u_format
222		def natsorted(seq, key=None, reverse=False, alg=ns.DEFAULT, **_kwargs):
	221	def natsorted(seq, key=None, reverse=False, alg=ns.DEFAULT):
223	222	"""
224	223	Sorts an iterable naturally.
225	224

263	262	[{u}'num2', {u}'num3', {u}'num5']
264	263
265	264	"""
266		key = natsort_keygen(key, alg, **_kwargs)
	265	key = natsort_keygen(key, alg)
267	266	return sorted(seq, reverse=reverse, key=key)
268
269
270		@u_format
271		def versorted(seq, key=None, reverse=False, alg=ns.DEFAULT, **_kwargs):
272		"""
273		Identical to :func:`natsorted`.
274
275		This function exists for backwards compatibility with `natsort`
276		version < 4.0.0. Future development should use :func:`natsorted`.
277
278		See Also
279		--------
280		natsorted
281
282		"""
283		return natsorted(seq, key, reverse, alg, **_kwargs)
284	267
285	268
286	269	@u_format

391	374
392	375
393	376	@u_format
394		def index_natsorted(seq, key=None, reverse=False, alg=ns.DEFAULT, **_kwargs):
	377	def index_natsorted(seq, key=None, reverse=False, alg=ns.DEFAULT):
395	378	"""
396	379	Determine the list of the indexes used to sort the input sequence.
397	380

456	439
457	440	# Pair the index and sequence together, then sort by element
458	441	index_seq_pair = [[x, y] for x, y in enumerate(seq)]
459		index_seq_pair.sort(reverse=reverse, key=natsort_keygen(newkey, alg, **_kwargs))
	442	index_seq_pair.sort(reverse=reverse, key=natsort_keygen(newkey, alg))
460	443	return [x for x, _ in index_seq_pair]
461
462
463		@u_format
464		def index_versorted(seq, key=None, reverse=False, alg=ns.DEFAULT, **_kwargs):
465		"""
466		Identical to :func:`index_natsorted`.
467
468		This function exists for backwards compatibility with
469		``index_natsort`` version < 4.0.0. Future development should use
470		:func:`index_natsorted`.
471
472		Please see the :func:`index_natsorted` documentation for use.
473
474		See Also
475		--------
476		index_natsorted
477
478		"""
479		return index_natsorted(seq, key, reverse, alg, **_kwargs)
480	444
481	445
482	446	@u_format

677	641
678	642	cached_keys = {}
679	643
680		def __new__(cls, x, y, alg=ns.DEFAULT, *args, **kwargs):
	644	def __new__(cls, x, y, alg=ns.DEFAULT):
681	645	try:
682		alg = utils.args_to_enum(**kwargs) | alg
	646	ns.DEFAULT | alg
683	647	except TypeError:
684		msg = "natsort_keygen: 'alg' argument must be " "from the enum 'ns'"
685		raise ValueError(msg + ", got {0}".format(py23_str(alg)))
	648	msg = "natsort_keygen: 'alg' argument must be from the enum 'ns'"
	649	raise ValueError(msg + ", got {}".format(py23_str(alg)))
686	650
687	651	# Add the _DUMB option if the locale library is broken.
688	652	if alg & ns.LOCALEALPHA and natsort.compat.locale.dumb_sort():
689		alg |= ns_DUMB
	653	alg |= NS_DUMB
690	654
691	655	if alg not in cls.cached_keys:
692	656	cls.cached_keys[alg] = natsort_keygen(alg=alg)

+15

-29

natsort/ns_enum.py less more

5	5	from __future__ import absolute_import, division, print_function, unicode_literals
6	6
7	7	import collections
8
9		# NOTE: OrderedDict is not used below for compatibility with Python 2.6.
10	8
11	9	# The below are the base ns options. The values will be stored as powers
12	10	# of two so bitmasks can be used to extract the user's requested options.

27	25	]
28	26
29	27	# Following were previously options but are now defaults.
30		enum_do_nothing = ["DEFAULT", "TYPESAFE", "INT", "VERSION", "DIGIT", "UNSIGNED"]
	28	enum_do_nothing = ["DEFAULT", "INT", "UNSIGNED"]
31	29
32	30	# The following are bitwise-OR combinations of other fields.
33	31	enum_combos = [("REAL", ("FLOAT", "SIGNED")), ("LOCALE", ("LOCALEALPHA", "LOCALENUM"))]
34	32
35	33	# The following are aliases for other fields.
36	34	enum_aliases = [
37		("T", "TYPESAFE"),
38	35	("I", "INT"),
39		("V", "VERSION"),
40		("D", "DIGIT"),
41	36	("U", "UNSIGNED"),
42	37	("F", "FLOAT"),
43	38	("S", "SIGNED"),

59	54	]
60	55
61	56	# Construct the list of bitwise distinct enums with their fields.
62		enum_fields = [(name, 1 << i) for i, name in enumerate(enum_options)]
63		enum_fields.extend((name, 0) for name in enum_do_nothing)
	57	enum_fields = collections.OrderedDict(
	58	(name, 1 << i) for i, name in enumerate(enum_options)
	59	)
	60	enum_fields.update((name, 0) for name in enum_do_nothing)
64	61
65	62	for name, combo in enum_combos:
66		current_mapping = dict(enum_fields)
67		combined_value = current_mapping[combo[0]]
	63	combined_value = enum_fields[combo[0]]
68	64	for combo_name in combo[1:]:
69		combined_value |= current_mapping[combo_name]
70		enum_fields.append((name, combined_value))
	65	combined_value |= enum_fields[combo_name]
	66	enum_fields[name] = combined_value
71	67
72		current_mapping = dict(enum_fields)
73		enum_fields.extend((alias, current_mapping[name]) for alias, name in enum_aliases)
74
75		# Finally, extract out the enum field names and their values.
76		enum_field_names, enum_field_values = zip(*enum_fields)
	68	enum_fields.update(
	69	(alias, enum_fields[name]) for alias, name in enum_aliases
	70	)
77	71
78	72
79	73	# Subclass the namedtuple to improve the docstring.
80	74	# noinspection PyUnresolvedReferences
81		class _NSEnum(collections.namedtuple("_NSEnum", enum_field_names)):
	75	class _NSEnum(collections.namedtuple("_NSEnum", enum_fields.keys())):
82	76	"""
83	77	Enum to control the `natsort` algorithm.
84	78

129	123	default "NFD". This will transform characters such as '⑦' into
130	124	'7'. Please see https://stackoverflow.com/a/7934397/1399279,
131	125	https://stackoverflow.com/a/7931547/1399279,
132		and http://unicode.org/reports/tr15/ for full details into unicode
	126	and https://unicode.org/reports/tr15/ for full details into unicode
133	127	normalization.
134	128	LOCALE, L
135	129	Tell `natsort` to be locale-aware when sorting. This includes both

179	173	If an NaN shows up in the input, this instructs `natsort` to
180	174	treat these as +Infinity and place them after all the other numbers.
181	175	By default, an NaN be treated as -Infinity and be placed first.
182		TYPESAFE, T
183		Deprecated as of `natsort` version 5.0.0; this option is now
184		a no-op because it is always true.
185		VERSION, V
186		Deprecated as of `natsort` version 5.0.0; this option is now
187		a no-op because it is the default.
188		DIGIT, D
189		Same as `VERSION` above.
190	176
191	177	Notes
192	178	-----

204	190
205	191	# Here is where the instance of the ns enum that will be exported is created.
206	192	# It is a poor-man's singleton.
207		ns = _NSEnum(*enum_field_values)
	193	ns = _NSEnum(*enum_fields.values())
208	194
209	195	# The below is private for internal use only.
210		ns_DUMB = 1 << 31
	196	NS_DUMB = 1 << 31
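
The pattern in this file (distinct options stored as powers of two, then combined and aliased in an ordered mapping before being frozen into a namedtuple) can be sketched in isolation. The option list below is a hypothetical subset chosen for illustration, and ``Enum``/``demo`` are placeholder names.

```python
import collections

options = ["FLOAT", "SIGNED", "NOEXP"]  # hypothetical subset of the options

# Each distinct option gets its own bit.
fields = collections.OrderedDict((name, 1 << i) for i, name in enumerate(options))
fields["DEFAULT"] = 0  # a do-nothing option
fields["REAL"] = fields["FLOAT"] | fields["SIGNED"]  # a bitwise-OR combination
fields["F"] = fields["FLOAT"]  # an alias

Enum = collections.namedtuple("Enum", fields.keys())
demo = Enum(*fields.values())

alg = demo.REAL | demo.NOEXP
print(alg, bool(alg & demo.SIGNED), bool(alg & demo.NOEXP))
```

Keeping everything in one ``OrderedDict`` means the field names and values stay paired, so the separate ``zip(*enum_fields)`` step from the old code is no longer needed.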

-1

natsort/unicode_numeric_hex.py less more

1742	1742	a = py23_unichr(i)
1743	1743	except ValueError:
1744	1744	break
1745		if a in set("0123456789"):
	1745	if a in "0123456789":
1746	1746	continue
1747	1747	if unicodedata.numeric(a, None) is not None:
1748	1748	hex_chars.append(i)

-42

natsort/utils.py less more

49	49	from os.path import split as path_split
50	50	from os.path import splitext as path_splitext
51	51	from unicodedata import normalize
52		from warnings import warn
53	52
54	53	from natsort.compat.fastnumbers import fast_float, fast_int
55	54	from natsort.compat.locale import get_decimal_point, get_strxfrm, get_thousands_sep

62	61	py23_str,
63	62	u_format,
64	63	)
65		from natsort.ns_enum import ns, ns_DUMB
	64	from natsort.ns_enum import NS_DUMB, ns
66	65	from natsort.unicode_numbers import digits_no_decimals, numeric_no_decimals
67	66
68	67	if PY_VERSION >= 3:

378	377	"""
379	378	# Sometimes we store the "original" input before transformation,
380	379	# sometimes after.
381		orig_after_xfrm = not (alg & ns_DUMB and alg & ns.LOCALEALPHA)
	380	orig_after_xfrm = not (alg & NS_DUMB and alg & ns.LOCALEALPHA)
382	381	original_func = input_transform if orig_after_xfrm else _no_op
383	382	normalize_input = _normalize_input_factory(alg)
384	383

491	490	"""
492	491	# Shortcuts.
493	492	lowfirst = alg & ns.LOWERCASEFIRST
494		dumb = alg & ns_DUMB
	493	dumb = alg & NS_DUMB
495	494
496	495	# Build the chain of functions to execute in order.
497	496	function_chain = []

565	564	"""
566	565	# Shortcuts.
567	566	use_locale = alg & ns.LOCALEALPHA
568		dumb = alg & ns_DUMB
	567	dumb = alg & NS_DUMB
569	568	group_letters = (alg & ns.GROUPLETTERS) or (use_locale and dumb)
570	569	nan_val = float("+inf") if alg & ns.NANLAST else float("-inf")
571	570

613	612
614	613	"""
615	614	if alg & ns.UNGROUPLETTERS and alg & ns.LOCALEALPHA:
616		swap = alg & ns_DUMB and alg & ns.LOWERCASEFIRST
	615	swap = alg & NS_DUMB and alg & ns.LOWERCASEFIRST
617	616	transform = methodcaller("swapcase") if swap else _no_op
618	617
619	618	def func(split_val, val, _transform=transform, _sep=sep, _pre_sep=pre_sep):

786	785
787	786	# Return the split parent paths and then the split basename.
788	787	return ichain(path_parts, base_parts)
789
790
791		def args_to_enum(**kwargs):
792		"""
793		A function to convert input booleans to an enum-type argument.
794
795		For internal use only - will be deprecated in a future release.
796		"""
797		alg = 0
798		keys = ("number_type", "signed", "exp", "as_path", "py3_safe")
799		if any(x not in keys for x in kwargs):
800		x = set(kwargs) - set(keys)
801		raise TypeError("Invalid argument(s): " + ", ".join(x))
802		if "number_type" in kwargs and kwargs["number_type"] is not int:
803		msg = "The 'number_type' argument is deprecated as of 3.5.0, "
804		msg += "please use 'alg=ns.FLOAT', 'alg=ns.INT', or 'alg=ns.VERSION'"
805		warn(msg, DeprecationWarning)
806		alg |= ns.FLOAT * bool(kwargs["number_type"] is float)
807		alg |= ns.INT * bool(kwargs["number_type"] in (int, None))
808		alg |= ns.SIGNED * (kwargs["number_type"] not in (float, None))
809		if "signed" in kwargs and kwargs["signed"] is not None:
810		msg = "The 'signed' argument is deprecated as of 3.5.0, "
811		msg += "please use 'alg=ns.SIGNED'."
812		warn(msg, DeprecationWarning)
813		alg |= ns.SIGNED * bool(kwargs["signed"])
814		if "exp" in kwargs and kwargs["exp"] is not None:
815		msg = "The 'exp' argument is deprecated as of 3.5.0, "
816		msg += "please use 'alg=ns.NOEXP'."
817		warn(msg, DeprecationWarning)
818		alg |= ns.NOEXP * (not kwargs["exp"])
819		if "as_path" in kwargs and kwargs["as_path"] is not None:
820		msg = "The 'as_path' argument is deprecated as of 3.5.0, "
821		msg += "please use 'alg=ns.PATH'."
822		warn(msg, DeprecationWarning)
823		alg |= ns.PATH * kwargs["as_path"]
824		return alg

-4

setup.cfg less more

0	0	[bumpversion]
1		current_version = 5.4.1
	1	current_version = 6.0.0
2	2	commit = True
3	3	tag = True
4	4	tag_name = {new_version}

9	9	url = https://github.com/SethMMorton/natsort
10	10	description = Simple yet flexible natural sorting in Python.
11	11	long_description = file: README.rst
	12	long_description_content_type = text/x-rst
12	13	license = MIT
	14	license_file = LICENSE
13	15	classifiers =
14	16	Development Status :: 5 - Production/Stable
15	17	Intended Audience :: Developers

20	22	Operating System :: OS Independent
21	23	License :: OSI Approved :: MIT License
22	24	Natural Language :: English
	25	Programming Language :: Python
23	26	Programming Language :: Python :: 2
24		Programming Language :: Python :: 2.6
25	27	Programming Language :: Python :: 2.7
26	28	Programming Language :: Python :: 3
27	29	Programming Language :: Python :: 3.4

42	44
43	45	[bumpversion:file:natsort/__init__.py]
44	46
45		[bumpversion:file:docs/source/conf.py]
	47	[bumpversion:file:docs/conf.py]
46	48
47		[bumpversion:file:docs/source/changelog.rst]
	49	[bumpversion:file:CHANGELOG.rst]
48	50	search = XX-XX-XXXX v. X.X.X
49	51	replace = {now:%%m-%%d-%%Y} v. {new_version}
50	52

-3

setup.py less more

2	2	from setuptools import find_packages, setup
3	3	setup(
4	4	name='natsort',
5		version='5.4.1',
	5	version='6.0.0',
6	6	packages=find_packages(),
7		install_requires=["argparse; python_version < '2.7'"],
8	7	entry_points={'console_scripts': ['natsort = natsort.__main__:main']},
	8	python_requires=">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*",
9	9	extras_require={
10		'fast': ["fastnumbers >= 2.0.0; python_version > '2.6'"],
	10	'fast': ["fastnumbers >= 2.0.0"],
11	11	'icu': ["PyICU >= 1.0.0"]
12	12	}
13	13	)

-39

~~test_natsort/conftest.py~~ less more

0		"""
1		Fixtures for pytest.
2		"""
3
4		import locale
5
6		import pytest
7
8
9		def load_locale(x):
10		"""Convenience to load a locale, trying ISO8859-1 first."""
11		try:
12		locale.setlocale(locale.LC_ALL, str("{0}.ISO8859-1".format(x)))
13		except locale.Error:
14		locale.setlocale(locale.LC_ALL, str("{0}.UTF-8".format(x)))
15
16
17		@pytest.fixture()
18		def with_locale_en_us():
19		"""Convenience to load the en_US locale - reset when complete."""
20		orig = locale.getlocale()
21		yield load_locale("en_US")
22		locale.setlocale(locale.LC_ALL, orig)
23
24
25		@pytest.fixture()
26		def with_locale_de_de():
27		"""
28		Convenience to load the de_DE locale - reset when complete - skip if missing.
29		"""
30		orig = locale.getlocale()
31		try:
32		load_locale("de_DE")
33		except locale.Error:
34		pytest.skip("requires de_DE locale to be installed")
35		else:
36		yield
37		finally:
38		locale.setlocale(locale.LC_ALL, orig)

-70

~~test_natsort/profile_natsorted.py~~

0		# -*- coding: utf-8 -*-
1		"""\
2		This file contains functions to profile natsorted with different
3		inputs and different settings.
4		"""
5		from __future__ import print_function
6
7		import cProfile
8		import locale
9		import sys
10
11		try:
12		from natsort import ns, natsort_keygen
13		from natsort.compat.py23 import py23_range
14		except ImportError:
15		sys.path.insert(0, ".")
16		from natsort import ns, natsort_keygen
17		from natsort.compat.py23 import py23_range
18
19		locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
20
21		# Samples to parse
22		number = 14695498
23		int_string = "43493"
24		float_string = "-434.93e7"
25		plain_string = "hello world"
26		fancy_string = "7abba9342fdab"
27		a_path = "/p/Folder (1)/file (1).tar.gz"
28		some_bytes = b"these are bytes"
29		a_list = ["hello", "goodbye", "74"]
30
31		basic_key = natsort_keygen()
32		real_key = natsort_keygen(alg=ns.REAL)
33		path_key = natsort_keygen(alg=ns.PATH)
34		locale_key = natsort_keygen(alg=ns.LOCALE)
35
36
37		def prof_time_to_generate():
38		print("* Generate Plain Key *")
39		for _ in py23_range(100000):
40		natsort_keygen()
41
42
43		cProfile.run("prof_time_to_generate()", sort="time")
44
45
46		def prof_parsing(a, msg, key=basic_key):
47		print(msg)
48		for _ in py23_range(100000):
49		key(a)
50
51
52		cProfile.run(
53		'prof_parsing(int_string, "* Basic Call, Int as String *")', sort="time"
54		)
55		cProfile.run(
56		'prof_parsing(float_string, "* Basic Call, Float as String *")', sort="time"
57		)
58		cProfile.run('prof_parsing(float_string, "* Real Call *", real_key)', sort="time")
59		cProfile.run('prof_parsing(number, "* Basic Call, Number *")', sort="time")
60		cProfile.run(
61		'prof_parsing(fancy_string, "* Basic Call, Mixed String *")', sort="time"
62		)
63		cProfile.run('prof_parsing(some_bytes, "* Basic Call, Byte String *")', sort="time")
64		cProfile.run('prof_parsing(a_path, "* Path Call *", path_key)', sort="time")
65		cProfile.run('prof_parsing(a_list, "* Basic Call, Recursive *")', sort="time")
66		cProfile.run(
67		'prof_parsing("434,930,000 dollars", "* Locale Call *", locale_key)',
68		sort="time",
69		)

-138

~~test_natsort/test_fake_fastnumbers.py~~

0		# -*- coding: utf-8 -*-
1		"""\
2		Test the fake fastnumbers module.
3		"""
4		from __future__ import unicode_literals
5
6		import unicodedata
7		from math import isnan
8
9		from hypothesis import given
10		from hypothesis.strategies import floats, integers, text
11		from natsort.compat.fake_fastnumbers import fast_float, fast_int
12		from natsort.compat.py23 import PY_VERSION
13
14		if PY_VERSION >= 3:
15		long = int
16
17
18		def is_float(x):
19		try:
20		float(x)
21		except ValueError:
22		try:
23		unicodedata.numeric(x)
24		except (ValueError, TypeError):
25		return False
26		else:
27		return True
28		else:
29		return True
30
31
32		def not_a_float(x):
33		return not is_float(x)
34
35
36		def is_int(x):
37		try:
38		return x.is_integer()
39		except AttributeError:
40		try:
41		long(x)
42		except ValueError:
43		try:
44		unicodedata.digit(x)
45		except (ValueError, TypeError):
46		return False
47		else:
48		return True
49		else:
50		return True
51
52
53		def not_an_int(x):
54		return not is_int(x)
55
56
57		# Each test has an "example" version for demonstrative purposes,
58		# and a test that uses the hypothesis module.
59
60
61		def test_fast_float_returns_nan_alternate_if_nan_option_is_given():
62		assert fast_float("nan", nan=7) == 7
63
64
65		def test_fast_float_converts_float_string_to_float_example():
66		assert fast_float("45.8") == 45.8
67		assert fast_float("-45") == -45.0
68		assert fast_float("45.8e-2", key=len) == 45.8e-2
69		assert isnan(fast_float("nan"))
70		assert isnan(fast_float("+nan"))
71		assert isnan(fast_float("-NaN"))
72		assert fast_float("۱۲.۱۲") == 12.12
73		assert fast_float("-۱۲.۱۲") == -12.12
74
75
76		@given(floats(allow_nan=False))
77		def test_fast_float_converts_float_string_to_float(x):
78		assert fast_float(repr(x)) == x
79
80
81		def test_fast_float_leaves_string_as_is_example():
82		assert fast_float("invalid") == "invalid"
83
84
85		@given(text().filter(not_a_float).filter(bool))
86		def test_fast_float_leaves_string_as_is(x):
87		assert fast_float(x) == x
88
89
90		def test_fast_float_with_key_applies_to_string_example():
91		assert fast_float("invalid", key=len) == len("invalid")
92
93
94		@given(text().filter(not_a_float).filter(bool))
95		def test_fast_float_with_key_applies_to_string(x):
96		assert fast_float(x, key=len) == len(x)
97
98
99		def test_fast_int_leaves_float_string_as_is_example():
100		assert fast_int("45.8") == "45.8"
101		assert fast_int("nan") == "nan"
102		assert fast_int("inf") == "inf"
103
104
105		@given(floats().filter(not_an_int))
106		def test_fast_int_leaves_float_string_as_is(x):
107		assert fast_int(repr(x)) == repr(x)
108
109
110		def test_fast_int_converts_int_string_to_int_example():
111		assert fast_int("-45") == -45
112		assert fast_int("+45") == 45
113		assert fast_int("۱۲") == 12
114		assert fast_int("-۱۲") == -12
115
116
117		@given(integers())
118		def test_fast_int_converts_int_string_to_int(x):
119		assert fast_int(repr(x)) == x
120
121
122		def test_fast_int_leaves_string_as_is_example():
123		assert fast_int("invalid") == "invalid"
124
125
126		@given(text().filter(not_an_int).filter(bool))
127		def test_fast_int_leaves_string_as_is(x):
128		assert fast_int(x) == x
129
130
131		def test_fast_int_with_key_applies_to_string_example():
132		assert fast_int("invalid", key=len) == len("invalid")
133
134
135		@given(text().filter(not_an_int).filter(bool))
136		def test_fast_int_with_key_applies_to_string(x):
137		assert fast_int(x, key=len) == len(x)

-53

~~test_natsort/test_final_data_transform_factory.py~~

0		# -*- coding: utf-8 -*-
1		"""These test the utils.py functions."""
2		from __future__ import unicode_literals
3
4		import pytest
5		from hypothesis import example, given
6		from hypothesis.strategies import floats, integers, text
7		from natsort.compat.py23 import py23_str
8		from natsort.ns_enum import ns, ns_DUMB
9		from natsort.utils import final_data_transform_factory
10
11
12		@pytest.mark.parametrize("alg", [ns.DEFAULT, ns.UNGROUPLETTERS, ns.LOCALE])
13		@given(x=text(), y=floats(allow_nan=False, allow_infinity=False) | integers())
14		@pytest.mark.usefixtures("with_locale_en_us")
15		def test_final_data_transform_factory_default(x, y, alg):
16		final_data_transform_func = final_data_transform_factory(alg, "", "::")
17		value = (x, y)
18		original_value = "".join(map(py23_str, value))
19		result = final_data_transform_func(value, original_value)
20		assert result == value
21
22
23		@pytest.mark.parametrize(
24		"alg, func",
25		[
26		(ns.UNGROUPLETTERS | ns.LOCALE, lambda x: x),
27		(ns.LOCALE | ns.UNGROUPLETTERS | ns_DUMB, lambda x: x),
28		(ns.LOCALE | ns.UNGROUPLETTERS | ns.LOWERCASEFIRST, lambda x: x),
29		(
30		ns.LOCALE | ns.UNGROUPLETTERS | ns_DUMB | ns.LOWERCASEFIRST,
31		lambda x: x.swapcase(),
32		),
33		],
34		)
35		@given(x=text(), y=floats(allow_nan=False, allow_infinity=False) | integers())
36		@example(x="İ", y=0)
37		@pytest.mark.usefixtures("with_locale_en_us")
38		def test_final_data_transform_factory_ungroup_and_locale(x, y, alg, func):
39		final_data_transform_func = final_data_transform_factory(alg, "", "::")
40		value = (x, y)
41		original_value = "".join(map(py23_str, value))
42		result = final_data_transform_func(value, original_value)
43		if x:
44		expected = ((func(original_value[:1]),), value)
45		else:
46		expected = (("::",), value)
47		assert result == expected
48
49
50		def test_final_data_transform_factory_ungroup_and_locale_empty_tuple():
51		final_data_transform_func = final_data_transform_factory(ns.UG | ns.L, "", "::")
52		assert final_data_transform_func((), "") == ((), ())

-105

~~test_natsort/test_input_string_transform_factory.py~~

0		# -*- coding: utf-8 -*-
1		"""These test the utils.py functions."""
2		from __future__ import unicode_literals
3
4		import pytest
5		from hypothesis import example, given
6		from hypothesis.strategies import integers, text
7		from natsort.compat.py23 import NEWPY
8		from natsort.ns_enum import ns, ns_DUMB
9		from natsort.utils import input_string_transform_factory
10
11
12		def lower(x):
13		"""Call the appropriate lower method for the Python version."""
14		if NEWPY:
15		return x.casefold()
16		else:
17		return x.lower()
18
19
20		def thousands_separated_int(n):
21		"""Insert thousands separators in an int."""
22		new_int = ""
23		for i, y in enumerate(reversed(n), 1):
24		new_int = y + new_int
25		# For every third digit, insert a thousands separator.
26		if i % 3 == 0 and i != len(n):
27		new_int = "," + new_int
28		return new_int
29
30
31		@given(text())
32		def test_input_string_transform_factory_is_no_op_for_no_alg_options(x):
33		input_string_transform_func = input_string_transform_factory(ns.DEFAULT)
34		assert input_string_transform_func(x) is x
35
36
37		@pytest.mark.parametrize(
38		"alg, example_func",
39		[
40		(ns.IGNORECASE, lower),
41		(ns_DUMB, lambda x: x.swapcase()),
42		(ns.LOWERCASEFIRST, lambda x: x.swapcase()),
43		(ns_DUMB | ns.LOWERCASEFIRST, lambda x: x), # No-op
44		(ns.IGNORECASE | ns.LOWERCASEFIRST, lambda x: lower(x.swapcase())),
45		],
46		)
47		@given(x=text())
48		def test_input_string_transform_factory(x, alg, example_func):
49		input_string_transform_func = input_string_transform_factory(alg)
50		assert input_string_transform_func(x) == example_func(x)
51
52
53		@example(12543642642534980) # 12,543,642,642,534,980 => 12543642642534980
54		@given(x=integers(min_value=1000))
55		@pytest.mark.usefixtures("with_locale_en_us")
56		def test_input_string_transform_factory_cleans_thousands(x):
57		int_str = str(x).rstrip("lL")
58		thousands_int_str = thousands_separated_int(int_str)
59		assert thousands_int_str.replace(",", "") != thousands_int_str
60
61		input_string_transform_func = input_string_transform_factory(ns.LOCALE)
62		assert input_string_transform_func(thousands_int_str) == int_str
63
64		# Using LOCALEALPHA does not affect numbers.
65		input_string_transform_func_no_op = input_string_transform_factory(ns.LOCALEALPHA)
66		assert input_string_transform_func_no_op(thousands_int_str) == thousands_int_str
67
68
69		# These might be too much to test with hypothesis.
70
71
72		@pytest.mark.parametrize(
73		"x, expected",
74		[
75		("12,543,642642.5345,34980", "12543,642642.5345,34980"),
76		("12,59443,642,642.53,4534980", "12,59443,642642.53,4534980"), # No change
77		("12543,642,642.5,34534980", "12543,642642.5,34534980"),
78		],
79		)
80		@pytest.mark.usefixtures("with_locale_en_us")
81		def test_input_string_transform_factory_handles_us_locale(x, expected):
82		input_string_transform_func = input_string_transform_factory(ns.LOCALE)
83		assert input_string_transform_func(x) == expected
84
85
86		@pytest.mark.parametrize(
87		"alg, expected",
88		[
89		(ns.LOCALE, "1543,753"), # Does nothing without FLOAT
90		(ns.LOCALE | ns.FLOAT, "1543.753"),
91		(ns.LOCALEALPHA, "1543,753"), # LOCALEALPHA won't do anything, need LOCALENUM
92		],
93		)
94		@pytest.mark.usefixtures("with_locale_de_de")
95		def test_input_string_transform_factory_handles_german_locale(alg, expected):
96		input_string_transform_func = input_string_transform_factory(alg)
97		assert input_string_transform_func("1543,753") == expected
98
99
100		@pytest.mark.usefixtures("with_locale_de_de")
101		def test_input_string_transform_factory_does_nothing_with_non_num_input():
102		input_string_transform_func = input_string_transform_factory(ns.LOCALE | ns.FLOAT)
103		expected = "154s,t53"
104		assert input_string_transform_func("154s,t53") == expected
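
The `thousands_separated_int` helper above builds a comma-grouped string by hand, and the test then checks that `ns.LOCALE` strips the separators again. For well-formed integers the same round trip can be sketched with the standard format mini-language. The function names here are hypothetical, and the naive `replace` does not reproduce natsort's handling of malformed groupings such as those in the parametrized US-locale cases.

```python
def add_thousands_separators(digits):
    """Group an integer string in threes from the right, e.g. 1234 -> 1,234."""
    return format(int(digits), ",")


def strip_thousands_separators(text):
    """Naively drop every comma; only safe for well-formed integers."""
    return text.replace(",", "")


grouped = add_thousands_separators("12543642642534980")
print(grouped)
print(strip_thousands_separators(grouped))
```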

-223

~~test_natsort/test_main.py~~

0		# -*- coding: utf-8 -*-
1		"""\
2		Test the natsort command-line tool functions.
3		"""
4		from __future__ import print_function, unicode_literals
5
6		import re
7		import sys
8
9		import pytest
10		from hypothesis import given
11		from hypothesis.strategies import data, floats, integers, lists
12		from natsort.__main__ import (
13		check_filters,
14		keep_entry_range,
15		keep_entry_value,
16		main,
17		range_check,
18		sort_and_print_entries,
19		)
20
21
22		def test_main_passes_default_arguments_with_no_command_line_options(mocker):
23		p = mocker.patch("natsort.__main__.sort_and_print_entries")
24		main("num-2", "num-6", "num-1")
25		args = p.call_args[0][1]
26		assert not args.paths
27		assert args.filter is None
28		assert args.reverse_filter is None
29		assert args.exclude is None
30		assert not args.reverse
31		assert args.number_type == "int"
32		assert not args.signed
33		assert args.exp
34		assert not args.locale
35
36
37		def test_main_passes_arguments_with_all_command_line_options(mocker):
38		arguments = ["--paths", "--reverse", "--locale"]
39		arguments.extend(["--filter", "4", "10"])
40		arguments.extend(["--reverse-filter", "100", "110"])
41		arguments.extend(["--number-type", "float"])
42		arguments.extend(["--noexp", "--sign"])
43		arguments.extend(["--exclude", "34"])
44		arguments.extend(["--exclude", "35"])
45		arguments.extend(["num-2", "num-6", "num-1"])
46		p = mocker.patch("natsort.__main__.sort_and_print_entries")
47		main(*arguments)
48		args = p.call_args[0][1]
49		assert args.paths
50		assert args.filter == [(4.0, 10.0)]
51		assert args.reverse_filter == [(100.0, 110.0)]
52		assert args.exclude == [34, 35]
53		assert args.reverse
54		assert args.number_type == "float"
55		assert args.signed
56		assert not args.exp
57		assert args.locale
58
59
60		class Args:
61		"""A dummy class to simulate the argparse Namespace object"""
62
63		def __init__(self, filt, reverse_filter, exclude, as_path, reverse):
64		self.filter = filt
65		self.reverse_filter = reverse_filter
66		self.exclude = exclude
67		self.reverse = reverse
68		self.number_type = "float"
69		self.signed = True
70		self.exp = True
71		self.paths = as_path
72		self.locale = 0
73
74
75		mock_print = "__builtin__.print" if sys.version[0] == "2" else "builtins.print"
76
77		entries = [
78		"tmp/a57/path2",
79		"tmp/a23/path1",
80		"tmp/a1/path1",
81		"tmp/a1 (1)/path1",
82		"tmp/a130/path1",
83		"tmp/a64/path1",
84		"tmp/a64/path2",
85		]
86
87
88		@pytest.mark.parametrize(
89		"options, order",
90		[
91		# Defaults, all options false
92		# tmp/a1 (1)/path1
93		# tmp/a1/path1
94		# tmp/a23/path1
95		# tmp/a57/path2
96		# tmp/a64/path1
97		# tmp/a64/path2
98		# tmp/a130/path1
99		([None, None, False, False, False], [3, 2, 1, 0, 5, 6, 4]),
100		# Path option True
101		# tmp/a1/path1
102		# tmp/a1 (1)/path1
103		# tmp/a23/path1
104		# tmp/a57/path2
105		# tmp/a64/path1
106		# tmp/a64/path2
107		# tmp/a130/path1
108		([None, None, False, True, False], [2, 3, 1, 0, 5, 6, 4]),
109		# Filter option keeps only within range
110		# tmp/a23/path1
111		# tmp/a57/path2
112		# tmp/a64/path1
113		# tmp/a64/path2
114		([[(20, 100)], None, False, False, False], [1, 0, 5, 6]),
115		# Reverse filter, exclude in range
116		# tmp/a1/path1
117		# tmp/a1 (1)/path1
118		# tmp/a130/path1
119		([None, [(20, 100)], False, True, False], [2, 3, 4]),
120		# Exclude given values with exclude list
121		# tmp/a1/path1
122		# tmp/a1 (1)/path1
123		# tmp/a57/path2
124		# tmp/a64/path1
125		# tmp/a64/path2
126		([None, None, [23, 130], True, False], [2, 3, 0, 5, 6]),
127		# Reverse order
128		# tmp/a130/path1
129		# tmp/a64/path2
130		# tmp/a64/path1
131		# tmp/a57/path2
132		# tmp/a23/path1
133		# tmp/a1 (1)/path1
134		# tmp/a1/path1
135		([None, None, False, True, True], reversed([2, 3, 1, 0, 5, 6, 4])),
136		],
137		)
138		def test_sort_and_print_entries(options, order, mocker):
139		p = mocker.patch(mock_print)
140		sort_and_print_entries(entries, Args(*options))
141		e = [mocker.call(entries[i]) for i in order]
142		p.assert_has_calls(e)
143
144
145		# Each test has an "example" version for demonstrative purposes,
146		# and a test that uses the hypothesis module.
147
148
149		def test_range_check_returns_range_as_is_but_with_floats_example():
150		assert range_check(10, 11) == (10.0, 11.0)
151		assert range_check(6.4, 30) == (6.4, 30.0)
152
153
154		@given(x=floats(allow_nan=False, min_value=-1E8, max_value=1E8) | integers(), d=data())
155		def test_range_check_returns_range_as_is_if_first_is_less_than_second(x, d):
156		# Pull data such that the first is less than the second.
157		if isinstance(x, float):
158		y = d.draw(floats(min_value=x + 1.0, max_value=1E9, allow_nan=False))
159		else:
160		y = d.draw(integers(min_value=x + 1))
161		assert range_check(x, y) == (x, y)
162
163
164		def test_range_check_raises_value_error_if_second_is_less_than_first_example():
165		with pytest.raises(ValueError, match="low >= high"):
166		range_check(7, 2)
167
168
169		@given(x=floats(allow_nan=False), d=data())
170		def test_range_check_raises_value_error_if_second_is_less_than_first(x, d):
171		# Pull data such that the first is greater than or equal to the second.
172		y = d.draw(floats(max_value=x, allow_nan=False))
173		with pytest.raises(ValueError, match="low >= high"):
174		range_check(x, y)
175
176
177		def test_check_filters_returns_none_if_filter_evaluates_to_false():
178		assert check_filters(()) is None
179		assert check_filters(False) is None
180		assert check_filters(None) is None
181
182
183		def test_check_filters_returns_input_as_is_if_filter_is_valid_example():
184		assert check_filters([(6, 7)]) == [(6, 7)]
185		assert check_filters([(6, 7), (2, 8)]) == [(6, 7), (2, 8)]
186
187
188		@given(x=lists(integers(), min_size=1), d=data())
189		def test_check_filters_returns_input_as_is_if_filter_is_valid(x, d):
190		# ensure y is element-wise greater than x
191		y = [d.draw(integers(min_value=val + 1)) for val in x]
192		assert check_filters(list(zip(x, y))) == [(i, j) for i, j in zip(x, y)]
193
194
195		def test_check_filters_raises_value_error_if_filter_is_invalid_example():
196		with pytest.raises(ValueError, match="Error in --filter: low >= high"):
197		check_filters([(7, 2)])
198
199
200		@given(x=lists(integers(), min_size=1), d=data())
201		def test_check_filters_raises_value_error_if_filter_is_invalid(x, d):
202		# ensure y is element-wise less than or equal to x
203		y = [d.draw(integers(max_value=val)) for val in x]
204		with pytest.raises(ValueError, match="Error in --filter: low >= high"):
205		check_filters(list(zip(x, y)))
206
207
208		@pytest.mark.parametrize(
209		"lows, highs, truth",
210		# 1. Any portion is between the bounds => True.
211		# 2. Any portion is between any bounds => True.
212		# 3. No portion is between the bounds => False.
213		[([0], [100], True), ([1, 88], [20, 90], True), ([1], [20], False)],
214		)
215		def test_keep_entry_range(lows, highs, truth):
216		assert keep_entry_range("a56b23c89", lows, highs, int, re.compile(r"\d+")) is truth
217
218
219		# 1. Values not in entry => True. 2. Values in entry => False.
220		@pytest.mark.parametrize("values, truth", [([100, 45], True), ([23], False)])
221		def test_keep_entry_value(values, truth):
222		assert keep_entry_value("a56b23c89", values, int, re.compile(r"\d+")) is truth
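
The range and filter tests above pin down two behaviors: a range is returned as a pair of floats unless `low >= high`, and an entry is kept when any number in it falls inside any of the given windows. The sketch below is inferred from those assertions alone; the `sketch_` functions are hypothetical and are not the code in `natsort.__main__`.

```python
import re


def sketch_range_check(low, high):
    """Return (low, high) as floats, rejecting empty or inverted ranges."""
    low, high = float(low), float(high)
    if low >= high:
        raise ValueError("low >= high")
    return low, high


def sketch_keep_entry_range(entry, lows, highs, converter, regex):
    """True if any number found in entry lies inside any [low, high] window."""
    numbers = [converter(n) for n in regex.findall(entry)]
    return any(
        low <= n <= high
        for n in numbers
        for low, high in zip(lows, highs)
    )


digits = re.compile(r"\d+")
print(sketch_range_check(10, 11))
print(sketch_keep_entry_range("a56b23c89", [1, 88], [20, 90], int, digits))
print(sketch_keep_entry_range("a56b23c89", [1], [20], int, digits))
```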

-83

~~test_natsort/test_natsort_cmp.py~~

0		# -*- coding: utf-8 -*-
1		# pylint: disable=unused-variable
2		"""These test the natcmp() function.
3
4		Note that these tests are only relevant for Python version < 3.
5		"""
6		from functools import partial
7
8		import pytest
9		from hypothesis import given
10		from hypothesis.strategies import floats, integers, lists
11		from natsort import ns
12		from natsort.compat.py23 import PY_VERSION, py23_cmp
13
14		if PY_VERSION < 3:
15		from natsort import natcmp
16
17
18		class Comparable(object):
19		"""Stub class for testing natcmp functionality."""
20
21		def __init__(self, value):
22		self.value = value
23
24		def __cmp__(self, other):
25		return natcmp(self.value, other.value)
26
27
28		@pytest.mark.skipif(PY_VERSION >= 3.0, reason="cmp() deprecated in Python 3")
29		class TestNatCmp:
30
31		def test_classes_can_be_compared(self):
32		one = Comparable("1")
33		two = Comparable("2")
34		another_two = Comparable("2")
35		ten = Comparable("10")
36		assert ten > two == another_two > one
37
38		def test_keys_are_being_cached(self, mocker):
39		natcmp.cached_keys = {}
40		assert len(natcmp.cached_keys) == 0
41		natcmp(0, 0)
42		assert len(natcmp.cached_keys) == 1
43		natcmp(0, 0)
44		assert len(natcmp.cached_keys) == 1
45
46		with mocker.patch("natsort.compat.locale.dumb_sort", return_value=False):
47		natcmp(0, 0, alg=ns.L)
48		assert len(natcmp.cached_keys) == 2
49		natcmp(0, 0, alg=ns.L)
50		assert len(natcmp.cached_keys) == 2
51
52		with mocker.patch("natsort.compat.locale.dumb_sort", return_value=True):
53		natcmp(0, 0, alg=ns.L)
54		assert len(natcmp.cached_keys) == 3
55		natcmp(0, 0, alg=ns.L)
56		assert len(natcmp.cached_keys) == 3
57
58		def test_illegal_algorithm_raises_error(self):
59		with pytest.raises(ValueError):
60		natcmp(0, 0, alg="Just random stuff")
61
62		def test_classes_can_utilize_max_or_min(self):
63		comparables = [Comparable(i) for i in range(10)]
64
65		assert max(comparables) == comparables[-1]
66		assert min(comparables) == comparables[0]
67
68		@given(integers(), integers())
69		def test_natcmp_works_the_same_for_integers_as_cmp(self, x, y):
70		assert py23_cmp(x, y) == natcmp(x, y)
71
72		@given(floats(allow_nan=False), floats(allow_nan=False))
73		def test_natcmp_works_the_same_for_floats_as_cmp(self, x, y):
74		assert py23_cmp(x, y) == natcmp(x, y)
75
76		@given(lists(elements=integers()))
77		def test_sort_strings_with_numbers(self, a_list):
78		strings = [str(var) for var in a_list]
79		# noinspection PyArgumentList
80		natcmp_sorted = sorted(strings, cmp=partial(natcmp, alg=ns.SIGNED))
81
82		assert sorted(a_list) == [int(var) for var in natcmp_sorted]
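
These tests are skipped on Python 3 because `sorted()` no longer accepts a `cmp` argument there. A comparison function can still be used through `functools.cmp_to_key`. The sketch below shows the idea with a deliberately simplified key; `sketch_key` and `sketch_natcmp` are hypothetical and much simpler than natsort's `natcmp`.

```python
import re
from functools import cmp_to_key


def sketch_key(text):
    """Split text into alternating string and integer chunks."""
    chunks = re.split(r"(\d+)", text)
    # Odd positions hold the digit runs captured by the group.
    return tuple(int(c) if i % 2 else c for i, c in enumerate(chunks))


def sketch_natcmp(x, y):
    """Return -1, 0, or 1, mirroring the contract of Python 2's cmp()."""
    kx, ky = sketch_key(x), sketch_key(y)
    return (kx > ky) - (kx < ky)


names = ["a10", "a2", "a1"]
result = sorted(names, key=cmp_to_key(sketch_natcmp))
print(result)
```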

-49

~~test_natsort/test_natsort_key.py~~

0		# -*- coding: utf-8 -*-
1		"""These test the utils.py functions."""
2		from __future__ import unicode_literals
3
4		import pytest
5		from hypothesis import given
6		from hypothesis.strategies import binary, floats, integers, lists, text
7		from natsort.compat.py23 import PY_VERSION, py23_str
8		from natsort.utils import natsort_key
9
10		if PY_VERSION >= 3:
11		long = int
12
13
14		def str_func(x):
15		if isinstance(x, py23_str):
16		return x
17		else:
18		raise TypeError("Not a str!")
19
20
21		def fail(_):
22		raise AssertionError("This should never be reached!")
23
24
25		@given(floats(allow_nan=False) | integers())
26		def test_natsort_key_with_numeric_input_takes_number_path(x):
27		assert natsort_key(x, None, str_func, fail, lambda y: y) is x
28
29
30		@pytest.mark.skipif(PY_VERSION < 3, reason="only valid on python3")
31		@given(binary().filter(bool))
32		def test_natsort_key_with_bytes_input_takes_bytes_path(x):
33		assert natsort_key(x, None, str_func, lambda y: y, fail) is x
34
35
36		@given(text())
37		def test_natsort_key_with_text_input_takes_string_path(x):
38		assert natsort_key(x, None, str_func, fail, fail) is x
39
40
41		@given(lists(elements=text(), min_size=1, max_size=10))
42		def test_natsort_key_with_nested_input_takes_nested_path(x):
43		assert natsort_key(x, None, str_func, fail, fail) == tuple(x)
44
45
46		@given(text())
47		def test_natsort_key_with_key_argument_applies_key_before_processing(x):
48		assert natsort_key(x, len, str_func, fail, lambda y: y) == len(x)
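
The five tests above exercise one dispatch: apply `key` first, then route text to the string handler, bytes to the bytes handler, iterables to a recursive call, and anything else to the number handler. The sketch below reproduces that routing with `isinstance` checks. That is an assumption made for readability; `sketch_natsort_key` is hypothetical, and the real `natsort.utils.natsort_key` may dispatch differently.

```python
def sketch_natsort_key(val, key, string_func, bytes_func, num_func):
    """Route val to the handler matching its type, after applying key."""
    if key is not None:
        val = key(val)
    if isinstance(val, str):
        return string_func(val)
    if isinstance(val, bytes):
        return bytes_func(val)
    try:
        items = iter(val)
    except TypeError:
        # Not iterable, so treat it as a number.
        return num_func(val)
    # Lists, tuples, and other iterables are handled element by element.
    return tuple(
        sketch_natsort_key(v, None, string_func, bytes_func, num_func)
        for v in items
    )


def identity(y):
    return y


print(sketch_natsort_key(["a", 5, b"c"], None, identity, identity, identity))
print(sketch_natsort_key("hello", len, identity, identity, identity))
```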

-168

~~test_natsort/test_natsort_keygen.py~~

0		# -*- coding: utf-8 -*-
1		"""\
2		Here are a collection of examples of how this module can be used.
3		See the README or the natsort homepage for more details.
4		"""
5		from __future__ import print_function, unicode_literals
6
7		import pytest
8		from natsort import natsort_key, natsort_keygen, natsorted, ns
9		from natsort.compat.locale import get_strxfrm, null_string_locale
10		from natsort.compat.py23 import PY_VERSION
11
12
13		@pytest.fixture
14		def arbitrary_input():
15		return ["6A-5.034e+1", "/Folder (1)/Foo", 56.7]
16
17
18		@pytest.fixture
19		def bytes_input():
20		return b"6A-5.034e+1"
21
22
23		def test_natsort_keygen_demonstration():
24		original_list = ["a50", "a51.", "a50.31", "a50.4", "a5.034e1", "a50.300"]
25		copy_of_list = original_list[:]
26		original_list.sort(key=natsort_keygen(alg=ns.F))
27		# natsorted uses the output of natsort_keygen under the hood.
28		assert original_list == natsorted(copy_of_list, alg=ns.F)
29
30
31		def test_natsort_key_public():
32		assert natsort_key("a-5.034e2") == ("a-", 5, ".", 34, "e", 2)
33
34
35		def test_natsort_keygen_with_invalid_alg_input_raises_value_error():
36		# Invalid arguments give the correct response
37		with pytest.raises(ValueError, match="'alg' argument"):
38		natsort_keygen(None, "1")
39
40
41		@pytest.mark.parametrize(
42		"alg, expected",
43		[(ns.DEFAULT, ("a-", 5, ".", 34, "e", 1)), (ns.FLOAT | ns.SIGNED, ("a", -50.34))],
44		)
45		def test_natsort_keygen_returns_natsort_key_that_parses_input(alg, expected):
46		ns_key = natsort_keygen(alg=alg)
47		assert ns_key("a-5.034e1") == expected
48
49
50		@pytest.mark.parametrize(
51		"alg, expected",
52		[
53		(
54		ns.DEFAULT,
55		(("", 6, "A-", 5, ".", 34, "e+", 1), ("/Folder (", 1, ")/Foo"), ("", 56.7)),
56		),
57		(
58		ns.IGNORECASE,
59		(("", 6, "a-", 5, ".", 34, "e+", 1), ("/folder (", 1, ")/foo"), ("", 56.7)),
60		),
61		(ns.REAL, (("", 6.0, "A", -50.34), ("/Folder (", 1.0, ")/Foo"), ("", 56.7))),
62		(
63		ns.LOWERCASEFIRST | ns.FLOAT | ns.NOEXP,
64		(
65		("", 6.0, "a-", 5.034, "E+", 1.0),
66		("/fOLDER (", 1.0, ")/fOO"),
67		("", 56.7),
68		),
69		),
70		(
71		ns.PATH | ns.GROUPLETTERS,
72		(
73		(("", 6, "aA--", 5, "..", 34, "ee++", 1),),
74		(("//",), ("fFoollddeerr ((", 1, "))"), ("fFoooo",)),
75		(("", 56.7),),
76		),
77		),
78		],
79		)
80		def test_natsort_keygen_handles_arbitrary_input(arbitrary_input, alg, expected):
81		ns_key = natsort_keygen(alg=alg)
82		assert ns_key(arbitrary_input) == expected
83
84
85		@pytest.mark.parametrize(
86		"alg, expected",
87		[
88		(ns.DEFAULT, (b"6A-5.034e+1",)),
89		(ns.IGNORECASE, (b"6a-5.034e+1",)),
90		(ns.REAL, (b"6A-5.034e+1",)),
91		(ns.LOWERCASEFIRST | ns.FLOAT | ns.NOEXP, (b"6A-5.034e+1",)),
92		(ns.PATH | ns.GROUPLETTERS, ((b"6A-5.034e+1",),)),
93		],
94		)
95		@pytest.mark.skipif(PY_VERSION < 3.0, reason="special bytes handling only on Python3")
96		def test_natsort_keygen_handles_bytes_input(bytes_input, alg, expected):
97		ns_key = natsort_keygen(alg=alg)
98		assert ns_key(bytes_input) == expected
99
100
101		@pytest.mark.parametrize(
102		"alg, expected, is_dumb",
103		[
104		(
105		ns.LOCALE,
106		(
107		(null_string_locale, 6, "A-", 5, ".", 34, "e+", 1),
108		("/Folder (", 1, ")/Foo"),
109		(null_string_locale, 56.7),
110		),
111		False,
112		),
113		(
114		ns.LOCALE,
115		(
116		(null_string_locale, 6, "aa--", 5, "..", 34, "eE++", 1),
117		("//ffoOlLdDeErR ((", 1, "))//ffoOoO"),
118		(null_string_locale, 56.7),
119		),
120		True,
121		),
122		(
123		ns.LOCALE | ns.CAPITALFIRST,
124		(
125		(("",), (null_string_locale, 6, "A-", 5, ".", 34, "e+", 1)),
126		(("/",), ("/Folder (", 1, ")/Foo")),
127		(("",), (null_string_locale, 56.7)),
128		),
129		False,
130		),
131		],
132		)
133		@pytest.mark.usefixtures("with_locale_en_us")
134		def test_natsort_keygen_with_locale(mocker, arbitrary_input, alg, expected, is_dumb):
135		# First, apply the correct strxfrm function to the string values.
136		strxfrm = get_strxfrm()
137		expected = [list(sub) for sub in expected]
138		try:
139		for i in (2, 4, 6):
140		expected[0][i] = strxfrm(expected[0][i])
141		for i in (0, 2):
142		expected[1][i] = strxfrm(expected[1][i])
143		expected = tuple(tuple(sub) for sub in expected)
144		except IndexError: # ns.LOCALE | ns.CAPITALFIRST
145		expected = [[list(subsub) for subsub in sub] for sub in expected]
146		for i in (2, 4, 6):
147		expected[0][1][i] = strxfrm(expected[0][1][i])
148		for i in (0, 2):
149		expected[1][1][i] = strxfrm(expected[1][1][i])
150		expected = tuple(tuple(tuple(subsub) for subsub in sub) for sub in expected)
151
152		with mocker.patch("natsort.compat.locale.dumb_sort", return_value=is_dumb):
153		ns_key = natsort_keygen(alg=alg)
154		assert ns_key(arbitrary_input) == expected
155
156
157		@pytest.mark.parametrize(
158		"alg, is_dumb",
159		[(ns.LOCALE, False), (ns.LOCALE, True), (ns.LOCALE | ns.CAPITALFIRST, False)],
160		)
161		@pytest.mark.skipif(PY_VERSION < 3.0, reason="special bytes handling only on Python3")
162		@pytest.mark.usefixtures("with_locale_en_us")
163		def test_natsort_keygen_with_locale_bytes(mocker, bytes_input, alg, is_dumb):
164		expected = (b"6A-5.034e+1",)
165		with mocker.patch("natsort.compat.locale.dumb_sort", return_value=is_dumb):
166		ns_key = natsort_keygen(alg=alg)
167		assert ns_key(bytes_input) == expected
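
The expected tuples in the keygen tests show how one string yields different keys depending on which number pattern is used to split it. The sketch below reproduces two of the expected values, the default unsigned-integer key and the `ns.FLOAT | ns.SIGNED` key, using plain `re.split`. The regular expressions and the `sketch_split` helper are illustrative assumptions, not natsort's own patterns, and edge cases such as empty input are not covered.

```python
import re

# One capture group each, so re.split alternates text, number, text, ...
INT_RE = re.compile(r"(\d+)")
SIGNED_FLOAT_RE = re.compile(r"([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def sketch_split(text, regex, convert):
    """Split text on regex and convert the captured number chunks."""
    out = []
    for i, piece in enumerate(regex.split(text)):
        if i % 2:
            out.append(convert(piece))
        elif piece or i == 0:
            # Keep a leading "" so keys always start with a string,
            # but drop other empty strings such as a trailing one.
            out.append(piece)
    return tuple(out)


print(sketch_split("a-5.034e1", INT_RE, int))
print(sketch_split("a-5.034e1", SIGNED_FLOAT_RE, float))
print(sketch_split("6A-5.034e+1", INT_RE, int))
```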

-299

~~test_natsort/test_natsorted.py~~

0		# -*- coding: utf-8 -*-
1		"""\
2		Here are a collection of examples of how this module can be used.
3		See the README or the natsort homepage for more details.
4		"""
5		from __future__ import print_function, unicode_literals
6
7		from operator import itemgetter
8
9		import pytest
10		from natsort import as_utf8, natsorted, ns
11		from natsort.compat.py23 import PY_VERSION
12		from pytest import raises
13
14
15		@pytest.fixture
16		def float_list():
17		return ["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"]
18
19
20		@pytest.fixture
21		def fruit_list():
22		return ["Apple", "corn", "Corn", "Banana", "apple", "banana"]
23
24
25		@pytest.fixture
26		def mixed_list():
27		return ["Ä", "0", "ä", 3, "b", 1.5, "2", "Z"]
28
29
30		def test_natsorted_numbers_in_ascending_order():
31		given = ["a2", "a5", "a9", "a1", "a4", "a10", "a6"]
32		expected = ["a1", "a2", "a4", "a5", "a6", "a9", "a10"]
33		assert natsorted(given) == expected
34
35
36		def test_natsorted_can_sort_as_signed_floats_with_exponents(float_list):
37		expected = ["a-50", "a50", "a50.300", "a50.31", "a5.034e1", "a50.4", "a51."]
38		assert natsorted(float_list, alg=ns.REAL) == expected
39
40
41		@pytest.mark.parametrize(
42		# UNSIGNED is default
43		"alg",
44		[ns.NOEXP | ns.FLOAT | ns.UNSIGNED, ns.NOEXP | ns.FLOAT],
45		)
46		def test_natsorted_can_sort_as_unsigned_and_ignore_exponents(float_list, alg):
47		expected = ["a5.034e1", "a50", "a50.300", "a50.31", "a50.4", "a51.", "a-50"]
48		assert natsorted(float_list, alg=alg) == expected
49
50
51		# INT, DIGIT, and VERSION are all equivalent.
52		@pytest.mark.parametrize("alg", [ns.DEFAULT, ns.INT, ns.DIGIT, ns.VERSION])
53		def test_natsorted_can_sort_as_unsigned_ints_which_is_default(float_list, alg):
54		expected = ["a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51.", "a-50"]
55		assert natsorted(float_list, alg=alg) == expected
56
57
58		def test_natsorted_can_sort_as_signed_ints(float_list):
59		expected = ["a-50", "a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51."]
60		assert natsorted(float_list, alg=ns.SIGNED) == expected
61
62
63		@pytest.mark.parametrize(
64		"alg, expected",
65		[(ns.UNSIGNED, ["a7", "a+2", "a-5"]), (ns.SIGNED, ["a-5", "a+2", "a7"])],
66		)
67		def test_natsorted_can_sort_with_or_without_accounting_for_sign(alg, expected):
68		given = ["a-5", "a7", "a+2"]
69		assert natsorted(given, alg=alg) == expected
70
71
72		@pytest.mark.parametrize("alg", [ns.DEFAULT, ns.VERSION])
73		def test_natsorted_can_sort_as_version_numbers(alg):
74		given = ["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"]
75		expected = ["1.9.9a", "1.9.9b", "1.10.1", "1.11", "1.11.4"]
76		assert natsorted(given, alg=alg) == expected
77
78
79		@pytest.mark.parametrize(
80		"alg, expected",
81		[
82		(ns.DEFAULT, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
83		(ns.NUMAFTER, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
84		],
85		)
86		def test_natsorted_handles_mixed_types(mixed_list, alg, expected):
87		assert natsorted(mixed_list, alg=alg) == expected
88
89
90		@pytest.mark.parametrize(
91		"alg, expected, slc",
92		[
93		(ns.DEFAULT, [float("nan"), 5, "25", 1E40], slice(1, None)),
94		(ns.NANLAST, [5, "25", 1E40, float("nan")], slice(None, 3)),
95		],
96		)
97		def test_natsorted_handles_nan(alg, expected, slc):
98		given = ["25", 5, float("nan"), 1E40]
99		# The slice is because NaN != NaN
100		# noinspection PyUnresolvedReferences
101		assert natsorted(given, alg=alg)[slc] == expected[slc]
102
103
104		@pytest.mark.skipif(PY_VERSION < 3.0, reason="error is only raised on Python 3")
105		def test_natsorted_with_mixed_bytes_and_str_input_raises_type_error():
106		with raises(TypeError, match="bytes"):
107		natsorted(["ä", b"b"])
108
109		# ...unless you use as_utf8 (or some other decoder).
110		assert natsorted(["ä", b"b"], key=as_utf8) == ["ä", b"b"]
111
112
113		def test_natsorted_raises_type_error_for_non_iterable_input():
114		with raises(TypeError, match="'int' object is not iterable"):
115		natsorted(100)
116
117
118		def test_natsorted_recurses_into_nested_lists():
119		given = [["a1", "a5"], ["a1", "a40"], ["a10", "a1"], ["a2", "a5"]]
120		expected = [["a1", "a5"], ["a1", "a40"], ["a2", "a5"], ["a10", "a1"]]
121		assert natsorted(given) == expected
122
123
124		def test_natsorted_applies_key_to_each_list_element_before_sorting_list():
125		given = [("a", "num3"), ("b", "num5"), ("c", "num2")]
126		expected = [("c", "num2"), ("a", "num3"), ("b", "num5")]
127		assert natsorted(given, key=itemgetter(1)) == expected
128
129
130		def test_natsorted_returns_list_in_reversed_order_with_reverse_option(float_list):
131		expected = natsorted(float_list)[::-1]
132		assert natsorted(float_list, reverse=True) == expected
133
134
135		def test_natsorted_handles_filesystem_paths():
136		given = [
137		"/p/Folder (10)/file.tar.gz",
138		"/p/Folder/file.tar.gz",
139		"/p/Folder (1)/file (1).tar.gz",
140		"/p/Folder (1)/file.tar.gz",
141		]
142		expected_correct = [
143		"/p/Folder/file.tar.gz",
144		"/p/Folder (1)/file.tar.gz",
145		"/p/Folder (1)/file (1).tar.gz",
146		"/p/Folder (10)/file.tar.gz",
147		]
148		expected_incorrect = [
149		"/p/Folder (1)/file (1).tar.gz",
150		"/p/Folder (1)/file.tar.gz",
151		"/p/Folder (10)/file.tar.gz",
152		"/p/Folder/file.tar.gz",
153		]
154		# Is incorrect by default.
155		assert natsorted(given) == expected_incorrect
156		# Need ns.PATH to make it correct.
157		assert natsorted(given, alg=ns.PATH) == expected_correct
158
159
160		def test_natsorted_handles_numbers_and_filesystem_paths_simultaneously():
161		# You can sort paths and numbers, not that you'd want to
162		given = ["/Folder (9)/file.exe", 43]
163		expected = [43, "/Folder (9)/file.exe"]
164		assert natsorted(given, alg=ns.PATH) == expected
165
166
167		@pytest.mark.parametrize(
168		"alg, expected",
169		[
170		(ns.DEFAULT, ["Apple", "Banana", "Corn", "apple", "banana", "corn"]),
171		(ns.IGNORECASE, ["Apple", "apple", "Banana", "banana", "corn", "Corn"]),
172		(ns.LOWERCASEFIRST, ["apple", "banana", "corn", "Apple", "Banana", "Corn"]),
173		(ns.GROUPLETTERS, ["Apple", "apple", "Banana", "banana", "Corn", "corn"]),
174		(ns.G \| ns.LF, ["apple", "Apple", "banana", "Banana", "corn", "Corn"]),
175		],
176		)
177		def test_natsorted_supports_case_handling(alg, expected, fruit_list):
178		assert natsorted(fruit_list, alg=alg) == expected
179
180
181		@pytest.mark.parametrize(
182		"alg, expected",
183		[
184		(ns.DEFAULT, [("A5", "a6"), ("a3", "a1")]),
185		(ns.LOWERCASEFIRST, [("a3", "a1"), ("A5", "a6")]),
186		(ns.IGNORECASE, [("a3", "a1"), ("A5", "a6")]),
187		],
188		)
189		def test_natsorted_supports_nested_case_handling(alg, expected):
190		given = [("A5", "a6"), ("a3", "a1")]
191		assert natsorted(given, alg=alg) == expected
192
193
194		@pytest.mark.parametrize(
195		"alg, expected",
196		[
197		(ns.DEFAULT, ["apple", "Apple", "banana", "Banana", "corn", "Corn"]),
198		(ns.CAPITALFIRST, ["Apple", "Banana", "Corn", "apple", "banana", "corn"]),
199		(ns.LOWERCASEFIRST, ["Apple", "apple", "Banana", "banana", "Corn", "corn"]),
200		(ns.C \| ns.LF, ["apple", "banana", "corn", "Apple", "Banana", "Corn"]),
201		],
202		)
203		@pytest.mark.usefixtures("with_locale_en_us")
204		def test_natsorted_can_sort_using_locale(fruit_list, alg, expected):
205		assert natsorted(fruit_list, alg=ns.LOCALE \| alg) == expected
206
207
208		@pytest.mark.usefixtures("with_locale_en_us")
209		def test_natsorted_can_sort_locale_specific_numbers_en():
210		given = ["c", "a5,467.86", "ä", "b", "a5367.86", "a5,6", "a5,50"]
211		expected = ["a5,6", "a5,50", "a5367.86", "a5,467.86", "ä", "b", "c"]
212		assert natsorted(given, alg=ns.LOCALE \| ns.F) == expected
213
214
215		@pytest.mark.usefixtures("with_locale_de_de")
216		def test_natsorted_can_sort_locale_specific_numbers_de():
217		given = ["c", "a5.467,86", "ä", "b", "a5367.86", "a5,6", "a5,50"]
218		expected = ["a5,50", "a5,6", "a5367.86", "a5.467,86", "ä", "b", "c"]
219		assert natsorted(given, alg=ns.LOCALE \| ns.F) == expected
220
221
222		@pytest.mark.parametrize(
223		"alg, expected",
224		[
225		(ns.DEFAULT, ["0", 1.5, "2", 3, "ä", "Ä", "b", "Z"]),
226		(ns.NUMAFTER, ["ä", "Ä", "b", "Z", "0", 1.5, "2", 3]),
227		(ns.UNGROUPLETTERS, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
228		(ns.UG \| ns.NA, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
229		# Adding PATH changes nothing.
230		(ns.PATH, ["0", 1.5, "2", 3, "ä", "Ä", "b", "Z"]),
231		(ns.PATH \| ns.NUMAFTER, ["ä", "Ä", "b", "Z", "0", 1.5, "2", 3]),
232		(ns.PATH \| ns.UNGROUPLETTERS, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
233		(ns.PATH \| ns.UG \| ns.NA, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
234		],
235		)
236		@pytest.mark.usefixtures("with_locale_en_us")
237		def test_natsorted_handles_mixed_types_with_locale(mixed_list, alg, expected):
238		assert natsorted(mixed_list, alg=ns.LOCALE \| alg) == expected
239
240
241		@pytest.mark.parametrize(
242		"alg, expected",
243		[
244		(ns.DEFAULT, ["73", "5039", "Banana", "apple", "corn", "~~~~~~"]),
245		(ns.NUMAFTER, ["Banana", "apple", "corn", "~~~~~~", "73", "5039"]),
246		],
247		)
248		def test_natsorted_sorts_an_odd_collection_of_strings(alg, expected):
249		given = ["apple", "Banana", "73", "5039", "corn", "~~~~~~"]
250		assert natsorted(given, alg=alg) == expected
251
252
253		def test_natsorted_sorts_mixed_ascii_and_non_ascii_numbers():
254		given = [
255		"1st street",
256		"10th street",
257		"2nd street",
258		"2 street",
259		"1 street",
260		"1street",
261		"11 street",
262		"street 2",
263		"street 1",
264		"Street 11",
265		"۲ street",
266		"۱ street",
267		"۱street",
268		"۱۲street",
269		"۱۱ street",
270		"street ۲",
271		"street ۱",
272		"street ۱",
273		"street ۱۲",
274		"street ۱۱",
275		]
276		expected = [
277		"1 street",
278		"۱ street",
279		"1st street",
280		"1street",
281		"۱street",
282		"2 street",
283		"۲ street",
284		"2nd street",
285		"10th street",
286		"11 street",
287		"۱۱ street",
288		"۱۲street",
289		"street 1",
290		"street ۱",
291		"street ۱",
292		"street 2",
293		"street ۲",
294		"Street 11",
295		"street ۱۱",
296		"street ۱۲",
297		]
298		assert natsorted(given, alg=ns.IGNORECASE) == expected
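
The default (unsigned-integer) ordering exercised by the tests above can be approximated with the standard library alone. The sketch below is only an illustration of the idea, not natsort's implementation: `simple_natural_key` is a hypothetical helper, and it ignores signs, floats, locale, and case handling.

```python
import re

# Capturing group keeps the digit runs in the split result.
_DIGITS = re.compile(r"(\d+)")


def simple_natural_key(text):
    """Hypothetical helper: split into text and integer chunks.

    Even positions hold text, odd positions hold digit runs, so two
    keys always compare str with str and int with int.
    """
    parts = _DIGITS.split(text)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


versions = ["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"]
result = sorted(versions, key=simple_natural_key)
print(result)
```

For this input the result matches the `expected` list in `test_natsorted_can_sort_as_version_numbers`.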

-129

~~test_natsort/test_natsorted_convenience.py~~

0		# -*- coding: utf-8 -*-
1		"""\
2		Here is a collection of examples of how this module can be used.
3		See the README or the natsort homepage for more details.
4		"""
5		from __future__ import print_function, unicode_literals
6
7		from operator import itemgetter
8
9		import pytest
10		from natsort import (
11		as_ascii,
12		as_utf8,
13		decoder,
14		humansorted,
15		index_humansorted,
16		index_natsorted,
17		index_realsorted,
18		index_versorted,
19		natsorted,
20		ns,
21		order_by_index,
22		realsorted,
23		versorted,
24		)
25		from natsort.compat.py23 import PY_VERSION
26
27
28		@pytest.fixture
29		def version_list():
30		return ["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"]
31
32
33		@pytest.fixture
34		def float_list():
35		return ["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"]
36
37
38		@pytest.fixture
39		def fruit_list():
40		return ["Apple", "corn", "Corn", "Banana", "apple", "banana"]
41
42
43		def test_decoder_returns_function_that_can_decode_bytes_but_return_non_bytes_as_is():
44		func = decoder("latin1")
45		str_obj = "bytes"
46		int_obj = 14
47		assert func(b"bytes") == str_obj
48		assert func(int_obj) is int_obj # returns as-is, same object ID
49		if PY_VERSION >= 3:
50		assert (
51		func(str_obj) is str_obj
52		) # same object returned on Python3 b/c only bytes has decode
53		else:
54		assert func(str_obj) is not str_obj
55		assert (
56		func(str_obj) == str_obj
57		) # not same object on Python2 because str can decode
58
59
60		def test_as_ascii_converts_bytes_to_ascii():
61		assert decoder("ascii")(b"bytes") == as_ascii(b"bytes")
62
63
64		def test_as_utf8_converts_bytes_to_utf8():
65		assert decoder("utf8")(b"bytes") == as_utf8(b"bytes")
66
67
68		def test_versorted_is_identical_to_natsorted(version_list):
69		# versorted is retained for backwards compatibility
70		assert versorted(version_list) == natsorted(version_list)
71
72
73		def test_realsorted_is_identical_to_natsorted_with_real_alg(float_list):
74		assert realsorted(float_list) == natsorted(float_list, alg=ns.REAL)
75
76
77		@pytest.mark.usefixtures("with_locale_en_us")
78		def test_humansorted_is_identical_to_natsorted_with_locale_alg(fruit_list):
79		assert humansorted(fruit_list) == natsorted(fruit_list, alg=ns.LOCALE)
80
81
82		def test_index_natsorted_returns_integer_list_of_sort_order_for_input_list():
83		given = ["num3", "num5", "num2"]
84		other = ["foo", "bar", "baz"]
85		index = index_natsorted(given)
86		assert index == [2, 0, 1]
87		assert [given[i] for i in index] == ["num2", "num3", "num5"]
88		assert [other[i] for i in index] == ["baz", "foo", "bar"]
89
90
91		def test_index_natsorted_reverse():
92		given = ["num3", "num5", "num2"]
93		assert index_natsorted(given, reverse=True) == index_natsorted(given)[::-1]
94
95
96		def test_index_natsorted_applies_key_function_before_sorting():
97		given = [("a", "num3"), ("b", "num5"), ("c", "num2")]
98		expected = [2, 0, 1]
99		assert index_natsorted(given, key=itemgetter(1)) == expected
100
101
102		def test_index_versorted_is_identical_to_index_natsorted(version_list):
103		# index_versorted is retained for backwards compatibility
104		assert index_versorted(version_list) == index_natsorted(version_list)
105
106
107		def test_index_realsorted_is_identical_to_index_natsorted_with_real_alg(float_list):
108		assert index_realsorted(float_list) == index_natsorted(float_list, alg=ns.REAL)
109
110
111		@pytest.mark.usefixtures("with_locale_en_us")
112		def test_index_humansorted_is_identical_to_index_natsorted_with_locale_alg(fruit_list):
113		assert index_humansorted(fruit_list) == index_natsorted(fruit_list, alg=ns.LOCALE)
114
115
116		def test_order_by_index_sorts_list_according_to_order_of_integer_list():
117		given = ["num3", "num5", "num2"]
118		index = [2, 0, 1]
119		expected = [given[i] for i in index]
120		assert expected == ["num2", "num3", "num5"]
121		assert order_by_index(given, index) == expected
122
123
124		def test_order_by_index_returns_generator_with_iter_true():
125		given = ["num3", "num5", "num2"]
126		index = [2, 0, 1]
127		assert order_by_index(given, index, True) != [given[i] for i in index]
128		assert list(order_by_index(given, index, True)) == [given[i] for i in index]
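
The index-based tests above rely on a plain "argsort" pattern: compute the order once, then apply it to any parallel list. A minimal standard-library sketch follows; `sort_index` and `apply_index` are hypothetical names, and the sketch uses ordinary string comparison rather than natsort's natural ordering.

```python
def sort_index(seq, key=None):
    """Hypothetical helper: return the indexes that would sort ``seq``."""
    if key is None:
        return sorted(range(len(seq)), key=lambda i: seq[i])
    return sorted(range(len(seq)), key=lambda i: key(seq[i]))


def apply_index(seq, index):
    """Hypothetical helper: reorder ``seq`` by a precomputed index list."""
    return [seq[i] for i in index]


given = ["num3", "num5", "num2"]
other = ["foo", "bar", "baz"]
index = sort_index(given)

print(index)
print(apply_index(given, index))
print(apply_index(other, index))
```

Computing the index once is useful when several lists must be kept in step with one sorted column.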

-25

~~test_natsort/test_parse_bytes_function.py~~

0		# -*- coding: utf-8 -*-
1		"""These test the utils.py functions."""
2		from __future__ import unicode_literals
3
4		import pytest
5		from hypothesis import given
6		from hypothesis.strategies import binary
7		from natsort.ns_enum import ns
8		from natsort.utils import parse_bytes_factory
9
10
11		@pytest.mark.parametrize(
12		"alg, example_func",
13		[
14		(ns.DEFAULT, lambda x: (x,)),
15		(ns.IGNORECASE, lambda x: (x.lower(),)),
16		# With PATH, it becomes a nested tuple.
17		(ns.PATH, lambda x: ((x,),)),
18		(ns.PATH \| ns.IGNORECASE, lambda x: ((x.lower(),),)),
19		],
20		)
21		@given(x=binary())
22		def test_parse_bytest_factory_makes_function_that_returns_tuple(x, alg, example_func):
23		parse_bytes_func = parse_bytes_factory(alg)
24		assert parse_bytes_func(x) == example_func(x)

-38

~~test_natsort/test_parse_number_function.py~~

0		# -*- coding: utf-8 -*-
1		"""These test the utils.py functions."""
2		from __future__ import unicode_literals
3
4		import pytest
5		from hypothesis import given
6		from hypothesis.strategies import floats, integers
7		from natsort.ns_enum import ns
8		from natsort.utils import parse_number_factory
9
10
11		@pytest.mark.usefixtures("with_locale_en_us")
12		@pytest.mark.parametrize(
13		"alg, example_func",
14		[
15		(ns.DEFAULT, lambda x: ("", x)),
16		(ns.PATH, lambda x: (("", x),)),
17		(ns.UNGROUPLETTERS \| ns.LOCALE, lambda x: (("xx",), ("", x))),
18		(ns.PATH \| ns.UNGROUPLETTERS \| ns.LOCALE, lambda x: ((("xx",), ("", x)),)),
19		],
20		)
21		@given(x=floats(allow_nan=False) \| integers())
22		def test_parse_number_factory_makes_function_that_returns_tuple(x, alg, example_func):
23		parse_number_func = parse_number_factory(alg, "", "xx")
24		assert parse_number_func(x) == example_func(x)
25
26
27		@pytest.mark.parametrize(
28		"alg, x, result",
29		[
30		(ns.DEFAULT, 57, ("", 57)),
31		(ns.DEFAULT, float("nan"), ("", float("-inf"))), # NaN transformed to -infinity
32		(ns.NANLAST, float("nan"), ("", float("+inf"))), # NANLAST makes it +infinity
33		],
34		)
35		def test_parse_number_factory_treats_nan_special(alg, x, result):
36		parse_number_func = parse_number_factory(alg, "", "xx")
37		assert parse_number_func(x) == result
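
The NaN handling tested above (NaN becomes -infinity by default and +infinity with `NANLAST`) can be illustrated without natsort. This is a sketch under that assumption; `nan_safe` is a hypothetical helper and only mirrors the substitution the tests describe.

```python
import math


def nan_safe(x, nan_last=False):
    """Hypothetical helper: replace NaN with an infinity so it is sortable."""
    if isinstance(x, float) and math.isnan(x):
        return float("inf") if nan_last else float("-inf")
    return x


values = [25.0, 5.0, float("nan"), 1e40]
nan_first = sorted(values, key=nan_safe)
nan_last = sorted(values, key=lambda v: nan_safe(v, nan_last=True))
```

Because NaN never compares equal to itself, any check of the result has to skip the NaN position, which is why the `natsorted` NaN test slices its lists before comparing.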

-93

~~test_natsort/test_parse_string_function.py~~

0		# -*- coding: utf-8 -*-
1		"""These test the utils.py functions."""
2		from __future__ import unicode_literals
3
4		import unicodedata
5
6		import pytest
7		from hypothesis import given
8		from hypothesis.strategies import floats, integers, lists, text
9		from natsort.compat.fastnumbers import fast_float
10		from natsort.compat.py23 import py23_str
11		from natsort.ns_enum import ns, ns_DUMB
12		from natsort.utils import NumericalRegularExpressions as NumRegex
13		from natsort.utils import parse_string_factory
14
15
16		class CustomTuple(tuple):
17		"""Used to ensure what is given during testing is what is returned."""
18
19		original = None
20
21
22		def input_transform(x):
23		"""Make uppercase."""
24		try:
25		return x.upper()
26		except AttributeError:
27		return x
28
29
30		def final_transform(x, original):
31		"""Make the input a CustomTuple."""
32		t = CustomTuple(x)
33		t.original = original
34		return t
35
36
37		@pytest.fixture
38		def parse_string_func(request):
39		"""A parse_string_factory result with sample arguments."""
40		sep = ""
41		return parse_string_factory(
42		request.param, # algorithm
43		sep,
44		NumRegex.int_nosign().split,
45		input_transform,
46		fast_float,
47		final_transform,
48		)
49
50
51		@pytest.mark.parametrize("parse_string_func", [ns.DEFAULT], indirect=True)
52		@given(x=floats() \| integers())
53		def test_parse_string_factory_raises_type_error_if_given_number(x, parse_string_func):
54		with pytest.raises(TypeError):
55		assert parse_string_func(x)
56
57
58		# noinspection PyCallingNonCallable
59		@pytest.mark.parametrize(
60		"parse_string_func, orig_func",
61		[
62		(ns.DEFAULT, lambda x: x.upper()),
63		(ns.LOCALE, lambda x: x.upper()),
64		(ns.LOCALE \| ns_DUMB, lambda x: x), # This changes the "original" handling.
65		],
66		indirect=["parse_string_func"],
67		)
68		@given(
69		x=lists(
70		elements=floats(allow_nan=False) \| text() \| integers(), min_size=1, max_size=10
71		)
72		)
73		@pytest.mark.usefixtures("with_locale_en_us")
74		def test_parse_string_factory_invariance(x, parse_string_func, orig_func):
75		# parse_string_factory is the high-level combination of several dedicated
76		# functions involved in splitting and manipulating a string. The details of
77		# what those functions do are not relevant to testing parse_string_factory.
78		# What is relevant is that the form of the output matches the invariant
79		# that even elements are strings and odd are numerical. That each component
80		# function is doing what it should is tested elsewhere.
81		value = "".join(map(py23_str, x)) # Convert the input to a single string.
82		result = parse_string_func(value)
83		result_types = list(map(type, result))
84		expected_types = [py23_str if i % 2 == 0 else float for i in range(len(result))]
85		assert result_types == expected_types
86
87		# The result is in our CustomTuple.
88		assert isinstance(result, CustomTuple)
89
90		# Original should have gone through the "input_transform"
91		# which is uppercase in these tests.
92		assert result.original == orig_func(unicodedata.normalize("NFD", value))

-100

~~test_natsort/test_regex.py~~

0		# -*- coding: utf-8 -*-
1		"""These test the splitting regular expressions."""
2		from __future__ import unicode_literals
3
4		import pytest
5		from natsort.utils import NumericalRegularExpressions as NumRegex
6
7
8		regex_names = {
9		NumRegex.int_nosign(): "int_nosign",
10		NumRegex.int_sign(): "int_sign",
11		NumRegex.float_nosign_noexp(): "float_nosign_noexp",
12		NumRegex.float_sign_noexp(): "float_sign_noexp",
13		NumRegex.float_nosign_exp(): "float_nosign_exp",
14		NumRegex.float_sign_exp(): "float_sign_exp",
15		}
16
17		# Regex aliases (so lines stay a reasonable length).
18		i_u = NumRegex.int_nosign()
19		i_s = NumRegex.int_sign()
20		f_u = NumRegex.float_nosign_noexp()
21		f_s = NumRegex.float_sign_noexp()
22		f_ue = NumRegex.float_nosign_exp()
23		f_se = NumRegex.float_sign_exp()
24
25		# Assemble a test suite of regular strings and their regular expression
26		# splitting result. Organize by the input string.
27		regex_tests = {
28		"-123.45e+67": {
29		i_u: ["-", "123", ".", "45", "e+", "67", ""],
30		i_s: ["", "-123", ".", "45", "e", "+67", ""],
31		f_u: ["-", "123.45", "e+", "67", ""],
32		f_s: ["", "-123.45", "e", "+67", ""],
33		f_ue: ["-", "123.45e+67", ""],
34		f_se: ["", "-123.45e+67", ""],
35		},
36		"a-123.45e+67b": {
37		i_u: ["a-", "123", ".", "45", "e+", "67", "b"],
38		i_s: ["a", "-123", ".", "45", "e", "+67", "b"],
39		f_u: ["a-", "123.45", "e+", "67", "b"],
40		f_s: ["a", "-123.45", "e", "+67", "b"],
41		f_ue: ["a-", "123.45e+67", "b"],
42		f_se: ["a", "-123.45e+67", "b"],
43		},
44		"hello": {
45		i_u: ["hello"],
46		i_s: ["hello"],
47		f_u: ["hello"],
48		f_s: ["hello"],
49		f_ue: ["hello"],
50		f_se: ["hello"],
51		},
52		"abc12.34.56-7def": {
53		i_u: ["abc", "12", ".", "34", ".", "56", "-", "7", "def"],
54		i_s: ["abc", "12", ".", "34", ".", "56", "", "-7", "def"],
55		f_u: ["abc", "12.34", "", ".56", "-", "7", "def"],
56		f_s: ["abc", "12.34", "", ".56", "", "-7", "def"],
57		f_ue: ["abc", "12.34", "", ".56", "-", "7", "def"],
58		f_se: ["abc", "12.34", "", ".56", "", "-7", "def"],
59		},
60		"a1b2c3d4e5e6": {
61		i_u: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
62		i_s: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
63		f_u: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
64		f_s: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
65		f_ue: ["a", "1", "b", "2", "c", "3", "d", "4e5", "e", "6", ""],
66		f_se: ["a", "1", "b", "2", "c", "3", "d", "4e5", "e", "6", ""],
67		},
68		"eleven۱۱eleven11eleven১১": { # All of these are the decimal 11
69		i_u: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
70		i_s: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
71		f_u: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
72		f_s: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
73		f_ue: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
74		f_se: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
75		},
76		"12①②ⅠⅡ⅓": { # Two decimals, Two digits, Two numerals, fraction
77		i_u: ["", "12", "", "①", "", "②", "ⅠⅡ⅓"],
78		i_s: ["", "12", "", "①", "", "②", "ⅠⅡ⅓"],
79		f_u: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
80		f_s: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
81		f_ue: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
82		f_se: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
83		}
84		}
85
86
87		# From the above collections, create the parametrized tests and labels.
88		regex_params = [
89		(given, expected, regex)
90		for given, values in regex_tests.items()
91		for regex, expected in values.items()
92		]
93		labels = ["{}-{}".format(given, regex_names[regex]) for given, _, regex in regex_params]
94
95
96		@pytest.mark.parametrize("x, expected, regex", regex_params, ids=labels)
97		def test_regex_splits_correctly(x, expected, regex):
98		# noinspection PyUnresolvedReferences
99		assert regex.split(x) == expected
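
The split results listed above follow from how `re.split` treats a capturing group: the matched numbers are kept in the output, between the surrounding text. The two patterns below are simplified assumptions for illustration (ASCII sign plus `\d+`), not the patterns natsort actually compiles.

```python
import re

# Assumed, simplified stand-ins for the int_nosign / int_sign regexes.
UNSIGNED_INT = re.compile(r"(\d+)")
SIGNED_INT = re.compile(r"([-+]?\d+)")

text = "a-123.45e+67b"
unsigned_parts = UNSIGNED_INT.split(text)
signed_parts = SIGNED_INT.split(text)

print(unsigned_parts)
print(signed_parts)
```

For this input both lists agree with the `i_u` and `i_s` entries for `"a-123.45e+67b"` in the table above: the sign stays with the text in the unsigned case and moves into the number in the signed case.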

-78

~~test_natsort/test_string_component_transform_factory.py~~

0		# -*- coding: utf-8 -*-
1		"""These test the utils.py functions."""
2		from __future__ import unicode_literals
3
4		from functools import partial
5
6		import pytest
7		from hypothesis import example, given
8		from hypothesis.strategies import floats, integers, text
9		from natsort.compat.fastnumbers import fast_float, fast_int
10		from natsort.compat.locale import get_strxfrm
11		from natsort.compat.py23 import py23_range, py23_str, py23_unichr
12		from natsort.ns_enum import ns, ns_DUMB
13		from natsort.utils import groupletters, string_component_transform_factory
14
15		# There are some unicode values that are known failures with the builtin locale
16		# library on BSD systems that have nothing to do with natsort (a ValueError is
17		# raised by strxfrm). Let's filter them out.
18		try:
19		bad_uni_chars = frozenset(
20		py23_unichr(x) for x in py23_range(0X10fefd, 0X10ffff + 1)
21		)
22		except ValueError:
23		# Narrow unicode build... no worries.
24		bad_uni_chars = frozenset()
25
26
27		def no_bad_uni_chars(x, _bad_chars=bad_uni_chars):
28		"""Ensure text does not contain bad unicode characters"""
29		return not any(y in _bad_chars for y in x)
30
31
32		def no_null(x):
33		"""Ensure text does not contain a null character."""
34		return "\0" not in x
35
36
37		@pytest.mark.parametrize(
38		"alg, example_func",
39		[
40		(ns.INT, fast_int),
41		(ns.DEFAULT, fast_int),
42		(ns.FLOAT, partial(fast_float, nan=float("-inf"))),
43		(ns.FLOAT \| ns.NANLAST, partial(fast_float, nan=float("+inf"))),
44		(ns.GROUPLETTERS, partial(fast_int, key=groupletters)),
45		(ns.LOCALE, partial(fast_int, key=lambda x: get_strxfrm()(x))),
46		(
47		ns.GROUPLETTERS \| ns.LOCALE,
48		partial(fast_int, key=lambda x: get_strxfrm()(groupletters(x))),
49		),
50		(
51		ns_DUMB \| ns.LOCALE,
52		partial(fast_int, key=lambda x: get_strxfrm()(groupletters(x))),
53		),
54		(
55		ns.GROUPLETTERS \| ns.LOCALE \| ns.FLOAT \| ns.NANLAST,
56		partial(
57		fast_float,
58		key=lambda x: get_strxfrm()(groupletters(x)),
59		nan=float("+inf"),
60		),
61		),
62		],
63		)
64		@example(x=float("nan"))
65		@given(
66		x=integers()
67		\| floats()
68		\| text().filter(bool).filter(no_bad_uni_chars).filter(no_null)
69		)
70		@pytest.mark.usefixtures("with_locale_en_us")
71		def test_string_component_transform_factory(x, alg, example_func):
72		string_component_transform_func = string_component_transform_factory(alg)
73		try:
74		assert string_component_transform_func(py23_str(x)) == example_func(py23_str(x))
75		except ValueError as e: # handle broken locale lib on BSD.
76		if "is not in range" not in str(e):
77		raise
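
The `groupletters` transform used in several of the cases above pairs each character with its lowercase form, which is what makes upper- and lowercase variants of a word sort next to each other. A minimal sketch, with `group_letters` as a hypothetical stand-in consistent with the behaviour the `test_utils.py` examples describe:

```python
from itertools import chain


def group_letters(text):
    """Hypothetical stand-in: prefix every character with its lowercase form."""
    return "".join(chain.from_iterable((ch.lower(), ch) for ch in text))


fruit = ["Apple", "corn", "Corn", "Banana", "apple", "banana"]
grouped = sorted(fruit, key=group_letters)

print(group_letters("HELLO"))
print(grouped)
```

The lowercase character decides the primary order and the original character breaks ties, so the capitalized form sorts just before its lowercase twin.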

-70

~~test_natsort/test_unicode_numbers.py~~

0		# -*- coding: utf-8 -*-
1		"""\
2		Test the Unicode numbers module.
3		"""
4		from __future__ import unicode_literals
5
6		import unicodedata
7
8		from natsort.compat.py23 import py23_range, py23_unichr
9		from natsort.unicode_numbers import (
10		decimal_chars,
11		decimals,
12		digit_chars,
13		digits,
14		digits_no_decimals,
15		numeric,
16		numeric_chars,
17		numeric_hex,
18		numeric_no_decimals,
19		)
20
21
22		def test_numeric_chars_contains_only_valid_unicode_numeric_characters():
23		for a in numeric_chars:
24		assert unicodedata.numeric(a, None) is not None
25
26
27		def test_digit_chars_contains_only_valid_unicode_digit_characters():
28		for a in digit_chars:
29		assert unicodedata.digit(a, None) is not None
30
31
32		def test_decimal_chars_contains_only_valid_unicode_decimal_characters():
33		for a in decimal_chars:
34		assert unicodedata.decimal(a, None) is not None
35
36
37		def test_numeric_chars_contains_all_valid_unicode_numeric_and_digit_characters():
38		set_numeric_hex = set(numeric_hex)
39		set_numeric_chars = set(numeric_chars)
40		set_digit_chars = set(digit_chars)
41		set_decimal_chars = set(decimal_chars)
42		for i in py23_range(0X110000):
43		try:
44		a = py23_unichr(i)
45		except ValueError:
46		break
47		if a in set("0123456789"):
48		continue
49		if unicodedata.numeric(a, None) is not None:
50		assert i in set_numeric_hex
51		assert a in set_numeric_chars
52		if unicodedata.digit(a, None) is not None:
53		assert i in set_numeric_hex
54		assert a in set_digit_chars
55		if unicodedata.decimal(a, None) is not None:
56		assert i in set_numeric_hex
57		assert a in set_decimal_chars
58
59		assert set_decimal_chars.isdisjoint(digits_no_decimals)
60		assert set_digit_chars.issuperset(digits_no_decimals)
61
62		assert set_decimal_chars.isdisjoint(numeric_no_decimals)
63		assert set_numeric_chars.issuperset(numeric_no_decimals)
64
65
66		def test_combined_string_contains_all_characters_in_list():
67		assert numeric == "".join(numeric_chars)
68		assert digits == "".join(digit_chars)
69		assert decimals == "".join(decimal_chars)

-197

~~test_natsort/test_utils.py~~

0		# -*- coding: utf-8 -*-
1		"""These test the utils.py functions."""
2		from __future__ import unicode_literals
3
4		import pathlib
5		import string
6		from itertools import chain
7		from operator import neg as op_neg
8
9		import pytest
10		from hypothesis import given
11		from hypothesis.strategies import integers, lists, sampled_from, text
12		from natsort import utils
13		from natsort.compat.py23 import py23_cmp, py23_int, py23_lower, py23_str
14		from natsort.ns_enum import ns
15
16
17		def test_do_decoding_decodes_bytes_string_to_unicode():
18		assert type(utils.do_decoding(b"bytes", "ascii")) is py23_str
19		assert utils.do_decoding(b"bytes", "ascii") == "bytes"
20		assert utils.do_decoding(b"bytes", "ascii") == b"bytes".decode("ascii")
21
22
23		def test_args_to_enum_raises_typeerror_for_invalid_argument():
24		with pytest.raises(TypeError):
25		utils.args_to_enum(**{"alf": 0})
26
27
28		@pytest.mark.parametrize(
29		"kwargs, expected",
30		[
31		({"number_type": float, "signed": True, "exp": True}, ns.F \| ns.S),
32		({"number_type": float, "signed": True, "exp": False}, ns.F \| ns.N \| ns.S),
33		({"number_type": float, "signed": False, "exp": True}, ns.F \| ns.U),
34		({"number_type": float, "signed": False, "exp": True}, ns.F),
35		({"number_type": float, "signed": False, "exp": False}, ns.F \| ns.U \| ns.N),
36		({"number_type": float, "as_path": True}, ns.F \| ns.P),
37		({"number_type": int, "as_path": True}, ns.I \| ns.P),
38		({"number_type": int, "signed": False}, ns.I \| ns.U),
39		({"number_type": None, "exp": True}, ns.I \| ns.U),
40		],
41		)
42		def test_args_to_enum(kwargs, expected):
43		with pytest.warns(DeprecationWarning):
44		assert utils.args_to_enum(**kwargs) == expected
45
46
47		@pytest.mark.parametrize(
48		"alg, expected",
49		[
50		(ns.I, utils.NumericalRegularExpressions.int_nosign()),
51		(ns.I \| ns.N, utils.NumericalRegularExpressions.int_nosign()),
52		(ns.I \| ns.S, utils.NumericalRegularExpressions.int_sign()),
53		(ns.I \| ns.S \| ns.N, utils.NumericalRegularExpressions.int_sign()),
54		(ns.F, utils.NumericalRegularExpressions.float_nosign_exp()),
55		(ns.F \| ns.N, utils.NumericalRegularExpressions.float_nosign_noexp()),
56		(ns.F \| ns.S, utils.NumericalRegularExpressions.float_sign_exp()),
57		(ns.F \| ns.S \| ns.N, utils.NumericalRegularExpressions.float_sign_noexp()),
58		],
59		)
60		def test_regex_chooser_returns_correct_regular_expression_object(alg, expected):
61		assert utils.regex_chooser(alg).pattern == expected.pattern
62
63
64		@pytest.mark.parametrize(
65		"alg, value_or_alias",
66		[
67		# Defaults
68		(ns.DEFAULT, 0),
69		(ns.TYPESAFE, 0),
70		(ns.INT, 0),
71		(ns.VERSION, 0),
72		(ns.DIGIT, 0),
73		(ns.UNSIGNED, 0),
74		# Aliases
75		(ns.TYPESAFE, ns.T),
76		(ns.INT, ns.I),
77		(ns.VERSION, ns.V),
78		(ns.DIGIT, ns.D),
79		(ns.UNSIGNED, ns.U),
80		(ns.FLOAT, ns.F),
81		(ns.SIGNED, ns.S),
82		(ns.NOEXP, ns.N),
83		(ns.PATH, ns.P),
84		(ns.LOCALEALPHA, ns.LA),
85		(ns.LOCALENUM, ns.LN),
86		(ns.LOCALE, ns.L),
87		(ns.IGNORECASE, ns.IC),
88		(ns.LOWERCASEFIRST, ns.LF),
89		(ns.GROUPLETTERS, ns.G),
90		(ns.UNGROUPLETTERS, ns.UG),
91		(ns.CAPITALFIRST, ns.C),
92		(ns.UNGROUPLETTERS, ns.CAPITALFIRST),
93		(ns.NANLAST, ns.NL),
94		(ns.COMPATIBILITYNORMALIZE, ns.CN),
95		(ns.NUMAFTER, ns.NA),
96		# Convenience
97		(ns.LOCALE, ns.LOCALEALPHA \| ns.LOCALENUM),
98		(ns.REAL, ns.FLOAT \| ns.SIGNED),
99		],
100		)
101		def test_ns_enum_values_and_aliases(alg, value_or_alias):
102		assert alg == value_or_alias
103
104
105		def test_chain_functions_is_a_no_op_if_no_functions_are_given():
106		x = 2345
107		assert utils.chain_functions([])(x) is x
108
109
110		def test_chain_functions_does_one_function_if_one_function_is_given():
111		x = "2345"
112		assert utils.chain_functions([len])(x) == 4
113
114
115		def test_chain_functions_combines_functions_in_given_order():
116		x = 2345
117		assert utils.chain_functions([str, len, op_neg])(x) == -len(str(x))
118
119
120		# Each test has an "example" version for demonstrative purposes,
121		# and a test that uses the hypothesis module.
122
123
124		def test_groupletters_returns_letters_with_lowercase_transform_of_letter_example():
125		assert utils.groupletters("HELLO") == "hHeElLlLoO"
126		assert utils.groupletters("hello") == "hheelllloo"
127
128
129		@given(text().filter(bool))
130		def test_groupletters_returns_letters_with_lowercase_transform_of_letter(x):
131		assert utils.groupletters(x) == "".join(
132		chain.from_iterable([py23_lower(y), y] for y in x)
133		)
134
135
136		def test_sep_inserter_does_nothing_if_no_numbers_example():
137		assert list(utils.sep_inserter(iter(["a", "b", "c"]), "")) == ["a", "b", "c"]
138		assert list(utils.sep_inserter(iter(["a"]), "")) == ["a"]
139
140
141		def test_sep_inserter_does_nothing_if_only_one_number_example():
142		assert list(utils.sep_inserter(iter(["a", 5]), "")) == ["a", 5]
143
144
145		def test_sep_inserter_inserts_separator_string_between_two_numbers_example():
146		assert list(utils.sep_inserter(iter([5, 9]), "")) == ["", 5, "", 9]
147
148
149		@given(lists(elements=text().filter(bool) \| integers(), min_size=3))
150		def test_sep_inserter_inserts_separator_between_two_numbers(x):
151		# Rather than just replicating the results in a different
152		# algorithm, validate that the "shape" of the output is as expected.
153		result = list(utils.sep_inserter(iter(x), ""))
154		for i, pos in enumerate(result[1:-1], 1):
155		if pos == "":
156		assert isinstance(result[i - 1], py23_int)
157		assert isinstance(result[i + 1], py23_int)
158
159
160		def test_path_splitter_splits_path_string_by_separator_example():
161		z = "/this/is/a/path"
162		assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
163		z = pathlib.Path("/this/is/a/path")
164		assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
165
166
167		@given(lists(sampled_from(string.ascii_letters), min_size=2).filter(all))
168		def test_path_splitter_splits_path_string_by_separator(x):
169		z = py23_str(pathlib.Path(*x))
170		assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
171
172
173		def test_path_splitter_splits_path_string_by_separator_and_removes_extension_example():
174		z = "/this/is/a/path/file.exe"
175		y = tuple(pathlib.Path(z).parts)
176		assert tuple(utils.path_splitter(z)) == y[:-1] + (
177		pathlib.Path(z).stem,
178		pathlib.Path(z).suffix,
179		)
180
181
182		@given(lists(sampled_from(string.ascii_letters), min_size=3).filter(all))
183		def test_path_splitter_splits_path_string_by_separator_and_removes_extension(x):
184		z = py23_str(pathlib.Path(*x[:-2])) + "." + x[-1]
185		y = tuple(pathlib.Path(z).parts)
186		assert tuple(utils.path_splitter(z)) == y[:-1] + (
187		pathlib.Path(z).stem,
188		pathlib.Path(z).suffix,
189		)
190
191
192		@given(integers())
193		def test_py23_cmp(x):
194		assert py23_cmp(x, x) == 0
195		assert py23_cmp(x, x + 1) < 0
196		assert py23_cmp(x, x - 1) > 0
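
The `sep_inserter` examples above pin down a small contract: the separator goes before a leading number and between two adjacent numbers, and nowhere else. The following is a minimal standalone sketch of that contract. The name `insert_separators` is hypothetical and this is not natsort's implementation; it only reproduces the behavior the example tests assert.

```python
def insert_separators(items, sep):
    """Yield items, adding sep before a leading number and between adjacent numbers.

    Hypothetical helper for illustration; not part of natsort.
    """
    # Treat the start of the sequence like a number so that a leading
    # number is preceded by the separator.
    previous_was_number = True
    for item in items:
        # bool is a subclass of int, so exclude it explicitly.
        is_number = isinstance(item, (int, float)) and not isinstance(item, bool)
        if is_number and previous_was_number:
            yield sep
        yield item
        previous_was_number = is_number


print(list(insert_separators(iter(["a", 5]), "")))  # ['a', 5]
print(list(insert_separators(iter([5, 9]), "")))    # ['', 5, '', 9]
```

Keeping a string at every even position means two such sequences can be compared element by element without ever comparing a string to a number.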

+39

-0

tests/conftest.py

	0	"""
	1	Fixtures for pytest.
	2	"""
	3
	4	import locale
	5
	6	import pytest
	7
	8
	9	def load_locale(x):
	10	"""Convenience to load a locale, trying ISO8859-1 first."""
	11	try:
	12	locale.setlocale(locale.LC_ALL, str("{}.ISO8859-1".format(x)))
	13	except locale.Error:
	14	locale.setlocale(locale.LC_ALL, str("{}.UTF-8".format(x)))
	15
	16
	17	@pytest.fixture()
	18	def with_locale_en_us():
	19	"""Convenience to load the en_US locale - reset when complete."""
	20	orig = locale.getlocale()
	21	yield load_locale("en_US")
	22	locale.setlocale(locale.LC_ALL, orig)
	23
	24
	25	@pytest.fixture()
	26	def with_locale_de_de():
	27	"""
	28	Convenience to load the de_DE locale - reset when complete - skip if missing.
	29	"""
	30	orig = locale.getlocale()
	31	try:
	32	load_locale("de_DE")
	33	except locale.Error:
	34	pytest.skip("requires de_DE locale to be installed")
	35	else:
	36	yield
	37	finally:
	38	locale.setlocale(locale.LC_ALL, orig)
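
The fixtures above all follow the same pattern: remember the current locale, switch, and restore afterwards. Outside of pytest, the same pattern might be written as a context manager. This is a sketch under assumptions: `temporary_locale` is a hypothetical name, and it queries the current setting with `locale.setlocale(locale.LC_ALL)` (no second argument) rather than `locale.getlocale()` as the fixtures do.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(name):
    """Switch LC_ALL to `name` for the duration of the block, then restore it.

    Hypothetical helper for illustration; not part of natsort.
    """
    # Calling setlocale without a new value only queries the current setting.
    saved = locale.setlocale(locale.LC_ALL)
    try:
        locale.setlocale(locale.LC_ALL, name)
        yield
    finally:
        # Restore even if the block raised.
        locale.setlocale(locale.LC_ALL, saved)


with temporary_locale("C"):
    pass  # code that depends on the "C" locale goes here
```

If the requested locale is not installed, `locale.setlocale` raises `locale.Error`, which is the condition the `with_locale_de_de` fixture turns into a skip.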

+70

-0

tests/profile_natsorted.py

	0	# -*- coding: utf-8 -*-
	1	"""\
	2	This file contains functions to profile natsorted with different
	3	inputs and different settings.
	4	"""
	5	from __future__ import print_function
	6
	7	import cProfile
	8	import locale
	9	import sys
	10
	11	try:
	12	from natsort import ns, natsort_keygen
	13	from natsort.compat.py23 import py23_range
	14	except ImportError:
	15	sys.path.insert(0, ".")
	16	from natsort import ns, natsort_keygen
	17	from natsort.compat.py23 import py23_range
	18
	19	locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
	20
	21	# Samples to parse
	22	number = 14695498
	23	int_string = "43493"
	24	float_string = "-434.93e7"
	25	plain_string = "hello world"
	26	fancy_string = "7abba9342fdab"
	27	a_path = "/p/Folder (1)/file (1).tar.gz"
	28	some_bytes = b"these are bytes"
	29	a_list = ["hello", "goodbye", "74"]
	30
	31	basic_key = natsort_keygen()
	32	real_key = natsort_keygen(alg=ns.REAL)
	33	path_key = natsort_keygen(alg=ns.PATH)
	34	locale_key = natsort_keygen(alg=ns.LOCALE)
	35
	36
	37	def prof_time_to_generate():
	38	print("* Generate Plain Key *")
	39	for _ in py23_range(100000):
	40	natsort_keygen()
	41
	42
	43	cProfile.run("prof_time_to_generate()", sort="time")
	44
	45
	46	def prof_parsing(a, msg, key=basic_key):
	47	print(msg)
	48	for _ in py23_range(100000):
	49	key(a)
	50
	51
	52	cProfile.run(
	53	'prof_parsing(int_string, "* Basic Call, Int as String *")', sort="time"
	54	)
	55	cProfile.run(
	56	'prof_parsing(float_string, "* Basic Call, Float as String *")', sort="time"
	57	)
	58	cProfile.run('prof_parsing(float_string, "* Real Call *", real_key)', sort="time")
	59	cProfile.run('prof_parsing(number, "* Basic Call, Number *")', sort="time")
	60	cProfile.run(
	61	'prof_parsing(fancy_string, "* Basic Call, Mixed String *")', sort="time"
	62	)
	63	cProfile.run('prof_parsing(some_bytes, "* Basic Call, Byte String *")', sort="time")
	64	cProfile.run('prof_parsing(a_path, "* Path Call *", path_key)', sort="time")
	65	cProfile.run('prof_parsing(a_list, "* Basic Call, Recursive *")', sort="time")
	66	cProfile.run(
	67	'prof_parsing("434,930,000 dollars", "* Locale Call *", locale_key)',
	68	sort="time",
	69	)

+138

-0

tests/test_fake_fastnumbers.py

	0	# -*- coding: utf-8 -*-
	1	"""\
	2	Test the fake fastnumbers module.
	3	"""
	4	from __future__ import unicode_literals
	5
	6	import unicodedata
	7	from math import isnan
	8
	9	from hypothesis import given
	10	from hypothesis.strategies import floats, integers, text
	11	from natsort.compat.fake_fastnumbers import fast_float, fast_int
	12	from natsort.compat.py23 import PY_VERSION
	13
	14	if PY_VERSION >= 3:
	15	long = int
	16
	17
	18	def is_float(x):
	19	try:
	20	float(x)
	21	except ValueError:
	22	try:
	23	unicodedata.numeric(x)
	24	except (ValueError, TypeError):
	25	return False
	26	else:
	27	return True
	28	else:
	29	return True
	30
	31
	32	def not_a_float(x):
	33	return not is_float(x)
	34
	35
	36	def is_int(x):
	37	try:
	38	return x.is_integer()
	39	except AttributeError:
	40	try:
	41	long(x)
	42	except ValueError:
	43	try:
	44	unicodedata.digit(x)
	45	except (ValueError, TypeError):
	46	return False
	47	else:
	48	return True
	49	else:
	50	return True
	51
	52
	53	def not_an_int(x):
	54	return not is_int(x)
	55
	56
	57	# Each test has an "example" version for demonstrative purposes,
	58	# and a test that uses the hypothesis module.
	59
	60
	61	def test_fast_float_returns_nan_alternate_if_nan_option_is_given():
	62	assert fast_float("nan", nan=7) == 7
	63
	64
	65	def test_fast_float_converts_float_string_to_float_example():
	66	assert fast_float("45.8") == 45.8
	67	assert fast_float("-45") == -45.0
	68	assert fast_float("45.8e-2", key=len) == 45.8e-2
	69	assert isnan(fast_float("nan"))
	70	assert isnan(fast_float("+nan"))
	71	assert isnan(fast_float("-NaN"))
	72	assert fast_float("۱۲.۱۲") == 12.12
	73	assert fast_float("-۱۲.۱۲") == -12.12
	74
	75
	76	@given(floats(allow_nan=False))
	77	def test_fast_float_converts_float_string_to_float(x):
	78	assert fast_float(repr(x)) == x
	79
	80
	81	def test_fast_float_leaves_string_as_is_example():
	82	assert fast_float("invalid") == "invalid"
	83
	84
	85	@given(text().filter(not_a_float).filter(bool))
	86	def test_fast_float_leaves_string_as_is(x):
	87	assert fast_float(x) == x
	88
	89
	90	def test_fast_float_with_key_applies_to_string_example():
	91	assert fast_float("invalid", key=len) == len("invalid")
	92
	93
	94	@given(text().filter(not_a_float).filter(bool))
	95	def test_fast_float_with_key_applies_to_string(x):
	96	assert fast_float(x, key=len) == len(x)
	97
	98
	99	def test_fast_int_leaves_float_string_as_is_example():
	100	assert fast_int("45.8") == "45.8"
	101	assert fast_int("nan") == "nan"
	102	assert fast_int("inf") == "inf"
	103
	104
	105	@given(floats().filter(not_an_int))
	106	def test_fast_int_leaves_float_string_as_is(x):
	107	assert fast_int(repr(x)) == repr(x)
	108
	109
	110	def test_fast_int_converts_int_string_to_int_example():
	111	assert fast_int("-45") == -45
	112	assert fast_int("+45") == 45
	113	assert fast_int("۱۲") == 12
	114	assert fast_int("-۱۲") == -12
	115
	116
	117	@given(integers())
	118	def test_fast_int_converts_int_string_to_int(x):
	119	assert fast_int(repr(x)) == x
	120
	121
	122	def test_fast_int_leaves_string_as_is_example():
	123	assert fast_int("invalid") == "invalid"
	124
	125
	126	@given(text().filter(not_an_int).filter(bool))
	127	def test_fast_int_leaves_string_as_is(x):
	128	assert fast_int(x) == x
	129
	130
	131	def test_fast_int_with_key_applies_to_string_example():
	132	assert fast_int("invalid", key=len) == len("invalid")
	133
	134
	135	@given(text().filter(not_an_int).filter(bool))
	136	def test_fast_int_with_key_applies_to_string(x):
	137	assert fast_int(x, key=len) == len(x)
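
Taken together, the `fast_float` tests above describe a "convert or pass through" contract: a string that parses as a float becomes a float, anything else is returned unchanged or handed to `key`, and `nan` optionally replaces a NaN result. A minimal sketch of that contract, assuming string input only, is shown below. The name `to_float_or_passthrough` is hypothetical and this is not the code in `natsort.compat.fake_fastnumbers`.

```python
import math


def to_float_or_passthrough(value, key=None, nan=None):
    """Return float(value) if it parses; otherwise value, or key(value) if key is given.

    Hypothetical helper for illustration; not part of natsort.
    """
    try:
        number = float(value)
    except ValueError:
        # Not a float: apply the key if one was supplied, else leave as-is.
        return key(value) if key is not None else value
    # Optionally substitute a replacement for NaN.
    if nan is not None and math.isnan(number):
        return nan
    return number


print(to_float_or_passthrough("45.8"))              # 45.8
print(to_float_or_passthrough("invalid", key=len))  # 7
```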

+53

-0

tests/test_final_data_transform_factory.py

	0	# -*- coding: utf-8 -*-
	1	"""These test the utils.py functions."""
	2	from __future__ import unicode_literals
	3
	4	import pytest
	5	from hypothesis import example, given
	6	from hypothesis.strategies import floats, integers, text
	7	from natsort.compat.py23 import py23_str
	8	from natsort.ns_enum import NS_DUMB, ns
	9	from natsort.utils import final_data_transform_factory
	10
	11
	12	@pytest.mark.parametrize("alg", [ns.DEFAULT, ns.UNGROUPLETTERS, ns.LOCALE])
	13	@given(x=text(), y=floats(allow_nan=False, allow_infinity=False) | integers())
	14	@pytest.mark.usefixtures("with_locale_en_us")
	15	def test_final_data_transform_factory_default(x, y, alg):
	16	final_data_transform_func = final_data_transform_factory(alg, "", "::")
	17	value = (x, y)
	18	original_value = "".join(map(py23_str, value))
	19	result = final_data_transform_func(value, original_value)
	20	assert result == value
	21
	22
	23	@pytest.mark.parametrize(
	24	"alg, func",
	25	[
	26	(ns.UNGROUPLETTERS | ns.LOCALE, lambda x: x),
	27	(ns.LOCALE | ns.UNGROUPLETTERS | NS_DUMB, lambda x: x),
	28	(ns.LOCALE | ns.UNGROUPLETTERS | ns.LOWERCASEFIRST, lambda x: x),
	29	(
	30	ns.LOCALE | ns.UNGROUPLETTERS | NS_DUMB | ns.LOWERCASEFIRST,
	31	lambda x: x.swapcase(),
	32	),
	33	],
	34	)
	35	@given(x=text(), y=floats(allow_nan=False, allow_infinity=False) | integers())
	36	@example(x="İ", y=0)
	37	@pytest.mark.usefixtures("with_locale_en_us")
	38	def test_final_data_transform_factory_ungroup_and_locale(x, y, alg, func):
	39	final_data_transform_func = final_data_transform_factory(alg, "", "::")
	40	value = (x, y)
	41	original_value = "".join(map(py23_str, value))
	42	result = final_data_transform_func(value, original_value)
	43	if x:
	44	expected = ((func(original_value[:1]),), value)
	45	else:
	46	expected = (("::",), value)
	47	assert result == expected
	48
	49
	50	def test_final_data_transform_factory_ungroup_and_locale_empty_tuple():
	51	final_data_transform_func = final_data_transform_factory(ns.UG | ns.L, "", "::")
	52	assert final_data_transform_func((), "") == ((), ())

+105

-0

tests/test_input_string_transform_factory.py

	0	# -*- coding: utf-8 -*-
	1	"""These test the utils.py functions."""
	2	from __future__ import unicode_literals
	3
	4	import pytest
	5	from hypothesis import example, given
	6	from hypothesis.strategies import integers, text
	7	from natsort.compat.py23 import NEWPY
	8	from natsort.ns_enum import NS_DUMB, ns
	9	from natsort.utils import input_string_transform_factory
	10
	11
	12	def lower(x):
	13	"""Call the appropriate lower method for the Python version."""
	14	if NEWPY:
	15	return x.casefold()
	16	else:
	17	return x.lower()
	18
	19
	20	def thousands_separated_int(n):
	21	"""Insert thousands separators in an int."""
	22	new_int = ""
	23	for i, y in enumerate(reversed(n), 1):
	24	new_int = y + new_int
	25	# For every third digit, insert a thousands separator.
	26	if i % 3 == 0 and i != len(n):
	27	new_int = "," + new_int
	28	return new_int
	29
	30
	31	@given(text())
	32	def test_input_string_transform_factory_is_no_op_for_no_alg_options(x):
	33	input_string_transform_func = input_string_transform_factory(ns.DEFAULT)
	34	assert input_string_transform_func(x) is x
	35
	36
	37	@pytest.mark.parametrize(
	38	"alg, example_func",
	39	[
	40	(ns.IGNORECASE, lower),
	41	(NS_DUMB, lambda x: x.swapcase()),
	42	(ns.LOWERCASEFIRST, lambda x: x.swapcase()),
	43	(NS_DUMB | ns.LOWERCASEFIRST, lambda x: x), # No-op
	44	(ns.IGNORECASE | ns.LOWERCASEFIRST, lambda x: lower(x.swapcase())),
	45	],
	46	)
	47	@given(x=text())
	48	def test_input_string_transform_factory(x, alg, example_func):
	49	input_string_transform_func = input_string_transform_factory(alg)
	50	assert input_string_transform_func(x) == example_func(x)
	51
	52
	53	@example(12543642642534980) # 12,543,642,642,534,980 => 12543642642534980
	54	@given(x=integers(min_value=1000))
	55	@pytest.mark.usefixtures("with_locale_en_us")
	56	def test_input_string_transform_factory_cleans_thousands(x):
	57	int_str = str(x).rstrip("lL")
	58	thousands_int_str = thousands_separated_int(int_str)
	59	assert thousands_int_str.replace(",", "") != thousands_int_str
	60
	61	input_string_transform_func = input_string_transform_factory(ns.LOCALE)
	62	assert input_string_transform_func(thousands_int_str) == int_str
	63
	64	# Using LOCALEALPHA does not affect numbers.
	65	input_string_transform_func_no_op = input_string_transform_factory(ns.LOCALEALPHA)
	66	assert input_string_transform_func_no_op(thousands_int_str) == thousands_int_str
	67
	68
	69	# These might be too much to test with hypothesis.
	70
	71
	72	@pytest.mark.parametrize(
	73	"x, expected",
	74	[
	75	("12,543,642642.5345,34980", "12543,642642.5345,34980"),
	76	("12,59443,642,642.53,4534980", "12,59443,642642.53,4534980"), # No change
	77	("12543,642,642.5,34534980", "12543,642642.5,34534980"),
	78	],
	79	)
	80	@pytest.mark.usefixtures("with_locale_en_us")
	81	def test_input_string_transform_factory_handles_us_locale(x, expected):
	82	input_string_transform_func = input_string_transform_factory(ns.LOCALE)
	83	assert input_string_transform_func(x) == expected
	84
	85
	86	@pytest.mark.parametrize(
	87	"alg, expected",
	88	[
	89	(ns.LOCALE, "1543,753"), # Does nothing without FLOAT
	90	(ns.LOCALE | ns.FLOAT, "1543.753"),
	91	(ns.LOCALEALPHA, "1543,753"), # LOCALEALPHA won't do anything, need LOCALENUM
	92	],
	93	)
	94	@pytest.mark.usefixtures("with_locale_de_de")
	95	def test_input_string_transform_factory_handles_german_locale(alg, expected):
	96	input_string_transform_func = input_string_transform_factory(alg)
	97	assert input_string_transform_func("1543,753") == expected
	98
	99
	100	@pytest.mark.usefixtures("with_locale_de_de")
	101	def test_input_string_transform_factory_does_nothing_with_non_num_input():
	102	input_string_transform_func = input_string_transform_factory(ns.LOCALE | ns.FLOAT)
	103	expected = "154s,t53"
	104	assert input_string_transform_func("154s,t53") == expected
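
The `thousands_separated_int` helper in this file builds comma-grouped digits by hand. For plain integers, Python's format mini-language produces the same grouping with the `","` option, and removing the commas is a single `str.replace`. The sketch below is locale-independent, so it is only an illustration of the round trip and not a stand-in for what `ns.LOCALE` does; both function names are hypothetical.

```python
def add_thousands_separators(n):
    """Format an integer with commas between groups of three digits."""
    return format(n, ",")


def strip_thousands_separators(text):
    """Remove every comma from a digit string (no locale awareness)."""
    return text.replace(",", "")


grouped = add_thousands_separators(12543642642534980)
print(grouped)                              # 12,543,642,642,534,980
print(strip_thousands_separators(grouped))  # 12543642642534980
```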

+223

-0

tests/test_main.py

	0	# -*- coding: utf-8 -*-
	1	"""\
	2	Test the natsort command-line tool functions.
	3	"""
	4	from __future__ import print_function, unicode_literals
	5
	6	import re
	7	import sys
	8
	9	import pytest
	10	from hypothesis import given
	11	from hypothesis.strategies import data, floats, integers, lists
	12	from natsort.__main__ import (
	13	check_filters,
	14	keep_entry_range,
	15	keep_entry_value,
	16	main,
	17	range_check,
	18	sort_and_print_entries,
	19	)
	20
	21
	22	def test_main_passes_default_arguments_with_no_command_line_options(mocker):
	23	p = mocker.patch("natsort.__main__.sort_and_print_entries")
	24	main("num-2", "num-6", "num-1")
	25	args = p.call_args[0][1]
	26	assert not args.paths
	27	assert args.filter is None
	28	assert args.reverse_filter is None
	29	assert args.exclude is None
	30	assert not args.reverse
	31	assert args.number_type == "int"
	32	assert not args.signed
	33	assert args.exp
	34	assert not args.locale
	35
	36
	37	def test_main_passes_arguments_with_all_command_line_options(mocker):
	38	arguments = ["--paths", "--reverse", "--locale"]
	39	arguments.extend(["--filter", "4", "10"])
	40	arguments.extend(["--reverse-filter", "100", "110"])
	41	arguments.extend(["--number-type", "float"])
	42	arguments.extend(["--noexp", "--sign"])
	43	arguments.extend(["--exclude", "34"])
	44	arguments.extend(["--exclude", "35"])
	45	arguments.extend(["num-2", "num-6", "num-1"])
	46	p = mocker.patch("natsort.__main__.sort_and_print_entries")
	47	main(*arguments)
	48	args = p.call_args[0][1]
	49	assert args.paths
	50	assert args.filter == [(4.0, 10.0)]
	51	assert args.reverse_filter == [(100.0, 110.0)]
	52	assert args.exclude == [34, 35]
	53	assert args.reverse
	54	assert args.number_type == "float"
	55	assert args.signed
	56	assert not args.exp
	57	assert args.locale
	58
	59
	60	class Args:
	61	"""A dummy class to simulate the argparse Namespace object"""
	62
	63	def __init__(self, filt, reverse_filter, exclude, as_path, reverse):
	64	self.filter = filt
	65	self.reverse_filter = reverse_filter
	66	self.exclude = exclude
	67	self.reverse = reverse
	68	self.number_type = "float"
	69	self.signed = True
	70	self.exp = True
	71	self.paths = as_path
	72	self.locale = 0
	73
	74
	75	mock_print = "__builtin__.print" if sys.version[0] == "2" else "builtins.print"
	76
	77	entries = [
	78	"tmp/a57/path2",
	79	"tmp/a23/path1",
	80	"tmp/a1/path1",
	81	"tmp/a1 (1)/path1",
	82	"tmp/a130/path1",
	83	"tmp/a64/path1",
	84	"tmp/a64/path2",
	85	]
	86
	87
	88	@pytest.mark.parametrize(
	89	"options, order",
	90	[
	91	# Defaults, all options false
	92	# tmp/a1 (1)/path1
	93	# tmp/a1/path1
	94	# tmp/a23/path1
	95	# tmp/a57/path2
	96	# tmp/a64/path1
	97	# tmp/a64/path2
	98	# tmp/a130/path1
	99	([None, None, False, False, False], [3, 2, 1, 0, 5, 6, 4]),
	100	# Path option True
	101	# tmp/a1/path1
	102	# tmp/a1 (1)/path1
	103	# tmp/a23/path1
	104	# tmp/a57/path2
	105	# tmp/a64/path1
	106	# tmp/a64/path2
	107	# tmp/a130/path1
	108	([None, None, False, True, False], [2, 3, 1, 0, 5, 6, 4]),
	109	# Filter option keeps only within range
	110	# tmp/a23/path1
	111	# tmp/a57/path2
	112	# tmp/a64/path1
	113	# tmp/a64/path2
	114	([[(20, 100)], None, False, False, False], [1, 0, 5, 6]),
	115	# Reverse filter, exclude in range
	116	# tmp/a1/path1
	117	# tmp/a1 (1)/path1
	118	# tmp/a130/path1
	119	([None, [(20, 100)], False, True, False], [2, 3, 4]),
	120	# Exclude given values with exclude list
	121	# tmp/a1/path1
	122	# tmp/a1 (1)/path1
	123	# tmp/a57/path2
	124	# tmp/a64/path1
	125	# tmp/a64/path2
	126	([None, None, [23, 130], True, False], [2, 3, 0, 5, 6]),
	127	# Reverse order
	128	# tmp/a130/path1
	129	# tmp/a64/path2
	130	# tmp/a64/path1
	131	# tmp/a57/path2
	132	# tmp/a23/path1
	133	# tmp/a1 (1)/path1
	134	# tmp/a1/path1
	135	([None, None, False, True, True], reversed([2, 3, 1, 0, 5, 6, 4])),
	136	],
	137	)
	138	def test_sort_and_print_entries(options, order, mocker):
	139	p = mocker.patch(mock_print)
	140	sort_and_print_entries(entries, Args(*options))
	141	e = [mocker.call(entries[i]) for i in order]
	142	p.assert_has_calls(e)
	143
	144
	145	# Each test has an "example" version for demonstrative purposes,
	146	# and a test that uses the hypothesis module.
	147
	148
	149	def test_range_check_returns_range_as_is_but_with_floats_example():
	150	assert range_check(10, 11) == (10.0, 11.0)
	151	assert range_check(6.4, 30) == (6.4, 30.0)
	152
	153
	154	@given(x=floats(allow_nan=False, min_value=-1E8, max_value=1E8) | integers(), d=data())
	155	def test_range_check_returns_range_as_is_if_first_is_less_than_second(x, d):
	156	# Pull data such that the first is less than the second.
	157	if isinstance(x, float):
	158	y = d.draw(floats(min_value=x + 1.0, max_value=1E9, allow_nan=False))
	159	else:
	160	y = d.draw(integers(min_value=x + 1))
	161	assert range_check(x, y) == (x, y)
	162
	163
	164	def test_range_check_raises_value_error_if_second_is_less_than_first_example():
	165	with pytest.raises(ValueError, match="low >= high"):
	166	range_check(7, 2)
	167
	168
	169	@given(x=floats(allow_nan=False), d=data())
	170	def test_range_check_raises_value_error_if_second_is_less_than_first(x, d):
	171	# Pull data such that the first is greater than or equal to the second.
	172	y = d.draw(floats(max_value=x, allow_nan=False))
	173	with pytest.raises(ValueError, match="low >= high"):
	174	range_check(x, y)
	175
	176
	177	def test_check_filters_returns_none_if_filter_evaluates_to_false():
	178	assert check_filters(()) is None
	179	assert check_filters(False) is None
	180	assert check_filters(None) is None
	181
	182
	183	def test_check_filters_returns_input_as_is_if_filter_is_valid_example():
	184	assert check_filters([(6, 7)]) == [(6, 7)]
	185	assert check_filters([(6, 7), (2, 8)]) == [(6, 7), (2, 8)]
	186
	187
	188	@given(x=lists(integers(), min_size=1), d=data())
	189	def test_check_filters_returns_input_as_is_if_filter_is_valid(x, d):
	190	# ensure y is element-wise greater than x
	191	y = [d.draw(integers(min_value=val + 1)) for val in x]
	192	assert check_filters(list(zip(x, y))) == [(i, j) for i, j in zip(x, y)]
	193
	194
	195	def test_check_filters_raises_value_error_if_filter_is_invalid_example():
	196	with pytest.raises(ValueError, match="Error in --filter: low >= high"):
	197	check_filters([(7, 2)])
	198
	199
	200	@given(x=lists(integers(), min_size=1), d=data())
	201	def test_check_filters_raises_value_error_if_filter_is_invalid(x, d):
	202	# ensure y is element-wise less than or equal to x
	203	y = [d.draw(integers(max_value=val)) for val in x]
	204	with pytest.raises(ValueError, match="Error in --filter: low >= high"):
	205	check_filters(list(zip(x, y)))
	206
	207
	208	@pytest.mark.parametrize(
	209	"lows, highs, truth",
	210	# 1. Any portion is between the bounds => True.
	211	# 2. Any portion is between any bounds => True.
	212	# 3. No portion is between the bounds => False.
	213	[([0], [100], True), ([1, 88], [20, 90], True), ([1], [20], False)],
	214	)
	215	def test_keep_entry_range(lows, highs, truth):
	216	assert keep_entry_range("a56b23c89", lows, highs, int, re.compile(r"\d+")) is truth
	217
	218
	219	# 1. Values not in entry => True. 2. Values in entry => False.
	220	@pytest.mark.parametrize("values, truth", [([100, 45], True), ([23], False)])
	221	def test_keep_entry_value(values, truth):
	222	assert keep_entry_value("a56b23c89", values, int, re.compile(r"\d+")) is truth
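
The range and filter tests above imply two small rules: a range is valid only when its low bound is strictly below its high bound, and an entry is kept when any number found in it falls inside any of the given ranges. A rough sketch of both rules follows. The names `checked_range` and `any_number_in_ranges` are hypothetical, inclusive bounds are an assumption the tests do not settle, and none of this is the code in `natsort.__main__`.

```python
import re


def checked_range(low, high):
    """Return (low, high) as floats, or raise ValueError if low >= high."""
    if low >= high:
        raise ValueError("low >= high")
    return float(low), float(high)


def any_number_in_ranges(entry, lows, highs, converter=int, pattern=re.compile(r"\d+")):
    """Return True if any number in `entry` lies in any [low, high] pair.

    Bounds are treated as inclusive here, which is an assumption.
    """
    numbers = [converter(match) for match in pattern.findall(entry)]
    return any(
        low <= number <= high
        for number in numbers
        for low, high in zip(lows, highs)
    )


print(checked_range(10, 11))                               # (10.0, 11.0)
print(any_number_in_ranges("a56b23c89", [1, 88], [20, 90]))  # True
print(any_number_in_ranges("a56b23c89", [1], [20]))          # False
```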

+83

-0

tests/test_natsort_cmp.py

	0	# -*- coding: utf-8 -*-
	1	# pylint: disable=unused-variable
	2	"""These test the natcmp() function.
	3
	4	Note that these tests are only relevant for Python version < 3.
	5	"""
	6	from functools import partial
	7
	8	import pytest
	9	from hypothesis import given
	10	from hypothesis.strategies import floats, integers, lists
	11	from natsort import ns
	12	from natsort.compat.py23 import PY_VERSION, py23_cmp
	13
	14	if PY_VERSION < 3:
	15	from natsort import natcmp
	16
	17
	18	class Comparable(object):
	19	"""Stub class for testing natcmp functionality."""
	20
	21	def __init__(self, value):
	22	self.value = value
	23
	24	def __cmp__(self, other):
	25	return natcmp(self.value, other.value)
	26
	27
	28	@pytest.mark.skipif(PY_VERSION >= 3.0, reason="cmp() deprecated in Python 3")
	29	class TestNatCmp:
	30
	31	def test_classes_can_be_compared(self):
	32	one = Comparable("1")
	33	two = Comparable("2")
	34	another_two = Comparable("2")
	35	ten = Comparable("10")
	36	assert ten > two == another_two > one
	37
	38	def test_keys_are_being_cached(self, mocker):
	39	natcmp.cached_keys = {}
	40	assert len(natcmp.cached_keys) == 0
	41	natcmp(0, 0)
	42	assert len(natcmp.cached_keys) == 1
	43	natcmp(0, 0)
	44	assert len(natcmp.cached_keys) == 1
	45
	46	with mocker.patch("natsort.compat.locale.dumb_sort", return_value=False):
	47	natcmp(0, 0, alg=ns.L)
	48	assert len(natcmp.cached_keys) == 2
	49	natcmp(0, 0, alg=ns.L)
	50	assert len(natcmp.cached_keys) == 2
	51
	52	with mocker.patch("natsort.compat.locale.dumb_sort", return_value=True):
	53	natcmp(0, 0, alg=ns.L)
	54	assert len(natcmp.cached_keys) == 3
	55	natcmp(0, 0, alg=ns.L)
	56	assert len(natcmp.cached_keys) == 3
	57
	58	def test_illegal_algorithm_raises_error(self):
	59	with pytest.raises(ValueError):
	60	natcmp(0, 0, alg="Just random stuff")
	61
	62	def test_classes_can_utilize_max_or_min(self):
	63	comparables = [Comparable(i) for i in range(10)]
	64
	65	assert max(comparables) == comparables[-1]
	66	assert min(comparables) == comparables[0]
	67
	68	@given(integers(), integers())
	69	def test_natcmp_works_the_same_for_integers_as_cmp(self, x, y):
	70	assert py23_cmp(x, y) == natcmp(x, y)
	71
	72	@given(floats(allow_nan=False), floats(allow_nan=False))
	73	def test_natcmp_works_the_same_for_floats_as_cmp(self, x, y):
	74	assert py23_cmp(x, y) == natcmp(x, y)
	75
	76	@given(lists(elements=integers()))
	77	def test_sort_strings_with_numbers(self, a_list):
	78	strings = [str(var) for var in a_list]
	79	# noinspection PyArgumentList
	80	natcmp_sorted = sorted(strings, cmp=partial(natcmp, alg=ns.SIGNED))
	81
	82	assert sorted(a_list) == [int(var) for var in natcmp_sorted]
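
The last test above passes a comparison function through the `cmp=` argument of `sorted`, which exists only on Python 2. On Python 3 the usual route is `functools.cmp_to_key`, which wraps a three-way comparison function into a key. The sketch below uses a hypothetical `three_way_compare` in place of `natcmp` (which this test file only imports on Python 2) and compares the strings by their integer value.

```python
from functools import cmp_to_key


def three_way_compare(a, b):
    """Return -1, 0, or 1, like Python 2's built-in cmp()."""
    return (a > b) - (a < b)


def sort_numeric_strings(strings):
    """Sort strings holding integers by numeric value using a cmp-style function."""
    compare = lambda a, b: three_way_compare(int(a), int(b))
    return sorted(strings, key=cmp_to_key(compare))


print(sort_numeric_strings(["10", "-2", "3"]))  # ['-2', '3', '10']
```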

+49

-0

tests/test_natsort_key.py

	0	# -*- coding: utf-8 -*-
	1	"""These test the utils.py functions."""
	2	from __future__ import unicode_literals
	3
	4	import pytest
	5	from hypothesis import given
	6	from hypothesis.strategies import binary, floats, integers, lists, text
	7	from natsort.compat.py23 import PY_VERSION, py23_str
	8	from natsort.utils import natsort_key
	9
	10	if PY_VERSION >= 3:
	11	long = int
	12
	13
	14	def str_func(x):
	15	if isinstance(x, py23_str):
	16	return x
	17	else:
	18	raise TypeError("Not a str!")
	19
	20
	21	def fail(_):
	22	raise AssertionError("This should never be reached!")
	23
	24
	25	@given(floats(allow_nan=False) | integers())
	26	def test_natsort_key_with_numeric_input_takes_number_path(x):
	27	assert natsort_key(x, None, str_func, fail, lambda y: y) is x
	28
	29
	30	@pytest.mark.skipif(PY_VERSION < 3, reason="only valid on python3")
	31	@given(binary().filter(bool))
	32	def test_natsort_key_with_bytes_input_takes_bytes_path(x):
	33	assert natsort_key(x, None, str_func, lambda y: y, fail) is x
	34
	35
	36	@given(text())
	37	def test_natsort_key_with_text_input_takes_string_path(x):
	38	assert natsort_key(x, None, str_func, fail, fail) is x
	39
	40
	41	@given(lists(elements=text(), min_size=1, max_size=10))
	42	def test_natsort_key_with_nested_input_takes_nested_path(x):
	43	assert natsort_key(x, None, str_func, fail, fail) == tuple(x)
	44
	45
	46	@given(text())
	47	def test_natsort_key_with_key_argument_applies_key_before_processing(x):
	48	assert natsort_key(x, len, str_func, fail, lambda y: y) == len(x)

+168

-0

tests/test_natsort_keygen.py

	0	# -*- coding: utf-8 -*-
	1	"""\
	2	Here is a collection of examples of how this module can be used.
	3	See the README or the natsort homepage for more details.
	4	"""
	5	from __future__ import print_function, unicode_literals
	6
	7	import pytest
	8	from natsort import natsort_key, natsort_keygen, natsorted, ns
	9	from natsort.compat.locale import get_strxfrm, null_string_locale
	10	from natsort.compat.py23 import PY_VERSION
	11
	12
	13	@pytest.fixture
	14	def arbitrary_input():
	15	return ["6A-5.034e+1", "/Folder (1)/Foo", 56.7]
	16
	17
	18	@pytest.fixture
	19	def bytes_input():
	20	return b"6A-5.034e+1"
	21
	22
	23	def test_natsort_keygen_demonstration():
	24	original_list = ["a50", "a51.", "a50.31", "a50.4", "a5.034e1", "a50.300"]
	25	copy_of_list = original_list[:]
	26	original_list.sort(key=natsort_keygen(alg=ns.F))
	27	# natsorted uses the output of natsort_keygen under the hood.
	28	assert original_list == natsorted(copy_of_list, alg=ns.F)
	29
	30
	31	def test_natsort_key_public():
	32	assert natsort_key("a-5.034e2") == ("a-", 5, ".", 34, "e", 2)
	33
	34
	35	def test_natsort_keygen_with_invalid_alg_input_raises_value_error():
	36	# Invalid arguments give the correct response
	37	with pytest.raises(ValueError, match="'alg' argument"):
	38	natsort_keygen(None, "1")
	39
	40
	41	@pytest.mark.parametrize(
	42	"alg, expected",
	43	[(ns.DEFAULT, ("a-", 5, ".", 34, "e", 1)), (ns.FLOAT | ns.SIGNED, ("a", -50.34))],
	44	)
	45	def test_natsort_keygen_returns_natsort_key_that_parses_input(alg, expected):
	46	ns_key = natsort_keygen(alg=alg)
	47	assert ns_key("a-5.034e1") == expected
	48
	49
	50	@pytest.mark.parametrize(
	51	"alg, expected",
	52	[
	53	(
	54	ns.DEFAULT,
	55	(("", 6, "A-", 5, ".", 34, "e+", 1), ("/Folder (", 1, ")/Foo"), ("", 56.7)),
	56	),
	57	(
	58	ns.IGNORECASE,
	59	(("", 6, "a-", 5, ".", 34, "e+", 1), ("/folder (", 1, ")/foo"), ("", 56.7)),
	60	),
	61	(ns.REAL, (("", 6.0, "A", -50.34), ("/Folder (", 1.0, ")/Foo"), ("", 56.7))),
	62	(
	63	ns.LOWERCASEFIRST | ns.FLOAT | ns.NOEXP,
	64	(
	65	("", 6.0, "a-", 5.034, "E+", 1.0),
	66	("/fOLDER (", 1.0, ")/fOO"),
	67	("", 56.7),
	68	),
	69	),
	70	(
	71	ns.PATH | ns.GROUPLETTERS,
	72	(
	73	(("", 6, "aA--", 5, "..", 34, "ee++", 1),),
	74	(("//",), ("fFoollddeerr ((", 1, "))"), ("fFoooo",)),
	75	(("", 56.7),),
	76	),
	77	),
	78	],
	79	)
	80	def test_natsort_keygen_handles_arbitrary_input(arbitrary_input, alg, expected):
	81	ns_key = natsort_keygen(alg=alg)
	82	assert ns_key(arbitrary_input) == expected
	83
	84
	85	@pytest.mark.parametrize(
	86	"alg, expected",
	87	[
	88	(ns.DEFAULT, (b"6A-5.034e+1",)),
	89	(ns.IGNORECASE, (b"6a-5.034e+1",)),
	90	(ns.REAL, (b"6A-5.034e+1",)),
	91	(ns.LOWERCASEFIRST | ns.FLOAT | ns.NOEXP, (b"6A-5.034e+1",)),
	92	(ns.PATH | ns.GROUPLETTERS, ((b"6A-5.034e+1",),)),
	93	],
	94	)
	95	@pytest.mark.skipif(PY_VERSION < 3.0, reason="special bytes handling only on Python3")
	96	def test_natsort_keygen_handles_bytes_input(bytes_input, alg, expected):
	97	ns_key = natsort_keygen(alg=alg)
	98	assert ns_key(bytes_input) == expected
	99
	100
	101	@pytest.mark.parametrize(
	102	"alg, expected, is_dumb",
	103	[
	104	(
	105	ns.LOCALE,
	106	(
	107	(null_string_locale, 6, "A-", 5, ".", 34, "e+", 1),
	108	("/Folder (", 1, ")/Foo"),
	109	(null_string_locale, 56.7),
	110	),
	111	False,
	112	),
	113	(
	114	ns.LOCALE,
	115	(
	116	(null_string_locale, 6, "aa--", 5, "..", 34, "eE++", 1),
	117	("//ffoOlLdDeErR ((", 1, "))//ffoOoO"),
	118	(null_string_locale, 56.7),
	119	),
	120	True,
	121	),
	122	(
	123	ns.LOCALE | ns.CAPITALFIRST,
	124	(
	125	(("",), (null_string_locale, 6, "A-", 5, ".", 34, "e+", 1)),
	126	(("/",), ("/Folder (", 1, ")/Foo")),
	127	(("",), (null_string_locale, 56.7)),
	128	),
	129	False,
	130	),
	131	],
	132	)
	133	@pytest.mark.usefixtures("with_locale_en_us")
	134	def test_natsort_keygen_with_locale(mocker, arbitrary_input, alg, expected, is_dumb):
	135	# First, apply the correct strxfrm function to the string values.
	136	strxfrm = get_strxfrm()
	137	expected = [list(sub) for sub in expected]
	138	try:
	139	for i in (2, 4, 6):
	140	expected[0][i] = strxfrm(expected[0][i])
	141	for i in (0, 2):
	142	expected[1][i] = strxfrm(expected[1][i])
	143	expected = tuple(tuple(sub) for sub in expected)
	144	except IndexError: # ns.LOCALE | ns.CAPITALFIRST
	145	expected = [[list(subsub) for subsub in sub] for sub in expected]
	146	for i in (2, 4, 6):
	147	expected[0][1][i] = strxfrm(expected[0][1][i])
	148	for i in (0, 2):
	149	expected[1][1][i] = strxfrm(expected[1][1][i])
	150	expected = tuple(tuple(tuple(subsub) for subsub in sub) for sub in expected)
	151
	152	with mocker.patch("natsort.compat.locale.dumb_sort", return_value=is_dumb):
	153	ns_key = natsort_keygen(alg=alg)
	154	assert ns_key(arbitrary_input) == expected
	155
	156
	157	@pytest.mark.parametrize(
	158	"alg, is_dumb",
	159	[(ns.LOCALE, False), (ns.LOCALE, True), (ns.LOCALE | ns.CAPITALFIRST, False)],
	160	)
	161	@pytest.mark.skipif(PY_VERSION < 3.0, reason="special bytes handling only on Python3")
	162	@pytest.mark.usefixtures("with_locale_en_us")
	163	def test_natsort_keygen_with_locale_bytes(mocker, bytes_input, alg, is_dumb):
	164	expected = (b"6A-5.034e+1",)
	165	with mocker.patch("natsort.compat.locale.dumb_sort", return_value=is_dumb):
	166	ns_key = natsort_keygen(alg=alg)
	167	assert ns_key(bytes_input) == expected
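
The expected keys in this file, such as `("a-", 5, ".", 34, "e", 2)` and `("", 6, "A-", 5, ".", 34, "e+", 1)`, show the default key alternating between text and unsigned integers, with an empty string in front when the input starts with a digit. A simplified sketch that reproduces those two default-algorithm keys with a regular expression is given below. The name `simple_natural_key` is hypothetical; it handles strings only and ignores every `ns` option, so it is not a substitute for `natsort_keygen`.

```python
import re

_DIGIT_RUNS = re.compile(r"(\d+)")


def simple_natural_key(text):
    """Split text into alternating (str, int, str, int, ...) pieces.

    Hypothetical helper for illustration; not part of natsort.
    """
    # With one capturing group, re.split puts the digit runs at odd indices
    # and the surrounding text (possibly empty) at even indices.
    parts = _DIGIT_RUNS.split(text)
    key = [int(part) if index % 2 else part for index, part in enumerate(parts)]
    # Drop the empty string left over when the text ends with digits (or is empty).
    if key[-1] == "":
        key.pop()
    return tuple(key)


print(simple_natural_key("a-5.034e2"))
# ('a-', 5, '.', 34, 'e', 2)
print(sorted(["a2", "a10", "a1"], key=simple_natural_key))
# ['a1', 'a2', 'a10']
```

Because strings always sit at even positions and integers at odd positions, two keys never compare a string against an integer, which is what keeps tuple comparison from raising `TypeError` on Python 3.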

+298

-0

tests/test_natsorted.py

	0	# -*- coding: utf-8 -*-
	1	"""\
	2	Here is a collection of examples of how this module can be used.
	3	See the README or the natsort homepage for more details.
	4	"""
	5	from __future__ import print_function, unicode_literals
	6
	7	from operator import itemgetter
	8
	9	import pytest
	10	from natsort import as_utf8, natsorted, ns
	11	from natsort.compat.py23 import PY_VERSION
	12	from pytest import raises
	13
	14
	15	@pytest.fixture
	16	def float_list():
	17	return ["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"]
	18
	19
	20	@pytest.fixture
	21	def fruit_list():
	22	return ["Apple", "corn", "Corn", "Banana", "apple", "banana"]
	23
	24
	25	@pytest.fixture
	26	def mixed_list():
	27	return ["Ä", "0", "ä", 3, "b", 1.5, "2", "Z"]
	28
	29
	30	def test_natsorted_numbers_in_ascending_order():
	31	given = ["a2", "a5", "a9", "a1", "a4", "a10", "a6"]
	32	expected = ["a1", "a2", "a4", "a5", "a6", "a9", "a10"]
	33	assert natsorted(given) == expected
	34
	35
	36	def test_natsorted_can_sort_as_signed_floats_with_exponents(float_list):
	37	expected = ["a-50", "a50", "a50.300", "a50.31", "a5.034e1", "a50.4", "a51."]
	38	assert natsorted(float_list, alg=ns.REAL) == expected
	39
	40
	41	@pytest.mark.parametrize(
	42	# UNSIGNED is default
	43	"alg",
	44	[ns.NOEXP | ns.FLOAT | ns.UNSIGNED, ns.NOEXP | ns.FLOAT],
	45	)
	46	def test_natsorted_can_sort_as_unsigned_and_ignore_exponents(float_list, alg):
	47	expected = ["a5.034e1", "a50", "a50.300", "a50.31", "a50.4", "a51.", "a-50"]
	48	assert natsorted(float_list, alg=alg) == expected
	49
	50
	51	# DEFAULT and INT are equivalent.
	52	@pytest.mark.parametrize("alg", [ns.DEFAULT, ns.INT])
	53	def test_natsorted_can_sort_as_unsigned_ints_which_is_default(float_list, alg):
	54	expected = ["a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51.", "a-50"]
	55	assert natsorted(float_list, alg=alg) == expected
	56
	57
	58	def test_natsorted_can_sort_as_signed_ints(float_list):
	59	expected = ["a-50", "a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51."]
	60	assert natsorted(float_list, alg=ns.SIGNED) == expected
	61
	62
	63	@pytest.mark.parametrize(
	64	"alg, expected",
	65	[(ns.UNSIGNED, ["a7", "a+2", "a-5"]), (ns.SIGNED, ["a-5", "a+2", "a7"])],
	66	)
	67	def test_natsorted_can_sort_with_or_without_accounting_for_sign(alg, expected):
	68	given = ["a-5", "a7", "a+2"]
	69	assert natsorted(given, alg=alg) == expected
	70
	71
	72	def test_natsorted_can_sort_as_version_numbers():
	73	given = ["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"]
	74	expected = ["1.9.9a", "1.9.9b", "1.10.1", "1.11", "1.11.4"]
	75	assert natsorted(given) == expected
	76
	77
	78	@pytest.mark.parametrize(
	79	"alg, expected",
	80	[
	81	(ns.DEFAULT, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
	82	(ns.NUMAFTER, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
	83	],
	84	)
	85	def test_natsorted_handles_mixed_types(mixed_list, alg, expected):
	86	assert natsorted(mixed_list, alg=alg) == expected
	87
	88
	89	@pytest.mark.parametrize(
	90	"alg, expected, slc",
	91	[
	92	(ns.DEFAULT, [float("nan"), 5, "25", 1E40], slice(1, None)),
	93	(ns.NANLAST, [5, "25", 1E40, float("nan")], slice(None, 3)),
	94	],
	95	)
	96	def test_natsorted_handles_nan(alg, expected, slc):
	97	given = ["25", 5, float("nan"), 1E40]
	98	# The slice is because NaN != NaN
	99	# noinspection PyUnresolvedReferences
	100	assert natsorted(given, alg=alg)[slc] == expected[slc]
	101
	102
	103	@pytest.mark.skipif(PY_VERSION < 3.0, reason="error is only raised on Python 3")
	104	def test_natsorted_with_mixed_bytes_and_str_input_raises_type_error():
	105	with raises(TypeError, match="bytes"):
	106	natsorted(["ä", b"b"])
	107
	108	# ...unless you use as_utf8 (or some other decoder).
	109	assert natsorted(["ä", b"b"], key=as_utf8) == ["ä", b"b"]
	110
	111
	112	def test_natsorted_raises_type_error_for_non_iterable_input():
	113	with raises(TypeError, match="'int' object is not iterable"):
	114	natsorted(100)
	115
	116
	117	def test_natsorted_recurses_into_nested_lists():
	118	given = [["a1", "a5"], ["a1", "a40"], ["a10", "a1"], ["a2", "a5"]]
	119	expected = [["a1", "a5"], ["a1", "a40"], ["a2", "a5"], ["a10", "a1"]]
	120	assert natsorted(given) == expected
	121
	122
	123	def test_natsorted_applies_key_to_each_list_element_before_sorting_list():
	124	given = [("a", "num3"), ("b", "num5"), ("c", "num2")]
	125	expected = [("c", "num2"), ("a", "num3"), ("b", "num5")]
	126	assert natsorted(given, key=itemgetter(1)) == expected
	127
	128
	129	def test_natsorted_returns_list_in_reversed_order_with_reverse_option(float_list):
	130	expected = natsorted(float_list)[::-1]
	131	assert natsorted(float_list, reverse=True) == expected
	132
	133
	134	def test_natsorted_handles_filesystem_paths():
	135	given = [
	136	"/p/Folder (10)/file.tar.gz",
	137	"/p/Folder/file.tar.gz",
	138	"/p/Folder (1)/file (1).tar.gz",
	139	"/p/Folder (1)/file.tar.gz",
	140	]
	141	expected_correct = [
	142	"/p/Folder/file.tar.gz",
	143	"/p/Folder (1)/file.tar.gz",
	144	"/p/Folder (1)/file (1).tar.gz",
	145	"/p/Folder (10)/file.tar.gz",
	146	]
	147	expected_incorrect = [
	148	"/p/Folder (1)/file (1).tar.gz",
	149	"/p/Folder (1)/file.tar.gz",
	150	"/p/Folder (10)/file.tar.gz",
	151	"/p/Folder/file.tar.gz",
	152	]
	153	# Is incorrect by default.
	154	assert natsorted(given) == expected_incorrect
	155	# Need ns.PATH to make it correct.
	156	assert natsorted(given, alg=ns.PATH) == expected_correct
	157
	158
	159	def test_natsorted_handles_numbers_and_filesystem_paths_simultaneously():
	160	# You can sort paths and numbers, not that you'd want to
	161	given = ["/Folder (9)/file.exe", 43]
	162	expected = [43, "/Folder (9)/file.exe"]
	163	assert natsorted(given, alg=ns.PATH) == expected
	164
	165
	166	@pytest.mark.parametrize(
	167	"alg, expected",
	168	[
	169	(ns.DEFAULT, ["Apple", "Banana", "Corn", "apple", "banana", "corn"]),
	170	(ns.IGNORECASE, ["Apple", "apple", "Banana", "banana", "corn", "Corn"]),
	171	(ns.LOWERCASEFIRST, ["apple", "banana", "corn", "Apple", "Banana", "Corn"]),
	172	(ns.GROUPLETTERS, ["Apple", "apple", "Banana", "banana", "Corn", "corn"]),
	173	(ns.G | ns.LF, ["apple", "Apple", "banana", "Banana", "corn", "Corn"]),
	174	],
	175	)
	176	def test_natsorted_supports_case_handling(alg, expected, fruit_list):
	177	assert natsorted(fruit_list, alg=alg) == expected
	178
	179
	180	@pytest.mark.parametrize(
	181	"alg, expected",
	182	[
	183	(ns.DEFAULT, [("A5", "a6"), ("a3", "a1")]),
	184	(ns.LOWERCASEFIRST, [("a3", "a1"), ("A5", "a6")]),
	185	(ns.IGNORECASE, [("a3", "a1"), ("A5", "a6")]),
	186	],
	187	)
	188	def test_natsorted_supports_nested_case_handling(alg, expected):
	189	given = [("A5", "a6"), ("a3", "a1")]
	190	assert natsorted(given, alg=alg) == expected
	191
	192
	193	@pytest.mark.parametrize(
	194	"alg, expected",
	195	[
	196	(ns.DEFAULT, ["apple", "Apple", "banana", "Banana", "corn", "Corn"]),
	197	(ns.CAPITALFIRST, ["Apple", "Banana", "Corn", "apple", "banana", "corn"]),
	198	(ns.LOWERCASEFIRST, ["Apple", "apple", "Banana", "banana", "Corn", "corn"]),
	199	(ns.C | ns.LF, ["apple", "banana", "corn", "Apple", "Banana", "Corn"]),
	200	],
	201	)
	202	@pytest.mark.usefixtures("with_locale_en_us")
	203	def test_natsorted_can_sort_using_locale(fruit_list, alg, expected):
	204	assert natsorted(fruit_list, alg=ns.LOCALE | alg) == expected
	205
	206
	207	@pytest.mark.usefixtures("with_locale_en_us")
	208	def test_natsorted_can_sort_locale_specific_numbers_en():
	209	given = ["c", "a5,467.86", "ä", "b", "a5367.86", "a5,6", "a5,50"]
	210	expected = ["a5,6", "a5,50", "a5367.86", "a5,467.86", "ä", "b", "c"]
	211	assert natsorted(given, alg=ns.LOCALE | ns.F) == expected
	212
	213
	214	@pytest.mark.usefixtures("with_locale_de_de")
	215	def test_natsorted_can_sort_locale_specific_numbers_de():
	216	given = ["c", "a5.467,86", "ä", "b", "a5367.86", "a5,6", "a5,50"]
	217	expected = ["a5,50", "a5,6", "a5367.86", "a5.467,86", "ä", "b", "c"]
	218	assert natsorted(given, alg=ns.LOCALE | ns.F) == expected
	219
	220
	221	@pytest.mark.parametrize(
	222	"alg, expected",
	223	[
	224	(ns.DEFAULT, ["0", 1.5, "2", 3, "ä", "Ä", "b", "Z"]),
	225	(ns.NUMAFTER, ["ä", "Ä", "b", "Z", "0", 1.5, "2", 3]),
	226	(ns.UNGROUPLETTERS, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
	227	(ns.UG | ns.NA, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
	228	# Adding PATH changes nothing.
	229	(ns.PATH, ["0", 1.5, "2", 3, "ä", "Ä", "b", "Z"]),
	230	(ns.PATH | ns.NUMAFTER, ["ä", "Ä", "b", "Z", "0", 1.5, "2", 3]),
	231	(ns.PATH | ns.UNGROUPLETTERS, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
	232	(ns.PATH | ns.UG | ns.NA, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
	233	],
	234	)
	235	@pytest.mark.usefixtures("with_locale_en_us")
	236	def test_natsorted_handles_mixed_types_with_locale(mixed_list, alg, expected):
	237	assert natsorted(mixed_list, alg=ns.LOCALE | alg) == expected
	238
	239
	240	@pytest.mark.parametrize(
	241	"alg, expected",
	242	[
	243	(ns.DEFAULT, ["73", "5039", "Banana", "apple", "corn", "~~~~~~"]),
	244	(ns.NUMAFTER, ["Banana", "apple", "corn", "~~~~~~", "73", "5039"]),
	245	],
	246	)
	247	def test_natsorted_sorts_an_odd_collection_of_strings(alg, expected):
	248	given = ["apple", "Banana", "73", "5039", "corn", "~~~~~~"]
	249	assert natsorted(given, alg=alg) == expected
	250
	251
	252	def test_natsorted_sorts_mixed_ascii_and_non_ascii_numbers():
	253	given = [
	254	"1st street",
	255	"10th street",
	256	"2nd street",
	257	"2 street",
	258	"1 street",
	259	"1street",
	260	"11 street",
	261	"street 2",
	262	"street 1",
	263	"Street 11",
	264	"۲ street",
	265	"۱ street",
	266	"۱street",
	267	"۱۲street",
	268	"۱۱ street",
	269	"street ۲",
	270	"street ۱",
	271	"street ۱",
	272	"street ۱۲",
	273	"street ۱۱",
	274	]
	275	expected = [
	276	"1 street",
	277	"۱ street",
	278	"1st street",
	279	"1street",
	280	"۱street",
	281	"2 street",
	282	"۲ street",
	283	"2nd street",
	284	"10th street",
	285	"11 street",
	286	"۱۱ street",
	287	"۱۲street",
	288	"street 1",
	289	"street ۱",
	290	"street ۱",
	291	"street 2",
	292	"street ۲",
	293	"Street 11",
	294	"street ۱۱",
	295	"street ۱۲",
	296	]
	297	assert natsorted(given, alg=ns.IGNORECASE) == expected

+117

-0

tests/test_natsorted_convenience.py

	0	# -*- coding: utf-8 -*-
	1	"""\
	2	Here is a collection of examples of how this module can be used.
	3	See the README or the natsort homepage for more details.
	4	"""
	5	from __future__ import print_function, unicode_literals
	6
	7	from operator import itemgetter
	8
	9	import pytest
	10	from natsort import (
	11	as_ascii,
	12	as_utf8,
	13	decoder,
	14	humansorted,
	15	index_humansorted,
	16	index_natsorted,
	17	index_realsorted,
	18	natsorted,
	19	ns,
	20	order_by_index,
	21	realsorted,
	22	)
	23	from natsort.compat.py23 import PY_VERSION
	24
	25
	26	@pytest.fixture
	27	def version_list():
	28	return ["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"]
	29
	30
	31	@pytest.fixture
	32	def float_list():
	33	return ["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"]
	34
	35
	36	@pytest.fixture
	37	def fruit_list():
	38	return ["Apple", "corn", "Corn", "Banana", "apple", "banana"]
	39
	40
	41	def test_decoder_returns_function_that_can_decode_bytes_but_return_non_bytes_as_is():
	42	func = decoder("latin1")
	43	str_obj = "bytes"
	44	int_obj = 14
	45	assert func(b"bytes") == str_obj
	46	assert func(int_obj) is int_obj # returns as-is, same object ID
	47	if PY_VERSION >= 3:
	48	assert (
	49	func(str_obj) is str_obj
	50	) # same object returned on Python3 b/c only bytes has decode
	51	else:
	52	assert func(str_obj) is not str_obj
	53	assert (
	54	func(str_obj) == str_obj
	55	) # not same object on Python2 because str can decode
	56
	57
	58	def test_as_ascii_converts_bytes_to_ascii():
	59	assert decoder("ascii")(b"bytes") == as_ascii(b"bytes")
	60
	61
	62	def test_as_utf8_converts_bytes_to_utf8():
	63	assert decoder("utf8")(b"bytes") == as_utf8(b"bytes")
	64
	65
	66	def test_realsorted_is_identical_to_natsorted_with_real_alg(float_list):
	67	assert realsorted(float_list) == natsorted(float_list, alg=ns.REAL)
	68
	69
	70	@pytest.mark.usefixtures("with_locale_en_us")
	71	def test_humansorted_is_identical_to_natsorted_with_locale_alg(fruit_list):
	72	assert humansorted(fruit_list) == natsorted(fruit_list, alg=ns.LOCALE)
	73
	74
	75	def test_index_natsorted_returns_integer_list_of_sort_order_for_input_list():
	76	given = ["num3", "num5", "num2"]
	77	other = ["foo", "bar", "baz"]
	78	index = index_natsorted(given)
	79	assert index == [2, 0, 1]
	80	assert [given[i] for i in index] == ["num2", "num3", "num5"]
	81	assert [other[i] for i in index] == ["baz", "foo", "bar"]
	82
	83
	84	def test_index_natsorted_reverse():
	85	given = ["num3", "num5", "num2"]
	86	assert index_natsorted(given, reverse=True) == index_natsorted(given)[::-1]
	87
	88
	89	def test_index_natsorted_applies_key_function_before_sorting():
	90	given = [("a", "num3"), ("b", "num5"), ("c", "num2")]
	91	expected = [2, 0, 1]
	92	assert index_natsorted(given, key=itemgetter(1)) == expected
	93
	94
	95	def test_index_realsorted_is_identical_to_index_natsorted_with_real_alg(float_list):
	96	assert index_realsorted(float_list) == index_natsorted(float_list, alg=ns.REAL)
	97
	98
	99	@pytest.mark.usefixtures("with_locale_en_us")
	100	def test_index_humansorted_is_identical_to_index_natsorted_with_locale_alg(fruit_list):
	101	assert index_humansorted(fruit_list) == index_natsorted(fruit_list, alg=ns.LOCALE)
	102
	103
	104	def test_order_by_index_sorts_list_according_to_order_of_integer_list():
	105	given = ["num3", "num5", "num2"]
	106	index = [2, 0, 1]
	107	expected = [given[i] for i in index]
	108	assert expected == ["num2", "num3", "num5"]
	109	assert order_by_index(given, index) == expected
	110
	111
	112	def test_order_by_index_returns_generator_with_iter_true():
	113	given = ["num3", "num5", "num2"]
	114	index = [2, 0, 1]
	115	assert order_by_index(given, index, True) != [given[i] for i in index]
	116	assert list(order_by_index(given, index, True)) == [given[i] for i in index]
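
The index-based helpers tested above can be illustrated with a small standalone sketch. It is not natsort's implementation: `index_sorted_sketch` and `order_by_index_sketch` are hypothetical names, and plain `sorted()` stands in for natural ordering, which happens to give the same result for these single-digit inputs.

```python
def index_sorted_sketch(seq, key=None, reverse=False):
    """Return the indexes that would sort seq (plain ordering, not natural)."""
    if key is None:
        key = lambda x: x
    # Pair each element with its position, sort by the element, keep positions.
    pairs = sorted(enumerate(seq), key=lambda p: key(p[1]), reverse=reverse)
    return [i for i, _ in pairs]


def order_by_index_sketch(seq, index, as_iter=False):
    """Reorder seq by index; return a generator when as_iter is true."""
    gen = (seq[i] for i in index)
    return gen if as_iter else list(gen)


given = ["num3", "num5", "num2"]
other = ["foo", "bar", "baz"]
index = index_sorted_sketch(given)
print(index)                                # [2, 0, 1]
print(order_by_index_sketch(other, index))  # ['baz', 'foo', 'bar']
```

The point of returning indexes rather than sorted values is that the same ordering can then be applied to any parallel sequence, as the `other` list shows.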

+44

-0

tests/test_ns_enum.py

	0	from natsort import ns
	1
	2
	3	def test_ns_enum():
	4	enum_name_values = [
	5	("FLOAT", 0x0001),
	6	("SIGNED", 0x0002),
	7	("NOEXP", 0x0004),
	8	("PATH", 0x0008),
	9	("LOCALEALPHA", 0x0010),
	10	("LOCALENUM", 0x0020),
	11	("IGNORECASE", 0x0040),
	12	("LOWERCASEFIRST", 0x0080),
	13	("GROUPLETTERS", 0x0100),
	14	("UNGROUPLETTERS", 0x0200),
	15	("NANLAST", 0x0400),
	16	("COMPATIBILITYNORMALIZE", 0x0800),
	17	("NUMAFTER", 0x1000),
	18	("DEFAULT", 0x0000),
	19	("INT", 0x0000),
	20	("UNSIGNED", 0x0000),
	21	("REAL", 0x0003),
	22	("LOCALE", 0x0030),
	23	("I", 0x0000),
	24	("U", 0x0000),
	25	("F", 0x0001),
	26	("S", 0x0002),
	27	("R", 0x0003),
	28	("N", 0x0004),
	29	("P", 0x0008),
	30	("LA", 0x0010),
	31	("LN", 0x0020),
	32	("L", 0x0030),
	33	("IC", 0x0040),
	34	("LF", 0x0080),
	35	("G", 0x0100),
	36	("UG", 0x0200),
	37	("C", 0x0200),
	38	("CAPITALFIRST", 0x0200),
	39	("NL", 0x0400),
	40	("CN", 0x0800),
	41	("NA", 0x1000),
	42	]
	43	assert list(ns._asdict().items()) == enum_name_values
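
The values listed in this test are bit flags, so the combined options are bitwise ORs of the basic ones. The sketch below uses plain integer constants copied from the table above; the `has_flag` helper is hypothetical and is not part of natsort.

```python
# Values copied from the enum table in the test above.
FLOAT = 0x0001
SIGNED = 0x0002
LOCALEALPHA = 0x0010
LOCALENUM = 0x0020

# Convenience options are ORs of the basic flags.
REAL = FLOAT | SIGNED
LOCALE = LOCALEALPHA | LOCALENUM


def has_flag(alg, flag):
    """Hypothetical helper: report whether any bit of flag is set in alg."""
    return bool(alg & flag)


print(hex(REAL), hex(LOCALE))           # 0x3 0x30
print(has_flag(REAL | LOCALE, SIGNED))  # True
print(has_flag(FLOAT, SIGNED))          # False
```

Because `DEFAULT`, `INT`, and `UNSIGNED` are all `0x0000`, ORing them into an algorithm changes nothing, which is why the tests treat them as equivalent.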

+25

-0

tests/test_parse_bytes_function.py

	0	# -*- coding: utf-8 -*-
	1	"""These test the utils.py functions."""
	2	from __future__ import unicode_literals
	3
	4	import pytest
	5	from hypothesis import given
	6	from hypothesis.strategies import binary
	7	from natsort.ns_enum import ns
	8	from natsort.utils import parse_bytes_factory
	9
	10
	11	@pytest.mark.parametrize(
	12	"alg, example_func",
	13	[
	14	(ns.DEFAULT, lambda x: (x,)),
	15	(ns.IGNORECASE, lambda x: (x.lower(),)),
	16	# With PATH, it becomes a nested tuple.
	17	(ns.PATH, lambda x: ((x,),)),
	18	(ns.PATH | ns.IGNORECASE, lambda x: ((x.lower(),),)),
	19	],
	20	)
	21	@given(x=binary())
	22	def test_parse_bytest_factory_makes_function_that_returns_tuple(x, alg, example_func):
	23	parse_bytes_func = parse_bytes_factory(alg)
	24	assert parse_bytes_func(x) == example_func(x)

+38

-0

tests/test_parse_number_function.py

	0	# -*- coding: utf-8 -*-
	1	"""These test the utils.py functions."""
	2	from __future__ import unicode_literals
	3
	4	import pytest
	5	from hypothesis import given
	6	from hypothesis.strategies import floats, integers
	7	from natsort.ns_enum import ns
	8	from natsort.utils import parse_number_factory
	9
	10
	11	@pytest.mark.usefixtures("with_locale_en_us")
	12	@pytest.mark.parametrize(
	13	"alg, example_func",
	14	[
	15	(ns.DEFAULT, lambda x: ("", x)),
	16	(ns.PATH, lambda x: (("", x),)),
	17	(ns.UNGROUPLETTERS | ns.LOCALE, lambda x: (("xx",), ("", x))),
	18	(ns.PATH | ns.UNGROUPLETTERS | ns.LOCALE, lambda x: ((("xx",), ("", x)),)),
	19	],
	20	)
	21	@given(x=floats(allow_nan=False) | integers())
	22	def test_parse_number_factory_makes_function_that_returns_tuple(x, alg, example_func):
	23	parse_number_func = parse_number_factory(alg, "", "xx")
	24	assert parse_number_func(x) == example_func(x)
	25
	26
	27	@pytest.mark.parametrize(
	28	"alg, x, result",
	29	[
	30	(ns.DEFAULT, 57, ("", 57)),
	31	(ns.DEFAULT, float("nan"), ("", float("-inf"))), # NaN transformed to -infinity
	32	(ns.NANLAST, float("nan"), ("", float("+inf"))), # NANLAST makes it +infinity
	33	],
	34	)
	35	def test_parse_number_factory_treats_nan_special(alg, x, result):
	36	parse_number_func = parse_number_factory(alg, "", "xx")
	37	assert parse_number_func(x) == result

+93

-0

tests/test_parse_string_function.py

	0	# -*- coding: utf-8 -*-
	1	"""These test the utils.py functions."""
	2	from __future__ import unicode_literals
	3
	4	import unicodedata
	5
	6	import pytest
	7	from hypothesis import given
	8	from hypothesis.strategies import floats, integers, lists, text
	9	from natsort.compat.fastnumbers import fast_float
	10	from natsort.compat.py23 import py23_str
	11	from natsort.ns_enum import NS_DUMB, ns
	12	from natsort.utils import NumericalRegularExpressions as NumRegex
	13	from natsort.utils import parse_string_factory
	14
	15
	16	class CustomTuple(tuple):
	17	"""Used to ensure what is given during testing is what is returned."""
	18
	19	original = None
	20
	21
	22	def input_transform(x):
	23	"""Make uppercase."""
	24	try:
	25	return x.upper()
	26	except AttributeError:
	27	return x
	28
	29
	30	def final_transform(x, original):
	31	"""Make the input a CustomTuple."""
	32	t = CustomTuple(x)
	33	t.original = original
	34	return t
	35
	36
	37	@pytest.fixture
	38	def parse_string_func(request):
	39	"""A parse_string_factory result with sample arguments."""
	40	sep = ""
	41	return parse_string_factory(
	42	request.param, # algorithm
	43	sep,
	44	NumRegex.int_nosign().split,
	45	input_transform,
	46	fast_float,
	47	final_transform,
	48	)
	49
	50
	51	@pytest.mark.parametrize("parse_string_func", [ns.DEFAULT], indirect=True)
	52	@given(x=floats() | integers())
	53	def test_parse_string_factory_raises_type_error_if_given_number(x, parse_string_func):
	54	with pytest.raises(TypeError):
	55	assert parse_string_func(x)
	56
	57
	58	# noinspection PyCallingNonCallable
	59	@pytest.mark.parametrize(
	60	"parse_string_func, orig_func",
	61	[
	62	(ns.DEFAULT, lambda x: x.upper()),
	63	(ns.LOCALE, lambda x: x.upper()),
	64	(ns.LOCALE | NS_DUMB, lambda x: x), # This changes the "original" handling.
	65	],
	66	indirect=["parse_string_func"],
	67	)
	68	@given(
	69	x=lists(
	70	elements=floats(allow_nan=False) | text() | integers(), min_size=1, max_size=10
	71	)
	72	)
	73	@pytest.mark.usefixtures("with_locale_en_us")
	74	def test_parse_string_factory_invariance(x, parse_string_func, orig_func):
	75	# parse_string_factory is the high-level combination of several dedicated
	76	# functions involved in splitting and manipulating a string. The details of
	77	# what those functions do are not relevant to testing parse_string_factory.
	78	# What is relevant is that the form of the output matches the invariant
	79	# that even elements are strings and odd elements are numerical. That each
	80	# component function is doing what it should is tested elsewhere.
	81	value = "".join(map(py23_str, x)) # Convert the input to a single string.
	82	result = parse_string_func(value)
	83	result_types = list(map(type, result))
	84	expected_types = [py23_str if i % 2 == 0 else float for i in range(len(result))]
	85	assert result_types == expected_types
	86
	87	# The result is in our CustomTuple.
	88	assert isinstance(result, CustomTuple)
	89
	90	# Original should have gone through the "input_transform"
	91	# which is uppercase in these tests.
	92	assert result.original == orig_func(unicodedata.normalize("NFD", value))
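
The invariant checked by this test (string at even positions, number at odd positions) falls out of splitting on a capturing regular expression. The sketch below shows the idea with the standard library only. `alternating_key_sketch` is a hypothetical name, and the real `parse_string_factory` additionally applies input transforms, separators, and locale handling that are omitted here.

```python
import re

# A capturing group makes re.split keep the matched digits in the result,
# so the output alternates: non-number, number, non-number, ...
_INT_SPLIT = re.compile(r"(\d+)")


def alternating_key_sketch(value):
    """Split value into a tuple with strings at even and floats at odd indexes."""
    parts = _INT_SPLIT.split(value)
    return tuple(float(p) if i % 2 else p for i, p in enumerate(parts))


key = alternating_key_sketch("a50.31")
print(key)  # ('a', 50.0, '.', 31.0, '')
```

A string with no digits comes back as a one-element tuple, which still satisfies the invariant because index 0 is even.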

+100

-0

tests/test_regex.py

	0	# -*- coding: utf-8 -*-
	1	"""These test the splitting regular expressions."""
	2	from __future__ import unicode_literals
	3
	4	import pytest
	5	from natsort.utils import NumericalRegularExpressions as NumRegex
	6
	7
	8	regex_names = {
	9	NumRegex.int_nosign(): "int_nosign",
	10	NumRegex.int_sign(): "int_sign",
	11	NumRegex.float_nosign_noexp(): "float_nosign_noexp",
	12	NumRegex.float_sign_noexp(): "float_sign_noexp",
	13	NumRegex.float_nosign_exp(): "float_nosign_exp",
	14	NumRegex.float_sign_exp(): "float_sign_exp",
	15	}
	16
	17	# Regex Aliases (so lines stay a reasonable length).
	18	i_u = NumRegex.int_nosign()
	19	i_s = NumRegex.int_sign()
	20	f_u = NumRegex.float_nosign_noexp()
	21	f_s = NumRegex.float_sign_noexp()
	22	f_ue = NumRegex.float_nosign_exp()
	23	f_se = NumRegex.float_sign_exp()
	24
	25	# Assemble a test suite of regular strings and their regular expression
	26	# splitting result. Organize by the input string.
	27	regex_tests = {
	28	"-123.45e+67": {
	29	i_u: ["-", "123", ".", "45", "e+", "67", ""],
	30	i_s: ["", "-123", ".", "45", "e", "+67", ""],
	31	f_u: ["-", "123.45", "e+", "67", ""],
	32	f_s: ["", "-123.45", "e", "+67", ""],
	33	f_ue: ["-", "123.45e+67", ""],
	34	f_se: ["", "-123.45e+67", ""],
	35	},
	36	"a-123.45e+67b": {
	37	i_u: ["a-", "123", ".", "45", "e+", "67", "b"],
	38	i_s: ["a", "-123", ".", "45", "e", "+67", "b"],
	39	f_u: ["a-", "123.45", "e+", "67", "b"],
	40	f_s: ["a", "-123.45", "e", "+67", "b"],
	41	f_ue: ["a-", "123.45e+67", "b"],
	42	f_se: ["a", "-123.45e+67", "b"],
	43	},
	44	"hello": {
	45	i_u: ["hello"],
	46	i_s: ["hello"],
	47	f_u: ["hello"],
	48	f_s: ["hello"],
	49	f_ue: ["hello"],
	50	f_se: ["hello"],
	51	},
	52	"abc12.34.56-7def": {
	53	i_u: ["abc", "12", ".", "34", ".", "56", "-", "7", "def"],
	54	i_s: ["abc", "12", ".", "34", ".", "56", "", "-7", "def"],
	55	f_u: ["abc", "12.34", "", ".56", "-", "7", "def"],
	56	f_s: ["abc", "12.34", "", ".56", "", "-7", "def"],
	57	f_ue: ["abc", "12.34", "", ".56", "-", "7", "def"],
	58	f_se: ["abc", "12.34", "", ".56", "", "-7", "def"],
	59	},
	60	"a1b2c3d4e5e6": {
	61	i_u: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
	62	i_s: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
	63	f_u: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
	64	f_s: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
	65	f_ue: ["a", "1", "b", "2", "c", "3", "d", "4e5", "e", "6", ""],
	66	f_se: ["a", "1", "b", "2", "c", "3", "d", "4e5", "e", "6", ""],
	67	},
	68	"eleven۱۱eleven11eleven১১": { # All of these are the decimal 11
	69	i_u: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
	70	i_s: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
	71	f_u: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
	72	f_s: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
	73	f_ue: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
	74	f_se: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
	75	},
	76	"12①②ⅠⅡ⅓": { # Two decimals, Two digits, Two numerals, fraction
	77	i_u: ["", "12", "", "①", "", "②", "ⅠⅡ⅓"],
	78	i_s: ["", "12", "", "①", "", "②", "ⅠⅡ⅓"],
	79	f_u: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
	80	f_s: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
	81	f_ue: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
	82	f_se: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
	83	}
	84	}
	85
	86
	87	# From the above collections, create the parametrized tests and labels.
	88	regex_params = [
	89	(given, expected, regex)
	90	for given, values in regex_tests.items()
	91	for regex, expected in values.items()
	92	]
	93	labels = ["{}-{}".format(given, regex_names[regex]) for given, _, regex in regex_params]
	94
	95
	96	@pytest.mark.parametrize("x, expected, regex", regex_params, ids=labels)
	97	def test_regex_splits_correctly(x, expected, regex):
	98	# noinspection PyUnresolvedReferences
	99	assert regex.split(x) == expected
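
The difference between the unsigned and signed integer rows in the table above can be reproduced with two simplified patterns. These are assumptions made for illustration; the patterns behind `NumericalRegularExpressions` are more elaborate (for example, they also cover non-decimal Unicode numerals).

```python
import re

# Simplified stand-ins for the int_nosign and int_sign patterns.
int_nosign_sketch = re.compile(r"(\d+)")
int_sign_sketch = re.compile(r"([-+]?\d+)")

text = "a-123.45e+67b"
unsigned_parts = int_nosign_sketch.split(text)
signed_parts = int_sign_sketch.split(text)

print(unsigned_parts)  # ['a-', '123', '.', '45', 'e+', '67', 'b']
print(signed_parts)    # ['a', '-123', '.', '45', 'e', '+67', 'b']
```

With the unsigned pattern the sign characters stay in the surrounding text, while the signed pattern pulls them into the number that follows.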

+78

-0

tests/test_string_component_transform_factory.py

	0	# -*- coding: utf-8 -*-
	1	"""These test the utils.py functions."""
	2	from __future__ import unicode_literals
	3
	4	from functools import partial
	5
	6	import pytest
	7	from hypothesis import example, given
	8	from hypothesis.strategies import floats, integers, text
	9	from natsort.compat.fastnumbers import fast_float, fast_int
	10	from natsort.compat.locale import get_strxfrm
	11	from natsort.compat.py23 import py23_range, py23_str, py23_unichr
	12	from natsort.ns_enum import NS_DUMB, ns
	13	from natsort.utils import groupletters, string_component_transform_factory
	14
	15	# There are some unicode values that are known failures with the builtin locale
	16	# library on BSD systems and that have nothing to do with natsort (a ValueError
	17	# is raised by strxfrm). Let's filter them out.
	18	try:
	19	bad_uni_chars = frozenset(
	20	py23_unichr(x) for x in py23_range(0X10fefd, 0X10ffff + 1)
	21	)
	22	except ValueError:
	23	# Narrow unicode build... no worries.
	24	bad_uni_chars = frozenset()
	25
	26
	27	def no_bad_uni_chars(x, _bad_chars=bad_uni_chars):
	28	"""Ensure text does not contain bad unicode characters"""
	29	return not any(y in _bad_chars for y in x)
	30
	31
	32	def no_null(x):
	33	"""Ensure text does not contain a null character."""
	34	return "\0" not in x
	35
	36
	37	@pytest.mark.parametrize(
	38	"alg, example_func",
	39	[
	40	(ns.INT, fast_int),
	41	(ns.DEFAULT, fast_int),
	42	(ns.FLOAT, partial(fast_float, nan=float("-inf"))),
	43	(ns.FLOAT | ns.NANLAST, partial(fast_float, nan=float("+inf"))),
	44	(ns.GROUPLETTERS, partial(fast_int, key=groupletters)),
	45	(ns.LOCALE, partial(fast_int, key=lambda x: get_strxfrm()(x))),
	46	(
	47	ns.GROUPLETTERS | ns.LOCALE,
	48	partial(fast_int, key=lambda x: get_strxfrm()(groupletters(x))),
	49	),
	50	(
	51	NS_DUMB | ns.LOCALE,
	52	partial(fast_int, key=lambda x: get_strxfrm()(groupletters(x))),
	53	),
	54	(
	55	ns.GROUPLETTERS | ns.LOCALE | ns.FLOAT | ns.NANLAST,
	56	partial(
	57	fast_float,
	58	key=lambda x: get_strxfrm()(groupletters(x)),
	59	nan=float("+inf"),
	60	),
	61	),
	62	],
	63	)
	64	@example(x=float("nan"))
	65	@given(
	66	x=integers()
	67	| floats()
	68	| text().filter(bool).filter(no_bad_uni_chars).filter(no_null)
	69	)
	70	@pytest.mark.usefixtures("with_locale_en_us")
	71	def test_string_component_transform_factory(x, alg, example_func):
	72	string_component_transform_func = string_component_transform_factory(alg)
	73	try:
	74	assert string_component_transform_func(py23_str(x)) == example_func(py23_str(x))
	75	except ValueError as e: # handle broken locale lib on BSD.
	76	if "is not in range" not in str(e):
	77	raise

+70

-0

tests/test_unicode_numbers.py

	0	# -*- coding: utf-8 -*-
	1	"""\
	2	Test the Unicode numbers module.
	3	"""
	4	from __future__ import unicode_literals
	5
	6	import unicodedata
	7
	8	from natsort.compat.py23 import py23_range, py23_unichr
	9	from natsort.unicode_numbers import (
	10	decimal_chars,
	11	decimals,
	12	digit_chars,
	13	digits,
	14	digits_no_decimals,
	15	numeric,
	16	numeric_chars,
	17	numeric_hex,
	18	numeric_no_decimals,
	19	)
	20
	21
	22	def test_numeric_chars_contains_only_valid_unicode_numeric_characters():
	23	for a in numeric_chars:
	24	assert unicodedata.numeric(a, None) is not None
	25
	26
	27	def test_digit_chars_contains_only_valid_unicode_digit_characters():
	28	for a in digit_chars:
	29	assert unicodedata.digit(a, None) is not None
	30
	31
	32	def test_decimal_chars_contains_only_valid_unicode_decimal_characters():
	33	for a in decimal_chars:
	34	assert unicodedata.decimal(a, None) is not None
	35
	36
	37	def test_numeric_chars_contains_all_valid_unicode_numeric_and_digit_characters():
	38	set_numeric_hex = set(numeric_hex)
	39	set_numeric_chars = set(numeric_chars)
	40	set_digit_chars = set(digit_chars)
	41	set_decimal_chars = set(decimal_chars)
	42	for i in py23_range(0X110000):
	43	try:
	44	a = py23_unichr(i)
	45	except ValueError:
	46	break
	47	if a in "0123456789":
	48	continue
	49	if unicodedata.numeric(a, None) is not None:
	50	assert i in set_numeric_hex
	51	assert a in set_numeric_chars
	52	if unicodedata.digit(a, None) is not None:
	53	assert i in set_numeric_hex
	54	assert a in set_digit_chars
	55	if unicodedata.decimal(a, None) is not None:
	56	assert i in set_numeric_hex
	57	assert a in set_decimal_chars
	58
	59	assert set_decimal_chars.isdisjoint(digits_no_decimals)
	60	assert set_digit_chars.issuperset(digits_no_decimals)
	61
	62	assert set_decimal_chars.isdisjoint(numeric_no_decimals)
	63	assert set_numeric_chars.issuperset(numeric_no_decimals)
	64
	65
	66	def test_combined_string_contains_all_characters_in_list():
	67	assert numeric == "".join(numeric_chars)
	68	assert digits == "".join(digit_chars)
	69	assert decimals == "".join(decimal_chars)

+167

-0

tests/test_utils.py

	0	# -*- coding: utf-8 -*-
	1	"""These test the utils.py functions."""
	2	from __future__ import unicode_literals
	3
	4	import pathlib
	5	import string
	6	from itertools import chain
	7	from operator import neg as op_neg
	8
	9	import pytest
	10	from hypothesis import given
	11	from hypothesis.strategies import integers, lists, sampled_from, text
	12	from natsort import utils
	13	from natsort.compat.py23 import py23_cmp, py23_int, py23_lower, py23_str
	14	from natsort.ns_enum import ns
	15
	16
	17	def test_do_decoding_decodes_bytes_string_to_unicode():
	18	assert type(utils.do_decoding(b"bytes", "ascii")) is py23_str
	19	assert utils.do_decoding(b"bytes", "ascii") == "bytes"
	20	assert utils.do_decoding(b"bytes", "ascii") == b"bytes".decode("ascii")
	21
	22
	23	@pytest.mark.parametrize(
	24	"alg, expected",
	25	[
	26	(ns.I, utils.NumericalRegularExpressions.int_nosign()),
	27	(ns.I | ns.N, utils.NumericalRegularExpressions.int_nosign()),
	28	(ns.I | ns.S, utils.NumericalRegularExpressions.int_sign()),
	29	(ns.I | ns.S | ns.N, utils.NumericalRegularExpressions.int_sign()),
	30	(ns.F, utils.NumericalRegularExpressions.float_nosign_exp()),
	31	(ns.F | ns.N, utils.NumericalRegularExpressions.float_nosign_noexp()),
	32	(ns.F | ns.S, utils.NumericalRegularExpressions.float_sign_exp()),
	33	(ns.F | ns.S | ns.N, utils.NumericalRegularExpressions.float_sign_noexp()),
	34	],
	35	)
	36	def test_regex_chooser_returns_correct_regular_expression_object(alg, expected):
	37	assert utils.regex_chooser(alg).pattern == expected.pattern
	38
	39
	40	@pytest.mark.parametrize(
	41	"alg, value_or_alias",
	42	[
	43	# Defaults
	44	(ns.DEFAULT, 0),
	45	(ns.INT, 0),
	46	(ns.UNSIGNED, 0),
	47	# Aliases
	48	(ns.INT, ns.I),
	49	(ns.UNSIGNED, ns.U),
	50	(ns.FLOAT, ns.F),
	51	(ns.SIGNED, ns.S),
	52	(ns.NOEXP, ns.N),
	53	(ns.PATH, ns.P),
	54	(ns.LOCALEALPHA, ns.LA),
	55	(ns.LOCALENUM, ns.LN),
	56	(ns.LOCALE, ns.L),
	57	(ns.IGNORECASE, ns.IC),
	58	(ns.LOWERCASEFIRST, ns.LF),
	59	(ns.GROUPLETTERS, ns.G),
	60	(ns.UNGROUPLETTERS, ns.UG),
	61	(ns.CAPITALFIRST, ns.C),
	62	(ns.UNGROUPLETTERS, ns.CAPITALFIRST),
	63	(ns.NANLAST, ns.NL),
	64	(ns.COMPATIBILITYNORMALIZE, ns.CN),
	65	(ns.NUMAFTER, ns.NA),
	66	# Convenience
	67	(ns.LOCALE, ns.LOCALEALPHA | ns.LOCALENUM),
	68	(ns.REAL, ns.FLOAT | ns.SIGNED),
	69	],
	70	)
	71	def test_ns_enum_values_and_aliases(alg, value_or_alias):
	72	assert alg == value_or_alias
	73
	74
	75	def test_chain_functions_is_a_no_op_if_no_functions_are_given():
	76	x = 2345
	77	assert utils.chain_functions([])(x) is x
	78
	79
	80	def test_chain_functions_does_one_function_if_one_function_is_given():
	81	x = "2345"
	82	assert utils.chain_functions([len])(x) == 4
	83
	84
	85	def test_chain_functions_combines_functions_in_given_order():
	86	x = 2345
	87	assert utils.chain_functions([str, len, op_neg])(x) == -len(str(x))
	88
	89
	90	# Each test has an "example" version for demonstrative purposes,
	91	# and a test that uses the hypothesis module.
	92
	93
	94	def test_groupletters_returns_letters_with_lowercase_transform_of_letter_example():
	95	assert utils.groupletters("HELLO") == "hHeElLlLoO"
	96	assert utils.groupletters("hello") == "hheelllloo"
	97
	98
	99	@given(text().filter(bool))
	100	def test_groupletters_returns_letters_with_lowercase_transform_of_letter(x):
	101	assert utils.groupletters(x) == "".join(
	102	chain.from_iterable([py23_lower(y), y] for y in x)
	103	)
	104
	105
	106	def test_sep_inserter_does_nothing_if_no_numbers_example():
	107	assert list(utils.sep_inserter(iter(["a", "b", "c"]), "")) == ["a", "b", "c"]
	108	assert list(utils.sep_inserter(iter(["a"]), "")) == ["a"]
	109
	110
	111	def test_sep_inserter_does_nothing_if_only_one_number_example():
	112	assert list(utils.sep_inserter(iter(["a", 5]), "")) == ["a", 5]
	113
	114
	115	def test_sep_inserter_inserts_separator_string_between_two_numbers_example():
	116	assert list(utils.sep_inserter(iter([5, 9]), "")) == ["", 5, "", 9]
	117
	118
	119	@given(lists(elements=text().filter(bool) | integers(), min_size=3))
	120	def test_sep_inserter_inserts_separator_between_two_numbers(x):
	121	# Rather than just replicating the results in a different
	122	# algorithm, validate that the "shape" of the output is as expected.
	123	result = list(utils.sep_inserter(iter(x), ""))
	124	for i, pos in enumerate(result[1:-1], 1):
	125	if pos == "":
	126	assert isinstance(result[i - 1], py23_int)
	127	assert isinstance(result[i + 1], py23_int)
	128
	129
	130	def test_path_splitter_splits_path_string_by_separator_example():
	131	z = "/this/is/a/path"
	132	assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
	133	z = pathlib.Path("/this/is/a/path")
	134	assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
	135
	136
	137	@given(lists(sampled_from(string.ascii_letters), min_size=2).filter(all))
	138	def test_path_splitter_splits_path_string_by_separator(x):
	139	z = py23_str(pathlib.Path(*x))
	140	assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
	141
	142
	143	def test_path_splitter_splits_path_string_by_separator_and_removes_extension_example():
	144	z = "/this/is/a/path/file.exe"
	145	y = tuple(pathlib.Path(z).parts)
	146	assert tuple(utils.path_splitter(z)) == y[:-1] + (
	147	pathlib.Path(z).stem,
	148	pathlib.Path(z).suffix,
	149	)
	150
	151
	152	@given(lists(sampled_from(string.ascii_letters), min_size=3).filter(all))
	153	def test_path_splitter_splits_path_string_by_separator_and_removes_extension(x):
	154	z = py23_str(pathlib.Path(*x[:-2])) + "." + x[-1]
	155	y = tuple(pathlib.Path(z).parts)
	156	assert tuple(utils.path_splitter(z)) == y[:-1] + (
	157	pathlib.Path(z).stem,
	158	pathlib.Path(z).suffix,
	159	)
	160
	161
	162	@given(integers())
	163	def test_py23_cmp(x):
	164	assert py23_cmp(x, x) == 0
	165	assert py23_cmp(x, x + 1) < 0
	166	assert py23_cmp(x, x - 1) > 0
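
The tests above pin down the observable behavior of three helpers: `chain_functions`, `groupletters`, and `sep_inserter`. As a reading aid, here is a minimal sketch of functions that would satisfy the "example" tests. These are not natsort's actual implementations. The `_sketch` names are hypothetical, `str.lower` stands in for `py23_lower`, and treating only `int` as a number is an assumption made for brevity.

```python
from operator import neg


def chain_functions_sketch(functions):
    """Return a callable that applies each function in the given order."""
    functions = list(functions)

    def chained(value):
        # With an empty list the loop never runs, so the input object
        # itself is returned unchanged (the "no-op" case in the tests).
        for func in functions:
            value = func(value)
        return value

    return chained


def groupletters_sketch(text):
    """Prefix every character with its lowercase form."""
    # Assumption: str.lower stands in for the py23_lower helper.
    return "".join(char.lower() + char for char in text)


def sep_inserter_sketch(iterable, sep):
    """Yield items, placing sep before a leading number and between
    two adjacent numbers."""
    # Starting as True makes a leading number receive a separator.
    previous_was_number = True
    for item in iterable:
        # Assumption: only int counts as a number in this sketch.
        is_number = isinstance(item, int)
        if is_number and previous_was_number:
            yield sep
        yield item
        previous_was_number = is_number


if __name__ == "__main__":
    print(chain_functions_sketch([str, len, neg])(2345))
    print(groupletters_sketch("HELLO"))
    print(list(sep_inserter_sketch(iter([5, 9]), "")))
```

The separator logic mirrors what the hypothesis-based test checks: any separator found in the interior of the output must have a number on both sides.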

+13

-7

tox.ini

17	17	passenv =
18	18	WITH_EXTRAS
19	19	deps =
20		pipenv
	20	-r dev-requirements.txt
21	21	extras =
22	22	{env:WITH_EXTRAS:}
23	23	commands =
24		pipenv install --dev --skip-lock
25	24	# Only run How It Works doctest on Python 3.6.
26		py36: {envpython} -m doctest -o IGNORE_EXCEPTION_DETAIL docs/source/howitworks.rst
	25	py36: {envpython} -m doctest -o IGNORE_EXCEPTION_DETAIL docs/howitworks.rst
27	26	# Other doctests are run for all pythons.
28		pytest README.rst docs/source/intro.rst docs/source/examples.rst
	27	pytest README.rst docs/intro.rst docs/examples.rst
29	28	pytest --doctest-modules {envsitepackagesdir}/natsort
30	29	# Full test suite. Allow the user to pass command-line objects.
31	30	pytest --tb=short --cov {envsitepackagesdir}/natsort --cov-report term-missing {posargs:}

37	36	flake8-import-order
38	37	flake8-bugbear
39	38	pep8-naming
40		commands = flake8
	39	check-manifest
	40	twine
	41	commands =
	42	{envpython} setup.py sdist bdist_wheel
	43	flake8
	44	check-manifest --ignore ".github,.md,.coveragerc"
	45	twine check dist/*
	46	skip_install = true
41	47
42	48	# Build documentation.
43	49	[testenv:docs]

46	52	sphinx_rtd_theme
47	53	commands =
48	54	{envpython} setup.py build_sphinx
	55	skip_install = true
49	56
50	57	# Release the code to PyPI
51	58	[testenv:release]
52	59	deps =
53	60	twine
54		check-manifest
55	61	commands =
56		check-manifest
57	62	{envpython} setup.py sdist bdist_wheel
58	63	twine upload dist/*
	64	skip_install = true