Codebase list natsort / c90e5fc
Update upstream source from tag 'upstream/6.0.0' Update to upstream version '6.0.0' with Debian dir 7a2e9f60755e7b601aad22f648a4bc5534ed5ac7 Agustin Henze 5 years ago
95 changed file(s) with 5237 addition(s) and 5195 deletion(s).
1313 if __name__ == .__main__.:
1414
1515 ignore_errors = True
16
0 ---
1 name: Bug report
2 about: Report unexpected behavior, a crash, or incorrect results
3
4 ---
5
6 **Describe the bug**
7 A clear and concise description of what the bug is.
8
9 **Expected behavior**
10 A clear and concise description of what you expected to happen.
11
12 **Environment (please complete the following information):**
13 - Python Version: [e.g. 3.6]
14 - OS [e.g. Windows, Fedora]
15 - If the bug involves `LOCALE` or `humansorted`:
16 - Is `PyICU` installed?
17 - Do you have a locale set? If so, to what?
18
19 **To Reproduce**
20 Include a Minimum, Complete, Verifiable Example. If there is a traceback (or error message), **please** include the *entire* traceback (or error message), even if you think it is too big.
21
22 See https://stackoverflow.com/help/mcve for an explanation.
0 ---
1 name: Feature request
2 about: Suggest or request an enhancement
3
4 ---
5
6 **Describe the feature or enhancement**
7 Be as descriptive and precise as possible.
8
9 **Provide a concrete example of how the feature or enhancement will improve `natsort`**
10 Code examples are an excellent way to show how this feature or enhancement will help. To make your case stronger, show the current workaround due to the lack of the feature. What is the return-on-investment for including the feature or enhancement?
11
12 **Would you be willing to submit a Pull Request for this feature?**
13 Extra help is *always* welcome.
0 ---
1 name: Question
2 about: Inquiry about natsort
3
4 ---
5
6 - [ ] I have read the [`natsort` documentation](https://natsort.readthedocs.io/en/master/) and the [README](https://github.com/SethMMorton/natsort#natsort), and my question is still not answered
0 dist: xenial
1 sudo: false
02 language: python
3 cache: pip
14
25 jobs:
36 include:
47 - python: "2.7"
5 dist: trusty
6 sudo: false
78 env: WITH_EXTRAS=""
89 - python: "2.7"
9 dist: trusty
10 sudo: false
1110 env: WITH_EXTRAS="fast,icu"
1211 addons:
1312 apt:
1615 - language-pack-de
1716 - language-pack-en
1817 - python: "3.4"
19 dist: trusty
20 sudo: false
2118 env: WITH_EXTRAS=""
2219 - python: "3.5"
23 dist: trusty
24 sudo: false
2520 env: WITH_EXTRAS=""
2621 - python: "3.6"
27 dist: trusty
28 sudo: false
2922 env: WITH_EXTRAS=""
3023 - python: "3.6"
31 dist: trusty
32 sudo: false
3324 env: WITH_EXTRAS="fast,icu"
3425 addons:
3526 apt:
3829 - language-pack-de
3930 - language-pack-en
4031 - python: "3.7"
41 dist: xenial
42 sudo: true
4332 env: WITH_EXTRAS=""
4433 - stage: code-quality
4534 python: "3.6"
46 dist: trusty
47 sudo: false
48 install: pip install flake8 flake8-import-order flake8-bugbear pep8-naming
49 script: flake8
35 install: pip install flake8 flake8-import-order flake8-bugbear pep8-naming twine check-manifest
36 script:
37 - flake8
38 - check-manifest --ignore ".github*,*.md,.coveragerc"
39 - python setup.py sdist
40 - twine check dist/*
5041
5142 install:
5243 - pip install -U pip
0 02-04-2019 v. 6.0.0
1 +++++++++++++++++++
2
3 - Drop support for Python 2.6 and 3.3 (thanks @jdufresne) (issue #70)
4 - Remove deprecated APIs (kwargs number_type, signed, exp, as_path, py3_safe; enums ns.TYPESAFE, ns.DIGIT, ns.VERSION; functions versorted, index_versorted) (issue #81)
5 - Remove pipenv as a dependency for building (issue #86)
6 - Simplify Travis-CI configuration (thanks @jdufresne) (issue #88)
7 - Fix README rendering in PyPI (thanks @altendky) (issue #89)
8
9 11-18-2018 v. 5.5.0
10 +++++++++++++++++++
11
12 - Formally deprecated old or misleading APIs (issue #83)
13 - Documentation, packaging, and CI cleanup (thanks @jdufresne) (issues #69, #71-#80)
14 - Consolidate API documentation into a single page (issue #82)
15 - Add a CHANGELOG.rst to the top-level of the repository (issue #85)
16 - Add back support for very old versions of setuptools (issue #84)
17
18 09-09-2018 v. 5.4.1
19 +++++++++++++++++++
20
21 - Fix error in a newly added test (issues #65, #67)
22 - Changed code format and quality checking infrastructure (issue #68)
23
24 09-06-2018 v. 5.4.0
25 +++++++++++++++++++
26
27 - Re-expose ``natsort_key`` as "public" and remove the
28 associated ``DeprecationWarning``
29 - Add better developer documentation
30 - Refactor tests (issue #66)
31 - Bump allowed ``fastnumbers`` version
32
33 07-07-2018 v. 5.3.3
34 +++++++++++++++++++
35
36 - Update docs with a FAQ and quick how-it-works (issue #60)
37 - Fix a StopIteration error in the testing code
38 - Enable Python 3.7 support in Travis-CI (issue #61)
39
40 05-17-2018 v. 5.3.2
41 +++++++++++++++++++
42
43 - Fix bug that prevented install on old versions of setuptools (issues #55, #56)
44 - Revert layout from src/natsort/ back to natsort/ to make user
45 testing simpler (issues #57, #58)
46
47 05-14-2018 v. 5.3.1
48 +++++++++++++++++++
49
50 - No bugfixes or features, just infrastructure and installation updates
51 - Move to defining dependencies with Pipfile
52 - Development layout is now src/natsort/ instead of natsort/
53 - Add bumpversion infrastructure
54 - Extras can be installed by "[]" notation
55
56 04-20-2018 v. 5.3.0
57 +++++++++++++++++++
58
59 - Fix bug in assessing ``fastnumbers`` version at import-time (thanks @hholzgra) (issues #51, #53)
60 - Add ability to consider unicode-decimal numbers as numbers (issues #52, #54)
61
62 02-14-2018 v. 5.2.0
63 +++++++++++++++++++
64
65 - Add ``ns.NUMAFTER`` to cause numbers to be placed after non-numbers (issues #48, #49)
66 - Add ``natcmp`` function (Python 2 only) (thanks @rinslow) (issue #47)
67
68 11-11-2017 v. 5.1.1
69 +++++++++++++++++++
70
71 - Added additional unicode number support for Python 3.7
72 - Added information on how to install and test (issue #46)
73
74 08-19-2017 v. 5.1.0
75 +++++++++++++++++++
76
77 - Fixed ``StopIteration`` warning on Python 3.6+ (thanks @lykinsbd) (issues #42, #43)
78 - All Unicode input is now normalized (issues #44, #45)
79
80 04-30-2017 v. 5.0.3
81 +++++++++++++++++++
82
83 - Improved development infrastructure
84 - Migrated documentation to ReadTheDocs
85
86 01-02-2017 v. 5.0.2
87 +++++++++++++++++++
88
89 - Added additional unicode number support for Python 3.6
90 - Renamed several internal functions and variables to improve clarity
91 - Improved documentation examples
92 - Added a "how does it work?" section to the documentation
93
94 06-04-2016 v. 5.0.1
95 +++++++++++++++++++
96
97 - The ``ns`` enum attributes can now be imported from the top-level
98 namespace
99 - Fixed a bug with the ``from natsort import *`` mechanism
100 - Fixed bug with using ``natsort`` with ``python -OO`` (issues #38, #39)
101
102 05-08-2016 v. 5.0.0
103 +++++++++++++++++++
104
105 - ``ns.LOCALE``/``humansorted`` now accounts for thousands separators (issue #36)
106 - Refactored entire codebase to be more functional (as in use functions as
107 units). Previously, the code was rather monolithic and difficult to follow. The
108 goal is that with the code existing in smaller units, contributing will
109 be easier (issue #37)
110 - Deprecated ``ns.TYPESAFE`` option as it is now always on (due to a new
111 iterator-based algorithm, the typesafe function is now cheap)
112 - Increased speed of execution (came for free with the new functional approach
113 because the new factory function paradigm eliminates most ``if`` branches
114 during execution)
115
116 - In most cases, the code is 30-40% faster than version 4.0.4
117 - If using ``ns.LOCALE`` or ``humansorted``, the code is 1100% faster than
118 version 4.0.4
119
120 - Improved clarity of documentation with regard to locale-aware sorting
121 - Added a new ``chain_functions`` function for convenience in creating
122 a complex user-given ``key`` from several existing functions
123
124 11-01-2015 v. 4.0.4
125 +++++++++++++++++++
126
127 - Improved coverage of unit tests
128 - Unit tests use new and improved hypothesis library
129 - Fixed compatibility issues with Python 3.5
130
131 06-25-2015 v. 4.0.3
132 +++++++++++++++++++
133
134 - Fixed bad install on last release (sorry guys!) (issue #30)
135
136 06-24-2015 v. 4.0.2
137 +++++++++++++++++++
138
139 - Added back Python 2.6 and Python 3.2 compatibility. Unit testing is now
140 performed for these versions (thanks @dpetzold) (issue #29)
141 - Consolidated under-the-hood compatibility functionality
142
143 06-04-2015 v. 4.0.1
144 +++++++++++++++++++
145
146 - Added support for sorting NaN by internally converting to -Infinity
147 or +Infinity (issue #27)
148
149 05-17-2015 v. 4.0.0
150 +++++++++++++++++++
151
152 - Made default behavior of 'natsort' search for unsigned ints,
153 rather than signed floats. This is a backwards-incompatible
154 change but in 99% of use cases it should not require any
155 end-user changes (issue #20)
156 - Improved handling of locale-aware sorting on systems where the
157 underlying locale library is broken (issue #34)
158 - Greatly improved all unit tests by adding the hypothesis library
159
160 04-06-2015 v. 3.5.6
161 +++++++++++++++++++
162
163 - Added 'UNGROUPLETTERS' algorithm to get the case-grouping behavior of
164 an ordinal sort when using 'LOCALE' (issue #23)
165 - Added convenience functions 'decoder', 'as_ascii', and 'as_utf8' for
166 dealing with bytes types
167
168 04-04-2015 v. 3.5.5
169 +++++++++++++++++++
170
171 - Added 'realsorted' and 'index_realsorted' functions for
172 forward-compatibility with >= 4.0.0
173 - Made explanation of when to use "TYPESAFE" more clear in the docs
174
175 04-02-2015 v. 3.5.4
176 +++++++++++++++++++
177
178 - Fixed bug where a 'TypeError' was raised if a string containing a leading
179 number was sorted with alpha-only strings when 'LOCALE' is used (issue #22)
180
181 03-26-2015 v. 3.5.3
182 +++++++++++++++++++
183
184 - Fixed bug where '--reverse-filter' option in shell script was not
185 getting checked for correctness
186 - Documentation updates to better describe locale bug, and illustrate
187 upcoming default behavior change
188 - Internal improvements, including making test suite more granular
189
190 01-13-2015 v. 3.5.2
191 +++++++++++++++++++
192
193 - Enhancement that will convert a 'pathlib.Path' object to a 'str' if
194 'ns.PATH' is enabled (issue #16)
195
196 09-25-2014 v. 3.5.1
197 +++++++++++++++++++
198
199 - Fixed bug that caused list/tuples to fail when using 'ns.LOWERCASEFIRST'
200 or 'ns.IGNORECASE' (issue #15)
201 - Refactored modules so that only the public API was in natsort.py and
202 ns_enum.py
203 - Refactored all import statements to be absolute, not relative
204
205
206 09-02-2014 v. 3.5.0
207 +++++++++++++++++++
208
209 - Added the 'alg' argument to the 'natsort' functions. This argument
210 accepts an enum that is used to indicate the options the user wishes
211 to use. The 'number_type', 'signed', 'exp', 'as_path', and 'py3_safe'
212 options are being deprecated and will become (undocumented)
213 keyword-only options in natsort version 4.0.0
214 - The user can now modify how 'natsort' handles the case of non-numeric
215 characters (issue #14)
216 - The user can now instruct 'natsort' to use locale-aware sorting, which
217 allows 'natsort' to perform true "human sorting" (issue #14)
218
219 - The `humansorted` convenience function has been included to make this
220 easier
221
222 - Updated shell script with locale functionality
223
224 08-12-2014 v. 3.4.1
225 +++++++++++++++++++
226
227 - 'natsort' will now use the 'fastnumbers' module if it is installed. This
228 gives up to an extra 30% boost in speed over the previous performance
229 enhancements
230 - Made documentation point to more 'natsort' resources, and also added a
231 new example in the examples section
232
233 07-19-2014 v. 3.4.0
234 +++++++++++++++++++
235
236 - Fixed a bug that caused user's options to the 'natsort_key' to not be
237 passed on to recursive calls of 'natsort_key' (issue #12)
238 - Added a 'natsort_keygen' function that will generate a wrapped version
239 of 'natsort_key' that is easier to call. 'natsort_key' is now set to
240 deprecate at natsort version 4.0.0
241 - Added an 'as_path' option to 'natsorted' & co. that will try to treat
242 input strings as filepaths. This will help yield correct results for
243 OS-generated inputs like
244 ``['/p/q/o.x', '/p/q (1)/o.x', '/p/q (10)/o.x', '/p/q/o (1).x']`` (issue #3)
245 - Massive performance enhancements for string input (1.8x-2.0x), at the expense
246 of reduction in speed for numeric input (~2.0x)
247
248 - This is a good compromise because the most common input will be strings,
249 not numbers, and sorting numbers still only takes 0.6x the time of sorting
250 strings. If you are sorting only numbers, you would use 'sorted' anyway
251
252 - Added the 'order_by_index' function to help in using the output of
253 'index_natsorted' and 'index_versorted'
254 - Added the 'reverse' option to 'natsorted' & co. to make its API more
255 similar to the builtin 'sorted'
256 - Added more unit tests
257 - Added auxiliary test code that helps in profiling and stress-testing
258 - Reworked the documentation, moving most of it to PyPI's hosting platform
259 - Added support for coveralls.io
260 - Entire codebase is now PyFlakes and PEP8 compliant
261
262 06-28-2014 v. 3.3.0
263 +++++++++++++++++++
264
265 - Added a 'versorted' method for more convenient sorting of versions (issue #11)
266 - Updated command-line tool --number_type option with 'version' and 'ver'
267 to make it more clear how to sort version numbers
268 - Moved unit-testing mechanism from being docstring-based to actual unit tests
269 in actual functions (issue #10)
270
271 - This has provided the ability to determine the coverage of the unit tests (99%)
272 - This also makes the pydoc documentation a bit more clear
273
274 - Made docstrings for public functions mirror the README API
275 - Connected natsort development to Travis-CI to help ensure quality releases
276
277 06-20-2014 v. 3.2.1
278 +++++++++++++++++++
279
280 - Re-"Fixed" unorderable types issue on Python 3.x - this workaround
281 is for when the problem occurs in the middle of the string (issue #7 again)
282
283 05-07-2014 v. 3.2.0
284 +++++++++++++++++++
285
286 - "Fixed" unorderable types issue on Python 3.x with a workaround that
287 attempts to replicate the Python 2.x behavior by putting all the numbers
288 (or strings that begin with numbers) first (issue #7)
289 - Now explicitly excluding __pycache__ from releases by adding a prune statement
290 to MANIFEST.in
291
292 05-05-2014 v. 3.1.2
293 +++++++++++++++++++
294
295 - Added setup.cfg to support universal wheels (issue #6)
296 - Added Python 3.0 and Python 3.1 as requiring the argparse module
297
298 03-01-2014 v. 3.1.1
299 +++++++++++++++++++
300
301 - Added ability to sort lists of lists (issue #5)
302 - Cleaned up import statements
303
304 01-20-2014 v. 3.1.0
305 +++++++++++++++++++
306
307 - Added the ``signed`` and ``exp`` options to allow finer tuning of the sorting
308 - Entire codebase now works for both Python 2 and Python 3 without needing to run
309 ``2to3``
310 - Updated all doctests
311 - Further simplified the ``natsort`` base code by removing unneeded functions.
312 - Simplified documentation where possible
313 - Improved the shell script code
314
315 - Made the documentation less "path"-centric to make it clear it is not just
316 for sorting file paths
317 - Removed the filesystem-based options because these can be achieved better
318 through a pipeline
319 - Added doctests
320 - Added new options that correspond to ``signed`` and ``exp``
321 - The user can now specify multiple numbers to exclude or multiple ranges
322 to filter by
323
324 10-01-2013 v. 3.0.2
325 +++++++++++++++++++
326
327 - Made float, int, and digit searching algorithms all share the same base function
328 - Fixed some outdated comments
329 - Made the ``__version__`` variable available when importing the module
330
331 8-15-2013 v. 3.0.1
332 ++++++++++++++++++
333
334 - Added support for unicode strings (issue #2)
335 - Removed extraneous ``string2int`` function
336 - Fixed empty string removal function
337
338 7-13-2013 v. 3.0.0
339 ++++++++++++++++++
340
341 - Added a ``number_type`` argument to the sorting functions to specify how
342 liberal to be when deciding what a number is
343 - Reworked the documentation
344
345 6-25-2013 v. 2.2.0
346 ++++++++++++++++++
347
348 - Added ``key`` attribute to ``natsorted`` and ``index_natsorted`` so that
349 it mimics the functionality of the built-in ``sorted`` (issue #1)
350 - Added tests to reflect the new functionality, as well as tests demonstrating
351 how to get similar functionality using ``natsort_key``
352
353 12-5-2012 v. 2.1.0
354 ++++++++++++++++++
355
356 - Reorganized package
357 - Now using a platform independent shell script generator (entry_points
358 from distribute)
359 - Can now execute natsort from command line with ``python -m natsort``
360 as well
361
362 11-30-2012 v. 2.0.2
363 +++++++++++++++++++
364
365 - Added the use_2to3 option to setup.py
366 - Added distribute_setup.py to the distribution
367 - Added dependency to the argparse module (for python2.6)
368
369 11-21-2012 v. 2.0.1
370 +++++++++++++++++++
371
372 - Reorganized directory structure
373 - Added tests into the natsort.py file itself
374
375 11-16-2012, v. 2.0.0
376 ++++++++++++++++++++
377
378 - Updated sorting algorithm to support floats (including exponentials) and
379 basic version number support
380 - Added better README documentation
381 - Added doctests
3939
4040 ## Attribution
4141
42 This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version]
42 This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [https://www.contributor-covenant.org/version/1/4/code-of-conduct.html][version]
4343
44 [homepage]: http://contributor-covenant.org
45 [version]: http://contributor-covenant.org/version/1/4/
44 [homepage]: https://www.contributor-covenant.org/
45 [version]: https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
11
22 If you have an idea for how to improve `natsort`, please contribute! It can
33 be as simple as a bug fix or documentation update, or as complicated as a more
4 robust algorithm.
4 robust algorithm. Contributions that change the public API of
5 `natsort` will have to ensure that the library does not become
6 less usable after the contribution and is backwards-compatible (unless there is
7 a good reason not to be).
58
69 I do not have strong opinions on how one should contribute, so
710 I have copy/pasted some text verbatim from the
+0
-5
ISSUE_TEMPLATE.md
0 ## Minimum, Complete, Verifiable Example
1
2 See https://stackoverflow.com/help/mcve for explanation.
3
4 ## Error message, Traceback, Desired behavior, Suggestion, Request, or Question
0 include README.rst
10 include LICENSE
2 include *.md
3 include *.sh
4 include Pipfile
5 include setup.py
6 include setup.cfg
1 include CHANGELOG.rst
2 include clean.sh
3 include dev-requirements.txt
74 include tox.ini
8 include .travis.yml
9 include .coveragerc
10 include .gitignore
11 include .bumpversion.cfg
125 graft docs
136 graft natsort
14 graft test_natsort
7 graft tests
158 global-exclude *.py[cod] __pycache__ *.so
+0
-10
Pipfile
0 [dev-packages]
1 coverage = "*"
2 pytest = ">=3.5"
3 pytest-cov = "*"
4 pytest-mock = ">=1.1"
5 hypothesis = ">=3.8.0"
6 pytest-faulthandler = {version = "*", platform_python_implementation = "== 'CPython'"}
7
8 # These packages are standard on newer python versions.
9 pathlib = {version = "*", python_version = "< '3.4'"}
2222
2323 - Source Code: https://github.com/SethMMorton/natsort
2424 - Downloads: https://pypi.org/project/natsort/
25 - Documentation: http://natsort.readthedocs.io/
26
27 - `Examples and Recipes <http://natsort.readthedocs.io/en/master/examples.html>`_
28 - `How Does Natsort Work? <http://natsort.readthedocs.io/en/master/howitworks.html>`_
29 - `API <http://natsort.readthedocs.io/en/master/api.html>`_
25 - Documentation: https://natsort.readthedocs.io/
26
27 - `Examples and Recipes <https://natsort.readthedocs.io/en/master/examples.html>`_
28 - `How Does Natsort Work? <https://natsort.readthedocs.io/en/master/howitworks.html>`_
29 - `API <https://natsort.readthedocs.io/en/master/api.html>`_
3030
3131 - `FAQ`_
3232 - `Optional Dependencies`_
3333
3434 - `fastnumbers <https://pypi.org/project/fastnumbers>`_ >= 2.0.0
3535 - `PyICU <https://pypi.org/project/PyICU>`_ >= 1.0.0
36
37 **NOTE**: Please see the `Deprecation Schedule`_ section for changes in
38 ``natsort`` version 6.0.0 and in the upcoming version 7.0.0.
3639
3740 Quick Description
3841 -----------------
4144 sort algorithm sorts lexicographically, so you might not get the results that you
4245 expect:
4346
44 .. code-block:: python
47 .. code-block:: pycon
4548
4649 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
4750 >>> sorted(a)
5659 sorting based on meaning and not computer code point).
5760 Using ``natsorted`` is simple:
5861
59 .. code-block:: python
62 .. code-block:: pycon
6063
6164 >>> from natsort import natsorted
6265 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
6568
6669 ``natsorted`` identifies numbers anywhere in a string and sorts them
6770 naturally. Below are some other things you can do with ``natsort``
68 (also see the `examples <http://natsort.readthedocs.io/en/master/examples.html>`_
71 (also see the `examples <https://natsort.readthedocs.io/en/master/examples.html>`_
6972 for a quick start guide, or the
70 `api <http://natsort.readthedocs.io/en/master/api.html>`_ for complete details).
73 `api <https://natsort.readthedocs.io/en/master/api.html>`_ for complete details).
7174
7275 **Note**: ``natsorted`` is designed to be a drop-in replacement for the built-in
7376 ``sorted`` function. Like ``sorted``, ``natsorted`` `does not sort in-place`.
7477 To sort a list and assign the output to the same variable, you must
7578 explicitly assign the output to a variable:
7679
77 .. code-block:: python
80 .. code-block:: pycon
7881
7982 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
8083 >>> natsorted(a)
9497 Sorting Versions
9598 ++++++++++++++++
9699
97 This is handled properly by default (as of ``natsort`` version >= 4.0.0):
98
99 .. code-block:: python
100 ``natsort`` does not actually *comprehend* version numbers.
101 It just so happens that the most common versioning schemes are designed to
102 work with standard natural sorting techniques; these schemes include
103 ``MAJOR.MINOR``, ``MAJOR.MINOR.PATCH``, and ``YEAR.MONTH.DAY``. If your data
104 conforms to a scheme like this, then it will work out-of-the-box with
105 ``natsorted`` (as of ``natsort`` version >= 4.0.0):
106
107 .. code-block:: pycon
100108
101109 >>> a = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
102110 >>> natsorted(a)
103111 ['version-1.9', 'version-1.10', 'version-1.11', 'version-2.0']
104112
105 If you need to sort release candidates, please see
106 `this useful hack <http://natsort.readthedocs.io/en/master/examples.html#rc-sorting>`_.
113 If you need to sort versions that use a more complicated scheme, please see
114 `these examples <https://natsort.readthedocs.io/en/master/examples.html#rc-sorting>`_.
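
To see why simple versioning schemes sort correctly without any special handling, here is a minimal sketch of the general natural-sorting idea using only the standard library. It is not ``natsort``'s actual implementation: the helper name ``simple_natural_key`` is hypothetical, and it only understands unsigned ASCII integers.

```python
import re


def simple_natural_key(text):
    """Hypothetical helper: split text so digit runs compare as integers."""
    # A capturing group makes re.split keep the digit runs; they land
    # at the odd indices of the resulting list.
    parts = re.split(r"([0-9]+)", text)
    key = []
    for index, part in enumerate(parts):
        if not part:
            continue  # skip empty strings produced at the edges
        # Tag each chunk so an int is never compared directly to a str.
        key.append((0, int(part)) if index % 2 else (1, part))
    return key


versions = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
result = sorted(versions, key=simple_natural_key)
print(result)
```

Because ``9``, ``10``, and ``11`` are compared as integers rather than as text, ``version-1.9`` lands before ``version-1.10``.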
107115
108116 Sorting by Real Numbers (i.e. Signed Floats)
109117 ++++++++++++++++++++++++++++++++++++++++++++
110118
111 This is useful in scientific data analysis and was
119 This is useful in scientific data analysis (and was
112120 the default behavior of ``natsorted`` for ``natsort``
113 version < 4.0.0. Use the ``realsorted`` function:
114
115 .. code-block:: python
121 version < 4.0.0). Use the ``realsorted`` function:
122
123 .. code-block:: pycon
116124
117125 >>> from natsort import realsorted, ns
118126 >>> # Note that when interpreting as signed floats, the below numbers are
133141 separator is accounted for in the number.
134142 This can be achieved with the ``humansorted`` function:
135143
136 .. code-block:: python
144 .. code-block:: pycon
137145
138146 >>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
139147 >>> natsorted(a)
149157
150158 You may find you need to explicitly set the locale to get this to work
151159 (as shown in the example).
152 Please see `locale issues <http://natsort.readthedocs.io/en/master/locale_issues.html>`_ and the
160 Please see `locale issues <https://natsort.readthedocs.io/en/master/locale_issues.html>`_ and the
153161 `Optional Dependencies`_ section below before using the ``humansorted`` function.
154162
155163 Further Customizing Natsort
159167 ``ns.LOCALE``, and ``ns.IGNORECASE``), you can combine the options using the
160168 bitwise OR operator (``|``). For example,
161169
162 .. code-block:: python
170 .. code-block:: pycon
163171
164172 >>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
165173 >>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE)
174182 True
175183
176184 All of the available customizations can be found in the documentation for
177 `the ns enum <http://natsort.readthedocs.io/en/master/ns_class.html>`_.
185 `the ns enum <https://natsort.readthedocs.io/en/master/api.html#natsort.ns>`_.
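
Combining options with ``|`` works because each option occupies its own bit. The toy sketch below illustrates that mechanism with the standard library's ``enum.IntFlag``. The class ``Alg`` and its numeric values are assumptions made for illustration; they are not ``natsort``'s own definitions of ``ns``.

```python
import enum


class Alg(enum.IntFlag):
    """Hypothetical stand-in for an options enum; values are made up."""
    REAL = 1
    LOCALE = 2
    IGNORECASE = 4


# Bitwise OR merges the individual bits into one combined value.
combined = Alg.REAL | Alg.LOCALE | Alg.IGNORECASE

# Bitwise AND tests whether a particular option was requested.
has_locale = bool(combined & Alg.LOCALE)
has_only_real = combined == Alg.REAL
print(int(combined), has_locale, has_only_real)
```

A function receiving ``combined`` can check each option independently, which is why the order in which options are joined does not matter.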
178186
179187 You can also add your own custom transformation functions with the ``key`` argument.
180188 These can be used with ``alg`` if you wish.
181189
182 .. code-block:: python
190 .. code-block:: pycon
183191
184192 >>> a = ['apple2.50', '2.3apple']
185193 >>> natsorted(a, key=lambda x: x.replace('apple', ''), alg=ns.REAL)
191199 You can mix and match ``int``, ``float``, and ``str`` (or ``unicode``) types
192200 when you sort:
193201
194 .. code-block:: python
202 .. code-block:: pycon
195203
196204 >>> a = ['4.5', 6, 2.0, '5', 'a']
197205 >>> natsorted(a)
205213 ``natsort`` does not officially support the `bytes` type on Python 3, but
206214 convenience functions are provided that help you decode to `str` first:
207215
208 .. code-block:: python
216 .. code-block:: pycon
209217
210218 >>> from natsort import as_utf8
211219 >>> a = [b'a', 14.0, 'b']
228236 generate a custom sorting key to sort in-place using the ``list.sort``
229237 method.
230238
231 .. code-block:: python
239 .. code-block:: pycon
232240
233241 >>> from natsort import natsort_keygen
234242 >>> natsort_key = natsort_keygen()
247255
248256 - recursively descend into lists of lists
249257 - automatic unicode normalization of input data
250 - `controlling the case-sensitivity <http://natsort.readthedocs.io/en/master/examples.html#case-sort>`_
251 - `sorting file paths correctly <http://natsort.readthedocs.io/en/master/examples.html#path-sort>`_
252 - `allow custom sorting keys <http://natsort.readthedocs.io/en/master/examples.html#custom-sort>`_
258 - `controlling the case-sensitivity <https://natsort.readthedocs.io/en/master/examples.html#case-sort>`_
259 - `sorting file paths correctly <https://natsort.readthedocs.io/en/master/examples.html#path-sort>`_
260 - `allow custom sorting keys <https://natsort.readthedocs.io/en/master/examples.html#custom-sort>`_
253261
254262 FAQ
255263 ---
260268 exactly what is being done with their input using this key - it is highly recommended
261269 to `look at this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_
262270 for *how* to debug, and also to review the
263 `How Does Natsort Work? <http://natsort.readthedocs.io/en/master/howitworks.html>`_
271 `How Does Natsort Work? <https://natsort.readthedocs.io/en/master/howitworks.html>`_
264272 page for *why* ``natsort`` is doing that to your data.
265273
266274 If you are trying to sort custom classes and running into trouble, please take a look at
271279 use the ``natsort`` key as part of your rich comparison operator definition.
272280
273281 How *does* ``natsort`` work?
274 If you don't want to read `How Does Natsort Work? <http://natsort.readthedocs.io/en/master/howitworks.html>`_,
282 If you don't want to read `How Does Natsort Work? <https://natsort.readthedocs.io/en/master/howitworks.html>`_,
275283 here is a quick primer.
276284
277285 ``natsort`` provides a `key function <https://docs.python.org/3/howto/sorting.html#key-functions>`_
281289 key generator ``natsort.natsort_keygen()``. ``natsort.natsorted()`` is essentially
282290 a wrapper for the following code:
283291
284 .. code-block:: python
292 .. code-block:: pycon
285293
286294 >>> from natsort import natsort_keygen
287295 >>> natsort_key = natsort_keygen()
315323 ------------
316324
317325 ``natsort`` comes with a shell script called ``natsort``, or can also be called
318 from the command line with ``python -m natsort``.
326 from the command line with ``python -m natsort``.
319327
320328 Requirements
321329 ------------
322330
323 ``natsort`` requires Python version 2.6 or greater or Python 3.3 or greater.
324 It may run on (but is not tested against) Python 3.2.
331 ``natsort`` requires Python version 2.7 or Python 3.4 or greater.
325332
326333 Optional Dependencies
327334 ---------------------
343350
344351 It is recommended that you install `PyICU <https://pypi.org/project/PyICU>`_
345352 if you wish to sort in a locale-dependent manner, see
346 http://natsort.readthedocs.io/en/master/locale_issues.html for an explanation why.
353 https://natsort.readthedocs.io/en/master/locale_issues.html for an explanation why.
347354
348355 Installation
349356 ------------
350357
351358 Use ``pip``!
352359
353 .. code-block:: sh
360 .. code-block:: console
354361
355362 $ pip install natsort
356363
360367 `fastnumbers <https://pypi.org/project/fastnumbers>`_ and ``icu`` for
361368 `PyICU <https://pypi.org/project/PyICU>`_.
362369
363 .. code-block:: sh
370 .. code-block:: console
364371
365372 # Install both optional dependencies.
366373 $ pip install natsort[fast,icu]
376383 After installing ``tox``, running tests is as simple as executing the following in the
377384 ``natsort`` directory:
378385
379 .. code-block:: sh
386 .. code-block:: console
380387
381388 $ tox
382389
383390 ``tox`` will create a virtual environment for your tests and install all the
384391 needed testing requirements for you. You can specify a particular python version
385 with the ``-e`` flag, e.g. ``tox -e py36``.
386
387 If you do not wish to use ``tox``, you can install the testing dependencies and run the
388 tests manually using `pytest <https://docs.pytest.org/en/latest/>`_ - ``natsort``
389 contains a ``Pipfile`` for use with `pipenv <https://github.com/pypa/pipenv>`_ that
390 makes it easy for you to install the testing dependencies:
391
392 .. code-block:: sh
393
394 $ pipenv install --skip-lock --dev
395 $ pipenv run python -m pytest
392 with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis is done with ``tox -e flake8``.
393 You can see all available testing environments with ``tox --listenvs``.
394
395 If you do not wish to use ``tox``, you can install the testing dependencies with the
396 ``dev-requirements.txt`` file and then run the tests manually using
397 `pytest <https://docs.pytest.org/en/latest/>`_.
398
399 .. code-block:: console
400
401 $ pip install -r dev-requirements.txt
402 $ python -m pytest
396403
397404 Note that above I invoked ``python -m pytest`` instead of just ``pytest`` - this is because
398405 `the former puts the CWD on sys.path <https://docs.pytest.org/en/latest/usage.html#calling-pytest-through-python-m-pytest>`_.
399406
407 How to Build Documentation
408 --------------------------
409
410 If you want to build the documentation for ``natsort``, it is recommended to use ``tox``:
411
412 .. code-block:: console
413
414 $ tox -e docs
415
416 This will place the documentation in ``build/sphinx/html``. If you do not
417 wish to use ``tox``, you can do the following:
418
419 .. code-block:: console
420
421 $ pip install sphinx sphinx_rtd_theme
422 $ python setup.py build_sphinx
423
424 Deprecation Schedule
425 --------------------
426
427 Dropping Python 2.7 Support
428 +++++++++++++++++++++++++++
429
430 ``natsort`` version 7.0.0 will drop support for Python 2.7.
431
432 The version 6.X branch will remain as a "long term support" branch where bug fixes
433 are applied so that users who cannot update from Python 2.7 will not be forced to
434 use a buggy ``natsort`` version. Once version 7.0.0 is released, new features
435 will not be added to version 6.X, only bug fixes.
436
437 Deprecated APIs
438 +++++++++++++++
439
440 In ``natsort`` version 6.0.0, the following APIs and functions were removed:
441
442 - ``number_type`` keyword argument (deprecated since 3.4.0)
443 - ``signed`` keyword argument (deprecated since 3.4.0)
444 - ``exp`` keyword argument (deprecated since 3.4.0)
445 - ``as_path`` keyword argument (deprecated since 3.4.0)
446 - ``py3_safe`` keyword argument (deprecated since 3.4.0)
447 - ``ns.TYPESAFE`` (deprecated since version 5.0.0)
448 - ``ns.DIGIT`` (deprecated since version 5.0.0)
449 - ``ns.VERSION`` (deprecated since version 5.0.0)
450 - ``versorted()`` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
451 - ``index_versorted()`` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
452
453 In general, if you want to determine if you are using deprecated APIs you can run your
454 code with the following flag
455
456 .. code-block:: console
457
458 $ python -Wdefault::DeprecationWarning my-code.py
459
460 By default ``DeprecationWarnings`` are not shown, but this will cause them to be shown.
461 Alternatively, you can just set the environment variable ``PYTHONWARNINGS`` to
462 "default::DeprecationWarning" and then run your code.
463
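The same filter can also be enabled from inside Python with the standard-library ``warnings`` module. The following is a minimal sketch; ``old_api`` and ``call_and_collect`` are hypothetical names used only for illustration and are not part of ``natsort``.

```python
import warnings


def old_api():
    """Hypothetical function that emits a DeprecationWarning when called."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def call_and_collect():
    """Call old_api() with DeprecationWarning enabled and return the messages seen."""
    with warnings.catch_warnings(record=True) as caught:
        # Same filter as -Wdefault::DeprecationWarning, but scoped to this block.
        warnings.simplefilter("default", DeprecationWarning)
        old_api()
    return [
        str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
    ]


messages = call_and_collect()
```

Recording the warnings, as done here, is mostly useful in tests; in normal use the filter alone is enough to have them printed to standard error.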
464 Dropped Pipenv for Development
465 ++++++++++++++++++++++++++++++
466
467 ``natsort`` version 6.0.0 no longer uses `Pipenv <https://pipenv.readthedocs.io/en/latest/>`_
468 to install development dependencies.
469
470 Dropped Python 2.6 and 3.3 Support
471 ++++++++++++++++++++++++++++++++++
472
473 ``natsort`` version 6.0.0 dropped support for Python 2.6 and Python 3.3.
474
400475 Author
401476 ------
402477
405480 History
406481 -------
407482
408 Please visit the `changelog <http://natsort.readthedocs.io/en/master/changelog.html>`_.
483 Please visit the changelog
484 `on GitHub <https://github.com/SethMMorton/natsort/blob/master/CHANGELOG.rst>`_ or
485 `in the documentation <https://natsort.readthedocs.io/en/master/changelog.html>`_.
0 coverage
1 pytest >= 3.5
2 pytest-cov
3 pytest-mock >= 1.1
4 hypothesis >= 3.8.0
5 pytest-faulthandler; platform_python_implementation == 'CPython'
6 semver
7 # These packages are standard on newer python versions.
8 pathlib; python_version < '3.4'
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _api:
4
5 natsort API
6 ===========
7
8 .. contents::
9 :local:
10
11 Standard API
12 ------------
13
14 :func:`~natsort.natsorted`
15 ++++++++++++++++++++++++++
16
17 .. autofunction:: natsorted
18
19 The :class:`~natsort.ns` enum
20 +++++++++++++++++++++++++++++
21
22 .. autodata:: ns
23 :annotation:
24
25 :func:`~natsort.natsort_key`
26 ++++++++++++++++++++++++++++
27
28 .. autofunction:: natsort_key
29
30 :func:`~natsort.natsort_keygen`
31 +++++++++++++++++++++++++++++++
32
33 .. autofunction:: natsort_keygen
34
35 Convenience Functions
36 ---------------------
37
38 :func:`~natsort.realsorted`
39 +++++++++++++++++++++++++++
40
41 .. autofunction:: realsorted
42
43 :func:`~natsort.humansorted`
44 ++++++++++++++++++++++++++++
45
46 .. autofunction:: humansorted
47
48 :func:`~natsort.index_natsorted`
49 ++++++++++++++++++++++++++++++++
50
51 .. autofunction:: index_natsorted
52
53 :func:`~natsort.index_realsorted`
54 +++++++++++++++++++++++++++++++++
55
56 .. autofunction:: index_realsorted
57
58 :func:`~natsort.index_humansorted`
59 ++++++++++++++++++++++++++++++++++
60
61 .. autofunction:: index_humansorted
62
63 :func:`~natsort.order_by_index`
64 +++++++++++++++++++++++++++++++
65
66 .. autofunction:: order_by_index
67
68 .. _bytes_help:
69
70 Help With Bytes On Python 3
71 +++++++++++++++++++++++++++
72
73 The official stance of :mod:`natsort` is to not support `bytes` for
74 sorting; there is just too much that can go wrong when trying to automate
75 conversion between `bytes` and `str`. But rather than completely give up
76 on `bytes`, :mod:`natsort` provides three functions that make it easy to
77 quickly decode `bytes` to `str` so that sorting is possible.
78
79 .. autofunction:: decoder
80
81 .. autofunction:: as_ascii
82
83 .. autofunction:: as_utf8
84
85 .. _function_help:
86
87 Help With Creating Function Keys
88 ++++++++++++++++++++++++++++++++
89
90 If you need to create a complicated *key* argument to (for example)
91 :func:`natsorted` that is actually multiple functions called one after the other,
92 the following function can help you easily perform this action. It is
93 used internally by :mod:`natsort`, and has been exposed publicly for
94 the convenience of the user.
95
96 .. autofunction:: chain_functions
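
The idea of calling several functions one after the other can be illustrated with the standard library alone. This is a sketch of the general technique, not the :mod:`natsort` implementation; ``chain`` and ``normalize`` are hypothetical names.

```python
from functools import reduce


def chain(functions):
    """Return a callable that applies each function in order to its input."""
    functions = list(functions)

    def chained(value):
        # Feed the output of each function into the next one.
        return reduce(lambda acc, func: func(acc), functions, value)

    return chained


# Strip whitespace, then lower-case, then drop a file extension.
normalize = chain([str.strip, str.lower, lambda s: s.rsplit(".", 1)[0]])
result = normalize("  Report.TXT ")
```

A callable built this way can be passed as a *key* argument wherever a single function is expected.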
0 .. _changelog:
1
2 Changelog
3 ---------
4
5 .. include:: ../CHANGELOG.rst
0 # -*- coding: utf-8 -*-
1 #
2 # natsort documentation build configuration file, created by
3 # sphinx-quickstart on Thu Jul 17 21:01:29 2014.
4 #
5 # This file is execfile()d with the current directory set to its
6 # containing dir.
7 #
8 # Note that not all possible configuration values are present in this
9 # autogenerated file.
10 #
11 # All configuration values have a default; values that are commented out
12 # serve to show the default.
13
14 import os
15
16 # If extensions (or modules to document with autodoc) are in another directory,
17 # add these directories to sys.path here. If the directory is relative to the
18 # documentation root, use os.path.abspath to make it absolute, like shown here.
19 # sys.path.insert(0, os.path.abspath('.'))
20
21 # -- General configuration ------------------------------------------------
22
23 # If your documentation needs a minimal Sphinx version, state it here.
24 # needs_sphinx = '1.0'
25
26 # Add any Sphinx extension module names here, as strings. They can be
27 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
28 # ones.
29 extensions = [
30 'sphinx.ext.autodoc',
31 'sphinx.ext.autosummary',
32 'sphinx.ext.intersphinx',
33 'sphinx.ext.mathjax',
34 'sphinx.ext.napoleon',
35 ]
36
37 # Add any paths that contain templates here, relative to this directory.
38 templates_path = ['_templates']
39
40 # The suffix of source filenames.
41 source_suffix = '.rst'
42
43 # The encoding of source files.
44 # source_encoding = 'utf-8-sig'
45
46 # The master toctree document.
47 master_doc = 'index'
48
49 # General information about the project.
50 project = u'natsort'
51 # noinspection PyShadowingBuiltins
52 copyright = u'2014, Seth M. Morton'
53
54 # The version info for the project you're documenting, acts as replacement for
55 # |version| and |release|, also used in various other places throughout the
56 # built documents.
57 #
58 # The full version, including alpha/beta/rc tags.
59 release = '6.0.0'
60 # The short X.Y version.
61 version = '.'.join(release.split('.')[0:2])
62
63 # The language for content autogenerated by Sphinx. Refer to documentation
64 # for a list of supported languages.
65 # language = None
66
67 # There are two options for replacing |today|: either, you set today to some
68 # non-false value, then it is used:
69 # today = ''
70 # Else, today_fmt is used as the format for a strftime call.
71 # today_fmt = '%B %d, %Y'
72
73 # List of patterns, relative to source directory, that match files and
74 # directories to ignore when looking for source files.
75 # exclude_patterns = ['solar/*']
76
77 # The reST default role (used for this markup: `text`) to use for all
78 # documents.
79 # default_role = None
80
81 # If true, '()' will be appended to :func: etc. cross-reference text.
82 # add_function_parentheses = True
83
84 # If true, the current module name will be prepended to all description
85 # unit titles (such as .. function::).
86 # add_module_names = True
87
88 # If true, sectionauthor and moduleauthor directives will be shown in the
89 # output. They are ignored by default.
90 # show_authors = False
91
92 # The name of the Pygments (syntax highlighting) style to use.
93 pygments_style = 'sphinx'
94 highlight_language = 'python'
95
96 # A list of ignored prefixes for module index sorting.
97 # modindex_common_prefix = []
98
99 # If true, keep warnings as "system message" paragraphs in the built documents.
100 # keep_warnings = False
101
102
103 # -- Options for HTML output ----------------------------------------------
104
105 # The theme to use for HTML and HTML Help pages. See the documentation for
106 # a list of builtin themes.
107 on_rtd = os.environ.get('READTHEDOCS') == 'True'
108 if on_rtd:
109 html_theme = 'default'
110 else:
111 import sphinx_rtd_theme
112
113 html_theme = 'sphinx_rtd_theme'
114 # html_theme = 'solar'
115
116 # Theme options are theme-specific and customize the look and feel of a theme
117 # further. For a list of options available for each theme, see the
118 # documentation.
119 # html_theme_options = {}
120
121 # Add any paths that contain custom themes here, relative to this directory.
122 html_theme_path = ['.']
123
124 # The name for this set of Sphinx documents. If None, it defaults to
125 # "<project> v<release> documentation".
126 # html_title = None
127
128 # A shorter title for the navigation bar. Default is the same as html_title.
129 # html_short_title = None
130
131 # The name of an image file (relative to this directory) to place at the top
132 # of the sidebar.
133 # html_logo = None
134
135 # The name of an image file (within the static path) to use as favicon of the
136 # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
137 # pixels large.
138 # html_favicon = None
139
140 # Add any paths that contain custom static files (such as style sheets) here,
141 # relative to this directory. They are copied after the builtin static files,
142 # so a file named "default.css" will overwrite the builtin "default.css".
143 # html_static_path = ['_static']
144
145 # Add any extra paths that contain custom files (such as robots.txt or
146 # .htaccess) here, relative to this directory. These files are copied
147 # directly to the root of the documentation.
148 # html_extra_path = []
149
150 # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
151 # using the given strftime format.
152 # html_last_updated_fmt = '%b %d, %Y'
153
154 # If true, SmartyPants will be used to convert quotes and dashes to
155 # typographically correct entities.
156 # html_use_smartypants = True
157
158 # Custom sidebar templates, maps document names to template names.
159 # html_sidebars = {}
160
161 # Additional templates that should be rendered to pages, maps page names to
162 # template names.
163 # html_additional_pages = {}
164
165 # If false, no module index is generated.
166 # html_domain_indices = True
167
168 # If false, no index is generated.
169 # html_use_index = True
170
171 # If true, the index is split into individual pages for each letter.
172 # html_split_index = False
173
174 # If true, links to the reST sources are added to the pages.
175 # html_show_sourcelink = True
176
177 # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
178 # html_show_sphinx = True
179
180 # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
181 # html_show_copyright = True
182
183 # If true, an OpenSearch description file will be output, and all pages will
184 # contain a <link> tag referring to it. The value of this option must be the
185 # base URL from which the finished HTML is served.
186 # html_use_opensearch = ''
187
188 # This is the file name suffix for HTML files (e.g. ".xhtml").
189 # html_file_suffix = None
190
191 # Output file base name for HTML help builder.
192 htmlhelp_basename = 'natsortdoc'
193
194 # -- Options for LaTeX output ---------------------------------------------
195
196 latex_elements = {
197 # The paper size ('letterpaper' or 'a4paper').
198 # 'papersize': 'letterpaper',
199
200 # The font size ('10pt', '11pt' or '12pt').
201 # 'pointsize': '10pt',
202
203 # Additional stuff for the LaTeX preamble.
204 # 'preamble': '',
205 }
206
207 # Grouping the document tree into LaTeX files. List of tuples
208 # (source start file, target name, title,
209 # author, documentclass [howto, manual, or own class]).
210 latex_documents = [
211 ('index', 'natsort.tex', u'natsort Documentation',
212 u'Seth M. Morton', 'manual'),
213 ]
214
215 # The name of an image file (relative to this directory) to place at the top of
216 # the title page.
217 # latex_logo = None
218
219 # For "manual" documents, if this is true, then toplevel headings are parts,
220 # not chapters.
221 # latex_use_parts = False
222
223 # If true, show page references after internal links.
224 # latex_show_pagerefs = False
225
226 # If true, show URL addresses after external links.
227 # latex_show_urls = False
228
229 # Documents to append as an appendix to all manuals.
230 # latex_appendices = []
231
232 # If false, no module index is generated.
233 # latex_domain_indices = True
234
235
236 # -- Options for manual page output ---------------------------------------
237
238 # One entry per manual page. List of tuples
239 # (source start file, name, description, authors, manual section).
240 man_pages = [
241 ('index', 'natsort', u'natsort Documentation',
242 [u'Seth M. Morton'], 1)
243 ]
244
245 # If true, show URL addresses after external links.
246 # man_show_urls = False
247
248
249 # -- Options for Texinfo output -------------------------------------------
250
251 # Grouping the document tree into Texinfo files. List of tuples
252 # (source start file, target name, title, author,
253 # dir menu entry, description, category)
254 texinfo_documents = [
255 ('index', 'natsort', u'natsort Documentation',
256 u'Seth M. Morton', 'natsort', 'One line description of project.',
257 'Miscellaneous'),
258 ]
259
260 # Documents to append as an appendix to all manuals.
261 # texinfo_appendices = []
262
263 # If false, no module index is generated.
264 # texinfo_domain_indices = True
265
266 # How to display URL addresses: 'footnote', 'no', or 'inline'.
267 # texinfo_show_urls = 'footnote'
268
269 # If true, do not generate a @detailmenu in the "Top" node's menu.
270 # texinfo_no_detailmenu = False
271
272
273 # Example configuration for intersphinx: refer to the Python standard library.
274 intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _examples:
4
5 Examples and Recipes
6 ====================
7
8 If you want more detailed examples than given on this page, please see
9 https://github.com/SethMMorton/natsort/tree/master/tests.
10
11 .. contents::
12 :local:
13
14 Basic Usage
15 -----------
16
17 In the most basic use case, simply import :func:`~natsorted` and use
18 it as you would :func:`sorted`:
19
20 .. code-block:: pycon
21
22 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
23 >>> sorted(a)
24 ['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in']
25 >>> from natsort import natsorted, ns
26 >>> natsorted(a)
27 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
28
29 Sort Version Numbers
30 --------------------
31
32 As of :mod:`natsort` version >= 4.0.0, :func:`~natsorted` will work for
33 well-behaved version numbers, like ``MAJOR.MINOR.PATCH``.
34
35 .. _rc_sorting:
36
37 Sorting More Expressive Versioning Schemes
38 ++++++++++++++++++++++++++++++++++++++++++
39
40 By default, if you wish to sort versions that are not as simple as
41 ``MAJOR.MINOR.PATCH`` (or similar), you may not get the results you expect:
42
43 .. code-block:: pycon
44
45 >>> a = ['1.2', '1.2rc1', '1.2beta2', '1.2beta1', '1.2alpha', '1.2.1', '1.1', '1.3']
46 >>> natsorted(a)
47 ['1.1', '1.2', '1.2.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.3']
48
49 To make the '1.2' pre-releases come before '1.2.1', you need to use the following
50 recipe:
51
52 .. code-block:: pycon
53
54 >>> natsorted(a, key=lambda x: x.replace('.', '~'))
55 ['1.1', '1.2', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2.1', '1.3']
56
57 If you also want '1.2' after all the alpha, beta, and rc candidates, you can
58 modify the above recipe:
59
60 .. code-block:: pycon
61
62 >>> natsorted(a, key=lambda x: x.replace('.', '~')+'z')
63 ['1.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2', '1.2.1', '1.3']
64
65 Please see `this issue <https://github.com/SethMMorton/natsort/issues/13>`_ to
66 see why this works.
67
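The effect of the two recipes can be reproduced with a simplified, standard-library-only key. This is a sketch for illustration; ``simple_natural_key`` is a hypothetical helper that only handles unsigned integers and is not how :mod:`natsort` is implemented.

```python
import re


def simple_natural_key(text):
    """Split text into alternating string and integer pieces (simplified)."""
    pieces = re.split(r"(\d+)", text)
    # Odd positions hold the captured digit runs; convert those to int.
    return [int(p) if i % 2 else p for i, p in enumerate(pieces)]


versions = ["1.2", "1.2rc1", "1.2beta2", "1.2beta1", "1.2alpha", "1.2.1", "1.1", "1.3"]

# '.' sorts before letters, so '1.2.1' lands ahead of '1.2alpha'.
plain = sorted(versions, key=simple_natural_key)

# '~' sorts after every ASCII letter, pushing '1.2.1' behind the pre-releases.
tilde = sorted(versions, key=lambda v: simple_natural_key(v.replace(".", "~")))

# A trailing 'z' sorts after 'alpha', 'beta' and 'rc' but before '~'.
tilde_z = sorted(versions, key=lambda v: simple_natural_key(v.replace(".", "~") + "z"))
```

The recipes depend only on where ``'.'``, ``'~'`` and ``'z'`` fall relative to the letters in the ASCII table.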
68 Sorting Rigorously Defined Versioning Schemes (e.g. SemVer or PEP 440)
69 """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
70
71 If you know you are using a versioning scheme that follows a well-defined format
72 for which there is third-party module support, you should use those modules
73 to assist in sorting. Some examples might be
74 `PEP 440 <https://packaging.pypa.io/en/latest/version>`_ or
75 `SemVer <https://python-semver.readthedocs.io/en/latest/api.html>`_.
76
77 If we are being honest, using these methods to parse a version means you don't
78 need to use :mod:`natsort` - you should probably just use :func:`sorted` directly.
79 Here's an example with SemVer:
80
81 .. code-block:: pycon
82
83 >>> from semver import parse_version_info
84 >>> a = ['3.4.5-pre.1', '3.4.5', '3.4.5-pre.2+build.4']
85 >>> sorted(a, key=parse_version_info)
86 ['3.4.5-pre.1', '3.4.5-pre.2+build.4', '3.4.5']
87
88 .. _path_sort:
89
90 Sort OS-Generated Paths
91 -----------------------
92
93 In some cases when sorting file paths with OS-Generated names, the default
94 :func:`~natsorted` algorithm may not be sufficient. In cases like these,
95 you may need to use the ``ns.PATH`` option:
96
97 .. code-block:: pycon
98
99 >>> a = ['./folder/file (1).txt',
100 ... './folder/file.txt',
101 ... './folder (1)/file.txt',
102 ... './folder (10)/file.txt']
103 >>> natsorted(a)
104 ['./folder (1)/file.txt', './folder (10)/file.txt', './folder/file (1).txt', './folder/file.txt']
105 >>> natsorted(a, alg=ns.PATH)
106 ['./folder/file.txt', './folder/file (1).txt', './folder (1)/file.txt', './folder (10)/file.txt']
107
108 Locale-Aware Sorting (Human Sorting)
109 ------------------------------------
110
111 .. note::
112 Please read :ref:`locale_issues` before using ``ns.LOCALE``, :func:`humansorted`,
113 or :func:`index_humansorted`.
114
115 You can instruct :mod:`natsort` to use locale-aware sorting with the
116 ``ns.LOCALE`` option. In addition to making this understand non-ASCII
117 characters, it will also properly interpret non-'.' decimal separators
118 and also properly order case. It may be more convenient to just use
119 the :func:`humansorted` function:
120
121 .. code-block:: pycon
122
123 >>> from natsort import humansorted
124 >>> import locale
125 >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
126 'en_US.UTF-8'
127 >>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
128 >>> natsorted(a, alg=ns.LOCALE)
129 ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
130 >>> humansorted(a)
131 ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
132
133 You may find that if you do not explicitly set the locale your results may not
134 be as you expect... I have found that it depends on the system you are on.
135 If you use `PyICU <https://pypi.org/project/PyICU>`_ (see below) then
136 you should not need to do this.
137
138 .. _case_sort:
139
140 Controlling Case When Sorting
141 -----------------------------
142
143 For non-numbers, by default :mod:`natsort` uses ordinal sorting (i.e.
144 it sorts by the character's value in the ASCII table). For example:
145
146 .. code-block:: pycon
147
148 >>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
149 >>> natsorted(a)
150 ['Apple', 'Banana', 'Corn', 'apple', 'banana', 'corn']
151
152 There are times when you wish to ignore the case when sorting;
153 you can easily do this with the ``ns.IGNORECASE`` option:
154
155 .. code-block:: pycon
156
157 >>> natsorted(a, alg=ns.IGNORECASE)
158 ['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn']
159
160 Note that since Python's sorting is stable, the order of equivalent
161 elements after lowering the case is the same order they appear in the
162 original list.
163
164 Upper-case letters appear first in the ASCII table, but many natural
165 sorting methods place lower-case first. To do this, use
166 ``ns.LOWERCASEFIRST``:
167
168 .. code-block:: pycon
169
170 >>> natsorted(a, alg=ns.LOWERCASEFIRST)
171 ['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']
172
173 It may be undesirable to have the upper-case letters grouped together
174 and the lower-case letters grouped together; most would expect all
175 "a"s to be together regardless of case, and all "b"s, and so on. To
176 achieve this, use ``ns.GROUPLETTERS``:
177
178 .. code-block:: pycon
179
180 >>> natsorted(a, alg=ns.GROUPLETTERS)
181 ['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']
182
183 You might combine this with ``ns.LOWERCASEFIRST`` to get what most
184 would expect to be "natural" sorting:
185
186 .. code-block:: pycon
187
188 >>> natsorted(a, alg=ns.G | ns.LF)
189 ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
190
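To see why grouping letters and putting lower-case first combine into this order, the following standard-library sketch builds a comparable key by hand. It is an illustration under simplifying assumptions (plain strings, no numbers) and not necessarily how :mod:`natsort` does it; ``grouped_lowercase_first_key`` is a hypothetical name.

```python
def grouped_lowercase_first_key(text):
    """Pair each character's lower-case form with its case-swapped form."""
    # The first element groups 'a' with 'A'; the second breaks ties so that
    # lower-case wins, because swapcase() maps 'a' to 'A' and 'A' < 'a'.
    return [(ch.lower(), ch.swapcase()) for ch in text]


words = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
ordered = sorted(words, key=grouped_lowercase_first_key)
```

Here ``ordered`` matches the ``ns.G | ns.LF`` result shown above.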
191 Customizing Float Definition
192 ----------------------------
193
194 You can make :func:`~natsorted` search for any float that would be
195 a valid Python float literal, such as 5, 0.4, -4.78, +4.2E-34, etc.
196 using the ``ns.FLOAT`` key. You can disable the exponential component
197 of the number with ``ns.NOEXP``.
198
199 .. code-block:: pycon
200
201 >>> a = ['a50', 'a51.', 'a+50.4', 'a5.034e1', 'a+50.300']
202 >>> natsorted(a, alg=ns.FLOAT)
203 ['a50', 'a5.034e1', 'a51.', 'a+50.300', 'a+50.4']
204 >>> natsorted(a, alg=ns.FLOAT | ns.SIGNED)
205 ['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
206 >>> natsorted(a, alg=ns.FLOAT | ns.SIGNED | ns.NOEXP)
207 ['a5.034e1', 'a50', 'a+50.300', 'a+50.4', 'a51.']
208
209 For convenience, the ``ns.REAL`` option is provided which is a shortcut
210 for ``ns.FLOAT | ns.SIGNED`` and can be used to sort on real numbers.
211 This can be easily accessed with the :func:`~realsorted` convenience
212 function. Please note that the behavior of the :func:`~realsorted` function
213 was the default behavior of :func:`~natsorted` for :mod:`natsort`
214 version < 4.0.0:
215
216 .. code-block:: pycon
217
218 >>> natsorted(a, alg=ns.REAL)
219 ['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
220 >>> from natsort import realsorted
221 >>> realsorted(a)
222 ['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
223
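The real-number ordering can be reproduced with a short standard-library sketch that splits on a signed-float pattern and converts the matches with :func:`float`. ``SIGNED_FLOAT`` and ``real_key`` are hypothetical names, and the sketch skips the special cases that :mod:`natsort` handles.

```python
import re

# Pattern for a signed float with an optional exponent.
SIGNED_FLOAT = re.compile(r"([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)")


def real_key(text):
    """Split text on signed floats and convert the numeric pieces to float."""
    pieces = SIGNED_FLOAT.split(text)
    # With one capture group, odd positions are the matched numbers.
    return [float(p) if i % 2 else p for i, p in enumerate(pieces)]


a = ['a50', 'a51.', 'a+50.4', 'a5.034e1', 'a+50.300']
ordered = sorted(a, key=real_key)
```

The numeric pieces compare as 50.0, 50.3, 50.34, 50.4 and 51.0, which gives the same order as ``ns.REAL`` above.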
224 .. _custom_sort:
225
226 Using a Custom Sorting Key
227 --------------------------
228
229 Like the built-in ``sorted`` function, ``natsorted`` can accept a custom
230 sort key so that:
231
232 .. code-block:: pycon
233
234 >>> from operator import attrgetter, itemgetter
235 >>> a = [['a', 'num4'], ['b', 'num8'], ['c', 'num2']]
236 >>> natsorted(a, key=itemgetter(1))
237 [['c', 'num2'], ['a', 'num4'], ['b', 'num8']]
238 >>> class Foo:
239 ... def __init__(self, bar):
240 ... self.bar = bar
241 ... def __repr__(self):
242 ... return "Foo('{}')".format(self.bar)
243 >>> b = [Foo('num3'), Foo('num5'), Foo('num2')]
244 >>> natsorted(b, key=attrgetter('bar'))
245 [Foo('num2'), Foo('num3'), Foo('num5')]
246
247 Generating a Natsort Key
248 ------------------------
249
250 If you need to sort a list in-place, you cannot use :func:`~natsorted`; you
251 need to pass a key to the :meth:`list.sort` method. The function
252 :func:`~natsort_keygen` is a convenient way to generate these keys for you:
253
254 .. code-block:: pycon
255
256 >>> from natsort import natsort_keygen
257 >>> a = ['a50', 'a51.', 'a50.4', 'a5.034e1', 'a50.300']
258 >>> natsort_key = natsort_keygen(alg=ns.FLOAT)
259 >>> a.sort(key=natsort_key)
260 >>> a
261 ['a50', 'a50.300', 'a5.034e1', 'a50.4', 'a51.']
262
263 :func:`~natsort_keygen` has the same API as :func:`~natsorted` (minus the
264 `reverse` option).
265
266 Natural Sorting with ``cmp`` (Python 2 only)
267 --------------------------------------------
268
269 .. note::
270 This is a Python2-only feature! The :func:`natcmp` function is not
271 exposed on Python3. Because this documentation is built with
272 Python3, you will not find :func:`natcmp` in the API.
273
274 If you are using a legacy codebase that requires you to use :func:`cmp` instead
275 of a key-function, you can use :func:`~natcmp`.
276
277 .. code-block:: pycon
278
279 >>> import sys
280 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
281 >>> if sys.version_info[0] == 2:
282 ... from natsort import natcmp
283 ... sorted(a, cmp=natcmp)
284 ... else:
285 ... natsorted(a) # so docstrings don't fail
286 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
287
288 :func:`natcmp` also accepts an ``alg`` argument so you can customize your
289 sorting experience.
290
291 Sorting Multiple Lists According to a Single List
292 -------------------------------------------------
293
294 Sometimes you have multiple lists, and you want to sort one of those
295 lists and reorder the other lists according to how the first was sorted.
296 To achieve this you could use the :func:`~index_natsorted` in combination
297 with the convenience function
298 :func:`~order_by_index`:
299
300 .. code-block:: pycon
301
302 >>> from natsort import index_natsorted, order_by_index
303 >>> a = ['a2', 'a9', 'a1', 'a4', 'a10']
304 >>> b = [4, 5, 6, 7, 8]
305 >>> c = ['hi', 'lo', 'ah', 'do', 'up']
306 >>> index = index_natsorted(a)
307 >>> order_by_index(a, index)
308 ['a1', 'a2', 'a4', 'a9', 'a10']
309 >>> order_by_index(b, index)
310 [6, 4, 7, 5, 8]
311 >>> order_by_index(c, index)
312 ['ah', 'hi', 'do', 'lo', 'up']
313
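The idea behind sorting by index can be sketched with the standard library: sort the positions rather than the values, then reuse those positions on every list. ``natural_key``, ``index_sorted`` and ``reorder`` are hypothetical stand-ins written for this illustration, not the :mod:`natsort` functions themselves.

```python
import re


def natural_key(text):
    """Simplified natural key: digit runs compare as integers."""
    return [int(p) if i % 2 else p for i, p in enumerate(re.split(r"(\d+)", text))]


def index_sorted(seq, key):
    """Return the indices that would sort seq according to key."""
    return sorted(range(len(seq)), key=lambda i: key(seq[i]))


def reorder(seq, index):
    """Return seq rearranged to follow the given index list."""
    return [seq[i] for i in index]


a = ['a2', 'a9', 'a1', 'a4', 'a10']
b = [4, 5, 6, 7, 8]
index = index_sorted(a, natural_key)
sorted_a = reorder(a, index)
sorted_b = reorder(b, index)
```

Because the index is computed once, any number of parallel lists can be reordered consistently.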
314 Returning Results in Reverse Order
315 ----------------------------------
316
317 Just like the :func:`sorted` built-in function, you can supply the
318 ``reverse`` option to return the results in reverse order:
319
320 .. code-block:: pycon
321
322 >>> a = ['a2', 'a9', 'a1', 'a4', 'a10']
323 >>> natsorted(a, reverse=True)
324 ['a10', 'a9', 'a4', 'a2', 'a1']
325
326 Sorting Bytes on Python 3
327 -------------------------
328
329 Python 3 is rather strict about comparing strings and bytes, and this
330 can make it difficult to deal with collections of both. Because of the
331 challenge of guessing which encoding should be used to decode a bytes
332 array to a string, :mod:`natsort` does *not* try to guess and automatically
333 convert for you; in fact, the official stance of :mod:`natsort` is to
334 not support sorting bytes. Instead, some decoding convenience functions
335 have been provided to you (see :ref:`bytes_help`) that allow you to
336 provide a codec for decoding bytes through the ``key`` argument that
337 will allow :mod:`natsort` to convert byte arrays to strings for sorting;
338 these functions know not to raise an error if the input is not a byte
339 array, so you can use the key on any arbitrary collection of data.
340
341 .. code-block:: pycon
342
343 >>> from natsort import as_ascii
344 >>> a = [b'a', 14.0, 'b']
345 >>> # On Python 2, natsorted(a) would work as expected.
346 >>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
347 >>> natsorted(a, key=as_ascii) == [14.0, b'a', 'b']
348 True
349
350 Additionally, regular expressions cannot be run on byte arrays, making it
351 so that :mod:`natsort` cannot parse them for numbers. As a result, if you
352 run :mod:`natsort` on a list of bytes, you will get results that are like
353 Python's default sorting behavior. Of course, you can use the decoding
354 functions to solve this:
355
356 .. code-block:: pycon
357
358 >>> from natsort import as_utf8
359 >>> a = [b'a56', b'a5', b'a6', b'a40']
360 >>> natsorted(a) # doctest: +SKIP
361 [b'a40', b'a5', b'a56', b'a6']
362 >>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
363 True
364
365 If you need a codec different from ASCII or UTF-8, you can use
366 :func:`decoder` to generate a custom key:
367
368 .. code-block:: pycon
369
370 >>> from natsort import decoder
371 >>> a = [b'a56', b'a5', b'a6', b'a40']
372 >>> natsorted(a, key=decoder('latin1')) == [b'a5', b'a6', b'a40', b'a56']
373 True
374
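The behavior described above, decoding bytes while leaving other values untouched, can be sketched as follows. ``make_decoder`` and ``as_latin1`` are hypothetical names for this illustration; the actual :func:`decoder` implementation may differ.

```python
def make_decoder(encoding):
    """Return a key function that decodes bytes and passes other values through."""

    def decode(value):
        if isinstance(value, bytes):
            return value.decode(encoding)
        return value

    return decode


as_latin1 = make_decoder("latin1")
mixed = [b"caf\xe9", "tea", 14.0]
decoded = [as_latin1(item) for item in mixed]
```

Only the ``bytes`` element is converted; the ``str`` and ``float`` elements come back unchanged, which is what makes such a key safe to use on mixed collections.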
375 Sorting a Pandas DataFrame
376 --------------------------
377
378 As of Pandas version 0.16.0, the sorting methods do not accept a ``key`` argument,
379 so you cannot simply pass :func:`natsort_keygen` to a Pandas DataFrame and sort.
380 This request has been made to the Pandas devs; see
381 `issue 3942 <https://github.com/pydata/pandas/issues/3942>`_ if you are interested.
382 If you need to sort a Pandas DataFrame, please check out
383 `this answer on StackOverflow <https://stackoverflow.com/a/29582718/1399279>`_
384 for ways to do this without the ``key`` argument to ``sort``.
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _howitworks:
4
5 How Does Natsort Work?
6 ======================
7
8 .. contents::
9 :local:
10
11 :mod:`natsort` works by breaking strings into smaller sub-components (numbers
12 or everything else), and returning these components in a tuple. Sorting
13 tuples in Python is well-defined, and this fact is used to sort the input
14 strings properly. But how does one break a string into sub-components?
15 And what does one do to those components once they are split? Below I
16 will explain the algorithm that was chosen for the :mod:`natsort` module,
17 and some of the thinking that went into those design decisions. I will
18 also mention some of the stumbling blocks I ran into because
19 `getting sorting right is surprisingly hard`_.
20
21 If you are impatient, you can skip to :ref:`tldr1` for the algorithm
22 in the simplest case, and :ref:`tldr2`
23 to see what extra code is needed to handle special cases.
24
25 First, How Does Natural Sorting Work At a High Level?
26 -----------------------------------------------------
27
28 If I want to compare '2 ft 7 in' to '2 ft 11 in', I might do the following
29
30 .. code-block:: pycon
31
32 >>> '2 ft 7 in' < '2 ft 11 in'
33 False
34
35 We as humans know that the above should be true, but why does Python think it
36 is false? Here is how it is performing the comparison:
37
38 .. code-block:: none
39
40 '2' <=> '2' ==> equal, so keep going
41 ' ' <=> ' ' ==> equal, so keep going
42 'f' <=> 'f' ==> equal, so keep going
43 't' <=> 't' ==> equal, so keep going
44 ' ' <=> ' ' ==> equal, so keep going
45 '7' <=> '1' ==> different, use result of '7' < '1'
46
47 '7' evaluates as greater than '1' so the statement is false. When sorting, if
48 a value is less than another it is placed first, so in our above example
49 '2 ft 11 in' would end up before '2 ft 7 in', which is not correct. What to do?
50
51 The best way to handle this is to break the string into sub-components
52 of numbers and non-numbers, and then convert the numeric parts into
53 :func:`float` or :func:`int` types. This will force Python to
54 actually understand the context of what it is sorting and then "do the
55 right thing." Luckily, it handles sorting lists of strings right out-of-the-box,
56 so the only hard part is actually making this string-to-list transformation
57 and then Python will handle the rest.
58
59 .. code-block:: none
60
61 '2 ft 7 in' ==> (2, ' ft ', 7, ' in')
62 '2 ft 11 in' ==> (2, ' ft ', 11, ' in')
63
64 When Python compares the two, it roughly follows the below logic:
65
66 .. code-block:: none
67
68 2 <=> 2 ==> equal, so keep going
69 ' ft ' <=> ' ft ' ==> a string is a special type of sequence - evaluate each character individually
70 ||
71 -->
72 ' ' <=> ' ' ==> equal, so keep going
73 'f' <=> 'f' ==> equal, so keep going
74 't' <=> 't' ==> equal, so keep going
75 ' ' <=> ' ' ==> equal, so keep going
76 <== Back to parent sequence
77 7 <=> 11 ==> different, use the result of 7 < 11
78
79 Clearly, seven is less than eleven, so our comparison is as we expect, and we
80 would get the sorting order we wanted.
81
82 At its heart, :mod:`natsort` is simply a tool to break strings into tuples,
83 turning numbers in strings (i.e. ``'79'``) into *ints* and *floats* as it does this.
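
The comparison walked through above can be reproduced with plain tuples. This is a minimal sketch in ordinary Python rather than :mod:`natsort` code, and the variable names are chosen here purely for illustration.

```python
# Plain strings are compared character by character, so '7' > '1' decides.
as_strings = '2 ft 7 in' < '2 ft 11 in'

# Tuples are compared element by element, so the integers 7 and 11 decide.
seven = (2, ' ft ', 7, ' in')
eleven = (2, ' ft ', 11, ' in')
as_tuples = seven < eleven

print(as_strings)  # False
print(as_tuples)   # True
```

The only difference between the two comparisons is the type of the elements being compared, which is why the string-to-tuple transformation is the whole trick.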
84
85 Natsort's Approach
86 ------------------
87
88 .. contents::
89 :local:
90
91 Decomposing Strings Into Sub-Components
92 +++++++++++++++++++++++++++++++++++++++
93
94 The first major hurdle to overcome is to decompose the string into sub-components.
95 Remarkably, this turns out to be the easy part, owing mostly to Python's easy access
96 to regular expressions. Breaking an arbitrary string based on a pattern is pretty
97 straightforward.
98
99 .. code-block:: pycon
100
101 >>> import re
102 >>> re.split(r'(\d+)', '2 ft 11 in')
103 ['', '2', ' ft ', '11', ' in']
104
105 Clear (assuming you can read regular expressions) and concise.
106
107 The reason I began developing :mod:`natsort` in the first place was because I
108 needed to handle the natural sorting of strings containing *real numbers*, not just
109 unsigned integers as the above example contains. By real numbers, I mean those like
110 ``-45.4920E-23``. :mod:`natsort` can handle just about any number definition;
111 to that end, here are all the regular expressions used in :mod:`natsort`:
112
113 .. code-block:: pycon
114
115 >>> unsigned_int = r'([0-9]+)'
116 >>> signed_int = r'([-+]?[0-9]+)'
117 >>> unsigned_float = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'
118 >>> signed_float = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'
119 >>> unsigned_float_no_exponent = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+))'
120 >>> signed_float_no_exponent = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+))'
121
122 Note that ``"inf"`` and ``"nan"`` are deliberately omitted from the float definition because you
123 wouldn't want (for example) ``"banana"`` to be converted into ``['ba', 'nan', 'a']``.
124 Let's see an example:
125
126 .. code-block:: pycon
127
128 >>> re.split(signed_float, 'The mass of 3 electrons is 2.732815068E-30 kg')
129 ['The mass of ', '3', ' electrons is ', '2.732815068E-30', ' kg']
130
131 .. note::
132
133 It is a bit of a lie to say the above are the complete regular expressions. In the
134 actual code there is also handling for non-ASCII unicode characters (such as ⑦),
135 but I will ignore that aspect of :mod:`natsort` in this discussion.
136
137 Now, when the user wants to change the definition of a number, it is as easy as changing
138 the pattern supplied to the regular expression engine.
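
To illustrate, the sketch below splits one string with two of the patterns listed above. The input ``'a-5.2b'`` is made up for this example; the patterns are copied verbatim from the list.

```python
import re

# Two of the patterns listed above, copied verbatim.
unsigned_int = r'([0-9]+)'
signed_float = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'

text = 'a-5.2b'  # hypothetical input
as_unsigned_int = re.split(unsigned_int, text)
as_signed_float = re.split(signed_float, text)

print(as_unsigned_int)  # ['a-', '5', '.', '2', 'b']
print(as_signed_float)  # ['a', '-5.2', 'b']
```

The same characters are read as two separate integers under one definition and as a single negative real number under the other, which is why the choice of default matters so much.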
139
140 Choosing the right default is hard, though (well, in this case it shouldn't have been
141 but I was rather thick-headed).
142 In retrospect, it should have been obvious that since essentially all the code examples
143 I had/have seen for natural sorting were for *unsigned integers*, I should have made the default
144 definition of a number an *unsigned integer*. But, in the brash days of my youth I assumed
145 that since my use case was real numbers, everyone else would be happier sorting by real numbers;
146 so, I made the default definition of a number a *signed float with exponent*.
147 `This astonished`_ `a lot`_ `of people`_
148 (`and some people aren't very nice when they are astonished`_).
149 Starting with :mod:`natsort` version 4.0.0 the default number definition was
150 changed to an *unsigned integer* which satisfies the "least astonishment" principle, and
151 I have not heard a complaint since.
152
153 Coercing Strings Containing Numbers Into Numbers
154 ++++++++++++++++++++++++++++++++++++++++++++++++
155
156 There has been some debate on Stack Overflow as to what method is best to
157 coerce a string to a number if it can be coerced, and leaving it alone otherwise
158 (see `this one for coercion`_ and `this one for checking`_ for some high traffic questions),
159 but it mostly boils down to two different solutions, shown here:
160
161 .. code-block:: pycon
162
163 >>> def coerce_try_except(x):
164 ...     try:
165 ...         return int(x)
166 ...     except ValueError:
167 ...         return x
168 ...
169 >>> def coerce_regex(x):
170 ...     # Note that precompiling the regex is more performant,
171 ...     # but I do not show that here for clarity's sake.
172 ...     return int(x) if re.match(r'[-+]?\d+$', x) else x
173 ...
174
175 Here are some timing results run on my machine:
176
177 .. code-block:: pycon
178
179 In [0]: numbers = list(map(str, range(100))) # A list of numbers as strings
180
181 In [1]: not_numbers = ['banana' + x for x in numbers]
182
183 In [2]: %timeit [coerce_try_except(x) for x in numbers]
184 10000 loops, best of 3: 51.1 µs per loop
185
186 In [3]: %timeit [coerce_try_except(x) for x in not_numbers]
187 1000 loops, best of 3: 289 µs per loop
188
189 In [4]: %timeit [coerce_regex(x) for x in not_numbers]
190 10000 loops, best of 3: 67.6 µs per loop
191
192 In [5]: %timeit [coerce_regex(x) for x in numbers]
193 10000 loops, best of 3: 123 µs per loop
194
195 What can we learn from this? The ``try: except`` method (arguably the most "pythonic"
196 of the solutions) is best for numeric input, but performs over 5X slower for non-numeric
197 input. Conversely, the regular expression method is more than twice as slow as
198 ``try: except`` for numeric input, but is more efficient for non-numeric input than for
199 input that can be converted to an ``int``. Further, even in its slower case the regular
200 expression method is always at least twice as fast as the worst case for the
201 ``try: except``.
202
203 Why do I care? Shouldn't I just pick a method and not worry about it? Probably. However,
204 I am very conscious about the performance of :mod:`natsort`, and want it to be a true
205 drop-in replacement for :func:`sorted` without having to incur a performance penalty.
206 For the purposes of :mod:`natsort`, there is no clear winner between the two algorithms -
207 the data being passed to this function will likely be a mix of numeric and non-numeric
208 string content. Do I use the ``try: except`` method and hope the speed gains on
209 numbers will offset the non-number performance, or do I use regular expressions and
210 take the more stable performance?
211
212 It turns out that within the context of :mod:`natsort`, some assumptions can be
213 made that make a hybrid approach attractive. Because all strings are pre-split
214 into numeric and non-numeric content *before* being passed to this coercion function,
215 the assumption can be made that *if a string begins with a digit or a sign, it
216 can be coerced into a number*.
217
218 .. code-block:: pycon
219
220 >>> def coerce_to_int(x):
221 ...     if x[0] in '0123456789+-':
222 ...         try:
223 ...             return int(x)
224 ...         except ValueError:
225 ...             return x
226 ...     else:
227 ...         return x
228 ...
229
230 So how does this perform compared to the standard coercion methods?
231
232 .. code-block:: pycon
233
234 In [6]: %timeit [coerce_to_int(x) for x in numbers]
235 10000 loops, best of 3: 71.6 µs per loop
236
237 In [7]: %timeit [coerce_to_int(x) for x in not_numbers]
238 10000 loops, best of 3: 26.4 µs per loop
239
240 For numeric input, the hybrid method eliminates most of the time spent checking that
241 the input is in fact a number before passing it to :func:`int`; for non-numeric input,
242 it eliminates the time wasted in the exception stack.
243
244 That's as fast as we can get, right? In pure Python, probably. At least, it's
245 close. But because I am crazy and a glutton for punishment, I decided to see
246 if I could get any faster writing a C extension. It's called
247 `fastnumbers`_ and contains a C implementation of the above coercion functions
248 called :func:`fast_int`. How does it fare? Pretty well.
249
250 .. code-block:: pycon
251
252 In [8]: %timeit [fast_int(x) for x in numbers]
253 10000 loops, best of 3: 30.9 µs per loop
254
255 In [9]: %timeit [fast_int(x) for x in not_numbers]
256 10000 loops, best of 3: 30 µs per loop
257
258 During development of :mod:`natsort`, I wanted to ensure that using it did not
259 get in the way of a user's program by introducing a performance penalty to their code.
260 To that end, I do not feel like my adventures down the rabbit hole of optimization
261 of coercion functions were a waste; I can confidently look users in the eye and
262 say I considered every option in ensuring :mod:`natsort` is as efficient as possible.
263 This is why if `fastnumbers`_ is installed it will be used for this step,
264 and otherwise the hybrid method will be used.
265
266 .. note::
267
268 Modifying the hybrid coercion function for floats is straightforward.
269
270 .. code-block:: pycon
271
272 >>> def coerce_to_float(x):
273 ...     if x[0] in '.0123456789+-' or x.lower().lstrip()[:3] in ('nan', 'inf'):
274 ...         try:
275 ...             return float(x)
276 ...         except ValueError:
277 ...             return x
278 ...     else:
279 ...         return x
280 ...
281
282 .. _tldr1:
283
284 TL;DR 1 - The Simple "No Special Cases" Algorithm
285 +++++++++++++++++++++++++++++++++++++++++++++++++
286
287 At this point, our :mod:`natsort` algorithm is essentially the following:
288
289 .. code-block:: pycon
290
291 >>> import re
292 >>> def natsort_key(x, as_float=False, signed=False):
293 ...     if as_float:
294 ...         regex = signed_float if signed else unsigned_float
295 ...     else:
296 ...         regex = signed_int if signed else unsigned_int
297 ...     split_input = re.split(regex, x)
298 ...     split_input = filter(None, split_input)  # removes null strings
299 ...     coerce = coerce_to_float if as_float else coerce_to_int
300 ...     return tuple(coerce(s) for s in split_input)
301 ...
302
303 I have written the above for clarity and not performance.
304 This pretty much matches `most natural sort solutions for python on Stack Overflow`_
305 (except the above includes customization of the definition of a number).
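
For readers who want something they can paste and run, here is a self-contained sketch of the same idea restricted to unsigned integers. The names ``UNSIGNED_INT`` and ``simple_natsort_key`` are hypothetical and exist only for this illustration; this is not the code used inside :mod:`natsort`.

```python
import re

UNSIGNED_INT = re.compile(r'([0-9]+)')


def simple_natsort_key(text):
    """Split text into a tuple of str and int parts (unsigned ints only)."""
    parts = UNSIGNED_INT.split(text)
    # With a single capturing group, the matched numbers land on odd indices.
    converted = (int(p) if i % 2 else p for i, p in enumerate(parts))
    # Drop the empty strings that re.split emits at the edges.
    return tuple(p for p in converted if p != '')


heights = ['2 ft 11 in', '2 ft 7 in', '1 ft 5 in']
plain = sorted(heights)
natural = sorted(heights, key=simple_natsort_key)

print(plain)    # ['1 ft 5 in', '2 ft 11 in', '2 ft 7 in']
print(natural)  # ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in']
```

Like the version above, this sketch makes no attempt to handle any special cases.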
306
307 Special Cases Everywhere!
308 -------------------------
309
310 .. contents::
311 :local:
312
313 .. image:: special_cases_everywhere.jpg
314
315 If what I described in :ref:`TL;DR 1 <tldr1>` were
316 all that :mod:`natsort` needed to
317 do then there probably wouldn't be much need for a third-party module, right?
318 Probably. But it turns out that in real-world data there are a lot of
319 special cases that need to be handled, and in true `80%/20%`_ fashion, the
320 majority of the code in :mod:`natsort` is devoted to handling special cases
321 like those described below.
322
323 Sorting Filesystem Paths
324 ++++++++++++++++++++++++
325
326 `The first major special case I encountered was sorting filesystem paths`_
327 (if you go to the link, you will see I didn't handle it well for a year...
328 this was before I fully realized how much functionality I could really add
329 to :mod:`natsort`). Let's apply the :func:`natsort_key` from above to some
330 filesystem paths that you might see being auto-generated from your operating
331 system:
332
333 .. code-block:: pycon
334
335 >>> paths = ['/p/Folder (10)/file.tar.gz',
336 ... '/p/Folder/file.tar.gz',
337 ... '/p/Folder (1)/file (1).tar.gz',
338 ... '/p/Folder (1)/file.tar.gz']
339 >>> sorted(paths, key=natsort_key)
340 ['/p/Folder (1)/file (1).tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (10)/file.tar.gz', '/p/Folder/file.tar.gz']
341
342 Well that's not right! What is ``'/p/Folder/file.tar.gz'`` doing at the end?
343 It has to do with the numerical ASCII code assigned to the space and
344 ``/`` characters in the `ASCII table`_. According to the `ASCII table`_, the
345 space character (number 32) comes before the ``/`` character (number 47). If
346 we remove the common prefix in all of the above strings (``'/p/Folder'``), we
347 can see why this happens:
348
349 .. code-block:: pycon
350
351 >>> ' (1)/file.tar.gz' < '/file.tar.gz'
352 True
353 >>> ' ' < '/'
354 True
355
356 This isn't very convenient... how do we solve it? We can split the path
357 across the path separators and then sort. A convenient way to do this is
358 with the :data:`Path.parts <pathlib.PurePath.parts>` property from
359 :mod:`pathlib`:
360
361 .. code-block:: pycon
362
363 >>> import pathlib
364 >>> sorted(paths, key=lambda x: tuple(natsort_key(s) for s in pathlib.Path(x).parts))
365 ['/p/Folder/file.tar.gz', '/p/Folder (1)/file (1).tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (10)/file.tar.gz']
366
367 Almost! It seems like there is some funny business going on in the final
368 filename component as well. We can solve that nicely and quickly with
369 :data:`Path.suffixes <pathlib.PurePath.suffixes>` and :data:`Path.stem
370 <pathlib.PurePath.stem>`.
371
372 .. code-block:: pycon
373
374 >>> def decompose_path_into_components(x):
375 ...     path_split = list(pathlib.Path(x).parts)
376 ...     # Remove the final filename component from the path.
377 ...     final_component = pathlib.Path(path_split.pop())
378 ...     # Split off all the extensions.
379 ...     suffixes = final_component.suffixes
380 ...     stem = final_component.name.replace(''.join(suffixes), '')
381 ...     # Remove the '.' prefix of each extension, and make that
382 ...     # final component a list of the stem and each suffix.
383 ...     final_component = [stem] + [x[1:] for x in suffixes]
384 ...     # Replace the split final filename component.
385 ...     path_split.extend(final_component)
386 ...     return path_split
387 ...
388 >>> def natsort_key_with_path_support(x):
389 ...     return tuple(natsort_key(s) for s in decompose_path_into_components(x))
390 ...
391 >>> sorted(paths, key=natsort_key_with_path_support)
392 ['/p/Folder/file.tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (1)/file (1).tar.gz', '/p/Folder (10)/file.tar.gz']
393
394 This works because in addition to breaking the input by path separators, the final
395 filename component is separated from its extensions as well [#f1]_. *Then*, each of these
396 separated components is sent to the :mod:`natsort` algorithm, so the result is
397 a tuple of tuples. Once that is done, we can see how comparisons can be done in
398 the expected manner.
399
400 .. code-block:: pycon
401
402 >>> a = natsort_key_with_path_support('/p/Folder (1)/file (1).tar.gz')
403 >>> a
404 (('/',), ('p',), ('Folder (', 1, ')'), ('file (', 1, ')'), ('tar',), ('gz',))
405 >>>
406 >>> b = natsort_key_with_path_support('/p/Folder/file.tar.gz')
407 >>> b
408 (('/',), ('p',), ('Folder',), ('file',), ('tar',), ('gz',))
409 >>>
410 >>> a > b
411 True
412
413 Comparing Different Types on Python 3
414 +++++++++++++++++++++++++++++++++++++
415
416 `The second major special case I encountered was sorting of different types`_.
417 If you are on Python 2 (i.e. legacy Python), this mostly doesn't matter *too*
418 much since it uses an arbitrary heuristic to allow traditionally un-comparable
419 types to be compared (such as comparing ``'a'`` to ``1``). However, on Python 3
420 (i.e. Python) it simply won't let you perform such nonsense, raising a
421 :exc:`TypeError` instead.
422
423 You can imagine that a module that breaks strings into tuples of numbers and
424 strings is walking a dangerous line if it does not have special handling for
425 comparing numbers and strings. My imagination was not so great at first.
426 Let's take a look at all the ways this can fail with real-world data.
427
428 .. code-block:: pycon
429
430 >>> def natsort_key_with_poor_real_number_support(x):
431 ...     split_input = re.split(signed_float, x)
432 ...     split_input = filter(None, split_input)  # removes null strings
433 ...     return tuple(coerce_to_float(s) for s in split_input)
434 >>>
435 >>> sorted([5, '4'], key=natsort_key_with_poor_real_number_support)
436 Traceback (most recent call last):
437 ...
438 TypeError: ...
439 >>>
440 >>> sorted(['12 apples', 'apples'], key=natsort_key_with_poor_real_number_support)
441 Traceback (most recent call last):
442 ...
443 TypeError: ...
444 >>>
445 >>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_poor_real_number_support)
446 Traceback (most recent call last):
447 ...
448 TypeError: ...
449
450 Let's break these down.
451
452 #. The integer ``5`` is sent to ``re.split`` which expects only strings
453 or bytes, which is a no-no.
454 #. ``natsort_key_with_poor_real_number_support('12 apples') < natsort_key_with_poor_real_number_support('apples')``
455 is the same as ``(12.0, ' apples') < ('apples',)``, and thus a number gets
456 compared to a string [#f2]_ which also is a no-no.
457 #. This one scores big on the astonishment scale, especially if one accidentally
458 uses signed integers or real numbers when they mean to use unsigned integers.
459 ``natsort_key_with_poor_real_number_support('version5.3.0') < natsort_key_with_poor_real_number_support('version5.3rc1')``
460 is the same as ``('version', 5.3, 0.0) < ('version', 5.3, 'rc', 1.0)``, so in the
461 third element a number gets compared to a string, once again the same
462 old no-no. (The same would happen with ``'version5-3'`` and ``'version5-a'``,
463 which would become ``('version', 5, -3)`` and ``('version', 5, '-a')``).
464
465 As you might expect, the solution to the first issue is to wrap the ``re.split``
466 call in a ``try: except:`` block and handle the number specially if a
467 :exc:`TypeError` is raised. The second and third cases *could* be handled
468 in a "special case" manner, meaning only respond and do something different
469 if these problems are detected. But a less error-prone method is to ensure
470 that the data is correct-by-construction, and this can be done by ensuring
471 that the returned tuples *always* start with a string, and then alternate
472 in a string-number-string-number-string pattern; this can be achieved by
473 adding an empty string wherever the pattern is not followed [#f3]_. This ends
474 up working out pretty nicely because empty strings are always "less" than
475 any non-empty string, and we typically want numbers to come before strings.
476
477 Let's take a look at how this works out.
478
479 .. code-block:: pycon
480
481 >>> from natsort.utils import sep_inserter
482 >>> list(sep_inserter(iter(['apples']), ''))
483 ['apples']
484 >>>
485 >>> list(sep_inserter(iter([12, ' apples']), ''))
486 ['', 12, ' apples']
487 >>>
488 >>> list(sep_inserter(iter(['version', 5, -3]), ''))
489 ['version', 5, '', -3]
490 >>>
491 >>> from natsort import natsort_keygen, ns
492 >>> natsort_key_with_good_real_number_support = natsort_keygen(alg=ns.REAL)
493 >>>
494 >>> sorted([5, '4'], key=natsort_key_with_good_real_number_support)
495 ['4', 5]
496 >>>
497 >>> sorted(['12 apples', 'apples'], key=natsort_key_with_good_real_number_support)
498 ['12 apples', 'apples']
499 >>>
500 >>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_good_real_number_support)
501 ['version5.3.0', 'version5.3rc1']
502
503 How the "good" version works will be given in `TL;DR 2 - Handling Crappy, Real-World Input`_.
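
The separator insertion demonstrated above can be approximated in a few lines of pure Python. The function ``insert_separators`` below is a hypothetical stand-in written for this explanation, not the implementation found in :mod:`natsort`; it only assumes that numeric parts are instances of :class:`numbers.Number`.

```python
from numbers import Number


def insert_separators(parts, sep=''):
    """Return a list that always alternates string, number, string, ..."""
    result = []
    # Pretend a number precedes the first element so that a leading
    # number gets a separator placed in front of it.
    previous_was_number = True
    for part in parts:
        is_number = isinstance(part, Number)
        if is_number and previous_was_number:
            result.append(sep)
        result.append(part)
        previous_was_number = is_number
    return result


print(insert_separators(['apples']))          # ['apples']
print(insert_separators([12, ' apples']))     # ['', 12, ' apples']
print(insert_separators(['version', 5, -3]))  # ['version', 5, '', -3]
```

Because every tuple built this way has strings at even positions and numbers at odd positions, element-by-element comparison never pairs a string with a number.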
504
505 Handling NaN
506 ++++++++++++
507
508 `A rather unexpected special case I encountered was sorting collections containing NaN`_.
509 Let's see what happens when you try to sort a plain old list of numbers when there
510 is a **NaN** floating around in there.
511
512 .. code-block:: pycon
513
514 >>> danger = [7, float('nan'), 22.7, 19, -14, 59.123, 4]
515 >>> sorted(danger)
516 [7, nan, -14, 4, 19, 22.7, 59.123]
517
518 Clearly that isn't correct, and for once it isn't my fault!
519 `It's hard to compare floating point numbers`_. By definition, **NaN** is unorderable
520 to any other number, and is never equal to any other number, including itself.
521
522 .. code-block:: pycon
523
524 >>> nan = float('nan')
525 >>> 5 > nan
526 False
527 >>> 5 < nan
528 False
529 >>> 5 == nan
530 False
531 >>> 5 != nan
532 True
533 >>> nan == nan
534 False
535 >>> nan != nan
536 True
537
538 The implication of all this for us is that if there is a **NaN** in the
539 data-set we are trying to sort, the data-set will end up being sorted in
540 two separate yet individually sorted sequences - the one *before* the **NaN**,
541 and the one *after*. This is because the ``<`` operation that is used
542 to sort always returns :const:`False` with **NaN**.
543
544 Because :mod:`natsort` aims to sort sequences in a way that does not surprise
545 the user, keeping this behavior is not acceptable (I don't require my users
546 to know how **NaN** will behave in a sorting algorithm). The simplest way to
547 satisfy the "least astonishment" principle is to substitute **NaN** with
548 some other value. But what value is *least* astonishing? I chose to replace
549 **NaN** with :math:`-\infty` so that these poorly behaved elements always
550 end up at the front where the users will most likely be alerted to their presence.
551
552 .. code-block:: pycon
553
554 >>> def fix_nan(x):
555 ...     if x != x:  # only true for NaN
556 ...         return float('-inf')
557 ...     else:
558 ...         return x
559 ...
560
561 Let's check out :ref:`TL;DR 2 <tldr2>` to see how this can be
562 incorporated into the simple key function from :ref:`TL;DR 1 <tldr1>`.
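
As a quick check in isolation, ``fix_nan`` can be used directly as a sort key on the list from the start of this section. The sketch repeats the function (in a condensed form) so that it is self-contained.

```python
def fix_nan(x):
    # NaN is the only value that is not equal to itself.
    return float('-inf') if x != x else x


danger = [7, float('nan'), 22.7, 19, -14, 59.123, 4]
result = sorted(danger, key=fix_nan)

print(result)  # [nan, -14, 4, 7, 19, 22.7, 59.123]
```

The **NaN** itself is still present in the output; only the key used for comparison was replaced, so the remaining numbers end up in one fully sorted run.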
563
564 .. _tldr2:
565
566 TL;DR 2 - Handling Crappy, Real-World Input
567 +++++++++++++++++++++++++++++++++++++++++++
568
569 Let's see how our elegant key function from :ref:`TL;DR 1 <tldr1>` has
570 become bastardized in order to support handling mixed real-world data
571 and user customizations.
572
573 >>> def natsort_key(x, as_float=False, signed=False, as_path=False):
574 ...     if as_float:
575 ...         regex = signed_float if signed else unsigned_float
576 ...     else:
577 ...         regex = signed_int if signed else unsigned_int
578 ...     try:
579 ...         if as_path:
580 ...             x = decompose_path_into_components(x)  # Decomposes into list of strings
581 ...         # If this raises a TypeError, input is not a string.
582 ...         split_input = re.split(regex, x)
583 ...     except TypeError:
584 ...         try:
585 ...             # Does this need to be applied recursively (list-of-list)?
586 ...             return tuple(map(natsort_key, x))
587 ...         except TypeError:
588 ...             # Must be a number
589 ...             ret = ('', fix_nan(x))  # Maintain string-number-string pattern
590 ...             return (ret,) if as_path else ret  # as_path returns tuple-of-tuples
591 ...     else:
592 ...         split_input = filter(None, split_input)  # removes null strings
593 ...         # Note that the coerce_to_int/coerce_to_float functions
594 ...         # are also modified to use the fix_nan function.
595 ...         if as_float:
596 ...             coerced_input = (coerce_to_float(s) for s in split_input)
597 ...         else:
598 ...             coerced_input = (coerce_to_int(s) for s in split_input)
599 ...         return tuple(sep_inserter(coerced_input, ''))
600 ...
601
602 And this doesn't even show handling :class:`bytes` type! Notice that we have
603 to do non-obvious things like modify the return form of numbers when ``as_path``
604 is given, just to avoid comparing strings and numbers for the case in which a user provides
605 input like ``['/home/me', 42]``.
606
607 Let's take it out for a spin!
608
609 .. code-block:: pycon
610
611 >>> danger = [7, float('nan'), 22.7, '19', '-14', '59.123', 4]
612 >>> sorted(danger, key=lambda x: natsort_key(x, as_float=True, signed=True))
613 [nan, '-14', 4, 7, '19', 22.7, '59.123']
614 >>>
615 >>> paths = ['/p/Folder (1)/file.tar.gz',
616 ... '/p/Folder/file.tar.gz',
617 ... 123456]
618 >>> sorted(paths, key=lambda x: natsort_key(x, as_path=True))
619 [123456, '/p/Folder/file.tar.gz', '/p/Folder (1)/file.tar.gz']
620
621 Here Be Dragons: Adding Locale Support
622 --------------------------------------
623
624 .. contents::
625 :local:
626
627 Probably the most challenging special case I had to handle was getting
628 :mod:`natsort` to handle sorting the non-numerical parts of input
629 correctly, and also allowing it to sort the numerical bits in different
630 locales. This was in no way what I originally set out to do with this
631 library, so I was `caught a bit off guard when the request was initially made`_.
632 I discovered the :mod:`locale` library, and assumed that if it's part of Python's
633 StdLib there can't be too many dragons, right?
634
635 .. admonition:: INCOMPLETE LIST OF DRAGONS
636
637 - https://github.com/SethMMorton/natsort/issues/21
638 - https://github.com/SethMMorton/natsort/issues/22
639 - https://github.com/SethMMorton/natsort/issues/23
640 - https://github.com/SethMMorton/natsort/issues/36
641 - https://github.com/SethMMorton/natsort/issues/44
642 - https://bugs.python.org/issue2481
643 - https://bugs.python.org/issue23195
644 - https://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
645 - https://stackoverflow.com/questions/22203550/sort-dictionary-by-key-using-locale-collation
646 - https://stackoverflow.com/questions/33459384/unicode-character-not-in-range-when-calling-locale-strxfrm
647 - https://stackoverflow.com/questions/36431810/sort-numeric-lines-with-thousand-separators
648 - https://stackoverflow.com/questions/45734562/how-can-i-get-a-reasonable-string-sorting-with-python
649
650 These can be summed up as follows:
651
652 #. :mod:`locale` is a thin wrapper over your operating system's *locale*
653 library, so if *that* is broken (like it is on BSD and OSX) then
654 :mod:`locale` is broken in Python.
655 #. Because of a bug in legacy Python (i.e. Python 2), there is no uniform way to use
656 the :mod:`locale` sorting functionality between legacy Python and Python 3.
657 #. People have differing opinions of how capitalization should affect word order.
658 #. There is no built-in way to handle locale-dependent thousands separators
659 and decimal points *robustly*.
660 #. Proper handling of Unicode is complicated.
661 #. Proper handling of :mod:`locale` is complicated.
662
663 Easily over half of the code in :mod:`natsort` is in some way dealing with some
664 aspect of :mod:`locale` or basic case handling. It would have been
665 impossible to get right without a `really good`_ `testing strategy`_.
666
667 Don't expect any more TL;DR's... if you want to see how all this is fully
668 incorporated into the :mod:`natsort` algorithm then please take a look
669 `at the code`_. However, I will hint at how specific steps are taken in
670 each section.
671
672 Let's see how we can handle some of the dragons, one-by-one.
673
674 Basic Case Control Support
675 ++++++++++++++++++++++++++
676
677 Without even thinking about the mess that is adding :mod:`locale` support,
678 :mod:`natsort` can introduce support for controlling how case is interpreted.
679
680 First, let's take a look at how it is sorted by default (due to
681 where characters lie on the `ASCII table`_).
682
683 .. code-block:: pycon
684
685 >>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
686 >>> sorted(a)
687 ['Apple', 'Banana', 'Corn', 'apple', 'banana', 'corn']
688
689 All uppercase letters come before lowercase letters in the `ASCII table`_,
690 so all capitalized words appear first. Not everyone agrees that this
691 is the correct order. Some believe that the capitalized words should
692 be last (``['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']``).
693 Some believe that both the lowercase and uppercase versions
694 should appear together (``['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']``).
695 Some believe that both should be true ☹. Some people don't care at all [#f4]_.
696
697 Solving the first case (I call it *LOWERCASEFIRST*) is actually pretty
698 easy... just call the :meth:`str.swapcase` method on the input.
699
700 .. code-block:: pycon
701
702 >>> sorted(a, key=lambda x: x.swapcase())
703 ['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']
704
705 The last (I call it *IGNORECASE*) should be super easy, right?
706 Simply call :meth:`str.lower` on the input. This will work but may
707 not always give the correct answer on non-latin character sets. It's
708 a good thing that in Python 3.3
709 :meth:`str.casefold` was introduced, which does a better job of removing
710 all case information from unicode characters in
711 non-latin alphabets.
712
713 .. code-block:: pycon
714
715 >>> def remove_case(x):
716 ...     try:
717 ...         return x.casefold()
718 ...     except AttributeError:  # Legacy Python backwards compatibility
719 ...         return x.lower()
720 ...
721 >>> sorted(a, key=remove_case)
722 ['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn']
723
724 The middle case (I call it *GROUPLETTERS*) is less straightforward.
725 The most efficient way to handle this is to duplicate each character
726 with its lowercase version and then the original character.
727
728 .. code-block:: pycon
729
730 >>> import itertools
731 >>> def groupletters(x):
732 ...     return ''.join(itertools.chain.from_iterable((remove_case(y), y) for y in x))
733 ...
734 >>> groupletters('Apple')
735 'aAppppllee'
736 >>> groupletters('apple')
737 'aappppllee'
738 >>> sorted(a, key=groupletters)
739 ['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']
740
741 The effect of this is that both ``'Apple'`` and ``'apple'`` are
742 placed adjacent to each other because their transformations both begin
743 with ``'a'``, and then the second character can be used to order them
744 appropriately with respect to each other.
745
746 There's a problem with this, though. Within the context of :mod:`natsort`
747 we are trying to correctly sort numbers and those should be left alone.
748
749 .. code-block:: pycon
750
751 >>> a = ['Apple5', 'apple', 'Apple4E10', 'Banana']
752 >>> sorted(a, key=lambda x: natsort_key(x, as_float=True))
753 ['Apple5', 'Apple4E10', 'Banana', 'apple']
754 >>> sorted(a, key=lambda x: natsort_key(groupletters(x), as_float=True))
755 ['Apple4E10', 'Apple5', 'apple', 'Banana']
756 >>> groupletters('Apple4E10')
757 'aAppppllee44eE1100'
758
759 We messed up the numbers! Looks like :func:`groupletters` needs to be applied
760 *after* the strings are broken into their components. I'm not going to show
761 how this is done here, but basically it requires applying the function in
762 the ``else:`` block of :func:`coerce_to_int`/:func:`coerce_to_float`.
763
764 .. code-block:: pycon
765
766 >>> better_groupletters = natsort_keygen(alg=ns.GROUPLETTERS | ns.REAL)
767 >>> better_groupletters('Apple4E10')
768 ('aAppppllee', 40000000000.0)
769 >>> sorted(a, key=better_groupletters)
770 ['Apple5', 'Apple4E10', 'apple', 'Banana']
771
772 Of course, applying both *LOWERCASEFIRST* and *GROUPLETTERS* is just
773 a matter of turning on both functions.
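
One plausible way to combine the two is to swap the case first and then group the letters. The helper ``lowercase_first_groupletters`` below is hypothetical and assumes Python 3.3 or later for :meth:`str.casefold`; it is a sketch of the idea and not necessarily how :mod:`natsort` composes the two options internally.

```python
import itertools


def groupletters(x):
    # Pair each character with its case-folded version, folded version first.
    return ''.join(itertools.chain.from_iterable((y.casefold(), y) for y in x))


def lowercase_first_groupletters(x):
    # Hypothetical helper: swap case so lowercase sorts first, then group.
    return groupletters(x.swapcase())


a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
result = sorted(a, key=lowercase_first_groupletters)

print(result)  # ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
```

Each word still sits next to its differently-cased twin, but within each pair the lowercase version now comes first.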
774
775 Basic Unicode Support
776 +++++++++++++++++++++
777
778 Unicode is hard and complicated. Here's an example.
779
780 .. code-block:: pycon
781
782 >>> b = [b'\x66', b'\x65', b'\xc3\xa9', b'\x65\xcc\x81', b'\x61', b'\x7a']
783 >>> a = [x.decode('utf8') for x in b]
784 >>> a # doctest: +SKIP
785 ['f', 'e', 'é', 'é', 'a', 'z']
786 >>> sorted(a) # doctest: +SKIP
787 ['a', 'e', 'é', 'f', 'z', 'é']
788
789
790 There is more than one way to represent the character 'é' in Unicode.
791 In fact, many characters have multiple representations. This is a challenge
792 because comparing the two representations would return ``False`` even though
793 they *look* the same.
794
795 .. code-block:: pycon
796
797 >>> a[2] == a[3]
798 False
799
800 Alas, since characters are compared based on the numerical value of their
801 representation, sorting Unicode often gives unexpected results (like seeing
802 'é' come both *before* and *after* 'z').
803
804 The original approach that :mod:`natsort` took with respect to non-ASCII
805 Unicode characters was to say "just use
806 the :mod:`locale` or :mod:`PyICU` library" and then cross its fingers
807 and hope those libraries take care of it. As you will find in the following
808 sections, that comes with its own baggage, and turned out to not always work anyway
809 (see https://stackoverflow.com/q/45734562/1399279). A more robust approach is to
810 handle the Unicode out-of-the-box without invoking a heavy-handed library
811 like :mod:`locale` or :mod:`PyICU`. To do this, we must use *normalization*.
812
813 To fully understand Unicode normalization, `check out some official Unicode documentation`_.
814 Just kidding... that's too much text. The following StackOverflow answers do
815 a good job at explaining Unicode normalization in simple terms:
816 https://stackoverflow.com/a/7934397/1399279 and
817 https://stackoverflow.com/a/7931547/1399279. Put simply, normalization
818 ensures that Unicode characters with multiple representations are in
819 some canonical and consistent representation so that (for example) comparisons
820 of the characters can be performed in a sane way. The following discussion
821 assumes you at least read the StackOverflow answers.
822
823 Looking back at our 'é' example, we can see that the two versions were
824 constructed with the byte strings ``b'\xc3\xa9'`` and ``b'\x65\xcc\x81'``.
825 The former representation is actually
826 `LATIN SMALL LETTER E WITH ACUTE <https://www.fileformat.info/info/unicode/char/e9/index.htm>`_
827 and is a single character in the Unicode standard. This is known as the
828 *composed form* and corresponds to the 'NFC' normalization scheme.
829 The latter representation is actually the letter 'e' followed by
830 `COMBINING ACUTE ACCENT <https://www.fileformat.info/info/unicode/char/0301/index.htm>`_
831 and so is two characters in the Unicode standard. This is known as the
832 *decomposed form* and corresponds to the 'NFD' normalization scheme.
833 Since the first character in the decomposed form is actually the letter 'e',
834 when compared to other ASCII characters it fits where you might expect.
835 Unfortunately, all Unicode composed form characters come after the
836 ASCII characters and so they always will be placed after 'z' when sorting.
837
838 It seems that most Unicode data is stored and shared in the composed form
839 which makes it challenging to sort. This can be solved by normalizing all
840 incoming Unicode data to the decomposed form ('NFD') and *then* sorting.
841
842 .. code-block:: pycon
843
844 >>> import unicodedata
845 >>> c = [unicodedata.normalize('NFD', x) for x in a]
846 >>> c # doctest: +SKIP
847 ['f', 'e', 'é', 'é', 'a', 'z']
848 >>> sorted(c) # doctest: +SKIP
849 ['a', 'e', 'é', 'é', 'f', 'z']
850
851 Huzzah! Sane sorting without having to resort to :mod:`locale`!
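
To see why normalizing to 'NFD' changes the sort order, it can help to inspect the code points directly. The following is a minimal sketch using only the standard library :mod:`unicodedata` module; the variable names ``nfc_form`` and ``nfd_form`` are chosen for this example.

```python
import unicodedata

nfc_form = b"\xc3\xa9".decode("utf8")      # a single code point
nfd_form = b"\x65\xcc\x81".decode("utf8")  # 'e' followed by a combining accent

print(len(nfc_form), [unicodedata.name(ch) for ch in nfc_form])
# 1 ['LATIN SMALL LETTER E WITH ACUTE']
print(len(nfd_form), [unicodedata.name(ch) for ch in nfd_form])
# 2 ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# Normalization converts one representation into the other.
print(unicodedata.normalize("NFD", nfc_form) == nfd_form)  # True
print(unicodedata.normalize("NFC", nfd_form) == nfc_form)  # True

# After NFD the string starts with a plain 'e', so it sorts between 'a' and 'z'.
print("a" < unicodedata.normalize("NFD", nfc_form) < "z")  # True
print(nfc_form > "z")                                      # True
```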
852
853 Using Locale to Compare Strings
854 +++++++++++++++++++++++++++++++
855
856 The :mod:`locale` module is actually pretty cool, and provides lowly
857 spare-time programmers like myself a way to handle the daunting task
858 of proper locale-dependent support of their libraries and utilities.
859 Having said that, it can be a bit of a bear to get right,
860 `although they do point out in the documentation that it will be painful to use`_.
861 Aside from the caveats spelled out in that link, it turns out that just
862 comparing strings with :mod:`locale` in a cross-platform and
863 cross-python-version manner is not as straightforward as one might hope.
864
865 First, how to use :mod:`locale` to compare strings? It's actually
866 pretty straightforward. Simply run the input through the :mod:`locale`
867 transformation function :func:`locale.strxfrm`.
868
869 .. code-block:: pycon
870
871 >>> import locale, sys
872 >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
873 'en_US.UTF-8'
874 >>> a = ['a', 'b', 'ä']
875 >>> sorted(a)
876 ['a', 'b', 'ä']
877 >>> # The below fails on OSX, so don't run doctest on darwin.
878 >>> is_osx = sys.platform == 'darwin'
879 >>> sorted(a, key=locale.strxfrm) if not is_osx else ['a', 'ä', 'b']
880 ['a', 'ä', 'b']
881 >>>
882 >>> a = ['apple', 'Banana', 'banana', 'Apple']
883 >>> sorted(a, key=locale.strxfrm) if not is_osx else ['apple', 'Apple', 'banana', 'Banana']
884 ['apple', 'Apple', 'banana', 'Banana']
885
886 It turns out that locale-aware sorting groups letters in the same
887 way as turning on *GROUPLETTERS* and *LOWERCASEFIRST*.
888 The trick is that you have to apply :func:`locale.strxfrm` only to non-numeric
889 characters; otherwise, numbers won't be parsed properly. Therefore, it must
890 be applied as part of the :func:`coerce_to_int`/:func:`coerce_to_float`
891 functions in a manner similar to :func:`groupletters`.
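
A minimal sketch of this idea is shown below: the input is split first, and :func:`locale.strxfrm` is applied only to the text between the numbers. The pattern ``INT_RE`` and the function name ``locale_split_key`` are assumptions made for this illustration; this is not the actual :mod:`natsort` code.

```python
import locale
import re

INT_RE = re.compile(r"([0-9]+)")  # unsigned integers only, for simplicity


def locale_split_key(text):
    """Transform the text parts with strxfrm and the digit runs with int."""
    return tuple(
        # re.split with one capture group puts the numbers at odd indices.
        int(part) if index % 2 else locale.strxfrm(part)
        for index, part in enumerate(INT_RE.split(text))
    )


names = ["file10", "file9", "file100"]
print(sorted(names, key=locale_split_key))  # ['file9', 'file10', 'file100']
```

Had :func:`locale.strxfrm` been applied to the whole string, the digits would have been transformed as text and could no longer be compared as numbers.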
892
893 As you might have guessed, there is a small problem.
894 It turns out that there is a bug in the legacy Python implementation of
895 :func:`locale.strxfrm` that causes it to outright fail for :func:`unicode`
896 input (https://bugs.python.org/issue2481). :func:`locale.strcoll` works,
897 but is intended for use with ``cmp``, which does not exist in current Python
898 implementations. Luckily, the :func:`functools.cmp_to_key` function
899 makes :func:`locale.strcoll` behave like :func:`locale.strxfrm`.
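
As a small illustration, the wrapping looks like the following. Both functions are part of the standard library; the variable names are chosen for this sketch, and the resulting order depends on the locale that is active when the code runs.

```python
import functools
import locale

# cmp_to_key wraps a two-argument comparison function in a key object,
# so strcoll can be used wherever a key function is expected.
strcoll_key = functools.cmp_to_key(locale.strcoll)

words = ["apple", "Banana", "banana", "Apple"]
ordered = sorted(words, key=strcoll_key)

# Whatever the locale, adjacent items are now ordered according to strcoll.
print(all(locale.strcoll(x, y) <= 0 for x, y in zip(ordered, ordered[1:])))  # True
```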
900
901 Handling Broken Locale On OSX
902 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
903
904 But what if the underlying *locale* implementation that :mod:`locale`
905 relies upon is simply broken? It turns out that the *locale* library on
906 OSX (and other BSD systems) is broken (and for some reason has never been
907 fixed?), and so :mod:`locale` does not work as expected.
908
909 How do I define "doesn't work as expected"?
910
911 .. code-block:: pycon
912
913 >>> a = ['apple', 'Banana', 'banana', 'Apple']
914 >>> sorted(a)
915 ['Apple', 'Banana', 'apple', 'banana']
916 >>>
917 >>> sorted(a, key=locale.strxfrm) if is_osx else sorted(a)
918 ['Apple', 'Banana', 'apple', 'banana']
919
920 IT'S SORTING AS IF :func:`locale.strxfrm` WAS NEVER USED!! (and it's worse
921 once non-ASCII characters get thrown into the mix.) I'm really not
922 sure why this is considered OK for the OSX/BSD maintainers to not fix,
923 but it's more than frustrating for poor developers who have been dragged
924 into the *locale* game kicking and screaming. *<deep breath>*.
925
926 So, how to deal with this situation? There are two ways to do so.
927
928 #. Detect if :mod:`locale` is sorting incorrectly (i.e. ``dumb``) by seeing
929 if ``'A'`` is sorted before ``'a'`` (incorrect) or not.
930
931 .. code-block:: pycon
932
933 >>> # This is genuinely the name of this function.
934 >>> # See natsort.compat.locale.py
935 >>> def dumb_sort():
936 ...     return locale.strxfrm('A') < locale.strxfrm('a')
937 ...
938
939 If a ``dumb`` *locale* implementation is found, then automatically
940 turn on *LOWERCASEFIRST* and *GROUPLETTERS*.
941 #. Use an alternate library if installed. `ICU <http://site.icu-project.org/>`_
942 is a great and powerful library that has a pretty decent Python port
943 called (you guessed it) `PyICU <https://pypi.org/project/PyICU/>`_.
944 If a user has this library installed on their computer, :mod:`natsort`
945 chooses to use that instead of :mod:`locale`. With a little bit of
946 planning, one can write a set of wrapper functions that call
947 the correct library under the hood such that the business logic never
948 has to know what library is being used (see `natsort.compat.locale.py`_).
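
The two strategies above can be combined in a small wrapper, sketched below under several assumptions. The names ``fallback_transform`` and ``get_strxfrm`` are hypothetical, *LOWERCASEFIRST* is emulated with :meth:`str.swapcase`, and the PyICU branch assumes that ``icu.Locale()`` and ``icu.Collator.createInstance(...).getSortKey`` behave as in the PyICU documentation. It illustrates the approach and is not the code in :mod:`natsort`.

```python
import locale
from itertools import chain

try:
    import icu  # PyICU; optional, so fall back gracefully when it is missing
except ImportError:
    icu = None


def dumb_sort():
    """Return True if the locale sorts 'A' before 'a' (treated as broken)."""
    return locale.strxfrm("A") < locale.strxfrm("a")


def fallback_transform(text):
    """Emulate LOWERCASEFIRST (swapcase) followed by GROUPLETTERS."""
    swapped = text.swapcase()
    return "".join(chain.from_iterable((ch.casefold(), ch) for ch in swapped))


def get_strxfrm():
    """Pick a string transformation; callers never see which backend is used."""
    if icu is not None:
        return icu.Collator.createInstance(icu.Locale()).getSortKey
    if dumb_sort():
        return fallback_transform
    return locale.strxfrm


words = ["apple", "Banana", "banana", "Apple"]
print(sorted(words, key=fallback_transform))
# ['apple', 'Apple', 'banana', 'Banana']
```

Code that needs a locale-aware key simply calls ``get_strxfrm()`` once and uses the returned function, which keeps the backend decision in one place.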
949
950 Let me tell you, this little complication really makes a challenge of testing
951 the code, since one must set up different environments on different operating
952 systems in order to test all possible code paths. Not to mention that
953 certain checks *will* fail for certain operating systems and environments
954 so one must be diligent in either writing the tests not to fail, or ignoring
955 those tests when on offending environments.
956
957 Handling Locale-Aware Numbers
958 +++++++++++++++++++++++++++++
959
960 `Thousands separator support`_ is a problem that I knew would someday be
961 requested but had decided to push off until a rainy day. One day it finally
962 rained, and I decided to tackle the problem.
963
964 So what is the problem? Consider the number ``1,234,567`` (assuming the
965 ``','`` is the thousands separator). Try to run that through :func:`int`
966 and you will get a :exc:`ValueError`. To handle this properly the thousands
967 separators must be removed.
968
969 .. code-block:: pycon
970
971 >>> float('1,234,567'.replace(',', ''))
972 1234567.0
973
974 What if, in our current locale, the thousands separator is ``'.'`` and
975 the ``','`` is the decimal separator (like for the German locale *de_DE*)?
976
977 .. code-block:: pycon
978
979 >>> float('1.234.567'.replace('.', '').replace(',', '.'))
980 1234567.0
981 >>> float('1.234.567,89'.replace('.', '').replace(',', '.'))
982 1234567.89
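
The replacements above can be wrapped in a small helper that looks up the separators of the current locale with :func:`locale.localeconv`. This is a sketch for illustration only; the function name ``parse_localized_float`` and its parameters are assumptions and not part of :mod:`locale` or :mod:`natsort`.

```python
import locale


def parse_localized_float(text, thousands=None, decimal=None):
    """Strip thousands separators and normalize the decimal separator."""
    conv = locale.localeconv()
    # Fall back to the separators of the active locale when none are given.
    thousands = conv["thousands_sep"] if thousands is None else thousands
    decimal = conv["decimal_point"] if decimal is None else decimal
    if thousands:
        text = text.replace(thousands, "")
    if decimal and decimal != ".":
        text = text.replace(decimal, ".")
    return float(text)


print(parse_localized_float("1.234.567,89", thousands=".", decimal=","))  # 1234567.89
print(parse_localized_float("1,234,567", thousands=",", decimal="."))     # 1234567.0
```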
983
984 This is pretty much what :func:`locale.atoi` and :func:`locale.atof` do
985 under the hood. So what's the problem? Why doesn't :mod:`natsort` just
986 use this method under its hood?
987 Well, let's take a look at what would happen if we send some possible
988 :mod:`natsort` input through the above transformations:
989
990 .. code-block:: pycon
991
992 >>> natsort_key('1,234 apples, please.'.replace(',', ''))
993 ('', 1234, ' apples please.')
994 >>> natsort_key('Sir, €1.234,50 please.'.replace('.', '').replace(',', '.'), as_float=True)
995 ('Sir. €', 1234.5, ' please')
996
997 Any character matching the thousands separator was dropped, and anything
998 matching the decimal separator was changed to ``'.'``! If these characters
999 were critical to how your data was ordered, this would break :mod:`natsort`.
1000
1001 The first solution one might consider would be to first decompose the
1002 input into sub-components (like we did for the *GROUPLETTERS* method
1003 above) and then only apply these transformations on the number components.
1004 This is a chicken-and-egg problem, though, because *we cannot appropriately
1005 separate out the numbers because of the thousands separators and
1006 non-'.' decimal separators* (well, at least not without making multiple
1007 passes over the data which I do not consider to be a valid option).
1008
1009 Regular expressions to the rescue! With regular expressions, we can
1010 remove the thousands separators and change the decimal separator only
1011 when they are actually within a number. Once the input has been
1012 pre-processed with this regular expression, all the infrastructure
1013 shown previously will work.
1014
1015 Beware, these regular expressions will make your eyes bleed.
1016
1017 .. code-block:: pycon
1018
1019 >>> decimal = ',' # Assume German locale, so decimal separator is ','
1020 >>> # Look-behind assertions cannot accept range modifiers, so instead of e.g.
1021 >>> # (?<!\.[0-9]{1,3}) I have to repeat the look-behind for 1, 2, and 3.
1022 >>> nodecimal = r'(?<!{dec}[0-9])(?<!{dec}[0-9]{{2}})(?<!{dec}[0-9]{{3}})'.format(dec=decimal)
1023 >>> strip_thousands = r'''
1024 ... (?<=[0-9]{{1}}) # At least 1 number
1025 ... (?<![0-9]{{4}}) # No more than 3 numbers
1026 ... {nodecimal} # Cannot follow decimal
1027 ... {thou} # The thousands separator
1028 ... (?=[0-9]{{3}} # Three numbers must follow
1029 ... ([^0-9]|$) # But a non-number after that
1030 ... )
1031 ... '''.format(nodecimal=nodecimal, thou='.') # Thousands separator is '.' in German locale.
1032 ...
1033 >>> re.sub(strip_thousands, '', 'Sir, €1.234,50 please.', flags=re.X)
1034 'Sir, €1234,50 please.'
1035 >>>
1036 >>> # The decimal separator must be preceded by a number or followed
1037 >>> # by a number. This option only needs to be performed in the
1038 >>> # case when the decimal separator for the locale is not '.'.
1039 >>> switch_decimal = r'(?<=[0-9]){decimal}|{decimal}(?=[0-9])'
1040 >>> switch_decimal = switch_decimal.format(decimal=decimal)
1041 >>> re.sub(switch_decimal, '.', 'Sir, €1234,50 please.', flags=re.X)
1042 'Sir, €1234.50 please.'
1043 >>>
1044 >>> natsort_key('Sir, €1234.50 please.', as_float=True)
1045 ('Sir, €', 1234.5, ' please.')
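
For convenience, the two substitutions shown above can be bundled into one function that accepts arbitrary separators. This is a sketch built from the same patterns; the function name ``normalize_number_separators`` is hypothetical, and :func:`re.escape` is added here so that separators such as ``'.'`` are matched literally.

```python
import re


def normalize_number_separators(text, thousands=".", decimal=","):
    """Remove thousands separators and switch the decimal separator to '.'."""
    thou = re.escape(thousands)
    dec = re.escape(decimal)
    # Look-behinds must have a fixed width, so repeat for 1, 2, and 3 digits.
    nodecimal = "".join("(?<!" + dec + "[0-9]{" + str(n) + "})" for n in (1, 2, 3))
    strip_thousands = (
        "(?<=[0-9])"                  # at least one digit before
        "(?<![0-9]{4})"               # but no more than three
        + nodecimal                   # and not inside the fractional part
        + thou                        # the thousands separator itself
        + "(?=[0-9]{3}(?:[^0-9]|$))"  # exactly three digits must follow
    )
    text = re.sub(strip_thousands, "", text)
    if decimal != ".":
        # Only a separator that touches a digit is treated as a decimal point.
        switch_decimal = "(?<=[0-9])" + dec + "|" + dec + "(?=[0-9])"
        text = re.sub(switch_decimal, ".", text)
    return text


print(normalize_number_separators("Sir, €1.234,50 please."))
# Sir, €1234.50 please.
print(normalize_number_separators("1,234 apples, please.", thousands=",", decimal="."))
# 1234 apples, please.
```

Note that the commas and periods that belong to the sentence are left untouched; only those inside a number are changed.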
1046
1047 Final Thoughts
1048 --------------
1049
1050 My hope is that users of :mod:`natsort` never have to think about or worry
1051 about all the bookkeeping or any of the details described above, and that using
1052 :mod:`natsort` seems to magically "just work". For those of you who
1053 took the time to read this engineering description, I hope it has enlightened
1054 you to some of the issues that can be encountered when code is released
1055 into the wild and has to accept "real-world data", or to what happens
1056 to developers who naïvely make bold assumptions that are counter to
1057 what the rest of the world assumes.
1058
1059 .. rubric:: Footnotes
1060
1061 .. [#f1]
1062 To anyone looking through the actual code, you will note that I don't
1063 actually use :mod:`pathlib` to split the paths... I wrote my own version
1064 to avoid adding an external dependency of :mod:`pathlib` on Python < 3.4.
1065 .. [#f2]
1066 *"But if you hadn't removed the leading empty string from re.split this
1067 wouldn't have happened!!"* I can hear you saying. Well, that's true. I don't
1068 have a *great* reason for having done that except that in an earlier
1069 non-optimal incarnation of the algorithm I needed to do it, and it kind of
1070 stuck, and it made other parts of the code easier if the assumption that
1071 there were no empty strings was valid.
1072 .. [#f3]
1073 I'm not going to show how this is implemented in this document,
1074 but if you are interested you can look at the code to
1075 :func:`sep_inserter` in `util.py`_.
1076 .. [#f4]
1077 Handling each of these is straightforward, but coupled with the rapidly
1078 fracturing execution paths presented in :ref:`TL;DR 2 <tldr2>` one can imagine
1079 this will get out of hand quickly. If you take a look at `natsort.py`_ and
1080 `util.py`_ you can observe that to avoid this I take a more functional approach
1081 to constructing the :mod:`natsort` algorithm as opposed to the procedural approach
1082 illustrated in :ref:`TL;DR 1 <tldr1>` and :ref:`TL;DR 2 <tldr2>`.
1083
1084 .. _ASCII table: https://www.asciitable.com/
1085 .. _getting sorting right is surprisingly hard: http://www.compciv.org/guides/python/fundamentals/sorting-collections-with-sorted/
1086 .. _This astonished: https://github.com/SethMMorton/natsort/issues/19
1087 .. _a lot: https://stackoverflow.com/questions/29548742/python-natsort-sort-strings-recursively
1088 .. _of people: https://stackoverflow.com/questions/24045348/sort-set-of-numbers-in-the-form-xx-yy-in-python
1089 .. _and some people aren't very nice when they are astonished:
1090 https://github.com/xolox/python-naturalsort/blob/ed3e6b6ffaca3bdea3b76e08acbb8bd2a5fee463/README.rst#why-another-natsort-module
1091 .. _fastnumbers: https://github.com/SethMMorton/fastnumbers
1092 .. _as part of my testing: https://github.com/SethMMorton/natsort/blob/master/test_natsort/slow_splitters.py
1093 .. _this one for coercion: https://stackoverflow.com/questions/736043/checking-if-a-string-can-be-converted-to-float-in-python
1094 .. _this one for checking: https://stackoverflow.com/questions/354038/how-do-i-check-if-a-string-is-a-number-float
1095 .. _most natural sort solutions for python on Stack Overflow: https://stackoverflow.com/q/4836710/1399279
1096 .. _80%/20%: https://en.wikipedia.org/wiki/Pareto_principle
1097 .. _The first major special case I encountered was sorting filesystem paths: https://github.com/SethMMorton/natsort/issues/3
1098 .. _The second major special case I encountered was sorting of different types: https://github.com/SethMMorton/natsort/issues/7
1099 .. _A rather unexpected special case I encountered was sorting collections containing NaN:
1100 https://github.com/SethMMorton/natsort/issues/27
1101 .. _It's hard to compare floating point numbers: http://www.drdobbs.com/cpp/its-hard-to-compare-floating-point-numbe/240149806
1102 .. _caught a bit off guard when the request was initially made: https://github.com/SethMMorton/natsort/issues/14
1103 .. _at the code: https://github.com/SethMMorton/natsort/tree/master/natsort
1104 .. _natsort.py: https://github.com/SethMMorton/natsort/blob/master/natsort/natsort.py
1105 .. _util.py: https://github.com/SethMMorton/natsort/blob/master/natsort/util.py
1106 .. _although they do point out in the documentation that it will be painful to use:
1107 https://docs.python.org/3/library/locale.html#background-details-hints-tips-and-caveats
1108 .. _natsort.compat.locale.py: https://github.com/SethMMorton/natsort/blob/master/natsort/compat/locale.py
1109 .. _Thousands separator support: https://github.com/SethMMorton/natsort/issues/36
1110 .. _really good: https://hypothesis.readthedocs.io/en/latest/
1111 .. _testing strategy: https://docs.pytest.org/en/latest/
1112 .. _check out some official Unicode documentation: https://unicode.org/reports/tr15/
0 .. natsort documentation master file, created by
1 sphinx-quickstart on Thu Jul 17 21:01:29 2014.
2 You can adapt this file completely to your liking, but it should at least
3 contain the root `toctree` directive.
4
5 natsort: Simple yet flexible natural sorting in Python.
6 =======================================================
7
8 Contents:
9
10 .. toctree::
11 :maxdepth: 2
12 :numbered:
13
14 intro.rst
15 howitworks.rst
16 examples.rst
17 api.rst
18 locale_issues.rst
19 shell.rst
20 changelog.rst
21
22 Indices and tables
23 ==================
24
25 * :ref:`genindex`
26 * :ref:`modindex`
27 * :ref:`search`
0 .. default-domain:: py
1 .. module:: natsort
2
3 The :mod:`natsort` module
4 =========================
5
6 Simple yet flexible natural sorting in Python.
7
8 - Source Code: https://github.com/SethMMorton/natsort
9 - Downloads: https://pypi.org/project/natsort/
10 - Documentation: https://natsort.readthedocs.io/
11 - Optional Dependencies:
12
13 - `fastnumbers <https://pypi.org/project/fastnumbers>`_ >= 2.0.0
14 - `PyICU <https://pypi.org/project/PyICU>`_ >= 1.0.0
15
16 **NOTE**: Please see the `Deprecation Schedule`_ section for changes in
17 :mod:`natsort` version 6.0.0 and in the upcoming version 7.0.0.
18
19 :mod:`natsort` is a general utility for sorting lists *naturally*; the definition
20 of "naturally" is not well-defined, but the most common definition is that numbers
21 contained within the string should be sorted as numbers and not as you would
22 other characters. If you need to present sorted output to a user, you probably
23 want to sort it naturally.
24
25 :mod:`natsort` was initially created for sorting scientific output filenames that
26 contained signed floating point numbers in the names. There was a lack of
27 algorithms out there that could perform a natural sort on `floats` but
28 plenty for `ints`; check out
29 `this StackOverflow question <https://stackoverflow.com/q/4836710/1399279>`_
30 and its answers and links therein,
31 `this ActiveState forum <https://code.activestate.com/recipes/285264-natural-string-sorting/>`_,
32 and of course `this great article on natural sorting <https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/>`_
33 from CodingHorror.com for examples of what I mean.
34 :mod:`natsort` was created to fill in this gap, but has since expanded to handle
35 just about any definition of a number, as well as other sorting customizations.
36
37 Quick Description
38 -----------------
39
40 When you try to sort a list of strings that contain numbers, the normal python
41 sort algorithm sorts lexicographically, so you might not get the results that you
42 expect:
43
44 .. code-block:: pycon
45
46 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
47 >>> sorted(a)
48 ['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in']
49
50 Notice that it has the order ('1', '10', '2') - this is because the list is
51 being sorted in lexicographical order, which sorts numbers like you would
52 letters (i.e. 'b', 'ba', 'c').
53
54 :mod:`natsort` provides a function :func:`~natsorted` that helps sort lists
55 "naturally" ("naturally" is rather ill-defined, but in general it means
56 sorting based on meaning and not computer code point).
57 Using :func:`~natsorted` is simple:
58
59 .. code-block:: pycon
60
61 >>> from natsort import natsorted
62 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
63 >>> natsorted(a)
64 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
65
66 :func:`~natsorted` identifies numbers anywhere in a string and sorts them
67 naturally. Below are some other things you can do with :mod:`natsort`
68 (please see the :ref:`examples` for a quick start guide, or the :ref:`api`
69 for more details).
70
71 .. note::
72
73 :func:`~natsorted` is designed to be a drop-in replacement for the built-in
74 :func:`sorted` function. Like :func:`sorted`, :func:`~natsorted`
75 `does not sort in-place`. To sort a list and assign the output to the
76 same variable, you must explicitly assign the output to a variable:
77
78 .. code-block:: pycon
79
80 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
81 >>> natsorted(a)
82 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
83 >>> print(a) # 'a' was not sorted; "natsorted" simply returned a sorted list
84 ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
85 >>> a = natsorted(a) # Now 'a' will be sorted because the sorted list was assigned to 'a'
86 >>> print(a)
87 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
88
89 Please see `Generating a Reusable Sorting Key and Sorting In-Place`_ for
90 an alternate way to sort in-place naturally.
91
92 Examples
93 --------
94
95 Sorting Versions
96 ++++++++++++++++
97
98 :mod:`natsort` does not (and never has) actually *comprehend* version numbers.
99 It just so happens that the most common versioning schemes are designed to
100 work with standard natural sorting techniques; these schemes include
101 ``MAJOR.MINOR``, ``MAJOR.MINOR.PATCH``, ``YEAR.MONTH.DAY``. If your data
102 conforms to a scheme like this, then it will work out-of-the-box with
103 ``natsorted`` (as of ``natsort`` version >= 4.0.0):
104
105 .. code-block:: pycon
106
107 >>> a = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
108 >>> natsorted(a)
109 ['version-1.9', 'version-1.10', 'version-1.11', 'version-2.0']
110
111 If you need to sort versions that use a more complicated scheme, please see
112 :ref:`rc_sorting` for examples.
113
114 Sorting by Real Numbers (i.e. Signed Floats)
115 ++++++++++++++++++++++++++++++++++++++++++++
116
117 This is useful in scientific data analysis and was
118 the default behavior of :func:`~natsorted` for :mod:`natsort`
119 version < 4.0.0. Use the :func:`~realsorted` function:
120
121 .. code-block:: pycon
122
123 >>> from natsort import realsorted, ns
124 >>> # Note that when interpreting as signed floats, the below numbers are
125 >>> # +5.10, -3.00, +5.30, +2.00
126 >>> a = ['position5.10.data', 'position-3.data', 'position5.3.data', 'position2.data']
127 >>> natsorted(a)
128 ['position2.data', 'position5.3.data', 'position5.10.data', 'position-3.data']
129 >>> natsorted(a, alg=ns.REAL)
130 ['position-3.data', 'position2.data', 'position5.10.data', 'position5.3.data']
131 >>> realsorted(a) # shortcut for natsorted with alg=ns.REAL
132 ['position-3.data', 'position2.data', 'position5.10.data', 'position5.3.data']
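
To build some intuition for why the two results differ, the sketch below splits the same strings once with an integer pattern and once with a signed-float pattern. It uses only the standard library; the patterns and the helper ``split_key`` are simplified assumptions for this illustration and are not the ones used inside :mod:`natsort`.

```python
import re

INT_RE = re.compile(r"([0-9]+)")
# Simplified signed-float pattern; an assumption made for this sketch.
REAL_RE = re.compile(r"([-+]?[0-9]+(?:\.[0-9]+)?)")


def split_key(text, pattern, convert):
    """Alternate text and numbers: even indices are text, odd are numbers."""
    return tuple(
        convert(part) if index % 2 else part
        for index, part in enumerate(pattern.split(text))
    )


a = ['position5.10.data', 'position-3.data', 'position5.3.data', 'position2.data']
print(split_key('position-3.data', INT_RE, int))     # ('position-', 3, '.data')
print(split_key('position-3.data', REAL_RE, float))  # ('position', -3.0, '.data')
print(sorted(a, key=lambda x: split_key(x, REAL_RE, float)))
# ['position-3.data', 'position2.data', 'position5.10.data', 'position5.3.data']
```

With the integer pattern the ``'-'`` stays attached to the text, so ``'position-'`` sorts after ``'position'``; with the float pattern it becomes the sign of the number.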
133
134 Locale-Aware Sorting (or "Human Sorting")
135 +++++++++++++++++++++++++++++++++++++++++
136
137 This is where the non-numeric characters are ordered based on their meaning,
138 not on their ordinal value, and a locale-dependent thousands separator and decimal
139 separator are accounted for in the number.
140 This can be achieved with the :func:`~humansorted` function:
141
142 .. code-block:: pycon
143
144 >>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
145 >>> natsorted(a)
146 ['Apple', 'Banana', 'apple14,689', 'apple15', 'banana']
147 >>> import locale
148 >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
149 'en_US.UTF-8'
150 >>> natsorted(a, alg=ns.LOCALE)
151 ['apple15', 'apple14,689', 'Apple', 'banana', 'Banana']
152 >>> from natsort import humansorted
153 >>> humansorted(a)
154 ['apple15', 'apple14,689', 'Apple', 'banana', 'Banana']
155
156 You may find you need to explicitly set the locale to get this to work
157 (as shown in the example).
158 Please see :ref:`locale_issues` and the Installation section
159 below before using the :func:`~humansorted` function.
160
161 Further Customizing Natsort
162 +++++++++++++++++++++++++++
163
164 If you need to combine multiple algorithm modifiers (such as ``ns.REAL``,
165 ``ns.LOCALE``, and ``ns.IGNORECASE``), you can combine the options using the
166 bitwise OR operator (``|``). For example,
167
168 .. code-block:: pycon
169
170 >>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
171 >>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE)
172 ['Apple', 'apple15', 'apple14,689', 'Banana', 'banana']
173 >>> # The ns enum provides long and short forms for each option.
174 >>> ns.LOCALE == ns.L
175 True
176 >>> # You can also customize the convenience functions, too.
177 >>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE) == realsorted(a, alg=ns.L | ns.IC)
178 True
179 >>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE) == humansorted(a, alg=ns.R | ns.IC)
180 True
181
182 All of the available customizations can be found in the documentation for
183 the :class:`~natsort.ns` enum.
184
185 You can also add your own custom transformation functions with the ``key`` argument.
186 These can be used with ``alg`` if you wish:
187
188 .. code-block:: pycon
189
190 >>> a = ['apple2.50', '2.3apple']
191 >>> natsorted(a, key=lambda x: x.replace('apple', ''), alg=ns.REAL)
192 ['2.3apple', 'apple2.50']
193
194 Sorting Mixed Types
195 +++++++++++++++++++
196
197 You can mix and match ``int``, ``float``, and ``str`` (or ``unicode``) types
198 when you sort:
199
200 .. code-block:: pycon
201
202 >>> a = ['4.5', 6, 2.0, '5', 'a']
203 >>> natsorted(a)
204 [2.0, '4.5', '5', 6, 'a']
205 >>> # On Python 2, sorted(a) would return [2.0, 6, '4.5', '5', 'a']
206 >>> # On Python 3, sorted(a) would raise an "unorderable types" TypeError
207
208 Handling Bytes on Python 3
209 ++++++++++++++++++++++++++
210
211 :mod:`natsort` does not officially support the `bytes` type on Python 3, but
212 convenience functions are provided that help you decode to `str` first:
213
214 .. code-block:: pycon
215
216 >>> from natsort import as_utf8
217 >>> a = [b'a', 14.0, 'b']
218 >>> # On Python 2, natsorted(a) would work as expected.
219 >>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
220 >>> natsorted(a, key=as_utf8) == [14.0, b'a', 'b']
221 True
222 >>> a = [b'a56', b'a5', b'a6', b'a40']
223 >>> # On Python 2, natsorted(a) would work as expected.
224 >>> # On Python 3, natsorted(a) would return the same results as sorted(a)
225 >>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
226 True
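
The general idea behind a decoding key can be sketched with the standard library alone. The helper ``decode_if_bytes`` below is hypothetical and written in the spirit of ``as_utf8``; it is not the actual :mod:`natsort` implementation.

```python
def decode_if_bytes(value, encoding="utf8"):
    """Return a str for bytes input and leave every other value unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


mixed = [b"b", "a", b"c"]
# Comparing bytes with str directly raises TypeError on Python 3, but the
# decoded keys are all str. The original objects are returned unchanged.
print(sorted(mixed, key=decode_if_bytes))  # ['a', b'b', b'c']
```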
227
228 Generating a Reusable Sorting Key and Sorting In-Place
229 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
230
231 Under the hood, :func:`~natsorted` works by generating a custom sorting
232 key using :func:`~natsort_keygen` and then passing that to the built-in
233 :func:`sorted`. You can use the :func:`~natsort_keygen` function yourself to
234 generate a custom sorting key to sort in-place using the :meth:`list.sort`
235 method.
236
237 .. code-block:: pycon
238
239 >>> from natsort import natsort_keygen
240 >>> natsort_key = natsort_keygen()
241 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
242 >>> natsorted(a) == sorted(a, key=natsort_key)
243 True
244 >>> a.sort(key=natsort_key)
245 >>> a
246 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
247
248 All of the algorithm customizations mentioned in the `Further Customizing Natsort`_
249 section can also be applied to :func:`~natsort_keygen` through the *alg* keyword option.
250
251 Other Useful Things
252 +++++++++++++++++++
253
254 - recursively descend into lists of lists
255 - automatic unicode normalization of input data
256 - controlling the case-sensitivity (see :ref:`case_sort`)
257 - sorting file paths correctly (see :ref:`path_sort`)
258 - allow custom sorting keys (see :ref:`custom_sort`)
259
260 FAQ
261 ---
262
263 How do I debug :func:`~natsorted`?
264 The best way to debug :func:`~natsorted` is to generate a key using :func:`~natsort_keygen`
265 with the same options being passed to :func:`~natsorted`. One can take a look at
266 exactly what is being done with their input using this key - it is highly recommended
267 to `look at this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_
268 for *how* to debug, and also to review the :ref:`howitworks` page for *why*
269 :mod:`natsort` is doing that to your data.
270
271 If you are trying to sort custom classes and running into trouble, please take a look at
272 https://github.com/SethMMorton/natsort/issues/60. In short,
273 custom classes are not likely to be sorted correctly if one relies
274 on the behavior of ``__lt__`` and the other rich comparison operators in their
275 custom class - it is better to use a ``key`` function with :mod:`natsort`, or
276 use the :mod:`natsort` key as part of your rich comparison operator definition.
277
278 How *does* :mod:`natsort` work?
279 If you don't want to read :ref:`howitworks`, here is a quick primer.
280
281 :mod:`natsort` provides a :term:`key function` that can be passed to
282 :meth:`list.sort` or :func:`sorted` in order to modify the default sorting
283 behavior. This key is generated on-demand with the key generator
284 :func:`natsort.natsort_keygen`. :func:`natsort.natsorted` is essentially a
285 wrapper for the following code:
286
287 .. code-block:: pycon
288
289 >>> from natsort import natsort_keygen
290 >>> natsort_key = natsort_keygen()
291 >>> sorted(['1', '10', '2'], key=natsort_key)
292 ['1', '2', '10']
293
294 Users can further customize :mod:`natsort` sorting behavior with the ``key``
295 and/or ``alg`` options (see details in the `Further Customizing Natsort`_
296 section).
297
298 The key generated by :func:`natsort.natsort_keygen` *always* returns a :class:`tuple`. It
299 does so in the following way (*some details omitted for clarity*):
300
301 1. Assume the input is a string, and attempt to split it into numbers and
302 non-numbers using regular expressions. Numbers are then converted into
303 either :class:`int` or :class:`float`.
304 2. If the above fails because the input is not a string, assume the input
305 is some other sequence (e.g. :class:`list` or :class:`tuple`), and recursively
306 apply the key to each element of the sequence.
307 3. If the above fails because the input is not iterable, assume the input
308 is an :class:`int` or :class:`float`, and just return the input in a :class:`tuple`.
309
310 Because a :class:`tuple` is always returned, a :exc:`TypeError` should not be common
311 unless one tries to do something odd like sort an :class:`int` against a :class:`list`.
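
The three steps above can be sketched in a few lines of standard-library Python. This is a deliberately simplified illustration: the name ``sketch_natsort_key`` is hypothetical, only unsigned integers are recognized, and the leading empty string for plain numbers is an assumption made here so that numbers and strings produce comparable tuples.

```python
import re

INT_RE = re.compile(r"([0-9]+)")


def sketch_natsort_key(value):
    try:
        # Step 1: assume a string and split it into text and digit runs.
        parts = INT_RE.split(value)
    except TypeError:
        try:
            # Step 2: assume some other sequence and recurse into its elements.
            return tuple(sketch_natsort_key(item) for item in value)
        except TypeError:
            # Step 3: assume a number. The leading '' keeps the layout
            # (text, number, ...) so it can be compared with string keys.
            return ("", value)
    # Odd indices hold the captured digit runs, even indices hold text.
    key = [int(part) if index % 2 else part for index, part in enumerate(parts)]
    if len(key) > 1 and key[-1] == "":
        key.pop()  # drop the empty string left after a trailing number
    return tuple(key)


print(sketch_natsort_key("2 ft 7 in"))  # ('', 2, ' ft ', 7, ' in')
print(sorted(["1", "10", "2"], key=sketch_natsort_key))  # ['1', '2', '10']
print(sorted(["4.5", 6, 2.0, "5", "a"], key=sketch_natsort_key))
# [2.0, '4.5', '5', 6, 'a']
```

Every key starts with a string and then alternates between numbers and strings, which is why comparing the tuples element by element rarely mixes types.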
312
313 :mod:`natsort` gave me results I didn't expect, and it's a terrible library!
314 Did you try to debug using the above advice? If so, and you still cannot figure out
315 the error, then please `file an issue <https://github.com/SethMMorton/natsort/issues/new>`_.
316
317 Shell script
318 ------------
319
320 :mod:`natsort` comes with a shell script called :mod:`natsort`, or can also be called
321 from the command line with ``python -m natsort``.
322
323 Requirements
324 ------------
325
326 :mod:`natsort` requires Python version 2.7 or Python 3.4 or greater.
327
328 Optional Dependencies
329 ---------------------
330
331 fastnumbers
332 +++++++++++
333
334 The most efficient sorting can occur if you install the
335 `fastnumbers <https://pypi.org/project/fastnumbers>`_ package
336 (version >=2.0.0); it helps with the string to number conversions.
337 :mod:`natsort` will still run (efficiently) without the package, but if you need
338 to squeeze out that extra juice it is recommended you include this as a dependency.
339 :mod:`natsort` will not require (or check) that
340 `fastnumbers <https://pypi.org/project/fastnumbers>`_ is installed
341 at installation.
342
343 PyICU
344 +++++
345
346 It is recommended that you install `PyICU <https://pypi.org/project/PyICU>`_
347 if you wish to sort in a locale-dependent manner, see :ref:`locale_issues` for
348 an explanation why.
349
350 Installation
351 ------------
352
353 Use ``pip``!
354
355 .. code-block:: sh
356
357 $ pip install natsort
358
359 If you want to install the `Optional Dependencies`_, you can use the
360 `"extras" notation <https://packaging.python.org/tutorials/installing-packages/#installing-setuptools-extras>`_
361 at installation time to install those dependencies as well - use ``fast`` for
362 `fastnumbers <https://pypi.org/project/fastnumbers>`_ and ``icu`` for
363 `PyICU <https://pypi.org/project/PyICU>`_.
364
365 .. code-block:: sh
366
367 # Install both optional dependencies.
368 $ pip install natsort[fast,icu]
369 # Install just fastnumbers
370 $ pip install natsort[fast]
371
372 How to Run Tests
373 ----------------
374
375 Please note that :mod:`natsort` is NOT set-up to support ``python setup.py test``.
376
377 The recommended way to run tests is with `tox <https://tox.readthedocs.io/en/latest/>`_.
378 After installing ``tox``, running tests is as simple as executing the following in the
379 ``natsort`` directory:
380
381 .. code-block:: sh
382
383 $ tox
384
385 ``tox`` will create a virtual environment for your tests and install all the
386 needed testing requirements for you. You can specify a particular python version
387 with the ``-e`` flag, e.g. ``tox -e py36``. Static analysis is done with ``tox -e flake8``.
388 You can see all available testing environments with ``tox --listenvs``.
389
390 If you do not wish to use ``tox``, you can install the testing dependencies with the
391 ``dev-requirements.txt`` file and then run the tests manually using
392 `pytest <https://docs.pytest.org/en/latest/>`_.
393
394 .. code-block:: console
395
396 $ pip install -r dev-requirements.txt
397 $ python -m pytest
398
399 Note that above I invoked ``python -m pytest`` instead of just ``pytest`` - this is because
400 `the former puts the CWD on sys.path <https://docs.pytest.org/en/latest/usage.html#calling-pytest-through-python-m-pytest>`_.
401
402 How to Build Documentation
403 --------------------------
404
405 If you want to build the documentation for :mod:`natsort`, it is recommended to use ``tox``:
406
407 .. code-block:: console
408
409 $ tox -e docs
410
411 This will place the documentation in ``build/sphinx/html``. If you do not
412 wish to use ``tox``, you can do the following:
413
414 .. code-block:: console
415
416 $ pip install sphinx sphinx_rtd_theme
417 $ python setup.py build_sphinx
418
419 Deprecation Schedule
420 --------------------
421
422 Dropping Python 2.7 Support
423 +++++++++++++++++++++++++++
424
425 :mod:`natsort` version 7.0.0 will drop support for Python 2.7.
426
427 The version 6.X branch will remain as a "long term support" branch where bug fixes
428 are applied so that users who cannot update from Python 2.7 will not be forced to
429 use a buggy :mod:`natsort` version. Once version 7.0.0 is released, new features
430 will not be added to version 6.X, only bug fixes.
431
432 Deprecated APIs
433 +++++++++++++++
434
435 In :mod:`natsort` version 6.0.0, the following APIs and functions were removed:
436
437 - ``number_type`` keyword argument (deprecated since 3.4.0)
438 - ``signed`` keyword argument (deprecated since 3.4.0)
439 - ``exp`` keyword argument (deprecated since 3.4.0)
440 - ``as_path`` keyword argument (deprecated since 3.4.0)
441 - ``py3_safe`` keyword argument (deprecated since 3.4.0)
442 - ``ns.TYPESAFE`` (deprecated since version 5.0.0)
443 - ``ns.DIGIT`` (deprecated since version 5.0.0)
444 - ``ns.VERSION`` (deprecated since version 5.0.0)
445 - :func:`~natsort.versorted` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
446 - :func:`~natsort.index_versorted` (discouraged since version 4.0.0, officially deprecated since version 5.5.0)
447
448 In general, if you want to determine if you are using deprecated APIs you can run your
449 code with the following flag
450
451 .. code-block:: console
452
453 $ python -Wdefault::DeprecationWarning my-code.py
454
455 By default :exc:`DeprecationWarnings` are not shown, but this will cause them to be shown.
456 Alternatively, you can just set the environment variable ``PYTHONWARNINGS`` to
457 "default::DeprecationWarning" and then run your code.
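
If changing the command line or the environment is inconvenient, the same filter can be enabled from inside Python with the standard :mod:`warnings` module. The sketch below is only an illustration: ``old_api`` and ``call_and_collect`` are hypothetical names and are not part of :mod:`natsort`.

```python
import warnings


def old_api():
    """Stand-in for any function that emits a DeprecationWarning."""
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)


def call_and_collect(func):
    """Call func with DeprecationWarning enabled and return the messages."""
    with warnings.catch_warnings(record=True) as caught:
        # In-process equivalent of -Wdefault::DeprecationWarning.
        warnings.simplefilter("default", DeprecationWarning)
        func()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


print(call_and_collect(old_api))
```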
458
459 Dropped Pipenv for Development
460 ++++++++++++++++++++++++++++++
461
462 :mod:`natsort` version 6.0.0 no longer uses `Pipenv <https://pipenv.readthedocs.io/en/latest/>`_
463 to install development dependencies.
464
465 Dropped Python 2.6 and 3.3 Support
466 ++++++++++++++++++++++++++++++++++
467
468 :mod:`natsort` version 6.0.0 dropped support for Python 2.6 and Python 3.3.
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _locale_issues:
4
5 Possible Issues with :func:`~natsort.humansorted` or ``ns.LOCALE``
6 ==================================================================
7
8 Being Locale-Aware Means Both Numbers and Non-Numbers
9 -----------------------------------------------------
10
11 In addition to modifying how characters are sorted, ``ns.LOCALE`` will take into
12 account locale-dependent thousands separators (and locale-dependent decimal
13 separators if ``ns.FLOAT`` is enabled). This means that if you are in a
14 locale that uses commas as the thousands separator, a number like
15 ``123,456`` will be interpreted as ``123456``. If this is not what you want,
16 you may consider using ``ns.LOCALEALPHA`` which will only enable locale-aware
17 sorting for non-numbers (similarly, ``ns.LOCALENUM`` enables locale-aware
18 sorting only for numbers).
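
To make the effect of a thousands separator concrete, here is a minimal sketch that uses only the :mod:`re` module. It is not how :mod:`natsort` is implemented; ``find_ints`` and its ``thousands_sep`` parameter are hypothetical names chosen for this illustration.

```python
import re


def find_ints(text, thousands_sep=""):
    """Return the integers found in text, optionally honoring a separator."""
    if thousands_sep:
        sep = re.escape(thousands_sep)
        # One to three digits followed by groups of exactly three digits,
        # falling back to a plain run of digits.
        pattern = r"\d{1,3}(?:" + sep + r"\d{3})+|\d+"
    else:
        pattern = r"\d+"
    matches = re.findall(pattern, text)
    if thousands_sep:
        matches = [m.replace(thousands_sep, "") for m in matches]
    return [int(m) for m in matches]


# Without a separator the comma splits the text into two numbers.
print(find_ints("file123,456.txt"))
# With a comma separator the same text holds a single number.
print(find_ints("file123,456.txt", thousands_sep=","))
```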
19
20 Regenerate Key With :func:`~natsort.natsort_keygen` After Changing Locale
21 -------------------------------------------------------------------------
22
23 When :func:`~natsort.natsort_keygen` is called it returns a key function that
24 hard-codes the provided settings. This means that the key returned when
25 ``ns.LOCALE`` is used contains the settings specified by the locale
26 *loaded at the time the key is generated*. If you change the locale,
27 you should regenerate the key to account for the new locale.
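
The reason a stale key misbehaves can be shown with a small closure that snapshots a locale setting when it is created. This is a sketch of the general idea only, under the assumption that the key stores locale data at generation time as described above; ``make_int_parser`` is a hypothetical helper and not a :mod:`natsort` function.

```python
import locale


def make_int_parser():
    """Build a parser that remembers the thousands separator seen right now."""
    # Snapshot taken once, when the parser is generated.
    sep = locale.localeconv()["thousands_sep"]

    def parse(text):
        # Later calls keep using the snapshot, even if the locale changed.
        return int(text.replace(sep, "")) if sep else int(text)

    return parse


parse = make_int_parser()
print(parse("42"))
```

A parser built before a call to :func:`locale.setlocale` keeps the old separator, which is why a new one has to be generated after the locale changes.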
28
29 Corollary: Do Not Reuse :func:`~natsort.natsort_keygen` After Changing Locale
30 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
31
32 If you change locale, the old function will not work as expected.
33 The :mod:`locale` library works with a global state. When
34 :func:`~natsort.natsort_keygen` is called it does the best job that it can to
35 make the returned function as static as possible and independent of the global
36 state, but the :func:`locale.strxfrm` function must access this global state to
37 work; therefore, if you change locale and use ``ns.LOCALE`` then you should
38 discard the old key.
39
40 .. note:: If you use `PyICU`_ then you may be able to reuse keys after changing
41 locale.
42
43 The :mod:`locale` Module From the StdLib Has Issues
44 ---------------------------------------------------
45
46 :mod:`natsort` will use `PyICU`_ for :func:`~natsort.humansorted` or
47 ``ns.LOCALE`` if it is installed. If not, it will fall back on the
48 :mod:`locale` library from the Python stdlib. If you do not have `PyICU`_
49 installed, please keep the following known problems and issues in mind.
50
51 .. note:: Remember, if you have `PyICU`_ installed you shouldn't need to worry
52 about any of these.
53
54 Explicitly Set the Locale Before Using :func:`~natsort.humansorted` or ``ns.LOCALE``
55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
56
57 I have found that unless you explicitly set a locale, the sorted order may not
58 be what you expect. Setting this is straightforward
59 (in the below example I use 'en_US.UTF-8', but you should use your
60 locale):
61
62 .. code-block:: pycon
63
64 >>> import locale
65 >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
66 'en_US.UTF-8'
67
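
Not every system has every locale installed, and :func:`locale.setlocale` raises :exc:`locale.Error` for a name it does not support. A defensive variant of the snippet above might try several candidates in turn. The helper name ``set_first_available`` is hypothetical, and the candidate names are examples only.

```python
import locale


def set_first_available(candidates):
    """Set LC_ALL to the first usable locale name; return it, or None."""
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            # This name is not available here, so try the next one.
            continue
    return None


chosen = set_first_available(["en_US.UTF-8", "C"])
print(chosen)
```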
68 .. _bug_note:
69
70 The :mod:`locale` Module Is Broken on Mac OS X
71 ++++++++++++++++++++++++++++++++++++++++++++++
72
73 It's not Python's fault, but the OS... the locale library for BSD-based systems
74 (of which Mac OS X is one) is broken. See the following links:
75
76 - https://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
77 - https://bugs.python.org/issue23195
78 - https://github.com/SethMMorton/natsort/issues/21 (contains instructions on installing)
79 - https://stackoverflow.com/questions/33459384/unicode-character-not-in-range-when-calling-locale-strxfrm
80 - https://github.com/SethMMorton/natsort/issues/34
81
82 Of course, installing `PyICU`_ fixes this, but if you don't want to or cannot
83 install this there is some hope.
84
85 1. As of ``natsort`` version 4.0.0, ``natsort`` is configured
86 to compensate for a broken ``locale`` library. When sorting non-numbers
87 it will handle case as you expect, but it will still not be able to
88 comprehend non-ASCII characters properly. Additionally, it has
89 a built-in lookup table of thousands separators that are incorrect
90 on OS X/BSD (but it is possible that it is not complete; please file an
91 issue if you find a missing entry).
92 2. Use "\*.ISO8859-1" locale (i.e. 'en_US.ISO8859-1') rather than "\*.UTF-8"
93 locale. I have found that these have fewer issues than "UTF-8", but
94 your mileage may vary.
95
96 .. _PyICU: https://pypi.org/project/PyICU
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _shell:
4
5 Shell Script
6 ============
7
8 The ``natsort`` shell script is automatically installed when you install
9 :mod:`natsort` with pip.
10
11 Below is the usage and some usage examples for the ``natsort`` shell script.
12
13 Usage
14 -----
15
16 .. code-block:: none
17
18 usage: natsort [-h] [--version] [-p] [-f LOW HIGH] [-F LOW HIGH] [-e EXCLUDE]
19 [-r] [-t {digit,int,float,version,ver}] [--nosign] [--noexp]
20 [--locale]
21 [entries [entries ...]]
22
23 Performs a natural sort on entries given on the command-line.
24 A natural sort sorts numerically then alphabetically, and will sort
25 by numbers in the middle of an entry.
26
27 positional arguments:
28 entries The entries to sort. Taken from stdin if nothing is
29 given on the command line.
30
31 optional arguments:
32 -h, --help show this help message and exit
33 --version show program's version number and exit
34 -p, --paths Interpret the input as file paths. This is not
35 strictly necessary to sort all file paths, but in
36 cases where there are OS-generated file paths like
37 "Folder/" and "Folder (1)/", this option is needed to
38 make the paths sorted in the order you expect
39 ("Folder/" before "Folder (1)/").
40 -f LOW HIGH, --filter LOW HIGH
41 Used for keeping only the entries that have a number
42 falling in the given range.
43 -F LOW HIGH, --reverse-filter LOW HIGH
44 Used for excluding the entries that have a number
45 falling in the given range.
46 -e EXCLUDE, --exclude EXCLUDE
47 Used to exclude an entry that contains a specific
48 number.
49 -r, --reverse Returns in reversed order.
50 -t {digit,int,float,version,ver,real,f,i,r,d},
51 --number-type {digit,int,float,version,ver,real,f,i,r,d},
52 --number_type {digit,int,float,version,ver,real,f,i,r,d}
53 Choose the type of number to search for. "float" will
54 search for floating-point numbers. "int" will only
55 search for integers. "digit", "version", and "ver" are
56 synonyms for "int"."real" is a shortcut for "float"
57 with --sign. "i" and "d" are synonyms for "int", "f"
58 is a synonym for "float", and "r" is a synonym for
59 "real".The default is int.
60 --nosign Do not consider "+" or "-" as part of a number, i.e.
61 do not take sign into consideration. This is the
62 default.
63 -s, --sign Consider "+" or "-" as part of a number, i.e. take
64 sign into consideration. The default is unsigned.
65 --noexp Do not consider an exponential as part of a number,
66 i.e. 1e4, would be considered as 1, "e", and 4, not as
67 10000. This only effects the --number-type=float.
68 -l, --locale Causes natsort to use locale-aware sorting. You will
69 get the best results if you install PyICU.
70
71 Description
72 -----------
73
74 ``natsort`` was originally written to aid in computational chemistry
75 research so that it would be easy to analyze large sets of output files
76 named after the parameter used:
77
78 .. code-block:: console
79
80 $ ls *.out
81 mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out
82
83 (Obviously, in reality there would be more files, but you get the idea.) Notice
84 that the shell sorts in lexicographical order. This is the behavior of programs like
85 ``find`` as well as ``ls``. The problem is passing these files to an
86 analysis program causes them not to appear in numerical order, which can lead
87 to bad analysis. To remedy this, use ``natsort``:
88
89 .. code-block:: console
90
91 $ natsort *.out
92 mode744.43.out
93 mode943.54.out
94 mode1000.35.out
95 mode1243.34.out
96 $ natsort -t r *.out | xargs your_program
97
98 ``-t r`` is short for ``--number-type real``. You can also place natsort in
99 the middle of a pipe:
100
101 .. code-block:: console
102
103 $ find . -name "*.out" | natsort -t r | xargs your_program
104
105 To sort version numbers, use the default ``--number-type``:
106
107 .. code-block:: console
108
109 $ ls *
110 prog-1.10.zip prog-1.9.zip prog-2.0.zip
111 $ natsort *
112 prog-1.9.zip
113 prog-1.10.zip
114 prog-2.0.zip
115
116 In general, all ``natsort`` shell script options mirror the :func:`~natsorted` API,
117 with the notable exception of the ``--filter``, ``--reverse-filter``, and ``--exclude``
118 options. These three options are used as follows:
119
120 .. code-block:: console
121
122 $ ls *.out
123 mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out
124 $ natsort -t r *.out -f 900 1100 # Select only numbers between 900-1100
125 mode943.54.out
126 mode1000.35.out
127 $ natsort -t r *.out -F 900 1100 # Select only numbers NOT between 900-1100
128 mode744.43.out
129 mode1243.34.out
130 $ natsort -t r *.out -e 1000.35 # Exclude 1000.35 from search
131 mode744.43.out
132 mode943.54.out
133 mode1243.34.out
134
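
The selection logic behind these three options can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the shell script's code: the function names are hypothetical, the range bounds are assumed to be inclusive, and only unsigned decimal numbers are recognized.

```python
import re

# Unsigned integers or decimals such as "943.54".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_in(entry):
    """Return every number found in entry as a float."""
    return [float(m) for m in _NUMBER.findall(entry)]


def keep_range(entries, low, high):
    """Like --filter: keep entries with a number inside [low, high]."""
    return [e for e in entries if any(low <= n <= high for n in numbers_in(e))]


def drop_range(entries, low, high):
    """Like --reverse-filter: drop entries with a number inside [low, high]."""
    return [e for e in entries if not any(low <= n <= high for n in numbers_in(e))]


def drop_number(entries, number):
    """Like --exclude: drop entries that contain exactly this number."""
    return [e for e in entries if number not in numbers_in(e)]


files = ["mode744.43.out", "mode943.54.out", "mode1000.35.out", "mode1243.34.out"]
print(keep_range(files, 900, 1100))
print(drop_range(files, 900, 1100))
print(drop_number(files, 1000.35))
```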
135 If you are sorting paths with OS-generated filenames, you may require the
136 ``--paths``/``-p`` option:
137
138 .. code-block:: console
139
140 $ find . ! -path . -type f
141 ./folder/file (1).txt
142 ./folder/file.txt
143 ./folder (1)/file.txt
144 ./folder (10)/file.txt
145 ./folder (2)/file.txt
146 $ find . ! -path . -type f | natsort
147 ./folder (1)/file.txt
148 ./folder (2)/file.txt
149 ./folder (10)/file.txt
150 ./folder/file (1).txt
151 ./folder/file.txt
152 $ find . ! -path . -type f | natsort -p
153 ./folder/file.txt
154 ./folder/file (1).txt
155 ./folder (1)/file.txt
156 ./folder (2)/file.txt
157 ./folder (10)/file.txt
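
Why the two orderings differ can be seen from a small sketch that compares a plain natural key with a path-aware one. The names ``natural_key`` and ``path_key`` are hypothetical, the sketch assumes ``/`` as the path separator, and it is only an approximation of what ``--paths`` does.

```python
import os
import re

_DIGITS = re.compile(r"(\d+)")


def natural_key(text):
    """Split text into strings and ints; odd split indices are digit runs."""
    pieces = _DIGITS.split(text)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(pieces) if p)


def path_key(path):
    """Key each directory, the file stem, and the extension separately."""
    parts = path.split("/")
    stem, ext = os.path.splitext(parts[-1])
    return tuple(natural_key(p) for p in parts[:-1] + [stem, ext])


paths = [
    "./folder/file (1).txt",
    "./folder/file.txt",
    "./folder (1)/file.txt",
    "./folder (10)/file.txt",
    "./folder (2)/file.txt",
]
# Whole-string key: " " sorts before "/" and ".", so "folder (1)" comes first.
print(sorted(paths, key=natural_key))
# Per-component key: "folder" is a prefix of "folder (1)", so it comes first.
print(sorted(paths, key=path_key))
```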
+0
-26
docs/source/api.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _api:
4
5 natsort API
6 ===========
7
8 .. toctree::
9 :maxdepth: 2
10
11 natsort_keygen.rst
12 natsort_key.rst
13 natsorted.rst
14 versorted.rst
15 humansorted.rst
16 realsorted.rst
17 index_natsorted.rst
18 index_versorted.rst
19 index_humansorted.rst
20 index_realsorted.rst
21 order_by_index.rst
22 ns_class.rst
23 bytes.rst
24 chain.rst
25 locale_issues.rst
+0
-20
docs/source/bytes.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _bytes_help:
4
5 Help With Bytes On Python 3
6 ===========================
7
8 The official stance of :mod:`natsort` is to not support `bytes` for
9 sorting; there is just too much that can go wrong when trying to automate
10 conversion between `bytes` and `str`. But rather than completely give up
11 on `bytes`, :mod:`natsort` provides three functions that make it easy to
12 quickly decode `bytes` to `str` so that sorting is possible.
13
14 .. autofunction:: decoder
15
16 .. autofunction:: as_ascii
17
18 .. autofunction:: as_utf8
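
The general shape of such a helper can be sketched without :mod:`natsort` itself. ``make_decoder`` below is a hypothetical stand-in written for this illustration; passing non-``bytes`` input through unchanged is an assumption of the sketch.

```python
def make_decoder(encoding):
    """Return a key function that decodes bytes and leaves other input alone."""

    def decode(value):
        if isinstance(value, bytes):
            return value.decode(encoding)
        return value

    return decode


as_text = make_decoder("utf-8")
mixed = [b"b", "a", b"c"]
# Without the key, comparing bytes with str raises TypeError on Python 3.
print(sorted(mixed, key=as_text))
```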
19
+0
-16
docs/source/chain.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _function_help:
4
5 Help With Creating Function Keys
6 ================================
7
8 If you need to create a complicated *key* argument to (for example)
9 :func:`natsorted` that is actually multiple functions called one after the other,
10 the following function can help you easily perform this action. It is
11 used internally to :mod:`natsort`, and has been exposed publicly for
12 the convenience of the user.
13
14 .. autofunction:: chain_functions
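
Calling several functions one after the other, as described above, amounts to left-to-right composition. The following sketch shows the idea with :func:`functools.reduce`. ``compose_left`` is a hypothetical name used here for illustration and is not the library function itself.

```python
from functools import reduce


def compose_left(functions):
    """Return a function that applies each given function in order."""
    functions = list(functions)

    def composed(value):
        # Feed the output of each function into the next one.
        return reduce(lambda acc, func: func(acc), functions, value)

    return composed


# Strip whitespace, lowercase, then measure the length.
key = compose_left([str.strip, str.lower, len])
print(key("  ABC "))
```

The composed callable can then be passed as the *key* argument wherever a single function is expected.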
15
+0
-369
docs/source/changelog.rst
0 .. _changelog:
1
2 Changelog
3 ---------
4
5 09-09-2018 v. 5.4.1
6 +++++++++++++++++++
7
8 - Fix error in a newly added test.
9 - Changed code format and quality checking infrastructure.
10
11 09-06-2018 v. 5.4.0
12 +++++++++++++++++++
13
14 - Re-expose ``natsort_key`` as "public" and remove the
15 associated ``DeprecationWarning``.
16 - Add better developer documentation.
17 - Refactor tests.
18 - Bump allowed ``fastnumbers`` version.
19
20 07-07-2018 v. 5.3.3
21 +++++++++++++++++++
22
23 - Update docs with a FAQ and quick how-it-works.
24 - Fix a StopIteration error in the testing code.
25 - Enable Python 3.7 support in Travis-CI.
26
27 05-17-2018 v. 5.3.2
28 +++++++++++++++++++
29
30 - Fix bug that prevented install on old versions of setuptools.
31 - Revert layout from src/natsort/ back to natsort/ to make user
32 testing simpler.
33
34 05-14-2018 v. 5.3.1
35 +++++++++++++++++++
36
37 - No bugfixes or features, just infrastructure and installation updates.
38 - Move to defining dependencies with Pipfile.
39 - Development layout is now src/natsort/ instead of natsort/.
40 - Add bumpversion infrastructure.
41 - Extras can be installed by "[]" notation.
42
43 04-20-2018 v. 5.3.0
44 +++++++++++++++++++
45
46 - Fix bug in assessing ``fastnumbers`` version at import-time.
47 - Add ability to consider unicode-decimal numbers as numbers.
48
49 02-14-2018 v. 5.2.0
50 +++++++++++++++++++
51
52 - Add ``ns.NUMAFTER`` to cause numbers to be placed after non-numbers.
53 - Add ``natcmp`` function (Python 2 only).
54
55 11-11-2017 v. 5.1.1
56 +++++++++++++++++++
57
58 - Added additional unicode number support for Python 3.7.
59 - Added information on how to install and test.
60
61 08-19-2017 v. 5.1.0
62 +++++++++++++++++++
63
64 - Fixed ``StopIteration`` warning on Python 3.6+.
65 - All Unicode input is now normalized.
66
67 04-30-2017 v. 5.0.3
68 +++++++++++++++++++
69
70 - Improved development infrastructure.
71 - Migrated documentation to ReadTheDocs.
72
73 01-02-2017 v. 5.0.2
74 +++++++++++++++++++
75
76 - Added additional unicode number support for Python 3.6.
77 - Renamed several internal functions and variables to improve clarity.
78 - Improved documentation examples.
79 - Added a "how does it work?" section to the documentation.
80
81 06-04-2016 v. 5.0.1
82 +++++++++++++++++++
83
84 - The ``ns`` enum attributes can now be imported from the top-level
85 namespace.
86 - Fixed a bug with the ``from natsort import *`` mechanism.
87 - Fixed bug with using ``natsort`` with ``python -OO``.
88
89 05-08-2016 v. 5.0.0
90 +++++++++++++++++++
91
92 - ``ns.LOCALE``/``humansorted`` now accounts for thousands separators.
93 - Refactored entire codebase to be more functional (as in use functions as
94 units). Previously, the code was rather monolithic and difficult to follow. The
95 goal is that with the code existing in smaller units, contributing will
96 be easier.
97 - Deprecated ``ns.TYPESAFE`` option as it is now always on (due to a new
98 iterator-based algorithm, the typesafe function is now cheap).
99 - Increased speed of execution (came for free with the new functional approach
100 because the new factory function paradigm eliminates most ``if`` branches
101 during execution).
102
103 - For most cases, the code is 30-40% faster than version 4.0.4.
104 - If using ``ns.LOCALE`` or ``humansorted``, the code is 1100% faster than
105 version 4.0.4.
106
107 - Improved clarity of documentation with regards to locale-aware sorting.
108 - Added a new ``chain_functions`` function for convenience in creating
109 a complex user-given ``key`` from several existing functions.
110
111 11-01-2015 v. 4.0.4
112 +++++++++++++++++++
113
114 - Improved coverage of unit tests.
115 - Unit tests use new and improved hypothesis library.
116 - Fixed compatibility issues with Python 3.5
117
118 06-25-2015 v. 4.0.3
119 +++++++++++++++++++
120
121 - Fixed bad install on last release (sorry guys!).
122
123 06-24-2015 v. 4.0.2
124 +++++++++++++++++++
125
126 - Added back Python 2.6 and Python 3.2 compatibility. Unit testing is now
127 performed for these versions.
128 - Consolidated under-the-hood compatibility functionality.
129
130 06-04-2015 v. 4.0.1
131 +++++++++++++++++++
132
133 - Added support for sorting NaN by internally converting to -Infinity
134 or +Infinity
135
136 05-17-2015 v. 4.0.0
137 +++++++++++++++++++
138
139 - Made default behavior of 'natsort' search for unsigned ints,
140 rather than signed floats. This is a backwards-incompatible
141 change but in 99% of use cases it should not require any
142 end-user changes.
143 - Improved handling of locale-aware sorting on systems where the
144 underlying locale library is broken.
145 - Greatly improved all unit tests by adding the hypothesis library.
146
147 04-06-2015 v. 3.5.6
148 +++++++++++++++++++
149
150 - Added 'UNGROUPLETTERS' algorithm to get the case-grouping behavior of
151 an ordinal sort when using 'LOCALE'.
152 - Added convenience functions 'decoder', 'as_ascii', and 'as_utf8' for
153 dealing with bytes types.
154
155 04-04-2015 v. 3.5.5
156 +++++++++++++++++++
157
158 - Added 'realsorted' and 'index_realsorted' functions for
159 forward-compatibility with >= 4.0.0.
160 - Made explanation of when to use "TYPESAFE" more clear in the docs.
161
162 04-02-2015 v. 3.5.4
163 +++++++++++++++++++
164
165 - Fixed bug where a 'TypeError' was raised if a string containing a leading
166 number was sorted with alpha-only strings when 'LOCALE' is used.
167
168 03-26-2015 v. 3.5.3
169 +++++++++++++++++++
170
171 - Fixed bug where '--reverse-filter' option in shell script was not
172 getting checked for correctness.
173 - Documentation updates to better describe locale bug, and illustrate
174 upcoming default behavior change.
175 - Internal improvements, including making test suite more granular.
176
177 01-13-2015 v. 3.5.2
178 +++++++++++++++++++
179
180 - Enhancement that will convert a 'pathlib.Path' object to a 'str' if
181 'ns.PATH' is enabled.
182
183 09-25-2014 v. 3.5.1
184 +++++++++++++++++++
185
186 - Fixed bug that caused list/tuples to fail when using 'ns.LOWERCASEFIRST'
187 or 'ns.IGNORECASE'.
188 - Refactored modules so that only the public API was in natsort.py and
189 ns_enum.py.
190 - Refactored all import statements to be absolute, not relative.
191
192
193 09-02-2014 v. 3.5.0
194 +++++++++++++++++++
195
196 - Added the 'alg' argument to the 'natsort' functions. This argument
197 accepts an enum that is used to indicate the options the user wishes
198 to use. The 'number_type', 'signed', 'exp', 'as_path', and 'py3_safe'
199 options are being deprecated and will become (undocumented)
200 keyword-only options in natsort version 4.0.0.
201 - The user can now modify how 'natsort' handles the case of non-numeric
202 characters.
203 - The user can now instruct 'natsort' to use locale-aware sorting, which
204 allows 'natsort' to perform true "human sorting".
205
206 - The `humansorted` convenience function has been included to make this
207 easier.
208
209 - Updated shell script with locale functionality.
210
211 08-12-2014 v. 3.4.1
212 +++++++++++++++++++
213
214 - 'natsort' will now use the 'fastnumbers' module if it is installed. This
215 gives up to an extra 30% boost in speed over the previous performance
216 enhancements.
217 - Made documentation point to more 'natsort' resources, and also added a
218 new example in the examples section.
219
220 07-19-2014 v. 3.4.0
221 +++++++++++++++++++
222
223 - Fixed a bug that caused user's options to the 'natsort_key' to not be
224 passed on to recursive calls of 'natsort_key'.
225 - Added a 'natsort_keygen' function that will generate a wrapped version
226 of 'natsort_key' that is easier to call. 'natsort_key' is now set to
227 deprecate at natsort version 4.0.0.
228 - Added an 'as_path' option to 'natsorted' & co. that will try to treat
229 input strings as filepaths. This will help yield correct results for
230 OS-generated inputs like
231 ``['/p/q/o.x', '/p/q (1)/o.x', '/p/q (10)/o.x', '/p/q/o (1).x']``.
232 - Massive performance enhancements for string input (1.8x-2.0x), at the expense
233 of reduction in speed for numeric input (~2.0x).
234
235 - This is a good compromise because the most common input will be strings,
236 not numbers, and sorting numbers still only takes 0.6x the time of sorting
237 strings. If you are sorting only numbers, you would use 'sorted' anyway.
238
239 - Added the 'order_by_index' function to help in using the output of
240 'index_natsorted' and 'index_versorted'.
241 - Added the 'reverse' option to 'natsorted' & co. to make its API more
242 similar to the builtin 'sorted'.
243 - Added more unit tests.
244 - Added auxiliary test code that helps in profiling and stress-testing.
245 - Reworked the documentation, moving most of it to PyPI's hosting platform.
246 - Added support for coveralls.io.
247 - Entire codebase is now PyFlakes and PEP8 compliant.
248
249 06-28-2014 v. 3.3.0
250 +++++++++++++++++++
251
252 - Added a 'versorted' method for more convenient sorting of versions.
253 - Updated command-line tool --number_type option with 'version' and 'ver'
254 to make it more clear how to sort version numbers.
255 - Moved unit-testing mechanism from being docstring-based to actual unit tests
256 in actual functions.
257
258 - This has provided the ability to determine the coverage of the unit tests (99%).
259 - This also makes the pydoc documentation a bit more clear.
260
261 - Made docstrings for public functions mirror the README API.
262 - Connected natsort development to Travis-CI to help ensure quality releases.
263
264 06-20-2014 v. 3.2.1
265 +++++++++++++++++++
266
267 - Re-"Fixed" unorderable types issue on Python 3.x - this workaround
268 is for when the problem occurs in the middle of the string.
269
270 05-07-2014 v. 3.2.0
271 +++++++++++++++++++
272
273 - "Fixed" unorderable types issue on Python 3.x with a workaround that
274 attempts to replicate the Python 2.x behavior by putting all the numbers
275 (or strings that begin with numbers) first.
276 - Now explicitly excluding __pycache__ from releases by adding a prune statement
277 to MANIFEST.in.
278
279 05-05-2014 v. 3.1.2
280 +++++++++++++++++++
281
282 - Added setup.cfg to support universal wheels.
283 - Added Python 3.0 and Python 3.1 as requiring the argparse module.
284
285 03-01-2014 v. 3.1.1
286 +++++++++++++++++++
287
288 - Added ability to sort lists of lists.
289 - Cleaned up import statements.
290
291 01-20-2014 v. 3.1.0
292 +++++++++++++++++++
293
294 - Added the ``signed`` and ``exp`` options to allow finer tuning of the sorting
295 - Entire codebase now works for both Python 2 and Python 3 without needing to run
296 ``2to3``.
297 - Updated all doctests.
298 - Further simplified the ``natsort`` base code by removing unneeded functions.
299 - Simplified documentation where possible.
300 - Improved the shell script code
301
302 - Made the documentation less "path"-centric to make it clear it is not just
303 for sorting file paths.
304 - Removed the filesystem-based options because these can be achieved better
305 through a pipeline.
306 - Added doctests.
307 - Added new options that correspond to ``signed`` and ``exp``.
308 - The user can now specify multiple numbers to exclude or multiple ranges
309 to filter by.
310
311 10-01-2013 v. 3.0.2
312 +++++++++++++++++++
313
314 - Made float, int, and digit searching algorithms all share the same base function.
315 - Fixed some outdated comments.
316 - Made the ``__version__`` variable available when importing the module.
317
318 8-15-2013 v. 3.0.1
319 ++++++++++++++++++
320
321 - Added support for unicode strings.
322 - Removed extraneous ``string2int`` function.
323 - Fixed empty string removal function.
324
325 7-13-2013 v. 3.0.0
326 ++++++++++++++++++
327
328 - Added a ``number_type`` argument to the sorting functions to specify how
329 liberal to be when deciding what a number is.
330 - Reworked the documentation.
331
332 6-25-2013 v. 2.2.0
333 ++++++++++++++++++
334
335 - Added ``key`` attribute to ``natsorted`` and ``index_natsorted`` so that
336 it mimics the functionality of the built-in ``sorted``
337 - Added tests to reflect the new functionality, as well as tests demonstrating
338 how to get similar functionality using ``natsort_key``.
339
340 12-5-2012 v. 2.1.0
341 ++++++++++++++++++
342
343 - Reorganized package.
344 - Now using a platform independent shell script generator (entry_points
345 from distribute).
346 - Can now execute natsort from command line with ``python -m natsort``
347 as well.
348
349 11-30-2012 v. 2.0.2
350 +++++++++++++++++++
351
352 - Added the use_2to3 option to setup.py.
353 - Added distribute_setup.py to the distribution.
354 - Added dependency to the argparse module (for python2.6).
355
356 11-21-2012 v. 2.0.1
357 +++++++++++++++++++
358
359 - Reorganized directory structure.
360 - Added tests into the natsort.py file itself.
361
362 11-16-2012, v. 2.0.0
363 ++++++++++++++++++++
364
365 - Updated sorting algorithm to support floats (including exponentials) and
366 basic version number support.
367 - Added better README documentation.
368 - Added doctests.
+0
-275
docs/source/conf.py
0 # -*- coding: utf-8 -*-
1 #
2 # natsort documentation build configuration file, created by
3 # sphinx-quickstart on Thu Jul 17 21:01:29 2014.
4 #
5 # This file is execfile()d with the current directory set to its
6 # containing dir.
7 #
8 # Note that not all possible configuration values are present in this
9 # autogenerated file.
10 #
11 # All configuration values have a default; values that are commented out
12 # serve to show the default.
13
14 import os
15
16 # If extensions (or modules to document with autodoc) are in another directory,
17 # add these directories to sys.path here. If the directory is relative to the
18 # documentation root, use os.path.abspath to make it absolute, like shown here.
19 # sys.path.insert(0, os.path.abspath('.'))
20
21 # -- General configuration ------------------------------------------------
22
23 # If your documentation needs a minimal Sphinx version, state it here.
24 # needs_sphinx = '1.0'
25
26 # Add any Sphinx extension module names here, as strings. They can be
27 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
28 # ones.
29 extensions = [
30 'sphinx.ext.autodoc',
31 'sphinx.ext.autosummary',
32 'sphinx.ext.intersphinx',
33 'sphinx.ext.mathjax',
34 'sphinx.ext.napoleon',
35 ]
36
37 # Add any paths that contain templates here, relative to this directory.
38 templates_path = ['_templates']
39
40 # The suffix of source filenames.
41 source_suffix = '.rst'
42
43 # The encoding of source files.
44 # source_encoding = 'utf-8-sig'
45
46 # The master toctree document.
47 master_doc = 'index'
48
49 # General information about the project.
50 project = u'natsort'
51 # noinspection PyShadowingBuiltins
52 copyright = u'2014, Seth M. Morton'
53
54 # The version info for the project you're documenting, acts as replacement for
55 # |version| and |release|, also used in various other places throughout the
56 # built documents.
57 #
58 # The full version, including alpha/beta/rc tags.
59 release = '5.4.1'
60 # The short X.Y version.
61 version = '.'.join(release.split('.')[0:2])
62
63 # The language for content autogenerated by Sphinx. Refer to documentation
64 # for a list of supported languages.
65 # language = None
66
67 # There are two options for replacing |today|: either, you set today to some
68 # non-false value, then it is used:
69 # today = ''
70 # Else, today_fmt is used as the format for a strftime call.
71 # today_fmt = '%B %d, %Y'
72
73 # List of patterns, relative to source directory, that match files and
74 # directories to ignore when looking for source files.
75 # exclude_patterns = ['solar/*']
76
77 # The reST default role (used for this markup: `text`) to use for all
78 # documents.
79 # default_role = None
80
81 # If true, '()' will be appended to :func: etc. cross-reference text.
82 # add_function_parentheses = True
83
84 # If true, the current module name will be prepended to all description
85 # unit titles (such as .. function::).
86 # add_module_names = True
87
88 # If true, sectionauthor and moduleauthor directives will be shown in the
89 # output. They are ignored by default.
90 # show_authors = False
91
92 # The name of the Pygments (syntax highlighting) style to use.
93 pygments_style = 'sphinx'
94 highlight_language = 'python'
95
96 # A list of ignored prefixes for module index sorting.
97 # modindex_common_prefix = []
98
99 # If true, keep warnings as "system message" paragraphs in the built documents.
100 # keep_warnings = False
101
102
103 # -- Options for HTML output ----------------------------------------------
104
105 # The theme to use for HTML and HTML Help pages. See the documentation for
106 # a list of builtin themes.
107 on_rtd = os.environ.get('READTHEDOCS') == 'True'
108 if on_rtd:
109 html_theme = 'default'
110 else:
111 import sphinx_rtd_theme
112
113 html_theme = 'sphinx_rtd_theme'
114 # html_theme = 'solar'
115
116 # Theme options are theme-specific and customize the look and feel of a theme
117 # further. For a list of options available for each theme, see the
118 # documentation.
119 # html_theme_options = {}
120
121 # Add any paths that contain custom themes here, relative to this directory.
122 html_theme_path = ['.']
123
124 # The name for this set of Sphinx documents. If None, it defaults to
125 # "<project> v<release> documentation".
126 # html_title = None
127
128 # A shorter title for the navigation bar. Default is the same as html_title.
129 # html_short_title = None
130
131 # The name of an image file (relative to this directory) to place at the top
132 # of the sidebar.
133 # html_logo = None
134
135 # The name of an image file (within the static path) to use as favicon of the
136 # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
137 # pixels large.
138 # html_favicon = None
139
140 # Add any paths that contain custom static files (such as style sheets) here,
141 # relative to this directory. They are copied after the builtin static files,
142 # so a file named "default.css" will overwrite the builtin "default.css".
143 # html_static_path = ['_static']
144
145 # Add any extra paths that contain custom files (such as robots.txt or
146 # .htaccess) here, relative to this directory. These files are copied
147 # directly to the root of the documentation.
148 # html_extra_path = []
149
150 # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
151 # using the given strftime format.
152 # html_last_updated_fmt = '%b %d, %Y'
153
154 # If true, SmartyPants will be used to convert quotes and dashes to
155 # typographically correct entities.
156 # html_use_smartypants = True
157
158 # Custom sidebar templates, maps document names to template names.
159 # html_sidebars = {}
160
161 # Additional templates that should be rendered to pages, maps page names to
162 # template names.
163 # html_additional_pages = {}
164
165 # If false, no module index is generated.
166 # html_domain_indices = True
167
168 # If false, no index is generated.
169 # html_use_index = True
170
171 # If true, the index is split into individual pages for each letter.
172 # html_split_index = False
173
174 # If true, links to the reST sources are added to the pages.
175 # html_show_sourcelink = True
176
177 # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
178 # html_show_sphinx = True
179
180 # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
181 # html_show_copyright = True
182
183 # If true, an OpenSearch description file will be output, and all pages will
184 # contain a <link> tag referring to it. The value of this option must be the
185 # base URL from which the finished HTML is served.
186 # html_use_opensearch = ''
187
188 # This is the file name suffix for HTML files (e.g. ".xhtml").
189 # html_file_suffix = None
190
191 # Output file base name for HTML help builder.
192 htmlhelp_basename = 'natsortdoc'
193
194 # -- Options for LaTeX output ---------------------------------------------
195
196 latex_elements = {
197 # The paper size ('letterpaper' or 'a4paper').
198 # 'papersize': 'letterpaper',
199
200 # The font size ('10pt', '11pt' or '12pt').
201 # 'pointsize': '10pt',
202
203 # Additional stuff for the LaTeX preamble.
204 # 'preamble': '',
205 }
206
207 # Grouping the document tree into LaTeX files. List of tuples
208 # (source start file, target name, title,
209 # author, documentclass [howto, manual, or own class]).
210 latex_documents = [
211 ('index', 'natsort.tex', u'natsort Documentation',
212 u'Seth M. Morton', 'manual'),
213 ]
214
215 # The name of an image file (relative to this directory) to place at the top of
216 # the title page.
217 # latex_logo = None
218
219 # For "manual" documents, if this is true, then toplevel headings are parts,
220 # not chapters.
221 # latex_use_parts = False
222
223 # If true, show page references after internal links.
224 # latex_show_pagerefs = False
225
226 # If true, show URL addresses after external links.
227 # latex_show_urls = False
228
229 # Documents to append as an appendix to all manuals.
230 # latex_appendices = []
231
232 # If false, no module index is generated.
233 # latex_domain_indices = True
234
235
236 # -- Options for manual page output ---------------------------------------
237
238 # One entry per manual page. List of tuples
239 # (source start file, name, description, authors, manual section).
240 man_pages = [
241 ('index', 'natsort', u'natsort Documentation',
242 [u'Seth M. Morton'], 1)
243 ]
244
245 # If true, show URL addresses after external links.
246 # man_show_urls = False
247
248
249 # -- Options for Texinfo output -------------------------------------------
250
251 # Grouping the document tree into Texinfo files. List of tuples
252 # (source start file, target name, title, author,
253 # dir menu entry, description, category)
254 texinfo_documents = [
255 ('index', 'natsort', u'natsort Documentation',
256 u'Seth M. Morton', 'natsort', 'One line description of project.',
257 'Miscellaneous'),
258 ]
259
260 # Documents to append as an appendix to all manuals.
261 # texinfo_appendices = []
262
263 # If false, no module index is generated.
264 # texinfo_domain_indices = True
265
266 # How to display URL addresses: 'footnote', 'no', or 'inline'.
267 # texinfo_show_urls = 'footnote'
268
269 # If true, do not generate a @detailmenu in the "Top" node's menu.
270 # texinfo_no_detailmenu = False
271
272
273 # Example configuration for intersphinx: refer to the Python standard library.
274 intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+0
-366
docs/source/examples.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _examples:
4
5 Examples and Recipes
6 ====================
7
8 If you want more detailed examples than given on this page, please see
9 https://github.com/SethMMorton/natsort/tree/master/test_natsort.
10
11 .. contents::
12 :local:
13
14 Basic Usage
15 -----------
16
17 In the most basic use case, simply import :func:`~natsorted` and use
18 it as you would :func:`sorted`:
19
20 .. code-block:: python
21
22 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
23 >>> sorted(a)
24 ['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in']
25 >>> from natsort import natsorted, ns
26 >>> natsorted(a)
27 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
28
29 Sort Version Numbers
30 --------------------
31
32 As of :mod:`natsort` version >= 4.0.0, :func:`~natsorted` will now properly
33 sort version numbers. The old function :func:`~versorted` exists for
34 backwards compatibility but new development should use :func:`~natsorted`.
35
36 .. _rc_sorting:
37
38 Sorting with Alpha, Beta, and Release Candidates
39 ++++++++++++++++++++++++++++++++++++++++++++++++
40
41 By default, if you wish to sort versions with a non-strict versioning
42 scheme, you may not get the results you expect:
43
44 .. code-block:: python
45
46 >>> a = ['1.2', '1.2rc1', '1.2beta2', '1.2beta1', '1.2alpha', '1.2.1', '1.1', '1.3']
47 >>> natsorted(a)
48 ['1.1', '1.2', '1.2.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.3']
49
50 To make the '1.2' pre-releases come before '1.2.1', you need to use the following
51 recipe:
52
53 .. code-block:: python
54
55 >>> natsorted(a, key=lambda x: x.replace('.', '~'))
56 ['1.1', '1.2', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2.1', '1.3']
57
58 If you also want '1.2' after all the alpha, beta, and rc candidates, you can
59 modify the above recipe:
60
61 .. code-block:: python
62
63 >>> natsorted(a, key=lambda x: x.replace('.', '~')+'z')
64 ['1.1', '1.2alpha', '1.2beta1', '1.2beta2', '1.2rc1', '1.2', '1.2.1', '1.3']
65
66 Please see `this issue <https://github.com/SethMMorton/natsort/issues/13>`_ to
67 see why this works.
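The effect of the recipe can be reproduced with the standard library alone. The sketch below is not the library's implementation: ``split_key`` is a hypothetical stand-in for the default unsigned-integer key, written only for illustration. It relies on the fact that ``'.'`` sorts before every ASCII letter while ``'~'`` sorts after them.

```python
import re


def split_key(text):
    # Hypothetical stand-in for natsort's default key: split on runs of
    # ASCII digits and convert those runs (always at odd indices) to int.
    pieces = re.split(r'([0-9]+)', text)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(pieces) if p)


a = ['1.2', '1.2rc1', '1.2beta2', '1.2beta1', '1.2alpha', '1.2.1', '1.1', '1.3']

# '.' (0x2E) sorts before letters, so '1.2.1' lands before '1.2alpha'.
plain = sorted(a, key=split_key)

# '~' (0x7E) sorts after letters, so '1.2.1' lands after '1.2rc1'.
tilde = sorted(a, key=lambda x: split_key(x.replace('.', '~')))

# Appending 'z' gives the bare '1.2' a trailing letter that sorts after
# 'alpha', 'beta', and 'rc' but still before '~'.
tilde_z = sorted(a, key=lambda x: split_key(x.replace('.', '~') + 'z'))
```

Because ``sorted`` applies the key only for comparison, the returned lists still contain the original, unmodified version strings.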
68
69 .. _path_sort:
70
71 Sort OS-Generated Paths
72 -----------------------
73
74 In some cases when sorting file paths with OS-Generated names, the default
75 :func:`~natsorted` algorithm may not be sufficient. In cases like these,
76 you may need to use the ``ns.PATH`` option:
77
78 .. code-block:: python
79
80 >>> a = ['./folder/file (1).txt',
81 ... './folder/file.txt',
82 ... './folder (1)/file.txt',
83 ... './folder (10)/file.txt']
84 >>> natsorted(a)
85 ['./folder (1)/file.txt', './folder (10)/file.txt', './folder/file (1).txt', './folder/file.txt']
86 >>> natsorted(a, alg=ns.PATH)
87 ['./folder/file.txt', './folder/file (1).txt', './folder (1)/file.txt', './folder (10)/file.txt']
88
89 Locale-Aware Sorting (Human Sorting)
90 ------------------------------------
91
92 .. note::
93 Please read :ref:`locale_issues` before using ``ns.LOCALE``, :func:`humansorted`,
94 or :func:`index_humansorted`.
95
96 You can instruct :mod:`natsort` to use locale-aware sorting with the
97 ``ns.LOCALE`` option. In addition to making this understand non-ASCII
98 characters, it will also properly interpret non-'.' decimal separators
99 and also properly order case. It may be more convenient to just use
100 the :func:`humansorted` function:
101
102 .. code-block:: python
103
104 >>> from natsort import humansorted
105 >>> import locale
106 >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
107 'en_US.UTF-8'
108 >>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
109 >>> natsorted(a, alg=ns.LOCALE)
110 ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
111 >>> humansorted(a)
112 ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
113
114 You may find that if you do not explicitly set the locale your results may not
115 be as you expect... I have found that it depends on the system you are on.
116 If you use `PyICU <https://pypi.org/project/PyICU>`_ (see below) then
117 you should not need to do this.
118
119 .. _case_sort:
120
121 Controlling Case When Sorting
122 -----------------------------
123
124 For non-numbers, by default :mod:`natsort` uses ordinal sorting (i.e.
125 it sorts by the character's value in the ASCII table). For example:
126
127 .. code-block:: python
128
129 >>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
130 >>> natsorted(a)
131 ['Apple', 'Banana', 'Corn', 'apple', 'banana', 'corn']
132
133 There are times when you wish to ignore the case when sorting;
134 you can easily do this with the ``ns.IGNORECASE`` option:
135
136 .. code-block:: python
137
138 >>> natsorted(a, alg=ns.IGNORECASE)
139 ['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn']
140
141 Note that since Python's sorting is stable, the order of equivalent
142 elements after lowering the case is the same order they appear in the
143 original list.
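The stability claim can be checked with the built-in :func:`sorted` alone. This sketch uses ``str.lower`` as a rough stand-in for ``ns.IGNORECASE`` (an assumption that only holds for strings without numbers; it is not how :mod:`natsort` builds its key).

```python
a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']

# 'Apple' and 'apple' share the key 'apple', so a stable sort keeps them in
# their original relative order: 'Apple' came first, so it stays first.
ignore_case = sorted(a, key=str.lower)

# 'corn' appeared before 'Corn' in the input, so that pair is not swapped.
print(ignore_case)
```

The same reasoning explains why ``'corn'`` precedes ``'Corn'`` in the output above while ``'Apple'`` precedes ``'apple'``.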
144
145 Upper-case letters appear first in the ASCII table, but many natural
146 sorting methods place lower-case first. To do this, use
147 ``ns.LOWERCASEFIRST``:
148
149 .. code-block:: python
150
151 >>> natsorted(a, alg=ns.LOWERCASEFIRST)
152 ['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']
153
154 It may be undesirable to have the upper-case letters grouped together
155 and the lower-case letters grouped together; most would expect all
156 "a"s to be together regardless of case, and all "b"s, and so on. To
157 achieve this, use ``ns.GROUPLETTERS``:
158
159 .. code-block:: python
160
161 >>> natsorted(a, alg=ns.GROUPLETTERS)
162 ['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']
163
164 You might combine this with ``ns.LOWERCASEFIRST`` to get what most
165 would expect to be "natural" sorting:
166
167 .. code-block:: python
168
169 >>> natsorted(a, alg=ns.G | ns.LF)
170 ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
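One way to get a similar grouping effect with the standard library is to pair every character with its lower-case form, and to swap the case first when lower-case should win. This is only a sketch under that assumption; ``group_letters`` is a hypothetical helper and is not presented as how :mod:`natsort` implements these options.

```python
def group_letters(text):
    # Prefix each character with its lower-case form so that 'A' and 'a'
    # compare as neighbours, with the original character as a tie-breaker.
    return ''.join(c.lower() + c for c in text)


a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']

# Letters are grouped; upper-case still wins ties because 'A' < 'a'.
grouped = sorted(a, key=group_letters)

# Swapping the case first reverses the tie-breaker, so lower-case wins.
grouped_lower_first = sorted(a, key=lambda x: group_letters(x.swapcase()))
```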
171
172 Customizing Float Definition
173 ----------------------------
174
175 You can make :func:`~natsorted` search for any float that would be
176 a valid Python float literal, such as 5, 0.4, -4.78, +4.2E-34, etc.
177 using the ``ns.FLOAT`` key. You can disable the exponential component
178 of the number with ``ns.NOEXP``.
179
180 .. code-block:: python
181
182 >>> a = ['a50', 'a51.', 'a+50.4', 'a5.034e1', 'a+50.300']
183 >>> natsorted(a, alg=ns.FLOAT)
184 ['a50', 'a5.034e1', 'a51.', 'a+50.300', 'a+50.4']
185 >>> natsorted(a, alg=ns.FLOAT | ns.SIGNED)
186 ['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
187 >>> natsorted(a, alg=ns.FLOAT | ns.SIGNED | ns.NOEXP)
188 ['a5.034e1', 'a50', 'a+50.300', 'a+50.4', 'a51.']
189
190 For convenience, the ``ns.REAL`` option is provided which is a shortcut
191 for ``ns.FLOAT | ns.SIGNED`` and can be used to sort on real numbers.
192 This can be easily accessed with the :func:`~realsorted` convenience
193 function. Please note that the behavior of the :func:`~realsorted` function
194 was the default behavior of :func:`~natsorted` for :mod:`natsort`
195 version < 4.0.0:
196
197 .. code-block:: python
198
199 >>> natsorted(a, alg=ns.REAL)
200 ['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
201 >>> from natsort import realsorted
202 >>> realsorted(a)
203 ['a50', 'a+50.300', 'a5.034e1', 'a+50.4', 'a51.']
204
205 .. _custom_sort:
206
207 Using a Custom Sorting Key
208 --------------------------
209
210 Like the built-in ``sorted`` function, ``natsorted`` can accept a custom
211 sort key so that:
212
213 .. code-block:: python
214
215 >>> from operator import attrgetter, itemgetter
216 >>> a = [['a', 'num4'], ['b', 'num8'], ['c', 'num2']]
217 >>> natsorted(a, key=itemgetter(1))
218 [['c', 'num2'], ['a', 'num4'], ['b', 'num8']]
219 >>> class Foo:
220 ... def __init__(self, bar):
221 ... self.bar = bar
222 ... def __repr__(self):
223 ... return "Foo('{0}')".format(self.bar)
224 >>> b = [Foo('num3'), Foo('num5'), Foo('num2')]
225 >>> natsorted(b, key=attrgetter('bar'))
226 [Foo('num2'), Foo('num3'), Foo('num5')]
227
228 Generating a Natsort Key
229 ------------------------
230
231 If you need to sort a list in-place, you cannot use :func:`~natsorted`; you
232 need to pass a key to the :meth:`list.sort` method. The function
233 :func:`~natsort_keygen` is a convenient way to generate these keys for you:
234
235 .. code-block:: python
236
237 >>> from natsort import natsort_keygen
238 >>> a = ['a50', 'a51.', 'a50.4', 'a5.034e1', 'a50.300']
239 >>> natsort_key = natsort_keygen(alg=ns.FLOAT)
240 >>> a.sort(key=natsort_key)
241 >>> a
242 ['a50', 'a50.300', 'a5.034e1', 'a50.4', 'a51.']
243
244 :func:`~natsort_keygen` has the same API as :func:`~natsorted` (minus the
245 ``reverse`` option).
246
247 Natural Sorting with ``cmp`` (Python 2 only)
248 --------------------------------------------
249
250 .. note::
251 This is a Python2-only feature! The :func:`natcmp` function is not
252 exposed on Python3. Because this documentation is built with
253 Python3, you will not find :func:`natcmp` in the API.
254
255 If you are using a legacy codebase that requires you to use :func:`cmp` instead
256 of a key-function, you can use :func:`~natcmp`.
257
258 .. code-block:: python
259
260 >>> import sys
261 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
262 >>> if sys.version_info[0] == 2:
263 ... from natsort import natcmp
264 ... sorted(a, cmp=natcmp)
265 ... else:
266 ... natsorted(a) # so docstrings don't fail
267 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
268
269 :func:`natcmp` also accepts an ``alg`` argument so you can customize your
270 sorting experience.
271
272 Sorting Multiple Lists According to a Single List
273 -------------------------------------------------
274
275 Sometimes you have multiple lists, and you want to sort one of those
276 lists and reorder the other lists according to how the first was sorted.
277 To achieve this you could use :func:`~index_natsorted` in combination
278 with the convenience function
279 :func:`~order_by_index`:
280
281 .. code-block:: python
282
283 >>> from natsort import index_natsorted, order_by_index
284 >>> a = ['a2', 'a9', 'a1', 'a4', 'a10']
285 >>> b = [4, 5, 6, 7, 8]
286 >>> c = ['hi', 'lo', 'ah', 'do', 'up']
287 >>> index = index_natsorted(a)
288 >>> order_by_index(a, index)
289 ['a1', 'a2', 'a4', 'a9', 'a10']
290 >>> order_by_index(b, index)
291 [6, 4, 7, 5, 8]
292 >>> order_by_index(c, index)
293 ['ah', 'hi', 'do', 'lo', 'up']
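The idea behind sorting by index can be sketched without :mod:`natsort`: sort the positions of the first list by a natural key, then reuse those positions on every other list. The names ``split_key``, ``index_sorted``, and ``reorder`` below are hypothetical helpers for illustration, not the library's functions.

```python
import re


def split_key(text):
    # Hypothetical natural key: digit runs (odd indices) become ints.
    pieces = re.split(r'([0-9]+)', text)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(pieces) if p)


def index_sorted(seq, key=split_key):
    # Return the positions of seq in the order its items would be sorted.
    return sorted(range(len(seq)), key=lambda i: key(seq[i]))


def reorder(seq, index):
    # Pick items from seq following the given positions.
    return [seq[i] for i in index]


a = ['a2', 'a9', 'a1', 'a4', 'a10']
b = [4, 5, 6, 7, 8]
c = ['hi', 'lo', 'ah', 'do', 'up']

index = index_sorted(a)
sorted_a = reorder(a, index)
sorted_b = reorder(b, index)
sorted_c = reorder(c, index)
```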
294
295 Returning Results in Reverse Order
296 ----------------------------------
297
298 Just like the :func:`sorted` built-in function, you can supply the
299 ``reverse`` option to return the results in reverse order:
300
301 .. code-block:: python
302
303 >>> a = ['a2', 'a9', 'a1', 'a4', 'a10']
304 >>> natsorted(a, reverse=True)
305 ['a10', 'a9', 'a4', 'a2', 'a1']
306
307 Sorting Bytes on Python 3
308 -------------------------
309
310 Python 3 is rather strict about comparing strings and bytes, and this
311 can make it difficult to deal with collections of both. Because of the
312 challenge of guessing which encoding should be used to decode a bytes
313 array to a string, :mod:`natsort` does *not* try to guess and automatically
314 convert for you; in fact, the official stance of :mod:`natsort` is to
315 not support sorting bytes. Instead, some decoding convenience functions
316 have been provided to you (see :ref:`bytes_help`) that allow you to
317 provide a codec for decoding bytes through the ``key`` argument that
318 will allow :mod:`natsort` to convert byte arrays to strings for sorting;
319 these functions know not to raise an error if the input is not a byte
320 array, so you can use the key on any arbitrary collection of data.
321
322 .. code-block:: python
323
324 >>> from natsort import as_ascii
325 >>> a = [b'a', 14.0, 'b']
326 >>> # On Python 2, natsorted(a) would work as expected.
327 >>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
328 >>> natsorted(a, key=as_ascii) == [14.0, b'a', 'b']
329 True
330
331 Additionally, regular expressions cannot be run on byte arrays, making it
332 so that :mod:`natsort` cannot parse them for numbers. As a result, if you
333 run :mod:`natsort` on a list of bytes, you will get results that are like
334 Python's default sorting behavior. Of course, you can use the decoding
335 functions to solve this:
336
337 .. code-block:: python
338
339 >>> from natsort import as_utf8
340 >>> a = [b'a56', b'a5', b'a6', b'a40']
341 >>> natsorted(a) # doctest: +SKIP
342 [b'a40', b'a5', b'a56', b'a6']
343 >>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
344 True
345
346 If you need a codec different from ASCII or UTF-8, you can use
347 :func:`decoder` to generate a custom key:
348
349 .. code-block:: python
350
351 >>> from natsort import decoder
352 >>> a = [b'a56', b'a5', b'a6', b'a40']
353 >>> natsorted(a, key=decoder('latin1')) == [b'a5', b'a6', b'a40', b'a56']
354 True
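The decode-through-the-key approach can be sketched with the standard library. ``decode_with`` and ``split_key`` are hypothetical helpers written for this illustration; they mimic the idea described above (decode bytes, pass everything else through) and are not the library's code.

```python
import re


def split_key(text):
    # Hypothetical natural key: digit runs (odd indices) become ints.
    pieces = re.split(r'([0-9]+)', text)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(pieces) if p)


def decode_with(codec):
    # Build a key function that decodes bytes with the given codec and
    # returns any other input unchanged instead of raising an error.
    def key(x):
        return x.decode(codec) if isinstance(x, bytes) else x
    return key


a = [b'a56', b'a5', b'a6', b'a40']
to_text = decode_with('latin1')

# Plain byte comparison is ordinal, so b'a40' sorts before b'a5'.
ordinal = sorted(a)

# Decoding first lets the digits be parsed; the original bytes are returned.
natural = sorted(a, key=lambda x: split_key(to_text(x)))
```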
355
356 Sorting a Pandas DataFrame
357 --------------------------
358
359 As of Pandas version 0.16.0, the sorting methods do not accept a ``key`` argument,
360 so you cannot simply pass :func:`natsort_keygen` to a Pandas DataFrame and sort.
361 This request has been made to the Pandas devs; see
362 `issue 3942 <https://github.com/pydata/pandas/issues/3942>`_ if you are interested.
363 If you need to sort a Pandas DataFrame, please check out
364 `this answer on StackOverflow <http://stackoverflow.com/a/29582718/1399279>`_
365 for ways to do this without the ``key`` argument to ``sort``.
+0
-1113
docs/source/howitworks.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _howitworks:
4
5 How Does Natsort Work?
6 ======================
7
8 .. contents::
9 :local:
10
11 :mod:`natsort` works by breaking strings into smaller sub-components (numbers
12 or everything else), and returning these components in a tuple. Sorting
13 tuples in Python is well-defined, and this fact is used to sort the input
14 strings properly. But how does one break a string into sub-components?
15 And what does one do to those components once they are split? Below I
16 will explain the algorithm that was chosen for the :mod:`natsort` module,
17 and some of the thinking that went into those design decisions. I will
18 also mention some of the stumbling blocks I ran into because
19 `getting sorting right is surprisingly hard`_.
20
21 If you are impatient, you can skip to :ref:`tldr1` for the algorithm
22 in the simplest case, and :ref:`tldr2`
23 to see what extra code is needed to handle special cases.
24
25 First, How Does Natural Sorting Work At a High Level?
26 -----------------------------------------------------
27
28 If I want to compare '2 ft 7 in' to '2 ft 11 in', I might do the following
29
30 .. code-block:: python
31
32 >>> '2 ft 7 in' < '2 ft 11 in'
33 False
34
35 We as humans know that the above should be true, but why does Python think it
36 is false? Here is how it is performing the comparison::
37
38 '2' <=> '2' ==> equal, so keep going
39 ' ' <=> ' ' ==> equal, so keep going
40 'f' <=> 'f' ==> equal, so keep going
41 't' <=> 't' ==> equal, so keep going
42 ' ' <=> ' ' ==> equal, so keep going
43 '7' <=> '1' ==> different, use result of '7' < '1'
44
45 '7' evaluates as greater than '1' so the statement is false. When sorting, if
46 a value is less than another it is placed first, so in our above example
47 '2 ft 11 in' would end up before '2 ft 7 in', which is not correct. What to do?
48
49 The best way to handle this is to break the string into sub-components
50 of numbers and non-numbers, and then convert the numeric parts into
51 :func:`float` or :func:`int` types. This will force Python to
52 actually understand the context of what it is sorting and then "do the
53 right thing." Luckily, it handles sorting lists of strings right out-of-the-box,
54 so the only hard part is actually making this string-to-list transformation
55 and then Python will handle the rest.
56
57 ::
58
59 '2 ft 7 in' ==> (2, ' ft ', 7, ' in')
60 '2 ft 11 in' ==> (2, ' ft ', 11, ' in')
61
62 When Python compares the two, it roughly follows the below logic::
63
64 2 <=> 2 ==> equal, so keep going
65 ' ft ' <=> ' ft ' ==> a string is a special type of sequence - evaluate each character individually
66 ||
67 -->
68 ' ' <=> ' ' ==> equal, so keep going
69 'f' <=> 'f' ==> equal, so keep going
70 't' <=> 't' ==> equal, so keep going
71 ' ' <=> ' ' ==> equal, so keep going
72 <== Back to parent sequence
73 7 <=> 11 ==> different, use the result of 7 < 11
74
75 Clearly, seven is less than eleven, so our comparison is as we expect, and we
76 would get the sorting order we wanted.
77
78 At its heart, :mod:`natsort` is simply a tool to break strings into tuples,
79 turning numbers in strings (i.e. ``'79'``) into *ints* and *floats* as it does this.
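The comparison described above can be checked directly. This is a minimal sketch, assuming only unsigned integers count as numbers; ``split_key`` is a hypothetical name used for illustration. Note that on Python 3 a key like this raises ``TypeError`` when one string starts with text and another with a number, since ``str`` and ``int`` cannot be compared.

```python
import re


def split_key(text):
    # re.split with one capture group alternates non-matches and matches,
    # so digit runs always sit at odd indices and can be converted to int.
    pieces = re.split(r'([0-9]+)', text)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(pieces) if p)


# Character-by-character comparison stops at '7' versus '1'.
as_strings = '2 ft 7 in' < '2 ft 11 in'

# Tuple comparison reaches the integers 7 and 11 instead.
as_tuples = split_key('2 ft 7 in') < split_key('2 ft 11 in')

print(split_key('2 ft 7 in'))
```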
80
81 Natsort's Approach
82 ------------------
83
84 .. contents::
85 :local:
86
87 Decomposing Strings Into Sub-Components
88 +++++++++++++++++++++++++++++++++++++++
89
90 The first major hurdle to overcome is to decompose the string into sub-components.
91 Remarkably, this turns out to be the easy part, owing mostly to Python's easy access
92 to regular expressions. Breaking an arbitrary string based on a pattern is pretty
93 straightforward.
94
95 .. code-block:: python
96
97 >>> import re
98 >>> re.split(r'(\d+)', '2 ft 11 in')
99 ['', '2', ' ft ', '11', ' in']
100
101 Clear (assuming you can read regular expressions) and concise.
102
103 The reason I began developing :mod:`natsort` in the first place was because I
104 needed to handle the natural sorting of strings containing *real numbers*, not just
105 unsigned integers as the above example contains. By real numbers, I mean those like
106 ``-45.4920E-23``. :mod:`natsort` can handle just about any number definition;
107 to that end, here are all the regular expressions used in :mod:`natsort`:
108
109 .. code-block:: python
110
111 >>> unsigned_int = r'([0-9]+)'
112 >>> signed_int = r'([-+]?[0-9]+)'
113 >>> unsigned_float = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'
114 >>> signed_float = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'
115 >>> unsigned_float_no_exponent = r'((?:[0-9]+\.?[0-9]*|\.[0-9]+))'
116 >>> signed_float_no_exponent = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+))'
117
118 Note that ``"inf"`` and ``"nan"`` are deliberately omitted from the float definition because you
119 wouldn't want (for example) ``"banana"`` to be converted into ``['ba', 'nan', 'a']``.
120 Let's see an example:
121
122 .. code-block:: python
123
124 >>> re.split(signed_float, 'The mass of 3 electrons is 2.732815068E-30 kg')
125 ['The mass of ', '3', ' electrons is ', '2.732815068E-30', ' kg']
126
127 .. note::
128
129 It is a bit of a lie to say the above are the complete regular expressions. In the
130 actual code there is also handling for non-ASCII unicode characters (such as ⑦),
131 but I will ignore that aspect of :mod:`natsort` in this discussion.
132
133 Now, when the user wants to change the definition of a number, it is as easy as changing
134 the pattern supplied to the regular expression engine.
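To make this concrete, the sketch below splits the same string with two of the patterns listed earlier. The sample string is made up for illustration; only the pattern changes between the two calls.

```python
import re

unsigned_int = r'([0-9]+)'
signed_float = r'([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?)'

text = 'temp -4.5e2 K'

# With unsigned integers, the sign, decimal point, and exponent marker are
# all treated as ordinary text between three separate numbers.
as_ints = re.split(unsigned_int, text)

# With signed floats, the whole token is captured as a single number.
as_float = re.split(signed_float, text)
```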
135
136 Choosing the right default is hard, though (well, in this case it shouldn't have been
137 but I was rather thick-headed).
138 In retrospect, it should have been obvious that since essentially all the code examples
139 I had/have seen for natural sorting were for *unsigned integers*, I should have made the default
140 definition of a number an *unsigned integer*. But, in the brash days of my youth I assumed
141 that since my use case was real numbers, everyone else would be happier sorting by real numbers;
142 so, I made the default definition of a number a *signed float with exponent*.
143 `This astonished`_ `a lot`_ `of people`_
144 (`and some people aren't very nice when they are astonished`_).
145 Starting with :mod:`natsort` version 4.0.0 the default number definition was
146 changed to an *unsigned integer* which satisfies the "least astonishment" principle, and
147 I have not heard a complaint since.
148
149 Coercing Strings Containing Numbers Into Numbers
150 ++++++++++++++++++++++++++++++++++++++++++++++++
151
152 There has been some debate on Stack Overflow as to what method is best to
153 coerce a string to a number if it can be coerced, and leaving it alone otherwise
154 (see `this one for coercion`_ and `this one for checking`_ for some high traffic questions),
155 but it mostly boils down to two different solutions, shown here:
156
157 .. code-block:: python
158
159 >>> def coerce_try_except(x):
160 ... try:
161 ... return int(x)
162 ... except ValueError:
163 ... return x
164 ...
165 >>> def coerce_regex(x):
166 ... # Note that precompiling the regex is more performant,
167 ... # but I do not show that here for clarity's sake.
168 ... return int(x) if re.match(r'[-+]?\d+$', x) else x
169 ...
170
171 Here are some timing results run on my machine:
172
173 ::
174
175 In [0]: numbers = list(map(str, range(100))) # A list of numbers as strings
176
177 In [1]: not_numbers = ['banana' + x for x in numbers]
178
179 In [2]: %timeit [coerce_try_except(x) for x in numbers]
180 10000 loops, best of 3: 51.1 µs per loop
181
182 In [3]: %timeit [coerce_try_except(x) for x in not_numbers]
183 1000 loops, best of 3: 289 µs per loop
184
185 In [4]: %timeit [coerce_regex(x) for x in not_numbers]
186 10000 loops, best of 3: 67.6 µs per loop
187
188 In [5]: %timeit [coerce_regex(x) for x in numbers]
189 10000 loops, best of 3: 123 µs per loop
190
191 What can we learn from this? The ``try: except`` method (arguably the most "pythonic"
192 of the solutions) is best for numeric input, but performs over 5X slower for non-numeric
193 input. Conversely, the regular expression method, though slower than ``try: except`` for
194 numeric input, is more efficient for non-numeric input than for input that can be
195 converted to an ``int``. Further, even in its slower case the regular expression method
196 is at least twice as fast as the worst case for the
197 ``try: except``.
198
199 Why do I care? Shouldn't I just pick a method and not worry about it? Probably. However,
200 I am very conscious about the performance of :mod:`natsort`, and want it to be a true
201 drop-in replacement for :func:`sorted` without having to incur a performance penalty.
202 For the purposes of :mod:`natsort`, there is no clear winner between the two algorithms -
203 the data being passed to this function will likely be a mix of numeric and non-numeric
204 string content. Do I use the ``try: except`` method and hope the speed gains on
205 numbers will offset the non-number performance, or do I use regular expressions and
206 take the more stable performance?
207
208 It turns out that within the context of :mod:`natsort`, some assumptions can be
209 made that make a hybrid approach attractive. Because all strings are pre-split
210 into numeric and non-numeric content *before* being passed to this coercion function,
211 the assumption can be made that *if a string begins with a digit or a sign, it
212 can be coerced into a number*.
213
214 .. code-block:: python
215
216 >>> def coerce_to_int(x):
217 ... if x[0] in '0123456789+-':
218 ... try:
219 ... return int(x)
220 ... except ValueError:
221 ... return x
222 ... else:
223 ... return x
224 ...
225
226 So how does this perform compared to the standard coercion methods?
227
228 ::
229
230 In [6]: %timeit [coerce_to_int(x) for x in numbers]
231 10000 loops, best of 3: 71.6 µs per loop
232
233 In [7]: %timeit [coerce_to_int(x) for x in not_numbers]
234 10000 loops, best of 3: 26.4 µs per loop
235
236 The hybrid method eliminates most of the time spent verifying that numeric input
237 is in fact a number before passing it to :func:`int`, and eliminates the time wasted
238 in the exception stack for input that is not a number.
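The timings above come from an interactive session; a rough way to repeat the comparison outside IPython is the standard :mod:`timeit` module. This is only a sketch: ``best_time`` is a hypothetical helper, the regular expression is precompiled here (unlike the version shown earlier), and absolute numbers will differ from machine to machine.

```python
import re
import timeit

_INT_RE = re.compile(r'[-+]?\d+$')


def coerce_try_except(x):
    try:
        return int(x)
    except ValueError:
        return x


def coerce_regex(x):
    return int(x) if _INT_RE.match(x) else x


def coerce_to_int(x):
    # Hybrid: only attempt int() when the first character looks numeric.
    if x[0] in '0123456789+-':
        try:
            return int(x)
        except ValueError:
            return x
    return x


numbers = list(map(str, range(100)))
not_numbers = ['banana' + x for x in numbers]


def best_time(func, data, repeat=3, number=200):
    # Best per-call time (in seconds) for coercing every item in data.
    timer = timeit.Timer(lambda: [func(x) for x in data])
    return min(timer.repeat(repeat=repeat, number=number)) / number


for func in (coerce_try_except, coerce_regex, coerce_to_int):
    for label, data in (('numbers', numbers), ('not_numbers', not_numbers)):
        micros = best_time(func, data) * 1e6
        print('{0:<18} {1:<12} {2:8.1f} us'.format(func.__name__, label, micros))
```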
239
240 That's as fast as we can get, right? In pure Python, probably. At least, it's
241 close. But because I am crazy and a glutton for punishment, I decided to see
242 if I could get any faster writing a C extension. It's called
243 `fastnumbers`_ and contains a C implementation of the above coercion functions
244 called :func:`fast_int`. How does it fare? Pretty well.
245
246 ::
247
248 In [8]: %timeit [fast_int(x) for x in numbers]
249 10000 loops, best of 3: 30.9 µs per loop
250
251 In [9]: %timeit [fast_int(x) for x in not_numbers]
252 10000 loops, best of 3: 30 µs per loop
253
254 During development of :mod:`natsort`, I wanted to ensure that using it did not
255 get in the way of a user's program by introducing a performance penalty to their code.
256 To that end, I do not feel like my adventures down the rabbit hole of optimization
257 of coercion functions were a waste; I can confidently look users in the eye and
258 say I considered every option in ensuring :mod:`natsort` is as efficient as possible.
259 This is why if `fastnumbers`_ is installed it will be used for this step,
260 and otherwise the hybrid method will be used.
261
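The "use `fastnumbers`_ if installed, otherwise fall back" choice described above can be sketched as an optional import. This is only an illustration and not necessarily how :mod:`natsort` wires it up internally; the name ``coerce`` is hypothetical, and the fallback assumes non-empty string input.

```python
try:
    # Optional C extension; fast_int returns its input unchanged
    # when the input cannot be converted to an integer.
    from fastnumbers import fast_int as coerce
except ImportError:
    # Pure-Python hybrid fallback (assumes a non-empty string).
    def coerce(x):
        if x[0] in '0123456789+-':
            try:
                return int(x)
            except ValueError:
                return x
        return x


print([coerce(s) for s in ['42', 'apple', '-7']])
```

Either branch gives the calling code the same interface, so the rest of the key function never needs to know which implementation it got.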
262 .. note::
263
264 Modifying the hybrid coercion function for floats is straightforward.
265
266 .. code-block:: python
267
268 >>> def coerce_to_float(x):
269 ...     if x[0] in '.0123456789+-' or x.lower().lstrip()[:3] in ('nan', 'inf'):
270 ...         try:
271 ...             return float(x)
272 ...         except ValueError:
273 ...             return x
274 ...     else:
275 ...         return x
276 ...
277
278 .. _tldr1:
279
280 TL;DR 1 - The Simple "No Special Cases" Algorithm
281 +++++++++++++++++++++++++++++++++++++++++++++++++
282
283 At this point, our :mod:`natsort` algorithm is essentially the following:
284
285 .. code-block:: python
286
287 >>> import re
288 >>> def natsort_key(x, as_float=False, signed=False):
289 ...     if as_float:
290 ...         regex = signed_float if signed else unsigned_float
291 ...     else:
292 ...         regex = signed_int if signed else unsigned_int
293 ...     split_input = re.split(regex, x)
294 ...     split_input = filter(None, split_input)  # removes null strings
295 ...     coerce = coerce_to_float if as_float else coerce_to_int
296 ...     return tuple(coerce(s) for s in split_input)
297 ...
298
299 I have written the above for clarity and not performance.
300 This pretty much matches `most natural sort solutions for python on Stack Overflow`_
301 (except the above includes customization of the definition of a number).
302
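For readers who want to run the simple algorithm on its own, here is a minimal self-contained sketch restricted to unsigned integers. The pattern ``(\d+)`` and the name ``simple_natsort_key`` are assumptions made for this illustration; they stand in for the regular expressions defined earlier in this document.

```python
import re

# Assumed pattern: a run of digits, captured so re.split keeps the numbers.
unsigned_int = re.compile(r'(\d+)')


def coerce_to_int(x):
    # Only attempt the conversion when the chunk starts with a digit.
    if x[0] in '0123456789':
        try:
            return int(x)
        except ValueError:
            return x
    return x


def simple_natsort_key(x):
    # Split into numeric and non-numeric chunks, dropping empty strings.
    parts = filter(None, unsigned_int.split(x))
    return tuple(coerce_to_int(s) for s in parts)


names = ['item10', 'item2', 'item1']
print(sorted(names, key=simple_natsort_key))  # ['item1', 'item2', 'item10']
```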
303 Special Cases Everywhere!
304 -------------------------
305
306 .. contents::
307 :local:
308
309 .. image:: special_cases_everywhere.jpg
310
311 If what I described in :ref:`TL;DR 1 <tldr1>` were
312 all that :mod:`natsort` needed to
313 do then there probably wouldn't be much need for a third-party module, right?
314 Probably. But it turns out that in real-world data there are a lot of
315 special cases that need to be handled, and in true `80%/20%`_ fashion, the
316 majority of the code in :mod:`natsort` is devoted to handling special cases
317 like those described below.
318
319 Sorting Filesystem Paths
320 ++++++++++++++++++++++++
321
322 `The first major special case I encountered was sorting filesystem paths`_
323 (if you go to the link, you will see I didn't handle it well for a year...
324 this was before I fully realized how much functionality I could really add
325 to :mod:`natsort`). Let's apply the :func:`natsort_key` from above to some
326 filesystem paths that you might see being auto-generated from your operating
327 system:
328
329 .. code-block:: python
330
331 >>> paths = ['/p/Folder (10)/file.tar.gz',
332 ... '/p/Folder/file.tar.gz',
333 ... '/p/Folder (1)/file (1).tar.gz',
334 ... '/p/Folder (1)/file.tar.gz']
335 >>> sorted(paths, key=natsort_key)
336 ['/p/Folder (1)/file (1).tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (10)/file.tar.gz', '/p/Folder/file.tar.gz']
337
338 Well that's not right! What is ``'/p/Folder/file.tar.gz'`` doing at the end?
339 It has to do with the numerical ASCII code assigned to the space and
340 ``/`` characters in the `ASCII table`_. According to the `ASCII table`_, the
341 space character (number 32) comes before the ``/`` character (number 47). If
342 we remove the common prefix in all of the above strings (``'/p/Folder'``), we
343 can see why this happens:
344
345 .. code-block:: python
346
347 >>> ' (1)/file.tar.gz' < '/file.tar.gz'
348 True
349 >>> ' ' < '/'
350 True
351
352 This isn't very convenient... how do we solve it? We can split the path
353 across the path separators and then sort. A convenient way to do this is
354 with the `Path.parts`_ attribute from :mod:`pathlib`:
355
356 .. code-block:: python
357
358 >>> import pathlib
359 >>> sorted(paths, key=lambda x: tuple(natsort_key(s) for s in pathlib.Path(x).parts))
360 ['/p/Folder/file.tar.gz', '/p/Folder (1)/file (1).tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (10)/file.tar.gz']
361
362 Almost! It seems like there is some funny business going on in the final
363 filename component as well. We can solve that nicely and quickly with `Path.suffixes`_
364 and `Path.stem`_.
365
366 .. code-block:: python
367
368 >>> def decompose_path_into_components(x):
369 ...     path_split = list(pathlib.Path(x).parts)
370 ...     # Remove the final filename component from the path.
371 ...     final_component = pathlib.Path(path_split.pop())
372 ...     # Split off all the extensions.
373 ...     suffixes = final_component.suffixes
374 ...     stem = final_component.name.replace(''.join(suffixes), '')
375 ...     # Remove the '.' prefix of each extension, and make that
376 ...     # final component a list of the stem and each suffix.
377 ...     final_component = [stem] + [x[1:] for x in suffixes]
378 ...     # Replace the split final filename component.
379 ...     path_split.extend(final_component)
380 ...     return path_split
381 ...
382 >>> def natsort_key_with_path_support(x):
383 ...     return tuple(natsort_key(s) for s in decompose_path_into_components(x))
384 ...
385 >>> sorted(paths, key=natsort_key_with_path_support)
386 ['/p/Folder/file.tar.gz', '/p/Folder (1)/file.tar.gz', '/p/Folder (1)/file (1).tar.gz', '/p/Folder (10)/file.tar.gz']
387
388 This works because in addition to breaking the input by path separators, the final
389 filename component is separated from its extensions as well [#f1]_. *Then*, each of these
390 separated components is sent to the :mod:`natsort` algorithm, so the result is
391 a tuple of tuples. Once that is done, we can see how comparisons can be done in
392 the expected manner.
393
394 .. code-block:: python
395
396 >>> a = natsort_key_with_path_support('/p/Folder (1)/file (1).tar.gz')
397 >>> a
398 (('/',), ('p',), ('Folder (', 1, ')'), ('file (', 1, ')'), ('tar',), ('gz',))
399 >>>
400 >>> b = natsort_key_with_path_support('/p/Folder/file.tar.gz')
401 >>> b
402 (('/',), ('p',), ('Folder',), ('file',), ('tar',), ('gz',))
403 >>>
404 >>> a > b
405 True
406
407 Comparing Different Types on Python 3
408 +++++++++++++++++++++++++++++++++++++
409
410 `The second major special case I encountered was sorting of different types`_.
411 If you are on Python 2 (i.e. legacy Python), this mostly doesn't matter *too*
412 much since it uses an arbitrary heuristic to allow traditionally un-comparable
413 types to be compared (such as comparing ``'a'`` to ``1``). However, on Python 3
414 (i.e. Python) it simply won't let you perform such nonsense, raising a
415 :exc:`TypeError` instead.
416
417 You can imagine that a module that breaks strings into tuples of numbers and
418 strings is walking a dangerous line if it does not have special handling for
419 comparing numbers and strings. My imagination was not so great at first.
420 Let's take a look at all the ways this can fail with real-world data.
421
422 .. code-block:: python
423
424 >>> def natsort_key_with_poor_real_number_support(x):
425 ...     split_input = re.split(signed_float, x)
426 ...     split_input = filter(None, split_input)  # removes null strings
427 ...     return tuple(coerce_to_float(s) for s in split_input)
428 >>>
429 >>> sorted([5, '4'], key=natsort_key_with_poor_real_number_support)
430 Traceback (most recent call last):
431 ...
432 TypeError: ...
433 >>>
434 >>> sorted(['12 apples', 'apples'], key=natsort_key_with_poor_real_number_support)
435 Traceback (most recent call last):
436 ...
437 TypeError: ...
438 >>>
439 >>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_poor_real_number_support)
440 Traceback (most recent call last):
441 ...
442 TypeError: ...
443
444 Let's break these down.
445
446 #. The integer ``5`` is sent to ``re.split`` which expects only strings
447 or bytes, which is a no-no.
448 #. ``natsort_key_with_poor_real_number_support('12 apples') < natsort_key_with_poor_real_number_support('apples')``
449 is the same as ``(12.0, ' apples') < ('apples',)``, and thus a number gets
450 compared to a string [#f2]_ which also is a no-no.
451 #. This one scores big on the astonishment scale, especially if one accidentally
452 uses signed integers or real numbers when they mean to use unsigned integers.
453 ``natsort_key_with_poor_real_number_support('version5.3.0') < natsort_key_with_poor_real_number_support('version5.3rc1')``
454 is the same as ``('version', 5.3, 0.0) < ('version', 5.3, 'rc', 1.0)``, so in the
455 third element a number gets compared to a string, once again the same
456 old no-no. (The same would happen with ``'version5-3'`` and ``'version5-a'``,
457 which would become ``('version', 5, -3)`` and ``('version', 5, '-a')``).
458
459 As you might expect, the solution to the first issue is to wrap the ``re.split``
460 call in a ``try: except:`` block and handle the number specially if a
461 :exc:`TypeError` is raised. The second and third cases *could* be handled
462 in a "special case" manner, meaning only respond and do something different
463 if these problems are detected. But a less error-prone method is to ensure
464 that the data is correct-by-construction, and this can be done by ensuring
465 that the returned tuples *always* start with a string, and then alternate
466 in a string-number-string-number-string pattern; this can be achieved by
467 adding an empty string wherever the pattern is not followed [#f3]_. This ends
468 up working out pretty nicely because empty strings are always "less" than
469 any non-empty string, and we typically want numbers to come before strings.
470
471 Let's take a look at how this works out.
472
473 .. code-block:: python
474
475 >>> from natsort.utils import sep_inserter
476 >>> list(sep_inserter(iter(['apples']), ''))
477 ['apples']
478 >>>
479 >>> list(sep_inserter(iter([12, ' apples']), ''))
480 ['', 12, ' apples']
481 >>>
482 >>> list(sep_inserter(iter(['version', 5, -3]), ''))
483 ['version', 5, '', -3]
484 >>>
485 >>> from natsort import natsort_keygen, ns
486 >>> natsort_key_with_good_real_number_support = natsort_keygen(alg=ns.REAL)
487 >>>
488 >>> sorted([5, '4'], key=natsort_key_with_good_real_number_support)
489 ['4', 5]
490 >>>
491 >>> sorted(['12 apples', 'apples'], key=natsort_key_with_good_real_number_support)
492 ['12 apples', 'apples']
493 >>>
494 >>> sorted(['version5.3.0', 'version5.3rc1'], key=natsort_key_with_good_real_number_support)
495 ['version5.3.0', 'version5.3rc1']
496
497 How the "good" version works will be given in `TL;DR 2 - Handling Crappy, Real-World Input`_.
498
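To make the string-number-string idea concrete, here is a simplified sketch of a separator-inserting generator. It is written for illustration only and is not the actual ``natsort.utils.sep_inserter`` implementation; the name ``insert_separators`` is hypothetical.

```python
def insert_separators(items, sep=''):
    # Pretend a number precedes the first element so that a leading
    # number is always preceded by the separator.
    previous_was_number = True
    for item in items:
        is_number = isinstance(item, (int, float))
        if is_number and previous_was_number:
            # Two numbers in a row (or a leading number): restore the
            # alternating pattern by emitting the separator first.
            yield sep
        yield item
        previous_was_number = is_number


print(list(insert_separators([12, ' apples'])))      # ['', 12, ' apples']
print(list(insert_separators(['version', 5, -3])))   # ['version', 5, '', -3]
```

Because every resulting tuple starts with a string and alternates from there, element-by-element comparison only ever compares strings with strings and numbers with numbers.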
499 Handling NaN
500 ++++++++++++
501
502 `A rather unexpected special case I encountered was sorting collections containing NaN`_.
503 Let's see what happens when you try to sort a plain old list of numbers when there
504 is a **NaN** floating around in there.
505
506 .. code-block:: python
507
508 >>> danger = [7, float('nan'), 22.7, 19, -14, 59.123, 4]
509 >>> sorted(danger)
510 [7, nan, -14, 4, 19, 22.7, 59.123]
511
512 Clearly that isn't correct, and for once it isn't my fault!
513 `It's hard to compare floating point numbers`_. By definition, **NaN** is unorderable
514 to any other number, and is never equal to any other number, including itself.
515
516 .. code-block:: python
517
518 >>> nan = float('nan')
519 >>> 5 > nan
520 False
521 >>> 5 < nan
522 False
523 >>> 5 == nan
524 False
525 >>> 5 != nan
526 True
527 >>> nan == nan
528 False
529 >>> nan != nan
530 True
531
532 The implication of all this for us is that if there is an **NaN** in the
533 data-set we are trying to sort, the data-set will end up being sorted in
534 two separate yet individually sorted sequences - the one *before* the **NaN**,
535 and the one *after*. This is because the ``<`` operation that is used
536 to sort always returns :const:`False` with **NaN**.
537
538 Because :mod:`natsort` aims to sort sequences in a way that does not surprise
539 the user, keeping this behavior is not acceptable (I don't require my users
540 to know how **NaN** will behave in a sorting algorithm). The simplest way to
541 satisfy the "least astonishment" principle is to substitute **NaN** with
542 some other value. But what value is *least* astonishing? I chose to replace
543 **NaN** with :math:`-\infty` so that these poorly behaved elements always
544 end up at the front where the users will most likely be alerted to their presence.
545
546 .. code-block:: python
547
548 >>> def fix_nan(x):
549 ...     if x != x:  # only true for NaN
550 ...         return float('-inf')
551 ...     else:
552 ...         return x
553 ...
554
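As a quick standalone check, the substitution can be used directly as a sort key on a plain list of numbers. This sketch repeats the function so it runs on its own; it only demonstrates the idea and is not the full :mod:`natsort` key.

```python
def fix_nan(x):
    # NaN is the only value that is not equal to itself.
    return float('-inf') if x != x else x


danger = [7, float('nan'), 22.7, 19, -14, 59.123, 4]
result = sorted(danger, key=fix_nan)
print(result)  # NaN comes first, followed by the numbers in ascending order
```

The original **NaN** object is still in the output; only the value used for comparison was replaced.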
555 Let's check out :ref:`TL;DR 2 <tldr2>` to see how this can be
556 incorporated into the simple key function from :ref:`TL;DR 1 <tldr1>`.
557
558 .. _tldr2:
559
560 TL;DR 2 - Handling Crappy, Real-World Input
561 +++++++++++++++++++++++++++++++++++++++++++
562
563 Let's see how our elegant key function from :ref:`TL;DR 1 <tldr1>` has
564 become bastardized in order to support handling mixed real-world data
565 and user customizations.
566
567 >>> def natsort_key(x, as_float=False, signed=False, as_path=False):
568 ...     if as_float:
569 ...         regex = signed_float if signed else unsigned_float
570 ...     else:
571 ...         regex = signed_int if signed else unsigned_int
572 ...     try:
573 ...         if as_path:
574 ...             x = decompose_path_into_components(x) # Decomposes into list of strings
575 ...         # If this raises a TypeError, input is not a string.
576 ...         split_input = re.split(regex, x)
577 ...     except TypeError:
578 ...         try:
579 ...             # Does this need to be applied recursively (list-of-list)?
580 ...             return tuple(map(natsort_key, x))
581 ...         except TypeError:
582 ...             # Must be a number
583 ...             ret = ('', fix_nan(x)) # Maintain string-number-string pattern
584 ...             return (ret,) if as_path else ret # as_path returns tuple-of-tuples
585 ...     else:
586 ...         split_input = filter(None, split_input) # removes null strings
587 ...         # Note that the coerce_to_int/coerce_to_float functions
588 ...         # are also modified to use the fix_nan function.
589 ...         if as_float:
590 ...             coerced_input = (coerce_to_float(s) for s in split_input)
591 ...         else:
592 ...             coerced_input = (coerce_to_int(s) for s in split_input)
593 ...         return tuple(sep_inserter(coerced_input, ''))
594 ...
595
596 And this doesn't even show handling :class:`bytes` type! Notice that we have
597 to do non-obvious things like modify the return form of numbers when ``as_path``
598 is given, just to avoid comparing strings and numbers for the case in which a user provides
599 input like ``['/home/me', 42]``.
600
601 Let's take it out for a spin!
602
603 .. code-block:: python
604
605 >>> danger = [7, float('nan'), 22.7, '19', '-14', '59.123', 4]
606 >>> sorted(danger, key=lambda x: natsort_key(x, as_float=True, signed=True))
607 [nan, '-14', 4, 7, '19', 22.7, '59.123']
608 >>>
609 >>> paths = ['/p/Folder (1)/file.tar.gz',
610 ... '/p/Folder/file.tar.gz',
611 ... 123456]
612 >>> sorted(paths, key=lambda x: natsort_key(x, as_path=True))
613 [123456, '/p/Folder/file.tar.gz', '/p/Folder (1)/file.tar.gz']
614
615 Here Be Dragons: Adding Locale Support
616 --------------------------------------
617
618 .. contents::
619 :local:
620
621 Probably the most challenging special case I had to handle was getting
622 :mod:`natsort` to handle sorting the non-numerical parts of input
623 correctly, and also allowing it to sort the numerical bits in different
624 locales. This was in no way what I originally set out to do with this
625 library, so I was `caught a bit off guard when the request was initially made`_.
626 I discovered the :mod:`locale` library, and assumed that if it's part of Python's
627 StdLib there can't be too many dragons, right?
628
629 .. admonition:: INCOMPLETE LIST OF DRAGONS
630
631 - https://github.com/SethMMorton/natsort/issues/21
632 - https://github.com/SethMMorton/natsort/issues/22
633 - https://github.com/SethMMorton/natsort/issues/23
634 - https://github.com/SethMMorton/natsort/issues/36
635 - https://github.com/SethMMorton/natsort/issues/44
636 - https://bugs.python.org/issue2481
637 - https://bugs.python.org/issue23195
638 - https://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
639 - https://stackoverflow.com/questions/22203550/sort-dictionary-by-key-using-locale-collation
640 - https://stackoverflow.com/questions/33459384/unicode-character-not-in-range-when-calling-locale-strxfrm
641 - https://stackoverflow.com/questions/36431810/sort-numeric-lines-with-thousand-separators
642 - https://stackoverflow.com/questions/45734562/how-can-i-get-a-reasonable-string-sorting-with-python
643
644 These can be summed up as follows:
645
646 #. :mod:`locale` is a thin wrapper over your operating system's *locale*
647 library, so if *that* is broken (like it is on BSD and OSX) then
648 :mod:`locale` is broken in Python.
649 #. Because of a bug in legacy Python (i.e. Python 2), there is no uniform way to use
650 the :mod:`locale` sorting functionality between legacy Python and Python 3.
651 #. People have differing opinions of how capitalization should affect word order.
652 #. There is no built-in way to handle locale-dependent thousands separators
653 and decimal points *robustly*.
654 #. Proper handling of Unicode is complicated.
655 #. Proper handling of :mod:`locale` is complicated.
656
657 Easily over half of the code in :mod:`natsort` is in some way dealing with some
658 aspect of :mod:`locale` or basic case handling. It would have been
659 impossible to get right without a `really good`_ `testing strategy`_.
660
661 Don't expect any more TL;DR's... if you want to see how all this is fully
662 incorporated into the :mod:`natsort` algorithm then please take a look
663 `at the code`_. However, I will hint at how specific steps are taken in
664 each section.
665
666 Let's see how we can handle some of the dragons, one-by-one.
667
668 Basic Case Control Support
669 ++++++++++++++++++++++++++
670
671 Without even thinking about the mess that is adding :mod:`locale` support,
672 :mod:`natsort` can introduce support for controlling how case is interpreted.
673
674 First, let's take a look at how it is sorted by default (due to
675 where characters lie on the `ASCII table`_).
676
677 .. code-block:: python
678
679 >>> a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
680 >>> sorted(a)
681 ['Apple', 'Banana', 'Corn', 'apple', 'banana', 'corn']
682
683 All uppercase letters come before lowercase letters in the `ASCII table`_,
684 so all capitalized words appear first. Not everyone agrees that this
685 is the correct order. Some believe that the capitalized words should
686 be last (``['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']``).
687 Some believe that both the lowercase and uppercase versions
688 should appear together (``['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']``).
689 Some believe that both should be true ☹. Some people don't care at all [#f4]_.
690
691 Solving the first case (I call it *LOWERCASEFIRST*) is actually pretty
692 easy... just call the :meth:`str.swapcase` method on the input.
693
694 .. code-block:: python
695
696 >>> sorted(a, key=lambda x: x.swapcase())
697 ['apple', 'banana', 'corn', 'Apple', 'Banana', 'Corn']
698
699 The last (I call it *IGNORECASE*) should be super easy, right?
700 Simply call :meth:`str.lower` on the input. This will work but may
701 not always give the correct answer on non-Latin character sets. It's
702 a good thing that in Python 3.3
703 :meth:`str.casefold` was introduced, which does a better job of removing
704 all case information from Unicode characters in
705 non-Latin alphabets.
706
707 .. code-block:: python
708
709 >>> def remove_case(x):
710 ...     try:
711 ...         return x.casefold()
712 ...     except AttributeError:  # Legacy Python backwards compatibility
713 ...         return x.lower()
714 ...
715 >>> sorted(a, key=remove_case)
716 ['Apple', 'apple', 'Banana', 'banana', 'corn', 'Corn']
717
718 The middle case (I call it *GROUPLETTERS*) is less straightforward.
719 The most efficient way to handle this is to duplicate each character
720 with its lowercase version and then the original character.
721
722 .. code-block:: python
723
724 >>> import itertools
725 >>> def groupletters(x):
726 ...     return ''.join(itertools.chain.from_iterable((remove_case(y), y) for y in x))
727 ...
728 >>> groupletters('Apple')
729 'aAppppllee'
730 >>> groupletters('apple')
731 'aappppllee'
732 >>> sorted(a, key=groupletters)
733 ['Apple', 'apple', 'Banana', 'banana', 'Corn', 'corn']
734
735 The effect of this is that both ``'Apple'`` and ``'apple'`` are
736 placed adjacent to each other because their transformations both begin
737 with ``'a'``, and then the second character can be used to order them
738 appropriately with respect to each other.
739
740 There's a problem with this, though. Within the context of :mod:`natsort`
741 we are trying to correctly sort numbers and those should be left alone.
742
743 .. code-block:: python
744
745 >>> a = ['Apple5', 'apple', 'Apple4E10', 'Banana']
746 >>> sorted(a, key=lambda x: natsort_key(x, as_float=True))
747 ['Apple5', 'Apple4E10', 'Banana', 'apple']
748 >>> sorted(a, key=lambda x: natsort_key(groupletters(x), as_float=True))
749 ['Apple4E10', 'Apple5', 'apple', 'Banana']
750 >>> groupletters('Apple4E10')
751 'aAppppllee44eE1100'
752
753 We messed up the numbers! Looks like :func:`groupletters` needs to be applied
754 *after* the strings are broken into their components. I'm not going to show
755 how this is done here, but basically it requires applying the function in
756 the ``else:`` block of :func:`coerce_to_int`/:func:`coerce_to_float`.
757
758 .. code-block:: python
759
760 >>> better_groupletters = natsort_keygen(alg=ns.GROUPLETTERS | ns.REAL)
761 >>> better_groupletters('Apple4E10')
762 ('aAppppllee', 40000000000.0)
763 >>> sorted(a, key=better_groupletters)
764 ['Apple5', 'Apple4E10', 'apple', 'Banana']
765
766 Of course, applying both *LOWERCASEFIRST* and *GROUPLETTERS* is just
767 a matter of turning on both functions.
768
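For plain strings without numbers, combining the two transformations can be sketched by swapping the case first and then grouping the letters. This is an illustration of the idea only (the helper name ``lowercase_first_and_grouped`` is hypothetical), and as discussed above a real key must apply it to the non-numeric components only.

```python
import itertools


def groupletters(x):
    # Pair each character with its case-folded version.
    return ''.join(itertools.chain.from_iterable((y.casefold(), y) for y in x))


def lowercase_first_and_grouped(x):
    # Swap case so lowercase sorts first, then group the letters.
    return groupletters(x.swapcase())


a = ['Apple', 'corn', 'Corn', 'Banana', 'apple', 'banana']
print(sorted(a, key=lowercase_first_and_grouped))
# ['apple', 'Apple', 'banana', 'Banana', 'corn', 'Corn']
```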
769 Basic Unicode Support
770 +++++++++++++++++++++
771
772 Unicode is hard and complicated. Here's an example.
773
774 .. code-block:: python
775
776 >>> b = [b'\x66', b'\x65', b'\xc3\xa9', b'\x65\xcc\x81', b'\x61', b'\x7a']
777 >>> a = [x.decode('utf8') for x in b]
778 >>> a # doctest: +SKIP
779 ['f', 'e', 'é', 'é', 'a', 'z']
780 >>> sorted(a) # doctest: +SKIP
781 ['a', 'e', 'é', 'f', 'z', 'é']
782
783
784 There is more than one way to represent the character 'é' in Unicode.
785 In fact, many characters have multiple representations. This is a challenge
786 because comparing the two representations would return ``False`` even though
787 they *look* the same.
788
789 .. code-block:: python
790
791 >>> a[2] == a[3]
792 False
793
794 Alas, since characters are compared based on the numerical value of their
795 representation, sorting Unicode often gives unexpected results (like seeing
796 'é' come both *before* and *after* 'z').
797
798 The original approach that :mod:`natsort` took with respect to non-ASCII
799 Unicode characters was to say "just use
800 the :mod:`locale` or :mod:`PyICU` library" and then cross its fingers
801 and hope those libraries take care of it. As you will find in the following
802 sections, that comes with its own baggage, and turned out to not always work anyway
803 (see https://stackoverflow.com/q/45734562/1399279). A more robust approach is to
804 handle the Unicode out-of-the-box without invoking a heavy-handed library
805 like :mod:`locale` or :mod:`PyICU`. To do this, we must use *normalization*.
806
807 To fully understand Unicode normalization, `check out some official Unicode documentation`_.
808 Just kidding... that's too much text. The following StackOverflow answers do
809 a good job at explaining Unicode normalization in simple terms:
810 https://stackoverflow.com/a/7934397/1399279 and
811 https://stackoverflow.com/a/7931547/1399279. Put simply, normalization
812 ensures that Unicode characters with multiple representations are in
813 some canonical and consistent representation so that (for example) comparisons
814 of the characters can be performed in a sane way. The following discussion
815 assumes you at least read the StackOverflow answers.
816
817 Looking back at our 'é' example, we can see that the two versions were
818 constructed with the byte strings ``b'\xc3\xa9'`` and ``b'\x65\xcc\x81'``.
819 The former representation is actually
820 `LATIN SMALL LETTER E WITH ACUTE <http://www.fileformat.info/info/unicode/char/e9/index.htm>`_
821 and is a single character in the Unicode standard. This is known as the
822 *composed form* and corresponds to the 'NFC' normalization scheme.
823 The latter representation is actually the letter 'e' followed by
824 `COMBINING ACUTE ACCENT <http://www.fileformat.info/info/unicode/char/0301/index.htm>`_
825 and so is two characters in the Unicode standard. This is known as the
826 *decomposed form* and corresponds to the 'NFD' normalization scheme.
827 Since the first character in the decomposed form is actually the letter 'e',
828 when compared to other ASCII characters it fits where you might expect.
829 Unfortunately, all Unicode composed form characters come after the
830 ASCII characters and so they always will be placed after 'z' when sorting.
831
832 It seems that most Unicode data is stored and shared in the composed form
833 which makes it challenging to sort. This can be solved by normalizing all
834 incoming Unicode data to the decomposed form ('NFD') and *then* sorting.
835
836 .. code-block:: python
837
838 >>> import unicodedata
839 >>> c = [unicodedata.normalize('NFD', x) for x in a]
840 >>> c # doctest: +SKIP
841 ['f', 'e', 'é', 'é', 'a', 'z']
842 >>> sorted(c) # doctest: +SKIP
843 ['a', 'e', 'é', 'é', 'f', 'z']
844
845 Huzzah! Sane sorting without having to resort to :mod:`locale`!
846
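A small standalone check makes the effect of normalization visible. The variable names are chosen for this sketch only; the calls use the standard :mod:`unicodedata` module.

```python
import unicodedata

single_code_point = '\u00e9'   # LATIN SMALL LETTER E WITH ACUTE
base_plus_accent = 'e\u0301'   # 'e' followed by COMBINING ACUTE ACCENT

# The two spellings look identical but compare as different strings.
print(single_code_point == base_plus_accent)  # False

# After normalizing both to NFD they are the same two-character string.
nfd_a = unicodedata.normalize('NFD', single_code_point)
nfd_b = unicodedata.normalize('NFD', base_plus_accent)
print(nfd_a == nfd_b, len(nfd_a))  # True 2
```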
847 Using Locale to Compare Strings
848 +++++++++++++++++++++++++++++++
849
850 The :mod:`locale` module is actually pretty cool, and provides lowly
851 spare-time programmers like myself a way to handle the daunting task
852 of proper locale-dependent support of their libraries and utilities.
853 Having said that, it can be a bit of a bear to get right,
854 `although they do point out in the documentation that it will be painful to use`_.
855 Aside from the caveats spelled out in that link, it turns out that just
856 comparing strings with :mod:`locale` in a cross-platform and
857 cross-python-version manner is not as straightforward as one might hope.
858
859 First, how to use :mod:`locale` to compare strings? It's actually
860 pretty straightforward. Simply run the input through the :mod:`locale`
861 transformation function :func:`locale.strxfrm`.
862
863 .. code-block:: python
864
865 >>> import locale, sys
866 >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
867 'en_US.UTF-8'
868 >>> a = ['a', 'b', 'ä']
869 >>> sorted(a)
870 ['a', 'b', 'ä']
871 >>> # The below fails on OSX, so don't run doctest on darwin.
872 >>> is_osx = sys.platform == 'darwin'
873 >>> sorted(a, key=locale.strxfrm) if not is_osx else ['a', 'ä', 'b']
874 ['a', 'ä', 'b']
875 >>>
876 >>> a = ['apple', 'Banana', 'banana', 'Apple']
877 >>> sorted(a, key=locale.strxfrm) if not is_osx else ['apple', 'Apple', 'banana', 'Banana']
878 ['apple', 'Apple', 'banana', 'Banana']
879
880 It turns out that locale-aware sorting groups letters in the same
881 way as turning on *GROUPLETTERS* and *LOWERCASEFIRST*.
882 The trick is that you have to apply :func:`locale.strxfrm` only to non-numeric
883 characters; otherwise, numbers won't be parsed properly. Therefore, it must
884 be applied as part of the :func:`coerce_to_int`/:func:`coerce_to_float`
885 functions in a manner similar to :func:`groupletters`.
886
887 As you might have guessed, there is a small problem.
888 It turns out that there is a bug in the legacy Python implementation of
889 :func:`locale.strxfrm` that causes it to outright fail for :func:`unicode`
890 input (https://bugs.python.org/issue2481). :func:`locale.strcoll` works,
891 but is intended for use with ``cmp``, which does not exist in current Python
892 implementations. Luckily, the :func:`functools.cmp_to_key` function
893 makes :func:`locale.strcoll` behave like :func:`locale.strxfrm` (that is, of course,
894 unless you are on Python 2.6 where :func:`functools.cmp_to_key` doesn't exist,
895 in which case you simply copy-paste the implementation from Python 2.7
896 directly into your code ☹).
897
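The :func:`functools.cmp_to_key` workaround can be sketched as follows. The resulting order depends on the locale that is active and on the platform's *locale* library, so no expected output is shown; the name ``strcoll_key`` is made up for this example.

```python
import functools
import locale

# Wrap the comparison function so it can be passed as a sort key.
strcoll_key = functools.cmp_to_key(locale.strcoll)

words = ['banana', 'Apple', 'apple', 'Banana']
result = sorted(words, key=strcoll_key)
print(result)  # order depends on the active locale
```

As with :func:`locale.strxfrm`, such a key would have to be applied only to the non-numeric components of the input.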
898 Handling Broken Locale On OSX
899 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
900
901 But what if the underlying *locale* implementation that :mod:`locale`
902 relies upon is simply broken? It turns out that the *locale* library on
903 OSX (and other BSD systems) is broken (and for some reason has never been
904 fixed?), and so :mod:`locale` does not work as expected.
905
906 How do I define "does not work as expected"?
907
908 .. code-block:: python
909
910 >>> a = ['apple', 'Banana', 'banana', 'Apple']
911 >>> sorted(a)
912 ['Apple', 'Banana', 'apple', 'banana']
913 >>>
914 >>> sorted(a, key=locale.strxfrm) if is_osx else sorted(a)
915 ['Apple', 'Banana', 'apple', 'banana']
916
917 IT'S SORTING AS IF :func:`locale.strxfrm` WAS NEVER USED!! (and it's worse
918 once non-ASCII characters get thrown into the mix.) I'm really not
919 sure why this is considered OK for the OSX/BSD maintainers to not fix,
920 but it's more than frustrating for poor developers who have been dragged
921 into the *locale* game kicking and screaming. *<deep breath>*.
922
923 So, how to deal with this situation? There are two ways to do so.
924
925 #. Detect if :mod:`locale` is sorting incorrectly (i.e. ``dumb``) by seeing
926 if ``'A'`` is sorted before ``'a'`` (incorrect) or not.
927
928 .. code-block:: python
929
930 >>> # This is genuinely the name of this function.
931 >>> # See natsort.compat.locale.py
932 >>> def dumb_sort():
933 ...     return locale.strxfrm('A') < locale.strxfrm('a')
934 ...
935
936 If a ``dumb`` *locale* implementation is found, then automatically
937 turn on *LOWERCASEFIRST* and *GROUPLETTERS*.
938 #. Use an alternate library if installed. `ICU <http://site.icu-project.org/>`_
939 is a great and powerful library that has a pretty decent Python port
940 called (you guessed it) `PyICU <https://pypi.python.org/pypi/PyICU/>`_.
941 If a user has this library installed on their computer, :mod:`natsort`
942 chooses to use that instead of :mod:`locale`. With a little bit of
943 planning, one can write a set of wrapper functions that call
944 the correct library under the hood such that the business logic never
945 has to know what library is being used (see `natsort.compat.locale.py`_).
946
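The wrapper idea from the second option can be sketched as below. This is a simplified illustration and not the contents of ``natsort.compat.locale``: the function name ``make_sort_key`` and its ``locale_name`` parameter are hypothetical, and the fallback branch simply uses whatever locale is currently set.

```python
import locale

try:
    import icu  # provided by the optional PyICU package
except ImportError:
    icu = None


def make_sort_key(locale_name='en_US'):
    """Return a string-to-sort-key function from whichever library is available."""
    if icu is not None:
        # PyICU: build a collator for the requested locale.
        collator = icu.Collator.createInstance(icu.Locale(locale_name))
        return collator.getSortKey
    # Fallback: the standard library, using the currently active locale
    # (locale_name is ignored in this branch).
    return locale.strxfrm


sort_key = make_sort_key()
print(sorted(['b', 'a', 'c'], key=sort_key))
```

Code that calls ``sort_key`` does not need to know which library produced it, which is the point of hiding the choice behind a wrapper.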
947 Let me tell you, this little complication really makes a challenge of testing
948 the code, since one must set up different environments on different operating
949 systems in order to test all possible code paths. Not to mention that
950 certain checks *will* fail for certain operating systems and environments
951 so one must be diligent in either writing the tests not to fail, or ignoring
952 those tests when on offending environments.
953
954 Handling Locale-Aware Numbers
955 +++++++++++++++++++++++++++++
956
957 `Thousands separator support`_ is a problem that I knew would someday be
958 requested but had decided to push off until a rainy day. One day it finally
959 rained, and I decided to tackle the problem.
960
961 So what is the problem? Consider the number ``1,234,567`` (assuming the
962 ``','`` is the thousands separator). Try to run that through :func:`int`
963 and you will get a :exc:`ValueError`. To handle this properly the thousands
964 separators must be removed.
965
966 .. code-block:: python
967
968 >>> float('1,234,567'.replace(',', ''))
969 1234567.0
970
971 What if, in our current locale, the thousands separator is ``'.'`` and
972 the ``','`` is the decimal separator (like for the German locale *de_DE*)?
973
974 .. code-block:: python
975
976 >>> float('1.234.567'.replace('.', '').replace(',', '.'))
977 1234567.0
978 >>> float('1.234.567,89'.replace('.', '').replace(',', '.'))
979 1234567.89
980
981 This is pretty much what :func:`locale.atoi` and :func:`locale.atof` do
982 under the hood. So what's the problem? Why doesn't :mod:`natsort` just
983 use this method under its hood?
984 Well, let's take a look at what would happen if we send some possible
985 :mod:`natsort` input through the above function:
986
987 .. code-block:: python
988
989 >>> natsort_key('1,234 apples, please.'.replace(',', ''))
990 ('', 1234, ' apples please.')
991 >>> natsort_key('Sir, €1.234,50 please.'.replace('.', '').replace(',', '.'), as_float=True)
992 ('Sir. €', 1234.5, ' please')
993
994 Any character matching the thousands separator was dropped, and anything
995 matching the decimal separator was changed to ``'.'``! If these characters
996 were critical to how your data was ordered, this would break :mod:`natsort`.
997
998 The first solution one might consider would be to first decompose the
999 input into sub-components (like we did for the *GROUPLETTERS* method
1000 above) and then only apply these transformations on the number components.
1001 This is a chicken-and-egg problem, though, because *we cannot appropriately
1002 separate out the numbers because of the thousands separators and
1003 non-'.' decimal separators* (well, at least not without making multiple
1004 passes over the data which I do not consider to be a valid option).
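To make the chicken-and-egg problem concrete, here is a small sketch of what a plain digit-based split does to a German-formatted number. The helper ``naive_split`` is hypothetical and is much simpler than the splitting that :mod:`natsort` really performs.

```python
import re


def naive_split(text):
    """Split on runs of ASCII digits, converting each run to int."""
    parts = re.split(r"([0-9]+)", text)
    # With one capture group, odd indices hold the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


pieces = naive_split("€1.234,50")
```

Here ``pieces`` contains three separate numbers (``1``, ``234`` and ``50``) with the separators sitting between them as strings, so there is no longer a single number component to which the separator handling could be applied.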
1005
1006 Regular expressions to the rescue! With regular expressions, we can
1007 remove the thousands separators and change the decimal separator only
1008 when they are actually within a number. Once the input has been
1009 pre-processed with this regular expression, all the infrastructure
1010 shown previously will work.
1011
1012 Beware, these regular expressions will make your eyes bleed.
1013
1014 .. code-block:: python
1015
1016 >>> decimal = ',' # Assume German locale, so decimal separator is ','
1017 >>> # Look-behind assertions cannot accept range modifiers, so instead of e.g.
1018 >>> # (?<!\.[0-9]{1,3}) I have to repeat the look-behind for 1, 2, and 3.
1019 >>> nodecimal = r'(?<!{dec}[0-9])(?<!{dec}[0-9]{{2}})(?<!{dec}[0-9]{{3}})'.format(dec=decimal)
1020 >>> strip_thousands = r'''
1021 ... (?<=[0-9]{{1}}) # At least 1 number
1022 ... (?<![0-9]{{4}}) # No more than 3 numbers
1023 ... {nodecimal} # Cannot follow decimal
1024 ... {thou} # The thousands separator
1025 ... (?=[0-9]{{3}} # Three numbers must follow
1026 ... ([^0-9]|$) # But a non-number after that
1027 ... )
1028 ... '''.format(nodecimal=nodecimal, thou='.') # Thousands separator is '.' in German locale.
1029 ...
1030 >>> re.sub(strip_thousands, '', 'Sir, €1.234,50 please.', flags=re.X)
1031 'Sir, €1234,50 please.'
1032 >>>
1033 >>> # The decimal separator must be preceded or followed by
1034 >>> # a number. This substitution only needs to be performed in the
1035 >>> # case when the decimal separator for the locale is not '.'.
1036 >>> switch_decimal = r'(?<=[0-9]){decimal}|{decimal}(?=[0-9])'
1037 >>> switch_decimal = switch_decimal.format(decimal=decimal)
1038 >>> re.sub(switch_decimal, '.', 'Sir, €1234,50 please.', flags=re.X)
1039 'Sir, €1234.50 please.'
1040 >>>
1041 >>> natsort_key('Sir, €1234.50 please.', as_float=True)
1042 ('Sir, €', 1234.5, ' please.')
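The two substitutions above can be bundled into one reusable pre-processing function. The following is a sketch under the assumption that the separators are single characters supplied by the caller; ``build_number_normalizer`` is a hypothetical name, not a :mod:`natsort` function.

```python
import re


def build_number_normalizer(thousands, decimal):
    """Return a function that rewrites localized numbers into plain ones."""
    thou = re.escape(thousands)
    dec = re.escape(decimal)
    # Look-behinds must be fixed-width, so repeat one for 1, 2 and 3 digits.
    nodecimal = "".join(
        r"(?<!{}[0-9]{{{}}})".format(dec, n) for n in (1, 2, 3)
    )
    strip_thousands = re.compile(
        r"(?<=[0-9])(?<![0-9]{4})"  # 1 to 3 digits before the separator
        + nodecimal                 # ...that are not decimal digits
        + thou                      # the thousands separator itself
        + r"(?=[0-9]{3}([^0-9]|$))" # exactly three digits must follow
    )
    switch_decimal = re.compile(
        r"(?<=[0-9]){d}|{d}(?=[0-9])".format(d=dec)
    )

    def normalize(text):
        text = strip_thousands.sub("", text)
        if decimal != ".":
            text = switch_decimal.sub(".", text)
        return text

    return normalize


german = build_number_normalizer(thousands=".", decimal=",")
english = build_number_normalizer(thousands=",", decimal=".")
```

With these, ``german('Sir, €1.234,50 please.')`` gives ``'Sir, €1234.50 please.'`` while leaving the comma after "Sir" and the final period untouched.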
1043
1044 Final Thoughts
1045 --------------
1046
1047 My hope is that users of :mod:`natsort` never have to think about or worry
1048 about all the bookkeeping or any of the details described above, and that using
1049 :mod:`natsort` seems to magically "just work". For those of you who
1050 took the time to read this engineering description, I hope it has enlightened
1051 you to some of the issues that can be encountered when code is released
1052 into the wild and has to accept "real-world data", or to what happens
1053 to developers who naïvely make bold assumptions that are counter to
1054 what the rest of the world assumes.
1055
1056 .. rubric:: Footnotes
1057
1058 .. [#f1]
1059 To anyone looking through the actual code, you will note that I don't
1060 actually use :mod:`pathlib` to split the paths... I wrote my own version
1061 to avoid adding an external dependency of :mod:`pathlib` on Python < 3.4.
1062 .. [#f2]
1063 *"But if you hadn't removed the leading empty string from re.split this
1064 wouldn't have happened!!"* I can hear you saying. Well, that's true. I don't
1065 have a *great* reason for having done that except that in an earlier
1066 non-optimal incarnation of the algorithm I needed to do it, and it kind of
1067 stuck, and it made other parts of the code easier if the assumption that
1068 there were no empty strings was valid.
1069 .. [#f3]
1070 I'm not going to show how this is implemented in this document,
1071 but if you are interested you can look at the code to
1072 :func:`sep_inserter` in `util.py`_.
1073 .. [#f4]
1074 Handling each of these is straightforward, but coupled with the rapidly
1075 fracturing execution paths presented in :ref:`TL;DR 2 <tldr2>` one can imagine
1076 this will get out of hand quickly. If you take a look at `natsort.py`_ and
1077 `util.py`_ you can observe that to avoid this I take a more functional approach
1078 to constructing the :mod:`natsort` algorithm as opposed to the procedural approach
1079 illustrated in :ref:`TL;DR 1 <tldr1>` and :ref:`TL;DR 2 <tldr2>`.
1080
1081 .. _ASCII table: http://www.asciitable.com/
1082 .. _getting sorting right is surprisingly hard: http://www.compciv.org/guides/python/fundamentals/sorting-collections-with-sorted/
1083 .. _This astonished: https://github.com/SethMMorton/natsort/issues/19
1084 .. _a lot: http://stackoverflow.com/questions/29548742/python-natsort-sort-strings-recursively
1085 .. _of people: http://stackoverflow.com/questions/24045348/sort-set-of-numbers-in-the-form-xx-yy-in-python
1086 .. _and some people aren't very nice when they are astonished:
1087 https://github.com/xolox/python-naturalsort/blob/ed3e6b6ffaca3bdea3b76e08acbb8bd2a5fee463/README.rst#why-another-natsort-module
1088 .. _fastnumbers: https://github.com/SethMMorton/fastnumbers
1089 .. _as part of my testing: https://github.com/SethMMorton/natsort/blob/master/test_natsort/slow_splitters.py
1090 .. _this one for coercion: http://stackoverflow.com/questions/736043/checking-if-a-string-can-be-converted-to-float-in-python
1091 .. _this one for checking: http://stackoverflow.com/questions/354038/how-do-i-check-if-a-string-is-a-number-float-in-python
1092 .. _most natural sort solutions for python on Stack Overflow: http://stackoverflow.com/q/4836710/1399279
1093 .. _80%/20%: https://en.wikipedia.org/wiki/Pareto_principle
1094 .. _The first major special case I encountered was sorting filesystem paths: https://github.com/SethMMorton/natsort/issues/3
1095 .. _The second major special case I encountered was sorting of different types: https://github.com/SethMMorton/natsort/issues/7
1096 .. _A rather unexpected special case I encountered was sorting collections containing NaN:
1097 https://github.com/SethMMorton/natsort/issues/27
1098 .. _Path.parts: https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.parts
1099 .. _Path.suffixes: https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.suffixes
1100 .. _Path.stem: https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.stem
1101 .. _It's hard to compare floating point numbers: http://www.drdobbs.com/cpp/its-hard-to-compare-floating-point-numbe/240149806
1102 .. _caught a bit off guard when the request was initially made: https://github.com/SethMMorton/natsort/issues/14
1103 .. _at the code: https://github.com/SethMMorton/natsort/tree/master/natsort
1104 .. _natsort.py: https://github.com/SethMMorton/natsort/blob/master/natsort/natsort.py
1105 .. _util.py: https://github.com/SethMMorton/natsort/blob/master/natsort/util.py
1106 .. _although they do point out in the documentation that it will be painful to use:
1107 https://docs.python.org/3/library/locale.html#background-details-hints-tips-and-caveats
1108 .. _natsort.compat.locale.py: https://github.com/SethMMorton/natsort/blob/master/natsort/compat/locale.py
1109 .. _Thousands separator support: https://github.com/SethMMorton/natsort/issues/36
1110 .. _really good: https://hypothesis.readthedocs.io/en/latest/
1111 .. _testing strategy: http://doc.pytest.org/en/latest/
1112 .. _check out some official Unicode documentation: http://unicode.org/reports/tr15/
+0
-8
docs/source/humansorted.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :func:`~natsort.humansorted`
4 ============================
5
6 .. autofunction:: humansorted
7
+0
-28
docs/source/index.rst
0 .. natsort documentation master file, created by
1 sphinx-quickstart on Thu Jul 17 21:01:29 2014.
2 You can adapt this file completely to your liking, but it should at least
3 contain the root `toctree` directive.
4
5 natsort: Simple yet flexible natural sorting in Python.
6 =======================================================
7
8 Contents:
9
10 .. toctree::
11 :maxdepth: 2
12 :numbered:
13
14 intro.rst
15 howitworks.rst
16 examples.rst
17 api.rst
18 shell.rst
19 changelog.rst
20
21 Indices and tables
22 ==================
23
24 * :ref:`genindex`
25 * :ref:`modindex`
26 * :ref:`search`
27
+0
-8
docs/source/index_humansorted.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :func:`~natsort.index_humansorted`
4 ==================================
5
6 .. autofunction:: index_humansorted
7
+0
-8
docs/source/index_natsorted.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :func:`~natsort.index_natsorted`
4 ================================
5
6 .. autofunction:: index_natsorted
7
+0
-8
docs/source/index_realsorted.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :func:`~natsort.index_realsorted`
4 =================================
5
6 .. autofunction:: index_realsorted
7
+0
-8
docs/source/index_versorted.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :func:`~natsort.index_versorted`
4 ================================
5
6 .. autofunction:: index_versorted
7
+0
-397
docs/source/intro.rst
0 .. default-domain:: py
1 .. module:: natsort
2
3 The :mod:`natsort` module
4 =========================
5
6 Simple yet flexible natural sorting in Python.
7
8 - Source Code: https://github.com/SethMMorton/natsort
9 - Downloads: https://pypi.org/project/natsort/
10 - Documentation: http://natsort.readthedocs.io/
11 - Optional Dependencies:
12
13 - `fastnumbers <https://pypi.org/project/fastnumbers>`_ >= 2.0.0
14 - `PyICU <https://pypi.org/project/PyICU>`_ >= 1.0.0
15
16 :mod:`natsort` is a general utility for sorting lists *naturally*; the definition
17 of "naturally" is not well-defined, but the most common definition is that numbers
18 contained within the string should be sorted as numbers and not as you would
19 other characters. If you need to present sorted output to a user, you probably
20 want to sort it naturally.
21
22 :mod:`natsort` was initially created for sorting scientific output filenames that
23 contained signed floating point numbers in the names. There was a lack of
24 algorithms out there that could perform a natural sort on `floats` but
25 plenty for `ints`; check out
26 `this StackOverflow question <http://stackoverflow.com/q/4836710/1399279>`_
27 and its answers and links therein,
28 `this ActiveState forum <http://code.activestate.com/recipes/285264-natural-string-sorting/>`_,
29 and of course `this great article on natural sorting <http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/>`_
30 from CodingHorror.com for examples of what I mean.
31 :mod:`natsort` was created to fill in this gap, but has since expanded to handle
32 just about any definition of a number, as well as other sorting customizations.
33
34 Quick Description
35 -----------------
36
37 When you try to sort a list of strings that contain numbers, the normal python
38 sort algorithm sorts lexicographically, so you might not get the results that you
39 expect:
40
41 .. code-block:: python
42
43 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
44 >>> sorted(a)
45 ['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in']
46
47 Notice that it has the order ('1', '10', '2') - this is because the list is
48 being sorted in lexicographical order, which sorts numbers like you would
49 letters (i.e. 'b', 'ba', 'c').
50
51 :mod:`natsort` provides a function :func:`~natsorted` that helps sort lists
52 "naturally" ("naturally" is rather ill-defined, but in general it means
53 sorting based on meaning and not computer code point).
54 Using :func:`~natsorted` is simple:
55
56 .. code-block:: python
57
58 >>> from natsort import natsorted
59 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
60 >>> natsorted(a)
61 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
62
63 :func:`~natsorted` identifies numbers anywhere in a string and sorts them
64 naturally. Below are some other things you can do with :mod:`natsort`
65 (please see the :ref:`examples` for a quick start guide, or the :ref:`api`
66 for more details).
67
68 .. note::
69
70 :func:`~natsorted` is designed to be a drop-in replacement for the built-in
71 :func:`sorted` function. Like :func:`sorted`, :func:`~natsorted`
72 `does not sort in-place`. To sort a list and assign the output to the
73 same variable, you must explicitly assign the output to a variable:
74
75 .. code-block:: python
76
77 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
78 >>> natsorted(a)
79 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
80 >>> print(a) # 'a' was not sorted; "natsorted" simply returned a sorted list
81 ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
82 >>> a = natsorted(a) # Now 'a' will be sorted because the sorted list was assigned to 'a'
83 >>> print(a)
84 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
85
86 Please see `Generating a Reusable Sorting Key and Sorting In-Place`_ for
87 an alternate way to sort in-place naturally.
88
89 Examples
90 --------
91
92 Sorting Versions
93 ++++++++++++++++
94
95 This is handled properly by default (as of :mod:`natsort` version >= 4.0.0):
96
97 .. code-block:: python
98
99 >>> a = ['version-1.9', 'version-2.0', 'version-1.11', 'version-1.10']
100 >>> natsorted(a)
101 ['version-1.9', 'version-1.10', 'version-1.11', 'version-2.0']
102
103 If you need to sort release candidates, please see :ref:`rc_sorting` for
104 a useful hack.
105
106 Sorting by Real Numbers (i.e. Signed Floats)
107 ++++++++++++++++++++++++++++++++++++++++++++
108
109 This is useful in scientific data analysis and was
110 the default behavior of :func:`~natsorted` for :mod:`natsort`
111 version < 4.0.0. Use the :func:`~realsorted` function:
112
113 .. code-block:: python
114
115 >>> from natsort import realsorted, ns
116 >>> # Note that when interpreting as signed floats, the below numbers are
117 >>> # +5.10, -3.00, +5.30, +2.00
118 >>> a = ['position5.10.data', 'position-3.data', 'position5.3.data', 'position2.data']
119 >>> natsorted(a)
120 ['position2.data', 'position5.3.data', 'position5.10.data', 'position-3.data']
121 >>> natsorted(a, alg=ns.REAL)
122 ['position-3.data', 'position2.data', 'position5.10.data', 'position5.3.data']
123 >>> realsorted(a) # shortcut for natsorted with alg=ns.REAL
124 ['position-3.data', 'position2.data', 'position5.10.data', 'position5.3.data']
125
126 Locale-Aware Sorting (or "Human Sorting")
127 +++++++++++++++++++++++++++++++++++++++++
128
129 This is where the non-numeric characters are ordered based on their meaning,
130 not on their ordinal value, and a locale-dependent thousands separator and decimal
131 separator is accounted for in the number.
132 This can be achieved with the :func:`~humansorted` function:
133
134 .. code-block:: python
135
136 >>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
137 >>> natsorted(a)
138 ['Apple', 'Banana', 'apple14,689', 'apple15', 'banana']
139 >>> import locale
140 >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
141 'en_US.UTF-8'
142 >>> natsorted(a, alg=ns.LOCALE)
143 ['apple15', 'apple14,689', 'Apple', 'banana', 'Banana']
144 >>> from natsort import humansorted
145 >>> humansorted(a)
146 ['apple15', 'apple14,689', 'Apple', 'banana', 'Banana']
147
148 You may find you need to explicitly set the locale to get this to work
149 (as shown in the example).
150 Please see :ref:`locale_issues` and the Installation section
151 below before using the :func:`~humansorted` function.
152
153 Further Customizing Natsort
154 +++++++++++++++++++++++++++
155
156 If you need to combine multiple algorithm modifiers (such as ``ns.REAL``,
157 ``ns.LOCALE``, and ``ns.IGNORECASE``), you can combine the options using the
158 bitwise OR operator (``|``). For example,
159
160 .. code-block:: python
161
162 >>> a = ['Apple', 'apple15', 'Banana', 'apple14,689', 'banana']
163 >>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE)
164 ['Apple', 'apple15', 'apple14,689', 'Banana', 'banana']
165 >>> # The ns enum provides long and short forms for each option.
166 >>> ns.LOCALE == ns.L
167 True
168 >>> # You can also customize the convenience functions, too.
169 >>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE) == realsorted(a, alg=ns.L | ns.IC)
170 True
171 >>> natsorted(a, alg=ns.REAL | ns.LOCALE | ns.IGNORECASE) == humansorted(a, alg=ns.R | ns.IC)
172 True
173
174 All of the available customizations can be found in the documentation for
175 the :class:`~natsort.ns` enum.
176
177 You can also add your own custom transformation functions with the ``key`` argument.
178 These can be used with ``alg`` if you wish:
179
180 .. code-block:: python
181
182 >>> a = ['apple2.50', '2.3apple']
183 >>> natsorted(a, key=lambda x: x.replace('apple', ''), alg=ns.REAL)
184 ['2.3apple', 'apple2.50']
185
186 Sorting Mixed Types
187 +++++++++++++++++++
188
189 You can mix and match ``int``, ``float``, and ``str`` (or ``unicode``) types
190 when you sort:
191
192 .. code-block:: python
193
194 >>> a = ['4.5', 6, 2.0, '5', 'a']
195 >>> natsorted(a)
196 [2.0, '4.5', '5', 6, 'a']
197 >>> # On Python 2, sorted(a) would return [2.0, 6, '4.5', '5', 'a']
198 >>> # On Python 3, sorted(a) would raise an "unorderable types" TypeError
199
200 Handling Bytes on Python 3
201 ++++++++++++++++++++++++++
202
203 :mod:`natsort` does not officially support the `bytes` type on Python 3, but
204 convenience functions are provided that help you decode to `str` first:
205
206 .. code-block:: python
207
208 >>> from natsort import as_utf8
209 >>> a = [b'a', 14.0, 'b']
210 >>> # On Python 2, natsorted(a) would work as expected.
211 >>> # On Python 3, natsorted(a) would raise a TypeError (bytes() < str())
212 >>> natsorted(a, key=as_utf8) == [14.0, b'a', 'b']
213 True
214 >>> a = [b'a56', b'a5', b'a6', b'a40']
215 >>> # On Python 2, natsorted(a) would work as expected.
216 >>> # On Python 3, natsorted(a) would return the same results as sorted(a)
217 >>> natsorted(a, key=as_utf8) == [b'a5', b'a6', b'a40', b'a56']
218 True
219
220 Generating a Reusable Sorting Key and Sorting In-Place
221 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
222
223 Under the hood, :func:`~natsorted` works by generating a custom sorting
224 key using :func:`~natsort_keygen` and then passes that to the built-in
225 :func:`sorted`. You can use the :func:`~natsort_keygen` function yourself to
226 generate a custom sorting key to sort in-place using the :meth:`list.sort`
227 method.
228
229 .. code-block:: python
230
231 >>> from natsort import natsort_keygen
232 >>> natsort_key = natsort_keygen()
233 >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
234 >>> natsorted(a) == sorted(a, key=natsort_key)
235 True
236 >>> a.sort(key=natsort_key)
237 >>> a
238 ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
239
240 All of the algorithm customizations mentioned in the `Further Customizing Natsort`_
241 section can also be applied to :func:`~natsort_keygen` through the *alg* keyword option.
242
243 Other Useful Things
244 +++++++++++++++++++
245
246 - recursively descend into lists of lists
247 - automatic unicode normalization of input data
248 - controlling the case-sensitivity (see :ref:`case_sort`)
249 - sorting file paths correctly (see :ref:`path_sort`)
250 - allow custom sorting keys (see :ref:`custom_sort`)
251
252 FAQ
253 ---
254
255 How do I debug :func:`~natsorted`?
256 The best way to debug :func:`~natsorted` is to generate a key using :func:`~natsort_keygen`
257 with the same options being passed to :func:`~natsorted`. One can take a look at
258 exactly what is being done with their input using this key - it is highly recommended
259 to `look at this issue describing how to debug <https://github.com/SethMMorton/natsort/issues/13#issuecomment-50422375>`_
260 for *how* to debug, and also to review the
261 `How Does Natsort Work? <http://natsort.readthedocs.io/en/master/howitworks.html>`_
262 page for *why* :mod:`natsort` is doing that to your data.
263
264 If you are trying to sort custom classes and running into trouble, please take a look at
265 https://github.com/SethMMorton/natsort/issues/60. In short,
266 custom classes are not likely to be sorted correctly if one relies
267 on the behavior of ``__lt__`` and the other rich comparison operators in their
268 custom class - it is better to use a ``key`` function with :mod:`natsort`, or
269 use the :mod:`natsort` key as part of your rich comparison operator definition.
270
271 How *does* :mod:`natsort` work?
272 If you don't want to read `How Does Natsort Work? <http://natsort.readthedocs.io/en/master/howitworks.html>`_,
273 here is a quick primer.
274
275 :mod:`natsort` provides a `key function <https://docs.python.org/3/howto/sorting.html#key-functions>`_
276 that can be passed to `list.sort() <https://docs.python.org/3/library/stdtypes.html#list.sort>`_
277 or `sorted() <https://docs.python.org/3/library/functions.html#sorted>`_ in order to
278 modify the default sorting behavior. This key is generated on-demand with the
279 key generator :func:`natsort.natsort_keygen`. :func:`natsort.natsorted` is essentially
280 a wrapper for the following code:
281
282 .. code-block:: python
283
284 >>> from natsort import natsort_keygen
285 >>> natsort_key = natsort_keygen()
286 >>> sorted(['1', '10', '2'], key=natsort_key)
287 ['1', '2', '10']
288
289 Users can further customize :mod:`natsort` sorting behavior with the ``key``
290 and/or ``alg`` options (see details in the `Further Customizing Natsort`_
291 section).
292
293 The key generated by :func:`natsort.natsort_keygen` *always* returns a :class:`tuple`. It
294 does so in the following way (*some details omitted for clarity*):
295
296 1. Assume the input is a string, and attempt to split it into numbers and
297 non-numbers using regular expressions. Numbers are then converted into
298 either :class:`int` or :class:`float`.
299 2. If the above fails because the input is not a string, assume the input
300 is some other sequence (e.g. :class:`list` or :class:`tuple`), and recursively
301 apply the key to each element of the sequence.
302 3. If the above fails because the input is not iterable, assume the input
303 is an :class:`int` or :class:`float`, and just return the input in a :class:`tuple`.
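The three steps above can be sketched in a few lines. This is a deliberately simplified stand-in for the real key: it only recognises unsigned integers, and ``simple_natural_key`` is a made-up name used here for illustration.

```python
import re

_INT_RE = re.compile(r"([0-9]+)")


def simple_natural_key(value):
    """Simplified natural-sort key that always returns a tuple."""
    try:
        # Step 1: assume a string and split into non-numbers and numbers.
        parts = _INT_RE.split(value)
    except TypeError:
        try:
            # Step 2: not a string, so assume a sequence and recurse.
            return tuple(simple_natural_key(item) for item in value)
        except TypeError:
            # Step 3: not iterable, so assume a number and wrap it.
            return (value,)
    # With one capture group, odd indices hold the digit runs.
    key = [int(p) if i % 2 else p for i, p in enumerate(parts)]
    if key and key[-1] == "":
        key.pop()  # drop the empty string left after a trailing number
    return tuple(key)


ordered = sorted(["1", "10", "2"], key=simple_natural_key)
```

Keeping the leading empty string means strings and numbers always occupy alternating positions in the key, so two string inputs never end up comparing a :class:`str` against an :class:`int`. Unlike the real key, this sketch makes no attempt to let bare numbers sort against strings.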
304
305 Because a :class:`tuple` is always returned, a :exc:`TypeError` should not be common
306 unless one tries to do something odd like sort an :class:`int` against a :class:`list`.
307
308 :mod:`natsort` gave me results I didn't expect, and it's a terrible library!
309 Did you try to debug using the above advice? If so, and you still cannot figure out
310 the error, then please `file an issue <https://github.com/SethMMorton/natsort/issues/new>`_.
311
312 Shell script
313 ------------
314
315 :mod:`natsort` comes with a shell script called :mod:`natsort`, or can also be called
316 from the command line with ``python -m natsort``.
317
318 Requirements
319 ------------
320
321 :mod:`natsort` requires Python version 2.6 or greater or Python 3.3 or greater.
322 It may run on (but is not tested against) Python 3.2.
323
324 Optional Dependencies
325 ---------------------
326
327 fastnumbers
328 +++++++++++
329
330 The most efficient sorting can occur if you install the
331 `fastnumbers <https://pypi.org/project/fastnumbers>`_ package
332 (version >=2.0.0); it helps with the string to number conversions.
333 :mod:`natsort` will still run (efficiently) without the package, but if you need
334 to squeeze out that extra juice it is recommended you include this as a dependency.
335 :mod:`natsort` will not require (or check) that
336 `fastnumbers <https://pypi.org/project/fastnumbers>`_ is installed
337 at installation.
338
339 PyICU
340 +++++
341
342 It is recommended that you install `PyICU <https://pypi.org/project/PyICU>`_
343 if you wish to sort in a locale-dependent manner; see
344 http://natsort.readthedocs.io/en/master/locale_issues.html for an explanation why.
345
346 Installation
347 ------------
348
349 Use ``pip``!
350
351 .. code-block:: sh
352
353 $ pip install natsort
354
355 If you want to install the `Optional Dependencies`_, you can use the
356 `"extras" notation <https://packaging.python.org/tutorials/installing-packages/#installing-setuptools-extras>`_
357 at installation time to install those dependencies as well - use ``fast`` for
358 `fastnumbers <https://pypi.org/project/fastnumbers>`_ and ``icu`` for
359 `PyICU <https://pypi.org/project/PyICU>`_.
360
361 .. code-block:: sh
362
363 # Install both optional dependencies.
364 $ pip install natsort[fast,icu]
365 # Install just fastnumbers
366 $ pip install natsort[fast]
367
368 How to Run Tests
369 ----------------
370
371 Please note that :mod:`natsort` is NOT set-up to support ``python setup.py test``.
372
373 The recommended way to run tests is with `tox <https://tox.readthedocs.io/en/latest/>`_.
374 After installing ``tox``, running tests is as simple as executing the following in the
375 ``natsort`` directory:
376
377 .. code-block:: sh
378
379 $ tox
380
381 ``tox`` will create a virtual environment for your tests and install all the
382 needed testing requirements for you. You can specify a particular python version
383 with the ``-e`` flag, e.g. ``tox -e py36``.
384
385 If you do not wish to use ``tox``, you can install the testing dependencies and run the
386 tests manually using `pytest <https://docs.pytest.org/en/latest/>`_ - ``natsort``
387 contains a ``Pipfile`` for use with `pipenv <https://github.com/pypa/pipenv>`_ that
388 makes it easy for you to install the testing dependencies:
389
390 .. code-block:: sh
391
392 $ pipenv install --skip-lock --dev
393 $ pipenv run python -m pytest
394
395 Note that above I invoked ``python -m pytest`` instead of just ``pytest`` - this is because
396 `the former puts the CWD on sys.path <https://docs.pytest.org/en/latest/usage.html#calling-pytest-through-python-m-pytest>`_.
+0
-96
docs/source/locale_issues.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _locale_issues:
4
5 Possible Issues with :func:`~natsort.humansorted` or ``ns.LOCALE``
6 ==================================================================
7
8 Being Locale-Aware Means Both Numbers and Non-Numbers
9 -----------------------------------------------------
10
11 In addition to modifying how characters are sorted, ``ns.LOCALE`` will take into
12 account locale-dependent thousands separators (and locale-dependent decimal
13 separators if ``ns.FLOAT`` is enabled). This means that if you are in a
14 locale that uses commas as the thousands separator, a number like
15 ``123,456`` will be interpreted as ``123456``. If this is not what you want,
16 you may consider using ``ns.LOCALEALPHA`` which will only enable locale-aware
17 sorting for non-numbers (similarly, ``ns.LOCALENUM`` enables locale-aware
18 sorting only for numbers).
19
20 Regenerate Key With :func:`~natsort.natsort_keygen` After Changing Locale
21 -------------------------------------------------------------------------
22
23 When :func:`~natsort.natsort_keygen` is called it returns a key function that
24 hard-codes the provided settings. This means that the key returned when
25 ``ns.LOCALE`` is used contains the settings specified by the locale
26 *loaded at the time the key is generated*. If you change the locale,
27 you should regenerate the key to account for the new locale.
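The effect of hard-coding settings at generation time can be illustrated with a small closure. This is only an analogy built on the standard library, assuming nothing about how :func:`~natsort.natsort_keygen` is implemented; ``make_number_parser`` is a hypothetical helper.

```python
import locale


def make_number_parser():
    """Return a parser that remembers the separators of the current locale."""
    conv = locale.localeconv()
    # These values are captured once, when the parser is created.
    thousands = conv["thousands_sep"]
    decimal = conv["decimal_point"]

    def parse(text):
        if thousands:
            text = text.replace(thousands, "")
        return float(text.replace(decimal, "."))

    return parse


parse = make_number_parser()
value = parse("1234.5")
```

If the locale is changed after ``parse`` has been created, ``parse`` keeps using the separators it captured, so a fresh parser has to be generated to pick up the new locale.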
28
29 Corollary: Do Not Reuse :func:`~natsort.natsort_keygen` After Changing Locale
30 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
31
32 If you change locale, the old function will not work as expected.
33 The `locale <https://docs.python.org/3.5/library/locale.html>`_ library works
34 with a global state. When :func:`~natsort.natsort_keygen` is called it does the
35 best job that it can to make the returned function as static as possible and
36 independent of the global state, but the
37 `strxfrm <https://docs.python.org/3.5/library/locale.html#locale.strxfrm>`_
38 function must access this global state to work; therefore, if you change
39 locale and use ``ns.LOCALE`` then you should discard the old key.
40
41 .. note:: If you use `PyICU <https://pypi.python.org/pypi/PyICU>`_ then you
42 may be able to reuse keys after changing locale.
43
44 The `locale <https://docs.python.org/3.5/library/locale.html>`_ Module From the StdLib Has Issues
45 -------------------------------------------------------------------------------------------------
46
47 :mod:`natsort` will use `PyICU <https://pypi.org/project/PyICU>`_ for
48 :func:`~natsort.humansorted` or ``ns.LOCALE`` if it is installed. If not,
49 it will fall back on the `locale <https://docs.python.org/3.5/library/locale.html>`_
50 library from the Python stdlib. If you do not have
51 `PyICU <https://pypi.org/project/PyICU>`_ installed, please keep the
52 following known problems and issues in mind.
53
54 .. note:: Remember, if you have `PyICU <https://pypi.org/project/PyICU>`_
55 installed you shouldn't need to worry about any of these.
56
57 Explicitly Set the Locale Before Using :func:`~natsort.humansorted` or ``ns.LOCALE``
58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
59
60 I have found that unless you explicitly set a locale, the sorted order may not
61 be what you expect. Setting this is straightforward
62 (in the below example I use 'en_US.UTF-8', but you should use your
63 locale)::
64
65 >>> import locale
66 >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
67 'en_US.UTF-8'
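Because the locale is global state, it can help to set it only for the duration of a sort and then restore it. The following sketch uses only the standard library; ``sort_with_locale`` is a made-up helper, and the locale name you pass must exist on your system.

```python
import locale


def sort_with_locale(items, locale_name):
    """Sort strings under the given collation locale, then restore the old one."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(items, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


c_sorted = sort_with_locale(["b", "a", "C"], "C")
```

Under the ``'C'`` locale the result is plain code-point order, so the uppercase letter sorts first; with a locale such as ``'en_US.UTF-8'`` the order depends on the platform's collation tables.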
68
69 .. _bug_note:
70
71 The `locale <https://docs.python.org/3.5/library/locale.html>`_ Module Is Broken on Mac OS X
72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
73
74 It's not Python's fault, but the OS... the locale library for BSD-based systems
75 (of which Mac OS X is one) is broken. See the following links:
76
77 - http://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
78 - http://bugs.python.org/issue23195
79 - https://github.com/SethMMorton/natsort/issues/21 (contains instructions on installing)
80 - http://stackoverflow.com/questions/33459384/unicode-character-not-in-range-when-calling-locale-strxfrm
81 - https://github.com/SethMMorton/natsort/issues/34
82
83 Of course, installing `PyICU <https://pypi.org/project/PyICU>`_ fixes this,
84 but if you don't want to or cannot install this there is some hope.
85
86 1. As of ``natsort`` version 4.0.0, ``natsort`` is configured
87 to compensate for a broken ``locale`` library. When sorting non-numbers
88 it will handle case as you expect, but it will still not be able to
89 comprehend non-ASCII characters properly. Additionally, it has
90 a built-in lookup table of thousands separators that are incorrect
91 on OS X/BSD (but it is possible that the table is incomplete; please file an
92 issue if you find a missing entry).
93 2. Use "\*.ISO8859-1" locale (i.e. 'en_US.ISO8859-1') rather than "\*.UTF-8"
94 locale. I have found that these have fewer issues than "UTF-8", but
95 your mileage may vary.
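
The advice above (set a locale explicitly, and prefer the ISO8859-1 variant when UTF-8 misbehaves) can be sketched with the standard-library ``locale`` module alone. The helper name ``set_first_available_locale`` and the candidate list are assumptions made for illustration; they are not part of ``natsort``. The portable "C" locale is listed last so that the sketch still runs on systems without ``en_US`` installed.

```python
import locale


def set_first_available_locale(candidates):
    """Try each locale name in order; return the first one that can be set."""
    for name in candidates:
        try:
            locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            continue  # Not installed on this system; try the next candidate.
        return name
    return None


# Hypothetical preference order: ISO8859-1 first, then UTF-8, then "C".
chosen = set_first_available_locale(["en_US.ISO8859-1", "en_US.UTF-8", "C"])

words = ["banana", "Apple", "cherry"]
# locale.strxfrm builds keys that compare according to the active LC_COLLATE.
locale_sorted = sorted(words, key=locale.strxfrm)
print(chosen, locale_sorted)
```

The resulting order depends on which locale was actually set, so no expected output is shown here.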
+0
-8
docs/source/natsort_key.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :func:`~natsort.natsort_key`
4 ============================
5
6 .. autofunction:: natsort_key
7
+0
-8
docs/source/natsort_keygen.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :func:`~natsort.natsort_keygen`
4 ===============================
5
6 .. autofunction:: natsort_keygen
7
+0
-8
docs/source/natsorted.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :func:`~natsort.natsorted`
4 ==========================
5
6 .. autofunction:: natsorted
7
+0
-8
docs/source/ns_class.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :class:`~natsort.ns`
4 ====================
5
6 .. autoclass:: ns
7
+0
-8
docs/source/order_by_index.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :func:`~natsort.order_by_index`
4 ===============================
5
6 .. autofunction:: order_by_index
7
+0
-8
docs/source/realsorted.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :func:`~natsort.realsorted`
4 ===========================
5
6 .. autofunction:: realsorted
7
+0
-147
docs/source/shell.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 .. _shell:
4
5 Shell Script
6 ============
7
8 The ``natsort`` shell script is automatically installed when you install
9 :mod:`natsort` with pip.
10
11 Below is the usage and some usage examples for the ``natsort`` shell script.
12
13 Usage
14 -----
15
16 ::
17
18 usage: natsort [-h] [--version] [-p] [-f LOW HIGH] [-F LOW HIGH] [-e EXCLUDE]
19 [-r] [-t {digit,int,float,version,ver}] [--nosign] [--noexp]
20 [--locale]
21 [entries [entries ...]]
22
23 Performs a natural sort on entries given on the command-line.
24 A natural sort sorts numerically then alphabetically, and will sort
25 by numbers in the middle of an entry.
26
27 positional arguments:
28 entries The entries to sort. Taken from stdin if nothing is
29 given on the command line.
30
31 optional arguments:
32 -h, --help show this help message and exit
33 --version show program's version number and exit
34 -p, --paths Interpret the input as file paths. This is not
35 strictly necessary to sort all file paths, but in
36 cases where there are OS-generated file paths like
37 "Folder/" and "Folder (1)/", this option is needed to
38 make the paths sorted in the order you expect
39 ("Folder/" before "Folder (1)/").
40 -f LOW HIGH, --filter LOW HIGH
41 Used for keeping only the entries that have a number
42 falling in the given range.
43 -F LOW HIGH, --reverse-filter LOW HIGH
44 Used for excluding the entries that have a number
45 falling in the given range.
46 -e EXCLUDE, --exclude EXCLUDE
47 Used to exclude an entry that contains a specific
48 number.
49 -r, --reverse Returns in reversed order.
50 -t {digit,int,float,version,ver,real,f,i,r,d},
51 --number-type {digit,int,float,version,ver,real,f,i,r,d},
52 --number_type {digit,int,float,version,ver,real,f,i,r,d}
53 Choose the type of number to search for. "float" will
54 search for floating-point numbers. "int" will only
55 search for integers. "digit", "version", and "ver" are
56 synonyms for "int"."real" is a shortcut for "float"
57 with --sign. "i" and "d" are synonyms for "int", "f"
58 is a synonym for "float", and "r" is a synonym for
59 "real". The default is int.
60 --nosign Do not consider "+" or "-" as part of a number, i.e.
61 do not take sign into consideration. This is the
62 default.
63 -s, --sign Consider "+" or "-" as part of a number, i.e. take
64 sign into consideration. The default is unsigned.
65 --noexp Do not consider an exponential as part of a number,
66 i.e. 1e4, would be considered as 1, "e", and 4, not as
67 10000. This only effects the --number-type=float.
68 -l, --locale Causes natsort to use locale-aware sorting. You will
69 get the best results if you install PyICU.
70
71 Description
72 -----------
73
74 ``natsort`` was originally written to aid in computational chemistry
75 research so that it would be easy to analyze large sets of output files
76 named after the parameter used::
77
78 $ ls *.out
79 mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out
80
81 (Obviously, in reality there would be more files, but you get the idea.) Notice
82 that the shell sorts in lexicographical order. This is the behavior of programs like
83 ``find`` as well as ``ls``. The problem is passing these files to an
84 analysis program causes them not to appear in numerical order, which can lead
85 to bad analysis. To remedy this, use ``natsort``::
86
87 $ natsort *.out
88 mode744.43.out
89 mode943.54.out
90 mode1000.35.out
91 mode1243.34.out
92 $ natsort -t r *.out | xargs your_program
93
94 ``-t r`` is short for ``--number-type real``. You can also place natsort in
95 the middle of a pipe::
96
97 $ find . -name "*.out" | natsort -t r | xargs your_program
98
99 To sort version numbers, use the default ``--number-type``::
100
101 $ ls *
102 prog-1.10.zip prog-1.9.zip prog-2.0.zip
103 $ natsort *
104 prog-1.9.zip
105 prog-1.10.zip
106 prog-2.0.zip
107
108 In general, all ``natsort`` shell script options mirror the :func:`~natsorted` API,
109 with the notable exception of the ``--filter``, ``--reverse-filter``, and ``--exclude``
110 options. These three options are used as follows::
111
112 $ ls *.out
113 mode1000.35.out mode1243.34.out mode744.43.out mode943.54.out
114 $ natsort -t r *.out -f 900 1100 # Select only numbers between 900-1100
115 mode943.54.out
116 mode1000.35.out
117 $ natsort -t r *.out -F 900 1100 # Select only numbers NOT between 900-1100
118 mode744.43.out
119 mode1243.34.out
120 $ natsort -t r *.out -e 1000.35 # Exclude 1000.35 from search
121 mode744.43.out
122 mode943.54.out
123 mode1243.34.out
124
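For readers who want the same selection from Python, the ``--filter`` and ``--reverse-filter`` behavior shown above can be approximated as follows. This is a minimal sketch and not the implementation used by the ``natsort`` script: the regular expression, the inclusive bounds, and the helper names ``numbers_in``, ``keep_in_range``, and ``drop_in_range`` are all assumptions.

```python
import re

# Assumption: a "number" is an optionally signed decimal such as 943.54.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def numbers_in(entry):
    """Return every number found in ``entry`` as a float."""
    return [float(match) for match in NUMBER.findall(entry)]


def keep_in_range(entries, low, high):
    """Keep entries that contain at least one number in [low, high]."""
    return [e for e in entries if any(low <= n <= high for n in numbers_in(e))]


def drop_in_range(entries, low, high):
    """Drop entries that contain at least one number in [low, high]."""
    return [e for e in entries if not any(low <= n <= high for n in numbers_in(e))]


files = ["mode1000.35.out", "mode1243.34.out", "mode744.43.out", "mode943.54.out"]
# Sort the survivors by the numbers they contain, smallest first.
kept = sorted(keep_in_range(files, 900, 1100), key=numbers_in)
dropped = sorted(drop_in_range(files, 900, 1100), key=numbers_in)
print(kept)     # ['mode943.54.out', 'mode1000.35.out']
print(dropped)  # ['mode744.43.out', 'mode1243.34.out']
```
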
125 If you are sorting paths with OS-generated filenames, you may require the
126 ``--paths``/``-p`` option::
127
128 $ find . ! -path . -type f
129 ./folder/file (1).txt
130 ./folder/file.txt
131 ./folder (1)/file.txt
132 ./folder (10)/file.txt
133 ./folder (2)/file.txt
134 $ find . ! -path . -type f | natsort
135 ./folder (1)/file.txt
136 ./folder (2)/file.txt
137 ./folder (10)/file.txt
138 ./folder/file (1).txt
139 ./folder/file.txt
140 $ find . ! -path . -type f | natsort -p
141 ./folder/file.txt
142 ./folder/file (1).txt
143 ./folder (1)/file.txt
144 ./folder (2)/file.txt
145 ./folder (10)/file.txt
146
docs/source/special_cases_everywhere.jpg
Binary diff not shown
+0
-8
docs/source/versorted.rst
0 .. default-domain:: py
1 .. currentmodule:: natsort
2
3 :func:`~natsort.versorted`
4 ==========================
5
6 .. autofunction:: versorted
7
1010 index_humansorted,
1111 index_natsorted,
1212 index_realsorted,
13 index_versorted,
1413 natsort_key,
1514 natsort_keygen,
1615 natsorted,
1716 ns,
1817 order_by_index,
1918 realsorted,
20 versorted,
2119 )
2220 from natsort.utils import chain_functions
2321
2422 if float(sys.version[:3]) < 3:
2523 from natsort.natsort import natcmp
2624
27 __version__ = "5.4.1"
25 __version__ = "6.0.0"
2826
2927 __all__ = [
3028 "natsort_key",
3129 "natsort_keygen",
3230 "natsorted",
33 "versorted",
3431 "humansorted",
3532 "realsorted",
3633 "index_natsorted",
37 "index_versorted",
3834 "index_humansorted",
3935 "index_realsorted",
4036 "order_by_index",
4743 ]
4844
4945 # Add the ns keys to this namespace for convenience.
50 # A dict comprehension is not used for Python 2.6 compatibility.
51 globals().update(dict((k, getattr(ns, k)) for k in dir(ns) if k.isupper()))
46 globals().update(ns._asdict())
2323 parser.add_argument(
2424 "--version",
2525 action="version",
26 version="%(prog)s {0}".format(natsort.__version__),
26 version="%(prog)s {}".format(natsort.__version__),
2727 )
2828 parser.add_argument(
2929 "-p",
7777 "--number-type",
7878 "--number_type",
7979 dest="number_type",
80 choices=("digit", "int", "float", "version", "ver", "real", "f", "i", "r", "d"),
80 choices=("int", "float", "real", "f", "i", "r"),
8181 default="int",
8282 help='Choose the type of number to search for. "float" will search '
8383 'for floating-point numbers. "int" will only search for '
84 'integers. "digit", "version", and "ver" are synonyms for "int".'
85 '"real" is a shortcut for "float" with --sign. '
86 '"i" and "d" are synonyms for "int", "f" is a synonym for '
84 'integers. "real" is a shortcut for "float" with --sign. '
85 '"i" is a synonym for "int", "f" is a synonym for '
8786 '"float", and "r" is a synonym for "real". '
8887 "The default is %(default)s.",
8988 )
66
77 # Std. lib imports.
88 import sys
9 from functools import cmp_to_key
910
1011 # Local imports.
11 from natsort.compat.py23 import PY_VERSION, cmp_to_key, py23_unichr
12 from natsort.compat.py23 import PY_VERSION, py23_unichr
1213
1314 # This string should be sorted after any other byte string because
1415 # it contains the max unicode character repeated 20 times.
5555 py23_map = itertools.imap
5656 py23_filter = itertools.ifilter
5757
58 # cmp_to_key was not created till 2.7, so require this for 2.6
59 try:
60 from functools import cmp_to_key
61 except ImportError: # pragma: no cover
62
63 def cmp_to_key(mycmp):
64 """Convert a cmp= function into a key= function"""
65
66 class K(object):
67 __slots__ = ["obj"]
68
69 def __init__(self, obj):
70 self.obj = obj
71
72 def __lt__(self, other):
73 return mycmp(self.obj, other.obj) < 0
74
75 def __gt__(self, other):
76 return mycmp(self.obj, other.obj) > 0
77
78 def __eq__(self, other):
79 return mycmp(self.obj, other.obj) == 0
80
81 def __le__(self, other):
82 return mycmp(self.obj, other.obj) <= 0
83
84 def __ge__(self, other):
85 return mycmp(self.obj, other.obj) >= 0
86
87 def __ne__(self, other):
88 return mycmp(self.obj, other.obj) != 0
89
90 def __hash__(self):
91 raise TypeError("hash not implemented")
92
93 return K
94
9558
9659 # This function is intended to decorate other functions that will modify
9760 # either a string directly, or a function's docstring.
1313 import natsort.compat.locale
1414 from natsort import utils
1515 from natsort.compat.py23 import py23_cmp, py23_str, u_format
16 from natsort.ns_enum import ns, ns_DUMB
16 from natsort.ns_enum import NS_DUMB, ns
1717
1818
1919 @u_format
107107
108108
109109 @u_format
110 def natsort_keygen(key=None, alg=ns.DEFAULT, **_kwargs):
110 def natsort_keygen(key=None, alg=ns.DEFAULT):
111111 """
112112 Generate a key to sort strings and numbers naturally.
113113
153153 [{u}'num-3', {u}'num2', {u}'num5.10', {u}'num5.3']
154154
155155 """
156 # Transform old arguments to the ns enum.
157156 try:
158 alg = utils.args_to_enum(**_kwargs) | alg
157 ns.DEFAULT | alg
159158 except TypeError:
160159 msg = "natsort_keygen: 'alg' argument must be from the enum 'ns'"
161 raise ValueError(msg + ", got {0}".format(py23_str(alg)))
162
163 # Add the _DUMB option if the locale library is broken.
160 raise ValueError(msg + ", got {}".format(py23_str(alg)))
161
162 # Add the NS_DUMB option if the locale library is broken.
164163 if alg & ns.LOCALEALPHA and natsort.compat.locale.dumb_sort():
165 alg |= ns_DUMB
164 alg |= NS_DUMB
166165
167166 # Set some variables that will be passed to the factory functions
168167 if alg & ns.NUMAFTER:
219218
220219
221220 @u_format
222 def natsorted(seq, key=None, reverse=False, alg=ns.DEFAULT, **_kwargs):
221 def natsorted(seq, key=None, reverse=False, alg=ns.DEFAULT):
223222 """
224223 Sorts an iterable naturally.
225224
263262 [{u}'num2', {u}'num3', {u}'num5']
264263
265264 """
266 key = natsort_keygen(key, alg, **_kwargs)
265 key = natsort_keygen(key, alg)
267266 return sorted(seq, reverse=reverse, key=key)
268
269
270 @u_format
271 def versorted(seq, key=None, reverse=False, alg=ns.DEFAULT, **_kwargs):
272 """
273 Identical to :func:`natsorted`.
274
275 This function exists for backwards compatibility with `natsort`
276 version < 4.0.0. Future development should use :func:`natsorted`.
277
278 See Also
279 --------
280 natsorted
281
282 """
283 return natsorted(seq, key, reverse, alg, **_kwargs)
284267
285268
286269 @u_format
391374
392375
393376 @u_format
394 def index_natsorted(seq, key=None, reverse=False, alg=ns.DEFAULT, **_kwargs):
377 def index_natsorted(seq, key=None, reverse=False, alg=ns.DEFAULT):
395378 """
396379 Determine the list of the indexes used to sort the input sequence.
397380
456439
457440 # Pair the index and sequence together, then sort by element
458441 index_seq_pair = [[x, y] for x, y in enumerate(seq)]
459 index_seq_pair.sort(reverse=reverse, key=natsort_keygen(newkey, alg, **_kwargs))
442 index_seq_pair.sort(reverse=reverse, key=natsort_keygen(newkey, alg))
460443 return [x for x, _ in index_seq_pair]
461
462
463 @u_format
464 def index_versorted(seq, key=None, reverse=False, alg=ns.DEFAULT, **_kwargs):
465 """
466 Identical to :func:`index_natsorted`.
467
468 This function exists for backwards compatibility with
469 ``index_natsort`` version < 4.0.0. Future development should use
470 :func:`index_natsorted`.
471
472 Please see the :func:`index_natsorted` documentation for use.
473
474 See Also
475 --------
476 index_natsorted
477
478 """
479 return index_natsorted(seq, key, reverse, alg, **_kwargs)
480444
481445
482446 @u_format
677641
678642 cached_keys = {}
679643
680 def __new__(cls, x, y, alg=ns.DEFAULT, *args, **kwargs):
644 def __new__(cls, x, y, alg=ns.DEFAULT):
681645 try:
682 alg = utils.args_to_enum(**kwargs) | alg
646 ns.DEFAULT | alg
683647 except TypeError:
684 msg = "natsort_keygen: 'alg' argument must be " "from the enum 'ns'"
685 raise ValueError(msg + ", got {0}".format(py23_str(alg)))
648 msg = "natsort_keygen: 'alg' argument must be from the enum 'ns'"
649 raise ValueError(msg + ", got {}".format(py23_str(alg)))
686650
687651 # Add the _DUMB option if the locale library is broken.
688652 if alg & ns.LOCALEALPHA and natsort.compat.locale.dumb_sort():
689 alg |= ns_DUMB
653 alg |= NS_DUMB
690654
691655 if alg not in cls.cached_keys:
692656 cls.cached_keys[alg] = natsort_keygen(alg=alg)
55 from __future__ import absolute_import, division, print_function, unicode_literals
66
77 import collections
8
9 # NOTE: OrderedDict is not used below for compatibility with Python 2.6.
108
119 # The below are the base ns options. The values will be stored as powers
1210 # of two so bitmasks can be used to extract the user's requested options.
2725 ]
2826
2927 # Following were previously options but are now defaults.
30 enum_do_nothing = ["DEFAULT", "TYPESAFE", "INT", "VERSION", "DIGIT", "UNSIGNED"]
28 enum_do_nothing = ["DEFAULT", "INT", "UNSIGNED"]
3129
3230 # The following are bitwise-OR combinations of other fields.
3331 enum_combos = [("REAL", ("FLOAT", "SIGNED")), ("LOCALE", ("LOCALEALPHA", "LOCALENUM"))]
3432
3533 # The following are aliases for other fields.
3634 enum_aliases = [
37 ("T", "TYPESAFE"),
3835 ("I", "INT"),
39 ("V", "VERSION"),
40 ("D", "DIGIT"),
4136 ("U", "UNSIGNED"),
4237 ("F", "FLOAT"),
4338 ("S", "SIGNED"),
5954 ]
6055
6156 # Construct the list of bitwise distinct enums with their fields.
62 enum_fields = [(name, 1 << i) for i, name in enumerate(enum_options)]
63 enum_fields.extend((name, 0) for name in enum_do_nothing)
57 enum_fields = collections.OrderedDict(
58 (name, 1 << i) for i, name in enumerate(enum_options)
59 )
60 enum_fields.update((name, 0) for name in enum_do_nothing)
6461
6562 for name, combo in enum_combos:
66 current_mapping = dict(enum_fields)
67 combined_value = current_mapping[combo[0]]
63 combined_value = enum_fields[combo[0]]
6864 for combo_name in combo[1:]:
69 combined_value |= current_mapping[combo_name]
70 enum_fields.append((name, combined_value))
65 combined_value |= enum_fields[combo_name]
66 enum_fields[name] = combined_value
7167
72 current_mapping = dict(enum_fields)
73 enum_fields.extend((alias, current_mapping[name]) for alias, name in enum_aliases)
74
75 # Finally, extract out the enum field names and their values.
76 enum_field_names, enum_field_values = zip(*enum_fields)
68 enum_fields.update(
69 (alias, enum_fields[name]) for alias, name in enum_aliases
70 )
7771
7872
7973 # Subclass the namedtuple to improve the docstring.
8074 # noinspection PyUnresolvedReferences
81 class _NSEnum(collections.namedtuple("_NSEnum", enum_field_names)):
75 class _NSEnum(collections.namedtuple("_NSEnum", enum_fields.keys())):
8276 """
8377 Enum to control the `natsort` algorithm.
8478
129123 default "NFD". This will transform characters such as '⑦' into
130124 '7'. Please see https://stackoverflow.com/a/7934397/1399279,
131125 https://stackoverflow.com/a/7931547/1399279,
132 and http://unicode.org/reports/tr15/ for full details into unicode
126 and https://unicode.org/reports/tr15/ for full details into unicode
133127 normalization.
134128 LOCALE, L
135129 Tell `natsort` to be locale-aware when sorting. This includes both
179173 If an NaN shows up in the input, this instructs `natsort` to
180174 treat these as +Infinity and place them after all the other numbers.
181175 By default, an NaN will be treated as -Infinity and be placed first.
182 TYPESAFE, T
183 Deprecated as of `natsort` version 5.0.0; this option is now
184 a no-op because it is always true.
185 VERSION, V
186 Deprecated as of `natsort` version 5.0.0; this option is now
187 a no-op because it is the default.
188 DIGIT, D
189 Same as `VERSION` above.
190176
191177 Notes
192178 -----
204190
205191 # Here is where the instance of the ns enum that will be exported is created.
206192 # It is a poor-man's singleton.
207 ns = _NSEnum(*enum_field_values)
193 ns = _NSEnum(*enum_fields.values())
208194
209195 # The below is private for internal use only.
210 ns_DUMB = 1 << 31
196 NS_DUMB = 1 << 31
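
The construction in this hunk (powers of two stored in an ``OrderedDict``, then frozen into a ``namedtuple``) can be tried in isolation. The sketch below follows the same steps with a shortened, hypothetical option list, so the numeric values do not match the real ``ns`` enum; the names ``Flags`` and ``flags`` are likewise illustrative only.

```python
import collections

# Hypothetical subset of options; each gets its own bit.
options = ["FLOAT", "SIGNED", "NOEXP", "PATH"]
fields = collections.OrderedDict((name, 1 << i) for i, name in enumerate(options))

# Names that are now defaults map to 0, so OR-ing them in changes nothing.
fields.update((name, 0) for name in ["DEFAULT", "INT", "UNSIGNED"])

# A combination is the bitwise OR of its parts.
fields["REAL"] = fields["FLOAT"] | fields["SIGNED"]

# Aliases reuse the value of the field they point to.
fields.update((alias, fields[name]) for alias, name in [("F", "FLOAT"), ("S", "SIGNED")])

Flags = collections.namedtuple("Flags", fields.keys())
flags = Flags(*fields.values())

alg = flags.REAL | flags.PATH
print(bool(alg & flags.FLOAT))  # True: REAL includes FLOAT
print(bool(alg & flags.NOEXP))  # False: NOEXP was never requested
print(flags._asdict()["F"] == flags.FLOAT)  # True: the alias shares the value
```

Because the fields live in a ``namedtuple``, ``_asdict()`` returns every name and value at once, which is what lets ``globals().update(ns._asdict())`` replace the older ``dir()``-based loop.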
17421742 a = py23_unichr(i)
17431743 except ValueError:
17441744 break
1745 if a in set("0123456789"):
1745 if a in "0123456789":
17461746 continue
17471747 if unicodedata.numeric(a, None) is not None:
17481748 hex_chars.append(i)
4949 from os.path import split as path_split
5050 from os.path import splitext as path_splitext
5151 from unicodedata import normalize
52 from warnings import warn
5352
5453 from natsort.compat.fastnumbers import fast_float, fast_int
5554 from natsort.compat.locale import get_decimal_point, get_strxfrm, get_thousands_sep
6261 py23_str,
6362 u_format,
6463 )
65 from natsort.ns_enum import ns, ns_DUMB
64 from natsort.ns_enum import NS_DUMB, ns
6665 from natsort.unicode_numbers import digits_no_decimals, numeric_no_decimals
6766
6867 if PY_VERSION >= 3:
378377 """
379378 # Sometimes we store the "original" input before transformation,
380379 # sometimes after.
381 orig_after_xfrm = not (alg & ns_DUMB and alg & ns.LOCALEALPHA)
380 orig_after_xfrm = not (alg & NS_DUMB and alg & ns.LOCALEALPHA)
382381 original_func = input_transform if orig_after_xfrm else _no_op
383382 normalize_input = _normalize_input_factory(alg)
384383
491490 """
492491 # Shortcuts.
493492 lowfirst = alg & ns.LOWERCASEFIRST
494 dumb = alg & ns_DUMB
493 dumb = alg & NS_DUMB
495494
496495 # Build the chain of functions to execute in order.
497496 function_chain = []
565564 """
566565 # Shortcuts.
567566 use_locale = alg & ns.LOCALEALPHA
568 dumb = alg & ns_DUMB
567 dumb = alg & NS_DUMB
569568 group_letters = (alg & ns.GROUPLETTERS) or (use_locale and dumb)
570569 nan_val = float("+inf") if alg & ns.NANLAST else float("-inf")
571570
613612
614613 """
615614 if alg & ns.UNGROUPLETTERS and alg & ns.LOCALEALPHA:
616 swap = alg & ns_DUMB and alg & ns.LOWERCASEFIRST
615 swap = alg & NS_DUMB and alg & ns.LOWERCASEFIRST
617616 transform = methodcaller("swapcase") if swap else _no_op
618617
619618 def func(split_val, val, _transform=transform, _sep=sep, _pre_sep=pre_sep):
786785
787786 # Return the split parent paths and then the split basename.
788787 return ichain(path_parts, base_parts)
789
790
791 def args_to_enum(**kwargs):
792 """
793 A function to convert input booleans to an enum-type argument.
794
795 For internal use only - will be deprecated in a future release.
796 """
797 alg = 0
798 keys = ("number_type", "signed", "exp", "as_path", "py3_safe")
799 if any(x not in keys for x in kwargs):
800 x = set(kwargs) - set(keys)
801 raise TypeError("Invalid argument(s): " + ", ".join(x))
802 if "number_type" in kwargs and kwargs["number_type"] is not int:
803 msg = "The 'number_type' argument is deprecated as of 3.5.0, "
804 msg += "please use 'alg=ns.FLOAT', 'alg=ns.INT', or 'alg=ns.VERSION'"
805 warn(msg, DeprecationWarning)
806 alg |= ns.FLOAT * bool(kwargs["number_type"] is float)
807 alg |= ns.INT * bool(kwargs["number_type"] in (int, None))
808 alg |= ns.SIGNED * (kwargs["number_type"] not in (float, None))
809 if "signed" in kwargs and kwargs["signed"] is not None:
810 msg = "The 'signed' argument is deprecated as of 3.5.0, "
811 msg += "please use 'alg=ns.SIGNED'."
812 warn(msg, DeprecationWarning)
813 alg |= ns.SIGNED * bool(kwargs["signed"])
814 if "exp" in kwargs and kwargs["exp"] is not None:
815 msg = "The 'exp' argument is deprecated as of 3.5.0, "
816 msg += "please use 'alg=ns.NOEXP'."
817 warn(msg, DeprecationWarning)
818 alg |= ns.NOEXP * (not kwargs["exp"])
819 if "as_path" in kwargs and kwargs["as_path"] is not None:
820 msg = "The 'as_path' argument is deprecated as of 3.5.0, "
821 msg += "please use 'alg=ns.PATH'."
822 warn(msg, DeprecationWarning)
823 alg |= ns.PATH * kwargs["as_path"]
824 return alg
00 [bumpversion]
1 current_version = 5.4.1
1 current_version = 6.0.0
22 commit = True
33 tag = True
44 tag_name = {new_version}
99 url = https://github.com/SethMMorton/natsort
1010 description = Simple yet flexible natural sorting in Python.
1111 long_description = file: README.rst
12 long_description_content_type = text/x-rst
1213 license = MIT
14 license_file = LICENSE
1315 classifiers =
1416 Development Status :: 5 - Production/Stable
1517 Intended Audience :: Developers
2022 Operating System :: OS Independent
2123 License :: OSI Approved :: MIT License
2224 Natural Language :: English
25 Programming Language :: Python
2326 Programming Language :: Python :: 2
24 Programming Language :: Python :: 2.6
2527 Programming Language :: Python :: 2.7
2628 Programming Language :: Python :: 3
2729 Programming Language :: Python :: 3.4
4244
4345 [bumpversion:file:natsort/__init__.py]
4446
45 [bumpversion:file:docs/source/conf.py]
47 [bumpversion:file:docs/conf.py]
4648
47 [bumpversion:file:docs/source/changelog.rst]
49 [bumpversion:file:CHANGELOG.rst]
4850 search = XX-XX-XXXX v. X.X.X
4951 replace = {now:%%m-%%d-%%Y} v. {new_version}
5052
22 from setuptools import find_packages, setup
33 setup(
44 name='natsort',
5 version='5.4.1',
5 version='6.0.0',
66 packages=find_packages(),
7 install_requires=["argparse; python_version < '2.7'"],
87 entry_points={'console_scripts': ['natsort = natsort.__main__:main']},
8 python_requires=">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*",
99 extras_require={
10 'fast': ["fastnumbers >= 2.0.0; python_version > '2.6'"],
10 'fast': ["fastnumbers >= 2.0.0"],
1111 'icu': ["PyICU >= 1.0.0"]
1212 }
1313 )
+0
-39
test_natsort/conftest.py
0 """
1 Fixtures for pytest.
2 """
3
4 import locale
5
6 import pytest
7
8
9 def load_locale(x):
10 """Convenience to load a locale, trying ISO8859-1 first."""
11 try:
12 locale.setlocale(locale.LC_ALL, str("{0}.ISO8859-1".format(x)))
13 except locale.Error:
14 locale.setlocale(locale.LC_ALL, str("{0}.UTF-8".format(x)))
15
16
17 @pytest.fixture()
18 def with_locale_en_us():
19 """Convenience to load the en_US locale - reset when complete."""
20 orig = locale.getlocale()
21 yield load_locale("en_US")
22 locale.setlocale(locale.LC_ALL, orig)
23
24
25 @pytest.fixture()
26 def with_locale_de_de():
27 """
28 Convenience to load the de_DE locale - reset when complete - skip if missing.
29 """
30 orig = locale.getlocale()
31 try:
32 load_locale("de_DE")
33 except locale.Error:
34 pytest.skip("requires de_DE locale to be installed")
35 else:
36 yield
37 finally:
38 locale.setlocale(locale.LC_ALL, orig)
+0
-70
test_natsort/profile_natsorted.py
0 # -*- coding: utf-8 -*-
1 """\
2 This file contains functions to profile natsorted with different
3 inputs and different settings.
4 """
5 from __future__ import print_function
6
7 import cProfile
8 import locale
9 import sys
10
11 try:
12 from natsort import ns, natsort_keygen
13 from natsort.compat.py23 import py23_range
14 except ImportError:
15 sys.path.insert(0, ".")
16 from natsort import ns, natsort_keygen
17 from natsort.compat.py23 import py23_range
18
19 locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
20
21 # Samples to parse
22 number = 14695498
23 int_string = "43493"
24 float_string = "-434.93e7"
25 plain_string = "hello world"
26 fancy_string = "7abba9342fdab"
27 a_path = "/p/Folder (1)/file (1).tar.gz"
28 some_bytes = b"these are bytes"
29 a_list = ["hello", "goodbye", "74"]
30
31 basic_key = natsort_keygen()
32 real_key = natsort_keygen(alg=ns.REAL)
33 path_key = natsort_keygen(alg=ns.PATH)
34 locale_key = natsort_keygen(alg=ns.LOCALE)
35
36
37 def prof_time_to_generate():
38 print("*** Generate Plain Key ***")
39 for _ in py23_range(100000):
40 natsort_keygen()
41
42
43 cProfile.run("prof_time_to_generate()", sort="time")
44
45
46 def prof_parsing(a, msg, key=basic_key):
47 print(msg)
48 for _ in py23_range(100000):
49 key(a)
50
51
52 cProfile.run(
53 'prof_parsing(int_string, "*** Basic Call, Int as String ***")', sort="time"
54 )
55 cProfile.run(
56 'prof_parsing(float_string, "*** Basic Call, Float as String ***")', sort="time"
57 )
58 cProfile.run('prof_parsing(float_string, "*** Real Call ***", real_key)', sort="time")
59 cProfile.run('prof_parsing(number, "*** Basic Call, Number ***")', sort="time")
60 cProfile.run(
61 'prof_parsing(fancy_string, "*** Basic Call, Mixed String ***")', sort="time"
62 )
63 cProfile.run('prof_parsing(some_bytes, "*** Basic Call, Byte String ***")', sort="time")
64 cProfile.run('prof_parsing(a_path, "*** Path Call ***", path_key)', sort="time")
65 cProfile.run('prof_parsing(a_list, "*** Basic Call, Recursive ***")', sort="time")
66 cProfile.run(
67 'prof_parsing("434,930,000 dollars", "*** Locale Call ***", locale_key)',
68 sort="time",
69 )
+0
-138
test_natsort/test_fake_fastnumbers.py
0 # -*- coding: utf-8 -*-
1 """\
2 Test the fake fastnumbers module.
3 """
4 from __future__ import unicode_literals
5
6 import unicodedata
7 from math import isnan
8
9 from hypothesis import given
10 from hypothesis.strategies import floats, integers, text
11 from natsort.compat.fake_fastnumbers import fast_float, fast_int
12 from natsort.compat.py23 import PY_VERSION
13
14 if PY_VERSION >= 3:
15 long = int
16
17
18 def is_float(x):
19 try:
20 float(x)
21 except ValueError:
22 try:
23 unicodedata.numeric(x)
24 except (ValueError, TypeError):
25 return False
26 else:
27 return True
28 else:
29 return True
30
31
32 def not_a_float(x):
33 return not is_float(x)
34
35
36 def is_int(x):
37 try:
38 return x.is_integer()
39 except AttributeError:
40 try:
41 long(x)
42 except ValueError:
43 try:
44 unicodedata.digit(x)
45 except (ValueError, TypeError):
46 return False
47 else:
48 return True
49 else:
50 return True
51
52
53 def not_an_int(x):
54 return not is_int(x)
55
56
57 # Each test has an "example" version for demonstrative purposes,
58 # and a test that uses the hypothesis module.
59
60
61 def test_fast_float_returns_nan_alternate_if_nan_option_is_given():
62 assert fast_float("nan", nan=7) == 7
63
64
65 def test_fast_float_converts_float_string_to_float_example():
66 assert fast_float("45.8") == 45.8
67 assert fast_float("-45") == -45.0
68 assert fast_float("45.8e-2", key=len) == 45.8e-2
69 assert isnan(fast_float("nan"))
70 assert isnan(fast_float("+nan"))
71 assert isnan(fast_float("-NaN"))
72 assert fast_float("۱۲.۱۲") == 12.12
73 assert fast_float("-۱۲.۱۲") == -12.12
74
75
76 @given(floats(allow_nan=False))
77 def test_fast_float_converts_float_string_to_float(x):
78 assert fast_float(repr(x)) == x
79
80
81 def test_fast_float_leaves_string_as_is_example():
82 assert fast_float("invalid") == "invalid"
83
84
85 @given(text().filter(not_a_float).filter(bool))
86 def test_fast_float_leaves_string_as_is(x):
87 assert fast_float(x) == x
88
89
90 def test_fast_float_with_key_applies_to_string_example():
91 assert fast_float("invalid", key=len) == len("invalid")
92
93
94 @given(text().filter(not_a_float).filter(bool))
95 def test_fast_float_with_key_applies_to_string(x):
96 assert fast_float(x, key=len) == len(x)
97
98
99 def test_fast_int_leaves_float_string_as_is_example():
100 assert fast_int("45.8") == "45.8"
101 assert fast_int("nan") == "nan"
102 assert fast_int("inf") == "inf"
103
104
105 @given(floats().filter(not_an_int))
106 def test_fast_int_leaves_float_string_as_is(x):
107 assert fast_int(repr(x)) == repr(x)
108
109
110 def test_fast_int_converts_int_string_to_int_example():
111 assert fast_int("-45") == -45
112 assert fast_int("+45") == 45
113 assert fast_int("۱۲") == 12
114 assert fast_int("-۱۲") == -12
115
116
117 @given(integers())
118 def test_fast_int_converts_int_string_to_int(x):
119 assert fast_int(repr(x)) == x
120
121
122 def test_fast_int_leaves_string_as_is_example():
123 assert fast_int("invalid") == "invalid"
124
125
126 @given(text().filter(not_an_int).filter(bool))
127 def test_fast_int_leaves_string_as_is(x):
128 assert fast_int(x) == x
129
130
131 def test_fast_int_with_key_applies_to_string_example():
132 assert fast_int("invalid", key=len) == len("invalid")
133
134
135 @given(text().filter(not_an_int).filter(bool))
136 def test_fast_int_with_key_applies_to_string(x):
137 assert fast_int(x, key=len) == len(x)
+0
-53
test_natsort/test_final_data_transform_factory.py
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from hypothesis import example, given
6 from hypothesis.strategies import floats, integers, text
7 from natsort.compat.py23 import py23_str
8 from natsort.ns_enum import ns, ns_DUMB
9 from natsort.utils import final_data_transform_factory
10
11
12 @pytest.mark.parametrize("alg", [ns.DEFAULT, ns.UNGROUPLETTERS, ns.LOCALE])
13 @given(x=text(), y=floats(allow_nan=False, allow_infinity=False) | integers())
14 @pytest.mark.usefixtures("with_locale_en_us")
15 def test_final_data_transform_factory_default(x, y, alg):
16 final_data_transform_func = final_data_transform_factory(alg, "", "::")
17 value = (x, y)
18 original_value = "".join(map(py23_str, value))
19 result = final_data_transform_func(value, original_value)
20 assert result == value
21
22
23 @pytest.mark.parametrize(
24 "alg, func",
25 [
26 (ns.UNGROUPLETTERS | ns.LOCALE, lambda x: x),
27 (ns.LOCALE | ns.UNGROUPLETTERS | ns_DUMB, lambda x: x),
28 (ns.LOCALE | ns.UNGROUPLETTERS | ns.LOWERCASEFIRST, lambda x: x),
29 (
30 ns.LOCALE | ns.UNGROUPLETTERS | ns_DUMB | ns.LOWERCASEFIRST,
31 lambda x: x.swapcase(),
32 ),
33 ],
34 )
35 @given(x=text(), y=floats(allow_nan=False, allow_infinity=False) | integers())
36 @example(x="İ", y=0)
37 @pytest.mark.usefixtures("with_locale_en_us")
38 def test_final_data_transform_factory_ungroup_and_locale(x, y, alg, func):
39 final_data_transform_func = final_data_transform_factory(alg, "", "::")
40 value = (x, y)
41 original_value = "".join(map(py23_str, value))
42 result = final_data_transform_func(value, original_value)
43 if x:
44 expected = ((func(original_value[:1]),), value)
45 else:
46 expected = (("::",), value)
47 assert result == expected
48
49
50 def test_final_data_transform_factory_ungroup_and_locale_empty_tuple():
51 final_data_transform_func = final_data_transform_factory(ns.UG | ns.L, "", "::")
52 assert final_data_transform_func((), "") == ((), ())
+0
-105
test_natsort/test_input_string_transform_factory.py
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from hypothesis import example, given
6 from hypothesis.strategies import integers, text
7 from natsort.compat.py23 import NEWPY
8 from natsort.ns_enum import ns, ns_DUMB
9 from natsort.utils import input_string_transform_factory
10
11
12 def lower(x):
13 """Call the appropriate lower method for the Python version."""
14 if NEWPY:
15 return x.casefold()
16 else:
17 return x.lower()
18
19
20 def thousands_separated_int(n):
21 """Insert thousands separators in an int."""
22 new_int = ""
23 for i, y in enumerate(reversed(n), 1):
24 new_int = y + new_int
25 # For every third digit, insert a thousands separator.
26 if i % 3 == 0 and i != len(n):
27 new_int = "," + new_int
28 return new_int
29
30
31 @given(text())
32 def test_input_string_transform_factory_is_no_op_for_no_alg_options(x):
33 input_string_transform_func = input_string_transform_factory(ns.DEFAULT)
34 assert input_string_transform_func(x) is x
35
36
37 @pytest.mark.parametrize(
38 "alg, example_func",
39 [
40 (ns.IGNORECASE, lower),
41 (ns_DUMB, lambda x: x.swapcase()),
42 (ns.LOWERCASEFIRST, lambda x: x.swapcase()),
43 (ns_DUMB | ns.LOWERCASEFIRST, lambda x: x), # No-op
44 (ns.IGNORECASE | ns.LOWERCASEFIRST, lambda x: lower(x.swapcase())),
45 ],
46 )
47 @given(x=text())
48 def test_input_string_transform_factory(x, alg, example_func):
49 input_string_transform_func = input_string_transform_factory(alg)
50 assert input_string_transform_func(x) == example_func(x)
51
52
53 @example(12543642642534980) # 12,543,642,642,534,980 => 12543642642534980
54 @given(x=integers(min_value=1000))
55 @pytest.mark.usefixtures("with_locale_en_us")
56 def test_input_string_transform_factory_cleans_thousands(x):
57 int_str = str(x).rstrip("lL")
58 thousands_int_str = thousands_separated_int(int_str)
59 assert thousands_int_str.replace(",", "") != thousands_int_str
60
61 input_string_transform_func = input_string_transform_factory(ns.LOCALE)
62 assert input_string_transform_func(thousands_int_str) == int_str
63
64 # Using LOCALEALPHA does not affect numbers.
65 input_string_transform_func_no_op = input_string_transform_factory(ns.LOCALEALPHA)
66 assert input_string_transform_func_no_op(thousands_int_str) == thousands_int_str
67
68
69 # These might be too much to test with hypothesis.
70
71
72 @pytest.mark.parametrize(
73 "x, expected",
74 [
75 ("12,543,642642.5345,34980", "12543,642642.5345,34980"),
76 ("12,59443,642,642.53,4534980", "12,59443,642642.53,4534980"), # No change
77 ("12543,642,642.5,34534980", "12543,642642.5,34534980"),
78 ],
79 )
80 @pytest.mark.usefixtures("with_locale_en_us")
81 def test_input_string_transform_factory_handles_us_locale(x, expected):
82 input_string_transform_func = input_string_transform_factory(ns.LOCALE)
83 assert input_string_transform_func(x) == expected
84
85
86 @pytest.mark.parametrize(
87 "alg, expected",
88 [
89 (ns.LOCALE, "1543,753"), # Does nothing without FLOAT
90 (ns.LOCALE | ns.FLOAT, "1543.753"),
91 (ns.LOCALEALPHA, "1543,753"), # LOCALEALPHA won't do anything, need LOCALENUM
92 ],
93 )
94 @pytest.mark.usefixtures("with_locale_de_de")
95 def test_input_string_transform_factory_handles_german_locale(alg, expected):
96 input_string_transform_func = input_string_transform_factory(alg)
97 assert input_string_transform_func("1543,753") == expected
98
99
100 @pytest.mark.usefixtures("with_locale_de_de")
101 def test_input_string_transform_factory_does_nothing_with_non_num_input():
102 input_string_transform_func = input_string_transform_factory(ns.LOCALE | ns.FLOAT)
103 expected = "154s,t53"
104 assert input_string_transform_func("154s,t53") == expected
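The `thousands_separated_int` helper in the file above is easier to follow with its indentation restored. The following is a standalone sketch of that helper only. It does not import natsort, and the sample value is the one given in the `@example` decorator of the thousands-separator test.

```python
def thousands_separated_int(n):
    """Insert thousands separators into a string of digits."""
    new_int = ""
    # Walk the digits from right to left, counting from 1.
    for i, y in enumerate(reversed(n), 1):
        new_int = y + new_int
        # After every third digit, prepend a comma, unless the digit
        # just placed is the leftmost one.
        if i % 3 == 0 and i != len(n):
            new_int = "," + new_int
    return new_int


grouped = thousands_separated_int("12543642642534980")
print(grouped)
```

The test then checks that a key built with `ns.LOCALE` turns the grouped string back into the plain digit string, while `ns.LOCALEALPHA` leaves it untouched.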
+0
-223
test_natsort/test_main.py
0 # -*- coding: utf-8 -*-
1 """\
2 Test the natsort command-line tool functions.
3 """
4 from __future__ import print_function, unicode_literals
5
6 import re
7 import sys
8
9 import pytest
10 from hypothesis import given
11 from hypothesis.strategies import data, floats, integers, lists
12 from natsort.__main__ import (
13 check_filters,
14 keep_entry_range,
15 keep_entry_value,
16 main,
17 range_check,
18 sort_and_print_entries,
19 )
20
21
22 def test_main_passes_default_arguments_with_no_command_line_options(mocker):
23 p = mocker.patch("natsort.__main__.sort_and_print_entries")
24 main("num-2", "num-6", "num-1")
25 args = p.call_args[0][1]
26 assert not args.paths
27 assert args.filter is None
28 assert args.reverse_filter is None
29 assert args.exclude is None
30 assert not args.reverse
31 assert args.number_type == "int"
32 assert not args.signed
33 assert args.exp
34 assert not args.locale
35
36
37 def test_main_passes_arguments_with_all_command_line_options(mocker):
38 arguments = ["--paths", "--reverse", "--locale"]
39 arguments.extend(["--filter", "4", "10"])
40 arguments.extend(["--reverse-filter", "100", "110"])
41 arguments.extend(["--number-type", "float"])
42 arguments.extend(["--noexp", "--sign"])
43 arguments.extend(["--exclude", "34"])
44 arguments.extend(["--exclude", "35"])
45 arguments.extend(["num-2", "num-6", "num-1"])
46 p = mocker.patch("natsort.__main__.sort_and_print_entries")
47 main(*arguments)
48 args = p.call_args[0][1]
49 assert args.paths
50 assert args.filter == [(4.0, 10.0)]
51 assert args.reverse_filter == [(100.0, 110.0)]
52 assert args.exclude == [34, 35]
53 assert args.reverse
54 assert args.number_type == "float"
55 assert args.signed
56 assert not args.exp
57 assert args.locale
58
59
60 class Args:
61 """A dummy class to simulate the argparse Namespace object"""
62
63 def __init__(self, filt, reverse_filter, exclude, as_path, reverse):
64 self.filter = filt
65 self.reverse_filter = reverse_filter
66 self.exclude = exclude
67 self.reverse = reverse
68 self.number_type = "float"
69 self.signed = True
70 self.exp = True
71 self.paths = as_path
72 self.locale = 0
73
74
75 mock_print = "__builtin__.print" if sys.version[0] == "2" else "builtins.print"
76
77 entries = [
78 "tmp/a57/path2",
79 "tmp/a23/path1",
80 "tmp/a1/path1",
81 "tmp/a1 (1)/path1",
82 "tmp/a130/path1",
83 "tmp/a64/path1",
84 "tmp/a64/path2",
85 ]
86
87
88 @pytest.mark.parametrize(
89 "options, order",
90 [
91 # Defaults, all options false
92 # tmp/a1 (1)/path1
93 # tmp/a1/path1
94 # tmp/a23/path1
95 # tmp/a57/path2
96 # tmp/a64/path1
97 # tmp/a64/path2
98 # tmp/a130/path1
99 ([None, None, False, False, False], [3, 2, 1, 0, 5, 6, 4]),
100 # Path option True
101 # tmp/a1/path1
102 # tmp/a1 (1)/path1
103 # tmp/a23/path1
104 # tmp/a57/path2
105 # tmp/a64/path1
106 # tmp/a64/path2
107 # tmp/a130/path1
108 ([None, None, False, True, False], [2, 3, 1, 0, 5, 6, 4]),
109 # Filter option keeps only within range
110 # tmp/a23/path1
111 # tmp/a57/path2
112 # tmp/a64/path1
113 # tmp/a64/path2
114 ([[(20, 100)], None, False, False, False], [1, 0, 5, 6]),
115 # Reverse filter, exclude in range
116 # tmp/a1/path1
117 # tmp/a1 (1)/path1
118 # tmp/a130/path1
119 ([None, [(20, 100)], False, True, False], [2, 3, 4]),
120 # Exclude given values with exclude list
121 # tmp/a1/path1
122 # tmp/a1 (1)/path1
123 # tmp/a57/path2
124 # tmp/a64/path1
125 # tmp/a64/path2
126 ([None, None, [23, 130], True, False], [2, 3, 0, 5, 6]),
127 # Reverse order
128 # tmp/a130/path1
129 # tmp/a64/path2
130 # tmp/a64/path1
131 # tmp/a57/path2
132 # tmp/a23/path1
133 # tmp/a1 (1)/path1
134 # tmp/a1/path1
135 ([None, None, False, True, True], reversed([2, 3, 1, 0, 5, 6, 4])),
136 ],
137 )
138 def test_sort_and_print_entries(options, order, mocker):
139 p = mocker.patch(mock_print)
140 sort_and_print_entries(entries, Args(*options))
141 e = [mocker.call(entries[i]) for i in order]
142 p.assert_has_calls(e)
143
144
145 # Each test has an "example" version for demonstrative purposes,
146 # and a test that uses the hypothesis module.
147
148
149 def test_range_check_returns_range_as_is_but_with_floats_example():
150 assert range_check(10, 11) == (10.0, 11.0)
151 assert range_check(6.4, 30) == (6.4, 30.0)
152
153
154 @given(x=floats(allow_nan=False, min_value=-1E8, max_value=1E8) | integers(), d=data())
155 def test_range_check_returns_range_as_is_if_first_is_less_than_second(x, d):
156 # Pull data such that the first is less than the second.
157 if isinstance(x, float):
158 y = d.draw(floats(min_value=x + 1.0, max_value=1E9, allow_nan=False))
159 else:
160 y = d.draw(integers(min_value=x + 1))
161 assert range_check(x, y) == (x, y)
162
163
164 def test_range_check_raises_value_error_if_second_is_less_than_first_example():
165 with pytest.raises(ValueError, match="low >= high"):
166 range_check(7, 2)
167
168
169 @given(x=floats(allow_nan=False), d=data())
170 def test_range_check_raises_value_error_if_second_is_less_than_first(x, d):
171 # Pull data such that the first is greater than or equal to the second.
172 y = d.draw(floats(max_value=x, allow_nan=False))
173 with pytest.raises(ValueError, match="low >= high"):
174 range_check(x, y)
175
176
177 def test_check_filters_returns_none_if_filter_evaluates_to_false():
178 assert check_filters(()) is None
179 assert check_filters(False) is None
180 assert check_filters(None) is None
181
182
183 def test_check_filters_returns_input_as_is_if_filter_is_valid_example():
184 assert check_filters([(6, 7)]) == [(6, 7)]
185 assert check_filters([(6, 7), (2, 8)]) == [(6, 7), (2, 8)]
186
187
188 @given(x=lists(integers(), min_size=1), d=data())
189 def test_check_filters_returns_input_as_is_if_filter_is_valid(x, d):
190 # ensure y is element-wise greater than x
191 y = [d.draw(integers(min_value=val + 1)) for val in x]
192 assert check_filters(list(zip(x, y))) == [(i, j) for i, j in zip(x, y)]
193
194
195 def test_check_filters_raises_value_error_if_filter_is_invalid_example():
196 with pytest.raises(ValueError, match="Error in --filter: low >= high"):
197 check_filters([(7, 2)])
198
199
200 @given(x=lists(integers(), min_size=1), d=data())
201 def test_check_filters_raises_value_error_if_filter_is_invalid(x, d):
202 # ensure y is element-wise less than or equal to x
203 y = [d.draw(integers(max_value=val)) for val in x]
204 with pytest.raises(ValueError, match="Error in --filter: low >= high"):
205 check_filters(list(zip(x, y)))
206
207
208 @pytest.mark.parametrize(
209 "lows, highs, truth",
210 # 1. Any portion is between the bounds => True.
211 # 2. Any portion is between any bounds => True.
212 # 3. No portion is between the bounds => False.
213 [([0], [100], True), ([1, 88], [20, 90], True), ([1], [20], False)],
214 )
215 def test_keep_entry_range(lows, highs, truth):
216 assert keep_entry_range("a56b23c89", lows, highs, int, re.compile(r"\d+")) is truth
217
218
219 # 1. Values not in entry => True. 2. Values in entry => False.
220 @pytest.mark.parametrize("values, truth", [([100, 45], True), ([23], False)])
221 def test_keep_entry_value(values, truth):
222 assert keep_entry_value("a56b23c89", values, int, re.compile(r"\d+")) is truth
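The `range_check` tests above pin down a small contract: both bounds come back as floats, and a low bound that is greater than or equal to the high bound raises `ValueError` with the message "low >= high". The sketch below is a hypothetical function inferred from those assertions alone. The name `range_check_sketch` is an assumption, and this is not the code in `natsort.__main__`.

```python
def range_check_sketch(low, high):
    """Return (low, high) as floats, or raise if low >= high.

    Hypothetical stand-in inferred from the tests; not natsort's code.
    """
    low, high = float(low), float(high)
    if low >= high:
        raise ValueError("low >= high")
    return low, high


print(range_check_sketch(10, 11))
print(range_check_sketch(6.4, 30))
```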
+0
-83
test_natsort/test_natsort_cmp.py
0 # -*- coding: utf-8 -*-
1 # pylint: disable=unused-variable
2 """These test the natcmp() function.
3
4 Note that these tests are only relevant for Python version < 3.
5 """
6 from functools import partial
7
8 import pytest
9 from hypothesis import given
10 from hypothesis.strategies import floats, integers, lists
11 from natsort import ns
12 from natsort.compat.py23 import PY_VERSION, py23_cmp
13
14 if PY_VERSION < 3:
15 from natsort import natcmp
16
17
18 class Comparable(object):
19 """Stub class for testing natcmp functionality."""
20
21 def __init__(self, value):
22 self.value = value
23
24 def __cmp__(self, other):
25 return natcmp(self.value, other.value)
26
27
28 @pytest.mark.skipif(PY_VERSION >= 3.0, reason="cmp() removed in Python 3")
29 class TestNatCmp:
30
31 def test_classes_can_be_compared(self):
32 one = Comparable("1")
33 two = Comparable("2")
34 another_two = Comparable("2")
35 ten = Comparable("10")
36 assert ten > two == another_two > one
37
38 def test_keys_are_being_cached(self, mocker):
39 natcmp.cached_keys = {}
40 assert len(natcmp.cached_keys) == 0
41 natcmp(0, 0)
42 assert len(natcmp.cached_keys) == 1
43 natcmp(0, 0)
44 assert len(natcmp.cached_keys) == 1
45
46 with mocker.patch("natsort.compat.locale.dumb_sort", return_value=False):
47 natcmp(0, 0, alg=ns.L)
48 assert len(natcmp.cached_keys) == 2
49 natcmp(0, 0, alg=ns.L)
50 assert len(natcmp.cached_keys) == 2
51
52 with mocker.patch("natsort.compat.locale.dumb_sort", return_value=True):
53 natcmp(0, 0, alg=ns.L)
54 assert len(natcmp.cached_keys) == 3
55 natcmp(0, 0, alg=ns.L)
56 assert len(natcmp.cached_keys) == 3
57
58 def test_illegal_algorithm_raises_error(self):
59 with pytest.raises(ValueError):
60 natcmp(0, 0, alg="Just random stuff")
61
62 def test_classes_can_utilize_max_or_min(self):
63 comparables = [Comparable(i) for i in range(10)]
64
65 assert max(comparables) == comparables[-1]
66 assert min(comparables) == comparables[0]
67
68 @given(integers(), integers())
69 def test_natcmp_works_the_same_for_integers_as_cmp(self, x, y):
70 assert py23_cmp(x, y) == natcmp(x, y)
71
72 @given(floats(allow_nan=False), floats(allow_nan=False))
73 def test_natcmp_works_the_same_for_floats_as_cmp(self, x, y):
74 assert py23_cmp(x, y) == natcmp(x, y)
75
76 @given(lists(elements=integers()))
77 def test_sort_strings_with_numbers(self, a_list):
78 strings = [str(var) for var in a_list]
79 # noinspection PyArgumentList
80 natcmp_sorted = sorted(strings, cmp=partial(natcmp, alg=ns.SIGNED))
81
82 assert sorted(a_list) == [int(var) for var in natcmp_sorted]
+0
-49
test_natsort/test_natsort_key.py
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from hypothesis import given
6 from hypothesis.strategies import binary, floats, integers, lists, text
7 from natsort.compat.py23 import PY_VERSION, py23_str
8 from natsort.utils import natsort_key
9
10 if PY_VERSION >= 3:
11 long = int
12
13
14 def str_func(x):
15 if isinstance(x, py23_str):
16 return x
17 else:
18 raise TypeError("Not a str!")
19
20
21 def fail(_):
22 raise AssertionError("This should never be reached!")
23
24
25 @given(floats(allow_nan=False) | integers())
26 def test_natsort_key_with_numeric_input_takes_number_path(x):
27 assert natsort_key(x, None, str_func, fail, lambda y: y) is x
28
29
30 @pytest.mark.skipif(PY_VERSION < 3, reason="only valid on python3")
31 @given(binary().filter(bool))
32 def test_natsort_key_with_bytes_input_takes_bytes_path(x):
33 assert natsort_key(x, None, str_func, lambda y: y, fail) is x
34
35
36 @given(text())
37 def test_natsort_key_with_text_input_takes_string_path(x):
38 assert natsort_key(x, None, str_func, fail, fail) is x
39
40
41 @given(lists(elements=text(), min_size=1, max_size=10))
42 def test_natsort_key_with_nested_input_takes_nested_path(x):
43 assert natsort_key(x, None, str_func, fail, fail) == tuple(x)
44
45
46 @given(text())
47 def test_natsort_key_with_key_argument_applies_key_before_processing(x):
48 assert natsort_key(x, len, str_func, fail, lambda y: y) == len(x)
+0
-168
test_natsort/test_natsort_keygen.py
0 # -*- coding: utf-8 -*-
1 """\
2 Here is a collection of examples of how this module can be used.
3 See the README or the natsort homepage for more details.
4 """
5 from __future__ import print_function, unicode_literals
6
7 import pytest
8 from natsort import natsort_key, natsort_keygen, natsorted, ns
9 from natsort.compat.locale import get_strxfrm, null_string_locale
10 from natsort.compat.py23 import PY_VERSION
11
12
13 @pytest.fixture
14 def arbitrary_input():
15 return ["6A-5.034e+1", "/Folder (1)/Foo", 56.7]
16
17
18 @pytest.fixture
19 def bytes_input():
20 return b"6A-5.034e+1"
21
22
23 def test_natsort_keygen_demonstration():
24 original_list = ["a50", "a51.", "a50.31", "a50.4", "a5.034e1", "a50.300"]
25 copy_of_list = original_list[:]
26 original_list.sort(key=natsort_keygen(alg=ns.F))
27 # natsorted uses the output of natsort_keygen under the hood.
28 assert original_list == natsorted(copy_of_list, alg=ns.F)
29
30
31 def test_natsort_key_public():
32 assert natsort_key("a-5.034e2") == ("a-", 5, ".", 34, "e", 2)
33
34
35 def test_natsort_keygen_with_invalid_alg_input_raises_value_error():
36 # Invalid arguments give the correct response
37 with pytest.raises(ValueError, match="'alg' argument"):
38 natsort_keygen(None, "1")
39
40
41 @pytest.mark.parametrize(
42 "alg, expected",
43 [(ns.DEFAULT, ("a-", 5, ".", 34, "e", 1)), (ns.FLOAT | ns.SIGNED, ("a", -50.34))],
44 )
45 def test_natsort_keygen_returns_natsort_key_that_parses_input(alg, expected):
46 ns_key = natsort_keygen(alg=alg)
47 assert ns_key("a-5.034e1") == expected
48
49
50 @pytest.mark.parametrize(
51 "alg, expected",
52 [
53 (
54 ns.DEFAULT,
55 (("", 6, "A-", 5, ".", 34, "e+", 1), ("/Folder (", 1, ")/Foo"), ("", 56.7)),
56 ),
57 (
58 ns.IGNORECASE,
59 (("", 6, "a-", 5, ".", 34, "e+", 1), ("/folder (", 1, ")/foo"), ("", 56.7)),
60 ),
61 (ns.REAL, (("", 6.0, "A", -50.34), ("/Folder (", 1.0, ")/Foo"), ("", 56.7))),
62 (
63 ns.LOWERCASEFIRST | ns.FLOAT | ns.NOEXP,
64 (
65 ("", 6.0, "a-", 5.034, "E+", 1.0),
66 ("/fOLDER (", 1.0, ")/fOO"),
67 ("", 56.7),
68 ),
69 ),
70 (
71 ns.PATH | ns.GROUPLETTERS,
72 (
73 (("", 6, "aA--", 5, "..", 34, "ee++", 1),),
74 (("//",), ("fFoollddeerr ((", 1, "))"), ("fFoooo",)),
75 (("", 56.7),),
76 ),
77 ),
78 ],
79 )
80 def test_natsort_keygen_handles_arbitrary_input(arbitrary_input, alg, expected):
81 ns_key = natsort_keygen(alg=alg)
82 assert ns_key(arbitrary_input) == expected
83
84
85 @pytest.mark.parametrize(
86 "alg, expected",
87 [
88 (ns.DEFAULT, (b"6A-5.034e+1",)),
89 (ns.IGNORECASE, (b"6a-5.034e+1",)),
90 (ns.REAL, (b"6A-5.034e+1",)),
91 (ns.LOWERCASEFIRST | ns.FLOAT | ns.NOEXP, (b"6A-5.034e+1",)),
92 (ns.PATH | ns.GROUPLETTERS, ((b"6A-5.034e+1",),)),
93 ],
94 )
95 @pytest.mark.skipif(PY_VERSION < 3.0, reason="special bytes handling only on Python3")
96 def test_natsort_keygen_handles_bytes_input(bytes_input, alg, expected):
97 ns_key = natsort_keygen(alg=alg)
98 assert ns_key(bytes_input) == expected
99
100
101 @pytest.mark.parametrize(
102 "alg, expected, is_dumb",
103 [
104 (
105 ns.LOCALE,
106 (
107 (null_string_locale, 6, "A-", 5, ".", 34, "e+", 1),
108 ("/Folder (", 1, ")/Foo"),
109 (null_string_locale, 56.7),
110 ),
111 False,
112 ),
113 (
114 ns.LOCALE,
115 (
116 (null_string_locale, 6, "aa--", 5, "..", 34, "eE++", 1),
117 ("//ffoOlLdDeErR ((", 1, "))//ffoOoO"),
118 (null_string_locale, 56.7),
119 ),
120 True,
121 ),
122 (
123 ns.LOCALE | ns.CAPITALFIRST,
124 (
125 (("",), (null_string_locale, 6, "A-", 5, ".", 34, "e+", 1)),
126 (("/",), ("/Folder (", 1, ")/Foo")),
127 (("",), (null_string_locale, 56.7)),
128 ),
129 False,
130 ),
131 ],
132 )
133 @pytest.mark.usefixtures("with_locale_en_us")
134 def test_natsort_keygen_with_locale(mocker, arbitrary_input, alg, expected, is_dumb):
135 # First, apply the correct strxfrm function to the string values.
136 strxfrm = get_strxfrm()
137 expected = [list(sub) for sub in expected]
138 try:
139 for i in (2, 4, 6):
140 expected[0][i] = strxfrm(expected[0][i])
141 for i in (0, 2):
142 expected[1][i] = strxfrm(expected[1][i])
143 expected = tuple(tuple(sub) for sub in expected)
144 except IndexError: # ns.LOCALE | ns.CAPITALFIRST
145 expected = [[list(subsub) for subsub in sub] for sub in expected]
146 for i in (2, 4, 6):
147 expected[0][1][i] = strxfrm(expected[0][1][i])
148 for i in (0, 2):
149 expected[1][1][i] = strxfrm(expected[1][1][i])
150 expected = tuple(tuple(tuple(subsub) for subsub in sub) for sub in expected)
151
152 with mocker.patch("natsort.compat.locale.dumb_sort", return_value=is_dumb):
153 ns_key = natsort_keygen(alg=alg)
154 assert ns_key(arbitrary_input) == expected
155
156
157 @pytest.mark.parametrize(
158 "alg, is_dumb",
159 [(ns.LOCALE, False), (ns.LOCALE, True), (ns.LOCALE | ns.CAPITALFIRST, False)],
160 )
161 @pytest.mark.skipif(PY_VERSION < 3.0, reason="special bytes handling only on Python3")
162 @pytest.mark.usefixtures("with_locale_en_us")
163 def test_natsort_keygen_with_locale_bytes(mocker, bytes_input, alg, is_dumb):
164 expected = (b"6A-5.034e+1",)
165 with mocker.patch("natsort.compat.locale.dumb_sort", return_value=is_dumb):
166 ns_key = natsort_keygen(alg=alg)
167 assert ns_key(bytes_input) == expected
+0
-299
test_natsort/test_natsorted.py
0 # -*- coding: utf-8 -*-
1 """\
2 Here is a collection of examples of how this module can be used.
3 See the README or the natsort homepage for more details.
4 """
5 from __future__ import print_function, unicode_literals
6
7 from operator import itemgetter
8
9 import pytest
10 from natsort import as_utf8, natsorted, ns
11 from natsort.compat.py23 import PY_VERSION
12 from pytest import raises
13
14
15 @pytest.fixture
16 def float_list():
17 return ["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"]
18
19
20 @pytest.fixture
21 def fruit_list():
22 return ["Apple", "corn", "Corn", "Banana", "apple", "banana"]
23
24
25 @pytest.fixture
26 def mixed_list():
27 return ["Ä", "0", "ä", 3, "b", 1.5, "2", "Z"]
28
29
30 def test_natsorted_numbers_in_ascending_order():
31 given = ["a2", "a5", "a9", "a1", "a4", "a10", "a6"]
32 expected = ["a1", "a2", "a4", "a5", "a6", "a9", "a10"]
33 assert natsorted(given) == expected
34
35
36 def test_natsorted_can_sort_as_signed_floats_with_exponents(float_list):
37 expected = ["a-50", "a50", "a50.300", "a50.31", "a5.034e1", "a50.4", "a51."]
38 assert natsorted(float_list, alg=ns.REAL) == expected
39
40
41 @pytest.mark.parametrize(
42 # UNSIGNED is default
43 "alg",
44 [ns.NOEXP | ns.FLOAT | ns.UNSIGNED, ns.NOEXP | ns.FLOAT],
45 )
46 def test_natsorted_can_sort_as_unsigned_and_ignore_exponents(float_list, alg):
47 expected = ["a5.034e1", "a50", "a50.300", "a50.31", "a50.4", "a51.", "a-50"]
48 assert natsorted(float_list, alg=alg) == expected
49
50
51 # INT, DIGIT, and VERSION are all equivalent.
52 @pytest.mark.parametrize("alg", [ns.DEFAULT, ns.INT, ns.DIGIT, ns.VERSION])
53 def test_natsorted_can_sort_as_unsigned_ints_which_is_default(float_list, alg):
54 expected = ["a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51.", "a-50"]
55 assert natsorted(float_list, alg=alg) == expected
56
57
58 def test_natsorted_can_sort_as_signed_ints(float_list):
59 expected = ["a-50", "a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51."]
60 assert natsorted(float_list, alg=ns.SIGNED) == expected
61
62
63 @pytest.mark.parametrize(
64 "alg, expected",
65 [(ns.UNSIGNED, ["a7", "a+2", "a-5"]), (ns.SIGNED, ["a-5", "a+2", "a7"])],
66 )
67 def test_natsorted_can_sort_with_or_without_accounting_for_sign(alg, expected):
68 given = ["a-5", "a7", "a+2"]
69 assert natsorted(given, alg=alg) == expected
70
71
72 @pytest.mark.parametrize("alg", [ns.DEFAULT, ns.VERSION])
73 def test_natsorted_can_sort_as_version_numbers(alg):
74 given = ["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"]
75 expected = ["1.9.9a", "1.9.9b", "1.10.1", "1.11", "1.11.4"]
76 assert natsorted(given, alg=alg) == expected
77
78
79 @pytest.mark.parametrize(
80 "alg, expected",
81 [
82 (ns.DEFAULT, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
83 (ns.NUMAFTER, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
84 ],
85 )
86 def test_natsorted_handles_mixed_types(mixed_list, alg, expected):
87 assert natsorted(mixed_list, alg=alg) == expected
88
89
90 @pytest.mark.parametrize(
91 "alg, expected, slc",
92 [
93 (ns.DEFAULT, [float("nan"), 5, "25", 1E40], slice(1, None)),
94 (ns.NANLAST, [5, "25", 1E40, float("nan")], slice(None, 3)),
95 ],
96 )
97 def test_natsorted_handles_nan(alg, expected, slc):
98 given = ["25", 5, float("nan"), 1E40]
99 # The slice is because NaN != NaN
100 # noinspection PyUnresolvedReferences
101 assert natsorted(given, alg=alg)[slc] == expected[slc]
102
103
104 @pytest.mark.skipif(PY_VERSION < 3.0, reason="error is only raised on Python 3")
105 def test_natsorted_with_mixed_bytes_and_str_input_raises_type_error():
106 with raises(TypeError, match="bytes"):
107 natsorted(["ä", b"b"])
108
109 # ...unless you use as_utf8 (or some other decoder).
110 assert natsorted(["ä", b"b"], key=as_utf8) == ["ä", b"b"]
111
112
113 def test_natsorted_raises_type_error_for_non_iterable_input():
114 with raises(TypeError, match="'int' object is not iterable"):
115 natsorted(100)
116
117
118 def test_natsorted_recurses_into_nested_lists():
119 given = [["a1", "a5"], ["a1", "a40"], ["a10", "a1"], ["a2", "a5"]]
120 expected = [["a1", "a5"], ["a1", "a40"], ["a2", "a5"], ["a10", "a1"]]
121 assert natsorted(given) == expected
122
123
124 def test_natsorted_applies_key_to_each_list_element_before_sorting_list():
125 given = [("a", "num3"), ("b", "num5"), ("c", "num2")]
126 expected = [("c", "num2"), ("a", "num3"), ("b", "num5")]
127 assert natsorted(given, key=itemgetter(1)) == expected
128
129
130 def test_natsorted_returns_list_in_reversed_order_with_reverse_option(float_list):
131 expected = natsorted(float_list)[::-1]
132 assert natsorted(float_list, reverse=True) == expected
133
134
135 def test_natsorted_handles_filesystem_paths():
136 given = [
137 "/p/Folder (10)/file.tar.gz",
138 "/p/Folder/file.tar.gz",
139 "/p/Folder (1)/file (1).tar.gz",
140 "/p/Folder (1)/file.tar.gz",
141 ]
142 expected_correct = [
143 "/p/Folder/file.tar.gz",
144 "/p/Folder (1)/file.tar.gz",
145 "/p/Folder (1)/file (1).tar.gz",
146 "/p/Folder (10)/file.tar.gz",
147 ]
148 expected_incorrect = [
149 "/p/Folder (1)/file (1).tar.gz",
150 "/p/Folder (1)/file.tar.gz",
151 "/p/Folder (10)/file.tar.gz",
152 "/p/Folder/file.tar.gz",
153 ]
154 # Is incorrect by default.
155 assert natsorted(given) == expected_incorrect
156 # Need ns.PATH to make it correct.
157 assert natsorted(given, alg=ns.PATH) == expected_correct
158
159
160 def test_natsorted_handles_numbers_and_filesystem_paths_simultaneously():
161 # You can sort paths and numbers, not that you'd want to
162 given = ["/Folder (9)/file.exe", 43]
163 expected = [43, "/Folder (9)/file.exe"]
164 assert natsorted(given, alg=ns.PATH) == expected
165
166
167 @pytest.mark.parametrize(
168 "alg, expected",
169 [
170 (ns.DEFAULT, ["Apple", "Banana", "Corn", "apple", "banana", "corn"]),
171 (ns.IGNORECASE, ["Apple", "apple", "Banana", "banana", "corn", "Corn"]),
172 (ns.LOWERCASEFIRST, ["apple", "banana", "corn", "Apple", "Banana", "Corn"]),
173 (ns.GROUPLETTERS, ["Apple", "apple", "Banana", "banana", "Corn", "corn"]),
174 (ns.G | ns.LF, ["apple", "Apple", "banana", "Banana", "corn", "Corn"]),
175 ],
176 )
177 def test_natsorted_supports_case_handling(alg, expected, fruit_list):
178 assert natsorted(fruit_list, alg=alg) == expected
179
180
181 @pytest.mark.parametrize(
182 "alg, expected",
183 [
184 (ns.DEFAULT, [("A5", "a6"), ("a3", "a1")]),
185 (ns.LOWERCASEFIRST, [("a3", "a1"), ("A5", "a6")]),
186 (ns.IGNORECASE, [("a3", "a1"), ("A5", "a6")]),
187 ],
188 )
189 def test_natsorted_supports_nested_case_handling(alg, expected):
190 given = [("A5", "a6"), ("a3", "a1")]
191 assert natsorted(given, alg=alg) == expected
192
193
194 @pytest.mark.parametrize(
195 "alg, expected",
196 [
197 (ns.DEFAULT, ["apple", "Apple", "banana", "Banana", "corn", "Corn"]),
198 (ns.CAPITALFIRST, ["Apple", "Banana", "Corn", "apple", "banana", "corn"]),
199 (ns.LOWERCASEFIRST, ["Apple", "apple", "Banana", "banana", "Corn", "corn"]),
200 (ns.C | ns.LF, ["apple", "banana", "corn", "Apple", "Banana", "Corn"]),
201 ],
202 )
203 @pytest.mark.usefixtures("with_locale_en_us")
204 def test_natsorted_can_sort_using_locale(fruit_list, alg, expected):
205 assert natsorted(fruit_list, alg=ns.LOCALE | alg) == expected
206
207
208 @pytest.mark.usefixtures("with_locale_en_us")
209 def test_natsorted_can_sort_locale_specific_numbers_en():
210 given = ["c", "a5,467.86", "ä", "b", "a5367.86", "a5,6", "a5,50"]
211 expected = ["a5,6", "a5,50", "a5367.86", "a5,467.86", "ä", "b", "c"]
212 assert natsorted(given, alg=ns.LOCALE | ns.F) == expected
213
214
215 @pytest.mark.usefixtures("with_locale_de_de")
216 def test_natsorted_can_sort_locale_specific_numbers_de():
217 given = ["c", "a5.467,86", "ä", "b", "a5367.86", "a5,6", "a5,50"]
218 expected = ["a5,50", "a5,6", "a5367.86", "a5.467,86", "ä", "b", "c"]
219 assert natsorted(given, alg=ns.LOCALE | ns.F) == expected
220
221
222 @pytest.mark.parametrize(
223 "alg, expected",
224 [
225 (ns.DEFAULT, ["0", 1.5, "2", 3, "ä", "Ä", "b", "Z"]),
226 (ns.NUMAFTER, ["ä", "Ä", "b", "Z", "0", 1.5, "2", 3]),
227 (ns.UNGROUPLETTERS, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
228 (ns.UG | ns.NA, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
229 # Adding PATH changes nothing.
230 (ns.PATH, ["0", 1.5, "2", 3, "ä", "Ä", "b", "Z"]),
231 (ns.PATH | ns.NUMAFTER, ["ä", "Ä", "b", "Z", "0", 1.5, "2", 3]),
232 (ns.PATH | ns.UNGROUPLETTERS, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
233 (ns.PATH | ns.UG | ns.NA, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
234 ],
235 )
236 @pytest.mark.usefixtures("with_locale_en_us")
237 def test_natsorted_handles_mixed_types_with_locale(mixed_list, alg, expected):
238 assert natsorted(mixed_list, alg=ns.LOCALE | alg) == expected
239
240
241 @pytest.mark.parametrize(
242 "alg, expected",
243 [
244 (ns.DEFAULT, ["73", "5039", "Banana", "apple", "corn", "~~~~~~"]),
245 (ns.NUMAFTER, ["Banana", "apple", "corn", "~~~~~~", "73", "5039"]),
246 ],
247 )
248 def test_natsorted_sorts_an_odd_collection_of_strings(alg, expected):
249 given = ["apple", "Banana", "73", "5039", "corn", "~~~~~~"]
250 assert natsorted(given, alg=alg) == expected
251
252
253 def test_natsorted_sorts_mixed_ascii_and_non_ascii_numbers():
254 given = [
255 "1st street",
256 "10th street",
257 "2nd street",
258 "2 street",
259 "1 street",
260 "1street",
261 "11 street",
262 "street 2",
263 "street 1",
264 "Street 11",
265 "۲ street",
266 "۱ street",
267 "۱street",
268 "۱۲street",
269 "۱۱ street",
270 "street ۲",
271 "street ۱",
272 "street ۱",
273 "street ۱۲",
274 "street ۱۱",
275 ]
276 expected = [
277 "1 street",
278 "۱ street",
279 "1st street",
280 "1street",
281 "۱street",
282 "2 street",
283 "۲ street",
284 "2nd street",
285 "10th street",
286 "11 street",
287 "۱۱ street",
288 "۱۲street",
289 "street 1",
290 "street ۱",
291 "street ۱",
292 "street 2",
293 "street ۲",
294 "Street 11",
295 "street ۱۱",
296 "street ۱۲",
297 ]
298 assert natsorted(given, alg=ns.IGNORECASE) == expected
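The default ordering exercised by the first tests in the file above (unsigned integers, as in `test_natsorted_numbers_in_ascending_order` and the version-number test) can be illustrated with a few lines of standard-library Python. This is only a sketch of the general technique of splitting a string on digit runs. The name `simple_natural_key` is an assumption, it is not how natsort builds its keys, and it handles none of the `ns` options.

```python
import re

# Capture runs of digits so that re.split keeps them in the result.
_DIGIT_RUNS = re.compile(r"(\d+)")


def simple_natural_key(s):
    """Split a string into text and integer pieces for natural ordering."""
    parts = _DIGIT_RUNS.split(s)
    # Odd positions hold the digit runs; convert them to int so that
    # "10" compares greater than "9" instead of less.
    return tuple(int(p) if p.isdigit() else p for p in parts)


given = ["a2", "a5", "a9", "a1", "a4", "a10", "a6"]
print(sorted(given, key=simple_natural_key))
```

Because the split always alternates text and digit pieces, keys built from strings compare text with text and integers with integers, which is what keeps the comparison from raising `TypeError` on Python 3.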
+0
-129
test_natsort/test_natsorted_convenience.py
0 # -*- coding: utf-8 -*-
1 """\
2 Here is a collection of examples of how this module can be used.
3 See the README or the natsort homepage for more details.
4 """
5 from __future__ import print_function, unicode_literals
6
7 from operator import itemgetter
8
9 import pytest
10 from natsort import (
11 as_ascii,
12 as_utf8,
13 decoder,
14 humansorted,
15 index_humansorted,
16 index_natsorted,
17 index_realsorted,
18 index_versorted,
19 natsorted,
20 ns,
21 order_by_index,
22 realsorted,
23 versorted,
24 )
25 from natsort.compat.py23 import PY_VERSION
26
27
28 @pytest.fixture
29 def version_list():
30 return ["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"]
31
32
33 @pytest.fixture
34 def float_list():
35 return ["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"]
36
37
38 @pytest.fixture
39 def fruit_list():
40 return ["Apple", "corn", "Corn", "Banana", "apple", "banana"]
41
42
43 def test_decoder_returns_function_that_can_decode_bytes_but_return_non_bytes_as_is():
44 func = decoder("latin1")
45 str_obj = "bytes"
46 int_obj = 14
47 assert func(b"bytes") == str_obj
48 assert func(int_obj) is int_obj # returns as-is, same object ID
49 if PY_VERSION >= 3:
50 assert (
51 func(str_obj) is str_obj
52 ) # same object returned on Python3 b/c only bytes has decode
53 else:
54 assert func(str_obj) is not str_obj
55 assert (
56 func(str_obj) == str_obj
57 ) # not same object on Python2 because str can decode
58
59
60 def test_as_ascii_converts_bytes_to_ascii():
61 assert decoder("ascii")(b"bytes") == as_ascii(b"bytes")
62
63
64 def test_as_utf8_converts_bytes_to_utf8():
65 assert decoder("utf8")(b"bytes") == as_utf8(b"bytes")
66
67
68 def test_versorted_is_identical_to_natsorted(version_list):
69 # versorted is retained for backwards compatibility
70 assert versorted(version_list) == natsorted(version_list)
71
72
73 def test_realsorted_is_identical_to_natsorted_with_real_alg(float_list):
74 assert realsorted(float_list) == natsorted(float_list, alg=ns.REAL)
75
76
77 @pytest.mark.usefixtures("with_locale_en_us")
78 def test_humansorted_is_identical_to_natsorted_with_locale_alg(fruit_list):
79 assert humansorted(fruit_list) == natsorted(fruit_list, alg=ns.LOCALE)
80
81
82 def test_index_natsorted_returns_integer_list_of_sort_order_for_input_list():
83 given = ["num3", "num5", "num2"]
84 other = ["foo", "bar", "baz"]
85 index = index_natsorted(given)
86 assert index == [2, 0, 1]
87 assert [given[i] for i in index] == ["num2", "num3", "num5"]
88 assert [other[i] for i in index] == ["baz", "foo", "bar"]
89
90
91 def test_index_natsorted_reverse():
92 given = ["num3", "num5", "num2"]
93 assert index_natsorted(given, reverse=True) == index_natsorted(given)[::-1]
94
95
96 def test_index_natsorted_applies_key_function_before_sorting():
97 given = [("a", "num3"), ("b", "num5"), ("c", "num2")]
98 expected = [2, 0, 1]
99 assert index_natsorted(given, key=itemgetter(1)) == expected
100
101
102 def test_index_versorted_is_identical_to_index_natsorted(version_list):
103 # index_versorted is retained for backwards compatibility
104 assert index_versorted(version_list) == index_natsorted(version_list)
105
106
107 def test_index_realsorted_is_identical_to_index_natsorted_with_real_alg(float_list):
108 assert index_realsorted(float_list) == index_natsorted(float_list, alg=ns.REAL)
109
110
111 @pytest.mark.usefixtures("with_locale_en_us")
112 def test_index_humansorted_is_identical_to_index_natsorted_with_locale_alg(fruit_list):
113 assert index_humansorted(fruit_list) == index_natsorted(fruit_list, alg=ns.LOCALE)
114
115
116 def test_order_by_index_sorts_list_according_to_order_of_integer_list():
117 given = ["num3", "num5", "num2"]
118 index = [2, 0, 1]
119 expected = [given[i] for i in index]
120 assert expected == ["num2", "num3", "num5"]
121 assert order_by_index(given, index) == expected
122
123
124 def test_order_by_index_returns_generator_with_iter_true():
125 given = ["num3", "num5", "num2"]
126 index = [2, 0, 1]
127 assert order_by_index(given, index, True) != [given[i] for i in index]
128 assert list(order_by_index(given, index, True)) == [given[i] for i in index]
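The index-based helpers exercised in this file (`index_natsorted`, `order_by_index`) can be illustrated with a stdlib-only sketch. The names `simple_natural_key`, `index_sorted`, and `order_by` are hypothetical stand-ins and not natsort's implementation; the key here only understands unsigned integers.

```python
import re


def simple_natural_key(text):
    # re.split with a capturing group places the digit runs at odd positions.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def index_sorted(seq, key=simple_natural_key, reverse=False):
    # Sort the positions rather than the values themselves.
    return sorted(range(len(seq)), key=lambda i: key(seq[i]), reverse=reverse)


def order_by(seq, index):
    # Reorder any parallel sequence with a previously computed index.
    return [seq[i] for i in index]


given = ["num3", "num5", "num2"]
other = ["foo", "bar", "baz"]
index = index_sorted(given)
print(index)
print(order_by(given, index))
print(order_by(other, index))
```

Computing the index once and reusing it is what lets a second list (`other`) be reordered consistently with the first.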
+0
-25
test_natsort/test_parse_bytes_function.py
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from hypothesis import given
6 from hypothesis.strategies import binary
7 from natsort.ns_enum import ns
8 from natsort.utils import parse_bytes_factory
9
10
11 @pytest.mark.parametrize(
12 "alg, example_func",
13 [
14 (ns.DEFAULT, lambda x: (x,)),
15 (ns.IGNORECASE, lambda x: (x.lower(),)),
16 # With PATH, it becomes a nested tuple.
17 (ns.PATH, lambda x: ((x,),)),
18 (ns.PATH | ns.IGNORECASE, lambda x: ((x.lower(),),)),
19 ],
20 )
21 @given(x=binary())
22 def test_parse_bytest_factory_makes_function_that_returns_tuple(x, alg, example_func):
23 parse_bytes_func = parse_bytes_factory(alg)
24 assert parse_bytes_func(x) == example_func(x)
+0
-38
test_natsort/test_parse_number_function.py
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from hypothesis import given
6 from hypothesis.strategies import floats, integers
7 from natsort.ns_enum import ns
8 from natsort.utils import parse_number_factory
9
10
11 @pytest.mark.usefixtures("with_locale_en_us")
12 @pytest.mark.parametrize(
13 "alg, example_func",
14 [
15 (ns.DEFAULT, lambda x: ("", x)),
16 (ns.PATH, lambda x: (("", x),)),
17 (ns.UNGROUPLETTERS | ns.LOCALE, lambda x: (("xx",), ("", x))),
18 (ns.PATH | ns.UNGROUPLETTERS | ns.LOCALE, lambda x: ((("xx",), ("", x)),)),
19 ],
20 )
21 @given(x=floats(allow_nan=False) | integers())
22 def test_parse_number_factory_makes_function_that_returns_tuple(x, alg, example_func):
23 parse_number_func = parse_number_factory(alg, "", "xx")
24 assert parse_number_func(x) == example_func(x)
25
26
27 @pytest.mark.parametrize(
28 "alg, x, result",
29 [
30 (ns.DEFAULT, 57, ("", 57)),
31 (ns.DEFAULT, float("nan"), ("", float("-inf"))), # NaN transformed to -infinity
32 (ns.NANLAST, float("nan"), ("", float("+inf"))), # NANLAST makes it +infinity
33 ],
34 )
35 def test_parse_number_factory_treats_nan_special(alg, x, result):
36 parse_number_func = parse_number_factory(alg, "", "xx")
37 assert parse_number_func(x) == result
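The NaN cases in this file expect NaN to be replaced by negative infinity by default and by positive infinity with `ns.NANLAST`. A minimal sketch of that idea, using only the standard library, might look as follows. `number_key` and its `nan_last` flag are hypothetical names for illustration and are not part of natsort.

```python
import math


def number_key(value, nan_last=False, sep=""):
    # NaN does not order consistently against other floats,
    # so swap it for an infinity before building the key.
    if isinstance(value, float) and math.isnan(value):
        value = float("inf") if nan_last else float("-inf")
    return (sep, value)


data = [3, float("nan"), 1.5]
nan_first = sorted(data, key=number_key)
nan_last = sorted(data, key=lambda v: number_key(v, nan_last=True))
print(nan_first)
print(nan_last)
```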
+0
-93
test_natsort/test_parse_string_function.py
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import unicodedata
5
6 import pytest
7 from hypothesis import given
8 from hypothesis.strategies import floats, integers, lists, text
9 from natsort.compat.fastnumbers import fast_float
10 from natsort.compat.py23 import py23_str
11 from natsort.ns_enum import ns, ns_DUMB
12 from natsort.utils import NumericalRegularExpressions as NumRegex
13 from natsort.utils import parse_string_factory
14
15
16 class CustomTuple(tuple):
17 """Used to ensure what is given during testing is what is returned."""
18
19 original = None
20
21
22 def input_transform(x):
23 """Make uppercase."""
24 try:
25 return x.upper()
26 except AttributeError:
27 return x
28
29
30 def final_transform(x, original):
31 """Make the input a CustomTuple."""
32 t = CustomTuple(x)
33 t.original = original
34 return t
35
36
37 @pytest.fixture
38 def parse_string_func(request):
39 """A parse_string_factory result with sample arguments."""
40 sep = ""
41 return parse_string_factory(
42 request.param, # algorithm
43 sep,
44 NumRegex.int_nosign().split,
45 input_transform,
46 fast_float,
47 final_transform,
48 )
49
50
51 @pytest.mark.parametrize("parse_string_func", [ns.DEFAULT], indirect=True)
52 @given(x=floats() | integers())
53 def test_parse_string_factory_raises_type_error_if_given_number(x, parse_string_func):
54 with pytest.raises(TypeError):
55 assert parse_string_func(x)
56
57
58 # noinspection PyCallingNonCallable
59 @pytest.mark.parametrize(
60 "parse_string_func, orig_func",
61 [
62 (ns.DEFAULT, lambda x: x.upper()),
63 (ns.LOCALE, lambda x: x.upper()),
64 (ns.LOCALE | ns_DUMB, lambda x: x), # This changes the "original" handling.
65 ],
66 indirect=["parse_string_func"],
67 )
68 @given(
69 x=lists(
70 elements=floats(allow_nan=False) | text() | integers(), min_size=1, max_size=10
71 )
72 )
73 @pytest.mark.usefixtures("with_locale_en_us")
74 def test_parse_string_factory_invariance(x, parse_string_func, orig_func):
75 # parse_string_factory is the high-level combination of several dedicated
76 # functions involved in splitting and manipulating a string. The details of
77 # what those functions do are not relevant to testing parse_string_factory.
78 # What is relevant is that the form of the output matches the invariant
79 # that even elements are strings and odd are numerical. That each component
80 # function is doing what it should is tested elsewhere.
81 value = "".join(map(py23_str, x)) # Convert the input to a single string.
82 result = parse_string_func(value)
83 result_types = list(map(type, result))
84 expected_types = [py23_str if i % 2 == 0 else float for i in range(len(result))]
85 assert result_types == expected_types
86
87 # The result is in our CustomTuple.
88 assert isinstance(result, CustomTuple)
89
90 # Original should have gone through the "input_transform"
91 # which is uppercase in these tests.
92 assert result.original == orig_func(unicodedata.normalize("NFD", value))
+0
-100
test_natsort/test_regex.py
0 # -*- coding: utf-8 -*-
1 """These test the splitting regular expressions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from natsort.utils import NumericalRegularExpressions as NumRegex
6
7
8 regex_names = {
9 NumRegex.int_nosign(): "int_nosign",
10 NumRegex.int_sign(): "int_sign",
11 NumRegex.float_nosign_noexp(): "float_nosign_noexp",
12 NumRegex.float_sign_noexp(): "float_sign_noexp",
13 NumRegex.float_nosign_exp(): "float_nosign_exp",
14 NumRegex.float_sign_exp(): "float_sign_exp",
15 }
16
17 # Regex aliases (so lines stay a reasonable length).
18 i_u = NumRegex.int_nosign()
19 i_s = NumRegex.int_sign()
20 f_u = NumRegex.float_nosign_noexp()
21 f_s = NumRegex.float_sign_noexp()
22 f_ue = NumRegex.float_nosign_exp()
23 f_se = NumRegex.float_sign_exp()
24
25 # Assemble a test suite of regular strings and their regular expression
26 # splitting result. Organize by the input string.
27 regex_tests = {
28 "-123.45e+67": {
29 i_u: ["-", "123", ".", "45", "e+", "67", ""],
30 i_s: ["", "-123", ".", "45", "e", "+67", ""],
31 f_u: ["-", "123.45", "e+", "67", ""],
32 f_s: ["", "-123.45", "e", "+67", ""],
33 f_ue: ["-", "123.45e+67", ""],
34 f_se: ["", "-123.45e+67", ""],
35 },
36 "a-123.45e+67b": {
37 i_u: ["a-", "123", ".", "45", "e+", "67", "b"],
38 i_s: ["a", "-123", ".", "45", "e", "+67", "b"],
39 f_u: ["a-", "123.45", "e+", "67", "b"],
40 f_s: ["a", "-123.45", "e", "+67", "b"],
41 f_ue: ["a-", "123.45e+67", "b"],
42 f_se: ["a", "-123.45e+67", "b"],
43 },
44 "hello": {
45 i_u: ["hello"],
46 i_s: ["hello"],
47 f_u: ["hello"],
48 f_s: ["hello"],
49 f_ue: ["hello"],
50 f_se: ["hello"],
51 },
52 "abc12.34.56-7def": {
53 i_u: ["abc", "12", ".", "34", ".", "56", "-", "7", "def"],
54 i_s: ["abc", "12", ".", "34", ".", "56", "", "-7", "def"],
55 f_u: ["abc", "12.34", "", ".56", "-", "7", "def"],
56 f_s: ["abc", "12.34", "", ".56", "", "-7", "def"],
57 f_ue: ["abc", "12.34", "", ".56", "-", "7", "def"],
58 f_se: ["abc", "12.34", "", ".56", "", "-7", "def"],
59 },
60 "a1b2c3d4e5e6": {
61 i_u: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
62 i_s: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
63 f_u: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
64 f_s: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
65 f_ue: ["a", "1", "b", "2", "c", "3", "d", "4e5", "e", "6", ""],
66 f_se: ["a", "1", "b", "2", "c", "3", "d", "4e5", "e", "6", ""],
67 },
68 "eleven۱۱eleven11eleven১১": { # All of these are the decimal 11
69 i_u: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
70 i_s: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
71 f_u: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
72 f_s: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
73 f_ue: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
74 f_se: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
75 },
76 "12①②ⅠⅡ⅓": { # Two decimals, Two digits, Two numerals, fraction
77 i_u: ["", "12", "", "①", "", "②", "ⅠⅡ⅓"],
78 i_s: ["", "12", "", "①", "", "②", "ⅠⅡ⅓"],
79 f_u: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
80 f_s: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
81 f_ue: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
82 f_se: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
83 }
84 }
85
86
87 # From the above collections, create the parametrized tests and labels.
88 regex_params = [
89 (given, expected, regex)
90 for given, values in regex_tests.items()
91 for regex, expected in values.items()
92 ]
93 labels = ["{}-{}".format(given, regex_names[regex]) for given, _, regex in regex_params]
94
95
96 @pytest.mark.parametrize("x, expected, regex", regex_params, ids=labels)
97 def test_regex_splits_correctly(x, expected, regex):
98 # noinspection PyUnresolvedReferences
99 assert regex.split(x) == expected
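The splitting behaviour tested above relies on `re.split` with a capturing group, which keeps the matched numbers in the output at odd positions. The sketch below reproduces two rows of the table with simplified patterns. These patterns are assumptions made for illustration; natsort's actual expressions also cover other Unicode numerals and differ in detail.

```python
import re

# Simplified stand-ins for the unsigned and signed integer splitters.
int_nosign = re.compile(r"(\d+)")
int_sign = re.compile(r"([-+]?\d+)")

text = "a-123.45e+67b"
unsigned_parts = int_nosign.split(text)
signed_parts = int_sign.split(text)
print(unsigned_parts)
print(signed_parts)

# A string with no numbers is returned as a single chunk.
print(int_nosign.split("hello"))
```

Without the sign in the pattern, the `-` and `+` stay attached to the surrounding text chunks instead of the numbers.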
+0
-78
test_natsort/test_string_component_transform_factory.py
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 from functools import partial
5
6 import pytest
7 from hypothesis import example, given
8 from hypothesis.strategies import floats, integers, text
9 from natsort.compat.fastnumbers import fast_float, fast_int
10 from natsort.compat.locale import get_strxfrm
11 from natsort.compat.py23 import py23_range, py23_str, py23_unichr
12 from natsort.ns_enum import ns, ns_DUMB
13 from natsort.utils import groupletters, string_component_transform_factory
14
15 # There are some unicode values that are known failures with the builtin locale
16 # library on BSD systems that have nothing to do with natsort (a ValueError is
17 # raised by strxfrm). Let's filter them out.
18 try:
19 bad_uni_chars = frozenset(
20 py23_unichr(x) for x in py23_range(0X10fefd, 0X10ffff + 1)
21 )
22 except ValueError:
23 # Narrow unicode build... no worries.
24 bad_uni_chars = frozenset()
25
26
27 def no_bad_uni_chars(x, _bad_chars=bad_uni_chars):
28 """Ensure text does not contain bad unicode characters"""
29 return not any(y in _bad_chars for y in x)
30
31
32 def no_null(x):
33 """Ensure text does not contain a null character."""
34 return "\0" not in x
35
36
37 @pytest.mark.parametrize(
38 "alg, example_func",
39 [
40 (ns.INT, fast_int),
41 (ns.DEFAULT, fast_int),
42 (ns.FLOAT, partial(fast_float, nan=float("-inf"))),
43 (ns.FLOAT | ns.NANLAST, partial(fast_float, nan=float("+inf"))),
44 (ns.GROUPLETTERS, partial(fast_int, key=groupletters)),
45 (ns.LOCALE, partial(fast_int, key=lambda x: get_strxfrm()(x))),
46 (
47 ns.GROUPLETTERS | ns.LOCALE,
48 partial(fast_int, key=lambda x: get_strxfrm()(groupletters(x))),
49 ),
50 (
51 ns_DUMB | ns.LOCALE,
52 partial(fast_int, key=lambda x: get_strxfrm()(groupletters(x))),
53 ),
54 (
55 ns.GROUPLETTERS | ns.LOCALE | ns.FLOAT | ns.NANLAST,
56 partial(
57 fast_float,
58 key=lambda x: get_strxfrm()(groupletters(x)),
59 nan=float("+inf"),
60 ),
61 ),
62 ],
63 )
64 @example(x=float("nan"))
65 @given(
66 x=integers()
67 | floats()
68 | text().filter(bool).filter(no_bad_uni_chars).filter(no_null)
69 )
70 @pytest.mark.usefixtures("with_locale_en_us")
71 def test_string_component_transform_factory(x, alg, example_func):
72 string_component_transform_func = string_component_transform_factory(alg)
73 try:
74 assert string_component_transform_func(py23_str(x)) == example_func(py23_str(x))
75 except ValueError as e: # handle broken locale lib on BSD.
76 if "is not in range" not in str(e):
77 raise
+0
-70
test_natsort/test_unicode_numbers.py
0 # -*- coding: utf-8 -*-
1 """\
2 Test the Unicode numbers module.
3 """
4 from __future__ import unicode_literals
5
6 import unicodedata
7
8 from natsort.compat.py23 import py23_range, py23_unichr
9 from natsort.unicode_numbers import (
10 decimal_chars,
11 decimals,
12 digit_chars,
13 digits,
14 digits_no_decimals,
15 numeric,
16 numeric_chars,
17 numeric_hex,
18 numeric_no_decimals,
19 )
20
21
22 def test_numeric_chars_contains_only_valid_unicode_numeric_characters():
23 for a in numeric_chars:
24 assert unicodedata.numeric(a, None) is not None
25
26
27 def test_digit_chars_contains_only_valid_unicode_digit_characters():
28 for a in digit_chars:
29 assert unicodedata.digit(a, None) is not None
30
31
32 def test_decimal_chars_contains_only_valid_unicode_decimal_characters():
33 for a in decimal_chars:
34 assert unicodedata.decimal(a, None) is not None
35
36
37 def test_numeric_chars_contains_all_valid_unicode_numeric_and_digit_characters():
38 set_numeric_hex = set(numeric_hex)
39 set_numeric_chars = set(numeric_chars)
40 set_digit_chars = set(digit_chars)
41 set_decimal_chars = set(decimal_chars)
42 for i in py23_range(0X110000):
43 try:
44 a = py23_unichr(i)
45 except ValueError:
46 break
47 if a in set("0123456789"):
48 continue
49 if unicodedata.numeric(a, None) is not None:
50 assert i in set_numeric_hex
51 assert a in set_numeric_chars
52 if unicodedata.digit(a, None) is not None:
53 assert i in set_numeric_hex
54 assert a in set_digit_chars
55 if unicodedata.decimal(a, None) is not None:
56 assert i in set_numeric_hex
57 assert a in set_decimal_chars
58
59 assert set_decimal_chars.isdisjoint(digits_no_decimals)
60 assert set_digit_chars.issuperset(digits_no_decimals)
61
62 assert set_decimal_chars.isdisjoint(numeric_no_decimals)
63 assert set_numeric_chars.issuperset(numeric_no_decimals)
64
65
66 def test_combined_string_contains_all_characters_in_list():
67 assert numeric == "".join(numeric_chars)
68 assert digits == "".join(digit_chars)
69 assert decimals == "".join(decimal_chars)
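The distinction between decimal, digit, and numeric characters that these tests depend on comes from the standard `unicodedata` module. The short sketch below shows how the three lookups differ for a few characters; `classify` is a hypothetical helper written for this illustration.

```python
import unicodedata


def classify(char):
    # Each lookup returns the supplied default (None) when the property is absent.
    return (
        unicodedata.decimal(char, None),
        unicodedata.digit(char, None),
        unicodedata.numeric(char, None),
    )


for char in ["۱", "①", "Ⅱ", "⅓", "a"]:
    print(repr(char), classify(char))
```

Every decimal character is also a digit and a numeric character, but the reverse does not hold, which is why the tests check the subset relationships.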
+0
-197
test_natsort/test_utils.py
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pathlib
5 import string
6 from itertools import chain
7 from operator import neg as op_neg
8
9 import pytest
10 from hypothesis import given
11 from hypothesis.strategies import integers, lists, sampled_from, text
12 from natsort import utils
13 from natsort.compat.py23 import py23_cmp, py23_int, py23_lower, py23_str
14 from natsort.ns_enum import ns
15
16
17 def test_do_decoding_decodes_bytes_string_to_unicode():
18 assert type(utils.do_decoding(b"bytes", "ascii")) is py23_str
19 assert utils.do_decoding(b"bytes", "ascii") == "bytes"
20 assert utils.do_decoding(b"bytes", "ascii") == b"bytes".decode("ascii")
21
22
23 def test_args_to_enum_raises_typeerror_for_invalid_argument():
24 with pytest.raises(TypeError):
25 utils.args_to_enum(**{"alf": 0})
26
27
28 @pytest.mark.parametrize(
29 "kwargs, expected",
30 [
31 ({"number_type": float, "signed": True, "exp": True}, ns.F | ns.S),
32 ({"number_type": float, "signed": True, "exp": False}, ns.F | ns.N | ns.S),
33 ({"number_type": float, "signed": False, "exp": True}, ns.F | ns.U),
34 ({"number_type": float, "signed": False, "exp": True}, ns.F),
35 ({"number_type": float, "signed": False, "exp": False}, ns.F | ns.U | ns.N),
36 ({"number_type": float, "as_path": True}, ns.F | ns.P),
37 ({"number_type": int, "as_path": True}, ns.I | ns.P),
38 ({"number_type": int, "signed": False}, ns.I | ns.U),
39 ({"number_type": None, "exp": True}, ns.I | ns.U),
40 ],
41 )
42 def test_args_to_enum(kwargs, expected):
43 with pytest.warns(DeprecationWarning):
44 assert utils.args_to_enum(**kwargs) == expected
45
46
47 @pytest.mark.parametrize(
48 "alg, expected",
49 [
50 (ns.I, utils.NumericalRegularExpressions.int_nosign()),
51 (ns.I | ns.N, utils.NumericalRegularExpressions.int_nosign()),
52 (ns.I | ns.S, utils.NumericalRegularExpressions.int_sign()),
53 (ns.I | ns.S | ns.N, utils.NumericalRegularExpressions.int_sign()),
54 (ns.F, utils.NumericalRegularExpressions.float_nosign_exp()),
55 (ns.F | ns.N, utils.NumericalRegularExpressions.float_nosign_noexp()),
56 (ns.F | ns.S, utils.NumericalRegularExpressions.float_sign_exp()),
57 (ns.F | ns.S | ns.N, utils.NumericalRegularExpressions.float_sign_noexp()),
58 ],
59 )
60 def test_regex_chooser_returns_correct_regular_expression_object(alg, expected):
61 assert utils.regex_chooser(alg).pattern == expected.pattern
62
63
64 @pytest.mark.parametrize(
65 "alg, value_or_alias",
66 [
67 # Defaults
68 (ns.DEFAULT, 0),
69 (ns.TYPESAFE, 0),
70 (ns.INT, 0),
71 (ns.VERSION, 0),
72 (ns.DIGIT, 0),
73 (ns.UNSIGNED, 0),
74 # Aliases
75 (ns.TYPESAFE, ns.T),
76 (ns.INT, ns.I),
77 (ns.VERSION, ns.V),
78 (ns.DIGIT, ns.D),
79 (ns.UNSIGNED, ns.U),
80 (ns.FLOAT, ns.F),
81 (ns.SIGNED, ns.S),
82 (ns.NOEXP, ns.N),
83 (ns.PATH, ns.P),
84 (ns.LOCALEALPHA, ns.LA),
85 (ns.LOCALENUM, ns.LN),
86 (ns.LOCALE, ns.L),
87 (ns.IGNORECASE, ns.IC),
88 (ns.LOWERCASEFIRST, ns.LF),
89 (ns.GROUPLETTERS, ns.G),
90 (ns.UNGROUPLETTERS, ns.UG),
91 (ns.CAPITALFIRST, ns.C),
92 (ns.UNGROUPLETTERS, ns.CAPITALFIRST),
93 (ns.NANLAST, ns.NL),
94 (ns.COMPATIBILITYNORMALIZE, ns.CN),
95 (ns.NUMAFTER, ns.NA),
96 # Convenience
97 (ns.LOCALE, ns.LOCALEALPHA | ns.LOCALENUM),
98 (ns.REAL, ns.FLOAT | ns.SIGNED),
99 ],
100 )
101 def test_ns_enum_values_and_aliases(alg, value_or_alias):
102 assert alg == value_or_alias
103
104
105 def test_chain_functions_is_a_no_op_if_no_functions_are_given():
106 x = 2345
107 assert utils.chain_functions([])(x) is x
108
109
110 def test_chain_functions_does_one_function_if_one_function_is_given():
111 x = "2345"
112 assert utils.chain_functions([len])(x) == 4
113
114
115 def test_chain_functions_combines_functions_in_given_order():
116 x = 2345
117 assert utils.chain_functions([str, len, op_neg])(x) == -len(str(x))
118
119
120 # Each test has an "example" version for demonstrative purposes,
121 # and a test that uses the hypothesis module.
122
123
124 def test_groupletters_returns_letters_with_lowercase_transform_of_letter_example():
125 assert utils.groupletters("HELLO") == "hHeElLlLoO"
126 assert utils.groupletters("hello") == "hheelllloo"
127
128
129 @given(text().filter(bool))
130 def test_groupletters_returns_letters_with_lowercase_transform_of_letter(x):
131 assert utils.groupletters(x) == "".join(
132 chain.from_iterable([py23_lower(y), y] for y in x)
133 )
134
135
136 def test_sep_inserter_does_nothing_if_no_numbers_example():
137 assert list(utils.sep_inserter(iter(["a", "b", "c"]), "")) == ["a", "b", "c"]
138 assert list(utils.sep_inserter(iter(["a"]), "")) == ["a"]
139
140
141 def test_sep_inserter_does_nothing_if_only_one_number_example():
142 assert list(utils.sep_inserter(iter(["a", 5]), "")) == ["a", 5]
143
144
145 def test_sep_inserter_inserts_separator_string_between_two_numbers_example():
146 assert list(utils.sep_inserter(iter([5, 9]), "")) == ["", 5, "", 9]
147
148
149 @given(lists(elements=text().filter(bool) | integers(), min_size=3))
150 def test_sep_inserter_inserts_separator_between_two_numbers(x):
151 # Rather than just replicating the results in a different
152 # algorithm, validate that the "shape" of the output is as expected.
153 result = list(utils.sep_inserter(iter(x), ""))
154 for i, pos in enumerate(result[1:-1], 1):
155 if pos == "":
156 assert isinstance(result[i - 1], py23_int)
157 assert isinstance(result[i + 1], py23_int)
158
159
160 def test_path_splitter_splits_path_string_by_separator_example():
161 z = "/this/is/a/path"
162 assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
163 z = pathlib.Path("/this/is/a/path")
164 assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
165
166
167 @given(lists(sampled_from(string.ascii_letters), min_size=2).filter(all))
168 def test_path_splitter_splits_path_string_by_separator(x):
169 z = py23_str(pathlib.Path(*x))
170 assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
171
172
173 def test_path_splitter_splits_path_string_by_separator_and_removes_extension_example():
174 z = "/this/is/a/path/file.exe"
175 y = tuple(pathlib.Path(z).parts)
176 assert tuple(utils.path_splitter(z)) == y[:-1] + (
177 pathlib.Path(z).stem,
178 pathlib.Path(z).suffix,
179 )
180
181
182 @given(lists(sampled_from(string.ascii_letters), min_size=3).filter(all))
183 def test_path_splitter_splits_path_string_by_separator_and_removes_extension(x):
184 z = py23_str(pathlib.Path(*x[:-2])) + "." + x[-1]
185 y = tuple(pathlib.Path(z).parts)
186 assert tuple(utils.path_splitter(z)) == y[:-1] + (
187 pathlib.Path(z).stem,
188 pathlib.Path(z).suffix,
189 )
190
191
192 @given(integers())
193 def test_py23_cmp(x):
194 assert py23_cmp(x, x) == 0
195 assert py23_cmp(x, x + 1) < 0
196 assert py23_cmp(x, x - 1) > 0
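Two of the utilities tested in this file, `chain_functions` and `groupletters`, are easy to approximate with the standard library. The sketch below is an illustration of the behaviour the tests describe, not the code in `natsort.utils`; the `_sketch` names are hypothetical, and `str.lower` is used where the real helper may use a different case transform.

```python
from functools import reduce
from operator import neg


def chain_functions_sketch(functions):
    # Apply each function to the result of the previous one, in order.
    functions = list(functions)

    def chained(value):
        return reduce(lambda acc, func: func(acc), functions, value)

    return chained


def groupletters_sketch(text):
    # Prefix every character with its lowercase form so that
    # "A" and "a" sort next to each other.
    return "".join(ch.lower() + ch for ch in text)


print(chain_functions_sketch([str, len, neg])(2345))
print(groupletters_sketch("HELLO"))
```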
0 """
1 Fixtures for pytest.
2 """
3
4 import locale
5
6 import pytest
7
8
9 def load_locale(x):
10 """Convenience to load a locale, trying ISO8859-1 first."""
11 try:
12 locale.setlocale(locale.LC_ALL, str("{}.ISO8859-1".format(x)))
13 except locale.Error:
14 locale.setlocale(locale.LC_ALL, str("{}.UTF-8".format(x)))
15
16
17 @pytest.fixture()
18 def with_locale_en_us():
19 """Convenience to load the en_US locale - reset when complete."""
20 orig = locale.getlocale()
21 yield load_locale("en_US")
22 locale.setlocale(locale.LC_ALL, orig)
23
24
25 @pytest.fixture()
26 def with_locale_de_de():
27 """
28 Convenience to load the de_DE locale - reset when complete - skip if missing.
29 """
30 orig = locale.getlocale()
31 try:
32 load_locale("de_DE")
33 except locale.Error:
34 pytest.skip("requires de_DE locale to be installed")
35 else:
36 yield
37 finally:
38 locale.setlocale(locale.LC_ALL, orig)
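The set-then-restore pattern used by these fixtures can also be written as a context manager. The sketch below is a variant for illustration only: `temporary_locale` is a hypothetical name, and it saves the current setting by calling `locale.setlocale` with no new value, which returns the current locale string.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(name):
    # Calling setlocale without a new value only queries the current setting.
    saved = locale.setlocale(locale.LC_ALL)
    try:
        locale.setlocale(locale.LC_ALL, name)
        yield
    finally:
        # Restore the original setting even if the body raised.
        locale.setlocale(locale.LC_ALL, saved)


before = locale.setlocale(locale.LC_ALL)
with temporary_locale("C"):
    pass
after = locale.setlocale(locale.LC_ALL)
print(before == after)
```

The "C" locale is used in the example because it is available everywhere; a named locale such as `en_US.UTF-8` may raise `locale.Error` if it is not installed.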
0 # -*- coding: utf-8 -*-
1 """\
2 This file contains functions to profile natsorted with different
3 inputs and different settings.
4 """
5 from __future__ import print_function
6
7 import cProfile
8 import locale
9 import sys
10
11 try:
12 from natsort import ns, natsort_keygen
13 from natsort.compat.py23 import py23_range
14 except ImportError:
15 sys.path.insert(0, ".")
16 from natsort import ns, natsort_keygen
17 from natsort.compat.py23 import py23_range
18
19 locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
20
21 # Samples to parse
22 number = 14695498
23 int_string = "43493"
24 float_string = "-434.93e7"
25 plain_string = "hello world"
26 fancy_string = "7abba9342fdab"
27 a_path = "/p/Folder (1)/file (1).tar.gz"
28 some_bytes = b"these are bytes"
29 a_list = ["hello", "goodbye", "74"]
30
31 basic_key = natsort_keygen()
32 real_key = natsort_keygen(alg=ns.REAL)
33 path_key = natsort_keygen(alg=ns.PATH)
34 locale_key = natsort_keygen(alg=ns.LOCALE)
35
36
37 def prof_time_to_generate():
38 print("*** Generate Plain Key ***")
39 for _ in py23_range(100000):
40 natsort_keygen()
41
42
43 cProfile.run("prof_time_to_generate()", sort="time")
44
45
46 def prof_parsing(a, msg, key=basic_key):
47 print(msg)
48 for _ in py23_range(100000):
49 key(a)
50
51
52 cProfile.run(
53 'prof_parsing(int_string, "*** Basic Call, Int as String ***")', sort="time"
54 )
55 cProfile.run(
56 'prof_parsing(float_string, "*** Basic Call, Float as String ***")', sort="time"
57 )
58 cProfile.run('prof_parsing(float_string, "*** Real Call ***", real_key)', sort="time")
59 cProfile.run('prof_parsing(number, "*** Basic Call, Number ***")', sort="time")
60 cProfile.run(
61 'prof_parsing(fancy_string, "*** Basic Call, Mixed String ***")', sort="time"
62 )
63 cProfile.run('prof_parsing(some_bytes, "*** Basic Call, Byte String ***")', sort="time")
64 cProfile.run('prof_parsing(a_path, "*** Path Call ***", path_key)', sort="time")
65 cProfile.run('prof_parsing(a_list, "*** Basic Call, Recursive ***")', sort="time")
66 cProfile.run(
67 'prof_parsing("434,930,000 dollars", "*** Locale Call ***", locale_key)',
68 sort="time",
69 )
0 # -*- coding: utf-8 -*-
1 """\
2 Test the fake fastnumbers module.
3 """
4 from __future__ import unicode_literals
5
6 import unicodedata
7 from math import isnan
8
9 from hypothesis import given
10 from hypothesis.strategies import floats, integers, text
11 from natsort.compat.fake_fastnumbers import fast_float, fast_int
12 from natsort.compat.py23 import PY_VERSION
13
14 if PY_VERSION >= 3:
15 long = int
16
17
18 def is_float(x):
19 try:
20 float(x)
21 except ValueError:
22 try:
23 unicodedata.numeric(x)
24 except (ValueError, TypeError):
25 return False
26 else:
27 return True
28 else:
29 return True
30
31
32 def not_a_float(x):
33 return not is_float(x)
34
35
36 def is_int(x):
37 try:
38 return x.is_integer()
39 except AttributeError:
40 try:
41 long(x)
42 except ValueError:
43 try:
44 unicodedata.digit(x)
45 except (ValueError, TypeError):
46 return False
47 else:
48 return True
49 else:
50 return True
51
52
53 def not_an_int(x):
54 return not is_int(x)
55
56
57 # Each test has an "example" version for demonstrative purposes,
58 # and a test that uses the hypothesis module.
59
60
61 def test_fast_float_returns_nan_alternate_if_nan_option_is_given():
62 assert fast_float("nan", nan=7) == 7
63
64
65 def test_fast_float_converts_float_string_to_float_example():
66 assert fast_float("45.8") == 45.8
67 assert fast_float("-45") == -45.0
68 assert fast_float("45.8e-2", key=len) == 45.8e-2
69 assert isnan(fast_float("nan"))
70 assert isnan(fast_float("+nan"))
71 assert isnan(fast_float("-NaN"))
72 assert fast_float("۱۲.۱۲") == 12.12
73 assert fast_float("-۱۲.۱۲") == -12.12
74
75
76 @given(floats(allow_nan=False))
77 def test_fast_float_converts_float_string_to_float(x):
78 assert fast_float(repr(x)) == x
79
80
81 def test_fast_float_leaves_string_as_is_example():
82 assert fast_float("invalid") == "invalid"
83
84
85 @given(text().filter(not_a_float).filter(bool))
86 def test_fast_float_leaves_string_as_is(x):
87 assert fast_float(x) == x
88
89
90 def test_fast_float_with_key_applies_to_string_example():
91 assert fast_float("invalid", key=len) == len("invalid")
92
93
94 @given(text().filter(not_a_float).filter(bool))
95 def test_fast_float_with_key_applies_to_string(x):
96 assert fast_float(x, key=len) == len(x)
97
98
99 def test_fast_int_leaves_float_string_as_is_example():
100 assert fast_int("45.8") == "45.8"
101 assert fast_int("nan") == "nan"
102 assert fast_int("inf") == "inf"
103
104
105 @given(floats().filter(not_an_int))
106 def test_fast_int_leaves_float_string_as_is(x):
107 assert fast_int(repr(x)) == repr(x)
108
109
110 def test_fast_int_converts_int_string_to_int_example():
111 assert fast_int("-45") == -45
112 assert fast_int("+45") == 45
113 assert fast_int("۱۲") == 12
114 assert fast_int("-۱۲") == -12
115
116
117 @given(integers())
118 def test_fast_int_converts_int_string_to_int(x):
119 assert fast_int(repr(x)) == x
120
121
122 def test_fast_int_leaves_string_as_is_example():
123 assert fast_int("invalid") == "invalid"
124
125
126 @given(text().filter(not_an_int).filter(bool))
127 def test_fast_int_leaves_string_as_is(x):
128 assert fast_int(x) == x
129
130
131 def test_fast_int_with_key_applies_to_string_example():
132 assert fast_int("invalid", key=len) == len("invalid")
133
134
135 @given(text().filter(not_an_int).filter(bool))
136 def test_fast_int_with_key_applies_to_string(x):
137 assert fast_int(x, key=len) == len(x)
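The behaviour these tests expect from `fast_float` (convert when possible, otherwise return the input or apply `key`) can be sketched with the standard library. This is an illustration under assumptions, not the code in `natsort.compat.fake_fastnumbers`; `fast_float_sketch` is a hypothetical name.

```python
import math
import unicodedata


def fast_float_sketch(x, key=lambda s: s, nan=None):
    try:
        value = float(x)
    except ValueError:
        try:
            # Single characters such as Roman numerals carry a numeric value.
            return unicodedata.numeric(x)
        except (TypeError, ValueError):
            # Not a number at all: hand the string to the key function.
            return key(x)
    if nan is not None and math.isnan(value):
        return nan
    return value


print(fast_float_sketch("45.8"))
print(fast_float_sketch("invalid", key=len))
print(fast_float_sketch("nan", nan=7))
```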
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from hypothesis import example, given
6 from hypothesis.strategies import floats, integers, text
7 from natsort.compat.py23 import py23_str
8 from natsort.ns_enum import NS_DUMB, ns
9 from natsort.utils import final_data_transform_factory
10
11
12 @pytest.mark.parametrize("alg", [ns.DEFAULT, ns.UNGROUPLETTERS, ns.LOCALE])
13 @given(x=text(), y=floats(allow_nan=False, allow_infinity=False) | integers())
14 @pytest.mark.usefixtures("with_locale_en_us")
15 def test_final_data_transform_factory_default(x, y, alg):
16 final_data_transform_func = final_data_transform_factory(alg, "", "::")
17 value = (x, y)
18 original_value = "".join(map(py23_str, value))
19 result = final_data_transform_func(value, original_value)
20 assert result == value
21
22
23 @pytest.mark.parametrize(
24 "alg, func",
25 [
26 (ns.UNGROUPLETTERS | ns.LOCALE, lambda x: x),
27 (ns.LOCALE | ns.UNGROUPLETTERS | NS_DUMB, lambda x: x),
28 (ns.LOCALE | ns.UNGROUPLETTERS | ns.LOWERCASEFIRST, lambda x: x),
29 (
30 ns.LOCALE | ns.UNGROUPLETTERS | NS_DUMB | ns.LOWERCASEFIRST,
31 lambda x: x.swapcase(),
32 ),
33 ],
34 )
35 @given(x=text(), y=floats(allow_nan=False, allow_infinity=False) | integers())
36 @example(x="İ", y=0)
37 @pytest.mark.usefixtures("with_locale_en_us")
38 def test_final_data_transform_factory_ungroup_and_locale(x, y, alg, func):
39 final_data_transform_func = final_data_transform_factory(alg, "", "::")
40 value = (x, y)
41 original_value = "".join(map(py23_str, value))
42 result = final_data_transform_func(value, original_value)
43 if x:
44 expected = ((func(original_value[:1]),), value)
45 else:
46 expected = (("::",), value)
47 assert result == expected
48
49
50 def test_final_data_transform_factory_ungroup_and_locale_empty_tuple():
51 final_data_transform_func = final_data_transform_factory(ns.UG | ns.L, "", "::")
52 assert final_data_transform_func((), "") == ((), ())
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from hypothesis import example, given
6 from hypothesis.strategies import integers, text
7 from natsort.compat.py23 import NEWPY
8 from natsort.ns_enum import NS_DUMB, ns
9 from natsort.utils import input_string_transform_factory
10
11
12 def lower(x):
13 """Call the appropriate lower method for the Python version."""
14 if NEWPY:
15 return x.casefold()
16 else:
17 return x.lower()
18
19
20 def thousands_separated_int(n):
21 """Insert thousands separators in an int."""
22 new_int = ""
23 for i, y in enumerate(reversed(n), 1):
24 new_int = y + new_int
25 # For every third digit, insert a thousands separator.
26 if i % 3 == 0 and i != len(n):
27 new_int = "," + new_int
28 return new_int
29
30
31 @given(text())
32 def test_input_string_transform_factory_is_no_op_for_no_alg_options(x):
33 input_string_transform_func = input_string_transform_factory(ns.DEFAULT)
34 assert input_string_transform_func(x) is x
35
36
37 @pytest.mark.parametrize(
38 "alg, example_func",
39 [
40 (ns.IGNORECASE, lower),
41 (NS_DUMB, lambda x: x.swapcase()),
42 (ns.LOWERCASEFIRST, lambda x: x.swapcase()),
43 (NS_DUMB | ns.LOWERCASEFIRST, lambda x: x), # No-op
44 (ns.IGNORECASE | ns.LOWERCASEFIRST, lambda x: lower(x.swapcase())),
45 ],
46 )
47 @given(x=text())
48 def test_input_string_transform_factory(x, alg, example_func):
49 input_string_transform_func = input_string_transform_factory(alg)
50 assert input_string_transform_func(x) == example_func(x)
51
52
53 @example(12543642642534980) # 12,543,642,642,534,980 => 12543642642534980
54 @given(x=integers(min_value=1000))
55 @pytest.mark.usefixtures("with_locale_en_us")
56 def test_input_string_transform_factory_cleans_thousands(x):
57 int_str = str(x).rstrip("lL")
58 thousands_int_str = thousands_separated_int(int_str)
59 assert thousands_int_str.replace(",", "") != thousands_int_str
60
61 input_string_transform_func = input_string_transform_factory(ns.LOCALE)
62 assert input_string_transform_func(thousands_int_str) == int_str
63
64 # Using LOCALEALPHA does not affect numbers.
65 input_string_transform_func_no_op = input_string_transform_factory(ns.LOCALEALPHA)
66 assert input_string_transform_func_no_op(thousands_int_str) == thousands_int_str
67
68
69 # These might be too much to test with hypothesis.
70
71
72 @pytest.mark.parametrize(
73 "x, expected",
74 [
75 ("12,543,642642.5345,34980", "12543,642642.5345,34980"),
76 ("12,59443,642,642.53,4534980", "12,59443,642642.53,4534980"), # Only "642,642" is cleaned
77 ("12543,642,642.5,34534980", "12543,642642.5,34534980"),
78 ],
79 )
80 @pytest.mark.usefixtures("with_locale_en_us")
81 def test_input_string_transform_factory_handles_us_locale(x, expected):
82 input_string_transform_func = input_string_transform_factory(ns.LOCALE)
83 assert input_string_transform_func(x) == expected
84
85
86 @pytest.mark.parametrize(
87 "alg, expected",
88 [
89 (ns.LOCALE, "1543,753"), # Does nothing without FLOAT
90 (ns.LOCALE | ns.FLOAT, "1543.753"),
91 (ns.LOCALEALPHA, "1543,753"), # LOCALEALPHA won't do anything, need LOCALENUM
92 ],
93 )
94 @pytest.mark.usefixtures("with_locale_de_de")
95 def test_input_string_transform_factory_handles_german_locale(alg, expected):
96 input_string_transform_func = input_string_transform_factory(alg)
97 assert input_string_transform_func("1543,753") == expected
98
99
100 @pytest.mark.usefixtures("with_locale_de_de")
101 def test_input_string_transform_factory_does_nothing_with_non_num_input():
102 input_string_transform_func = input_string_transform_factory(ns.LOCALE | ns.FLOAT)
103 expected = "154s,t53"
104 assert input_string_transform_func("154s,t53") == expected
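The `thousands_separated_int` helper in this file builds the comma-grouped string that the `ns.LOCALE` transform is expected to undo. As a standalone illustration (standard library only, and not natsort's implementation), the helper can be checked against Python's own `format(n, ",")`, and a naive `str.replace` shows the round trip for plain integers. The name `strip_commas_naive` is hypothetical.

```python
def thousands_separated_int(n):
    """Insert thousands separators into a string of digits."""
    new_int = ""
    for i, y in enumerate(reversed(n), 1):
        new_int = y + new_int
        # After every third digit, insert a separator (but never a leading one).
        if i % 3 == 0 and i != len(n):
            new_int = "," + new_int
    return new_int


def strip_commas_naive(s):
    """Hypothetical helper: remove every comma, with no validation at all."""
    return s.replace(",", "")


digits = "12543642642534980"
grouped = thousands_separated_int(digits)
round_trip = strip_commas_naive(grouped)
```

The naive version would also strip the commas in inputs such as "12,59443,642,642.53,4534980", which the parametrized US-locale test expects to be mostly preserved, so a real transform has to check the width of the digit groups around each separator.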
0 # -*- coding: utf-8 -*-
1 """\
2 Test the natsort command-line tool functions.
3 """
4 from __future__ import print_function, unicode_literals
5
6 import re
7 import sys
8
9 import pytest
10 from hypothesis import given
11 from hypothesis.strategies import data, floats, integers, lists
12 from natsort.__main__ import (
13 check_filters,
14 keep_entry_range,
15 keep_entry_value,
16 main,
17 range_check,
18 sort_and_print_entries,
19 )
20
21
22 def test_main_passes_default_arguments_with_no_command_line_options(mocker):
23 p = mocker.patch("natsort.__main__.sort_and_print_entries")
24 main("num-2", "num-6", "num-1")
25 args = p.call_args[0][1]
26 assert not args.paths
27 assert args.filter is None
28 assert args.reverse_filter is None
29 assert args.exclude is None
30 assert not args.reverse
31 assert args.number_type == "int"
32 assert not args.signed
33 assert args.exp
34 assert not args.locale
35
36
37 def test_main_passes_arguments_with_all_command_line_options(mocker):
38 arguments = ["--paths", "--reverse", "--locale"]
39 arguments.extend(["--filter", "4", "10"])
40 arguments.extend(["--reverse-filter", "100", "110"])
41 arguments.extend(["--number-type", "float"])
42 arguments.extend(["--noexp", "--sign"])
43 arguments.extend(["--exclude", "34"])
44 arguments.extend(["--exclude", "35"])
45 arguments.extend(["num-2", "num-6", "num-1"])
46 p = mocker.patch("natsort.__main__.sort_and_print_entries")
47 main(*arguments)
48 args = p.call_args[0][1]
49 assert args.paths
50 assert args.filter == [(4.0, 10.0)]
51 assert args.reverse_filter == [(100.0, 110.0)]
52 assert args.exclude == [34, 35]
53 assert args.reverse
54 assert args.number_type == "float"
55 assert args.signed
56 assert not args.exp
57 assert args.locale
58
59
60 class Args:
61 """A dummy class to simulate the argparse Namespace object"""
62
63 def __init__(self, filt, reverse_filter, exclude, as_path, reverse):
64 self.filter = filt
65 self.reverse_filter = reverse_filter
66 self.exclude = exclude
67 self.reverse = reverse
68 self.number_type = "float"
69 self.signed = True
70 self.exp = True
71 self.paths = as_path
72 self.locale = 0
73
74
75 mock_print = "__builtin__.print" if sys.version[0] == "2" else "builtins.print"
76
77 entries = [
78 "tmp/a57/path2",
79 "tmp/a23/path1",
80 "tmp/a1/path1",
81 "tmp/a1 (1)/path1",
82 "tmp/a130/path1",
83 "tmp/a64/path1",
84 "tmp/a64/path2",
85 ]
86
87
88 @pytest.mark.parametrize(
89 "options, order",
90 [
91 # Defaults, all options false
92 # tmp/a1 (1)/path1
93 # tmp/a1/path1
94 # tmp/a23/path1
95 # tmp/a57/path2
96 # tmp/a64/path1
97 # tmp/a64/path2
98 # tmp/a130/path1
99 ([None, None, False, False, False], [3, 2, 1, 0, 5, 6, 4]),
100 # Path option True
101 # tmp/a1/path1
102 # tmp/a1 (1)/path1
103 # tmp/a23/path1
104 # tmp/a57/path2
105 # tmp/a64/path1
106 # tmp/a64/path2
107 # tmp/a130/path1
108 ([None, None, False, True, False], [2, 3, 1, 0, 5, 6, 4]),
109 # Filter option keeps only within range
110 # tmp/a23/path1
111 # tmp/a57/path2
112 # tmp/a64/path1
113 # tmp/a64/path2
114 ([[(20, 100)], None, False, False, False], [1, 0, 5, 6]),
115 # Reverse filter, exclude in range
116 # tmp/a1/path1
117 # tmp/a1 (1)/path1
118 # tmp/a130/path1
119 ([None, [(20, 100)], False, True, False], [2, 3, 4]),
120 # Exclude given values with exclude list
121 # tmp/a1/path1
122 # tmp/a1 (1)/path1
123 # tmp/a57/path2
124 # tmp/a64/path1
125 # tmp/a64/path2
126 ([None, None, [23, 130], True, False], [2, 3, 0, 5, 6]),
127 # Reverse order
128 # tmp/a130/path1
129 # tmp/a64/path2
130 # tmp/a64/path1
131 # tmp/a57/path2
132 # tmp/a23/path1
133 # tmp/a1 (1)/path1
134 # tmp/a1/path1
135 ([None, None, False, True, True], reversed([2, 3, 1, 0, 5, 6, 4])),
136 ],
137 )
138 def test_sort_and_print_entries(options, order, mocker):
139 p = mocker.patch(mock_print)
140 sort_and_print_entries(entries, Args(*options))
141 e = [mocker.call(entries[i]) for i in order]
142 p.assert_has_calls(e)
143
144
145 # Each test has an "example" version for demonstrative purposes,
146 # and a test that uses the hypothesis module.
147
148
149 def test_range_check_returns_range_as_is_but_with_floats_example():
150 assert range_check(10, 11) == (10.0, 11.0)
151 assert range_check(6.4, 30) == (6.4, 30.0)
152
153
154 @given(x=floats(allow_nan=False, min_value=-1E8, max_value=1E8) | integers(), d=data())
155 def test_range_check_returns_range_as_is_if_first_is_less_than_second(x, d):
156 # Pull data such that the first is less than the second.
157 if isinstance(x, float):
158 y = d.draw(floats(min_value=x + 1.0, max_value=1E9, allow_nan=False))
159 else:
160 y = d.draw(integers(min_value=x + 1))
161 assert range_check(x, y) == (x, y)
162
163
164 def test_range_check_raises_value_error_if_second_is_less_than_first_example():
165 with pytest.raises(ValueError, match="low >= high"):
166 range_check(7, 2)
167
168
169 @given(x=floats(allow_nan=False), d=data())
170 def test_range_check_raises_value_error_if_second_is_less_than_first(x, d):
171 # Pull data such that the first is greater than or equal to the second.
172 y = d.draw(floats(max_value=x, allow_nan=False))
173 with pytest.raises(ValueError, match="low >= high"):
174 range_check(x, y)
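The `range_check` tests above pin down a small contract: both bounds come back as floats, and a `ValueError` whose message contains "low >= high" is raised when the first bound is not strictly smaller. A minimal sketch that satisfies the example-based tests, assuming nothing else about the real function (the name `range_check_sketch` is hypothetical), could look like this:

```python
def range_check_sketch(low, high):
    """Return (low, high) as floats, or raise if low is not below high."""
    low, high = float(low), float(high)
    if low >= high:
        raise ValueError("low >= high")
    return low, high


ok = range_check_sketch(6.4, 30)

try:
    range_check_sketch(7, 2)
except ValueError as exc:
    message = str(exc)
else:
    message = None
```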
175
176
177 def test_check_filters_returns_none_if_filter_evaluates_to_false():
178 assert check_filters(()) is None
179 assert check_filters(False) is None
180 assert check_filters(None) is None
181
182
183 def test_check_filters_returns_input_as_is_if_filter_is_valid_example():
184 assert check_filters([(6, 7)]) == [(6, 7)]
185 assert check_filters([(6, 7), (2, 8)]) == [(6, 7), (2, 8)]
186
187
188 @given(x=lists(integers(), min_size=1), d=data())
189 def test_check_filters_returns_input_as_is_if_filter_is_valid(x, d):
190 # ensure y is element-wise greater than x
191 y = [d.draw(integers(min_value=val + 1)) for val in x]
192 assert check_filters(list(zip(x, y))) == [(i, j) for i, j in zip(x, y)]
193
194
195 def test_check_filters_raises_value_error_if_filter_is_invalid_example():
196 with pytest.raises(ValueError, match="Error in --filter: low >= high"):
197 check_filters([(7, 2)])
198
199
200 @given(x=lists(integers(), min_size=1), d=data())
201 def test_check_filters_raises_value_error_if_filter_is_invalid(x, d):
202 # ensure y is element-wise less than or equal to x
203 y = [d.draw(integers(max_value=val)) for val in x]
204 with pytest.raises(ValueError, match="Error in --filter: low >= high"):
205 check_filters(list(zip(x, y)))
206
207
208 @pytest.mark.parametrize(
209 "lows, highs, truth",
210 # 1. Any portion is between the bounds => True.
211 # 2. Any portion is between any bounds => True.
212 # 3. No portion is between the bounds => False.
213 [([0], [100], True), ([1, 88], [20, 90], True), ([1], [20], False)],
214 )
215 def test_keep_entry_range(lows, highs, truth):
216 assert keep_entry_range("a56b23c89", lows, highs, int, re.compile(r"\d+")) is truth
217
218
219 # 1. Values not in entry => True. 2. Values in entry => False.
220 @pytest.mark.parametrize("values, truth", [([100, 45], True), ([23], False)])
221 def test_keep_entry_value(values, truth):
222 assert keep_entry_value("a56b23c89", values, int, re.compile(r"\d+")) is truth
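The last two tests describe filtering by the numbers embedded in an entry: keep it if any number falls inside any (low, high) range, or keep it only if none of its numbers is in an exclusion list. The following is a sketch of that logic under those assumptions; the `_sketch` function names are hypothetical and this is not the code in `natsort.__main__`.

```python
import re


def keep_entry_range_sketch(entry, lows, highs, converter, regex):
    """True if any number found in entry lies within any [low, high] pair."""
    numbers = [converter(num) for num in regex.findall(entry)]
    return any(
        low <= number <= high
        for number in numbers
        for low, high in zip(lows, highs)
    )


def keep_entry_value_sketch(entry, values, converter, regex):
    """True if none of the numbers found in entry appears in values."""
    return not any(converter(num) in values for num in regex.findall(entry))


int_regex = re.compile(r"\d+")
entry = "a56b23c89"  # contains the numbers 56, 23 and 89
```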
0 # -*- coding: utf-8 -*-
1 # pylint: disable=unused-variable
2 """These test the natcmp() function.
3
4 Note that these tests are only relevant for Python version < 3.
5 """
6 from functools import partial
7
8 import pytest
9 from hypothesis import given
10 from hypothesis.strategies import floats, integers, lists
11 from natsort import ns
12 from natsort.compat.py23 import PY_VERSION, py23_cmp
13
14 if PY_VERSION < 3:
15 from natsort import natcmp
16
17
18 class Comparable(object):
19 """Stub class for testing natcmp functionality."""
20
21 def __init__(self, value):
22 self.value = value
23
24 def __cmp__(self, other):
25 return natcmp(self.value, other.value)
26
27
28 @pytest.mark.skipif(PY_VERSION >= 3.0, reason="cmp() was removed in Python 3")
29 class TestNatCmp:
30
31 def test_classes_can_be_compared(self):
32 one = Comparable("1")
33 two = Comparable("2")
34 another_two = Comparable("2")
35 ten = Comparable("10")
36 assert ten > two == another_two > one
37
38 def test_keys_are_being_cached(self, mocker):
39 natcmp.cached_keys = {}
40 assert len(natcmp.cached_keys) == 0
41 natcmp(0, 0)
42 assert len(natcmp.cached_keys) == 1
43 natcmp(0, 0)
44 assert len(natcmp.cached_keys) == 1
45
46 with mocker.patch("natsort.compat.locale.dumb_sort", return_value=False):
47 natcmp(0, 0, alg=ns.L)
48 assert len(natcmp.cached_keys) == 2
49 natcmp(0, 0, alg=ns.L)
50 assert len(natcmp.cached_keys) == 2
51
52 with mocker.patch("natsort.compat.locale.dumb_sort", return_value=True):
53 natcmp(0, 0, alg=ns.L)
54 assert len(natcmp.cached_keys) == 3
55 natcmp(0, 0, alg=ns.L)
56 assert len(natcmp.cached_keys) == 3
57
58 def test_illegal_algorithm_raises_error(self):
59 with pytest.raises(ValueError):
60 natcmp(0, 0, alg="Just random stuff")
61
62 def test_classes_can_utilize_max_or_min(self):
63 comparables = [Comparable(i) for i in range(10)]
64
65 assert max(comparables) == comparables[-1]
66 assert min(comparables) == comparables[0]
67
68 @given(integers(), integers())
69 def test_natcmp_works_the_same_for_integers_as_cmp(self, x, y):
70 assert py23_cmp(x, y) == natcmp(x, y)
71
72 @given(floats(allow_nan=False), floats(allow_nan=False))
73 def test_natcmp_works_the_same_for_floats_as_cmp(self, x, y):
74 assert py23_cmp(x, y) == natcmp(x, y)
75
76 @given(lists(elements=integers()))
77 def test_sort_strings_with_numbers(self, a_list):
78 strings = [str(var) for var in a_list]
79 # noinspection PyArgumentList
80 natcmp_sorted = sorted(strings, cmp=partial(natcmp, alg=ns.SIGNED))
81
82 assert sorted(a_list) == [int(var) for var in natcmp_sorted]
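Because `sorted(..., cmp=...)` only exists on Python 2, the last test above cannot run on Python 3 as written. On Python 3 the usual way to reuse a cmp-style function is `functools.cmp_to_key`. The sketch below illustrates that idea with two hypothetical helpers, `cmp_sketch` and `signed_int_cmp`; it does not use `natcmp` itself.

```python
from functools import cmp_to_key


def cmp_sketch(a, b):
    """Return -1, 0 or 1, mirroring Python 2's built-in cmp()."""
    return (a > b) - (a < b)


def signed_int_cmp(a, b):
    """Hypothetical comparator: order strings by the signed ints they hold."""
    return cmp_sketch(int(a), int(b))


strings = ["10", "-3", "2", "-20"]
# cmp_to_key wraps the two-argument comparator into a key function.
ordered = sorted(strings, key=cmp_to_key(signed_int_cmp))
```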
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from hypothesis import given
6 from hypothesis.strategies import binary, floats, integers, lists, text
7 from natsort.compat.py23 import PY_VERSION, py23_str
8 from natsort.utils import natsort_key
9
10 if PY_VERSION >= 3:
11 long = int
12
13
14 def str_func(x):
15 if isinstance(x, py23_str):
16 return x
17 else:
18 raise TypeError("Not a str!")
19
20
21 def fail(_):
22 raise AssertionError("This should never be reached!")
23
24
25 @given(floats(allow_nan=False) | integers())
26 def test_natsort_key_with_numeric_input_takes_number_path(x):
27 assert natsort_key(x, None, str_func, fail, lambda y: y) is x
28
29
30 @pytest.mark.skipif(PY_VERSION < 3, reason="only valid on python3")
31 @given(binary().filter(bool))
32 def test_natsort_key_with_bytes_input_takes_bytes_path(x):
33 assert natsort_key(x, None, str_func, lambda y: y, fail) is x
34
35
36 @given(text())
37 def test_natsort_key_with_text_input_takes_string_path(x):
38 assert natsort_key(x, None, str_func, fail, fail) is x
39
40
41 @given(lists(elements=text(), min_size=1, max_size=10))
42 def test_natsort_key_with_nested_input_takes_nested_path(x):
43 assert natsort_key(x, None, str_func, fail, fail) == tuple(x)
44
45
46 @given(text())
47 def test_natsort_key_with_key_argument_applies_key_before_processing(x):
48 assert natsort_key(x, len, str_func, fail, lambda y: y) == len(x)
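Taken together, these tests describe a dispatcher: an optional key is applied first, then strings, bytes and numbers are each routed to their own handler, and anything else is treated as an iterable and processed element by element into a tuple. A rough sketch of such a dispatcher follows, using explicit `isinstance` checks. This is an assumption made for readability, and the real `natsort_key` may well be structured differently; `dispatch_key_sketch` is a hypothetical name.

```python
def dispatch_key_sketch(val, key, string_func, bytes_func, num_func):
    """Route val to the handler that matches its type."""
    if key is not None:
        val = key(val)
    if isinstance(val, str):
        return string_func(val)
    if isinstance(val, bytes):
        return bytes_func(val)
    if isinstance(val, (int, float)):
        return num_func(val)
    # Anything else is assumed to be iterable; recurse without the key.
    return tuple(
        dispatch_key_sketch(item, None, string_func, bytes_func, num_func)
        for item in val
    )


number = 3.5
as_number = dispatch_key_sketch(number, None, str.upper, bytes.upper, lambda y: y)
as_nested = dispatch_key_sketch(["a", "b"], None, str.upper, bytes.upper, float)
```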
0 # -*- coding: utf-8 -*-
1 """\
2 Here is a collection of examples of how this module can be used.
3 See the README or the natsort homepage for more details.
4 """
5 from __future__ import print_function, unicode_literals
6
7 import pytest
8 from natsort import natsort_key, natsort_keygen, natsorted, ns
9 from natsort.compat.locale import get_strxfrm, null_string_locale
10 from natsort.compat.py23 import PY_VERSION
11
12
13 @pytest.fixture
14 def arbitrary_input():
15 return ["6A-5.034e+1", "/Folder (1)/Foo", 56.7]
16
17
18 @pytest.fixture
19 def bytes_input():
20 return b"6A-5.034e+1"
21
22
23 def test_natsort_keygen_demonstration():
24 original_list = ["a50", "a51.", "a50.31", "a50.4", "a5.034e1", "a50.300"]
25 copy_of_list = original_list[:]
26 original_list.sort(key=natsort_keygen(alg=ns.F))
27 # natsorted uses the output of natsort_keygen under the hood.
28 assert original_list == natsorted(copy_of_list, alg=ns.F)
29
30
31 def test_natsort_key_public():
32 assert natsort_key("a-5.034e2") == ("a-", 5, ".", 34, "e", 2)
33
34
35 def test_natsort_keygen_with_invalid_alg_input_raises_value_error():
36 # Invalid arguments give the correct response
37 with pytest.raises(ValueError, match="'alg' argument"):
38 natsort_keygen(None, "1")
39
40
41 @pytest.mark.parametrize(
42 "alg, expected",
43 [(ns.DEFAULT, ("a-", 5, ".", 34, "e", 1)), (ns.FLOAT | ns.SIGNED, ("a", -50.34))],
44 )
45 def test_natsort_keygen_returns_natsort_key_that_parses_input(alg, expected):
46 ns_key = natsort_keygen(alg=alg)
47 assert ns_key("a-5.034e1") == expected
48
49
50 @pytest.mark.parametrize(
51 "alg, expected",
52 [
53 (
54 ns.DEFAULT,
55 (("", 6, "A-", 5, ".", 34, "e+", 1), ("/Folder (", 1, ")/Foo"), ("", 56.7)),
56 ),
57 (
58 ns.IGNORECASE,
59 (("", 6, "a-", 5, ".", 34, "e+", 1), ("/folder (", 1, ")/foo"), ("", 56.7)),
60 ),
61 (ns.REAL, (("", 6.0, "A", -50.34), ("/Folder (", 1.0, ")/Foo"), ("", 56.7))),
62 (
63 ns.LOWERCASEFIRST | ns.FLOAT | ns.NOEXP,
64 (
65 ("", 6.0, "a-", 5.034, "E+", 1.0),
66 ("/fOLDER (", 1.0, ")/fOO"),
67 ("", 56.7),
68 ),
69 ),
70 (
71 ns.PATH | ns.GROUPLETTERS,
72 (
73 (("", 6, "aA--", 5, "..", 34, "ee++", 1),),
74 (("//",), ("fFoollddeerr ((", 1, "))"), ("fFoooo",)),
75 (("", 56.7),),
76 ),
77 ),
78 ],
79 )
80 def test_natsort_keygen_handles_arbitrary_input(arbitrary_input, alg, expected):
81 ns_key = natsort_keygen(alg=alg)
82 assert ns_key(arbitrary_input) == expected
83
84
85 @pytest.mark.parametrize(
86 "alg, expected",
87 [
88 (ns.DEFAULT, (b"6A-5.034e+1",)),
89 (ns.IGNORECASE, (b"6a-5.034e+1",)),
90 (ns.REAL, (b"6A-5.034e+1",)),
91 (ns.LOWERCASEFIRST | ns.FLOAT | ns.NOEXP, (b"6A-5.034e+1",)),
92 (ns.PATH | ns.GROUPLETTERS, ((b"6A-5.034e+1",),)),
93 ],
94 )
95 @pytest.mark.skipif(PY_VERSION < 3.0, reason="special bytes handling only on Python3")
96 def test_natsort_keygen_handles_bytes_input(bytes_input, alg, expected):
97 ns_key = natsort_keygen(alg=alg)
98 assert ns_key(bytes_input) == expected
99
100
101 @pytest.mark.parametrize(
102 "alg, expected, is_dumb",
103 [
104 (
105 ns.LOCALE,
106 (
107 (null_string_locale, 6, "A-", 5, ".", 34, "e+", 1),
108 ("/Folder (", 1, ")/Foo"),
109 (null_string_locale, 56.7),
110 ),
111 False,
112 ),
113 (
114 ns.LOCALE,
115 (
116 (null_string_locale, 6, "aa--", 5, "..", 34, "eE++", 1),
117 ("//ffoOlLdDeErR ((", 1, "))//ffoOoO"),
118 (null_string_locale, 56.7),
119 ),
120 True,
121 ),
122 (
123 ns.LOCALE | ns.CAPITALFIRST,
124 (
125 (("",), (null_string_locale, 6, "A-", 5, ".", 34, "e+", 1)),
126 (("/",), ("/Folder (", 1, ")/Foo")),
127 (("",), (null_string_locale, 56.7)),
128 ),
129 False,
130 ),
131 ],
132 )
133 @pytest.mark.usefixtures("with_locale_en_us")
134 def test_natsort_keygen_with_locale(mocker, arbitrary_input, alg, expected, is_dumb):
135 # First, apply the correct strxfrm function to the string values.
136 strxfrm = get_strxfrm()
137 expected = [list(sub) for sub in expected]
138 try:
139 for i in (2, 4, 6):
140 expected[0][i] = strxfrm(expected[0][i])
141 for i in (0, 2):
142 expected[1][i] = strxfrm(expected[1][i])
143 expected = tuple(tuple(sub) for sub in expected)
144 except IndexError: # ns.LOCALE | ns.CAPITALFIRST
145 expected = [[list(subsub) for subsub in sub] for sub in expected]
146 for i in (2, 4, 6):
147 expected[0][1][i] = strxfrm(expected[0][1][i])
148 for i in (0, 2):
149 expected[1][1][i] = strxfrm(expected[1][1][i])
150 expected = tuple(tuple(tuple(subsub) for subsub in sub) for sub in expected)
151
152 with mocker.patch("natsort.compat.locale.dumb_sort", return_value=is_dumb):
153 ns_key = natsort_keygen(alg=alg)
154 assert ns_key(arbitrary_input) == expected
155
156
157 @pytest.mark.parametrize(
158 "alg, is_dumb",
159 [(ns.LOCALE, False), (ns.LOCALE, True), (ns.LOCALE | ns.CAPITALFIRST, False)],
160 )
161 @pytest.mark.skipif(PY_VERSION < 3.0, reason="special bytes handling only on Python3")
162 @pytest.mark.usefixtures("with_locale_en_us")
163 def test_natsort_keygen_with_locale_bytes(mocker, bytes_input, alg, is_dumb):
164 expected = (b"6A-5.034e+1",)
165 with mocker.patch("natsort.compat.locale.dumb_sort", return_value=is_dumb):
166 ns_key = natsort_keygen(alg=alg)
167 assert ns_key(bytes_input) == expected
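The default-algorithm expectations in this file, such as `("a-", 5, ".", 34, "e", 2)` for "a-5.034e2", have the shape produced by splitting a string on runs of digits and converting those runs to integers. A minimal standard-library sketch of that idea is shown below. It is an approximation for illustration only, with none of the `ns` options, and `simple_natural_key` is a hypothetical name.

```python
import re

_DIGIT_RUN = re.compile(r"(\d+)")


def simple_natural_key(text):
    """Split text into alternating (str, int, str, ...) components."""
    parts = _DIGIT_RUN.split(text)
    # re.split leaves a trailing "" when the text ends with digits; drop it.
    if parts[-1] == "":
        parts.pop()
    # Odd positions are always the captured digit runs.
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


key_a = simple_natural_key("a-5.034e2")
key_b = simple_natural_key("6A-5.034e+1")
ordered = sorted(["a2", "a10", "a1"], key=simple_natural_key)
```

Keeping the leading empty string for inputs that start with a digit means every key starts with a string, so keys for different inputs can be compared position by position without mixing `str` and `int`.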
0 # -*- coding: utf-8 -*-
1 """\
2 Here is a collection of examples of how this module can be used.
3 See the README or the natsort homepage for more details.
4 """
5 from __future__ import print_function, unicode_literals
6
7 from operator import itemgetter
8
9 import pytest
10 from natsort import as_utf8, natsorted, ns
11 from natsort.compat.py23 import PY_VERSION
12 from pytest import raises
13
14
15 @pytest.fixture
16 def float_list():
17 return ["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"]
18
19
20 @pytest.fixture
21 def fruit_list():
22 return ["Apple", "corn", "Corn", "Banana", "apple", "banana"]
23
24
25 @pytest.fixture
26 def mixed_list():
27 return ["Ä", "0", "ä", 3, "b", 1.5, "2", "Z"]
28
29
30 def test_natsorted_numbers_in_ascending_order():
31 given = ["a2", "a5", "a9", "a1", "a4", "a10", "a6"]
32 expected = ["a1", "a2", "a4", "a5", "a6", "a9", "a10"]
33 assert natsorted(given) == expected
34
35
36 def test_natsorted_can_sort_as_signed_floats_with_exponents(float_list):
37 expected = ["a-50", "a50", "a50.300", "a50.31", "a5.034e1", "a50.4", "a51."]
38 assert natsorted(float_list, alg=ns.REAL) == expected
39
40
41 @pytest.mark.parametrize(
42 # UNSIGNED is default
43 "alg",
44 [ns.NOEXP | ns.FLOAT | ns.UNSIGNED, ns.NOEXP | ns.FLOAT],
45 )
46 def test_natsorted_can_sort_as_unsigned_and_ignore_exponents(float_list, alg):
47 expected = ["a5.034e1", "a50", "a50.300", "a50.31", "a50.4", "a51.", "a-50"]
48 assert natsorted(float_list, alg=alg) == expected
49
50
51 # DEFAULT and INT are equivalent.
52 @pytest.mark.parametrize("alg", [ns.DEFAULT, ns.INT])
53 def test_natsorted_can_sort_as_unsigned_ints_which_is_default(float_list, alg):
54 expected = ["a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51.", "a-50"]
55 assert natsorted(float_list, alg=alg) == expected
56
57
58 def test_natsorted_can_sort_as_signed_ints(float_list):
59 expected = ["a-50", "a5.034e1", "a50", "a50.4", "a50.31", "a50.300", "a51."]
60 assert natsorted(float_list, alg=ns.SIGNED) == expected
61
62
63 @pytest.mark.parametrize(
64 "alg, expected",
65 [(ns.UNSIGNED, ["a7", "a+2", "a-5"]), (ns.SIGNED, ["a-5", "a+2", "a7"])],
66 )
67 def test_natsorted_can_sort_with_or_without_accounting_for_sign(alg, expected):
68 given = ["a-5", "a7", "a+2"]
69 assert natsorted(given, alg=alg) == expected
70
71
72 def test_natsorted_can_sort_as_version_numbers():
73 given = ["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"]
74 expected = ["1.9.9a", "1.9.9b", "1.10.1", "1.11", "1.11.4"]
75 assert natsorted(given) == expected
76
77
78 @pytest.mark.parametrize(
79 "alg, expected",
80 [
81 (ns.DEFAULT, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
82 (ns.NUMAFTER, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
83 ],
84 )
85 def test_natsorted_handles_mixed_types(mixed_list, alg, expected):
86 assert natsorted(mixed_list, alg=alg) == expected
87
88
89 @pytest.mark.parametrize(
90 "alg, expected, slc",
91 [
92 (ns.DEFAULT, [float("nan"), 5, "25", 1E40], slice(1, None)),
93 (ns.NANLAST, [5, "25", 1E40, float("nan")], slice(None, 3)),
94 ],
95 )
96 def test_natsorted_handles_nan(alg, expected, slc):
97 given = ["25", 5, float("nan"), 1E40]
98 # The slice is because NaN != NaN
99 # noinspection PyUnresolvedReferences
100 assert natsorted(given, alg=alg)[slc] == expected[slc]
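The slicing in the NaN test is needed because two separately created NaN objects never compare equal, so two lists that each contain their own `float("nan")` are unequal even when everything else matches. The short sketch below demonstrates this with the standard library and shows one hypothetical alternative, a NaN-aware list comparison (`lists_equal_nan_aware` is not part of natsort).

```python
import math


def same_or_both_nan(a, b):
    """Equal values, or both floats that are NaN."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def lists_equal_nan_aware(xs, ys):
    """Element-wise list equality that treats NaN as equal to NaN."""
    return len(xs) == len(ys) and all(
        same_or_both_nan(a, b) for a, b in zip(xs, ys)
    )


left = [float("nan"), 5, "25", 1e40]
right = [float("nan"), 5, "25", 1e40]
plain_equal = left == right            # the two NaN objects differ
sliced_equal = left[1:] == right[1:]   # skipping the NaN position
```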
101
102
103 @pytest.mark.skipif(PY_VERSION < 3.0, reason="error is only raised on Python 3")
104 def test_natsorted_with_mixed_bytes_and_str_input_raises_type_error():
105 with raises(TypeError, match="bytes"):
106 natsorted(["ä", b"b"])
107
108 # ...unless you use as_utf8 (or some other decoder).
109 assert natsorted(["ä", b"b"], key=as_utf8) == ["ä", b"b"]
110
111
112 def test_natsorted_raises_type_error_for_non_iterable_input():
113 with raises(TypeError, match="'int' object is not iterable"):
114 natsorted(100)
115
116
117 def test_natsorted_recurses_into_nested_lists():
118 given = [["a1", "a5"], ["a1", "a40"], ["a10", "a1"], ["a2", "a5"]]
119 expected = [["a1", "a5"], ["a1", "a40"], ["a2", "a5"], ["a10", "a1"]]
120 assert natsorted(given) == expected
121
122
123 def test_natsorted_applies_key_to_each_list_element_before_sorting_list():
124 given = [("a", "num3"), ("b", "num5"), ("c", "num2")]
125 expected = [("c", "num2"), ("a", "num3"), ("b", "num5")]
126 assert natsorted(given, key=itemgetter(1)) == expected
127
128
129 def test_natsorted_returns_list_in_reversed_order_with_reverse_option(float_list):
130 expected = natsorted(float_list)[::-1]
131 assert natsorted(float_list, reverse=True) == expected
132
133
134 def test_natsorted_handles_filesystem_paths():
135 given = [
136 "/p/Folder (10)/file.tar.gz",
137 "/p/Folder/file.tar.gz",
138 "/p/Folder (1)/file (1).tar.gz",
139 "/p/Folder (1)/file.tar.gz",
140 ]
141 expected_correct = [
142 "/p/Folder/file.tar.gz",
143 "/p/Folder (1)/file.tar.gz",
144 "/p/Folder (1)/file (1).tar.gz",
145 "/p/Folder (10)/file.tar.gz",
146 ]
147 expected_incorrect = [
148 "/p/Folder (1)/file (1).tar.gz",
149 "/p/Folder (1)/file.tar.gz",
150 "/p/Folder (10)/file.tar.gz",
151 "/p/Folder/file.tar.gz",
152 ]
153 # Is incorrect by default.
154 assert natsorted(given) == expected_incorrect
155 # Need ns.PATH to make it correct.
156 assert natsorted(given, alg=ns.PATH) == expected_correct
157
158
159 def test_natsorted_handles_numbers_and_filesystem_paths_simultaneously():
160 # You can sort paths and numbers, not that you'd want to
161 given = ["/Folder (9)/file.exe", 43]
162 expected = [43, "/Folder (9)/file.exe"]
163 assert natsorted(given, alg=ns.PATH) == expected
164
165
166 @pytest.mark.parametrize(
167 "alg, expected",
168 [
169 (ns.DEFAULT, ["Apple", "Banana", "Corn", "apple", "banana", "corn"]),
170 (ns.IGNORECASE, ["Apple", "apple", "Banana", "banana", "corn", "Corn"]),
171 (ns.LOWERCASEFIRST, ["apple", "banana", "corn", "Apple", "Banana", "Corn"]),
172 (ns.GROUPLETTERS, ["Apple", "apple", "Banana", "banana", "Corn", "corn"]),
173 (ns.G | ns.LF, ["apple", "Apple", "banana", "Banana", "corn", "Corn"]),
174 ],
175 )
176 def test_natsorted_supports_case_handling(alg, expected, fruit_list):
177 assert natsorted(fruit_list, alg=alg) == expected
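Two of the orderings expected above can be reproduced with plain `sorted` and string methods, which may help explain why the string-transform tests pair `ns.IGNORECASE` with case folding and `ns.LOWERCASEFIRST` with `swapcase`. This is a standard-library sketch of the effect on this particular fruit list only, not a description of how natsort builds its keys.

```python
fruit = ["Apple", "corn", "Corn", "Banana", "apple", "banana"]

# Case-insensitive: fold case in the key; ties keep their input order
# because sorted() is stable.
ignore_case = sorted(fruit, key=str.casefold)

# Lowercase first: swapping case makes lowercase letters sort as if they
# were uppercase, which places them ahead in code-point order.
lowercase_first = sorted(fruit, key=str.swapcase)

# No key: uppercase letters come first in code-point order.
default_order = sorted(fruit)
```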
178
179
180 @pytest.mark.parametrize(
181 "alg, expected",
182 [
183 (ns.DEFAULT, [("A5", "a6"), ("a3", "a1")]),
184 (ns.LOWERCASEFIRST, [("a3", "a1"), ("A5", "a6")]),
185 (ns.IGNORECASE, [("a3", "a1"), ("A5", "a6")]),
186 ],
187 )
188 def test_natsorted_supports_nested_case_handling(alg, expected):
189 given = [("A5", "a6"), ("a3", "a1")]
190 assert natsorted(given, alg=alg) == expected
191
192
193 @pytest.mark.parametrize(
194 "alg, expected",
195 [
196 (ns.DEFAULT, ["apple", "Apple", "banana", "Banana", "corn", "Corn"]),
197 (ns.CAPITALFIRST, ["Apple", "Banana", "Corn", "apple", "banana", "corn"]),
198 (ns.LOWERCASEFIRST, ["Apple", "apple", "Banana", "banana", "Corn", "corn"]),
199 (ns.C | ns.LF, ["apple", "banana", "corn", "Apple", "Banana", "Corn"]),
200 ],
201 )
202 @pytest.mark.usefixtures("with_locale_en_us")
203 def test_natsorted_can_sort_using_locale(fruit_list, alg, expected):
204 assert natsorted(fruit_list, alg=ns.LOCALE | alg) == expected
205
206
207 @pytest.mark.usefixtures("with_locale_en_us")
208 def test_natsorted_can_sort_locale_specific_numbers_en():
209 given = ["c", "a5,467.86", "ä", "b", "a5367.86", "a5,6", "a5,50"]
210 expected = ["a5,6", "a5,50", "a5367.86", "a5,467.86", "ä", "b", "c"]
211 assert natsorted(given, alg=ns.LOCALE | ns.F) == expected
212
213
214 @pytest.mark.usefixtures("with_locale_de_de")
215 def test_natsorted_can_sort_locale_specific_numbers_de():
216 given = ["c", "a5.467,86", "ä", "b", "a5367.86", "a5,6", "a5,50"]
217 expected = ["a5,50", "a5,6", "a5367.86", "a5.467,86", "ä", "b", "c"]
218 assert natsorted(given, alg=ns.LOCALE | ns.F) == expected
219
220
221 @pytest.mark.parametrize(
222 "alg, expected",
223 [
224 (ns.DEFAULT, ["0", 1.5, "2", 3, "ä", "Ä", "b", "Z"]),
225 (ns.NUMAFTER, ["ä", "Ä", "b", "Z", "0", 1.5, "2", 3]),
226 (ns.UNGROUPLETTERS, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
227 (ns.UG | ns.NA, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
228 # Adding PATH changes nothing.
229 (ns.PATH, ["0", 1.5, "2", 3, "ä", "Ä", "b", "Z"]),
230 (ns.PATH | ns.NUMAFTER, ["ä", "Ä", "b", "Z", "0", 1.5, "2", 3]),
231 (ns.PATH | ns.UNGROUPLETTERS, ["0", 1.5, "2", 3, "Ä", "Z", "ä", "b"]),
232 (ns.PATH | ns.UG | ns.NA, ["Ä", "Z", "ä", "b", "0", 1.5, "2", 3]),
233 ],
234 )
235 @pytest.mark.usefixtures("with_locale_en_us")
236 def test_natsorted_handles_mixed_types_with_locale(mixed_list, alg, expected):
237 assert natsorted(mixed_list, alg=ns.LOCALE | alg) == expected
238
239
240 @pytest.mark.parametrize(
241 "alg, expected",
242 [
243 (ns.DEFAULT, ["73", "5039", "Banana", "apple", "corn", "~~~~~~"]),
244 (ns.NUMAFTER, ["Banana", "apple", "corn", "~~~~~~", "73", "5039"]),
245 ],
246 )
247 def test_natsorted_sorts_an_odd_collection_of_strings(alg, expected):
248 given = ["apple", "Banana", "73", "5039", "corn", "~~~~~~"]
249 assert natsorted(given, alg=alg) == expected
250
251
252 def test_natsorted_sorts_mixed_ascii_and_non_ascii_numbers():
253 given = [
254 "1st street",
255 "10th street",
256 "2nd street",
257 "2 street",
258 "1 street",
259 "1street",
260 "11 street",
261 "street 2",
262 "street 1",
263 "Street 11",
264 "۲ street",
265 "۱ street",
266 "۱street",
267 "۱۲street",
268 "۱۱ street",
269 "street ۲",
270 "street ۱",
271 "street ۱",
272 "street ۱۲",
273 "street ۱۱",
274 ]
275 expected = [
276 "1 street",
277 "۱ street",
278 "1st street",
279 "1street",
280 "۱street",
281 "2 street",
282 "۲ street",
283 "2nd street",
284 "10th street",
285 "11 street",
286 "۱۱ street",
287 "۱۲street",
288 "street 1",
289 "street ۱",
290 "street ۱",
291 "street 2",
292 "street ۲",
293 "Street 11",
294 "street ۱۱",
295 "street ۱۲",
296 ]
297 assert natsorted(given, alg=ns.IGNORECASE) == expected
0 # -*- coding: utf-8 -*-
1 """\
2 Here is a collection of examples of how this module can be used.
3 See the README or the natsort homepage for more details.
4 """
5 from __future__ import print_function, unicode_literals
6
7 from operator import itemgetter
8
9 import pytest
10 from natsort import (
11 as_ascii,
12 as_utf8,
13 decoder,
14 humansorted,
15 index_humansorted,
16 index_natsorted,
17 index_realsorted,
18 natsorted,
19 ns,
20 order_by_index,
21 realsorted,
22 )
23 from natsort.compat.py23 import PY_VERSION
24
25
26 @pytest.fixture
27 def version_list():
28 return ["1.9.9a", "1.11", "1.9.9b", "1.11.4", "1.10.1"]
29
30
31 @pytest.fixture
32 def float_list():
33 return ["a50", "a51.", "a50.31", "a-50", "a50.4", "a5.034e1", "a50.300"]
34
35
36 @pytest.fixture
37 def fruit_list():
38 return ["Apple", "corn", "Corn", "Banana", "apple", "banana"]
39
40
41 def test_decoder_returns_function_that_can_decode_bytes_but_return_non_bytes_as_is():
42 func = decoder("latin1")
43 str_obj = "bytes"
44 int_obj = 14
45 assert func(b"bytes") == str_obj
46 assert func(int_obj) is int_obj # returns as-is, same object ID
47 if PY_VERSION >= 3:
48 assert (
49 func(str_obj) is str_obj
50 ) # same object returned on Python3 because only bytes has decode
51 else:
52 assert func(str_obj) is not str_obj
53 assert (
54 func(str_obj) == str_obj
55 ) # not same object on Python2 because str can decode
56
57
58 def test_as_ascii_converts_bytes_to_ascii():
59 assert decoder("ascii")(b"bytes") == as_ascii(b"bytes")
60
61
62 def test_as_utf8_converts_bytes_to_utf8():
63 assert decoder("utf8")(b"bytes") == as_utf8(b"bytes")
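The decoder tests above describe a factory: given an encoding, it returns a function that decodes bytes and hands any other object back untouched. A minimal Python 3 sketch of that behaviour, assuming nothing about the real implementation (`decoder_sketch` is a hypothetical name), could be:

```python
def decoder_sketch(encoding):
    """Return a function that decodes bytes and passes other objects through."""

    def decode(obj):
        if isinstance(obj, bytes):
            return obj.decode(encoding)
        return obj  # not bytes: return the very same object

    return decode


from_latin1 = decoder_sketch("latin1")
text = "bytes"
number = 14
decoded = from_latin1(b"bytes")
```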
64
65
66 def test_realsorted_is_identical_to_natsorted_with_real_alg(float_list):
67 assert realsorted(float_list) == natsorted(float_list, alg=ns.REAL)
68
69
70 @pytest.mark.usefixtures("with_locale_en_us")
71 def test_humansorted_is_identical_to_natsorted_with_locale_alg(fruit_list):
72 assert humansorted(fruit_list) == natsorted(fruit_list, alg=ns.LOCALE)
73
74
75 def test_index_natsorted_returns_integer_list_of_sort_order_for_input_list():
76 given = ["num3", "num5", "num2"]
77 other = ["foo", "bar", "baz"]
78 index = index_natsorted(given)
79 assert index == [2, 0, 1]
80 assert [given[i] for i in index] == ["num2", "num3", "num5"]
81 assert [other[i] for i in index] == ["baz", "foo", "bar"]
82
83
84 def test_index_natsorted_reverse():
85 given = ["num3", "num5", "num2"]
86 assert index_natsorted(given, reverse=True) == index_natsorted(given)[::-1]
87
88
89 def test_index_natsorted_applies_key_function_before_sorting():
90 given = [("a", "num3"), ("b", "num5"), ("c", "num2")]
91 expected = [2, 0, 1]
92 assert index_natsorted(given, key=itemgetter(1)) == expected
93
94
95 def test_index_realsorted_is_identical_to_index_natsorted_with_real_alg(float_list):
96 assert index_realsorted(float_list) == index_natsorted(float_list, alg=ns.REAL)
97
98
99 @pytest.mark.usefixtures("with_locale_en_us")
100 def test_index_humansorted_is_identical_to_index_natsorted_with_locale_alg(fruit_list):
101 assert index_humansorted(fruit_list) == index_natsorted(fruit_list, alg=ns.LOCALE)
102
103
104 def test_order_by_index_sorts_list_according_to_order_of_integer_list():
105 given = ["num3", "num5", "num2"]
106 index = [2, 0, 1]
107 expected = [given[i] for i in index]
108 assert expected == ["num2", "num3", "num5"]
109 assert order_by_index(given, index) == expected
110
111
112 def test_order_by_index_returns_generator_with_iter_true():
113 given = ["num3", "num5", "num2"]
114 index = [2, 0, 1]
115 assert order_by_index(given, index, True) != [given[i] for i in index]
116 assert list(order_by_index(given, index, True)) == [given[i] for i in index]
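The index-based tests above can be pictured with a small standalone sketch. It does not use natsort's implementation; `natural_key_sketch`, `index_sorted_sketch`, and `order_by_index_sketch` are hypothetical names invented for this illustration, and the key function is a deliberately simplified stand-in that only handles unsigned integers.

```python
import re


def natural_key_sketch(text):
    # re.split with a capturing group returns text at even positions
    # and digit runs at odd positions; convert the digit runs to int.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def index_sorted_sketch(seq, reverse=False):
    # Sort the positions of seq rather than the values themselves.
    return sorted(
        range(len(seq)),
        key=lambda i: natural_key_sketch(seq[i]),
        reverse=reverse,
    )


def order_by_index_sketch(seq, index):
    # Reorder any sequence using a previously computed index list.
    return [seq[i] for i in index]


given = ["num3", "num5", "num2"]
other = ["foo", "bar", "baz"]
index = index_sorted_sketch(given)
```

The same index list can then reorder a second, parallel list, which is the use case the tests above check with `other`.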
0 from natsort import ns
1
2
3 def test_ns_enum():
4 enum_name_values = [
5 ("FLOAT", 0x0001),
6 ("SIGNED", 0x0002),
7 ("NOEXP", 0x0004),
8 ("PATH", 0x0008),
9 ("LOCALEALPHA", 0x0010),
10 ("LOCALENUM", 0x0020),
11 ("IGNORECASE", 0x0040),
12 ("LOWERCASEFIRST", 0x0080),
13 ("GROUPLETTERS", 0x0100),
14 ("UNGROUPLETTERS", 0x0200),
15 ("NANLAST", 0x0400),
16 ("COMPATIBILITYNORMALIZE", 0x0800),
17 ("NUMAFTER", 0x1000),
18 ("DEFAULT", 0x0000),
19 ("INT", 0x0000),
20 ("UNSIGNED", 0x0000),
21 ("REAL", 0x0003),
22 ("LOCALE", 0x0030),
23 ("I", 0x0000),
24 ("U", 0x0000),
25 ("F", 0x0001),
26 ("S", 0x0002),
27 ("R", 0x0003),
28 ("N", 0x0004),
29 ("P", 0x0008),
30 ("LA", 0x0010),
31 ("LN", 0x0020),
32 ("L", 0x0030),
33 ("IC", 0x0040),
34 ("LF", 0x0080),
35 ("G", 0x0100),
36 ("UG", 0x0200),
37 ("C", 0x0200),
38 ("CAPITALFIRST", 0x0200),
39 ("NL", 0x0400),
40 ("CN", 0x0800),
41 ("NA", 0x1000),
42 ]
43 assert list(ns._asdict().items()) == enum_name_values
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from hypothesis import given
6 from hypothesis.strategies import binary
7 from natsort.ns_enum import ns
8 from natsort.utils import parse_bytes_factory
9
10
11 @pytest.mark.parametrize(
12 "alg, example_func",
13 [
14 (ns.DEFAULT, lambda x: (x,)),
15 (ns.IGNORECASE, lambda x: (x.lower(),)),
16 # With PATH, it becomes a nested tuple.
17 (ns.PATH, lambda x: ((x,),)),
18 (ns.PATH | ns.IGNORECASE, lambda x: ((x.lower(),),)),
19 ],
20 )
21 @given(x=binary())
22 def test_parse_bytest_factory_makes_function_that_returns_tuple(x, alg, example_func):
23 parse_bytes_func = parse_bytes_factory(alg)
24 assert parse_bytes_func(x) == example_func(x)
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from hypothesis import given
6 from hypothesis.strategies import floats, integers
7 from natsort.ns_enum import ns
8 from natsort.utils import parse_number_factory
9
10
11 @pytest.mark.usefixtures("with_locale_en_us")
12 @pytest.mark.parametrize(
13 "alg, example_func",
14 [
15 (ns.DEFAULT, lambda x: ("", x)),
16 (ns.PATH, lambda x: (("", x),)),
17 (ns.UNGROUPLETTERS | ns.LOCALE, lambda x: (("xx",), ("", x))),
18 (ns.PATH | ns.UNGROUPLETTERS | ns.LOCALE, lambda x: ((("xx",), ("", x)),)),
19 ],
20 )
21 @given(x=floats(allow_nan=False) | integers())
22 def test_parse_number_factory_makes_function_that_returns_tuple(x, alg, example_func):
23 parse_number_func = parse_number_factory(alg, "", "xx")
24 assert parse_number_func(x) == example_func(x)
25
26
27 @pytest.mark.parametrize(
28 "alg, x, result",
29 [
30 (ns.DEFAULT, 57, ("", 57)),
31 (ns.DEFAULT, float("nan"), ("", float("-inf"))), # NaN transformed to -infinity
32 (ns.NANLAST, float("nan"), ("", float("+inf"))), # NANLAST makes it +infinity
33 ],
34 )
35 def test_parse_number_factory_treats_nan_special(alg, x, result):
36 parse_number_func = parse_number_factory(alg, "", "xx")
37 assert parse_number_func(x) == result
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import unicodedata
5
6 import pytest
7 from hypothesis import given
8 from hypothesis.strategies import floats, integers, lists, text
9 from natsort.compat.fastnumbers import fast_float
10 from natsort.compat.py23 import py23_str
11 from natsort.ns_enum import NS_DUMB, ns
12 from natsort.utils import NumericalRegularExpressions as NumRegex
13 from natsort.utils import parse_string_factory
14
15
16 class CustomTuple(tuple):
17 """Used to ensure what is given during testing is what is returned."""
18
19 original = None
20
21
22 def input_transform(x):
23 """Make uppercase."""
24 try:
25 return x.upper()
26 except AttributeError:
27 return x
28
29
30 def final_transform(x, original):
31 """Make the input a CustomTuple."""
32 t = CustomTuple(x)
33 t.original = original
34 return t
35
36
37 @pytest.fixture
38 def parse_string_func(request):
39 """A parse_string_factory result with sample arguments."""
40 sep = ""
41 return parse_string_factory(
42 request.param, # algorithm
43 sep,
44 NumRegex.int_nosign().split,
45 input_transform,
46 fast_float,
47 final_transform,
48 )
49
50
51 @pytest.mark.parametrize("parse_string_func", [ns.DEFAULT], indirect=True)
52 @given(x=floats() | integers())
53 def test_parse_string_factory_raises_type_error_if_given_number(x, parse_string_func):
54 with pytest.raises(TypeError):
55 assert parse_string_func(x)
56
57
58 # noinspection PyCallingNonCallable
59 @pytest.mark.parametrize(
60 "parse_string_func, orig_func",
61 [
62 (ns.DEFAULT, lambda x: x.upper()),
63 (ns.LOCALE, lambda x: x.upper()),
64 (ns.LOCALE | NS_DUMB, lambda x: x), # This changes the "original" handling.
65 ],
66 indirect=["parse_string_func"],
67 )
68 @given(
69 x=lists(
70 elements=floats(allow_nan=False) | text() | integers(), min_size=1, max_size=10
71 )
72 )
73 @pytest.mark.usefixtures("with_locale_en_us")
74 def test_parse_string_factory_invariance(x, parse_string_func, orig_func):
75 # parse_string_factory is the high-level combination of several dedicated
76 # functions involved in splitting and manipulating a string. The details of
77 # what those functions do are not relevant to testing parse_string_factory.
78 # What is relevant is that the form of the output matches the invariant
79 # that even elements are strings and odd elements are numbers. That each
80 # component function is doing what it should is tested elsewhere.
81 value = "".join(map(py23_str, x)) # Convert the input to a single string.
82 result = parse_string_func(value)
83 result_types = list(map(type, result))
84 expected_types = [py23_str if i % 2 == 0 else float for i in range(len(result))]
85 assert result_types == expected_types
86
87 # The result is in our CustomTuple.
88 assert isinstance(result, CustomTuple)
89
90 # Original should have gone through the "input_transform"
91 # which is uppercase in these tests.
92 assert result.original == orig_func(unicodedata.normalize("NFD", value))
0 # -*- coding: utf-8 -*-
1 """These test the splitting regular expressions."""
2 from __future__ import unicode_literals
3
4 import pytest
5 from natsort.utils import NumericalRegularExpressions as NumRegex
6
7
8 regex_names = {
9 NumRegex.int_nosign(): "int_nosign",
10 NumRegex.int_sign(): "int_sign",
11 NumRegex.float_nosign_noexp(): "float_nosign_noexp",
12 NumRegex.float_sign_noexp(): "float_sign_noexp",
13 NumRegex.float_nosign_exp(): "float_nosign_exp",
14 NumRegex.float_sign_exp(): "float_sign_exp",
15 }
16
17 # Regex aliases (so lines stay a reasonable length).
18 i_u = NumRegex.int_nosign()
19 i_s = NumRegex.int_sign()
20 f_u = NumRegex.float_nosign_noexp()
21 f_s = NumRegex.float_sign_noexp()
22 f_ue = NumRegex.float_nosign_exp()
23 f_se = NumRegex.float_sign_exp()
24
25 # Assemble a test suite of regular strings and their regular expression
26 # splitting result. Organize by the input string.
27 regex_tests = {
28 "-123.45e+67": {
29 i_u: ["-", "123", ".", "45", "e+", "67", ""],
30 i_s: ["", "-123", ".", "45", "e", "+67", ""],
31 f_u: ["-", "123.45", "e+", "67", ""],
32 f_s: ["", "-123.45", "e", "+67", ""],
33 f_ue: ["-", "123.45e+67", ""],
34 f_se: ["", "-123.45e+67", ""],
35 },
36 "a-123.45e+67b": {
37 i_u: ["a-", "123", ".", "45", "e+", "67", "b"],
38 i_s: ["a", "-123", ".", "45", "e", "+67", "b"],
39 f_u: ["a-", "123.45", "e+", "67", "b"],
40 f_s: ["a", "-123.45", "e", "+67", "b"],
41 f_ue: ["a-", "123.45e+67", "b"],
42 f_se: ["a", "-123.45e+67", "b"],
43 },
44 "hello": {
45 i_u: ["hello"],
46 i_s: ["hello"],
47 f_u: ["hello"],
48 f_s: ["hello"],
49 f_ue: ["hello"],
50 f_se: ["hello"],
51 },
52 "abc12.34.56-7def": {
53 i_u: ["abc", "12", ".", "34", ".", "56", "-", "7", "def"],
54 i_s: ["abc", "12", ".", "34", ".", "56", "", "-7", "def"],
55 f_u: ["abc", "12.34", "", ".56", "-", "7", "def"],
56 f_s: ["abc", "12.34", "", ".56", "", "-7", "def"],
57 f_ue: ["abc", "12.34", "", ".56", "-", "7", "def"],
58 f_se: ["abc", "12.34", "", ".56", "", "-7", "def"],
59 },
60 "a1b2c3d4e5e6": {
61 i_u: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
62 i_s: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
63 f_u: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
64 f_s: ["a", "1", "b", "2", "c", "3", "d", "4", "e", "5", "e", "6", ""],
65 f_ue: ["a", "1", "b", "2", "c", "3", "d", "4e5", "e", "6", ""],
66 f_se: ["a", "1", "b", "2", "c", "3", "d", "4e5", "e", "6", ""],
67 },
68 "eleven۱۱eleven11eleven১১": { # All of these are the decimal 11
69 i_u: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
70 i_s: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
71 f_u: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
72 f_s: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
73 f_ue: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
74 f_se: ["eleven", "۱۱", "eleven", "11", "eleven", "১১", ""],
75 },
76 "12①②ⅠⅡ⅓": { # Two decimals, Two digits, Two numerals, fraction
77 i_u: ["", "12", "", "①", "", "②", "ⅠⅡ⅓"],
78 i_s: ["", "12", "", "①", "", "②", "ⅠⅡ⅓"],
79 f_u: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
80 f_s: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
81 f_ue: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
82 f_se: ["", "12", "", "①", "", "②", "", "Ⅰ", "", "Ⅱ", "", "⅓", ""],
83 }
84 }
85
86
87 # From the above collections, create the parametrized tests and labels.
88 regex_params = [
89 (given, expected, regex)
90 for given, values in regex_tests.items()
91 for regex, expected in values.items()
92 ]
93 labels = ["{}-{}".format(given, regex_names[regex]) for given, _, regex in regex_params]
94
95
96 @pytest.mark.parametrize("x, expected, regex", regex_params, ids=labels)
97 def test_regex_splits_correctly(x, expected, regex):
98 # noinspection PyUnresolvedReferences
99 assert regex.split(x) == expected
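As a rough illustration of the splitting behaviour these cases exercise, the sketch below uses a plain `re` pattern with a capturing group. The name `int_nosign_sketch` and its pattern are assumptions made for this example, not natsort's actual regular expression; it only covers the unsigned-integer case.

```python
import re

# Hypothetical, simplified stand-in for an unsigned-integer splitter.
# The capturing group makes re.split keep the matched numbers in the output.
int_nosign_sketch = re.compile(r"(\d+)")


def split_on_ints(text):
    # Text lands at even positions and digit runs at odd positions.
    # A trailing "" appears when the input ends with a number.
    return int_nosign_sketch.split(text)
```

For the unsigned-integer case this reproduces the `i_u` rows for the ASCII inputs in the table above, including the trailing empty string when the input ends in a number.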
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 from functools import partial
5
6 import pytest
7 from hypothesis import example, given
8 from hypothesis.strategies import floats, integers, text
9 from natsort.compat.fastnumbers import fast_float, fast_int
10 from natsort.compat.locale import get_strxfrm
11 from natsort.compat.py23 import py23_range, py23_str, py23_unichr
12 from natsort.ns_enum import NS_DUMB, ns
13 from natsort.utils import groupletters, string_component_transform_factory
14
15 # Some unicode values are known to fail with the builtin locale library on
16 # BSD systems (strxfrm raises a ValueError). This has nothing to do with
17 # natsort, so let's filter them out.
18 try:
19 bad_uni_chars = frozenset(
20 py23_unichr(x) for x in py23_range(0X10fefd, 0X10ffff + 1)
21 )
22 except ValueError:
23 # Narrow unicode build... no worries.
24 bad_uni_chars = frozenset()
25
26
27 def no_bad_uni_chars(x, _bad_chars=bad_uni_chars):
28 """Ensure text does not contain bad unicode characters"""
29 return not any(y in _bad_chars for y in x)
30
31
32 def no_null(x):
33 """Ensure text does not contain a null character."""
34 return "\0" not in x
35
36
37 @pytest.mark.parametrize(
38 "alg, example_func",
39 [
40 (ns.INT, fast_int),
41 (ns.DEFAULT, fast_int),
42 (ns.FLOAT, partial(fast_float, nan=float("-inf"))),
43 (ns.FLOAT | ns.NANLAST, partial(fast_float, nan=float("+inf"))),
44 (ns.GROUPLETTERS, partial(fast_int, key=groupletters)),
45 (ns.LOCALE, partial(fast_int, key=lambda x: get_strxfrm()(x))),
46 (
47 ns.GROUPLETTERS | ns.LOCALE,
48 partial(fast_int, key=lambda x: get_strxfrm()(groupletters(x))),
49 ),
50 (
51 NS_DUMB | ns.LOCALE,
52 partial(fast_int, key=lambda x: get_strxfrm()(groupletters(x))),
53 ),
54 (
55 ns.GROUPLETTERS | ns.LOCALE | ns.FLOAT | ns.NANLAST,
56 partial(
57 fast_float,
58 key=lambda x: get_strxfrm()(groupletters(x)),
59 nan=float("+inf"),
60 ),
61 ),
62 ],
63 )
64 @example(x=float("nan"))
65 @given(
66 x=integers()
67 | floats()
68 | text().filter(bool).filter(no_bad_uni_chars).filter(no_null)
69 )
70 @pytest.mark.usefixtures("with_locale_en_us")
71 def test_string_component_transform_factory(x, alg, example_func):
72 string_component_transform_func = string_component_transform_factory(alg)
73 try:
74 assert string_component_transform_func(py23_str(x)) == example_func(py23_str(x))
75 except ValueError as e: # handle broken locale lib on BSD.
76 if "is not in range" not in str(e):
77 raise
0 # -*- coding: utf-8 -*-
1 """\
2 Test the Unicode numbers module.
3 """
4 from __future__ import unicode_literals
5
6 import unicodedata
7
8 from natsort.compat.py23 import py23_range, py23_unichr
9 from natsort.unicode_numbers import (
10 decimal_chars,
11 decimals,
12 digit_chars,
13 digits,
14 digits_no_decimals,
15 numeric,
16 numeric_chars,
17 numeric_hex,
18 numeric_no_decimals,
19 )
20
21
22 def test_numeric_chars_contains_only_valid_unicode_numeric_characters():
23 for a in numeric_chars:
24 assert unicodedata.numeric(a, None) is not None
25
26
27 def test_digit_chars_contains_only_valid_unicode_digit_characters():
28 for a in digit_chars:
29 assert unicodedata.digit(a, None) is not None
30
31
32 def test_decimal_chars_contains_only_valid_unicode_decimal_characters():
33 for a in decimal_chars:
34 assert unicodedata.decimal(a, None) is not None
35
36
37 def test_numeric_chars_contains_all_valid_unicode_numeric_and_digit_characters():
38 set_numeric_hex = set(numeric_hex)
39 set_numeric_chars = set(numeric_chars)
40 set_digit_chars = set(digit_chars)
41 set_decimal_chars = set(decimal_chars)
42 for i in py23_range(0X110000):
43 try:
44 a = py23_unichr(i)
45 except ValueError:
46 break
47 if a in "0123456789":
48 continue
49 if unicodedata.numeric(a, None) is not None:
50 assert i in set_numeric_hex
51 assert a in set_numeric_chars
52 if unicodedata.digit(a, None) is not None:
53 assert i in set_numeric_hex
54 assert a in set_digit_chars
55 if unicodedata.decimal(a, None) is not None:
56 assert i in set_numeric_hex
57 assert a in set_decimal_chars
58
59 assert set_decimal_chars.isdisjoint(digits_no_decimals)
60 assert set_digit_chars.issuperset(digits_no_decimals)
61
62 assert set_decimal_chars.isdisjoint(numeric_no_decimals)
63 assert set_numeric_chars.issuperset(numeric_no_decimals)
64
65
66 def test_combined_string_contains_all_characters_in_list():
67 assert numeric == "".join(numeric_chars)
68 assert digits == "".join(digit_chars)
69 assert decimals == "".join(decimal_chars)
0 # -*- coding: utf-8 -*-
1 """These test the utils.py functions."""
2 from __future__ import unicode_literals
3
4 import pathlib
5 import string
6 from itertools import chain
7 from operator import neg as op_neg
8
9 import pytest
10 from hypothesis import given
11 from hypothesis.strategies import integers, lists, sampled_from, text
12 from natsort import utils
13 from natsort.compat.py23 import py23_cmp, py23_int, py23_lower, py23_str
14 from natsort.ns_enum import ns
15
16
17 def test_do_decoding_decodes_bytes_string_to_unicode():
18 assert type(utils.do_decoding(b"bytes", "ascii")) is py23_str
19 assert utils.do_decoding(b"bytes", "ascii") == "bytes"
20 assert utils.do_decoding(b"bytes", "ascii") == b"bytes".decode("ascii")
21
22
23 @pytest.mark.parametrize(
24 "alg, expected",
25 [
26 (ns.I, utils.NumericalRegularExpressions.int_nosign()),
27 (ns.I | ns.N, utils.NumericalRegularExpressions.int_nosign()),
28 (ns.I | ns.S, utils.NumericalRegularExpressions.int_sign()),
29 (ns.I | ns.S | ns.N, utils.NumericalRegularExpressions.int_sign()),
30 (ns.F, utils.NumericalRegularExpressions.float_nosign_exp()),
31 (ns.F | ns.N, utils.NumericalRegularExpressions.float_nosign_noexp()),
32 (ns.F | ns.S, utils.NumericalRegularExpressions.float_sign_exp()),
33 (ns.F | ns.S | ns.N, utils.NumericalRegularExpressions.float_sign_noexp()),
34 ],
35 )
36 def test_regex_chooser_returns_correct_regular_expression_object(alg, expected):
37 assert utils.regex_chooser(alg).pattern == expected.pattern
38
39
40 @pytest.mark.parametrize(
41 "alg, value_or_alias",
42 [
43 # Defaults
44 (ns.DEFAULT, 0),
45 (ns.INT, 0),
46 (ns.UNSIGNED, 0),
47 # Aliases
48 (ns.INT, ns.I),
49 (ns.UNSIGNED, ns.U),
50 (ns.FLOAT, ns.F),
51 (ns.SIGNED, ns.S),
52 (ns.NOEXP, ns.N),
53 (ns.PATH, ns.P),
54 (ns.LOCALEALPHA, ns.LA),
55 (ns.LOCALENUM, ns.LN),
56 (ns.LOCALE, ns.L),
57 (ns.IGNORECASE, ns.IC),
58 (ns.LOWERCASEFIRST, ns.LF),
59 (ns.GROUPLETTERS, ns.G),
60 (ns.UNGROUPLETTERS, ns.UG),
61 (ns.CAPITALFIRST, ns.C),
62 (ns.UNGROUPLETTERS, ns.CAPITALFIRST),
63 (ns.NANLAST, ns.NL),
64 (ns.COMPATIBILITYNORMALIZE, ns.CN),
65 (ns.NUMAFTER, ns.NA),
66 # Convenience
67 (ns.LOCALE, ns.LOCALEALPHA | ns.LOCALENUM),
68 (ns.REAL, ns.FLOAT | ns.SIGNED),
69 ],
70 )
71 def test_ns_enum_values_and_aliases(alg, value_or_alias):
72 assert alg == value_or_alias
73
74
75 def test_chain_functions_is_a_no_op_if_no_functions_are_given():
76 x = 2345
77 assert utils.chain_functions([])(x) is x
78
79
80 def test_chain_functions_does_one_function_if_one_function_is_given():
81 x = "2345"
82 assert utils.chain_functions([len])(x) == 4
83
84
85 def test_chain_functions_combines_functions_in_given_order():
86 x = 2345
87 assert utils.chain_functions([str, len, op_neg])(x) == -len(str(x))
88
89
90 # Each test has an "example" version for demonstrative purposes,
91 # and a test that uses the hypothesis module.
92
93
94 def test_groupletters_returns_letters_with_lowercase_transform_of_letter_example():
95 assert utils.groupletters("HELLO") == "hHeElLlLoO"
96 assert utils.groupletters("hello") == "hheelllloo"
97
98
99 @given(text().filter(bool))
100 def test_groupletters_returns_letters_with_lowercase_transform_of_letter(x):
101 assert utils.groupletters(x) == "".join(
102 chain.from_iterable([py23_lower(y), y] for y in x)
103 )
104
105
106 def test_sep_inserter_does_nothing_if_no_numbers_example():
107 assert list(utils.sep_inserter(iter(["a", "b", "c"]), "")) == ["a", "b", "c"]
108 assert list(utils.sep_inserter(iter(["a"]), "")) == ["a"]
109
110
111 def test_sep_inserter_does_nothing_if_only_one_number_example():
112 assert list(utils.sep_inserter(iter(["a", 5]), "")) == ["a", 5]
113
114
115 def test_sep_inserter_inserts_separator_string_between_two_numbers_example():
116 assert list(utils.sep_inserter(iter([5, 9]), "")) == ["", 5, "", 9]
117
118
119 @given(lists(elements=text().filter(bool) | integers(), min_size=3))
120 def test_sep_inserter_inserts_separator_between_two_numbers(x):
121 # Rather than just replicating the results in a different
122 # algorithm, validate that the "shape" of the output is as expected.
123 result = list(utils.sep_inserter(iter(x), ""))
124 for i, pos in enumerate(result[1:-1], 1):
125 if pos == "":
126 assert isinstance(result[i - 1], py23_int)
127 assert isinstance(result[i + 1], py23_int)
128
129
130 def test_path_splitter_splits_path_string_by_separator_example():
131 z = "/this/is/a/path"
132 assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
133 z = pathlib.Path("/this/is/a/path")
134 assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
135
136
137 @given(lists(sampled_from(string.ascii_letters), min_size=2).filter(all))
138 def test_path_splitter_splits_path_string_by_separator(x):
139 z = py23_str(pathlib.Path(*x))
140 assert tuple(utils.path_splitter(z)) == tuple(pathlib.Path(z).parts)
141
142
143 def test_path_splitter_splits_path_string_by_separator_and_removes_extension_example():
144 z = "/this/is/a/path/file.exe"
145 y = tuple(pathlib.Path(z).parts)
146 assert tuple(utils.path_splitter(z)) == y[:-1] + (
147 pathlib.Path(z).stem,
148 pathlib.Path(z).suffix,
149 )
150
151
152 @given(lists(sampled_from(string.ascii_letters), min_size=3).filter(all))
153 def test_path_splitter_splits_path_string_by_separator_and_removes_extension(x):
154 z = py23_str(pathlib.Path(*x[:-2])) + "." + x[-1]
155 y = tuple(pathlib.Path(z).parts)
156 assert tuple(utils.path_splitter(z)) == y[:-1] + (
157 pathlib.Path(z).stem,
158 pathlib.Path(z).suffix,
159 )
160
161
162 @given(integers())
163 def test_py23_cmp(x):
164 assert py23_cmp(x, x) == 0
165 assert py23_cmp(x, x + 1) < 0
166 assert py23_cmp(x, x - 1) > 0
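The `groupletters` and `sep_inserter` examples above can be mimicked with a short standalone sketch. These are not natsort's implementations; `groupletters_sketch` and `sep_inserter_sketch` are hypothetical names, written only to reproduce the behaviour shown in the example tests.

```python
from itertools import chain


def groupletters_sketch(text):
    # Pair each character with its lowercase form, lowercase first.
    return "".join(chain.from_iterable((c.lower(), c) for c in text))


def sep_inserter_sketch(items, sep):
    # Yield sep before a number that starts the sequence or that
    # directly follows another number; pass everything else through.
    previous_was_number = True  # so a leading number gets a separator
    for item in items:
        is_number = isinstance(item, (int, float))
        if is_number and previous_was_number:
            yield sep
        yield item
        previous_was_number = is_number
```

Inserting the separator keeps strings and numbers alternating, so tuples built from the output compare position by position without mixing types.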
1717 passenv =
1818 WITH_EXTRAS
1919 deps =
20 pipenv
20 -r dev-requirements.txt
2121 extras =
2222 {env:WITH_EXTRAS:}
2323 commands =
24 pipenv install --dev --skip-lock
2524 # Only run How It Works doctest on Python 3.6.
26 py36: {envpython} -m doctest -o IGNORE_EXCEPTION_DETAIL docs/source/howitworks.rst
25 py36: {envpython} -m doctest -o IGNORE_EXCEPTION_DETAIL docs/howitworks.rst
2726 # Other doctests are run for all pythons.
28 pytest README.rst docs/source/intro.rst docs/source/examples.rst
27 pytest README.rst docs/intro.rst docs/examples.rst
2928 pytest --doctest-modules {envsitepackagesdir}/natsort
3130 # Full test suite. Allow the user to pass command-line arguments.
3130 pytest --tb=short --cov {envsitepackagesdir}/natsort --cov-report term-missing {posargs:}
3736 flake8-import-order
3837 flake8-bugbear
3938 pep8-naming
40 commands = flake8
39 check-manifest
40 twine
41 commands =
42 {envpython} setup.py sdist bdist_wheel
43 flake8
44 check-manifest --ignore ".github*,*.md,.coveragerc"
45 twine check dist/*
46 skip_install = true
4147
4248 # Build documentation.
4349 [testenv:docs]
4652 sphinx_rtd_theme
4753 commands =
4854 {envpython} setup.py build_sphinx
55 skip_install = true
4956
5057 # Release the code to PyPI
5158 [testenv:release]
5259 deps =
5360 twine
54 check-manifest
5561 commands =
56 check-manifest
5762 {envpython} setup.py sdist bdist_wheel
5863 twine upload dist/*
64 skip_install = true