Update upstream source from tag 'upstream/1.3.0'
Update to upstream version '1.3.0'
with Debian dir 89a3a88e3e22cd000da384c119baaf4436680466
Diane Trout
1 year, 3 months ago
 Metadata-Version: 2.1
 Name: partd
-Version: 1.2.0
+Version: 1.3.0
 Summary: Appendable key-value storage
 Home-page: http://github.com/dask/partd/
 Maintainer: Matthew Rocklin
 Maintainer-email: mrocklin@gmail.com
 License: BSD
-Description: PartD
-    [... full README text, moved unchanged from the metadata header into the message body below ...]
-Platform: UNKNOWN
 Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.5
-Classifier: Programming Language :: Python :: 3.6
 Classifier: Programming Language :: Python :: 3.7
 Classifier: Programming Language :: Python :: 3.8
-Requires-Python: >=3.5
+Classifier: Programming Language :: Python :: 3.9
+Requires-Python: >=3.7
 Provides-Extra: complete
+License-File: LICENSE.txt

PartD
=====

|Build Status| |Version Status|

Key-value byte store with appendable values

Partd stores key-value pairs.
Values are raw bytes.
We append to old values.

Partd excels at shuffling operations.

Operations
----------

PartD has two main operations: ``append`` and ``get``.


Example
-------

1. Create a Partd backed by a directory::

    >>> import partd
    >>> p = partd.File('/path/to/new/dataset/')

2. Append key-byte pairs to the dataset::

    >>> p.append({'x': b'Hello ', 'y': b'123'})
    >>> p.append({'x': b'world!', 'y': b'456'})

3. Get the bytes associated with keys::

    >>> p.get('x')         # One key
    b'Hello world!'

    >>> p.get(['y', 'x'])  # List of keys
    [b'123456', b'Hello world!']

4. Destroy the partd dataset::

    >>> p.drop()

That's it.


Implementations
---------------

We can back a partd by an in-memory dictionary::

    >>> p = Dict()

For larger amounts of data, or to share data between processes, we back a
partd by a directory of files. This uses file-based locks for consistency::

    >>> p = File('/path/to/dataset/')

However, this can perform poorly under many small writes. In these cases you
may wish to buffer one partd with another, keeping a fixed maximum of data in
the buffering partd. This writes the larger elements of the first partd to the
second partd when space runs low::

    >>> p = Buffer(Dict(), File(), available_memory=2e9)  # 2GB memory buffer

You might also want to have many distributed processes write to a single partd
consistently. This can be done with a server:

* Server process::

    >>> p = Buffer(Dict(), File(), available_memory=2e9)  # 2GB memory buffer
    >>> s = Server(p, address='ipc://server')

* Worker processes::

    >>> p = Client('ipc://server')  # Client machine talks to remote server
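Note that the ``Server`` constructor visible in the ``partd/zmq.py`` diff
further down is ``__init__(self, partd=None, bind=None, start=True,
block=False, hostname=None)``, so the socket address is passed as ``bind=``
rather than ``address=`` as written above. A minimal end-to-end sketch under
that assumption (the path and socket name are hypothetical)::

    >>> from partd import Buffer, Dict, File
    >>> from partd.zmq import Server, Client

    >>> s = Server(Buffer(Dict(), File('/tmp/shared')), bind='ipc://server')
    >>> c = Client('ipc://server')   # any number of worker processes
    >>> c.append({'x': b'abc'})
    >>> c.get('x')
    b'abc'
    >>> s.close()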


Encodings and Compression
-------------------------

Once we can robustly and efficiently append bytes to a partd, we consider
compression and encodings. This is generally available with the ``Encode``
partd, which accepts three functions: one to apply to bytes as they are
written, one to apply to bytes as they are read, and one to join bytestreams.
Common configurations already exist for common data and compression formats.
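For instance, ``ZLib`` and ``BZ2`` in the ``partd/compressed.py`` diff below
are nothing more than ``partial(Encode, <compress>, <decompress>, join)``. A
sketch of wiring ``Encode`` up by hand in the same way (the path is
hypothetical)::

    >>> import zlib
    >>> from partd import File
    >>> from partd.encode import Encode

    >>> p = Encode(zlib.compress,        # applied to bytes as they are written
    ...            zlib.decompress,      # applied to stored frames as they are read
    ...            b''.join,             # joins the decoded bytestreams
    ...            File('/tmp/dataset'))
    >>> p.append({'x': b'Hello '})
    >>> p.append({'x': b'world!'})
    >>> p.get('x')
    b'Hello world!'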

We may wish to compress and decompress data transparently as we interact with
a partd. Objects like ``BZ2``, ``Blosc``, ``ZLib`` and ``Snappy`` exist and
take another partd as an argument::

    >>> p = File(...)
    >>> p = ZLib(p)

These work exactly as before; the (de)compression happens automatically.

Common data formats like Python lists, numpy arrays, and pandas
dataframes are also supported out of the box::

    >>> p = File(...)
    >>> p = Numpy(p)
    >>> p.append({'x': np.array([...])})

This lets us forget about bytes and think instead in our normal data types.
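A concrete, runnable variant of the sketch above; note the class is imported
as ``Numpy`` in the ``partd/__init__.py`` diff below (the path is
hypothetical)::

    >>> import numpy as np
    >>> from partd import File, Numpy

    >>> p = Numpy(File('/tmp/arrays'))
    >>> p.append({'x': np.array([1, 2, 3])})
    >>> p.append({'x': np.array([4, 5])})
    >>> p.get('x')                  # appended arrays come back concatenated
    array([1, 2, 3, 4, 5])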

Composition
-----------

In principle we want to compose all of these choices together:

1. Write policy: ``Dict``, ``File``, ``Buffer``, ``Client``
2. Encoding: ``Pickle``, ``Numpy``, ``Pandas``, ...
3. Compression: ``Blosc``, ``Snappy``, ...

Partd objects compose by nesting. Here we make a partd that writes
pickle-encoded, BZ2-compressed bytes directly to disk::

    >>> p = Pickle(BZ2(File('foo')))

We could construct more complex systems that include compression,
serialization, buffering, and remote access::

    >>> server = Server(Buffer(Dict(), File(), available_memory=2e0))

    >>> client = Pickle(Snappy(Client(server.address)))
    >>> client.append({'x': [1, 2, 3]})
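Reading back through the same stack should recover the original list; a
continuation we add here, assuming ``Pickle``'s join concatenates the appended
lists as the semantics above imply::

    >>> client.get('x')
    [1, 2, 3]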

.. |Build Status| image:: https://github.com/dask/partd/workflows/CI/badge.svg
   :target: https://github.com/dask/partd/actions?query=workflow%3ACI
.. |Version Status| image:: https://img.shields.io/pypi/v/partd.svg
   :target: https://pypi.python.org/pypi/partd/
-from __future__ import absolute_import
+from contextlib import suppress

 from .file import File
 from .dict import Dict
 from .pickle import Pickle
 from .python import Python
 from .compressed import *
-from .utils import ignoring
-with ignoring(ImportError):
+with suppress(ImportError):
     from .numpy import Numpy
-with ignoring(ImportError):
+with suppress(ImportError):
     from .pandas import PandasColumns, PandasBlocks
-with ignoring(ImportError):
+with suppress(ImportError):
     from .zmq import Client, Server

 from ._version import get_versions
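The recurring edit in this update (here and in most files below) swaps partd's
old ``utils.ignoring`` helper for the standard library's
``contextlib.suppress``, which has identical semantics: run the block,
silently discarding the named exceptions::

    >>> from contextlib import suppress
    >>> with suppress(ImportError):
    ...     import blosc    # optional dependency; skipped cleanly if absent
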

 version_json = '''
 {
- "date": "2021-04-08T12:31:19-0500",
+ "date": "2022-08-11T18:17:31-0500",
  "dirty": false,
  "error": null,
- "full-revisionid": "9c9ba0a3a91b6b1eeb560615114a1df81fc427c1",
- "version": "1.2.0"
+ "full-revisionid": "d1faa885e2a7789ff3599b49be31b4c8cf48ba4d",
+ "version": "1.3.0"
 }
 '''  # END VERSION_JSON
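The JSON above is machine-written by versioneer. Assuming the standard
versioneer pattern, where ``partd/__init__.py`` sets ``__version__ =
get_versions()['version']``, the bump is what users observe::

    >>> import partd
    >>> partd.__version__
    '1.3.0'
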
 from operator import add
 from bisect import bisect
 from collections import defaultdict
-from .compatibility import Queue, Empty
+from queue import Queue, Empty


 def zero():
-from __future__ import absolute_import
-
-import sys
-
-if sys.version_info[0] == 3:
-    from io import StringIO
-    unicode = str
-    import pickle
-    from queue import Queue, Empty
-if sys.version_info[0] == 2:
-    from StringIO import StringIO
-    unicode = unicode
-    import cPickle as pickle
-    from Queue import Queue, Empty
-
-from .utils import ignoring
+from contextlib import suppress
+from functools import partial
+
 from .encode import Encode
-from functools import partial

 __all__ = []

     return b''.join(L)


-with ignoring(ImportError, AttributeError):
+with suppress(ImportError, AttributeError):
     # In case snappy is not installed, or another package called snappy that does not implement compress / decompress.
     # For example, SnapPy (https://pypi.org/project/snappy/)
     import snappy
     __all__.append('Snappy')


-with ignoring(ImportError):
+with suppress(ImportError):
     import zlib
     ZLib = partial(Encode,
                    zlib.compress,
     __all__.append('ZLib')


-with ignoring(ImportError):
+with suppress(ImportError):
     import bz2
     BZ2 = partial(Encode,
                   bz2.compress,
     __all__.append('BZ2')


-with ignoring(ImportError):
+with suppress(ImportError):
     import blosc
     Blosc = partial(Encode,
                     blosc.compress,
-from __future__ import absolute_import
-
 import os
 import shutil
 import locket

     return str(key)


-class Interface(object):
+class Interface:
     def __init__(self):
         self._iset_seen = set()
-from __future__ import absolute_import
-
 import atexit
+from contextlib import suppress
 import os
 import shutil
 import string

 from .core import Interface
 import locket
-from .utils import ignoring


 class File(Interface):
         self._explicitly_given_path = True
         self.path = path
         if not os.path.exists(path):
-            with ignoring(OSError):
+            with suppress(OSError):
                 os.makedirs(path)
         self.lock = locket.lock_file(self.filename('.lock'))
         Interface.__init__(self)
         try:
             with open(self.filename(key), 'rb') as f:
                 result.append(f.read())
-        except IOError:
+        except OSError:
             result.append(b'')
         finally:
             if lock:
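The ``IOError`` to ``OSError`` change above is behaviour-preserving: since
Python 3.3, ``IOError`` has been an alias of ``OSError``::

    >>> IOError is OSError
    True
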
 Alongside each array x we ensure the value x.dtype which stores the string
 description of the array's dtype.
 """
-from __future__ import absolute_import
+from contextlib import suppress
+import pickle
+
 import numpy as np
 from toolz import valmap, identity, partial
-from .compatibility import pickle
 from .core import Interface
 from .file import File
-from .utils import frame, framesplit, suffix, ignoring
+from .utils import frame, framesplit, suffix


 def serialize_dtype(dt):

 def serialize(x):
     if x.dtype == 'O':
         l = x.flatten().tolist()
-        with ignoring(Exception):  # Try msgpack (faster on strings)
+        with suppress(Exception):  # Try msgpack (faster on strings)
             return frame(msgpack.packb(l, use_bin_type=True))
         return frame(pickle.dumps(l, protocol=pickle.HIGHEST_PROTOCOL))
     else:

 compress_bytes = lambda bytes, itemsize: bytes
 decompress_bytes = identity

-with ignoring(ImportError):
+with suppress(ImportError):
     import blosc
     blosc.set_nthreads(1)

     compress_text = partial(blosc.compress, typesize=1)
     decompress_text = blosc.decompress

-with ignoring(ImportError):
+with suppress(ImportError):
     from snappy import compress as compress_text
     from snappy import decompress as decompress_text

-from __future__ import absolute_import
-
 from functools import partial
+import pickle

 import numpy as np
 import pandas as pd

 from . import numpy as pnp
 from .core import Interface
-from .compatibility import pickle
 from .encode import Encode
 from .utils import extend, framesplit, frame


         # TODO: don't use values, it does some work. Look at _blocks instead
         # pframe/cframe do this well
-        arrays = dict((extend(k, col), df[col].values)
+        arrays = {extend(k, col): df[col].values
                   for k, df in data.items()
-                  for col in df.columns)
-        arrays.update(dict((extend(k, '.index'), df.index.values)
-                      for k, df in data.items()))
+                  for col in df.columns}
+        arrays.update({extend(k, '.index'): df.index.values
+                       for k, df in data.items()})
         # TODO: handle categoricals
         self.partd.append(arrays, **kwargs)

         cat = None
         values = ind.values

-    header = (type(ind), ind._get_attributes_dict(), values.dtype, cat)
+    header = (type(ind), {k: getattr(ind, k, None) for k in ind._attributes}, values.dtype, cat)
     bytes = pnp.compress(pnp.serialize(values), values.dtype)
     return header, bytes
0 | 0 | """ |
1 | 1 | get/put functions that consume/produce Python lists using Pickle to serialize |
2 | 2 | """ |
3 | from __future__ import absolute_import | |
4 | from .compatibility import pickle | |
5 | ||
3 | import pickle | |
6 | 4 | |
7 | 5 | from .encode import Encode |
8 | 6 | from functools import partial |

 First we try msgpack (it's faster). If that fails then we default to pickle.
 """
-from __future__ import absolute_import
-from .compatibility import pickle
+import pickle

 try:
     from pandas import msgpack
-from __future__ import absolute_import
-
 import pytest
 np = pytest.importorskip('numpy')  # noqa

-from __future__ import absolute_import
-
 import pytest
 pytest.importorskip('pandas')  # noqa

 import numpy as np
 import pandas as pd
-import pandas.util.testing as tm
+import pandas.testing as tm
 import os

 from partd.pandas import PandasColumns, PandasBlocks, serialize, deserialize
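``pandas.util.testing`` was deprecated in favour of the public
``pandas.testing`` module, which exposes the same assertion helpers::

    >>> import pandas as pd
    >>> import pandas.testing as tm
    >>> tm.assert_frame_equal(pd.DataFrame({'a': [1]}), pd.DataFrame({'a': [1]}))
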
         held_append.start()

         sleep(0.1)
-        assert held_append.isAlive()  # held!
+        assert held_append.is_alive()  # held!

         assert not s._frozen_sockets.empty()

         free_frozen_sockets_thread.start()

         sleep(0.2)
-        assert not held_append.isAlive()
+        assert not held_append.is_alive()
         assert s._frozen_sockets.empty()
     finally:
         s.close()
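``Thread.isAlive`` was deprecated in Python 3.8 and removed in Python 3.9, so
the rename above matches the package's new ``Requires-Python: >=3.7`` floor::

    >>> from threading import Thread
    >>> t = Thread(target=lambda: None)
    >>> t.start(); t.join()
    >>> t.is_alive()
    False
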
         yield bytes[i: i+n]


-@contextmanager
-def ignoring(*exc):
-    try:
-        yield
-    except exc:
-        pass
-
-
-@contextmanager
-def do_nothing(*args, **kwargs):
-    yield
-
-
 def nested_get(ind, coll, lazy=False):
     """ Get nested index from collection

     """
     for item in seq:
         if isinstance(item, list):
-            for item2 in flatten(item):
-                yield item2
+            yield from flatten(item)
         else:
             yield item

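``yield from`` delegates to the sub-generator and is exactly equivalent to the
removed two-line loop; the rewritten ``flatten`` still produces::

    >>> from partd.utils import flatten
    >>> list(flatten([1, [2, [3, 4]], 5]))
    [1, 2, 3, 4, 5]
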
-from __future__ import absolute_import, print_function
-
 import zmq
 import logging
 from itertools import chain

 from toolz import accumulate, topk, pluck, merge, keymap
 import uuid
 from collections import defaultdict
-from contextlib import contextmanager
+from contextlib import contextmanager, suppress
 from threading import Thread, Lock
 from datetime import datetime
 from multiprocessing import Process

 from .file import File
 from .buffer import Buffer
 from . import core
-from .compatibility import Queue, Empty, unicode
-from .utils import ignoring


 tuple_sep = b'-|-'
             raise


-class Server(object):
+class Server:
     def __init__(self, partd=None, bind=None, start=True, block=False,
                  hostname=None):
         self.context = zmq.Context()

         if hostname is None:
             hostname = socket.gethostname()
-        if isinstance(bind, unicode):
+        if isinstance(bind, str):
             bind = bind.encode()
         if bind is None:
             port = self.socket.bind_to_random_port('tcp://*')

         logger.debug('Server closes')
         self.status = 'closed'
         self.block()
-        with ignoring(zmq.error.ZMQError):
+        with suppress(zmq.error.ZMQError):
             self.socket.close(1)
-        with ignoring(zmq.error.ZMQError):
+        with suppress(zmq.error.ZMQError):
             self.context.destroy(3)
         self.partd.lock.release()

     def close(self):
         if hasattr(self, 'server_process'):
-            with ignoring(zmq.error.ZMQError):
+            with suppress(zmq.error.ZMQError):
                 self.close_server()
             self.server_process.join()
-        with ignoring(zmq.error.ZMQError):
+        with suppress(zmq.error.ZMQError):
             self.socket.close(1)
-        with ignoring(zmq.error.ZMQError):
+        with suppress(zmq.error.ZMQError):
            self.context.destroy(1)

     def __exit__(self, type, value, traceback):
         self.close()


-class NotALock(object):
+class NotALock:
     def acquire(self): pass
     def release(self): pass

 partd/__init__.py
 partd/_version.py
 partd/buffer.py
-partd/compatibility.py
 partd/compressed.py
 partd/core.py
 partd/dict.py
 [versioneer]
-vcs = git
+VCS = git
 style = pep440
 versionfile_source = partd/_version.py
 versionfile_build = partd/_version.py
       keywords='',
       packages=['partd'],
       install_requires=list(open('requirements.txt').read().strip().split('\n')),
-      python_requires=">=3.5",
+      python_requires=">=3.7",
       classifiers=[
           "Programming Language :: Python :: 3",
-          "Programming Language :: Python :: 3.5",
-          "Programming Language :: Python :: 3.6",
           "Programming Language :: Python :: 3.7",
           "Programming Language :: Python :: 3.8",
+          "Programming Language :: Python :: 3.9",
       ],
       long_description=(open('README.rst').read() if exists('README.rst')
                         else ''),