Update upstream source from tag 'upstream/1.3.0'
Update to upstream version '1.3.0' with Debian dir 89a3a88e3e22cd000da384c119baaf4436680466
Diane Trout, 1 year, 3 months ago
21 changed files with 318 additions and 367 deletions.
+134
-135
PKG-INFO
00 Metadata-Version: 2.1
11 Name: partd
2 Version: 1.2.0
2 Version: 1.3.0
33 Summary: Appendable key-value storage
44 Home-page: http://github.com/dask/partd/
55 Maintainer: Matthew Rocklin
66 Maintainer-email: mrocklin@gmail.com
77 License: BSD
8 Description: PartD
9 =====
10
11 |Build Status| |Version Status|
12
13 Key-value byte store with appendable values
14
15 Partd stores key-value pairs.
16 Values are raw bytes.
17 We append onto old values.
18
19 Partd excels at shuffling operations.
20
21 Operations
22 ----------
23
24 PartD has two main operations, ``append`` and ``get``.
25
26
27 Example
28 -------
29
30 1. Create a Partd backed by a directory::
31
32 >>> import partd
33 >>> p = partd.File('/path/to/new/dataset/')
34
35 2. Append key-byte pairs to dataset::
36
37 >>> p.append({'x': b'Hello ', 'y': b'123'})
38 >>> p.append({'x': b'world!', 'y': b'456'})
39
40 3. Get bytes associated with keys::
41
42 >>> p.get('x') # One key
43 b'Hello world!'
44
45 >>> p.get(['y', 'x']) # List of keys
46 [b'123456', b'Hello world!']
47
48 4. Destroy partd dataset::
49
50 >>> p.drop()
51
52 That's it.
53
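As noted in the introduction, this append-on-write model is what makes partd
well suited to shuffling: records are bucketed by destination partition and
appended as they arrive, then each bucket is read back with a single ``get``.
A toy sketch of that pattern (the record layout here is purely illustrative)::

>>> import partd
>>> records = [(0, b'a'), (1, b'b'), (0, b'c'), (1, b'd')]
>>> p = partd.Dict()
>>> for bucket, payload in records:
...     p.append({bucket: payload})
>>> p.get([0, 1])
[b'ac', b'bd']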
54
55 Implementations
56 ---------------
57
58 We can back a partd by an in-memory dictionary::
59
60 >>> p = Dict()
61
62 For larger amounts of data, or to share data between processes, we back a partd
63 by a directory of files. This uses file-based locks for consistency::
64
65 >>> p = File('/path/to/dataset/')
66
67 However, this can perform poorly for many small writes. In these cases you may wish to buffer one partd with another, keeping a fixed maximum amount of data in the buffering partd. This writes the larger elements of the first partd to the second partd when space runs low::
68
69 >>> p = Buffer(Dict(), File(), available_memory=2e9) # 2GB memory buffer
70
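The buffered partd still exposes the ordinary ``append``/``get`` interface;
spilling from the in-memory ``Dict`` to the ``File`` happens behind the scenes
once the memory budget is exceeded. A minimal sketch, continuing from the
object above::

>>> p.append({'x': b'Hello '})
>>> p.get('x')
b'Hello '
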
71 You might also want to have many distributed processes write to a single partd
72 consistently. This can be done with a server:
73
74 * Server Process::
75
76 >>> p = Buffer(Dict(), File(), available_memory=2e9) # 2GB memory buffer
77 >>> s = Server(p, address='ipc://server')
78
79 * Worker processes::
80
81 >>> p = Client('ipc://server') # Client machine talks to remote server
82
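Once connected, the client behaves like any other partd; appends and gets are
forwarded over the socket to the server process. A minimal sketch, assuming the
server above is running::

>>> p.append({'x': b'Hello '})
>>> p.get('x')
b'Hello '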
83
84 Encodings and Compression
85 -------------------------
86
87 Once we can robustly and efficiently append bytes to a partd we consider
88 compression and encodings. This is generally available with the ``Encode``
89 partd, which accepts three functions: one to apply to bytes as they are
90 written, one to apply to bytes as they are read, and one to join bytestreams.
91 Common configurations already exist for common data and compression formats.
92
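For example, the built-in ``ZLib`` wrapper described below is essentially an
``Encode`` configured with zlib's compress and decompress functions plus a
bytestring join. A hand-rolled sketch of the same idea (a sketch only; the
built-in wrappers are the recommended route)::

>>> import zlib
>>> from partd import Dict
>>> from partd.encode import Encode
>>> p = Encode(zlib.compress,    # applied to each value on append
...            zlib.decompress,  # applied to each stored frame on get
...            b''.join,         # joins the decoded frames back together
...            Dict())
>>> p.append({'x': b'Hello '})
>>> p.append({'x': b'world!'})
>>> p.get('x')
b'Hello world!'
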
93 We may wish to compress and decompress data transparently as we interact with a
94 partd. Objects like ``BZ2``, ``Blosc``, ``ZLib`` and ``Snappy`` exist and take
95 another partd as an argument::
96
97 >>> p = File(...)
98 >>> p = ZLib(p)
99
100 These work exactly as before; the (de)compression happens automatically.
101
102 Common data formats like Python lists, numpy arrays, and pandas
103 dataframes are also supported out of the box::
104
105 >>> p = File(...)
106 >>> p = NumPy(p)
107 >>> p.append({'x': np.array([...])})
108
109 This lets us forget about bytes and think instead in our normal data types.
110
111 Composition
112 -----------
113
114 In principle we want to compose all of these choices together:
115
116 1. Write policy: ``Dict``, ``File``, ``Buffer``, ``Client``
117 2. Encoding: ``Pickle``, ``Numpy``, ``Pandas``, ...
118 3. Compression: ``Blosc``, ``Snappy``, ...
119
120 Partd objects compose by nesting. Here we make a partd that writes pickle
121 encoded BZ2 compressed bytes directly to disk::
122
123 >>> p = Pickle(BZ2(File('foo')))
124
125 We could construct more complex systems that include compression,
126 serialization, buffering, and remote access::
127
128 >>> server = Server(Buffer(Dict(), File(), available_memory=2e0))
129
130 >>> client = Pickle(Snappy(Client(server.address)))
131 >>> client.append({'x': [1, 2, 3]})
132
133 .. |Build Status| image:: https://github.com/dask/partd/workflows/CI/badge.svg
134 :target: https://github.com/dask/partd/actions?query=workflow%3ACI
135 .. |Version Status| image:: https://img.shields.io/pypi/v/partd.svg
136 :target: https://pypi.python.org/pypi/partd/
137
138 Platform: UNKNOWN
1398 Classifier: Programming Language :: Python :: 3
140 Classifier: Programming Language :: Python :: 3.5
141 Classifier: Programming Language :: Python :: 3.6
1429 Classifier: Programming Language :: Python :: 3.7
14310 Classifier: Programming Language :: Python :: 3.8
144 Requires-Python: >=3.5
11 Classifier: Programming Language :: Python :: 3.9
12 Requires-Python: >=3.7
14513 Provides-Extra: complete
14 License-File: LICENSE.txt
15
16 PartD
17 =====
18
19 |Build Status| |Version Status|
20
21 Key-value byte store with appendable values
22
23 Partd stores key-value pairs.
24 Values are raw bytes.
25 We append onto old values.
26
27 Partd excels at shuffling operations.
28
29 Operations
30 ----------
31
32 PartD has two main operations, ``append`` and ``get``.
33
34
35 Example
36 -------
37
38 1. Create a Partd backed by a directory::
39
40 >>> import partd
41 >>> p = partd.File('/path/to/new/dataset/')
42
43 2. Append key-byte pairs to dataset::
44
45 >>> p.append({'x': b'Hello ', 'y': b'123'})
46 >>> p.append({'x': b'world!', 'y': b'456'})
47
48 3. Get bytes associated with keys::
49
50 >>> p.get('x') # One key
51 b'Hello world!'
52
53 >>> p.get(['y', 'x']) # List of keys
54 [b'123456', b'Hello world!']
55
56 4. Destroy partd dataset::
57
58 >>> p.drop()
59
60 That's it.
61
62
63 Implementations
64 ---------------
65
66 We can back a partd by an in-memory dictionary::
67
68 >>> p = Dict()
69
70 For larger amounts of data, or to share data between processes, we back a partd
71 by a directory of files. This uses file-based locks for consistency::
72
73 >>> p = File('/path/to/dataset/')
74
75 However, this can perform poorly for many small writes. In these cases you may wish to buffer one partd with another, keeping a fixed maximum amount of data in the buffering partd. This writes the larger elements of the first partd to the second partd when space runs low::
76
77 >>> p = Buffer(Dict(), File(), available_memory=2e9) # 2GB memory buffer
78
79 You might also want to have many distributed processes write to a single partd
80 consistently. This can be done with a server:
81
82 * Server Process::
83
84 >>> p = Buffer(Dict(), File(), available_memory=2e9) # 2GB memory buffer
85 >>> s = Server(p, address='ipc://server')
86
87 * Worker processes::
88
89 >>> p = Client('ipc://server') # Client machine talks to remote server
90
91
92 Encodings and Compression
93 -------------------------
94
95 Once we can robustly and efficiently append bytes to a partd we consider
96 compression and encodings. This is generally available with the ``Encode``
97 partd, which accepts three functions: one to apply to bytes as they are
98 written, one to apply to bytes as they are read, and one to join bytestreams.
99 Common configurations already exist for common data and compression formats.
100
101 We may wish to compress and decompress data transparently as we interact with a
102 partd. Objects like ``BZ2``, ``Blosc``, ``ZLib`` and ``Snappy`` exist and take
103 another partd as an argument::
104
105 >>> p = File(...)
106 >>> p = ZLib(p)
107
108 These work exactly as before; the (de)compression happens automatically.
109
110 Common data formats like Python lists, numpy arrays, and pandas
111 dataframes are also supported out of the box::
112
113 >>> p = File(...)
114 >>> p = NumPy(p)
115 >>> p.append({'x': np.array([...])})
116
117 This lets us forget about bytes and think instead in our normal data types.
118
119 Composition
120 -----------
121
122 In principle we want to compose all of these choices together:
123
124 1. Write policy: ``Dict``, ``File``, ``Buffer``, ``Client``
125 2. Encoding: ``Pickle``, ``Numpy``, ``Pandas``, ...
126 3. Compression: ``Blosc``, ``Snappy``, ...
127
128 Partd objects compose by nesting. Here we make a partd that writes pickle
129 encoded BZ2 compressed bytes directly to disk::
130
131 >>> p = Pickle(BZ2(File('foo')))
132
133 We could construct more complex systems that include compression,
134 serialization, buffering, and remote access::
135
136 >>> server = Server(Buffer(Dict(), File(), available_memory=2e0))
137
138 >>> client = Pickle(Snappy(Client(server.address)))
139 >>> client.append({'x': [1, 2, 3]})
140
141 .. |Build Status| image:: https://github.com/dask/partd/workflows/CI/badge.svg
142 :target: https://github.com/dask/partd/actions?query=workflow%3ACI
143 .. |Version Status| image:: https://img.shields.io/pypi/v/partd.svg
144 :target: https://pypi.python.org/pypi/partd/
partd/__init__.py
0 from __future__ import absolute_import
0 from contextlib import suppress
11
22 from .file import File
33 from .dict import Dict
66 from .pickle import Pickle
77 from .python import Python
88 from .compressed import *
9 from .utils import ignoring
10 with ignoring(ImportError):
9 with suppress(ImportError):
1110 from .numpy import Numpy
12 with ignoring(ImportError):
11 with suppress(ImportError):
1312 from .pandas import PandasColumns, PandasBlocks
14 with ignoring(ImportError):
13 with suppress(ImportError):
1514 from .zmq import Client, Server
1615
1716 from ._version import get_versions
partd/_version.py
77
88 version_json = '''
99 {
10 "date": "2021-04-08T12:31:19-0500",
10 "date": "2022-08-11T18:17:31-0500",
1111 "dirty": false,
1212 "error": null,
13 "full-revisionid": "9c9ba0a3a91b6b1eeb560615114a1df81fc427c1",
14 "version": "1.2.0"
13 "full-revisionid": "d1faa885e2a7789ff3599b49be31b4c8cf48ba4d",
14 "version": "1.3.0"
1515 }
1616 ''' # END VERSION_JSON
1717
partd/buffer.py
33 from operator import add
44 from bisect import bisect
55 from collections import defaultdict
6 from .compatibility import Queue, Empty
6 from queue import Queue, Empty
77
88
99 def zero():
+0
-15
partd/compatibility.py
0 from __future__ import absolute_import
1
2 import sys
3
4 if sys.version_info[0] == 3:
5 from io import StringIO
6 unicode = str
7 import pickle
8 from queue import Queue, Empty
9 if sys.version_info[0] == 2:
10 from StringIO import StringIO
11 unicode = unicode
12 import cPickle as pickle
13 from Queue import Queue, Empty
14
partd/compressed.py
0 from .utils import ignoring
0 from contextlib import suppress
1 from functools import partial
2
13 from .encode import Encode
2 from functools import partial
34
45 __all__ = []
56
89 return b''.join(L)
910
1011
11 with ignoring(ImportError, AttributeError):
12 with suppress(ImportError, AttributeError):
1213 # In case snappy is not installed, or another package called snappy that does not implement compress / decompress.
1314 # For example, SnapPy (https://pypi.org/project/snappy/)
1415 import snappy
1920 __all__.append('Snappy')
2021
2122
22 with ignoring(ImportError):
23 with suppress(ImportError):
2324 import zlib
2425 ZLib = partial(Encode,
2526 zlib.compress,
2829 __all__.append('ZLib')
2930
3031
31 with ignoring(ImportError):
32 with suppress(ImportError):
3233 import bz2
3334 BZ2 = partial(Encode,
3435 bz2.compress,
3738 __all__.append('BZ2')
3839
3940
40 with ignoring(ImportError):
41 with suppress(ImportError):
4142 import blosc
4243 Blosc = partial(Encode,
4344 blosc.compress,
partd/core.py
0 from __future__ import absolute_import
1
20 import os
31 import shutil
42 import locket
4341 return str(key)
4442
4543
46 class Interface(object):
44 class Interface:
4745 def __init__(self):
4846 self._iset_seen = set()
4947
partd/file.py
0 from __future__ import absolute_import
1
20 import atexit
1 from contextlib import suppress
32 import os
43 import shutil
54 import string
76
87 from .core import Interface
98 import locket
10 from .utils import ignoring
119
1210
1311 class File(Interface):
2018 self._explicitly_given_path = True
2119 self.path = path
2220 if not os.path.exists(path):
23 with ignoring(OSError):
21 with suppress(OSError):
2422 os.makedirs(path)
2523 self.lock = locket.lock_file(self.filename('.lock'))
2624 Interface.__init__(self)
5654 try:
5755 with open(self.filename(key), 'rb') as f:
5856 result.append(f.read())
59 except IOError:
57 except OSError:
6058 result.append(b'')
6159 finally:
6260 if lock:
partd/numpy.py
33 Alongside each array x we ensure the value x.dtype which stores the string
44 description of the array's dtype.
55 """
6 from __future__ import absolute_import
6 from contextlib import suppress
7 import pickle
8
79 import numpy as np
810 from toolz import valmap, identity, partial
9 from .compatibility import pickle
1011 from .core import Interface
1112 from .file import File
12 from .utils import frame, framesplit, suffix, ignoring
13 from .utils import frame, framesplit, suffix
1314
1415
1516 def serialize_dtype(dt):
9394 def serialize(x):
9495 if x.dtype == 'O':
9596 l = x.flatten().tolist()
96 with ignoring(Exception): # Try msgpack (faster on strings)
97 with suppress(Exception): # Try msgpack (faster on strings)
9798 return frame(msgpack.packb(l, use_bin_type=True))
9899 return frame(pickle.dumps(l, protocol=pickle.HIGHEST_PROTOCOL))
99100 else:
131132 compress_bytes = lambda bytes, itemsize: bytes
132133 decompress_bytes = identity
133134
134 with ignoring(ImportError):
135 with suppress(ImportError):
135136 import blosc
136137 blosc.set_nthreads(1)
137138
141142 compress_text = partial(blosc.compress, typesize=1)
142143 decompress_text = blosc.decompress
143144
144 with ignoring(ImportError):
145 with suppress(ImportError):
145146 from snappy import compress as compress_text
146147 from snappy import decompress as decompress_text
147148
partd/pandas.py
0 from __future__ import absolute_import
1
20 from functools import partial
1 import pickle
32
43 import numpy as np
54 import pandas as pd
76
87 from . import numpy as pnp
98 from .core import Interface
10 from .compatibility import pickle
119 from .encode import Encode
1210 from .utils import extend, framesplit, frame
1311
4644
4745 # TODO: don't use values, it does some work. Look at _blocks instead
4846 # pframe/cframe do this well
49 arrays = dict((extend(k, col), df[col].values)
47 arrays = {extend(k, col): df[col].values
5048 for k, df in data.items()
51 for col in df.columns)
52 arrays.update(dict((extend(k, '.index'), df.index.values)
53 for k, df in data.items()))
49 for col in df.columns}
50 arrays.update({extend(k, '.index'): df.index.values
51 for k, df in data.items()})
5452 # TODO: handle categoricals
5553 self.partd.append(arrays, **kwargs)
5654
109107 cat = None
110108 values = ind.values
111109
112 header = (type(ind), ind._get_attributes_dict(), values.dtype, cat)
110 header = (type(ind), {k: getattr(ind, k, None) for k in ind._attributes}, values.dtype, cat)
113111 bytes = pnp.compress(pnp.serialize(values), values.dtype)
114112 return header, bytes
115113
00 """
11 get/put functions that consume/produce Python lists using Pickle to serialize
22 """
3 from __future__ import absolute_import
4 from .compatibility import pickle
5
3 import pickle
64
75 from .encode import Encode
86 from functools import partial
partd/python.py
33
44 First we try msgpack (it's faster). If that fails then we default to pickle.
55 """
6 from __future__ import absolute_import
7 from .compatibility import pickle
6 import pickle
87
98 try:
109 from pandas import msgpack
partd/tests/test_numpy.py
0 from __future__ import absolute_import
1
20 import pytest
31 np = pytest.importorskip('numpy') # noqa
42
partd/tests/test_pandas.py
0 from __future__ import absolute_import
1
20 import pytest
31 pytest.importorskip('pandas') # noqa
42
53 import numpy as np
64 import pandas as pd
7 import pandas.util.testing as tm
5 import pandas.testing as tm
86 import os
97
108 from partd.pandas import PandasColumns, PandasBlocks, serialize, deserialize
partd/tests/test_zmq.py
5353 held_append.start()
5454
5555 sleep(0.1)
56 assert held_append.isAlive() # held!
56 assert held_append.is_alive() # held!
5757
5858 assert not s._frozen_sockets.empty()
5959
6363 free_frozen_sockets_thread.start()
6464
6565 sleep(0.2)
66 assert not held_append.isAlive()
66 assert not held_append.is_alive()
6767 assert s._frozen_sockets.empty()
6868 finally:
6969 s.close()
partd/utils.py
7373 yield bytes[i: i+n]
7474
7575
76 @contextmanager
77 def ignoring(*exc):
78 try:
79 yield
80 except exc:
81 pass
82
83
84 @contextmanager
85 def do_nothing(*args, **kwargs):
86 yield
87
88
8976 def nested_get(ind, coll, lazy=False):
9077 """ Get nested index from collection
9178
128115 """
129116 for item in seq:
130117 if isinstance(item, list):
131 for item2 in flatten(item):
132 yield item2
118 yield from flatten(item)
133119 else:
134120 yield item
135121
partd/zmq.py
0 from __future__ import absolute_import, print_function
1
20 import zmq
31 import logging
42 from itertools import chain
97 from toolz import accumulate, topk, pluck, merge, keymap
108 import uuid
119 from collections import defaultdict
12 from contextlib import contextmanager
10 from contextlib import contextmanager, suppress
1311 from threading import Thread, Lock
1412 from datetime import datetime
1513 from multiprocessing import Process
1917 from .file import File
2018 from .buffer import Buffer
2119 from . import core
22 from .compatibility import Queue, Empty, unicode
23 from .utils import ignoring
2420
2521
2622 tuple_sep = b'-|-'
3733 raise
3834
3935
40 class Server(object):
36 class Server:
4137 def __init__(self, partd=None, bind=None, start=True, block=False,
4238 hostname=None):
4339 self.context = zmq.Context()
4945
5046 if hostname is None:
5147 hostname = socket.gethostname()
52 if isinstance(bind, unicode):
48 if isinstance(bind, str):
5349 bind = bind.encode()
5450 if bind is None:
5551 port = self.socket.bind_to_random_port('tcp://*')
172168 logger.debug('Server closes')
173169 self.status = 'closed'
174170 self.block()
175 with ignoring(zmq.error.ZMQError):
171 with suppress(zmq.error.ZMQError):
176172 self.socket.close(1)
177 with ignoring(zmq.error.ZMQError):
173 with suppress(zmq.error.ZMQError):
178174 self.context.destroy(3)
179175 self.partd.lock.release()
180176
304300
305301 def close(self):
306302 if hasattr(self, 'server_process'):
307 with ignoring(zmq.error.ZMQError):
303 with suppress(zmq.error.ZMQError):
308304 self.close_server()
309305 self.server_process.join()
310 with ignoring(zmq.error.ZMQError):
306 with suppress(zmq.error.ZMQError):
311307 self.socket.close(1)
312 with ignoring(zmq.error.ZMQError):
308 with suppress(zmq.error.ZMQError):
313309 self.context.destroy(1)
314310
315311 def __exit__(self, type, value, traceback):
320316 self.close()
321317
322318
323 class NotALock(object):
319 class NotALock:
324320 def acquire(self): pass
325321 def release(self): pass
326322
partd.egg-info/PKG-INFO
00 Metadata-Version: 2.1
11 Name: partd
2 Version: 1.2.0
2 Version: 1.3.0
33 Summary: Appendable key-value storage
44 Home-page: http://github.com/dask/partd/
55 Maintainer: Matthew Rocklin
66 Maintainer-email: mrocklin@gmail.com
77 License: BSD
8 Description: PartD
9 =====
10
11 |Build Status| |Version Status|
12
13 Key-value byte store with appendable values
14
15 Partd stores key-value pairs.
16 Values are raw bytes.
17 We append onto old values.
18
19 Partd excels at shuffling operations.
20
21 Operations
22 ----------
23
24 PartD has two main operations, ``append`` and ``get``.
25
26
27 Example
28 -------
29
30 1. Create a Partd backed by a directory::
31
32 >>> import partd
33 >>> p = partd.File('/path/to/new/dataset/')
34
35 2. Append key-byte pairs to dataset::
36
37 >>> p.append({'x': b'Hello ', 'y': b'123'})
38 >>> p.append({'x': b'world!', 'y': b'456'})
39
40 3. Get bytes associated with keys::
41
42 >>> p.get('x') # One key
43 b'Hello world!'
44
45 >>> p.get(['y', 'x']) # List of keys
46 [b'123456', b'Hello world!']
47
48 4. Destroy partd dataset::
49
50 >>> p.drop()
51
52 That's it.
53
54
55 Implementations
56 ---------------
57
58 We can back a partd by an in-memory dictionary::
59
60 >>> p = Dict()
61
62 For larger amounts of data, or to share data between processes, we back a partd
63 by a directory of files. This uses file-based locks for consistency::
64
65 >>> p = File('/path/to/dataset/')
66
67 However, this can perform poorly for many small writes. In these cases you may wish to buffer one partd with another, keeping a fixed maximum amount of data in the buffering partd. This writes the larger elements of the first partd to the second partd when space runs low::
68
69 >>> p = Buffer(Dict(), File(), available_memory=2e9) # 2GB memory buffer
70
71 You might also want to have many distributed processes write to a single partd
72 consistently. This can be done with a server:
73
74 * Server Process::
75
76 >>> p = Buffer(Dict(), File(), available_memory=2e9) # 2GB memory buffer
77 >>> s = Server(p, address='ipc://server')
78
79 * Worker processes::
80
81 >>> p = Client('ipc://server') # Client machine talks to remote server
82
83
84 Encodings and Compression
85 -------------------------
86
87 Once we can robustly and efficiently append bytes to a partd we consider
88 compression and encodings. This is generally available with the ``Encode``
89 partd, which accepts three functions: one to apply to bytes as they are
90 written, one to apply to bytes as they are read, and one to join bytestreams.
91 Common configurations already exist for common data and compression formats.
92
93 We may wish to compress and decompress data transparently as we interact with a
94 partd. Objects like ``BZ2``, ``Blosc``, ``ZLib`` and ``Snappy`` exist and take
95 another partd as an argument::
96
97 >>> p = File(...)
98 >>> p = ZLib(p)
99
100 These work exactly as before; the (de)compression happens automatically.
101
102 Common data formats like Python lists, numpy arrays, and pandas
103 dataframes are also supported out of the box::
104
105 >>> p = File(...)
106 >>> p = NumPy(p)
107 >>> p.append({'x': np.array([...])})
108
109 This lets us forget about bytes and think instead in our normal data types.
110
111 Composition
112 -----------
113
114 In principle we want to compose all of these choices together:
115
116 1. Write policy: ``Dict``, ``File``, ``Buffer``, ``Client``
117 2. Encoding: ``Pickle``, ``Numpy``, ``Pandas``, ...
118 3. Compression: ``Blosc``, ``Snappy``, ...
119
120 Partd objects compose by nesting. Here we make a partd that writes pickle
121 encoded BZ2 compressed bytes directly to disk::
122
123 >>> p = Pickle(BZ2(File('foo')))
124
125 We could construct more complex systems that include compression,
126 serialization, buffering, and remote access::
127
128 >>> server = Server(Buffer(Dict(), File(), available_memory=2e0))
129
130 >>> client = Pickle(Snappy(Client(server.address)))
131 >>> client.append({'x': [1, 2, 3]})
132
133 .. |Build Status| image:: https://github.com/dask/partd/workflows/CI/badge.svg
134 :target: https://github.com/dask/partd/actions?query=workflow%3ACI
135 .. |Version Status| image:: https://img.shields.io/pypi/v/partd.svg
136 :target: https://pypi.python.org/pypi/partd/
137
138 Platform: UNKNOWN
1398 Classifier: Programming Language :: Python :: 3
140 Classifier: Programming Language :: Python :: 3.5
141 Classifier: Programming Language :: Python :: 3.6
1429 Classifier: Programming Language :: Python :: 3.7
14310 Classifier: Programming Language :: Python :: 3.8
144 Requires-Python: >=3.5
11 Classifier: Programming Language :: Python :: 3.9
12 Requires-Python: >=3.7
14513 Provides-Extra: complete
14 License-File: LICENSE.txt
15
16 PartD
17 =====
18
19 |Build Status| |Version Status|
20
21 Key-value byte store with appendable values
22
23 Partd stores key-value pairs.
24 Values are raw bytes.
25 We append onto old values.
26
27 Partd excels at shuffling operations.
28
29 Operations
30 ----------
31
32 PartD has two main operations, ``append`` and ``get``.
33
34
35 Example
36 -------
37
38 1. Create a Partd backed by a directory::
39
40 >>> import partd
41 >>> p = partd.File('/path/to/new/dataset/')
42
43 2. Append key-byte pairs to dataset::
44
45 >>> p.append({'x': b'Hello ', 'y': b'123'})
46 >>> p.append({'x': b'world!', 'y': b'456'})
47
48 3. Get bytes associated with keys::
49
50 >>> p.get('x') # One key
51 b'Hello world!'
52
53 >>> p.get(['y', 'x']) # List of keys
54 [b'123456', b'Hello world!']
55
56 4. Destroy partd dataset::
57
58 >>> p.drop()
59
60 That's it.
61
62
63 Implementations
64 ---------------
65
66 We can back a partd by an in-memory dictionary::
67
68 >>> p = Dict()
69
70 For larger amounts of data, or to share data between processes, we back a partd
71 by a directory of files. This uses file-based locks for consistency::
72
73 >>> p = File('/path/to/dataset/')
74
75 However, this can perform poorly for many small writes. In these cases you may wish to buffer one partd with another, keeping a fixed maximum amount of data in the buffering partd. This writes the larger elements of the first partd to the second partd when space runs low::
76
77 >>> p = Buffer(Dict(), File(), available_memory=2e9) # 2GB memory buffer
78
79 You might also want to have many distributed processes write to a single partd
80 consistently. This can be done with a server:
81
82 * Server Process::
83
84 >>> p = Buffer(Dict(), File(), available_memory=2e9) # 2GB memory buffer
85 >>> s = Server(p, address='ipc://server')
86
87 * Worker processes::
88
89 >>> p = Client('ipc://server') # Client machine talks to remote server
90
91
92 Encodings and Compression
93 -------------------------
94
95 Once we can robustly and efficiently append bytes to a partd we consider
96 compression and encodings. This is generally available with the ``Encode``
97 partd, which accepts three functions: one to apply to bytes as they are
98 written, one to apply to bytes as they are read, and one to join bytestreams.
99 Common configurations already exist for common data and compression formats.
100
101 We may wish to compress and decompress data transparently as we interact with a
102 partd. Objects like ``BZ2``, ``Blosc``, ``ZLib`` and ``Snappy`` exist and take
103 another partd as an argument::
104
105 >>> p = File(...)
106 >>> p = ZLib(p)
107
108 These work exactly as before; the (de)compression happens automatically.
109
110 Common data formats like Python lists, numpy arrays, and pandas
111 dataframes are also supported out of the box::
112
113 >>> p = File(...)
114 >>> p = NumPy(p)
115 >>> p.append({'x': np.array([...])})
116
117 This lets us forget about bytes and think instead in our normal data types.
118
119 Composition
120 -----------
121
122 In principle we want to compose all of these choices together:
123
124 1. Write policy: ``Dict``, ``File``, ``Buffer``, ``Client``
125 2. Encoding: ``Pickle``, ``Numpy``, ``Pandas``, ...
126 3. Compression: ``Blosc``, ``Snappy``, ...
127
128 Partd objects compose by nesting. Here we make a partd that writes pickle
129 encoded BZ2 compressed bytes directly to disk::
130
131 >>> p = Pickle(BZ2(File('foo')))
132
133 We could construct more complex systems that include compression,
134 serialization, buffering, and remote access::
135
136 >>> server = Server(Buffer(Dict(), File(), available_memory=2e0))
137
138 >>> client = Pickle(Snappy(Client(server.address)))
139 >>> client.append({'x': [1, 2, 3]})
140
141 .. |Build Status| image:: https://github.com/dask/partd/workflows/CI/badge.svg
142 :target: https://github.com/dask/partd/actions?query=workflow%3ACI
143 .. |Version Status| image:: https://img.shields.io/pypi/v/partd.svg
144 :target: https://pypi.python.org/pypi/partd/
partd.egg-info/SOURCES.txt
77 partd/__init__.py
88 partd/_version.py
99 partd/buffer.py
10 partd/compatibility.py
1110 partd/compressed.py
1211 partd/core.py
1312 partd/dict.py
setup.cfg
00 [versioneer]
1 vcs = git
1 VCS = git
22 style = pep440
33 versionfile_source = partd/_version.py
44 versionfile_build = partd/_version.py
setup.py
1414 keywords='',
1515 packages=['partd'],
1616 install_requires=list(open('requirements.txt').read().strip().split('\n')),
17 python_requires=">=3.5",
17 python_requires=">=3.7",
1818 classifiers=[
1919 "Programming Language :: Python :: 3",
20 "Programming Language :: Python :: 3.5",
21 "Programming Language :: Python :: 3.6",
2220 "Programming Language :: Python :: 3.7",
2321 "Programming Language :: Python :: 3.8",
22 "Programming Language :: Python :: 3.9",
2423 ],
2524 long_description=(open('README.rst').read() if exists('README.rst')
2625 else ''),