datalad / 0ee9789
Merge tag '0.8.0' into debian

A variety of fixes and enhancements

- [publish] would now push merged `git-annex` branch even if no other changes were done
- [publish] should be able to publish using relative path within SSH URI (git hook would use relative paths)
- [publish] should better tolerate publishing to pure git and `git-annex` special remotes
- [plugin] mechanism came to replace [export]. See [export_tarball] for the replacement of [export]. Now it should be easy to extend datalad's interface with custom functionality to be invoked along with other commands.
- Minimalistic coloring of the results rendering
- [publish]/`copy_to` got progress bar report now and support of `--jobs`
- minor fixes and enhancements to crawler (e.g. support of recursive removes)

* tag '0.8.0': (76 commits)
  Changelog for 0.8.0
  BF: fixed test_publish for assuming that there is no need to push git-annex, which was fixed in prior commit
  BF/RF: mv is_remote_annex_ignored to AnnexRepo, make siblings command not puke if not yet annex-ignored
  BF: publish if only updates to git-annex, do not puke if remote is ignored by annex
  ENH: add --to-annex (reuse to_git Python interface though) to force adding to annex
  RF: --text-to-git -> --text-no-annex, and handled by create, not AnnexRepo
  ENH: allow for "recursive" flag for remove (needed while crawling s3 where prefix is a directory)
  BF: fixing up a test and a hook more for now using relative path(s)
  BF: call set_remote_dead only for annexrepo
  RF: removed the comment
  BF: push updated git-annex branch upon publishing data (only)
  BF: use relative dspath in the hook (Closes #1653), dead/remove remote upon replace (Closes #1656)
  BF: for copy_to report only # of files present locally and use correct verb in msg
  BF: Test is old fashion -- doesn't accept rendering options etc
  ENH: create --text-to-git to establish .gitattributes so that text file go to git
  BF: fixing url for pip -- must have git+ prefix
  DOC: little cleaning of gettingstarted.rst
  BF(workaround): use patched wrapt disabling its extensions
  BF: providing guarding against non-existing paths in checking on what to copy
  ENH: --jobs and progress for copy_to/publish
  ...

Yaroslav Halchenko, 6 years ago
79 changed file(s) with 2286 addition(s) and 868 deletion(s).
177177 # Verify that setup.py build doesn't puke
178178 - python setup.py build
179179 # Run tests
180 - PATH=$PWD/tools/coverage-bin:$PATH $NOSE_WRAPPER `which nosetests` $NOSE_OPTS -v -A "$NOSE_SELECTION_OP(integration or usecase or slow)" --with-doctest --doctest-tests --with-cov --cover-package datalad --logging-level=INFO $TESTS_TO_PERFORM
180 - WRAPT_DISABLE_EXTENSIONS=1 PATH=$PWD/tools/coverage-bin:$PATH $NOSE_WRAPPER `which nosetests` $NOSE_OPTS -v -A "$NOSE_SELECTION_OP(integration or usecase or slow)" --with-doctest --doctest-tests --with-cov --cover-package datalad --logging-level=INFO $TESTS_TO_PERFORM
181181 # Generate documentation and run doctests
182182 # but do only when we do not have obnoxious logging turned on -- something screws up sphinx on travis
183183 - if [ ! "${DATALAD_LOG_LEVEL:-}" = 2 ]; then PYTHONPATH=$PWD $NOSE_WRAPPER make -C docs html doctest; fi
88 We would recommend consulting the log of the
99 [DataLad git repository](http://github.com/datalad/datalad) for more details.
1010
11 # 0.7.0 (Jun 25, 2017) -- when it works - it is quite awesome!
12
13 I bet we will fix some bugs and make a world even a better place.
11
12 ## 0.8.0 (Jul 31, 2017) -- it is better than ever
13
14 A variety of fixes and enhancements
15
16 ### Fixes
17
18 - [publish] would now push merged `git-annex` branch even if no other changes
19 were done
20 - [publish] should be able to publish using relative path within SSH URI
21 (git hook would use relative paths)
22 - [publish] should better tolerate publishing to pure git and `git-annex`
23 special remotes
24
25 ### Enhancements and new features
26
27 - [plugin] mechanism came to replace [export]. See [export_tarball] for the
28 replacement of [export]. Now it should be easy to extend datalad's interface
29 with custom functionality to be invoked along with other commands.
30 - Minimalistic coloring of the results rendering
31 - [publish]/`copy_to` got progress bar report now and support of `--jobs`
32 - minor fixes and enhancements to crawler (e.g. support of recursive removes)
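
A minimal command-line sketch of the two additions above (a sketch only: `mysibling` is a placeholder sibling name, and any arguments an individual plugin such as `export_tarball` needs would be passed as `key=value` pairs per the `plugin` command documentation):

    ~/some/dataset$ datalad publish --to mysibling --jobs 4    # copy_to with progress bar, 4 parallel jobs
    ~/some/dataset$ datalad plugin export_tarball              # plugin mechanism replacing `datalad export`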
33
34
35 ## 0.7.0 (Jun 25, 2017) -- when it works - it is quite awesome!
36
37 New features, refactorings, and bug fixes.
1438
1539 ### Major refactoring and deprecations
1640
1842 - [create-sibling], and [unlock] have been re-written to support the
1943 same common API as most other commands
2044
21 ## Enhancements and new features
45 ### Enhancements and new features
2246
2347 - [siblings] can now be used to query and configure a local repository by
2448 using the sibling name ``here``
3054 - Significant parts of the documentation have been updated
3155 - Instantiate GitPython's Repo instances lazily
3256
33 ## Fixes
57 ### Fixes
3458
3559 - API documentation is now rendered properly as HTML, and is easier to browse by
3660 having more compact pages
358382 [datalad]: http://docs.datalad.org/en/latest/generated/man/datalad.html
359383 [drop]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-drop.html
360384 [export]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-export.html
385 [export_tarball]: http://docs.datalad.org/en/latest/generated/datalad.plugin.export_tarball.html
361386 [get]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-get.html
362387 [install]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-install.html
363388 [ls]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-ls.html
364389 [metadata]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-metadata.html
365390 [publish]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-publish.html
391 [plugin]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-plugin.html
366392 [remove]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-remove.html
367393 [save]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-save.html
368394 [search]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-search.html
413413 Any new DATALAD_CMD_PROTOCOL has to implement datalad.support.protocol.ProtocolInterface
414414 - *DATALAD_CMD_PROTOCOL_PREFIX*:
415415 Sets a prefix to add before the command call times are noted by DATALAD_CMD_PROTOCOL.
416
417
418 # Changelog section
419
420 For the upcoming release use this template
421
422 ## 0.8.1 (??? ??, 2017) -- will be better than ever
423
424 bet we will fix some bugs and make a world even a better place.
425
426 ### Major refactoring and deprecations
427
428 - hopefully none
429
430 ### Fixes
431
432 ?
433
434 ### Enhancements and new features
435
436 ?
437
2626 from .log import lgr
2727 import atexit
2828 from datalad.utils import on_windows
29
2930 if not on_windows:
3031 lgr.log(5, "Instantiating ssh manager")
3132 from .support.sshconnector import SSHManager
3334 atexit.register(ssh_manager.close, allow_fail=False)
3435 else:
3536 ssh_manager = None
37
38 try:
39 # this will fix the rendering of ANSI escape sequences
40 # for colored terminal output on windows
41 # it will do nothing on any other platform, hence it
42 # is safe to call unconditionally
43 import colorama
44 colorama.init()
45 atexit.register(colorama.deinit)
46 except ImportError as e:
47 if on_windows:
48 from datalad.dochelpers import exc_str
49 lgr.warning(
50 "'colorama' Python module missing, terminal output may look garbled [%s]",
51 exc_str(e))
52 pass
3653
3754 atexit.register(lgr.log, 5, "Exiting")
3855
2020 from collections import namedtuple
2121 from functools import wraps
2222
23 from datalad import cfg
24
25 from .interface.base import update_docstring_with_parameters
2623 from .interface.base import get_interface_groups
2724 from .interface.base import get_api_name
28 from .interface.base import alter_interface_docs_for_api
29 from .interface.base import merge_allargs2kwargs
25 from .interface.base import get_allargs_as_kwargs
3026
3127 def _kwargs_to_namespace(call, args, kwargs):
3228 """
3329 Given a __call__, args and kwargs passed, prepare a cmdlineargs-like
3430 thing
3531 """
36 kwargs_ = merge_allargs2kwargs(call, args, kwargs)
32 kwargs_ = get_allargs_as_kwargs(call, args, kwargs)
3733 # Get all arguments removing those possible ones used internally and
3834 # which shouldn't be exposed outside anyways
3935 [kwargs_.pop(k) for k in kwargs_ if k.startswith('_')]
141141 of the command; 'continue' works like 'ignore', but an error causes a
142142 non-zero exit code; 'stop' halts on first failure and yields non-zero exit
143143 code. A failure is any result with status 'impossible' or 'error'.""")
144 parser.add_argument(
145 '--run-before', dest='common_run_before',
146 nargs='+',
147 action='append',
148 metavar='PLUGINSPEC',
149 help="""DataLad plugin to run before the command. PLUGINSPEC is a list
150 comprised of a plugin name plus optional `key=value` pairs with arguments
151 for the plugin call (see `plugin` command documentation for details).
152 This option can be given more than once to run multiple plugins
153 in the order in which they were given.
154 For running plugins that require a --dataset argument it is important
155 to provide the respective dataset as the --dataset argument of the main
156 command, if it is not in the list of plugin arguments."""),
157 parser.add_argument(
158 '--run-after', dest='common_run_after',
159 nargs='+',
160 action='append',
161 metavar='PLUGINSPEC',
162 help="""Like --run-before, but plugins are executed after the main command
163 has finished."""),
164 parser.add_argument(
165 '--cmd', dest='_', action='store_true',
166 help="""syntactical helper that can be used to end the list of global
167 command line options before the subcommand label. Options like
168 --run-before can take an arbitrary number of arguments and may require
169 to be followed by a single --cmd in order to enable identification
170 of the subcommand.""")
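
Taken together, these options might be combined on the command line roughly as follows (a sketch under assumptions: the `add_readme` plugin and its `filename=` argument also appear in the test suite changes further down and stand in here for any installed plugin; the README.md value and dataset path are placeholders):

    datalad --run-after add_readme filename=README.md --cmd create /tmp/new_dataset

Here `--cmd` terminates the open-ended `--run-after` argument list so that `create` can be recognized as the subcommand.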
144171
145172 # yoh: atm we only dump to console. Might adopt the same separation later on
146173 # and for consistency will call it --verbose-level as well for now
3636 from ...utils import lmtime
3737 from ...utils import find_files
3838 from ...utils import auto_repr
39 from ...utils import _path_
3940 from ...utils import getpwd
4041 from ...utils import try_multiple
4142 from ...tests.utils import put_file_under_git
176177 "Was instructed to add to super dataset but no super dataset "
177178 "was found for %s" % ds
178179 )
179
180 # create/AnnexRepo specification of backend does it non-persistently in .git/config
181 if backend:
182 put_file_under_git(path, '.gitattributes', '* annex.backend=%s' % backend, annexed=False)
183180
184181 return ds
185182
853850 if self.repo.dirty and not exists(opj(path, '.gitattributes')) and isinstance(self.repo, AnnexRepo):
854851 backends = self.repo.default_backends
855852 if backends:
856 # then record default backend into the .gitattributes
857 put_file_under_git(path, '.gitattributes', '* annex.backend=%s' % backends[0],
858 annexed=False)
853 self.repo.set_default_backend(backends[0], commit=False)
859854
860855 # at least use repo._git_custom_command
861856 def _commit(self, msg=None, options=[]):
13021297 stats = data.get('datalad_stats', None)
13031298 if self.repo.dirty: # or self.tracker.dirty # for dry run
13041299 lgr.info("Repository found dirty -- adding and committing")
1305 _call(self.repo.add, '.', options=self.options) # so everything is committed
1300 _call(self.repo.add, '.', git_options=self.options) # so everything is committed
13061301
13071302 stats_str = ('\n\n' + stats.as_str(mode='full')) if stats else ''
13081303 _call(self._commit, "%s%s" % (', '.join(self._states), stats_str), options=["-a"])
13941389
13951390 return _remove_obsolete()
13961391
1397 def remove(self, data):
1392 def remove(self, data, recursive=False):
13981393 """Removed passed along file name from git/annex"""
13991394 stats = data.get('datalad_stats', None)
14001395 self._states.add("Removed files")
14021397 # TODO: not sure if we should maybe check if exists, and skip/just complain if not
14031398 if stats:
14041399 _call(stats.increment, 'removed')
1405 if lexists(opj(self.repo.path, filename)):
1406 _call(self.repo.remove, filename)
1400 filepath = opj(self.repo.path, filename)
1401 if lexists(filepath):
1402 if os.path.isdir(filepath):
1403 if recursive:
1404 _call(self.repo.remove, filename, recursive=True)
1405 else:
1406 lgr.warning("Not removing %s recursively, skipping", filepath)
1407 else:
1408 _call(self.repo.remove, filename)
14071409 else:
14081410 lgr.warning("Was asked to remove non-existing path %s", filename)
14091411 yield data
219219 commits = {b: list(repo.get_branch_commits(b)) for b in branches}
220220 eq_(len(commits['incoming']), 1)
221221 eq_(len(commits['incoming-processed']), 2)
222 eq_(len(commits['master']), 5) # all commits out there -- init ds + init crawler + 1*(incoming, processed, merge)
222 eq_(len(commits['master']), 6) # all commits out there -- backend + init ds + init crawler + 1*(incoming, processed, merge)
223223
224224 with chpwd(outd):
225225 eq_(set(glob('*')), {'dir1', 'file1.nii'})
249249
250250
251251 @with_tree(tree={
252
253252 'study': {
254253 'show': {
255254 'WG33': {
258257 <a href="/file/show/JX5V">file1.nii</a>
259258 <a href="/file/show/RIBX">dir1 / file2.nii</a>
260259 <a href="/file/show/GSRD">file1b.nii</a>
261
262260 %s
263261 </body></html>""" % _PLUG_HERE,
264262 },
272270 }
273271 }
274272 },
275
276273 'file': {
277274 'show': {
278275 'JX5V': {
292289 }
293290
294291 },
295
296292 'download': {
297293 'file1.nii': "content of file1.nii is different",
298294 'file1b.nii': "content of file1b.nii",
342338 './.datalad/crawl/crawl.cfg',
343339 './.datalad/crawl/statuses/incoming.json',
344340 './.datalad/meta/balsa.json',
345 './file1.nii', './dir1/file2.nii',
341 './file1.nii',
342 './dir1/file2.nii',
346343 }
347344
348345 eq_(set(all_files), target_files)
264264 eq_(len(commits_l['incoming']), 3)
265265 eq_(len(commits['incoming-processed']), 6)
266266 eq_(len(commits_l['incoming-processed']), 4) # because original merge has only 1 parent - incoming
267 eq_(len(commits['master']), 12) # all commits out there -- dataset init, crawler init + 3*(incoming, processed, meta data aggregation, merge)
268 eq_(len(commits_l['master']), 6)
267 eq_(len(commits['master']), 13) # all commits out there -- dataset init, crawler init + 3*(incoming, processed, meta data aggregation, merge)
268 eq_(len(commits_l['master']), 7)
269269
270270 # Check tags for the versions
271271 eq_(out[0]['datalad_stats'].get_total().versions, ['1.0.0', '1.0.1'])
272272 # +1 because original "release" was assumed to be 1.0.0
273273 repo_tags = repo.get_tags()
274274 eq_(repo.get_tags(output='name'), ['1.0.0', '1.0.0+1', '1.0.1'])
275 eq_(repo_tags[0]['hexsha'], commits_l['master'][-4].hexsha) # next to the last one
275 eq_(repo_tags[0]['hexsha'], commits_l['master'][-5].hexsha) # next to the last one
276276 eq_(repo_tags[-1]['hexsha'], commits_l['master'][0].hexsha) # the last one
277277
278278 def hexsha(l):
468468 eq_(len(commits['incoming-processed']), 2)
469469 eq_(len(commits_l['incoming-processed']), 2) # because original merge has only 1 parent - incoming
470470 # to avoid 'dataset init' commit create() needs save=False
471 eq_(len(commits['master']), 6) # all commits out there, dataset init, crawler, init, incoming, incoming-processed, meta data aggregation, merge
472 eq_(len(commits_l['master']), 4) # dataset init, init, meta data aggregation, merge
471 eq_(len(commits['master']), 7) # all commits out there, backend, dataset init, crawler init, incoming, incoming-processed, meta data aggregation, merge
472 eq_(len(commits_l['master']), 5) # backend, dataset init, init, meta data aggregation, merge
473473
474474 # rerun pipeline -- make sure we are on the same in all branches!
475475 with chpwd(outd):
4242 ['encryption=none', 'type=external', 'externaltype=%s' % ARCHIVES_SPECIAL_REMOTE,
4343 'autoenable=true'
4444 ])
45 assert annex.is_special_annex_remote(ARCHIVES_SPECIAL_REMOTE)
4546 # We want two maximally obscure names, which are also different
4647 assert(fn_extracted != fn_inarchive_obscure)
4748 annex.add(fn_archive, commit=True, msg="Added tarball")
3737 from datalad.interface.results import results_from_annex_noinfo
3838 from datalad.interface.utils import discover_dataset_trace_to_targets
3939 from datalad.interface.utils import eval_results
40 from datalad.interface.utils import build_doc
40 from datalad.interface.base import build_doc
4141 from datalad.interface.save import Save
4242 from datalad.distribution.utils import _fixup_submodule_dotgit_setup
4343 from datalad.support.constraints import EnsureStr
140140 as it inflates dataset sizes and impacts flexibility of data
141141 transport. If not specified - it will be up to git-annex to
142142 decide, possibly on .gitattributes options."""),
143 to_annex=Parameter(
144 args=("--to-annex",),
145 action='store_false',
146 dest='to_git',
147 doc="""flag whether to force adding data to Annex, instead of
148 git. It might be that .gitattributes instructs for a file to be
149 added to git, but for some particular files it is desired to be
150 added to annex (e.g. sensitive files etc).
151 If not specified - it will be up to git-annex to
152 decide, possibly on .gitattributes options."""),
143153 recursive=recursion_flag,
144154 recursion_limit=recursion_limit,
145155 # TODO not functional anymore
177187 annex_opts=None,
178188 annex_add_opts=None,
179189 jobs=None):
180
181190 # parameter constraints:
182191 if not path:
183192 raise InsufficientArgumentsError(
99
1010
1111 import logging
12 import re
1213 from os import listdir
1314 from os.path import relpath
1415 from os.path import pardir
1617
1718 from datalad.interface.base import Interface
1819 from datalad.interface.utils import eval_results
19 from datalad.interface.utils import build_doc
20 from datalad.interface.base import build_doc
2021 from datalad.interface.results import get_status_dict
2122 from datalad.interface.common_opts import location_description
2223 # from datalad.interface.common_opts import git_opts
100101 reckless=reckless_opt,
101102 alt_sources=Parameter(
102103 args=('--alternative-sources',),
104 dest='alt_sources',
103105 metavar='SOURCE',
104106 nargs='+',
105107 doc="""Alternative sources to be tried if a dataset cannot
235237 lgr.debug("Wiping out unsuccessful clone attempt at: %s",
236238 dest_path)
237239 rmtree(dest_path)
240 if 'could not create work tree' in e.stderr.lower():
241 # this cannot be fixed by trying another URL
242 yield get_status_dict(
243 status='error',
244 message=re.match(r".*fatal: (.*)\n",
245 e.stderr,
246 flags=re.MULTILINE | re.DOTALL).group(1),
247 **status_kwargs)
248 return
238249
239250 if not destination_dataset.is_installed():
240251 yield get_status_dict(
1919 from datalad.interface.base import Interface
2020 from datalad.interface.annotate_paths import AnnotatePaths
2121 from datalad.interface.utils import eval_results
22 from datalad.interface.utils import build_doc
22 from datalad.interface.base import build_doc
2323 from datalad.interface.common_opts import git_opts
2424 from datalad.interface.common_opts import annex_opts
2525 from datalad.interface.common_opts import annex_init_opts
111111 doc="""enforce creation of a dataset in a non-empty directory""",
112112 action='store_true'),
113113 description=location_description,
114 # TODO could move into cfg_annex plugin
114115 no_annex=Parameter(
115116 args=("--no-annex",),
116117 doc="""if set, a plain Git repository will be created without any
117118 annex""",
118119 action='store_true'),
120 text_no_annex=Parameter(
121 args=("--text-no-annex",),
122 doc="""if set, all text files in the future will be added to Git,
123 not annex. Achieved by adding an entry to `.gitattributes` file. See
124 http://git-annex.branchable.com/tips/largefiles/ and `no_annex`
125 DataLad plugin to establish even more detailed control over which
126 files are placed under annex control.""",
127 action='store_true'),
119128 save=nosave_opt,
129 # TODO could move into cfg_annex plugin
120130 annex_version=Parameter(
121131 args=("--annex-version",),
122132 doc="""select a particular annex repository version. The
124134 version. This should be left untouched, unless you know what
125135 you are doing""",
126136 constraints=EnsureDType(int) | EnsureNone()),
137 # TODO could move into cfg_annex plugin
127138 annex_backend=Parameter(
128139 args=("--annex-backend",),
129140 constraints=EnsureStr() | EnsureNone(),
132143 For a list of supported backends see the git-annex
133144 documentation. The default is optimized for maximum compatibility
134145 of datasets across platforms (especially those with limited
135 path lengths)""",
136 nargs=1),
146 path lengths)"""),
147 # TODO could move into cfg_metadata plugin
137148 native_metadata_type=Parameter(
138149 args=('--native-metadata-type',),
139150 metavar='LABEL',
142153 doc="""Metadata type label. Must match the name of the respective
143154 parser implementation in Datalad (e.g. "bids").[CMD: This option
144155 can be given multiple times CMD]"""),
156 # TODO could move into cfg_access/permissions plugin
145157 shared_access=shared_access_opt,
146158 git_opts=git_opts,
147159 annex_opts=annex_opts,
164176 shared_access=None,
165177 git_opts=None,
166178 annex_opts=None,
167 annex_init_opts=None):
179 annex_init_opts=None,
180 text_no_annex=None
181 ):
168182
169183 # two major cases
170184 # 1. we got a `dataset` -> we either want to create it (path is None),
206220 unavailable_path_msg=None,
207221 # if we have a dataset given that actually exists, we want to
208222 # fail if the requested path is not in it
209 nondataset_path_status='error' if dataset and dataset.is_installed() else '',
223 nondataset_path_status='error' \
224 if isinstance(dataset, Dataset) and dataset.is_installed() else '',
210225 on_failure='ignore')
211226 path = None
212227 for r in annotated_paths:
251266
252267 # important to use the given Dataset object to avoid spurious ID
253268 # changes with not-yet-materialized Datasets
254 tbds = dataset if dataset is not None and dataset.path == path['path'] \
269 tbds = dataset if isinstance(dataset, Dataset) and dataset.path == path['path'] \
255270 else Dataset(path['path'])
256271
257272 # don't create in non-empty directory without `force`:
274289 else:
275290 # always come with annex when created from scratch
276291 lgr.info("Creating a new annex repo at %s", tbds.path)
277 AnnexRepo(
292 tbrepo = AnnexRepo(
278293 tbds.path,
279294 url=None,
280295 create=True,
283298 description=description,
284299 git_opts=git_opts,
285300 annex_opts=annex_opts,
286 annex_init_opts=annex_init_opts)
301 annex_init_opts=annex_init_opts
302 )
303
304 if text_no_annex:
305 git_attributes_file = opj(tbds.path, '.gitattributes')
306 with open(git_attributes_file, 'a') as f:
307 f.write('* annex.largefiles=(not(mimetype=text/*))\n')
308 tbrepo.add([git_attributes_file], git=True)
309 tbrepo.commit(
310 "Instructed annex to add text files to git",
311 _datalad_msg=True,
312 files=[git_attributes_file]
313 )
287314
288315 if native_metadata_type is not None:
289316 if not isinstance(native_metadata_type, list):
306333 with open(opj(tbds.path, '.datalad', '.gitattributes'), 'a') as gitattr:
307334 # TODO this will need adjusting, when annex'ed aggregate meta data
308335 # comes around
336 gitattr.write('# Text files (according to file --mime-type) are added directly to git.\n')
337 gitattr.write('# See http://git-annex.branchable.com/tips/largefiles/ for more info.\n')
309338 gitattr.write('** annex.largefiles=nothing\n')
310339
311340 # save everything, we need to do this now and cannot merge with the
317346 # the next only makes sense if we saved the created dataset,
318347 # otherwise we have no committed state to be registered
319348 # in the parent
320 if save and dataset is not None and dataset.path != tbds.path:
349 if save and isinstance(dataset, Dataset) and dataset.path != tbds.path:
321350 # we created a dataset in another dataset
322351 # -> make submodule
323352 for r in dataset.add(
2929 datasetmethod, require_dataset
3030 from datalad.interface.annotate_paths import AnnotatePaths
3131 from datalad.interface.base import Interface
32 from datalad.interface.utils import build_doc
32 from datalad.interface.base import build_doc
33 from datalad.interface.utils import eval_results
3334 from datalad.interface.common_opts import recursion_limit, recursion_flag
3435 from datalad.interface.common_opts import as_common_datasrc
3536 from datalad.interface.common_opts import publish_by_default
3839 from datalad.interface.common_opts import annex_wanted_opt
3940 from datalad.interface.common_opts import annex_group_opt
4041 from datalad.interface.common_opts import annex_groupwanted_opt
41 from datalad.interface.utils import eval_results
42 from datalad.interface.utils import build_doc
4342 from datalad.support.annexrepo import AnnexRepo
4443 from datalad.support.constraints import EnsureStr, EnsureNone, EnsureBool
4544 from datalad.support.constraints import EnsureChoice
171170 ssh("rm -rf {}".format(sh_quote(remoteds_path)))
172171 # if we succeeded in removing it
173172 path_exists = False
173 # Since it is gone now, git-annex also should forget about it
174 remotes = ds.repo.get_remotes()
175 if name in remotes:
176 # so we had this remote already, we should announce it dead
177 # XXX what if there was some kind of mismatch and this name
178 # isn't matching the actual remote UUID? should we have
179 # checked more carefully?
180 lgr.info(
181 "Announcing existing remote %s dead to annex and removing",
182 name
183 )
184 if isinstance(ds.repo, AnnexRepo):
185 ds.repo.set_remote_dead(name)
186 ds.repo.remove_remote(name)
174187 elif existing == 'reconfigure':
175188 lgr.info(_msg + " Will only reconfigure")
176189 only_reconfigure = True
716729 # DataLad
717730 #
718731 # (Re)generate meta-data for DataLad Web UI and possibly init new submodules
719 dsdir="{path}"
732 dsdir="$(dirname $0)/../.."
720733 logfile="$dsdir/{WEB_META_LOG}/{log_filename}"
721734
735 if [ ! -e "$dsdir/.git" ]; then
736 echo Assumption of being under .git has failed >&2
737 exit 1
738 fi
739
722740 mkdir -p "$dsdir/{WEB_META_LOG}" # assure logs directory exists
723741
724742 ( which datalad > /dev/null \
725 && ( cd ..; GIT_DIR="$PWD/.git" datalad ls -a --json file "$dsdir"; ) \
743 && ( cd "$dsdir"; GIT_DIR="$PWD/.git" datalad ls -a --json file .; ) \
726744 || echo "E: no datalad found - skipping generation of indexes for web frontend"; \
727745 ) &> "$logfile"
728746
729747 # Some submodules might have been added and thus we better init them
730 ( cd ..; git submodule update --init >> "$logfile" 2>&1 || : ; )
748 ( cd "$dsdir"; git submodule update --init || : ; ) >> "$logfile" 2>&1
731749 '''.format(WEB_META_LOG=WEB_META_LOG, **locals())
732750
733751 with make_tempfile(content=hook_content) as tempf:
2828 from datalad.support.constraints import EnsureChoice
2929 from datalad.support.exceptions import MissingExternalDependency
3030 from ..interface.base import Interface
31 from datalad.interface.utils import build_doc
31 from datalad.interface.base import build_doc
3232 from datalad.distribution.dataset import EnsureDataset, datasetmethod, \
3333 require_dataset, Dataset
3434 from datalad.distribution.siblings import Siblings
2525 from datalad.support.gitrepo import GitRepo
2626 from datalad.support.annexrepo import AnnexRepo
2727 from datalad.interface.base import Interface
28 from datalad.interface.utils import build_doc
28 from datalad.interface.base import build_doc
2929
3030 lgr = logging.getLogger('datalad.distribution.tests')
3131
3636 from datalad.interface.results import results_from_annex_noinfo
3737 from datalad.interface.utils import handle_dirty_dataset
3838 from datalad.interface.utils import eval_results
39 from datalad.interface.utils import build_doc
39 from datalad.interface.base import build_doc
4040
4141 lgr = logging.getLogger('datalad.distribution.drop')
4242
128128 before file content is dropped. As these checks could lead to slow
129129 operation (network latencies, etc), they can be disabled.
130130
131
132 Examples
133 --------
134
135 Drop all file content in a dataset::
136
137 ~/some/dataset$ datalad drop
138
139 Drop all file content in a dataset and all its subdatasets::
140
141 ~/some/dataset$ datalad drop --recursive
131 Examples:
132
133 Drop all file content in a dataset::
134
135 ~/some/dataset$ datalad drop
136
137 Drop all file content in a dataset and all its subdatasets::
138
139 ~/some/dataset$ datalad drop --recursive
142140
143141 """
144142 _action = 'drop'
2020 from datalad.interface.annotate_paths import AnnotatePaths
2121 from datalad.interface.annotate_paths import annotated2content_by_ds
2222 from datalad.interface.utils import eval_results
23 from datalad.interface.utils import build_doc
23 from datalad.interface.base import build_doc
2424 from datalad.interface.results import get_status_dict
2525 from datalad.interface.results import results_from_paths
2626 from datalad.interface.results import annexjson2result
2929 from datalad.interface.results import YieldDatasets
3030 from datalad.interface.results import is_result_matching_pathsource_argument
3131 from datalad.interface.utils import eval_results
32 from datalad.interface.utils import build_doc
32 from datalad.interface.base import build_doc
3333 from datalad.support.constraints import EnsureNone
3434 from datalad.support.constraints import EnsureStr
3535 from datalad.support.exceptions import InsufficientArgumentsError
1616 from os.path import sep as dirsep
1717
1818 from datalad.interface.base import Interface
19 from datalad.interface.utils import build_doc
19 from datalad.interface.base import build_doc
2020 from datalad.interface.utils import filter_unmodified
2121 from datalad.interface.common_opts import annex_copy_opts, recursion_flag, \
22 recursion_limit, git_opts, annex_opts
22 recursion_limit, git_opts, annex_opts, jobs_opt
2323 from datalad.interface.common_opts import missing_sibling_opt
2424 from datalad.support.param import Parameter
2525 from datalad.support.constraints import EnsureStr
2929 from datalad.support.exceptions import CommandError
3030
3131 from datalad.utils import assure_list
32 from datalad.dochelpers import exc_str
3233
3334 from .dataset import EnsureDataset
3435 from .dataset import Dataset
5960 return error
6061
6162
62 def _publish_dataset(ds, remote, refspec, paths, annex_copy_options, force=False):
63 def _publish_dataset(ds, remote, refspec, paths, annex_copy_options, force=False, jobs=None):
6364 # TODO: this setup is now quite ugly. The only way `refspec` can come
6465 # in, is when there is a tracking branch, and we get its state via
6566 # `refspec`
6667
68 is_annex_repo = isinstance(ds.repo, AnnexRepo)
69
6770 def _publish_data():
68 remote_wanted = ds.repo.get_preferred_content('wanted', remote)
69 if (paths or annex_copy_options or remote_wanted) and \
70 isinstance(ds.repo, AnnexRepo) and not \
71 ds.config.getbool(
72 'remote.{}'.format(remote),
73 'annex-ignore',
74 False):
71 if ds.repo.is_remote_annex_ignored(remote):
72 return [], [] # Cannot publish any data
73 try:
74 remote_wanted = ds.repo.get_preferred_content('wanted', remote)
75 except CommandError as exc:
76 if "cannot determine uuid" in str(exc):
77 if not ds.repo.is_remote_annex_ignored(remote):
78 lgr.warning(
79 "Annex failed to determine UUID, skipping publishing data for now: %s",
80 exc_str(exc)
81 )
82 return [], []
83 raise
84
85 if (paths or annex_copy_options or remote_wanted) and is_annex_repo:
7586 lgr.info("Publishing {0} data to {1}".format(ds, remote))
7687 # overwrite URL with pushurl if any, reason:
7788 # https://git-annex.branchable.com/bugs/annex_ignores_pushurl_and_uses_only_url_upon___34__copy_--to__34__/
98109 pblshd = ds.repo.copy_to(
99110 files=paths,
100111 remote=remote,
101 options=annex_copy_options_
112 options=annex_copy_options_,
113 jobs=jobs
102114 )
103115 # if ds.submodules:
104116 # # NOTE: we might need to init them on the remote, but needs to
148160 # there was no tracking branch, check the push target
149161 remote_branch_name = ds.repo.get_active_branch()
150162
151 if remote_branch_name in ds.repo.repo.remotes[remote].refs:
152 lgr.debug("Testing for changes with respect to '%s' of remote '%s'",
153 remote_branch_name, remote)
154 current_commit = ds.repo.repo.commit()
155 remote_ref = ds.repo.repo.remotes[remote].refs[remote_branch_name]
156 if paths:
157 # if there were custom paths, we will look at the diff
158 lgr.debug("Since paths provided, looking at diff")
159 diff = current_commit.diff(
160 remote_ref,
161 paths=paths
162 )
163 else:
164 # if commits differ at all
165 lgr.debug("Since no paths provided, comparing commits")
166 diff = current_commit != remote_ref.commit
167 else:
168 lgr.debug("Remote '%s' has no branch matching %r. Will publish",
169 remote, remote_branch_name)
170 # we don't have any remote state, need to push for sure
171 diff = True
163 diff = _get_remote_diff(ds, paths, None, remote, remote_branch_name)
164
165 # We might have got new information in git-annex branch although no other
166 # changes
167 if not diff and is_annex_repo:
168 try:
169 git_annex_commit = next(ds.repo.get_branch_commits('git-annex'))
170 except StopIteration:
171 git_annex_commit = None
172 diff = _get_remote_diff(ds, [], git_annex_commit, remote, 'git-annex')
173 if diff:
174 lgr.info("Will publish updated git-annex")
172175
173176 # # remote might be set to be ignored by annex, or we might not even know yet its uuid
174177 # annex_ignore = ds.config.getbool('remote.{}.annex-ignore'.format(remote), None)
177180 # if annex_uuid is None:
178181 # # most probably not yet 'known' and might require some annex
179182 knew_remote_uuid = None
180 if isinstance(ds.repo, AnnexRepo):
183 if is_annex_repo and not ds.repo.is_remote_annex_ignored(remote):
181184 try:
182185 ds.repo.get_preferred_content('wanted', remote) # could be just checking config.remote.uuid
183186 knew_remote_uuid = True
184187 except CommandError:
185188 knew_remote_uuid = False
189
186190 if knew_remote_uuid:
187191 # we can try publishing right away
188192 published += _publish_data()
206210 None,
207211 paths,
208212 annex_copy_options,
209 force=force)
213 force=force,
214 jobs=jobs
215 )
210216 published.extend(pblsh)
211217 skipped.extend(skp)
218
219 if is_annex_repo and \
220 ds.repo.is_special_annex_remote(remote):
221 # There is nothing else to "publish"
222 lgr.debug(
223 "{0} is a special annex remote, no git push is needed".format(remote)
224 )
225 return published, skipped
212226
213227 lgr.info("Publishing {0} to {1}".format(ds, remote))
214228
216230 # we need to annex merge first. Otherwise a git push might be
217231 # rejected if involving all matching branches for example.
218232 # Once at it, also push the annex branch right here.
219 if isinstance(ds.repo, AnnexRepo):
233 if is_annex_repo:
220234 lgr.debug("Obtain remote annex info from '%s'", remote)
221235 ds.repo.fetch(remote=remote)
222236 ds.repo.merge_annex(remote)
234248 current_branch = ds.repo.get_active_branch()
235249 if current_branch: # possibly make this conditional on a switch
236250 # TODO: this should become it own helper
237 if isinstance(ds.repo, AnnexRepo):
251 if is_annex_repo:
238252 # annex could manage this branch
239253 if current_branch.startswith('annex/direct') \
240254 and ds.config.getbool('annex', 'direct', default=False):
251265 # and thus probably broken -- test me!
252266 current_branch = match_adjusted.group(1)
253267 things2push.append(current_branch)
254 if isinstance(ds.repo, AnnexRepo):
268 if is_annex_repo:
255269 things2push.append('git-annex')
256270 # check that all our magic found valid branches
257271 things2push = [t for t in things2push if t in ds.repo.get_branches()]
273287
274288 published.append(ds)
275289
276 if knew_remote_uuid is False:
290 late_published_data = None
291 if knew_remote_uuid is False and is_annex_repo:
277292 # publish only after we tried to sync/push and if it was annex repo
278 published += _publish_data()
293 late_published_data = _publish_data()
294 published += late_published_data
295
296 # if we published something (data, subdatasets) even though there was no
297 # diff (thus no push), or additional data was published later
298 if ((not diff and published) or late_published_data) \
299 and is_annex_repo:
300 # we need to do the same annex merge dance and push updated git-annex
301 # and this way also trigger post-update hook which might update
302 # web UI meta-data
303 # https://github.com/datalad/datalad/issues/1658
304 lgr.info(
305 "Obtaining remote annex info from '%s' and pushing updated git-annex",
306 remote
307 )
308 ds.repo.fetch(remote=remote)
309 ds.repo.merge_annex(remote)
310 # this will trigger post-update hook if present
311 _log_push_info(ds.repo.push(remote=remote, refspec=['git-annex']))
312
279313 return published, skipped
314
315
316 def _get_remote_diff(ds, paths, current_commit, remote, remote_branch_name):
317 """Helper to check if remote has different state of the branch"""
318 if remote_branch_name in ds.repo.repo.remotes[remote].refs:
319 lgr.debug("Testing for changes with respect to '%s' of remote '%s'",
320 remote_branch_name, remote)
321 if current_commit is None:
322 current_commit = ds.repo.repo.commit()
323 remote_ref = ds.repo.repo.remotes[remote].refs[remote_branch_name]
324 if paths:
325 # if there were custom paths, we will look at the diff
326 lgr.debug("Since paths provided, looking at diff")
327 diff = current_commit.diff(
328 remote_ref,
329 paths=paths
330 )
331 else:
332 # if commits differ at all
333 lgr.debug("Since no paths provided, comparing commits")
334 diff = current_commit != remote_ref.commit
335 else:
336 lgr.debug("Remote '%s' has no branch matching %r. Will publish",
337 remote, remote_branch_name)
338 # we don't have any remote state, need to push for sure
339 diff = True
340
341 return diff
280342
281343
282344 @build_doc
365427 git_opts=git_opts,
366428 annex_opts=annex_opts,
367429 annex_copy_opts=annex_copy_opts,
430 jobs=jobs_opt,
368431 )
369432
370433 @staticmethod
381444 git_opts=None,
382445 annex_opts=None,
383446 annex_copy_opts=None,
447 jobs=None
384448 ):
385449
386450 # if ever we get a mode, for "with-data" we would need this
522586 refspec=remote_info.get('refspec', None),
523587 paths=content_by_ds[ds_path],
524588 annex_copy_options=annex_copy_opts,
525 force=force
589 force=force,
590 jobs=jobs
526591 )
527592 published.extend(pblsh)
528593 skipped.extend(skp)
3131 from datalad.interface.common_opts import recursion_flag
3232 from datalad.interface.utils import path_is_under
3333 from datalad.interface.utils import eval_results
34 from datalad.interface.utils import build_doc
34 from datalad.interface.base import build_doc
3535 from datalad.interface.results import get_status_dict
3636 from datalad.interface.save import Save
3737 from datalad.distribution.drop import _drop_files
6363 subdirectories within a dataset as always done automatically. An optional
6464 recursion limit is applied relative to each given input path.
6565
66 Examples
67 --------
68
69 Permanently remove a subdataset from a dataset and wipe out the subdataset
70 association too::
71
72 ~/some/dataset$ datalad remove somesubdataset1
66 Examples:
67
68 Permanently remove a subdataset from a dataset and wipe out the subdataset
69 association too::
70
71 ~/some/dataset$ datalad remove somesubdataset1
7372 """
7473 _action = 'remove'
7574
1717
1818 from datalad.interface.base import Interface
1919 from datalad.interface.utils import eval_results
20 from datalad.interface.utils import build_doc
20 from datalad.interface.base import build_doc
2121 from datalad.interface.results import get_status_dict
2222 from datalad.support.annexrepo import AnnexRepo
2323 from datalad.support.constraints import EnsureStr
288288 **dict(
289289 res,
290290 path=path,
291 with_annex='+' if 'annex-uuid' in res else '-',
291 with_annex='+' if 'annex-uuid' in res \
292 else ('-' if res.get('annex-ignore', None) else '?'),
292293 spec=spec)))
293294
294295
614615 if annex_description is not None:
615616 info['annex-description'] = annex_description
616617 if get_annex_info and isinstance(ds.repo, AnnexRepo):
617 for prop in ('wanted', 'required', 'group'):
618 var = ds.repo.get_preferred_content(
619 prop, '.' if remote == 'here' else remote)
620 if var:
621 info['annex-{}'.format(prop)] = var
622 groupwanted = ds.repo.get_groupwanted(remote)
623 if groupwanted:
624 info['annex-groupwanted'] = groupwanted
618 if not ds.repo.is_remote_annex_ignored(remote):
619 try:
620 for prop in ('wanted', 'required', 'group'):
621 var = ds.repo.get_preferred_content(
622 prop, '.' if remote == 'here' else remote)
623 if var:
624 info['annex-{}'.format(prop)] = var
625 groupwanted = ds.repo.get_groupwanted(remote)
626 if groupwanted:
627 info['annex-groupwanted'] = groupwanted
628 except CommandError as exc:
629 if 'cannot determine uuid' in str(exc):
630 # not an annex (or no connection), would be marked as
631 # annex-ignore
632 msg = "Failed to determine if %s carries annex." % remote
633 ds.repo.config.reload()
634 if ds.repo.is_remote_annex_ignored(remote):
635 msg += " Remote was marked by annex as annex-ignore. " \
636 "Edit .git/config to reset if you think that was done by mistake due to absent connection etc"
637 lgr.warning(msg)
638 info['annex-ignore'] = True
639 else:
640 raise
641 else:
642 info['annex-ignore'] = True
625643
626644 info['status'] = 'ok'
627645 yield info
2222
2323 from datalad.interface.base import Interface
2424 from datalad.interface.utils import eval_results
25 from datalad.interface.utils import build_doc
25 from datalad.interface.base import build_doc
2626 from datalad.interface.results import get_status_dict
2727 from datalad.support.constraints import EnsureBool
2828 from datalad.support.constraints import EnsureStr
8989 if arg[0] == test_list_4:
9090 result = ds.add('dir', to_git=arg[1], save=False)
9191 else:
92 result = ds.add(arg[0], to_git=arg[1], save=False, result_xfm='relpaths',
92 result = ds.add(arg[0], to_git=arg[1], save=False,
93 result_xfm='relpaths',
9394 return_type='item-or-list')
9495 # order depends on how annex processes it, so let's sort
9596 eq_(sorted(result), sorted(arg[0]))
102103 # ignore the initial config file in index:
103104 indexed.remove(opj('.datalad', 'config'))
104105 indexed.remove(opj('.datalad', '.gitattributes'))
106 indexed.remove('.gitattributes')
105107 if isinstance(arg[0], list):
106108 for x in arg[0]:
107109 unstaged.remove(x)
306308 @with_tree(tree={
307309 'file.txt': 'some text',
308310 'empty': '',
311 'file2.txt': 'some text to go to annex',
309312 '.gitattributes': '* annex.largefiles=(not(mimetype=text/*))'}
310313 )
311314 def test_add_mimetypes(path):
318321 ds.repo.commit('added attributes to git explicitly')
319322 # now test that those files will go into git/annex correspondingly
320323 __not_tested__ = ds.add(['file.txt', 'empty'])
321 ok_clean_git(path)
324 ok_clean_git(path, untracked=['file2.txt'])
322325 # Empty one considered to be application/octet-stream i.e. non-text
323326 ok_file_under_git(path, 'empty', annexed=True)
324327 ok_file_under_git(path, 'file.txt', annexed=False)
328
329 # But we should be able to force adding file to annex when desired
330 ds.add('file2.txt', to_git=False)
331 ok_file_under_git(path, 'file2.txt', annexed=True)
1414 from os.path import exists
1515 from os.path import basename
1616 from os.path import dirname
17 from os import mkdir
18 from os import chmod
19 from os import geteuid
1720
1821 from mock import patch
1922
2023 from datalad.api import create
2124 from datalad.api import clone
2225 from datalad.utils import chpwd
26 from datalad.utils import _path_
27 from datalad.utils import rmtree
2328 from datalad.support.exceptions import IncompleteResultsError
2429 from datalad.support.gitrepo import GitRepo
2530 from datalad.support.annexrepo import AnnexRepo
4449 from datalad.tests.utils import serve_path_via_http
4550 from datalad.tests.utils import use_cassette
4651 from datalad.tests.utils import skip_if_no_network
52 from datalad.tests.utils import skip_if_on_windows
53 from datalad.tests.utils import skip_if
4754
4855 from ..dataset import Dataset
4956
308315 assert clonedsub.path.startswith(path)
309316 # no subdataset relation
310317 eq_(cloned.subdatasets(), [])
318
319
320 @skip_if_on_windows
321 @skip_if(not geteuid(), "Will fail under super-user")
322 @with_tempfile(mkdir=True)
323 def test_clone_report_permission_issue(tdir):
324 pdir = _path_(tdir, 'protected')
325 mkdir(pdir)
326 # make it read-only
327 chmod(pdir, 0o555)
328 with chpwd(pdir):
329 res = clone('///', result_xfm=None, return_type='list', on_failure='ignore')
330 assert_status('error', res)
331 assert_result_count(
332 res, 1, status='error',
333 message="could not create work tree dir '%s/datasets.datalad.org': Permission denied" % pdir)
1010
1111 import os
1212 from os.path import join as opj
13 from os.path import lexists
1314
1415 from ..dataset import Dataset
1516 from datalad.api import create
1617 from datalad.utils import chpwd
18 from datalad.utils import _path_
1719 from datalad.cmd import Runner
1820
1921 from datalad.tests.utils import with_tempfile
22 from datalad.tests.utils import create_tree
2023 from datalad.tests.utils import eq_
2124 from datalad.tests.utils import ok_
2225 from datalad.tests.utils import assert_not_in
2730 from datalad.tests.utils import assert_in_results
2831 from datalad.tests.utils import ok_clean_git
2932 from datalad.tests.utils import with_tree
33 from datalad.tests.utils import ok_file_has_content
34 from datalad.tests.utils import ok_file_under_git
3035
3136
3237 _dataset_hierarchy_template = {
253258 # is committed -- ds2 is already known to git and it just pukes with a bit
254259 # confusing 'ds2' already exists in the index
255260 assert_in('ds2', ds1.subdatasets(result_xfm='relpaths'))
261
262
263 @with_tempfile(mkdir=True)
264 def test_create_withplugin(path):
265 # first without
266 ds = create(path)
267 assert(not lexists(opj(ds.path, 'README.rst')))
268 ds.remove()
269 assert(not lexists(ds.path))
270 # now for reals...
271 ds = create(
272 # needs to identify the dataset, otherwise post-proc
273 # plugin doesn't no what to run on
274 dataset=path,
275 run_after=[['add_readme', 'filename=with hole.txt']])
276 ok_clean_git(path)
277 # README will end up in annex by default
278 # TODO implement `nice_dataset` plugin to give sensible
279 # default and avoid that
280 assert(lexists(opj(ds.path, 'with hole.txt')))
281
282
283 @with_tempfile(mkdir=True)
284 def test_create_text_no_annex(path):
285 ds = create(path, text_no_annex=True)
286 ok_clean_git(path)
287 import re
288 ok_file_has_content(
289 _path_(path, '.gitattributes'),
290 content='\* annex\.largefiles=\(not\(mimetype=text/\*\)\)',
291 re_=True,
292 match=False,
293 flags=re.MULTILINE
294 )
295 # and check that it is really committing text files to git and binaries
296 # to annex
297 create_tree(path,
298 {
299 't': 'some text',
300 'b': '' # empty file is not considered to be a text file
301 # should we adjust the rule to consider only non empty files?
302 }
303 )
304 ds.add(['t', 'b'])
305 ok_file_under_git(path, 't', annexed=False)
306 ok_file_under_git(path, 'b', annexed=True)
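
For reference, the behaviour exercised by this test corresponds roughly to the following command-line invocation (a sketch; `new_dataset` is a placeholder path):

    ~/some$ datalad create --text-no-annex new_dataset

after which files that `file --mime-type` identifies as text go directly to git, while binary files (including empty files, as the test notes) remain under annex.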
1616
1717 from ..dataset import Dataset
1818 from datalad.api import publish, install, create_sibling
19 from datalad.cmd import Runner
1920 from datalad.utils import chpwd
2021 from datalad.tests.utils import create_tree
2122 from datalad.support.gitrepo import GitRepo
3233 from datalad.tests.utils import assert_raises
3334 from datalad.tests.utils import skip_ssh
3435 from datalad.tests.utils import assert_dict_equal
36 from datalad.tests.utils import assert_false
3537 from datalad.tests.utils import assert_set_equal
3638 from datalad.tests.utils import assert_result_count
3739 from datalad.tests.utils import assert_not_equal
7274 assert_false(exists(opj(target_path, path)))
7375
7476 hook_path = _path_(target_path, '.git/hooks/post-update')
75 ok_file_has_content(hook_path,
76 '.*\ndsdir="%s"\n.*' % target_path,
77 re_=True,
78 flags=re.DOTALL)
77 # No longer the case -- we are no longer using absolute path in the
78 # script
79 # ok_file_has_content(hook_path,
80 # '.*\ndsdir="%s"\n.*' % target_path,
81 # re_=True,
82 # flags=re.DOTALL)
83 # No absolute path (so dataset could be moved) in the hook
84 with open(hook_path) as f:
85 assert_not_in(target_path, f.read())
7986 # correct ls_json command in hook content (path wrapped in "quotes)
8087 ok_file_has_content(hook_path,
81 '.*datalad ls -a --json file "\$dsdir".*',
88 '.*datalad ls -a --json file \..*',
8289 re_=True,
8390 flags=re.DOTALL)
8491
418425
419426 @skip_ssh
420427 @with_tempfile(mkdir=True)
428 @with_tempfile
429 def test_replace_and_relative_sshpath(src_path, dst_path):
430 # We need to come up with the path relative to our current home directory
431 # https://github.com/datalad/datalad/issues/1653
432 dst_relpath = os.path.relpath(dst_path, os.path.expanduser('~'))
433 url = 'localhost:%s' % dst_relpath
434 ds = Dataset(src_path).create()
435 create_tree(ds.path, {'sub.dat': 'lots of data'})
436 ds.add('sub.dat')
437
438 ds.create_sibling(url)
439 published = ds.publish('.', to='localhost')
440 assert_in('sub.dat', published[0])
441 # verify that hook runs and there is nothing in stderr
442 # since it exits with 0 even if there was a problem
443 out, err = Runner(cwd=opj(dst_path, '.git'))(_path_('hooks/post-update'))
444 assert_false(out)
445 assert_false(err)
446
447 # Verify that we could replace and publish no problem
448 # https://github.com/datalad/datalad/issues/1656
449 # Strangely it spits out IncompleteResultsError exception atm... so just
450 # checking that it fails somehow
451 assert_raises(Exception, ds.create_sibling, url)
452 ds.create_sibling(url, existing='replace')
453 published2 = ds.publish('.', to='localhost')
454 assert_in('sub.dat', published2[0])
455
456 # and one more test since in above test it would not puke ATM but just
457 # not even try to copy since it assumes that file is already there
458 create_tree(ds.path, {'sub2.dat': 'more data'})
459 ds.add('sub2.dat')
460 published3 = ds.publish(to='localhost') # we publish just git
461 assert_not_in('sub2.dat', published3[0])
462 # now publish "with" data, which should also trigger the hook!
463 # https://github.com/datalad/datalad/issues/1658
464 from glob import glob
465 from datalad.consts import WEB_META_LOG
466 logs_prior = glob(_path_(dst_path, WEB_META_LOG, '*'))
467 published4 = ds.publish('.', to='localhost')
468 assert_in('sub2.dat', published4[0])
469 logs_post = glob(_path_(dst_path, WEB_META_LOG, '*'))
470 eq_(len(logs_post), len(logs_prior) + 1)
471
472
473 @skip_ssh
474 @with_tempfile(mkdir=True)
421475 @with_tempfile(suffix="target")
422476 def _test_target_ssh_inherit(standardgroup, src_path, target_path):
423477 ds = Dataset(src_path).create()
2727 from datalad.tests.utils import assert_raises
2828 from datalad.tests.utils import assert_false
2929 from datalad.tests.utils import assert_result_count
30 from datalad.tests.utils import neq_
3031 from datalad.tests.utils import ok_clean_git
3132 from datalad.tests.utils import swallow_logs
3233 from datalad.tests.utils import create_tree
6162 name='target1')
6263 # source.publish(to='target1')
6364 with chpwd(p1):
64 # since we have only a single commit -- there is no HEAD^
65 assert_raises(ValueError, publish, to='target1', since='HEAD^')
65 # since we have only two commits (set backend, init dataset)
66 # -- there is no HEAD^^
67 assert_raises(ValueError, publish, to='target1', since='HEAD^^')
6668 # but now let's add one more commit, we should be able to publish
6769 source.repo.commit("msg", options=['--allow-empty'])
6870 publish(to='target1', since='HEAD^') # must not fail now
131133
132134
133135 @with_testrepos('submodule_annex', flavors=['local'])
134 @with_tempfile(mkdir=True)
135 @with_tempfile(mkdir=True)
136 @with_tempfile(mkdir=True)
137 @with_tempfile(mkdir=True)
138 def test_publish_recursive(origin, src_path, dst_path, sub1_pub, sub2_pub):
139
136 @with_tempfile
137 @with_tempfile(mkdir=True)
138 @with_tempfile(mkdir=True)
139 @with_tempfile(mkdir=True)
140 @with_tempfile(mkdir=True)
141 def test_publish_recursive(pristine_origin, origin_path, src_path, dst_path, sub1_pub, sub2_pub):
142
143 # we will be publishing back to origin, so to not alter testrepo
144 # we will first clone it
145 origin = install(origin_path, source=pristine_origin, recursive=True)
140146 # prepare src
141 source = install(src_path, source=origin, recursive=True)
147 source = install(src_path, source=origin_path, recursive=True)
142148
143149 # create plain git at target:
144150 target = GitRepo(dst_path, create=True)
193199 eq_(list(sub2_target.get_branch_commits("git-annex")),
194200 list(sub2.get_branch_commits("git-annex")))
195201
202 # we are tracking origin but origin has different git-annex, since we
203 # cloned from it, so it is not aware of our git-annex
204 neq_(list(origin.repo.get_branch_commits("git-annex")),
205 list(source.repo.get_branch_commits("git-annex")))
206 # So if we first publish to it recursively, we would update
207 # all sub-datasets since git-annex branch would need to be pushed
208 res_ = publish(dataset=source, recursive=True)
209 eq_(set(r.path for r in res_[0]),
210 set(opj(*([source.path] + x)) for x in ([], ['subm 1'], ['subm 2'])))
211 # and now should carry the same state for git-annex
212 eq_(list(origin.repo.get_branch_commits("git-annex")),
213 list(source.repo.get_branch_commits("git-annex")))
214
196215 # test for publishing with --since. By default since no changes, nothing pushed
197216 res_ = publish(dataset=source, recursive=True)
198217 eq_(set(r.path for r in res_[0]), set())
335354 # before
336355 eq_({sub1.path, sub2.path},
337356 set(result_paths))
357
358 # if we publish again -- nothing to be published
359 eq_(source.publish(to="target"), ([], []))
360 # if we drop a file and publish again -- dataset should be published
361 # since git-annex branch was updated
362 source.drop('test-annex.dat')
363 eq_(source.publish(to="target"), ([source], []))
364 eq_(source.publish(to="target"), ([], [])) # and empty again if we try again
338365
339366
340367 @skip_ssh
2929 from datalad.interface.common_opts import recursion_flag
3030 from datalad.interface.utils import path_is_under
3131 from datalad.interface.utils import eval_results
32 from datalad.interface.utils import build_doc
32 from datalad.interface.base import build_doc
3333 from datalad.interface.utils import handle_dirty_dataset
3434 from datalad.interface.results import get_status_dict
3535 from datalad.utils import rmtree
9393 subdirectories within a dataset as always done automatically. An optional
9494 recursion limit is applied relative to each given input path.
9595
96 Examples
97 --------
96 Examples:
9897
99 Uninstall a subdataset (undo installation)::
98 Uninstall a subdataset (undo installation)::
10099
101 ~/some/dataset$ datalad uninstall somesubdataset1
100 ~/some/dataset$ datalad uninstall somesubdataset1
102101
103102 """
104103 _action = 'uninstall'
1717
1818 from datalad.interface.base import Interface
1919 from datalad.interface.utils import eval_results
20 from datalad.interface.utils import build_doc
20 from datalad.interface.base import build_doc
2121 from datalad.interface.results import get_status_dict
2222 from datalad.support.constraints import EnsureStr
2323 from datalad.support.constraints import EnsureNone
187187 try:
188188 key = self._bucket.get_key(url_filepath, version_id=params.get('versionId', None))
189189 except S3ResponseError as e:
190 raise DownloadError("S3 refused to provide the key for %s from url %s: %s"
190 raise TargetFileAbsent("S3 refused to provide the key for %s from url %s: %s"
191191 % (url_filepath, url, e))
192192 if key is None:
193193 raise TargetFileAbsent("No key returned for %s from url %s" % (url_filepath, url))
+0
-114
datalad/export/__init__.py
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """
9
10 """
11
12 __docformat__ = 'restructuredtext'
13
14 import logging
15 from glob import glob
16 from os.path import join as opj, basename, dirname
17 from importlib import import_module
18
19 from datalad.support.param import Parameter
20 from datalad.support.constraints import EnsureNone
21 from datalad.distribution.dataset import EnsureDataset
22 from datalad.distribution.dataset import datasetmethod
23 from datalad.distribution.dataset import require_dataset
24 from datalad.dochelpers import exc_str
25
26 from datalad.interface.base import Interface
27 from datalad.interface.utils import build_doc
28
29 lgr = logging.getLogger('datalad.export')
30
31
32 def _get_exporter_names():
33 basepath = dirname(__file__)
34 return [basename(e)[:-3]
35 for e in glob(opj(basepath, '*.py'))
36 if not e.endswith('__init__.py')]
37
38
39 @build_doc
40 class Export(Interface):
41 """Export a dataset to another representation
42 """
43 # XXX prevent common args from being added to the docstring
44 _no_eval_results = True
45
46 _params_ = dict(
47 dataset=Parameter(
48 args=("-d", "--dataset"),
49 doc="""specify the dataset to export. If
50 no dataset is given, an attempt is made to identify the dataset
51 based on the current working directory.""",
52 constraints=EnsureDataset() | EnsureNone()),
53 astype=Parameter(
54 args=("astype",),
55 choices=_get_exporter_names(),
56 doc="""label of the type or format the dataset shall be exported
57 to."""),
58 output=Parameter(
59 args=('-o', '--output'),
60 doc="""output destination specification to be passes to the exporter.
61 The particular semantics of the option value depend on the actual
62 exporter. Typically, this will be a file name or a path to a
63 directory."""),
64 getcmdhelp=Parameter(
65 args=('--help-type',),
66 dest='getcmdhelp',
67 action='store_true',
68 doc="""show help for a specific export type/format"""),
69 )
70
71 @staticmethod
72 @datasetmethod(name='export')
73 def __call__(astype, dataset, getcmdhelp=False, output=None, **kwargs):
74 # get a handle on the relevant plugin module
75 import datalad.export as export_mod
76 try:
77 exmod = import_module('.%s' % (astype,), package=export_mod.__package__)
78 except ImportError as e:
79 raise ValueError("cannot load exporter '{}': {}".format(
80 astype, exc_str(e)))
81 if getcmdhelp:
82 # no result, but return the module to make the renderer do the rest
83 return (exmod, None)
84
85 ds = require_dataset(dataset, check_installed=True, purpose='exporting')
86 # call the plugin, either with the argv array from the cmdline call
87 # or directly with the kwargs
88 if 'datalad_unparsed_args' in kwargs:
89 result = exmod._datalad_export_plugin_call(
90 ds, argv=kwargs['datalad_unparsed_args'], output=output)
91 else:
92 result = exmod._datalad_export_plugin_call(
93 ds, output=output, **kwargs)
94 return (exmod, result)
95
96 @staticmethod
97 def result_renderer_cmdline(res, args):
98 exmod, result = res
99 if args.getcmdhelp:
100 # the function that prints the help was returned as result
101 if not hasattr(exmod, '_datalad_get_cmdline_help'):
102 lgr.error("export plugin '{}' does not provide help".format(exmod))
103 return
104 replacement = []
105 help = exmod._datalad_get_cmdline_help()
106 if isinstance(help, tuple):
107 help, replacement = help
108 if replacement:
109 for in_s, out_s in replacement:
110 help = help.replace(in_s, out_s + ' ' * max(0, len(in_s) - len(out_s)))
111 print(help)
112 return
113 # TODO call exporter function (if any)
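
The whole `datalad/export/` tree removed here is superseded by the plugin mechanism added further down in this diff; the tarball exporter lives on as the `export_tarball` plugin. A hedged sketch of the replacement call (the output name is a placeholder):

    # command-line equivalent: datalad plugin export_tarball output=myexport
    from datalad.api import Dataset

    Dataset('.').plugin(['export_tarball', ('output', 'myexport')])
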
+0
-89
datalad/export/tarball.py
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """
9
10 """
11
12 __docformat__ = 'restructuredtext'
13
14 import logging
15 import tarfile
16 import os
17
18 from mock import patch
19 from os.path import join as opj, dirname, normpath, isabs
20 from datalad.support.annexrepo import AnnexRepo
21 from datalad.utils import file_basename
22
23 lgr = logging.getLogger('datalad.export.tarball')
24
25
26 # PLUGIN API
27 def _datalad_export_plugin_call(dataset, output, argv=None):
28 if argv:
29 lgr.warn("tarball exporter ignores any additional options '{}'".format(
30 argv))
31
32 repo = dataset.repo
33 committed_date = repo.get_committed_date()
34
35 # could be used later on to filter files by some criterion
36 def _filter_tarinfo(ti):
37 # Reset the date to match the one of the last commit, not from the
38 # filesystem since git doesn't track those at all
39 # TODO: use the date of the last commit when any particular
40 # file was changed -- would be the most kosher yoh thinks to the
41 # degree of our abilities
42 ti.mtime = committed_date
43 return ti
44
45 if output is None:
46 output = "datalad_{}.tar.gz".format(dataset.id)
47 else:
48 if not output.endswith('.tar.gz'):
49 output += '.tar.gz'
50
51 root = dataset.path
52 # use dir inside matching the output filename
53 # TODO: could be an option to the export plugin allowing empty value
54 # for no leading dir
55 leading_dir = file_basename(output)
56
57 # workaround for inability to pass down the time stamp
58 with patch('time.time', return_value=committed_date), \
59 tarfile.open(output, "w:gz") as tar:
60 repo_files = sorted(repo.get_indexed_files())
61 if isinstance(repo, AnnexRepo):
62 annexed = repo.is_under_annex(
63 repo_files, allow_quick=True, batch=True)
64 else:
65 annexed = [False] * len(repo_files)
66 for i, rpath in enumerate(repo_files):
67 fpath = opj(root, rpath)
68 if annexed[i]:
69 # resolve to possible link target
70 link_target = os.readlink(fpath)
71 if not isabs(link_target):
72 link_target = normpath(opj(dirname(fpath), link_target))
73 fpath = link_target
74 # name in the tarball
75 aname = normpath(opj(leading_dir, rpath))
76 tar.add(
77 fpath,
78 arcname=aname,
79 recursive=False,
80 filter=_filter_tarinfo)
81
82 # I think it might better return "final" filename where stuff was saved
83 return output
84
85
86 # PLUGIN API
87 def _datalad_get_cmdline_help():
88 return 'Just call it, and it will produce a tarball.'
+0
-13
datalad/export/tests/__init__.py
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """Interfaces tests
9
10 """
11
12 __docformat__ = 'restructuredtext'
+0
-87
datalad/export/tests/test_tarball.py
0 # emacs: -*- mode: python-mode; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # -*- coding: utf-8 -*-
2 # ex: set sts=4 ts=4 sw=4 noet:
3 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
4 #
5 # See COPYING file distributed along with the datalad package for the
6 # copyright and license terms.
7 #
8 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
9 """Test tarball exporter"""
10
11 import os
12 import time
13 from os.path import join as opj
14 from os.path import isabs
15 import tarfile
16
17 from datalad.api import Dataset
18 from datalad.api import export
19 from datalad.utils import chpwd
20 from datalad.utils import md5sum
21
22 from datalad.tests.utils import with_tree
23 from datalad.tests.utils import ok_startswith
24 from datalad.tests.utils import assert_true, assert_not_equal, assert_raises, \
25 assert_false, assert_equal
26
27
28 _dataset_template = {
29 'ds': {
30 'file_up': 'some_content',
31 'dir': {
32 'file1_down': 'one',
33 'file2_down': 'two'}}}
34
35
36 @with_tree(_dataset_template)
37 def test_failure(path):
38 ds = Dataset(opj(path, 'ds')).create(force=True)
39 # unknown exporter
40 assert_raises(ValueError, ds.export, 'nah')
41 # non-existing dataset
42 assert_raises(ValueError, export, 'tarball', Dataset('nowhere'))
43
44
45 @with_tree(_dataset_template)
46 def test_tarball(path):
47 ds = Dataset(opj(path, 'ds')).create(force=True)
48 ds.add('.')
49 committed_date = ds.repo.get_committed_date()
50 with chpwd(path):
51 _mod, tarball1 = ds.export('tarball')
52 assert(not isabs(tarball1))
53 tarball1 = opj(path, tarball1)
54 default_outname = opj(path, 'datalad_{}.tar.gz'.format(ds.id))
55 assert_equal(tarball1, default_outname)
56 assert_true(os.path.exists(default_outname))
57 custom_outname = opj(path, 'myexport.tar.gz')
58 # feed in without extension
59 ds.export('tarball', output=custom_outname[:-7])
60 assert_true(os.path.exists(custom_outname))
61 custom1_md5 = md5sum(custom_outname)
62 # encodes the original tarball filename -> different checksum, despit
63 # same content
64 assert_not_equal(md5sum(default_outname), custom1_md5)
65 # should really sleep so if they stop using time.time - we know
66 time.sleep(1.1)
67 ds.export('tarball', output=custom_outname)
68 # should not encode mtime, so should be identical
69 assert_equal(md5sum(custom_outname), custom1_md5)
70
71 def check_contents(outname, prefix):
72 with tarfile.open(outname) as tf:
73 nfiles = 0
74 for ti in tf:
75 # any annex links resolved
76 assert_false(ti.issym())
77 ok_startswith(ti.name, prefix + '/')
78 assert_equal(ti.mtime, committed_date)
79 if '.datalad' not in ti.name:
80 # ignore any files in .datalad for this test to not be
81 # susceptible to changes in how much we generate a meta info
82 nfiles += 1
83 # we have exactly three files, and expect no content for any directory
84 assert_equal(nfiles, 3)
85 check_contents(default_outname, 'datalad_%s' % ds.id)
86 check_contents(custom_outname, 'myexport')
4040 'create-sibling-github'),
4141 ('datalad.interface.unlock', 'Unlock', 'unlock'),
4242 ('datalad.interface.save', 'Save', 'save'),
43 ('datalad.export', 'Export', 'export'),
43 ('datalad.plugin', 'Plugin', 'plugin'),
4444 ])
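
Swapping the interface registry entry from `Export` to `Plugin` makes `plugin` the generated top-level API function, while `export` disappears. A short sketch, assuming the current working directory is a dataset:

    from datalad.api import plugin

    plugin()                     # no arguments: print the available plugins
    plugin(['export_tarball'])   # run one of them against the dataset detected from cwd
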
4545
4646 _group_metadata = (
2727 from os.path import normpath
2828
2929 from .base import Interface
30 from datalad.interface.utils import build_doc
30 from datalad.interface.base import build_doc
3131 from .common_opts import allow_dirty
3232 from ..consts import ARCHIVES_SPECIAL_REMOTE
3333 from ..support.param import Parameter
2424
2525 from datalad.interface.base import Interface
2626 from datalad.interface.utils import eval_results
27 from datalad.interface.utils import build_doc
27 from datalad.interface.base import build_doc
2828 from datalad.interface.results import get_status_dict
2929 from datalad.support.constraints import EnsureStr
3030 from datalad.support.constraints import EnsureBool
2323 from ..ui import ui
2424 from ..dochelpers import exc_str
2525
26 from datalad.interface.common_opts import eval_params
27 from datalad.interface.common_opts import eval_defaults
2628 from datalad.support.exceptions import InsufficientArgumentsError
2729 from datalad.utils import with_pathsep as _with_sep
2830 from datalad.support.constraints import EnsureKeyChoice
2931 from datalad.distribution.dataset import Dataset
3032 from datalad.distribution.dataset import resolve_path
33
34
35 default_logchannels = {
36 '': 'debug',
37 'ok': 'debug',
38 'notneeded': 'debug',
39 'impossible': 'warning',
40 'error': 'error',
41 }
3142
3243
3344 def get_api_name(intfspec):
241252 # assign the amended docs
242253 func.__doc__ = doc
243254 return func
255
256
257 def build_doc(cls, **kwargs):
258 """Decorator to build docstrings for datalad commands
259
260 It's intended to decorate the class, the __call__-method of which is the
261 actual command. It expects that __call__-method to be decorated by
262 eval_results.
263
264 Parameters
265 ----------
266 cls: Interface
267 class defining a datalad command
268 """
269
270 # Note, that this is a class decorator, which is executed only once when the
271 # class is imported. It builds the docstring for the class' __call__ method
272 # and returns the original class.
273 #
274 # This is because a decorator for the actual function would not be able to
275 # behave like this. To build the docstring we need to access the attribute
276 # _params of the class. From within a function decorator we cannot do this
277 # during import time, since the class is being built in this very moment and
278 # is not yet available in the module. And if we do it from within the part
279 # of a function decorator, that is executed when the function is called, we
280 # would need to actually call the command once in order to build this
281 # docstring.
282
283 lgr.debug("Building doc for {}".format(cls))
284
285 cls_doc = cls.__doc__
286 if hasattr(cls, '_docs_'):
287 # expand docs
288 cls_doc = cls_doc.format(**cls._docs_)
289
290 call_doc = None
291 # suffix for update_docstring_with_parameters:
292 if cls.__call__.__doc__:
293 call_doc = cls.__call__.__doc__
294
295 # build standard doc and insert eval_doc
296 spec = getattr(cls, '_params_', dict())
297 # get docs for eval_results parameters:
298 spec.update(eval_params)
299
300 update_docstring_with_parameters(
301 cls.__call__, spec,
302 prefix=alter_interface_docs_for_api(cls_doc),
303 suffix=alter_interface_docs_for_api(call_doc),
304 add_args=eval_defaults if not hasattr(cls, '_no_eval_results') else None
305 )
306
307 # return original
308 return cls
244309
245310
246311 class Interface(object):
323388 'AddArchiveContent', 'AggregateMetaData',
324389 'CrawlInit', 'Crawl', 'CreateSiblingGithub',
325390 'CreateTestDataset', 'DownloadURL', 'Export', 'Ls', 'Move',
326 'Publish', 'SSHRun', 'Search'):
391 'Publish', 'SSHRun', 'Search', 'Test'):
327392 # set all common args explicitly to override class defaults
328393 # that are tailored towards the Python API
329394 kwargs['return_type'] = 'generator'
482547 return content_by_ds, unavailable_paths
483548
484549
485 def merge_allargs2kwargs(call, args, kwargs):
486 """Generate a kwargs dict from a call signature and *args, **kwargs"""
550 def get_allargs_as_kwargs(call, args, kwargs):
551 """Generate a kwargs dict from a call signature and *args, **kwargs
552
553 Basically resolving the argnames for all positional arguments, and
554 resolving the defaults for all kwargs that are not given in a kwargs
555 dict
556 """
487557 from inspect import getargspec
488558 argspec = getargspec(call)
489559 defaults = argspec.defaults
498568 kwargs_[k] = v
499569 # update with provided kwarg args
500570 kwargs_.update(kwargs)
501 assert (nargs == len(kwargs_))
571 # XXX we cannot assert the following, because our own highlevel
572 # API commands support more kwargs than what is discoverable
573 # from their signature...
574 #assert (nargs == len(kwargs_))
502575 return kwargs_
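
The renamed helper folds positional arguments and signature defaults into a single kwargs dict, so that result filters and the new pre/post plugins can see every effective argument of a call. A self-contained illustration of the idea (not the datalad function itself):

    from inspect import getargspec

    def as_kwargs(call, args, kwargs):
        spec = getargspec(call)
        defaults = spec.defaults or ()
        # defaults align with the trailing argument names
        merged = dict(zip(spec.args[len(spec.args) - len(defaults):], defaults))
        # positional values and explicit keyword arguments override them
        merged.update(zip(spec.args, args))
        merged.update(kwargs)
        return merged

    def demo(path, recursive=False, jobs=None):
        pass

    as_kwargs(demo, ('.',), {'jobs': 2})
    # -> {'path': '.', 'recursive': False, 'jobs': 2}  (key order may vary)
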
2626 from datalad.interface.common_opts import recursion_limit
2727 from datalad.interface.results import get_status_dict
2828 from datalad.interface.utils import eval_results
29 from datalad.interface.utils import build_doc
29 from datalad.interface.base import build_doc
3030
3131 from logging import getLogger
3232 lgr = getLogger('datalad.api.clean')
1212 __docformat__ = 'restructuredtext'
1313
1414 from appdirs import AppDirs
15 from os.path import join as opj
1516 from datalad.support.constraints import EnsureBool
1617 from datalad.support.constraints import EnsureInt
1718
6768 'destination': 'global',
6869 'default': dirs.user_cache_dir,
6970 },
71 'datalad.locations.system-plugins': {
72 'ui': ('question', {
73 'title': 'System plugin directory',
74 'text': 'Where should datalad search for system plugins?'}),
75 'destination': 'global',
76 'default': opj(dirs.site_config_dir, 'plugins'),
77 },
78 'datalad.locations.user-plugins': {
79 'ui': ('question', {
80 'title': 'User plugin directory',
81 'text': 'Where should datalad search for user plugins?'}),
82 'destination': 'global',
83 'default': opj(dirs.user_config_dir, 'plugins'),
84 },
7085 'datalad.exc.str.tblimit': {
7186 'ui': ('question', {
7287 'title': 'This flag is used by the datalad extract_tb function which extracts and formats stack-traces. It caps the number of lines to DATALAD_EXC_STR_TBLIMIT of pre-processed entries from traceback.'}),
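
The two new settings define where `datalad plugin` looks for system-wide and per-user plugins, in addition to the ones shipped with DataLad. Querying them goes through the same `cfg.obtain()` call the plugin loader uses (a sketch; the printed values depend on the platform's appdirs defaults):

    from datalad import cfg

    print(cfg.obtain('datalad.locations.system-plugins'))  # e.g. <site config dir>/plugins
    print(cfg.obtain('datalad.locations.user-plugins'))    # e.g. <user config dir>/plugins
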
1111
1212 __docformat__ = 'restructuredtext'
1313
14 from datalad.interface.results import known_result_xfms
1415 from datalad.support.param import Parameter
1516 from datalad.support.constraints import EnsureInt, EnsureNone, EnsureStr
1617 from datalad.support.constraints import EnsureChoice
18 from datalad.support.constraints import EnsureCallable
1719
1820
1921 location_description = Parameter(
214216 By default it would fail the run ('fail' setting). With 'inherit' a
215217 'create-sibling' with '--inherit-settings' will be used to create sibling
216218 on the remote. With 'skip' - it simply will be skipped.""")
219
220 with_plugin_opt = Parameter(
221 args=('--with-plugin',),
222 nargs='*',
223 action='append',
224 metavar='PLUGINSPEC',
225 doc="""DataLad plugin to run in addition. PLUGINSPEC is a list
226 comprised of a plugin name plus optional `key=value` pairs with arguments
227 for the plugin call (see `plugin` command documentation for details).
228 [PY: PLUGINSPECs must be wrapped in a list where each item configures
229 one plugin call. Plugins are called in the order defined by this list.
230 PY][CMD: This option can be given more than once to run multiple plugins
231 in the order in which they are given. CMD]""")
232
233 # define parameters to be used by eval_results to tune behavior
234 # Note: This is done outside eval_results in order to be available when building
235 # docstrings for the decorated functions
236 # TODO: May be we want to move them to be part of the classes _params. Depends
237 # on when and how eval_results actually has to determine the class.
238 # Alternatively build a callable class with these to even have a fake signature
239 # that matches the parameters, so they can be evaluated and defined the exact
240 # same way.
241
242 eval_params = dict(
243 return_type=Parameter(
244 doc="""return value behavior switch. If 'item-or-list' a single
245 value is returned instead of a one-item return value list, or a
246 list in case of multiple return values. `None` is returned in case
247 of an empty list.""",
248 constraints=EnsureChoice('generator', 'list', 'item-or-list')),
249 result_filter=Parameter(
250 doc="""if given, each to-be-returned
251 status dictionary is passed to this callable, and is only
252 returned if the callable's return value evaluates to True;
253 a raised ValueError likewise excludes the result. If the given
254 callable supports `**kwargs` it will additionally be passed the
255 keyword arguments of the original API call.""",
256 constraints=EnsureCallable() | EnsureNone()),
257 result_xfm=Parameter(
258 doc="""if given, each to-be-returned result
259 status dictionary is passed to this callable, and its return value
260 becomes the result instead. This is different from
261 `result_filter`, as it can perform arbitrary transformation of the
262 result value. This is mostly useful for top-level command invocations
263 that need to provide the results in a particular format. Instead of
264 a callable, a label for a pre-crafted result transformation can be
265 given.""",
266 constraints=EnsureChoice(*list(known_result_xfms.keys())) | EnsureCallable() | EnsureNone()),
267 result_renderer=Parameter(
268 doc="""format of return value rendering on stdout""",
269 constraints=EnsureChoice('default', 'json', 'json_pp', 'tailored') | EnsureNone()),
270 on_failure=Parameter(
271 doc="""behavior to perform on failure: 'ignore' any failure is reported,
272 but does not cause an exception; 'continue' if any failure occurs an
273 exception will be raised at the end, but processing other actions will
274 continue for as long as possible; 'stop': processing will stop on first
275 failure and an exception is raised. A failure is any result with status
276 'impossible' or 'error'. Raised exception is an IncompleteResultsError
277 that carries the result dictionaries of the failures in its `failed`
278 attribute.""",
279 constraints=EnsureChoice('ignore', 'continue', 'stop')),
280 run_before=Parameter(
281 doc="""DataLad plugin to run before the command. PLUGINSPEC is a list
282 comprised of a plugin name plus optional 2-tuples of key-value pairs
283 with arguments for the plugin call (see `plugin` command documentation
284 for details).
285 PLUGINSPECs must be wrapped in a list where each item configures
286 one plugin call. Plugins are called in the order defined by this list.
287 For running plugins that require a `dataset` argument it is important
288 to provide the respective dataset as the `dataset` argument of the main
289 command, if it is not in the list of plugin arguments."""),
290 run_after=Parameter(
291 doc="""Like `run_before`, but plugins are executed after the main command
292 has finished."""),
293 )
294
295 eval_defaults = dict(
296 return_type='list',
297 result_filter=None,
298 result_renderer=None,
299 result_xfm=None,
300 on_failure='continue',
301 run_before=None,
302 run_after=None,
303 )
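
All of these options are accepted as keyword arguments by any command whose `__call__` is wrapped with `eval_results`, and `run_before`/`run_after` take plugin specs in the same nested-list form the `plugin` command uses. A hedged sketch using `uninstall`, which imports `eval_results` above; the path is a placeholder and the lambda stands in for any result transform:

    from datalad.api import uninstall

    for path in uninstall(
            path='somesubdataset1',                # placeholder path
            on_failure='ignore',                   # report failures, do not raise
            return_type='generator',               # iterate results as they come
            result_xfm=lambda res: res['path']):   # reduce each result dict to its path
        print(path)
    # run_after=[['add_readme', ('existing', 'replace')]] could be attached the same way
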
1212
1313 from os.path import exists
1414 from .base import Interface
15 from datalad.interface.utils import build_doc
15 from datalad.interface.base import build_doc
1616
1717 from datalad.support.param import Parameter
1818 from datalad.support.constraints import EnsureStr, EnsureNone
1111
1212 from os.path import curdir
1313 from .base import Interface
14 from datalad.interface.utils import build_doc
14 from datalad.interface.base import build_doc
1515 from collections import OrderedDict
1616 from datalad.distribution.dataset import Dataset
1717
2121 from datalad.interface.annotate_paths import annotated2content_by_ds
2222 from datalad.interface.base import Interface
2323 from datalad.interface.utils import eval_results
24 from datalad.interface.utils import build_doc
24 from datalad.interface.base import build_doc
2525 from datalad.support.constraints import EnsureNone
2626 from datalad.support.constraints import EnsureStr
2727 from datalad.support.constraints import EnsureChoice
1818 from os.path import isdir, curdir
1919
2020 from .base import Interface
21 from datalad.interface.utils import build_doc
21 from datalad.interface.base import build_doc
2222 from ..ui import ui
2323 from ..utils import assure_list_from_str
2424 from ..dochelpers import exc_str
2525 from ..cmdline.helpers import get_repo_instance
2626 from ..utils import auto_repr
2727 from .base import Interface
28 from datalad.interface.utils import build_doc
28 from datalad.interface.base import build_doc
2929 from ..ui import ui
3030 from ..utils import swallow_logs
3131 from ..consts import METADATA_DIR
5353
5454 ATM only s3:// URLs and datasets are supported
5555
56 Examples
57 --------
56 Examples:
5857
5958 $ datalad ls s3://openfmri/tarballs/ds202 # to list S3 bucket
6059 $ datalad ls # to list current dataset
3333 from datalad.interface.common_opts import save_message_opt
3434 from datalad.interface.results import get_status_dict
3535 from datalad.interface.utils import eval_results
36 from datalad.interface.utils import build_doc
36 from datalad.interface.base import build_doc
3737 from datalad.interface.utils import get_tree_roots
3838 from datalad.interface.utils import discover_dataset_trace_to_targets
3939
1313
1414 import datalad
1515 from .base import Interface
16 from datalad.interface.utils import build_doc
16 from datalad.interface.base import build_doc
1717
1818
1919 @build_doc
3434
3535 from ..base import Interface
3636 from ..utils import eval_results
37 from ..utils import build_doc
37 from datalad.interface.base import build_doc
3838 from ..utils import handle_dirty_dataset
3939 from ..utils import get_paths_by_dataset
4040 from ..utils import filter_unmodified
2525 from datalad.interface.annotate_paths import annotated2content_by_ds
2626 from datalad.interface.results import get_status_dict
2727 from datalad.interface.utils import eval_results
28 from datalad.interface.utils import build_doc
28 from datalad.interface.base import build_doc
2929 from datalad.interface.common_opts import recursion_flag
3030 from datalad.interface.common_opts import recursion_limit
3131
1515 import logging
1616 import wrapt
1717 import sys
18 import re
19 import shlex
1820 from os import curdir
1921 from os import pardir
2022 from os import listdir
21 from os import linesep
2223 from os.path import join as opj
2324 from os.path import lexists
2425 from os.path import isdir
4546 from datalad import cfg as dlcfg
4647 from datalad.dochelpers import exc_str
4748
49
4850 from datalad.support.constraints import Constraint
49 from datalad.support.constraints import EnsureChoice
50 from datalad.support.constraints import EnsureNone
51 from datalad.support.constraints import EnsureCallable
52 from datalad.support.param import Parameter
5351
5452 from datalad.ui import ui
55
56 from .base import Interface
57 from .base import update_docstring_with_parameters
58 from .base import alter_interface_docs_for_api
59 from .base import merge_allargs2kwargs
53 import datalad.support.ansi_colors as ac
54
55 from datalad.interface.base import Interface
56 from datalad.interface.base import default_logchannels
57 from datalad.interface.base import get_allargs_as_kwargs
58 from datalad.interface.common_opts import eval_params
59 from datalad.interface.common_opts import eval_defaults
6060 from .results import known_result_xfms
6161
6262
6363 lgr = logging.getLogger('datalad.interface.utils')
64
65
66 def cls2cmdlinename(cls):
67 "Return the cmdline command name from an Interface class"
68 r = re.compile(r'([a-z0-9])([A-Z])')
69 return r.sub('\\1-\\2', cls.__name__).lower()
6470
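
`cls2cmdlinename` derives the command-line name for an interface class by breaking CamelCase at lowercase/digit-to-uppercase boundaries; the same substitution, restated on plain strings for illustration:

    import re

    def name_to_cmdline(name):
        # identical regex and replacement to cls2cmdlinename, minus the class handling
        return re.sub(r'([a-z0-9])([A-Z])', r'\1-\2', name).lower()

    assert name_to_cmdline('CreateSiblingGithub') == 'create-sibling-github'
    assert name_to_cmdline('Publish') == 'publish'
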
6571
6672 def handle_dirty_dataset(ds, mode, msg=None):
507513 return keep
508514
509515
510 # define parameters to be used by eval_results to tune behavior
511 # Note: This is done outside eval_results in order to be available when building
512 # docstrings for the decorated functions
513 # TODO: May be we want to move them to be part of the classes _params. Depends
514 # on when and how eval_results actually has to determine the class.
515 # Alternatively build a callable class with these to even have a fake signature
516 # that matches the parameters, so they can be evaluated and defined the exact
517 # same way.
518
519 eval_params = dict(
520 return_type=Parameter(
521 doc="""return value behavior switch. If 'item-or-list' a single
522 value is returned instead of a one-item return value list, or a
523 list in case of multiple return values. `None` is return in case
524 of an empty list.""",
525 constraints=EnsureChoice('generator', 'list', 'item-or-list')),
526 result_filter=Parameter(
527 doc="""if given, each to-be-returned
528 status dictionary is passed to this callable, and is only
529 returned if the callable's return value does not
530 evaluate to False or a ValueError exception is raised. If the given
531 callable supports `**kwargs` it will additionally be passed the
532 keyword arguments of the original API call.""",
533 constraints=EnsureCallable() | EnsureNone()),
534 result_xfm=Parameter(
535 doc="""if given, each to-be-returned result
536 status dictionary is passed to this callable, and its return value
537 becomes the result instead. This is different from
538 `result_filter`, as it can perform arbitrary transformation of the
539 result value. This is mostly useful for top-level command invocations
540 that need to provide the results in a particular format. Instead of
541 a callable, a label for a pre-crafted result transformation can be
542 given.""",
543 constraints=EnsureChoice(*list(known_result_xfms.keys())) | EnsureCallable() | EnsureNone()),
544 result_renderer=Parameter(
545 doc="""format of return value rendering on stdout""",
546 constraints=EnsureChoice('default', 'json', 'json_pp', 'tailored') | EnsureNone()),
547 on_failure=Parameter(
548 doc="""behavior to perform on failure: 'ignore' any failure is reported,
549 but does not cause an exception; 'continue' if any failure occurs an
550 exception will be raised at the end, but processing other actions will
551 continue for as long as possible; 'stop': processing will stop on first
552 failure and an exception is raised. A failure is any result with status
553 'impossible' or 'error'. Raised exception is an IncompleteResultsError
554 that carries the result dictionaries of the failures in its `failed`
555 attribute.""",
556 constraints=EnsureChoice('ignore', 'continue', 'stop')),
557 )
558 eval_defaults = dict(
559 return_type='list',
560 result_filter=None,
561 result_renderer=None,
562 result_xfm=None,
563 on_failure='continue',
564 )
565
566
567516 def eval_results(func):
568517 """Decorator for return value evaluation of datalad commands.
569518
605554 i.e. a datalad command definition
606555 """
607556
608 default_logchannels = {
609 '': 'debug',
610 'ok': 'debug',
611 'notneeded': 'debug',
612 'impossible': 'warning',
613 'error': 'error',
614 }
615
616557 @wrapt.decorator
617558 def eval_func(wrapped, instance, args, kwargs):
618
559 # for result filters and pre/post plugins
560 # we need to produce a dict with argname/argvalue pairs for all args
561 # incl. defaults and args given as positionals
562 allkwargs = get_allargs_as_kwargs(wrapped, args, kwargs)
619563 # determine class, the __call__ method of which we are decorating:
620564 # Ben: Note, that this is a bit dirty in PY2 and imposes restrictions on
621565 # when and how to use eval_results as well as on how to name a command's
644588 _func_class = mod.__dict__[command_class_name]
645589 lgr.debug("Determined class of decorated function: %s", _func_class)
646590
591 # retrieve common options from kwargs, and fall back on the command
592 # class attributes, or general defaults if needed
647593 common_params = {
648594 p_name: kwargs.pop(
649595 p_name,
650596 getattr(_func_class, p_name, eval_defaults[p_name]))
651597 for p_name in eval_params}
598 # short cuts and configured setup for common options
599 on_failure = common_params['on_failure']
600 return_type = common_params['return_type']
601 # resolve string labels for transformers too
602 result_xfm = common_params['result_xfm']
603 if result_xfm in known_result_xfms:
604 result_xfm = known_result_xfms[result_xfm]
652605 result_renderer = common_params['result_renderer']
653
606 # TODO remove this conditional branch entirely, done outside
607 if not result_renderer:
608 result_renderer = dlcfg.get('datalad.api.result-renderer', None)
609 # wrap the filter into a helper to be able to pass additional arguments
610 # if the filter supports it, but at the same time keep the required interface
611 # as minimal as possible. Also do this here, in order to avoid performing
612 # this test for each return value
613 result_filter = common_params['result_filter']
614 _result_filter = result_filter
615 if result_filter:
616 if isinstance(result_filter, Constraint):
617 _result_filter = result_filter.__call__
618 if (PY2 and inspect.getargspec(_result_filter).keywords) or \
619 (not PY2 and inspect.getfullargspec(_result_filter).varkw):
620
621 def _result_filter(res):
622 return result_filter(res, **allkwargs)
623
624 def _get_plugin_specs(param_key=None, cfg_key=None):
625 spec = common_params.get(param_key, None)
626 if spec is not None:
627 # this is already a list of lists
628 return spec
629
630 spec = dlcfg.get(cfg_key, None)
631 if spec is None:
632 return
633 elif not isinstance(spec, tuple):
634 spec = [spec]
635 return [shlex.split(s) for s in spec]
636
637 # query cfg for defaults
638 cmdline_name = cls2cmdlinename(_func_class)
639 run_before = _get_plugin_specs(
640 'run_before',
641 'datalad.{}.run-before'.format(cmdline_name))
642 run_after = _get_plugin_specs(
643 'run_after',
644 'datalad.{}.run-after'.format(cmdline_name))
645
646 # this internal helper function actually drives the command
647 # generator-style, it may generate an exception if desired,
648 # on incomplete results
654649 def generator_func(*_args, **_kwargs):
655 # obtain results
656 results = wrapped(*_args, **_kwargs)
650 from datalad.plugin import Plugin
651
657652 # flag whether to raise an exception
658 # TODO actually compose a meaningful exception
659653 incomplete_results = []
660 # inspect and render
661 result_filter = common_params['result_filter']
662 # wrap the filter into a helper to be able to pass additional arguments
663 # if the filter supports it, but at the same time keep the required interface
664 # as minimal as possible. Also do this here, in order to avoid this test
665 # to be performed for each return value
666 _result_filter = result_filter
667 if result_filter:
668 if isinstance(result_filter, Constraint):
669 _result_filter = result_filter.__call__
670 if (PY2 and inspect.getargspec(_result_filter).keywords) or \
671 (not PY2 and inspect.getfullargspec(_result_filter).varkw):
672 # we need to produce a dict with argname/argvalue pairs for all args
673 # incl. defaults and args given as positionals
674 fullkwargs_ = merge_allargs2kwargs(wrapped, _args, _kwargs)
675
676 def _result_filter(res):
677 return result_filter(res, **fullkwargs_)
678 result_renderer = common_params['result_renderer']
679 result_xfm = common_params['result_xfm']
680 if result_xfm in known_result_xfms:
681 result_xfm = known_result_xfms[result_xfm]
682 on_failure = common_params['on_failure']
683 if not result_renderer:
684 result_renderer = dlcfg.get('datalad.api.result-renderer', None)
685654 # track what actions were performed how many times
686655 action_summary = {}
687 for res in results:
688 actsum = action_summary.get(res['action'], {})
689 if res['status']:
690 actsum[res['status']] = actsum.get(res['status'], 0) + 1
691 action_summary[res['action']] = actsum
692 ## log message, if a logger was given
693 # remove logger instance from results, as it is no longer useful
694 # after logging was done, it isn't serializable, and generally
695 # pollutes the output
696 res_lgr = res.pop('logger', None)
697 if isinstance(res_lgr, logging.Logger):
698 # didn't get a particular log function, go with default
699 res_lgr = getattr(res_lgr, default_logchannels[res['status']])
700 if res_lgr and 'message' in res:
701 msg = res['message']
702 msgargs = None
703 if isinstance(msg, tuple):
704 msgargs = msg[1:]
705 msg = msg[0]
706 if 'path' in res:
707 msg = '{} [{}({})]'.format(
708 msg, res['action'], res['path'])
709 if msgargs:
710 # support string expansion of logging to avoid runtime cost
711 res_lgr(msg, *msgargs)
712 else:
713 res_lgr(msg)
714 ## error handling
715 # looks for error status, and report at the end via
716 # an exception
717 if on_failure in ('continue', 'stop') \
718 and res['status'] in ('impossible', 'error'):
719 incomplete_results.append(res)
720 if on_failure == 'stop':
721 # first fail -> that's it
722 # raise will happen after the loop
723 break
724 if _result_filter:
725 try:
726 if not _result_filter(res):
727 raise ValueError('excluded by filter')
728 except ValueError as e:
729 lgr.debug('not reporting result (%s)', exc_str(e))
730 continue
731 ## output rendering
732 if result_renderer == 'default':
733 # TODO have a helper that can expand a result message
734 ui.message('{action}({status}): {path}{type}{msg}'.format(
735 action=res['action'],
736 status=res['status'],
737 path=relpath(res['path'],
738 res['refds']) if res.get('refds', None) else res['path'],
739 type=' ({})'.format(res['type']) if 'type' in res else '',
740 msg=' [{}]'.format(
741 res['message'][0] % res['message'][1:]
742 if isinstance(res['message'], tuple) else res['message'])
743 if 'message' in res else ''))
744 elif result_renderer in ('json', 'json_pp'):
745 ui.message(json.dumps(
746 {k: v for k, v in res.items()
747 if k not in ('message', 'logger')},
748 sort_keys=True,
749 indent=2 if result_renderer.endswith('_pp') else None))
750 elif result_renderer == 'tailored':
751 if hasattr(_func_class, 'custom_result_renderer'):
752 _func_class.custom_result_renderer(res, **_kwargs)
753 elif hasattr(result_renderer, '__call__'):
754 result_renderer(res, **_kwargs)
755 if result_xfm:
756 res = result_xfm(res)
757 if res is None:
758 continue
759 yield res
760
656
657 for pluginspec in run_before or []:
658 lgr.debug('Running pre-proc plugin %s', pluginspec)
659 for r in _process_results(
660 Plugin.__call__(
661 pluginspec,
662 dataset=allkwargs.get('dataset', None),
663 return_type='generator'),
664 _func_class, action_summary,
665 on_failure, incomplete_results,
666 result_renderer, result_xfm, result_filter,
667 **_kwargs):
668 yield r
669
670 # process main results
671 for r in _process_results(
672 wrapped(*_args, **_kwargs),
673 _func_class, action_summary,
674 on_failure, incomplete_results,
675 result_renderer, result_xfm, _result_filter, **_kwargs):
676 yield r
677
678 for pluginspec in run_after or []:
679 lgr.debug('Running post-proc plugin %s', pluginspec)
680 for r in _process_results(
681 Plugin.__call__(
682 pluginspec,
683 dataset=allkwargs.get('dataset', None),
684 return_type='generator'),
685 _func_class, action_summary,
686 on_failure, incomplete_results,
687 result_renderer, result_xfm, result_filter,
688 **_kwargs):
689 yield r
690
691 # result summary before a potential exception
761692 if result_renderer == 'default' and action_summary and \
762693 sum(sum(s.values()) for s in action_summary.values()) > 1:
763694 # give a summary in default mode, when there was more than one
770701 for act in sorted(action_summary))))
771702
772703 if incomplete_results:
773 # stupid catch all message <- tailor TODO
774704 raise IncompleteResultsError(
775705 failed=incomplete_results,
776706 msg="Command did not complete successfully")
777707
778 if common_params['return_type'] == 'generator':
708 if return_type == 'generator':
709 # hand over the generator
779710 return generator_func(*args, **kwargs)
780711 else:
781712 @wrapt.decorator
782713 def return_func(wrapped_, instance_, args_, kwargs_):
783714 results = wrapped_(*args_, **kwargs_)
784715 if inspect.isgenerator(results):
716 # unwind generator if there is one, this actually runs
717 # any processing
785718 results = list(results)
786719 # render summaries
787 if not common_params['result_xfm'] and result_renderer == 'tailored':
720 if not result_xfm and result_renderer == 'tailored':
788721 # cannot render transformed results
789722 if hasattr(_func_class, 'custom_result_summary_renderer'):
790723 _func_class.custom_result_summary_renderer(results)
791 if common_params['return_type'] == 'item-or-list' and \
724 if return_type == 'item-or-list' and \
792725 len(results) < 2:
793726 return results[0] if results else None
794727 else:
799732 return eval_func(func)
800733
801734
802 def build_doc(cls, **kwargs):
803 """Decorator to build docstrings for datalad commands
804
805 It's intended to decorate the class, the __call__-method of which is the
806 actual command. It expects that __call__-method to be decorated by
807 eval_results.
808
809 Parameters
810 ----------
811 cls: Interface
812 class defining a datalad command
813 """
814
815 # Note, that this is a class decorator, which is executed only once when the
816 # class is imported. It builds the docstring for the class' __call__ method
817 # and returns the original class.
818 #
819 # This is because a decorator for the actual function would not be able to
820 # behave like this. To build the docstring we need to access the attribute
821 # _params of the class. From within a function decorator we cannot do this
822 # during import time, since the class is being built in this very moment and
823 # is not yet available in the module. And if we do it from within the part
824 # of a function decorator, that is executed when the function is called, we
825 # would need to actually call the command once in order to build this
826 # docstring.
827
828 lgr.debug("Building doc for {}".format(cls))
829
830 cls_doc = cls.__doc__
831 if hasattr(cls, '_docs_'):
832 # expand docs
833 cls_doc = cls_doc.format(**cls._docs_)
834
835 call_doc = None
836 # suffix for update_docstring_with_parameters:
837 if cls.__call__.__doc__:
838 call_doc = cls.__call__.__doc__
839
840 # build standard doc and insert eval_doc
841 spec = getattr(cls, '_params_', dict())
842 # get docs for eval_results parameters:
843 spec.update(eval_params)
844
845 update_docstring_with_parameters(
846 cls.__call__, spec,
847 prefix=alter_interface_docs_for_api(cls_doc),
848 suffix=alter_interface_docs_for_api(call_doc),
849 add_args=eval_defaults if not hasattr(cls, '_no_eval_results') else None
850 )
851
852 # return original
853 return cls
735 def _process_results(
736 results, cmd_class,
737 action_summary, on_failure, incomplete_results,
738 result_renderer, result_xfm, result_filter, **kwargs):
739 # private helper pf @eval_results
740 # loop over results generated from some source and handle each
741 # of them according to the requested behavior (logging, rendering, ...)
742 for res in results:
743 actsum = action_summary.get(res['action'], {})
744 if res['status']:
745 actsum[res['status']] = actsum.get(res['status'], 0) + 1
746 action_summary[res['action']] = actsum
747 ## log message, if a logger was given
748 # remove logger instance from results, as it is no longer useful
749 # after logging was done, it isn't serializable, and generally
750 # pollutes the output
751 res_lgr = res.pop('logger', None)
752 if isinstance(res_lgr, logging.Logger):
753 # didn't get a particular log function, go with default
754 res_lgr = getattr(res_lgr, default_logchannels[res['status']])
755 if res_lgr and 'message' in res:
756 msg = res['message']
757 msgargs = None
758 if isinstance(msg, tuple):
759 msgargs = msg[1:]
760 msg = msg[0]
761 if 'path' in res:
762 msg = '{} [{}({})]'.format(
763 msg, res['action'], res['path'])
764 if msgargs:
765 # support string expansion of logging to avoid runtime cost
766 res_lgr(msg, *msgargs)
767 else:
768 res_lgr(msg)
769 ## error handling
770 # looks for error status, and report at the end via
771 # an exception
772 if on_failure in ('continue', 'stop') \
773 and res['status'] in ('impossible', 'error'):
774 incomplete_results.append(res)
775 if on_failure == 'stop':
776 # first fail -> that's it
777 # raise will happen after the loop
778 break
779 if result_filter:
780 try:
781 if not result_filter(res):
782 raise ValueError('excluded by filter')
783 except ValueError as e:
784 lgr.debug('not reporting result (%s)', exc_str(e))
785 continue
786 ## output rendering
787 # TODO RF this in a simple callable that gets passed into this function
788 if result_renderer == 'default':
789 # TODO have a helper that can expand a result message
790 ui.message('{action}({status}): {path}{type}{msg}'.format(
791 action=ac.color_word(res['action'], ac.BOLD),
792 status=ac.color_status(res['status']),
793 path=relpath(res['path'],
794 res['refds']) if res.get('refds', None) else res['path'],
795 type=' ({})'.format(
796 ac.color_word(res['type'], ac.MAGENTA)
797 ) if 'type' in res else '',
798 msg=' [{}]'.format(
799 res['message'][0] % res['message'][1:]
800 if isinstance(res['message'], tuple) else res['message'])
801 if 'message' in res else ''))
802 elif result_renderer in ('json', 'json_pp'):
803 ui.message(json.dumps(
804 {k: v for k, v in res.items()
805 if k not in ('message', 'logger')},
806 sort_keys=True,
807 indent=2 if result_renderer.endswith('_pp') else None))
808 elif result_renderer == 'tailored':
809 if hasattr(cmd_class, 'custom_result_renderer'):
810 cmd_class.custom_result_renderer(res, **kwargs)
811 elif hasattr(result_renderer, '__call__'):
812 result_renderer(res, **kwargs)
813 if result_xfm:
814 res = result_xfm(res)
815 if res is None:
816 continue
817 yield res
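
`_process_results` pulls the per-result handling (action counting, logging, failure collection, filtering, transformation, rendering) out of `generator_func`, so the exact same loop can serve the `run_before`/`run_after` plugin calls and the main command. The dictionaries it consumes are the ones `get_status_dict` produces; a minimal hedged sketch of one result and of a callable a `result_filter` might be (the field values are illustrative):

    from datalad.interface.results import get_status_dict

    res = get_status_dict(
        action='publish',
        status='ok',              # '', 'ok', 'notneeded', 'impossible', or 'error'
        path='/tmp/ds',           # placeholder path
        type='dataset',
        message='pushed git-annex branch')

    # a result_filter lets a result through only if it returns something truthy for it
    def only_failures(result):
        return result['status'] in ('impossible', 'error')

    assert not only_failures(res)
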
1313 import os
1414 from os.path import join as opj, exists, relpath, dirname
1515 from datalad.interface.base import Interface
16 from datalad.interface.utils import build_doc
16 from datalad.interface.base import build_doc
1717 from datalad.interface.utils import handle_dirty_dataset
1818 from datalad.interface.common_opts import recursion_limit, recursion_flag
1919 from datalad.interface.common_opts import if_dirty_opt
4747 types are configured. Moreover, it is possible to aggregate meta data from
4848 any subdatasets into the superdataset, in order to facilitate data
4949 discovery without having to obtain any subdataset.
50
51 Returns
52 -------
53 List
54 Any datasets where (updated) aggregated meta data was saved.
5550 """
5651 # XXX prevent common args from being added to the docstring
5752 _no_eval_results = True
8479 recursion_limit=None,
8580 save=True,
8681 if_dirty='save-before'):
82 """
83 Returns
84 -------
85 List
86 Any datasets where (updated) aggregated meta data was saved.
87 """
8788 ds = require_dataset(
8889 dataset, check_installed=True, purpose='meta data aggregation')
8990 modified_ds = []
2525 from datalad.interface.save import Save
2626 from datalad.interface.results import get_status_dict
2727 from datalad.interface.utils import eval_results
28 from datalad.interface.utils import build_doc
28 from datalad.interface.base import build_doc
2929 from datalad.support.constraints import EnsureNone
3030 from datalad.support.constraints import EnsureStr
3131 from datalad.support.gitrepo import GitRepo
2222 from six import reraise
2323 from six import PY3
2424 from datalad.interface.base import Interface
25 from datalad.interface.utils import build_doc
25 from datalad.interface.base import build_doc
2626 from datalad.distribution.dataset import Dataset
2727 from datalad.distribution.dataset import datasetmethod, EnsureDataset, \
2828 require_dataset
4444 @build_doc
4545 class Search(Interface):
4646 """Search within available in datasets' meta data
47
48 Yields
49 ------
50 location : str
51 (relative) path to the dataset
52 report : dict
53 fields which were requested by `report` option
54
5547 """
5648 # XXX prevent common args from being added to the docstring
5749 _no_eval_results = True
122114 report_matched=False,
123115 format='custom',
124116 regex=False):
117 """
118 Yields
119 ------
120 location : str
121 (relative) path to the dataset
122 report : dict
123 fields which were requested by `report` option
124 """
125125
126126 lgr.debug("Initiating search for match=%r and dataset %r",
127127 match, dataset)
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """
9
10 """
11
12 __docformat__ = 'restructuredtext'
13
14 import logging
15 from glob import glob
16 import re
17 from os.path import join as opj, basename, dirname
18 from os import curdir
19 import inspect
20
21 from datalad import cfg
22 from datalad.support.param import Parameter
23 from datalad.support.constraints import EnsureNone
24 from datalad.distribution.dataset import EnsureDataset
25 from datalad.distribution.dataset import datasetmethod
26 from datalad.distribution.dataset import require_dataset
27 from datalad.dochelpers import exc_str
28
29 from datalad.interface.base import Interface
30 from datalad.interface.base import dedent_docstring
31 from datalad.interface.base import build_doc
32 from datalad.interface.utils import eval_results
33 from datalad.ui import ui
34
35 lgr = logging.getLogger('datalad.plugin')
36
37 argspec = re.compile(r'^([a-zA-Z][a-zA-Z0-9_]*)=(.*)$')
38
39
40 def _get_plugins():
41 locations = (
42 dirname(__file__),
43 cfg.obtain('datalad.locations.system-plugins'),
44 cfg.obtain('datalad.locations.user-plugins'))
45 return {basename(e)[:-3]: {'file': e}
46 for plugindir in locations
47 for e in glob(opj(plugindir, '[!_]*.py'))}
48
49
50 def _load_plugin(filepath):
51 locals = {}
52 globals = {}
53 try:
54 exec(compile(open(filepath, "rb").read(),
55 filepath, 'exec'),
56 globals,
57 locals)
58 except Exception as e:
59 # any exception means full stop
60 raise ValueError('plugin at {} is broken: {}'.format(
61 filepath, exc_str(e)))
62 if not len(locals) or 'dlplugin' not in locals:
63 raise ValueError(
64 "loading plugin '%s' did not yield a 'dlplugin' symbol, found: %s",
65 filepath, locals.keys() if len(locals) else None)
66 return locals['dlplugin']
67
68
69 @build_doc
70 class Plugin(Interface):
71 """Generic plugin interface
72
73 Using this command, arbitrary DataLad plugins can be executed. Plugins in
74 three different locations are available
75
76 1. official plugins that are part of the local DataLad installation
77
78 2. system-wide plugins, location configuration::
79
80 datalad.locations.system-plugins
81
82 3. user-supplied plugins, location configuration::
83
84 datalad.locations.user-plugins
85
86 Identically named plugins in a later location replace those from locations
87 searched earlier.
88
89 *Using plugins*
90
91 A list of all available plugins can be obtained by running this command
92 without arguments::
93
94 datalad plugin
95
96 To run a specific plugin, provide the plugin name as an argument::
97
98 datalad plugin export_tarball
99
100 A plugin may come with its own documentation which can be displayed upon
101 request::
102
103 datalad plugin export_tarball -H
104
105 If a plugin supports (optional) arguments, they can be passed to the plugin
106 as key=value pairs with the name and the respective value of an argument,
107 e.g.::
108
109 datalad plugin export_tarball output=myfile
110
111 Any number of arguments can be given. Only arguments with names supported
112 by the respective plugin are passed to the plugin. If unsupported arguments
113 are given, a warning is issued.
114
115 When an argument is given multiple times, all values are passed as a list
116 to the respective argument (order of values matches the order in the
117 plugin call)::
118
119 datalad plugin fancy_plugin input=this input=that
120
121 Like in most commands, a dedicated --dataset option is supported that
122 can be used to identify a specific dataset to be passed to a plugin's
123 ``dataset`` argument. If a plugin requires such an argument, and no
124 dataset was given, and none was found in the current working directory,
125 the plugin call will fail. A dataset argument can also be passed alongside
126 all other plugin arguments without using --dataset.
127
128 """
129 _params_ = dict(
130 dataset=Parameter(
131 args=("-d", "--dataset"),
132 doc="""specify the dataset for the plugin to operate on
133 If no dataset is given, but a plugin take a dataset as an argument,
134 an attempt is made to identify the dataset based on the current
135 working directory.""",
136 constraints=EnsureDataset() | EnsureNone()),
137 plugin=Parameter(
138 args=("plugin",),
139 nargs='*',
140 metavar='PLUGINSPEC',
141 doc="""plugin name plus an optional list of `key=value` pairs with
142 arguments for the plugin call"""),
143 showpluginhelp=Parameter(
144 args=('-H', '--show-plugin-help',),
145 dest='showpluginhelp',
146 action='store_true',
147 doc="""show help for a particular"""),
148 showplugininfo=Parameter(
149 args=('--show-plugin-info',),
150 dest='showplugininfo',
151 action='store_true',
152 doc="""show additional information in plugin overview (e.g. plugin file
153 location)"""),
154 )
155
156 @staticmethod
157 @datasetmethod(name='plugin')
158 @eval_results
159 def __call__(plugin=None, dataset=None, showpluginhelp=False, showplugininfo=False, **kwargs):
160 plugins = _get_plugins()
161 if not plugin:
162 max_name_len = max(len(k) for k in plugins.keys())
163 for plname, plinfo in sorted(plugins.items(), key=lambda x: x[0]):
164 spacer = ' ' * (max_name_len - len(plname))
165 synopsis = None
166 try:
167 with open(plinfo['file']) as plf:
168 for line in plf:
169 if line.startswith('"""'):
170 synopsis = line.strip().strip('"').strip()
171 break
172 except Exception as e:
173 ui.message('{}{} [BROKEN] {}'.format(
174 plname, spacer, exc_str(e)))
175 continue
176 if synopsis:
177 msg = '{}{} - {}'.format(
178 plname, spacer, synopsis)
179 else:
180 msg = '{}{} [no synopsis]'.format(plname, spacer)
181 if showplugininfo:
182 msg = '{} ({})'.format(msg, plinfo['file'])
183 ui.message(msg)
184 return
185 args = None
186 if isinstance(plugin, (list, tuple)):
187 args = plugin[1:]
188 plugin = plugin[0]
189 if plugin not in plugins:
190 raise ValueError("unknown plugin '{}', available: {}".format(
191 plugin, ','.join(plugins.keys())))
192 user_supplied_args = set()
193 if args:
194 # we got some arguments in the plugin spec, parse them and add to
195 # kwargs
196 for arg in args:
197 if isinstance(arg, tuple):
198 # came from python item-style
199 argname, argval = arg
200 else:
201 parsed = argspec.match(arg)
202 if parsed is None:
203 raise ValueError("invalid plugin argument: '{}'".format(arg))
204 argname, argval = parsed.groups()
205 if argname in kwargs:
206 # argument was seen at least once before -> make list
207 existing_val = kwargs[argname]
208 if not isinstance(existing_val, list):
209 existing_val = [existing_val]
210 existing_val.append(argval)
211 argval = existing_val
212 kwargs[argname] = argval
213 user_supplied_args.add(argname)
214 plugin_call = _load_plugin(plugins[plugin]['file'])
215
216 if showpluginhelp:
217 # we don't need special docs for the cmdline, standard python ones
218 # should be comprehensible enough
219 ui.message(
220 dedent_docstring(plugin_call.__doc__)
221 if plugin_call.__doc__
222 else 'This plugin has no documentation')
223 return
224
225 #
226 # argument preprocessing
227 #
228 # check the plugin signature and filter out all unsupported args
229 plugin_args, _, _, arg_defaults = inspect.getargspec(plugin_call)
230 supported_args = {k: v for k, v in kwargs.items() if k in plugin_args}
231 excluded_args = user_supplied_args.difference(supported_args.keys())
232 if excluded_args:
233 lgr.warning('ignoring plugin argument(s) %s, not supported by plugin',
234 excluded_args)
235 # always overwrite the dataset arg if one is needed
236 if 'dataset' in plugin_args:
237 supported_args['dataset'] = require_dataset(
238 # use dedicated arg if given, also anything that came with the plugin args
239 # or curdir as the last resort
240 dataset if dataset else kwargs.get('dataset', curdir),
241 # note 'dataset' arg is always first, if we have defaults for all args
242 # we have a default for 'dataset' too -> it is optional
243 check_installed=len(arg_defaults or ()) != len(plugin_args),
244 purpose='handover to plugin')
245
246 # call as a generator
247 for res in plugin_call(**supported_args):
248 if not res:
249 continue
250 if dataset:
251 # enforce standard regardless of what plugin did
252 res['refds'] = getattr(dataset, 'path', dataset)
253 elif 'refds' in res:
254 # no base dataset, results must not have them either
255 del res['refds']
256 if 'logger' not in res:
257 # make sure we have a logger
258 res['logger'] = lgr
259 yield res
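
The loader only requires that a plugin file define a `dlplugin` callable that yields result dictionaries, and the two stock plugins that follow use exactly this contract. A minimal hypothetical user plugin, dropped into whatever directory `datalad.locations.user-plugins` points to (file name and content are illustrative):

    # hello.py -- a hypothetical user plugin
    """say hello from a dataset"""

    # PLUGIN API
    def dlplugin(dataset, name='world'):
        """Yield a single 'ok' result for the given dataset"""
        yield dict(
            action='hello',
            status='ok',
            path=dataset.path,
            type='dataset',
            message='hello %s' % name)
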
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """add a README file to a dataset"""
9
10 __docformat__ = 'restructuredtext'
11
12
13 # PLUGIN API
14 def dlplugin(dataset, filename='README.rst', existing='skip'):
15 """Add basic information about DataLad datasets to a README file
16
17 The README file is added to the dataset and the addition is saved
18 in the dataset.
19
20 Parameters
21 ----------
22 dataset : Dataset
23 dataset to add information to
24 filename : str, optional
25 path of the README file within the dataset. Default: 'README.rst'
26 existing : {'skip', 'append', 'replace'}
27 how to react if a file with the target name already exists:
28 'skip': do nothing; 'append': append information to the existing
29 file; 'replace': replace the existing file with new content.
30 Default: 'skip'
31
32 """
33
34 from os.path import lexists
35 from os.path import join as opj
36
37 default_content="""\
38 About this dataset
39 ==================
40
41 This is a DataLad dataset{id}.
42
43 For more information on DataLad and on how to work with its datasets,
44 see the DataLad documentation at: http://docs.datalad.org
45 """.format(
46 id=' (id: {})'.format(dataset.id) if dataset.id else '')
47 filename = opj(dataset.path, filename)
48 res_kwargs = dict(action='add_readme', path=filename)
49
50 if lexists(filename) and existing == 'skip':
51 yield dict(
52 res_kwargs,
53 status='notneeded',
54 message='file already exists, and not appending content')
55 return
56
57 # unlock, file could be annexed
58 # TODO yield
59 if lexists(filename):
60 dataset.unlock(filename)
61
62 with open(filename, 'a' if existing == 'append' else 'w') as fp:
63 fp.write(default_content)
64 yield dict(
65 status='ok',
66 path=filename,
67 type='file',
68 action='add_readme')
69
70 for r in dataset.add(
71 filename,
72 message='[DATALAD] added README',
73 result_filter=None,
74 result_xfm=None):
75 yield r
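For orientation, a hedged sketch of invoking this plugin through the Python API, mirroring how the plugin tests further below call ``plugin()``; the dataset path is hypothetical::

    from datalad.api import create, plugin

    ds = create('/tmp/demo-ds')  # hypothetical path
    # add (or append to) a README.rst and save the change in the dataset
    for res in plugin(['add_readme', 'filename=README.rst', 'existing=append'],
                      dataset=ds):
        print(res['status'], res.get('path'))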
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """export a dataset to a tarball"""
9
10 __docformat__ = 'restructuredtext'
11
12
13 # PLUGIN API
14 def dlplugin(dataset, output=None):
15 import os
16 import tarfile
17 from mock import patch
18 from os.path import join as opj, dirname, normpath, isabs
19 from datalad.utils import file_basename
20 from datalad.support.annexrepo import AnnexRepo
21
22 import logging
23 lgr = logging.getLogger('datalad.plugin.tarball')
24
25 repo = dataset.repo
26 committed_date = repo.get_committed_date()
27
28 # could be used later on to filter files by some criterion
29 def _filter_tarinfo(ti):
30 # Reset the date to match the one of the last commit, not from the
31 # filesystem since git doesn't track those at all
32 # TODO: use the date of the last commit when any particular
33 # file was changed -- would be the most kosher yoh thinks to the
34 # degree of our abilities
35 ti.mtime = committed_date
36 return ti
37
38 if output is None:
39 output = "datalad_{}.tar.gz".format(dataset.id)
40 else:
41 if not output.endswith('.tar.gz'):
42 output += '.tar.gz'
43
44 root = dataset.path
45 # use dir inside matching the output filename
46 # TODO: could be an option to the export plugin allowing empty value
47 # for no leading dir
48 leading_dir = file_basename(output)
49
50 # workaround for inability to pass down the time stamp
51 with patch('time.time', return_value=committed_date), \
52 tarfile.open(output, "w:gz") as tar:
53 repo_files = sorted(repo.get_indexed_files())
54 if isinstance(repo, AnnexRepo):
55 annexed = repo.is_under_annex(
56 repo_files, allow_quick=True, batch=True)
57 else:
58 annexed = [False] * len(repo_files)
59 for i, rpath in enumerate(repo_files):
60 fpath = opj(root, rpath)
61 if annexed[i]:
62 # resolve to possible link target
63 link_target = os.readlink(fpath)
64 if not isabs(link_target):
65 link_target = normpath(opj(dirname(fpath), link_target))
66 fpath = link_target
67 # name in the tarball
68 aname = normpath(opj(leading_dir, rpath))
69 tar.add(
70 fpath,
71 arcname=aname,
72 recursive=False,
73 filter=_filter_tarinfo)
74
75 if not isabs(output):
76 output = opj(os.getcwd(), output)
77
78 yield dict(
79 status='ok',
80 path=output,
81 type='file',
82 action='export_tarball',
83 logger=lgr)
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """configure which dataset parts to never put in the annex"""
9
10
11 __docformat__ = 'restructuredtext'
12
13
14 # PLUGIN API
15 def dlplugin(dataset, pattern, ref_dir='.', makedirs='no'):
16 # could be extended to accept actual largefile expressions
17 """Configure a dataset to never put some content into the dataset's annex
18
19 This can be useful in mixed datasets that also contain textual data, such
20 as source code, which can be efficiently and more conveniently managed
21 directly in Git.
22
23 Patterns generally look like this::
24
25 code/*
26
27 which would match all files in the code directory. In order to match all
28 files under ``code/``, including all its subdirectories, use a pattern
29 such as::
30
31 code/**
32
33 Note that the plugin works incrementally, hence any existing configuration
34 (e.g. from a previous plugin run) is amended, not replaced.
35
36 Parameters
37 ----------
38 dataset : Dataset
39 dataset to configure
40 pattern : list
41 list of path patterns. Any content whose path matches any pattern
42 will not be annexed when added to a dataset, but instead will be
43 tracked directly in Git. Path patterns have to be relative to the
44 directory given by the `ref_dir` option. By default, patterns should
45 be relative to the root of the dataset.
46 ref_dir : str, optional
47 Relative path (within the dataset) to the directory that is to be
48 configured. All patterns are interpreted relative to this path,
49 and configuration is written to a ``.gitattributes`` file in this
50 directory.
51 makedirs : bool, optional
52 If set, any missing directories will be created in order to be able
53 to place a file into ``ref_dir``. Default: False.
54 """
55 from os.path import join as opj
56 from os.path import isabs
57 from os.path import exists
58 from os import makedirs as makedirsfx
59 from datalad.distribution.dataset import require_dataset
60 from datalad.support.annexrepo import AnnexRepo
61 from datalad.support.constraints import EnsureBool
62 from datalad.utils import assure_list
63
64 makedirs = EnsureBool()(makedirs)
65 pattern = assure_list(pattern)
66 ds = require_dataset(dataset, check_installed=True,
67 purpose='no_annex configuration')
68
69 res_kwargs = dict(
70 path=ds.path,
71 type='dataset',
72 action='no_annex',
73 )
74
75 # all the ways we refused to cooperate
76 if not isinstance(ds.repo, AnnexRepo):
77 yield dict(
78 res_kwargs,
79 status='notneeded',
80 message='dataset has no annex')
81 return
82 if any(isabs(p) for p in pattern):
83 yield dict(
84 res_kwargs,
85 status='error',
86 message=('path pattern for `no_annex` configuration must be relative paths: %s',
87 pattern))
88 return
89 if isabs(ref_dir):
90 yield dict(
91 res_kwargs,
92 status='error',
93 message=('`ref_dir` for `no_annex` configuration must be a relative path: %s',
94 ref_dir))
95 return
96
97 gitattr_dir = opj(ds.path, ref_dir)
98 if not exists(gitattr_dir):
99 if makedirs:
100 makedirsfx(gitattr_dir)
101 else:
102 yield dict(
103 res_kwargs,
104 status='error',
105 message='target directory for `no_annex` does not exist (consider makedirs=True)')
106 return
107
108 gitattr_file = opj(gitattr_dir, '.gitattributes')
109 with open(gitattr_file, 'a') as fp:
110 for p in pattern:
111 fp.write('{} annex.largefiles=nothing\n'.format(p))
112 yield dict(res_kwargs, status='ok')
113
114 for r in dataset.add(
115 gitattr_file,
116 to_git=True,
117 message="[DATALAD] exclude paths from annex'ing",
118 result_filter=None,
119 result_xfm=None):
120 yield r
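For orientation, a hedged sketch of invoking this plugin from Python (mirroring the plugin tests further below) and of the ``.gitattributes`` entry it appends; the dataset path is hypothetical::

    from datalad.api import create, plugin

    ds = create('/tmp/mixed-ds')  # hypothetical path
    # keep everything under code/ (recursively) out of the annex
    plugin(['no_annex', 'pattern=code/**'], dataset=ds)
    # .gitattributes in the dataset root now contains (roughly):
    #   code/** annex.largefiles=nothing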
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """Plugin tests
9
10 """
11
12 __docformat__ = 'restructuredtext'
0 # emacs: -*- mode: python-mode; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # -*- coding: utf-8 -*-
2 # ex: set sts=4 ts=4 sw=4 noet:
3 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
4 #
5 # See COPYING file distributed along with the datalad package for the
6 # copyright and license terms.
7 #
8 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
9 """Test plugin interface mechanics"""
10
11
12 import logging
13 from os.path import join as opj
14 from os.path import exists
15 from mock import patch
16
17 from datalad.config import ConfigManager
18 from datalad.api import plugin
19 from datalad.api import create
20
21 from datalad.tests.utils import swallow_logs
22 from datalad.tests.utils import swallow_outputs
23 from datalad.tests.utils import with_tempfile
24 from datalad.tests.utils import chpwd
25 from datalad.tests.utils import create_tree
26 from datalad.tests.utils import assert_raises
27 from datalad.tests.utils import assert_status
28 from datalad.tests.utils import assert_in
29 from datalad.tests.utils import assert_not_in
30 from datalad.tests.utils import eq_
31 from datalad.tests.utils import ok_clean_git
32
33 broken_plugin = """garbage"""
34
35 nodocs_plugin = """\
36 def dlplugin():
37 pass
38 """
39
40 # functioning plugin dummy
41 dummy_plugin = '''\
42 """real dummy"""
43
44 def dlplugin(dataset, noval, withval='test'):
45 "mydocstring"
46 yield dict(
47 status='ok',
48 action='dummy',
49 args=dict(
50 dataset=dataset,
51 noval=noval,
52 withval=withval))
53 '''
54
55
56 @with_tempfile()
57 @with_tempfile(mkdir=True)
58 def test_plugin_call(path, dspath):
59 # make plugins
60 create_tree(
61 path,
62 {
63 'dlplugin_dummy.py': dummy_plugin,
64 'dlplugin_nodocs.py': nodocs_plugin,
65 'dlplugin_broken.py': broken_plugin,
66 })
67 fake_dummy_spec = {
68 'dummy': {'file': opj(path, 'dlplugin_dummy.py')},
69 'nodocs': {'file': opj(path, 'dlplugin_nodocs.py')},
70 'broken': {'file': opj(path, 'dlplugin_broken.py')},
71 }
72
73 with patch('datalad.plugin._get_plugins', return_value=fake_dummy_spec):
74 with swallow_outputs() as cmo:
75 plugin(showplugininfo=True)
76 # hyphen spacing depends on the longest plugin name!
77 # sorted
78 # summary list generation doesn't actually load plugins for speed,
79 # hence broken is not known to be broken here
80 eq_(cmo.out,
81 "broken [no synopsis] ({})\ndummy - real dummy ({})\nnodocs [no synopsis] ({})\n".format(
82 fake_dummy_spec['broken']['file'],
83 fake_dummy_spec['dummy']['file'],
84 fake_dummy_spec['nodocs']['file']))
85 with swallow_outputs() as cmo:
86 plugin(['dummy'], showpluginhelp=True)
87 eq_(cmo.out.rstrip(), "mydocstring")
88 with swallow_outputs() as cmo:
89 plugin(['nodocs'], showpluginhelp=True)
90 eq_(cmo.out.rstrip(), "This plugin has no documentation")
91 # loading fails, no docs
92 assert_raises(ValueError, plugin, ['broken'], showpluginhelp=True)
93
94 # assume this most obscure plugin name is not used
95 assert_raises(ValueError, plugin, '32sdfhvz984--^^')
96
97 # broken plugin argument, must match Python keyword arg
98 # specs
99 assert_raises(ValueError, plugin, ['dummy', '1245'])
100
101 with patch('datalad.plugin._get_plugins', return_value=fake_dummy_spec):
102 # does not trip over unsupported arguments; they get filtered out, because
103 # we carry all kinds of stuff
104 with swallow_logs(new_level=logging.WARNING) as cml:
105 res = list(plugin(['dummy', 'noval=one', 'obscure=some']))
106 assert_status('ok', res)
107 cml.assert_logged(
108 msg=".*ignoring plugin argument\\(s\\).*obscure.*, not supported by plugin.*",
109 regex=True, level='WARNING')
110 # fails on missing positional arg
111 assert_raises(TypeError, plugin, ['dummy'])
112 # positional and kwargs actually make it into the plugin
113 res = list(plugin(['dummy', 'noval=one', 'withval=two']))[0]
114 eq_('one', res['args']['noval'])
115 eq_('two', res['args']['withval'])
116 # kwarg defaults are preserved
117 res = list(plugin(['dummy', 'noval=one']))[0]
118 eq_('test', res['args']['withval'])
119 # repeated specification yields list input
120 res = list(plugin(['dummy', 'noval=one', 'noval=two']))[0]
121 eq_(['one', 'two'], res['args']['noval'])
122 # can do the same thing while bypassing argument parsing for calls
123 # from within python, and even preserve native python dtypes
124 res = list(plugin(['dummy', ('noval', 1), ('noval', 'two')]))[0]
125 eq_([1, 'two'], res['args']['noval'])
126 # and we can further simplify in this case by passing lists right
127 # away
128 res = list(plugin(['dummy', ('noval', [1, 'two'])]))[0]
129 eq_([1, 'two'], res['args']['noval'])
130
131 # dataset arg handling
132 # run plugin that needs a dataset where there is none
133 with patch('datalad.plugin._get_plugins', return_value=fake_dummy_spec):
134 ds = None
135 with chpwd(dspath):
136 assert_raises(ValueError, plugin, ['dummy', 'noval=one'])
137 # create a dataset here, fixes the error
138 ds = create()
139 res = list(plugin(['dummy', 'noval=one']))[0]
140 # gives dataset instance
141 eq_(ds, res['args']['dataset'])
142 # now do again, giving the dataset path
143 # but careful, `dataset` is a proper argument
144 res = list(plugin(['dummy', 'noval=one'], dataset=dspath))[0]
145 eq_(ds, res['args']['dataset'])
146 # however, if passed alongside the plugins args it also works
147 res = list(plugin(['dummy', 'dataset={}'.format(dspath), 'noval=one']))[0]
148 eq_(ds, res['args']['dataset'])
149 # but if both are given, the proper arg takes precedence
150 assert_raises(ValueError, plugin, ['dummy', 'dataset={}'.format(dspath), 'noval=one'],
151 dataset='rubbish')
152
153
154 # MIH: I failed to replace our config manager instance for this test run
155 # in order to be able to configure a set of plugins to run prior and after
156 # create. A test should not alter a user's config, hence I am disabling this
157 # for now, and hope somebody can fix it up
158 #@with_tempfile(mkdir=True)
159 #def test_plugin_config(path):
160 # with patch.dict('os.environ',
161 # {'HOME': path, 'DATALAD_SNEAKY_ADDITION': 'ignore'}):
162 # with patch('datalad.cfg', ConfigManager()) as cfg:
163 # global_gitconfig = opj(path, '.gitconfig')
164 # assert(not exists(global_gitconfig))
165 # # swap out the actual config for this test
166 # assert_in('datalad.sneaky.addition', cfg)
167 # # now we configure a plugin to run before and twice after `create`
168 # cfg.add('datalad.create.run-before',
169 # 'add_readme filename=before.txt',
170 # where='global')
171 # cfg.add('datalad.create.run-after',
172 # 'add_readme filename=after1.txt',
173 # where='global')
174 # cfg.add('datalad.create.run-after',
175 # 'add_readme filename=after2.txt',
176 # where='global')
177 # # force reload to pick up newly populated .gitconfig
178 # cfg.reload(force=True)
179 # assert_in('datalad.create.run-before', cfg)
180 # # and now we create a dataset and expect the two readme files
181 # # to be part of it
182 # ds = create(dataset=opj(path, 'ds'))
183 # ok_clean_git(ds.path)
184 # assert(exists(opj(ds.path, 'before.txt')))
185 # assert(exists(opj(ds.path, 'after1.txt')))
186 # assert(exists(opj(ds.path, 'after2.txt')))
187
188
189 @with_tempfile(mkdir=True)
190 def test_wtf(path):
191 # smoke test for now
192 with swallow_outputs() as cmo:
193 plugin(['wtf'], dataset=path)
194 assert_not_in('Dataset information', cmo.out)
195 assert_in('Configuration', cmo.out)
196 with chpwd(path):
197 with swallow_outputs() as cmo:
198 plugin(['wtf'])
199 assert_not_in('Dataset information', cmo.out)
200 assert_in('Configuration', cmo.out)
201 # now with a dataset
202 ds = create(path)
203 with swallow_outputs() as cmo:
204 plugin(['wtf'], dataset=ds.path)
205 assert_in('Configuration', cmo.out)
206 assert_in('Dataset information', cmo.out)
207 assert_in('path: {}'.format(ds.path), cmo.out)
208
209
210 @with_tempfile(mkdir=True)
211 def test_no_annex(path):
212 ds = create(path)
213 ok_clean_git(ds.path)
214 create_tree(
215 ds.path,
216 {'code': {
217 'inannex': 'content',
218 'notinannex': 'othercontent'}})
219 # add two files, pre and post configuration
220 ds.add(opj('code', 'inannex'))
221 plugin(['no_annex', 'pattern=code/**'], dataset=ds)
222 ds.add(opj('code', 'notinannex'))
223 ok_clean_git(ds.path)
224 # one is annex'ed, the other is not, despite no change in add call
225 # importantly, also .gitattributes is not annexed
226 eq_([opj('code', 'inannex')],
227 ds.repo.get_annexed_files())
0 # emacs: -*- mode: python-mode; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # -*- coding: utf-8 -*-
2 # ex: set sts=4 ts=4 sw=4 noet:
3 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
4 #
5 # See COPYING file distributed along with the datalad package for the
6 # copyright and license terms.
7 #
8 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
9 """Test tarball exporter"""
10
11 import os
12 import time
13 from os.path import join as opj
14 from os.path import isabs
15 import tarfile
16
17 from datalad.api import Dataset
18 from datalad.api import plugin
19 from datalad.utils import chpwd
20 from datalad.utils import md5sum
21
22 from datalad.tests.utils import with_tree
23 from datalad.tests.utils import ok_startswith
24 from datalad.tests.utils import assert_true, assert_not_equal, assert_raises, \
25 assert_false, assert_equal
26 from datalad.tests.utils import assert_status
27 from datalad.tests.utils import assert_result_count
28
29
30 _dataset_template = {
31 'ds': {
32 'file_up': 'some_content',
33 'dir': {
34 'file1_down': 'one',
35 'file2_down': 'two'}}}
36
37
38 @with_tree(_dataset_template)
39 def test_failure(path):
40 ds = Dataset(opj(path, 'ds')).create(force=True)
41 # unknown plugin
42 assert_raises(ValueError, ds.plugin, 'nah')
43 # non-existing dataset
44 assert_raises(ValueError, plugin, 'export_tarball', Dataset('nowhere'))
45
46
47 @with_tree(_dataset_template)
48 def test_tarball(path):
49 ds = Dataset(opj(path, 'ds')).create(force=True)
50 ds.add('.')
51 committed_date = ds.repo.get_committed_date()
52 default_outname = opj(path, 'datalad_{}.tar.gz'.format(ds.id))
53 with chpwd(path):
54 res = list(ds.plugin('export_tarball'))
55 assert_status('ok', res)
56 assert_result_count(res, 1)
57 assert(isabs(res[0]['path']))
58 assert_true(os.path.exists(default_outname))
59 custom_outname = opj(path, 'myexport.tar.gz')
60 # feed in without extension
61 ds.plugin('export_tarball', output=custom_outname[:-7])
62 assert_true(os.path.exists(custom_outname))
63 custom1_md5 = md5sum(custom_outname)
64 # encodes the original tarball filename -> different checksum, despite
65 # same content
66 assert_not_equal(md5sum(default_outname), custom1_md5)
67 # should really sleep so if they stop using time.time - we know
68 time.sleep(1.1)
69 ds.plugin('export_tarball', output=custom_outname)
70 # should not encode mtime, so should be identical
71 assert_equal(md5sum(custom_outname), custom1_md5)
72
73 def check_contents(outname, prefix):
74 with tarfile.open(outname) as tf:
75 nfiles = 0
76 for ti in tf:
77 # any annex links resolved
78 assert_false(ti.issym())
79 ok_startswith(ti.name, prefix + '/')
80 assert_equal(ti.mtime, committed_date)
81 if '.datalad' not in ti.name:
82 # ignore any files in .datalad for this test, to not be
83 # susceptible to changes in how much meta info we generate
84 nfiles += 1
85 # we have exactly four files (includes .gitattributes for default
86 # MD5E backend), and expect no content for any directory
87 assert_equal(nfiles, 4)
88 check_contents(default_outname, 'datalad_%s' % ds.id)
89 check_contents(custom_outname, 'myexport')
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """provide information about this DataLad installation"""
9
10 __docformat__ = 'restructuredtext'
11
12
13 # PLUGIN API
14 def dlplugin(dataset=None):
15 """Generate a report about the DataLad installation and configuration
16
17 IMPORTANT: Sharing this report with untrusted parties (e.g. on the web)
18 should be done with care, as it may include identifying information, and/or
19 credentials or access tokens.
20
21 Parameters
22 ----------
23 dataset : Dataset, optional
24 If a dataset is given or found, information on this dataset is provided
25 (if it exists), and its active configuration is reported.
26 """
27 ds = dataset
28 if ds and not ds.is_installed():
29 # we don't deal with absent datasets
30 ds = None
31 if ds is None:
32 from datalad import cfg
33 else:
34 cfg = ds.config
35 from datalad.ui import ui
36 from datalad.api import metadata
37
38 report_template = """\
39 {dataset}
40 Configuration
41 =============
42 {cfg}
43
44 """
45
46 dataset_template = """\
47 Dataset information
48 ===================
49 {basic}
50
51 Metadata
52 --------
53 {meta}
54
55 """
56 ds_meta = None
57 if ds and ds.is_installed():
58 ds_meta = metadata(
59 dataset=ds, dataset_global=True, return_type='item-or-list',
60 result_filter=lambda x: x['action'] == 'metadata')
61 if ds_meta:
62 ds_meta = ds_meta['metadata']
63
64 ui.message(report_template.format(
65 dataset='' if not ds else dataset_template.format(
66 basic='\n'.join(
67 '{}: {}'.format(k, v) for k, v in (
68 ('path', ds.path),
69 ('repo', ds.repo.__class__.__name__ if ds.repo else '[NONE]'),
70 )),
71 meta='\n'.join(
72 '{}: {}'.format(k, v) for k, v in ds_meta)
73 if ds_meta else '[no metadata]'
74 ),
75 cfg='\n'.join(
76 '{}: {}'.format(k, '<HIDDEN>' if k.startswith('user.') or 'token' in k else v)
77 for k, v in sorted(cfg.items(), key=lambda x: x[0])),
78 ))
79 yield
5050 from datalad.utils import on_windows
5151 from datalad.utils import swallow_logs
5252 from datalad.utils import assure_list
53 from datalad.utils import _path_
5354 from datalad.cmd import GitRunner
5455
5556 # imports from same module:
9697 WEB_UUID = "00000000-0000-0000-0000-000000000001"
9798
9899 # To be assigned and checked to be good enough upon first call to AnnexRepo
100 # 6.20160923 -- --json-progress for get
99101 # 6.20161210 -- annex add to add also changes (not only new files) to git
100102 # 6.20170220 -- annex status provides --ignore-submodules
101103 GIT_ANNEX_MIN_VERSION = '6.20170220'
240242 # to use 'git annex unlock' instead.
241243 lgr.warning("direct mode not available for %s. Ignored." % self)
242244
245 self._batched = BatchedAnnexes(batch_size=batch_size)
246
243247 # set default backend for future annex commands:
244248 # TODO: Should the backend option of __init__() also migrate
245249 # the annex, in case there are annexed files already?
246250 if backend:
247 lgr.debug("Setting annex backend to %s", backend)
248 # Must be done with explicit release, otherwise on Python3 would end up
249 # with .git/config wiped out
250 # see https://github.com/gitpython-developers/GitPython/issues/333#issuecomment-126633757
251
252 # TODO: 'annex.backends' actually is a space separated list.
253 # Figure out, whether we want to allow for a list here or what to
254 # do, if there is sth in that setting already
251 self.set_default_backend(backend, persistent=True)
252
253
254 def set_default_backend(self, backend, persistent=True, commit=True):
255 """Set default backend
256
257 Parameters
258 ----------
259 backend : str
260 persistent : bool, optional
261 If persistent, would add/commit to .gitattributes. If not -- would
262 set within .git/config
263 """
264 # TODO: 'annex.backends' actually is a space separated list.
265 # Figure out, whether we want to allow for a list here or what to
266 # do, if there is sth in that setting already
267 if persistent:
268 git_attributes_file = _path_(self.path, '.gitattributes')
269 git_attributes = ''
270 if exists(git_attributes_file):
271 with open(git_attributes_file) as f:
272 git_attributes = f.read()
273 if ' annex.backend=' in git_attributes:
274 lgr.debug(
275 "Not (re)setting backend since seems already set in %s"
276 % git_attributes_file
277 )
278 else:
279 lgr.debug("Setting annex backend to %s (persistently)", backend)
280 self.config.set('annex.backends', backend, where='local')
281 with open(git_attributes_file, 'a') as f:
282 if git_attributes and not git_attributes.endswith(os.linesep):
283 f.write(os.linesep)
284 f.write('* annex.backend=%s%s' % (backend, os.linesep))
285 self.add(git_attributes_file, git=True)
286 if commit:
287 self.commit(
288 "Set default backend for all files to be %s" % backend,
289 _datalad_msg=True,
290 files=[git_attributes_file]
291 )
292 else:
293 lgr.debug("Setting annex backend to %s (in .git/config)", backend)
255294 self.config.set('annex.backends', backend, where='local')
256
257 self._batched = BatchedAnnexes(batch_size=batch_size)
258295
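For orientation, a hedged sketch of what the persistent mode above ends up recording; the repository path and backend name are hypothetical::

    from datalad.support.annexrepo import AnnexRepo

    repo = AnnexRepo('/path/to/annexed/repo')  # hypothetical path
    # record the default backend in .gitattributes and commit the change
    repo.set_default_backend('MD5E', persistent=True)
    # .gitattributes now contains (roughly):
    #   * annex.backend=MD5E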
259296 def __del__(self):
260297 try:
826863 super(AnnexRepo, self).set_remote_url(name, url, push=push)
827864 self._set_shared_connection(name, url)
828865
866 def set_remote_dead(self, name):
867 """Announce to annex that remote is "dead"
868 """
869 return self._annex_custom_command([], ["git", "annex", "dead", name])
870
871 def is_remote_annex_ignored(self, remote):
872 """Return True if remote is explicitly ignored"""
873 return self.config.getbool(
874 'remote.{}'.format(remote), 'annex-ignore',
875 default=False
876 )
877
878 def is_special_annex_remote(self, remote, check_if_known=True):
879 """Return either remote is a special annex remote
880
881 Decides based on the presence of diagnostic annex- options
882 for the remote
883 """
884 if check_if_known:
885 if remote not in self.get_remotes():
886 raise RemoteNotAvailableError(remote)
887 sec = 'remote.{}'.format(remote)
888 for opt in ('annex-externaltype', 'annex-webdav'):
889 if self.config.has_option(sec, opt):
890 return True
891 return False
892
829893 @borrowkwargs(GitRepo)
830 def get_remotes(self, with_refs_only=False, with_urls_only=False,
894 def get_remotes(self,
895 with_urls_only=False,
831896 exclude_special_remotes=False):
832897 """Get known (special-) remotes of the repository
833898
841906 remotes : list of str
842907 List of names of the remotes
843908 """
844 remotes = super(AnnexRepo, self).get_remotes(
845 with_refs_only=with_refs_only, with_urls_only=with_urls_only)
909 remotes = super(AnnexRepo, self).get_remotes(with_urls_only=with_urls_only)
846910
847911 if exclude_special_remotes:
848 return [remote for remote in remotes
849 if not self.config.has_option('remote.{}'.format(remote),
850 'annex-externaltype')]
912 return [
913 remote for remote in remotes
914 if not self.is_special_annex_remote(remote, check_if_known=False)
915 ]
851916 else:
852917 return remotes
853918
11191184 self.config.reload()
11201185
11211186 @normalize_paths
1122 def get(self, files, options=None, jobs=None):
1187 def get(self, files, remote=None, options=None, jobs=None):
11231188 """Get the actual content of files
11241189
11251190 Parameters
11261191 ----------
11271192 files : list of str
11281193 paths to get
1194 remote : str, optional
1195 from which remote to fetch content
11291196 options : list of str, optional
11301197 commandline options for the git annex get command
11311198 jobs : int, optional
11371204 """
11381205 options = options[:] if options else []
11391206
1207 if remote:
1208 if remote not in self.get_remotes():
1209 raise RemoteNotAvailableError(
1210 remote=remote,
1211 cmd="get",
1212 msg="Remote is not known. Known are: %s"
1213 % (self.get_remotes(),)
1214 )
1215 options += ['--from', remote]
1216
11401217 # analyze provided files to decide which actually are needed to be
11411218 # fetched
11421219
11431220 if '--key' not in options:
1144 expected_downloads, fetch_files = self._get_expected_downloads(
1145 files)
1221 expected_downloads, fetch_files = self._get_expected_files(
1222 files, ['--not', '--in', 'here'])
11461223 else:
11471224 fetch_files = files
11481225 assert(len(files) == 1)
11551232 if len(fetch_files) != len(files):
11561233 lgr.info("Actually getting %d files", len(fetch_files))
11571234
1158 # TODO: check annex version and issue a one time warning if not
1159 # old enough for --json-progress
1160
1161 # Without up to date annex, we would still report total! ;)
1162 if self.git_annex_version >= '6.20160923':
1163 # options might be the '--key' which should go last
1164 options = ['--json-progress'] + options
1235 # options might be the '--key' which should go last
1236 options = ['--json-progress'] + options
11651237
11661238 # Note: Currently swallowing logs, due to the workaround to report files
11671239 # not found, but don't fail and report about other files and use JSON,
11791251 # from annex failed ones
11801252 with cm:
11811253 results = self._run_annex_command_json(
1182 'get', args=options + fetch_files,
1254 'get',
1255 args=options + fetch_files,
11831256 jobs=jobs,
11841257 expected_entries=expected_downloads)
11851258 results_list = list(results)
11861259 # TODO: should we here compare fetch_files against result_list
1187 # and womit an exception of incomplete download????
1260 # and vomit an exception of incomplete download????
11881261 return results_list
11891262
1190 def _get_expected_downloads(self, files):
1263 def _get_expected_files(self, files, expr):
11911264 """Given a list of files, figure out what to be downloaded
11921265
11931266 Parameters
11941267 ----------
11951268 files
1269 expr: list
1270 Expression to be passed into annex's find
11961271
11971272 Returns
11981273 -------
1199 expected_downloads : dict
1274 expected_files : dict
12001275 key -> size
12011276 fetch_files : list
12021277 files to be fetched
12031278 """
1204 lgr.debug("Determine what files need to be obtained")
1279 lgr.debug("Determine what files match the query to work with")
12051280 # Let's figure out first which files/keys and of what size to download
1206 expected_downloads = {}
1281 expected_files = {}
12071282 fetch_files = []
12081283 keys_seen = set()
12091284 unknown_sizes = [] # unused atm
12101285 # for now just record total size, and
12111286 for j in self._run_annex_command_json(
1212 'find', args=['--json', '--not', '--in', 'here'] + files
1287 'find', args=['--json'] + expr + files
12131288 ):
1289 # TODO: some files might not even be here. So in current fancy
1290 # output reporting scheme we should then theoretically handle
1291 # those cases here and say 'impossible' or something like that
1292 if not j.get('success', True):
1293 # TODO: I guess do something with yielding and filtering for
1294 # what needs to be done and what not
1295 continue
12141296 key = j['key']
12151297 size = j.get('bytesize')
12161298 if key in keys_seen:
12211303 assert j['file']
12221304 fetch_files.append(j['file'])
12231305 if size and size.isdigit():
1224 expected_downloads[key] = int(size)
1306 expected_files[key] = int(size)
12251307 else:
1226 expected_downloads[key] = None
1308 expected_files[key] = None
12271309 unknown_sizes.append(j['file'])
1228 return expected_downloads, fetch_files
1310 return expected_files, fetch_files
12291311
12301312 @normalize_paths
12311313 def add(self, files, git=None, backend=None, options=None, commit=False,
21592241 json_objects = (json.loads(line)
21602242 for line in out.splitlines() if line.startswith('{'))
21612243 # protect against progress leakage
2162 json_objects = [j for j in json_objects if not 'byte-progress' in j]
2244 json_objects = [j for j in json_objects if 'byte-progress' not in j]
21632245 return json_objects
21642246
21652247 # TODO: reconsider having any magic at all and maybe just return a list/dict always
26932775 # TODO: we probably need to override get_file_content, since it returns the
26942776 # symlink's target instead of the actual content.
26952777
2778 # We need --auto and --fast having exposed TODO
26962779 @normalize_paths(match_return_type=False) # get a list even in case of a single item
2697 def copy_to(self, files, remote, options=None, log_online=True):
2780 def copy_to(self, files, remote, options=None, jobs=None):
26982781 """Copy the actual content of `files` to `remote`
26992782
27002783 Parameters
27032786 path(s) to copy
27042787 remote: str
27052788 name of remote to copy `files` to
2706 log_online: bool
2707 see get()
27082789
27092790 Returns
27102791 -------
27122793 files successfully copied
27132794 """
27142795
2796 # find --in here --not --in remote
27152797 # TODO: full support of annex copy options would lead to `files` being
27162798 # optional. This means to check for whether files or certain options are
27172799 # given and fail or just pass everything as is and try to figure out,
27202802 if remote not in self.get_remotes():
27212803 raise ValueError("Unknown remote '{0}'.".format(remote))
27222804
2805 options = options[:] if options else []
2806
2807 # Note:
27232808 # In case of single path, 'annex copy' will fail, if it cannot copy it.
27242809 # With multiple files, annex will just skip the ones, it cannot deal
27252810 # with. We'll do the same and report back what was successful
27292814 if not isdir(files[0]):
27302815 self.get_file_key(files[0])
27312816
2732 # Note:
2733 # - annex copy fails, if `files` was a single item, that doesn't exist
2734 # - files not in annex or not even in git don't yield a non-zero exit,
2735 # but are ignored
2736 # - in case of multiple items, annex would silently skip those files
2737
2738 annex_options = files + ['--to=%s' % remote]
2817 # TODO: RF -- logic is duplicated with get() -- the only difference
2818 # is the verb (copy, copy) or (get, put) and remote ('here', remote)?
2819 if '--key' not in options:
2820 expected_copys, copy_files = self._get_expected_files(
2821 files, ['--in', 'here', '--not', '--in', remote])
2822 else:
2823 copy_files = files
2824 assert(len(files) == 1)
2825 expected_copys = {files[0]: AnnexRepo.get_size_from_key(files[0])}
2826
2827 if not copy_files:
2828 lgr.debug("No files found needing copying.")
2829 return []
2830
2831 if len(copy_files) != len(files):
2832 lgr.info("Actually copying %d files", len(copy_files))
2833
2834 annex_options = ['--to=%s' % remote, '--json-progress']
27392835 if options:
27402836 annex_options.extend(shlex.split(options))
2741 # Note:
2742 # As of now, there is no --json option for annex copy. Use it once this
2743 # changed.
2744 results = self._run_annex_command_json(
2745 'copy',
2746 args=annex_options,
2747 #log_stdout=True, log_stderr=not log_online,
2748 #log_online=log_online, expect_stderr=True
2749 )
2750 results = list(results)
2837
2838 cm = swallow_logs() \
2839 if lgr.getEffectiveLevel() > logging.DEBUG \
2840 else nothing_cm()
2841 # TODO: provide more meaningful message (possibly aggregating 'note'
2842 # from annex failed ones
2843 with cm:
2844 results = self._run_annex_command_json(
2845 'copy',
2846 args=annex_options + copy_files,
2847 jobs=jobs,
2848 expected_entries=expected_copys
2849 #log_stdout=True, log_stderr=not log_online,
2850 #log_online=log_online, expect_stderr=True
2851 )
2852 results_list = list(results)
2853 # XXX this is the only logic different ATM from get
27512854 # check if any transfer failed since then we should just raise an Exception
27522855 # for now to guarantee consistent behavior with non--json output
27532856 # see https://github.com/datalad/datalad/pull/1349#discussion_r103639456
27542857 from operator import itemgetter
2755 failed_copies = [e['file'] for e in results if not e['success']]
2858 failed_copies = [e['file'] for e in results_list if not e['success']]
27562859 good_copies = [
2757 e['file'] for e in results
2860 e['file'] for e in results_list
27582861 if e['success'] and
27592862 e.get('note', '').startswith('to ') # transfer did happen
27602863 ]
27612864 if failed_copies:
2865 # TODO: RF for new fancy scheme of outputs reporting
27622866 raise IncompleteResultsError(
27632867 results=good_copies, failed=failed_copies,
27642868 msg="Failed to copy %d file(s)" % len(failed_copies))
2525 'ERROR': RED
2626 }
2727
28 RESULT_STATUS_COLORS = {
29 'ok': GREEN,
30 'notneeded': GREEN,
31 'impossible': YELLOW,
32 'error': RED
33 }
34
2835 # Aliases for uniform presentation
2936
3037 DATASET = UNDERLINE
4350 return "%s%s%s" % (COLOR_SEQ % color, s, RESET_SEQ) \
4451 if ui.is_interactive \
4552 else s
53
54
55 def color_status(status):
56 col = RESULT_STATUS_COLORS.get(status, None)
57 return color_word(status, col) if col else status
171171 return False
172172 elif value in ('1', 'yes', 'on', 'enable', 'true'):
173173 return True
174 raise ValueError("value must be converted to boolean")
174 raise ValueError(
175 "value '{}' must be convertible to boolean".format(
176 value))
175177
176178 def long_description(self):
177179 return 'value must be convertible to type bool'
892892 for f in re.findall("'(.*)'[\n$]", stdout)]
893893
894894 @normalize_paths(match_return_type=False)
895 def remove(self, files, **kwargs):
895 def remove(self, files, recursive=False, **kwargs):
896896 """Remove files.
897897
898898 Calls git-rm.
901901 ----------
902902 files: str
903903 list of paths to remove
904 recursive: bool, optional
905 whether to allow recursive removal from subdirectories
904906 kwargs:
905907 see `__init__`
906908
912914
913915 files = _remove_empty_items(files)
914916
917 if recursive:
918 kwargs['r'] = True
915919 stdout, stderr = self._git_custom_command(
916920 files, ['git', 'rm'] + to_options(**kwargs))
917921
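For orientation, a hedged sketch of the new flag in use; the repository path and directory name are hypothetical::

    from datalad.support.gitrepo import GitRepo

    repo = GitRepo('/path/to/repo')  # hypothetical path
    # remove a whole tracked subdirectory; recursive=True maps to `git rm -r`
    repo.remove(['old_data/'], recursive=True)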
11581162 # return [branch.strip() for branch in
11591163 # self.repo.git.branch(r=True).splitlines()]
11601164
1161 def get_remotes(self, with_refs_only=False, with_urls_only=False):
1165 def get_remotes(self, with_urls_only=False):
11621166 """Get known remotes of the repository
11631167
11641168 Parameters
11651169 ----------
1166 with_refs_only : bool, optional
1167 return only remotes with any refs. E.g. annex special remotes
1168 would not have any refs
1170 with_urls_only : bool, optional
1171 return only remotes which have urls
11691172
11701173 Returns
11711174 -------
11721175 remotes : list of str
11731176 List of names of the remotes
11741177 """
1175
1176 # Note: This still uses GitPython and therefore might cause a gitpy.Repo
1177 # instance to be created.
1178 if with_refs_only:
1179 # older versions of GitPython might not tolerate remotes without
1180 # any references at all, so we need to catch
1181 remotes = []
1182 for remote in self.repo.remotes:
1183 try:
1184 if len(remote.refs):
1185 remotes.append(remote.name)
1186 except AssertionError as exc:
1187 if "not have any references" not in str(exc):
1188 # was some other reason
1189 raise
11901178
11911179 # Note: read directly from config and spare instantiation of gitpy.Repo
11921180 # since we need this in AnnexRepo constructor. Furthermore gitpy does it
14171405 return self._git_custom_command(
14181406 '', ['git', 'remote', 'remove', name]
14191407 )
1420
1421 def show_remotes(self, name='', verbose=False):
1422 """
1423 """
1424
1425 options = ["-v"] if verbose else []
1426 name = [name] if name else []
1427 out, err = self._git_custom_command(
1428 '', ['git', 'remote'] + options + ['show'] + name
1429 )
1430 return out.rstrip(linesep).splitlines()
14311408
14321409 def update_remote(self, name=None, verbose=False):
14331410 """
117117 doc.strip()
118118 if len(doc) and not doc.endswith('.'):
119119 doc += '.'
120 if self.constraints is not None:
121 cdoc = self.constraints.long_description()
122 if cdoc[0] == '(' and cdoc[-1] == ')':
123 cdoc = cdoc[1:-1]
124 addinfo = ''
125 if self.cmd_kwargs.get('nargs', None) == '?' \
126 or self.cmd_kwargs.get('action', None) == 'append':
127 addinfo = 'list expected, each '
128 doc += ' Constraints: %s%s.' % (addinfo, cdoc)
129120 if has_default:
130121 doc += " [Default: %r]" % (default,)
131122 # Explicitly deal with multiple spaces, for some reason
2020
2121 from datalad.support.param import Parameter
2222 from datalad.interface.base import Interface
23 from datalad.interface.utils import build_doc
23 from datalad.interface.base import build_doc
2424
2525 from datalad import ssh_manager
2626
284284 ar.get('test-annex.dat', options=["--from=NotExistingRemote"])
285285 eq_(cme.exception.remote, "NotExistingRemote")
286286
287 # and similar one whenever invoking with remote parameter
288 with assert_raises(RemoteNotAvailableError) as cme:
289 ar.get('test-annex.dat', remote="NotExistingRemote")
290 eq_(cme.exception.remote, "NotExistingRemote")
291
287292
288293 # 1 is enough to test file_has_content
289294 @with_batch_direct
482487 @with_tempfile
483488 def test_AnnexRepo_migrating_backends(src, dst):
484489 ar = AnnexRepo.clone(src, dst, backend='MD5')
490 eq_(ar.default_backends, ['MD5'])
485491 # GitPython has a bug which causes .git/config being wiped out
486492 # under Python3, triggered by collecting its config instance I guess
487493 gc.collect()
11611167 # Test that if we pass a list of items and annex processes them nicely,
11621168 # we would obtain a list back. To not stress our tests even more -- let's mock
11631169 def ok_copy(command, **kwargs):
1170 # Check that we do pass to annex call only the list of files which we
1171 # asked to be copied
1172 assert_in('copied1', kwargs['annex_options'])
1173 assert_in('copied2', kwargs['annex_options'])
1174 assert_in('existed', kwargs['annex_options'])
11641175 return """
11651176 {"command":"copy","note":"to target ...", "success":true, "key":"akey1", "file":"copied1"}
11661177 {"command":"copy","note":"to target ...", "success":true, "key":"akey2", "file":"copied2"}
11731184 # now let's test that we are correctly raising the exception in case if
11741185 # git-annex execution fails
11751186 orig_run = repo._run_annex_command
1187
1188 # Kinda a bit off reality, since the nonex* files would not be returned/handled
1189 # by _get_expected_files, so in real life we wouldn't get a report about Incomplete!?
11761190 def fail_to_copy(command, **kwargs):
11771191 if command == 'copy':
11781192 # That is not how annex behaves
11901204 else:
11911205 return orig_run(command, **kwargs)
11921206
1193 with patch.object(repo, '_run_annex_command', fail_to_copy):
1207 def fail_to_copy_get_expected(files, expr):
1208 assert files == ["copied", "existed", "nonex1", "nonex2"]
1209 return {'akey1': 10}, ["copied"]
1210
1211 with patch.object(repo, '_run_annex_command', fail_to_copy), \
1212 patch.object(repo, '_get_expected_files', fail_to_copy_get_expected):
11941213 with assert_raises(IncompleteResultsError) as cme:
11951214 repo.copy_to(["copied", "existed", "nonex1", "nonex2"], "target")
11961215 eq_(cme.exception.results, ["copied"])
21192138 def test_AnnexRepo_flyweight_monitoring_inode(path, store):
21202139 # testing for issue #1512
21212140 check_repo_deals_with_inode_change(AnnexRepo, path, store)
2141
2142
2143 @with_tempfile(mkdir=True)
2144 def test_fake_is_not_special(path):
2145 ar = AnnexRepo(path, create=True)
2146 # doesn't exist -- we fail by default
2147 assert_raises(RemoteNotAvailableError, ar.is_special_annex_remote, "fake")
2148 assert_false(ar.is_special_annex_remote("fake", check_if_known=False))
347347 def test_GitRepo_remote_add(orig_path, path):
348348
349349 gr = GitRepo.clone(orig_path, path)
350 out = gr.show_remotes()
350 out = gr.get_remotes()
351351 assert_in('origin', out)
352352 eq_(len(out), 1)
353353 gr.add_remote('github', 'git://github.com/datalad/testrepo--basic--r1')
354 out = gr.show_remotes()
354 out = gr.get_remotes()
355355 assert_in('origin', out)
356356 assert_in('github', out)
357357 eq_(len(out), 2)
358 out = gr.show_remotes('github')
359 assert_in(' Fetch URL: git://github.com/datalad/testrepo--basic--r1', out)
358 eq_('git://github.com/datalad/testrepo--basic--r1', gr.config['remote.github.url'])
360359
361360
362361 @with_testrepos(flavors=local_testrepo_flavors)
366365 gr = GitRepo.clone(orig_path, path)
367366 gr.add_remote('github', 'git://github.com/datalad/testrepo--basic--r1')
368367 gr.remove_remote('github')
369 out = gr.show_remotes()
368 out = gr.get_remotes()
370369 eq_(len(out), 1)
371370 assert_in('origin', out)
372
373
374 @with_testrepos(flavors=local_testrepo_flavors)
375 @with_tempfile
376 def test_GitRepo_remote_show(orig_path, path):
377
378 gr = GitRepo.clone(orig_path, path)
379 gr.add_remote('github', 'git://github.com/datalad/testrepo--basic--r1')
380 out = gr.show_remotes(verbose=True)
381 eq_(len(out), 4)
382 assert_in('origin\t%s (fetch)' % orig_path, out)
383 assert_in('origin\t%s (push)' % orig_path, out)
384 # Some fellas might have some fancy rewrite rules for pushes, so we can't
385 # just check for specific protocol
386 assert_re_in('github\tgit(://|@)github.com[:/]datalad/testrepo--basic--r1 \(fetch\)',
387 out)
388 assert_re_in('github\tgit(://|@)github.com[:/]datalad/testrepo--basic--r1 \(push\)',
389 out)
390371
391372
392373 @with_testrepos(flavors=local_testrepo_flavors)
6363 # constraints
6464 p = Parameter(doc=doc, constraints=cnstr.EnsureInt() | cnstr.EnsureStr())
6565 autodoc = p.get_autodoc('testname')
66 assert_true("convertible to type 'int'" in autodoc)
67 assert_true('must be a string' in autodoc)
6866 assert_true('int or str' in autodoc)
6967
7068 with assert_raises(ValueError) as cmr:
1212 from os.path import lexists, dirname, join as opj, curdir
1313
1414 # Hard coded version, to be done by release process
15 __version__ = '0.6.0.dev1'
15 __version__ = '0.8.0'
1616
1717 # NOTE: might cause problems with "python setup.py develop" deployments
1818 # so I have even changed buildbot to use pip install -e .
2525 generated/man/datalad-create-sibling
2626 generated/man/datalad-create-sibling-github
2727 generated/man/datalad-drop
28 generated/man/datalad-export
28 generated/man/datalad-plugin
2929 generated/man/datalad-get
3030 generated/man/datalad-install
3131 generated/man/datalad-publish
0 .. -*- mode: rst -*-
1 .. vi: set ft=rst sts=4 ts=4 sw=4 et tw=79:
2
3 .. _chap_customization:
4
5 ********************************************
6 Customization and extension of functionality
7 ********************************************
8
9 DataLad provides numerous commands that cover many use cases. However, there will
10 always be a demand for further customization at a particular site, or for an
11 individual user. DataLad addresses this need by providing a generic plugin
12 interface.
13
14 First of all, DataLad plugins can be executed via the :ref:`man_datalad-plugin`
15 command. This allows for executing arbitrary plugins (on a particular dataset)
16 at any point in time.
17
18 In addition, DataLad can be configured to run any number of plugins prior to
19 or after particular commands. For example, it is possible to execute a plugin
20 each time DataLad has created a dataset to configure it so that all files
21 that are added to its ``code/`` subdirectory will always be managed directly
22 with Git and not be put into the dataset's annex. In order to achieve this,
23 adjust your Git configuration in the following way::
24
25 git config --global --add datalad.create.run-after 'no_annex pattern=code/**'
26
27 This will cause DataLad to run the ``no_annex`` plugin to add the given pattern
28 to the dataset's ``.gitattributes`` file, which in turn instructs git annex to
29 send any matching files directly to Git. The same functionality is available
30 for ad-hoc adjustments via the ``--run-after`` option supported by most
31 commands.
32
33 Analogous to ``--run-after``, DataLad also supports ``--run-before`` to execute
34 plugins prior to a command.
35
36 DataLad will discover plugins at three locations:
37
38 1. official plugins that are part of the local DataLad installation
39
40 2. system-wide plugins, provided by the local admin
41
42 The location where plugins need to be placed depends on the platform.
43 On GNU/Linux systems this will be ``/etc/xdg/datalad/plugins``, whereas
44 on Windows it will be ``C:\ProgramData\datalad.org\datalad\plugins``.
45
46 This default location can be overridden by setting the
47 ``datalad.locations.system-plugins`` configuration variable in the local or
48 global Git configuration.
49
50 3. user-supplied plugins, customizable by each user
51
52 Again, the location will depend on the platform. On GNU/Linux systems this
53 will be ``$HOME/.config/datalad/plugins``, whereas on Windows it will be
54 ``C:\Users\<username>\AppData\Local\datalad.org\datalad\plugins``.
55
56 This default location can be overridden by setting the
57 ``datalad.locations.user-plugins`` configuration variable in the local or
58 global Git configuration.
59
60 Identically named plugins in a later location replace those in locations
61 searched before. This can be used to alter the behavior of plugins provided
62 with DataLad, and enables users to adjust a site-wide configuration.
63
64
65 Writing own plugins
66 ===================
67
68 Plugins are written in Python. In order for DataLad to be able to find
69 them, plugins need to be placed in one of the supported locations described
70 above. Plugin file names have to have a '.py' extension and must not start
71 with an underscore ('_').
72
73 Plugin source files must define a function named::
74
75 dlplugin
76
77 This function is executed as the plugin. It can have any number of
78 arguments (positional, or keyword arguments with defaults), or none at
79 all. All arguments, except ``dataset`` must expect any value to
80 be a string.
81
82 The plugin function must be self-contained, i.e. all needed imports
83 and definitions must be done within the body of the function.
84
85 The doc string of the plugin function is displayed when the plugin
86 documentation is requested. The first line in a plugin file that starts
87 with triple double-quotes will be used as the plugin short description
88 (this will typically be the docstring of the module file). This short
89 description is displayed as the plugin synopsis in the plugin overview
90 list.
91
92 Plugin functions must yield their results as a Python generator. Results are
93 DataLad status dictionaries. There are no constraints on the number of results,
94 or the number and nature of result properties. However, conventions exist and
95 must be followed for compatibility with the result evaluation and rendering
96 performed by DataLad.
97
98 The following property keys must exist:
99
100 "status"
101 {'ok', 'notneeded', 'impossible', 'error'}
102
103 "action"
104 label for the action performed by the plugin. In many cases this
105 could be the plugin's name.
106
107 The following keys should exist if possible:
108
109 "path"
110 absolute path to a result on the file system
111
112 "type"
113 label indicating the nature of a result (e.g. 'file', 'dataset',
114 'directory', etc.)
115
116 "message"
117 string message annotating the result, particularly important for
118 non-ok results. This can be a tuple with 'logging'-style string
119 expansion.
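As a hedged illustration (this plugin is not shipped with DataLad, and all names in it are made up), a minimal plugin following these conventions could look like this::

    """add an empty marker file to a dataset"""

    __docformat__ = 'restructuredtext'


    # PLUGIN API
    def dlplugin(dataset, filename='MARKER.txt'):
        """Create an empty marker file and save it in the dataset

        Parameters
        ----------
        dataset : Dataset
            dataset to operate on
        filename : str, optional
            name of the marker file. Default: 'MARKER.txt'
        """
        from os.path import join as opj

        fpath = opj(dataset.path, filename)
        with open(fpath, 'w') as fp:
            fp.write('')
        # let the dataset record the new file
        for r in dataset.add(fpath, message='[DATALAD] added marker file'):
            yield r
        # report on the plugin's own action, following the result conventions
        yield dict(
            action='add_marker',
            path=fpath,
            type='file',
            status='ok')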
1111
1212 Datalad is a Python package and can be installed via pip_, which is the
1313 preferred method unless system packages are available for the target platform
14 (see below)::
14 (see below). To automatically install datalad and all its software dependencies
15 type::
1516
1617 pip install datalad
1718
1819 .. _pip: https://pip.pypa.io
1920
20 This will automatically install all software dependencies necessary to provide
21 core functionality. Several additional installation schemes are supported
22 (e.g., ``publish``, ``metadata``, ``tests``, ``crawl``)::
21 Several additional installation schemes are supported (``[SCHEME]`` can be e.g.
22 ``publish``, ``metadata``, ``tests`` or ``crawl``)::
2323
24 pip install datalad[SCHEME]
25
26 where ``SCHEME`` can be any supported scheme, such as the ones listed above.
24 pip install datalad[SCHEME]
25
26 .. cool, but why should I (or a first-time reader) even bother about the schemes?
2727
2828 In addition, it is necessary to have a working installation of git-annex_,
2929 which is not set up automatically at this point.
3838 package::
3939
4040 sudo apt-get install datalad
41
42 A current version of git-annex (as also provided by the NeuroDebian_
43 repository) can be installed by typing::
44
45 sudo apt-get install git-annex
4146
4247 .. _neurodebian: http://neuro.debian.net
4348
5964 First steps
6065 ===========
6166
62 After datalad is installed it can be queried for information about known
63 datasets. For example, we might want to look for dataset thats were funded by,
64 or acknowledge the US National Science Foundation (NSF)::
67 Datalad can be queried for information about known datasets. On the first search
68 query, datalad automatically offers assistance to obtain a :term:`superdataset` first.
69 The superdataset is a lightweight container that holds meta information about known datasets but does not contain actual data itself.
70
71 For example, we might want to look for datasets that were funded by, or acknowledge, the US National Science Foundation (NSF)::
6572
6673 ~ % datalad search NSF
6774 No DataLad dataset found at current location
7582 ~/datalad/openfmri/ds000003
7683 ...
7784
78 On first attempt, datalad offers assistence to obtain a :term:`superdataset`
79 with information on all datasets it knows about. This is a lightweight
80 container that does not actually contain data, but meta information only. Once
81 downloaded queries can be made offline.
82
8385 Any known dataset can now be installed inside the local superdataset with a
8486 command like this::
8587
1818 basics
1919 usecases/index
2020 metadata
21 customization
2122 faq
2223 glossary
2324
2727 api.create_sibling
2828 api.create_sibling_github
2929 api.drop
30 api.export
30 api.plugin
3131 api.get
3232 api.install
3333 api.publish
7676 api.crawl
7777 api.crawl_init
7878 api.test
79
80 Plugins
81 -------
82
83 DataLad can be customized by plugins. The following plugins are shipped
84 with DataLad.
85
86 .. currentmodule:: datalad.plugin
87 .. autosummary::
88 :toctree: generated
89
90 add_readme
91 export_tarball
92 no_annex
93 wtf
7994
8095
8196 Support functionality
00 # Theoretically we don't want -e here but ATM pip would puke if just .[full] is provided
11 # Since we use requirements.txt ATM only for development IMHO it is ok but
22 # we need to figure out/complaint to pip folks
3 # For now, until https://github.com/GrahamDumpleton/wrapt/issues/98 is resolved
4 # we should use our version which allows disabling its extension(s)
5 git+https://github.com/yarikoptic/wrapt@develop
36 -e .[devel]
7