datalad / 0ee9789
Merge tag '0.8.0' into debian

A variety of fixes and enhancements

- [publish] would now push merged `git-annex` branch even if no other changes were done
- [publish] should be able to publish using relative path within SSH URI (git hook would use relative paths)
- [publish] should better tolerate publishing to pure git and `git-annex` special remotes
- [plugin] mechanism came to replace [export]. See [export_tarball] for the replacement of [export]. Now it should be easy to extend datalad's interface with custom functionality to be invoked along with other commands.
- Minimalistic coloring of the results rendering
- [publish]/`copy_to` got progress bar report now and support of `--jobs`
- minor fixes and enhancements to crawler (e.g. support of recursive removes)

* tag '0.8.0': (76 commits)
  Changelog for 0.8.0
  BF: fixed test_publish for assuming that there is no need to push git-annex, which was fixed in prior commit
  BF/RF: mv is_remote_annex_ignored to AnnexRepo, make siblings command not puke if not yet annex-ignored
  BF: publish if only updates to git-annex, do not puke if remote is ignored by annex
  ENH: add --to-annex (reuse to_git Python interface though) to force adding to annex
  RF: --text-to-git -> --text-no-annex, and handled by create, not AnnexRepo
  ENH: allow for "recursive" flag for remove (needed while crawling s3 where prefix is a directory)
  BF: fixing up a test and a hook more for now using relative path(s)
  BF: call set_remote_dead only for annexrepo
  RF: removed the comment
  BF: push updated git-annex branch upon publishing data (only)
  BF: use relative dspath in the hook (Closes #1653), dead/remove remote upon replace (Closes #1656)
  BF: for copy_to report only # of files present locally and use correct verb in msg
  BF: Test is old fashion -- doesn't accept rendering options etc
  ENH: create --text-to-git to establish .gitattributes so that text file go to git
  BF: fixing url for pip -- must have git+ prefix
  DOC: little cleaning of gettingstarted.rst
  BF(workaround): use patched wrapt disabling its extensions
  BF: providing guarding against non-existing paths in checking on what to copy
  ENH: --jobs and progress for copy_to/publish
  ...

Yaroslav Halchenko, 6 years ago
79 changed file(s) with 2286 addition(s) and 868 deletion(s).
177177 # Verify that setup.py build doesn't puke
178178 - python setup.py build
179179 # Run tests
180 - PATH=$PWD/tools/coverage-bin:$PATH $NOSE_WRAPPER `which nosetests` $NOSE_OPTS -v -A "$NOSE_SELECTION_OP(integration or usecase or slow)" --with-doctest --doctest-tests --with-cov --cover-package datalad --logging-level=INFO $TESTS_TO_PERFORM
180 - WRAPT_DISABLE_EXTENSIONS=1 PATH=$PWD/tools/coverage-bin:$PATH $NOSE_WRAPPER `which nosetests` $NOSE_OPTS -v -A "$NOSE_SELECTION_OP(integration or usecase or slow)" --with-doctest --doctest-tests --with-cov --cover-package datalad --logging-level=INFO $TESTS_TO_PERFORM
181181 # Generate documentation and run doctests
182182 # but do only when we do not have obnoxious logging turned on -- something screws up sphinx on travis
183183 - if [ ! "${DATALAD_LOG_LEVEL:-}" = 2 ]; then PYTHONPATH=$PWD $NOSE_WRAPPER make -C docs html doctest; fi
88 We would recommend consulting the log of the
99 [DataLad git repository](http://github.com/datalad/datalad) for more details.
1010
11 # 0.7.0 (Jun 25, 2017) -- when it works - it is quite awesome!
12
13 I bet we will fix some bugs and make a world even a better place.
11
12 ## 0.8.0 (Jul 31, 2017) -- it is better than ever
13
14 A variety of fixes and enhancements
15
16 ### Fixes
17
18 - [publish] would now push merged `git-annex` branch even if no other changes
19 were done
20 - [publish] should be able to publish using relative path within SSH URI
21 (git hook would use relative paths)
22 - [publish] should better tolerate publishing to pure git and `git-annex`
23 special remotes
24
25 ### Enhancements and new features
26
27 - [plugin] mechanism came to replace [export]. See [export_tarball] for the
28 replacement of [export]. Now it should be easy to extend datalad's interface
29 with custom functionality to be invoked along with other commands.
30 - Minimalistic coloring of the results rendering
31 - [publish]/`copy_to` got progress bar report now and support of `--jobs`
32 - minor fixes and enhancements to crawler (e.g. support of recursive removes)
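
A minimal command-line sketch of the two additions above (a sketch only: `mysibling` is a placeholder sibling name, and any arguments an individual plugin such as `export_tarball` needs would be passed as `key=value` pairs per the `plugin` command documentation):

    ~/some/dataset$ datalad publish --to mysibling --jobs 4    # copy_to with progress bar, 4 parallel jobs
    ~/some/dataset$ datalad plugin export_tarball              # plugin mechanism replacing `datalad export`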
33
34
35 ## 0.7.0 (Jun 25, 2017) -- when it works - it is quite awesome!
36
37 New features, refactorings, and bug fixes.
1438
1539 ### Major refactoring and deprecations
1640
1842 - [create-sibling], and [unlock] have been re-written to support the
1943 same common API as most other commands
2044
21 ## Enhancements and new features
45 ### Enhancements and new features
2246
2347 - [siblings] can now be used to query and configure a local repository by
2448 using the sibling name ``here``
3054 - Significant parts of the documentation have been updated
3155 - Instantiate GitPython's Repo instances lazily
3256
33 ## Fixes
57 ### Fixes
3458
3559 - API documentation is now rendered properly as HTML, and is easier to browse by
3660 having more compact pages
358382 [datalad]: http://docs.datalad.org/en/latest/generated/man/datalad.html
359383 [drop]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-drop.html
360384 [export]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-export.html
385 [export_tarball]: http://docs.datalad.org/en/latest/generated/datalad.plugin.export_tarball.html
361386 [get]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-get.html
362387 [install]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-install.html
363388 [ls]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-ls.html
364389 [metadata]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-metadata.html
365390 [publish]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-publish.html
391 [plugin]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-plugin.html
366392 [remove]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-remove.html
367393 [save]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-save.html
368394 [search]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-search.html
413413 Any new DATALAD_CMD_PROTOCOL has to implement datalad.support.protocol.ProtocolInterface
414414 - *DATALAD_CMD_PROTOCOL_PREFIX*:
415415 Sets a prefix to add before the command call times are noted by DATALAD_CMD_PROTOCOL.
416
417
418 # Changelog section
419
420 For the upcoming release use this template
421
422 ## 0.8.1 (??? ??, 2017) -- will be better than ever
423
424 bet we will fix some bugs and make a world even a better place.
425
426 ### Major refactoring and deprecations
427
428 - hopefully none
429
430 ### Fixes
431
432 ?
433
434 ### Enhancements and new features
435
436 ?
437
2626 from .log import lgr
2727 import atexit
2828 from datalad.utils import on_windows
29
2930 if not on_windows:
3031 lgr.log(5, "Instantiating ssh manager")
3132 from .support.sshconnector import SSHManager
3334 atexit.register(ssh_manager.close, allow_fail=False)
3435 else:
3536 ssh_manager = None
37
38 try:
39 # this will fix the rendering of ANSI escape sequences
40 # for colored terminal output on windows
41 # it will do nothing on any other platform, hence it
42 # is safe to call unconditionally
43 import colorama
44 colorama.init()
45 atexit.register(colorama.deinit)
46 except ImportError as e:
47 if on_windows:
48 from datalad.dochelpers import exc_str
49 lgr.warning(
50 "'colorama' Python module missing, terminal output may look garbled [%s]",
51 exc_str(e))
52 pass
3653
3754 atexit.register(lgr.log, 5, "Exiting")
3855
2020 from collections import namedtuple
2121 from functools import wraps
2222
23 from datalad import cfg
24
25 from .interface.base import update_docstring_with_parameters
2623 from .interface.base import get_interface_groups
2724 from .interface.base import get_api_name
28 from .interface.base import alter_interface_docs_for_api
29 from .interface.base import merge_allargs2kwargs
25 from .interface.base import get_allargs_as_kwargs
3026
3127 def _kwargs_to_namespace(call, args, kwargs):
3228 """
3329 Given a __call__, args and kwargs passed, prepare a cmdlineargs-like
3430 thing
3531 """
36 kwargs_ = merge_allargs2kwargs(call, args, kwargs)
32 kwargs_ = get_allargs_as_kwargs(call, args, kwargs)
3733 # Get all arguments removing those possible ones used internally and
3834 # which shouldn't be exposed outside anyways
3935 [kwargs_.pop(k) for k in kwargs_ if k.startswith('_')]
141141 of the command; 'continue' works like 'ignore', but an error causes a
142142 non-zero exit code; 'stop' halts on first failure and yields non-zero exit
143143 code. A failure is any result with status 'impossible' or 'error'.""")
144 parser.add_argument(
145 '--run-before', dest='common_run_before',
146 nargs='+',
147 action='append',
148 metavar='PLUGINSPEC',
149 help="""DataLad plugin to run before the command. PLUGINSPEC is a list
150 comprised of a plugin name plus optional `key=value` pairs with arguments
151 for the plugin call (see `plugin` command documentation for details).
152 This option can be given more than once to run multiple plugins
153 in the order in which they were given.
154 For running plugins that require a --dataset argument it is important
155 to provide the respective dataset as the --dataset argument of the main
156 command, if it is not in the list of plugin arguments."""),
157 parser.add_argument(
158 '--run-after', dest='common_run_after',
159 nargs='+',
160 action='append',
161 metavar='PLUGINSPEC',
162 help="""Like --run-before, but plugins are executed after the main command
163 has finished."""),
164 parser.add_argument(
165 '--cmd', dest='_', action='store_true',
166 help="""syntactical helper that can be used to end the list of global
167 command line options before the subcommand label. Options like
168 --run-before can take an arbitrary number of arguments and may require
169 to be followed by a single --cmd in order to enable identification
170 of the subcommand.""")
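
Taken together, these options might be combined on the command line roughly as follows (a sketch under assumptions: the `add_readme` plugin and its `filename=` argument also appear in the test suite changes further down and stand in here for any installed plugin; the README.md value and dataset path are placeholders):

    datalad --run-after add_readme filename=README.md --cmd create /tmp/new_dataset

Here `--cmd` terminates the open-ended `--run-after` argument list so that `create` can be recognized as the subcommand.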
144171
145172 # yoh: atm we only dump to console. Might adopt the same separation later on
146173 # and for consistency will call it --verbose-level as well for now
3636 from ...utils import lmtime
3737 from ...utils import find_files
3838 from ...utils import auto_repr
39 from ...utils import _path_
3940 from ...utils import getpwd
4041 from ...utils import try_multiple
4142 from ...tests.utils import put_file_under_git
176177 "Was instructed to add to super dataset but no super dataset "
177178 "was found for %s" % ds
178179 )
179
180 # create/AnnexRepo specification of backend does it non-persistently in .git/config
181 if backend:
182 put_file_under_git(path, '.gitattributes', '* annex.backend=%s' % backend, annexed=False)
183180
184181 return ds
185182
853850 if self.repo.dirty and not exists(opj(path, '.gitattributes')) and isinstance(self.repo, AnnexRepo):
854851 backends = self.repo.default_backends
855852 if backends:
856 # then record default backend into the .gitattributes
857 put_file_under_git(path, '.gitattributes', '* annex.backend=%s' % backends[0],
858 annexed=False)
853 self.repo.set_default_backend(backends[0], commit=False)
859854
860855 # at least use repo._git_custom_command
861856 def _commit(self, msg=None, options=[]):
13021297 stats = data.get('datalad_stats', None)
13031298 if self.repo.dirty: # or self.tracker.dirty # for dry run
13041299 lgr.info("Repository found dirty -- adding and committing")
1305 _call(self.repo.add, '.', options=self.options) # so everything is committed
1300 _call(self.repo.add, '.', git_options=self.options) # so everything is committed
13061301
13071302 stats_str = ('\n\n' + stats.as_str(mode='full')) if stats else ''
13081303 _call(self._commit, "%s%s" % (', '.join(self._states), stats_str), options=["-a"])
13941389
13951390 return _remove_obsolete()
13961391
1397 def remove(self, data):
1392 def remove(self, data, recursive=False):
13981393 """Removed passed along file name from git/annex"""
13991394 stats = data.get('datalad_stats', None)
14001395 self._states.add("Removed files")
14021397 # TODO: not sure if we should maybe check if exists, and skip/just complain if not
14031398 if stats:
14041399 _call(stats.increment, 'removed')
1405 if lexists(opj(self.repo.path, filename)):
1406 _call(self.repo.remove, filename)
1400 filepath = opj(self.repo.path, filename)
1401 if lexists(filepath):
1402 if os.path.isdir(filepath):
1403 if recursive:
1404 _call(self.repo.remove, filename, recursive=True)
1405 else:
1406 lgr.warning("Not removing %s recursively, skipping", filepath)
1407 else:
1408 _call(self.repo.remove, filename)
14071409 else:
14081410 lgr.warning("Was asked to remove non-existing path %s", filename)
14091411 yield data
219219 commits = {b: list(repo.get_branch_commits(b)) for b in branches}
220220 eq_(len(commits['incoming']), 1)
221221 eq_(len(commits['incoming-processed']), 2)
222 eq_(len(commits['master']), 5) # all commits out there -- init ds + init crawler + 1*(incoming, processed, merge)
222 eq_(len(commits['master']), 6) # all commits out there -- backend + init ds + init crawler + 1*(incoming, processed, merge)
223223
224224 with chpwd(outd):
225225 eq_(set(glob('*')), {'dir1', 'file1.nii'})
249249
250250
251251 @with_tree(tree={
252
253252 'study': {
254253 'show': {
255254 'WG33': {
258257 <a href="/file/show/JX5V">file1.nii</a>
259258 <a href="/file/show/RIBX">dir1 / file2.nii</a>
260259 <a href="/file/show/GSRD">file1b.nii</a>
261
262260 %s
263261 </body></html>""" % _PLUG_HERE,
264262 },
272270 }
273271 }
274272 },
275
276273 'file': {
277274 'show': {
278275 'JX5V': {
292289 }
293290
294291 },
295
296292 'download': {
297293 'file1.nii': "content of file1.nii is different",
298294 'file1b.nii': "content of file1b.nii",
342338 './.datalad/crawl/crawl.cfg',
343339 './.datalad/crawl/statuses/incoming.json',
344340 './.datalad/meta/balsa.json',
345 './file1.nii', './dir1/file2.nii',
341 './file1.nii',
342 './dir1/file2.nii',
346343 }
347344
348345 eq_(set(all_files), target_files)
264264 eq_(len(commits_l['incoming']), 3)
265265 eq_(len(commits['incoming-processed']), 6)
266266 eq_(len(commits_l['incoming-processed']), 4) # because original merge has only 1 parent - incoming
267 eq_(len(commits['master']), 12) # all commits out there -- dataset init, crawler init + 3*(incoming, processed, meta data aggregation, merge)
268 eq_(len(commits_l['master']), 6)
267 eq_(len(commits['master']), 13) # all commits out there -- dataset init, crawler init + 3*(incoming, processed, meta data aggregation, merge)
268 eq_(len(commits_l['master']), 7)
269269
270270 # Check tags for the versions
271271 eq_(out[0]['datalad_stats'].get_total().versions, ['1.0.0', '1.0.1'])
272272 # +1 because original "release" was assumed to be 1.0.0
273273 repo_tags = repo.get_tags()
274274 eq_(repo.get_tags(output='name'), ['1.0.0', '1.0.0+1', '1.0.1'])
275 eq_(repo_tags[0]['hexsha'], commits_l['master'][-4].hexsha) # next to the last one
275 eq_(repo_tags[0]['hexsha'], commits_l['master'][-5].hexsha) # next to the last one
276276 eq_(repo_tags[-1]['hexsha'], commits_l['master'][0].hexsha) # the last one
277277
278278 def hexsha(l):
468468 eq_(len(commits['incoming-processed']), 2)
469469 eq_(len(commits_l['incoming-processed']), 2) # because original merge has only 1 parent - incoming
470470 # to avoid 'dataset init' commit create() needs save=False
471 eq_(len(commits['master']), 6) # all commits out there, dataset init, crawler, init, incoming, incoming-processed, meta data aggregation, merge
472 eq_(len(commits_l['master']), 4) # dataset init, init, meta data aggregation, merge
471 eq_(len(commits['master']), 7) # all commits out there, backend, dataset init, crawler init, incoming, incoming-processed, meta data aggregation, merge
472 eq_(len(commits_l['master']), 5) # backend, dataset init, init, meta data aggregation, merge
473473
474474 # rerun pipeline -- make sure we are on the same in all branches!
475475 with chpwd(outd):
4242 ['encryption=none', 'type=external', 'externaltype=%s' % ARCHIVES_SPECIAL_REMOTE,
4343 'autoenable=true'
4444 ])
45 assert annex.is_special_annex_remote(ARCHIVES_SPECIAL_REMOTE)
4546 # We want two maximally obscure names, which are also different
4647 assert(fn_extracted != fn_inarchive_obscure)
4748 annex.add(fn_archive, commit=True, msg="Added tarball")
3737 from datalad.interface.results import results_from_annex_noinfo
3838 from datalad.interface.utils import discover_dataset_trace_to_targets
3939 from datalad.interface.utils import eval_results
40 from datalad.interface.utils import build_doc
40 from datalad.interface.base import build_doc
4141 from datalad.interface.save import Save
4242 from datalad.distribution.utils import _fixup_submodule_dotgit_setup
4343 from datalad.support.constraints import EnsureStr
140140 as it inflates dataset sizes and impacts flexibility of data
141141 transport. If not specified - it will be up to git-annex to
142142 decide, possibly on .gitattributes options."""),
143 to_annex=Parameter(
144 args=("--to-annex",),
145 action='store_false',
146 dest='to_git',
147 doc="""flag whether to force adding data to Annex, instead of
148 git. It might be that .gitattributes instructs for a file to be
149 added to git, but for some particular files it is desired to be
150 added to annex (e.g. sensitive files etc).
151 If not specified - it will be up to git-annex to
152 decide, possibly on .gitattributes options."""),
143153 recursive=recursion_flag,
144154 recursion_limit=recursion_limit,
145155 # TODO not functional anymore
177187 annex_opts=None,
178188 annex_add_opts=None,
179189 jobs=None):
180
181190 # parameter constraints:
182191 if not path:
183192 raise InsufficientArgumentsError(
99
1010
1111 import logging
12 import re
1213 from os import listdir
1314 from os.path import relpath
1415 from os.path import pardir
1617
1718 from datalad.interface.base import Interface
1819 from datalad.interface.utils import eval_results
19 from datalad.interface.utils import build_doc
20 from datalad.interface.base import build_doc
2021 from datalad.interface.results import get_status_dict
2122 from datalad.interface.common_opts import location_description
2223 # from datalad.interface.common_opts import git_opts
100101 reckless=reckless_opt,
101102 alt_sources=Parameter(
102103 args=('--alternative-sources',),
104 dest='alt_sources',
103105 metavar='SOURCE',
104106 nargs='+',
105107 doc="""Alternative sources to be tried if a dataset cannot
235237 lgr.debug("Wiping out unsuccessful clone attempt at: %s",
236238 dest_path)
237239 rmtree(dest_path)
240 if 'could not create work tree' in e.stderr.lower():
241 # this cannot be fixed by trying another URL
242 yield get_status_dict(
243 status='error',
244 message=re.match(r".*fatal: (.*)\n",
245 e.stderr,
246 flags=re.MULTILINE | re.DOTALL).group(1),
247 **status_kwargs)
248 return
238249
239250 if not destination_dataset.is_installed():
240251 yield get_status_dict(
1919 from datalad.interface.base import Interface
2020 from datalad.interface.annotate_paths import AnnotatePaths
2121 from datalad.interface.utils import eval_results
22 from datalad.interface.utils import build_doc
22 from datalad.interface.base import build_doc
2323 from datalad.interface.common_opts import git_opts
2424 from datalad.interface.common_opts import annex_opts
2525 from datalad.interface.common_opts import annex_init_opts
111111 doc="""enforce creation of a dataset in a non-empty directory""",
112112 action='store_true'),
113113 description=location_description,
114 # TODO could move into cfg_annex plugin
114115 no_annex=Parameter(
115116 args=("--no-annex",),
116117 doc="""if set, a plain Git repository will be created without any
117118 annex""",
118119 action='store_true'),
120 text_no_annex=Parameter(
121 args=("--text-no-annex",),
122 doc="""if set, all text files in the future will be added to Git,
123 not annex. Achieved by adding an entry to `.gitattributes` file. See
124 http://git-annex.branchable.com/tips/largefiles/ and `no_annex`
125 DataLad plugin to establish even more detailed control over which
126 files are placed under annex control.""",
127 action='store_true'),
119128 save=nosave_opt,
129 # TODO could move into cfg_annex plugin
120130 annex_version=Parameter(
121131 args=("--annex-version",),
122132 doc="""select a particular annex repository version. The
124134 version. This should be left untouched, unless you know what
125135 you are doing""",
126136 constraints=EnsureDType(int) | EnsureNone()),
137 # TODO could move into cfg_annex plugin
127138 annex_backend=Parameter(
128139 args=("--annex-backend",),
129140 constraints=EnsureStr() | EnsureNone(),
132143 For a list of supported backends see the git-annex
133144 documentation. The default is optimized for maximum compatibility
134145 of datasets across platforms (especially those with limited
135 path lengths)""",
136 nargs=1),
146 path lengths)"""),
147 # TODO could move into cfg_metadata plugin
137148 native_metadata_type=Parameter(
138149 args=('--native-metadata-type',),
139150 metavar='LABEL',
142153 doc="""Metadata type label. Must match the name of the respective
143154 parser implementation in Datalad (e.g. "bids").[CMD: This option
144155 can be given multiple times CMD]"""),
156 # TODO could move into cfg_access/permissions plugin
145157 shared_access=shared_access_opt,
146158 git_opts=git_opts,
147159 annex_opts=annex_opts,
164176 shared_access=None,
165177 git_opts=None,
166178 annex_opts=None,
167 annex_init_opts=None):
179 annex_init_opts=None,
180 text_no_annex=None
181 ):
168182
169183 # two major cases
170184 # 1. we got a `dataset` -> we either want to create it (path is None),
206220 unavailable_path_msg=None,
207221 # if we have a dataset given that actually exists, we want to
208222 # fail if the requested path is not in it
209 nondataset_path_status='error' if dataset and dataset.is_installed() else '',
223 nondataset_path_status='error' \
224 if isinstance(dataset, Dataset) and dataset.is_installed() else '',
210225 on_failure='ignore')
211226 path = None
212227 for r in annotated_paths:
251266
252267 # important to use the given Dataset object to avoid spurious ID
253268 # changes with not-yet-materialized Datasets
254 tbds = dataset if dataset is not None and dataset.path == path['path'] \
269 tbds = dataset if isinstance(dataset, Dataset) and dataset.path == path['path'] \
255270 else Dataset(path['path'])
256271
257272 # don't create in non-empty directory without `force`:
274289 else:
275290 # always come with annex when created from scratch
276291 lgr.info("Creating a new annex repo at %s", tbds.path)
277 AnnexRepo(
292 tbrepo = AnnexRepo(
278293 tbds.path,
279294 url=None,
280295 create=True,
283298 description=description,
284299 git_opts=git_opts,
285300 annex_opts=annex_opts,
286 annex_init_opts=annex_init_opts)
301 annex_init_opts=annex_init_opts
302 )
303
304 if text_no_annex:
305 git_attributes_file = opj(tbds.path, '.gitattributes')
306 with open(git_attributes_file, 'a') as f:
307 f.write('* annex.largefiles=(not(mimetype=text/*))\n')
308 tbrepo.add([git_attributes_file], git=True)
309 tbrepo.commit(
310 "Instructed annex to add text files to git",
311 _datalad_msg=True,
312 files=[git_attributes_file]
313 )
287314
288315 if native_metadata_type is not None:
289316 if not isinstance(native_metadata_type, list):
306333 with open(opj(tbds.path, '.datalad', '.gitattributes'), 'a') as gitattr:
307334 # TODO this will need adjusting, when annex'ed aggregate meta data
308335 # comes around
336 gitattr.write('# Text files (according to file --mime-type) are added directly to git.\n')
337 gitattr.write('# See http://git-annex.branchable.com/tips/largefiles/ for more info.\n')
309338 gitattr.write('** annex.largefiles=nothing\n')
310339
311340 # save everything, we need to do this now and cannot merge with the
317346 # the next only makes sense if we saved the created dataset,
318347 # otherwise we have no committed state to be registered
319348 # in the parent
320 if save and dataset is not None and dataset.path != tbds.path:
349 if save and isinstance(dataset, Dataset) and dataset.path != tbds.path:
321350 # we created a dataset in another dataset
322351 # -> make submodule
323352 for r in dataset.add(
2929 datasetmethod, require_dataset
3030 from datalad.interface.annotate_paths import AnnotatePaths
3131 from datalad.interface.base import Interface
32 from datalad.interface.utils import build_doc
32 from datalad.interface.base import build_doc
33 from datalad.interface.utils import eval_results
3334 from datalad.interface.common_opts import recursion_limit, recursion_flag
3435 from datalad.interface.common_opts import as_common_datasrc
3536 from datalad.interface.common_opts import publish_by_default
3839 from datalad.interface.common_opts import annex_wanted_opt
3940 from datalad.interface.common_opts import annex_group_opt
4041 from datalad.interface.common_opts import annex_groupwanted_opt
41 from datalad.interface.utils import eval_results
42 from datalad.interface.utils import build_doc
4342 from datalad.support.annexrepo import AnnexRepo
4443 from datalad.support.constraints import EnsureStr, EnsureNone, EnsureBool
4544 from datalad.support.constraints import EnsureChoice
171170 ssh("rm -rf {}".format(sh_quote(remoteds_path)))
172171 # if we succeeded in removing it
173172 path_exists = False
173 # Since it is gone now, git-annex also should forget about it
174 remotes = ds.repo.get_remotes()
175 if name in remotes:
176 # so we had this remote already, we should announce it dead
177 # XXX what if there was some kind of mismatch and this name
178 # isn't matching the actual remote UUID? should we have
179 # checked more carefully?
180 lgr.info(
181 "Announcing existing remote %s dead to annex and removing",
182 name
183 )
184 if isinstance(ds.repo, AnnexRepo):
185 ds.repo.set_remote_dead(name)
186 ds.repo.remove_remote(name)
174187 elif existing == 'reconfigure':
175188 lgr.info(_msg + " Will only reconfigure")
176189 only_reconfigure = True
716729 # DataLad
717730 #
718731 # (Re)generate meta-data for DataLad Web UI and possibly init new submodules
719 dsdir="{path}"
732 dsdir="$(dirname $0)/../.."
720733 logfile="$dsdir/{WEB_META_LOG}/{log_filename}"
721734
735 if [ ! -e "$dsdir/.git" ]; then
736 echo Assumption of being under .git has failed >&2
737 exit 1
738 fi
739
722740 mkdir -p "$dsdir/{WEB_META_LOG}" # assure logs directory exists
723741
724742 ( which datalad > /dev/null \
725 && ( cd ..; GIT_DIR="$PWD/.git" datalad ls -a --json file "$dsdir"; ) \
743 && ( cd "$dsdir"; GIT_DIR="$PWD/.git" datalad ls -a --json file .; ) \
726744 || echo "E: no datalad found - skipping generation of indexes for web frontend"; \
727745 ) &> "$logfile"
728746
729747 # Some submodules might have been added and thus we better init them
730 ( cd ..; git submodule update --init >> "$logfile" 2>&1 || : ; )
748 ( cd "$dsdir"; git submodule update --init || : ; ) >> "$logfile" 2>&1
731749 '''.format(WEB_META_LOG=WEB_META_LOG, **locals())
732750
733751 with make_tempfile(content=hook_content) as tempf:
2828 from datalad.support.constraints import EnsureChoice
2929 from datalad.support.exceptions import MissingExternalDependency
3030 from ..interface.base import Interface
31 from datalad.interface.utils import build_doc
31 from datalad.interface.base import build_doc
3232 from datalad.distribution.dataset import EnsureDataset, datasetmethod, \
3333 require_dataset, Dataset
3434 from datalad.distribution.siblings import Siblings
2525 from datalad.support.gitrepo import GitRepo
2626 from datalad.support.annexrepo import AnnexRepo
2727 from datalad.interface.base import Interface
28 from datalad.interface.utils import build_doc
28 from datalad.interface.base import build_doc
2929
3030 lgr = logging.getLogger('datalad.distribution.tests')
3131
3636 from datalad.interface.results import results_from_annex_noinfo
3737 from datalad.interface.utils import handle_dirty_dataset
3838 from datalad.interface.utils import eval_results
39 from datalad.interface.utils import build_doc
39 from datalad.interface.base import build_doc
4040
4141 lgr = logging.getLogger('datalad.distribution.drop')
4242
128128 before file content is dropped. As these checks could lead to slow
129129 operation (network latencies, etc), they can be disabled.
130130
131
132 Examples
133 --------
134
135 Drop all file content in a dataset::
136
137 ~/some/dataset$ datalad drop
138
139 Drop all file content in a dataset and all its subdatasets::
140
141 ~/some/dataset$ datalad drop --recursive
131 Examples:
132
133 Drop all file content in a dataset::
134
135 ~/some/dataset$ datalad drop
136
137 Drop all file content in a dataset and all its subdatasets::
138
139 ~/some/dataset$ datalad drop --recursive
142140
143141 """
144142 _action = 'drop'
2020 from datalad.interface.annotate_paths import AnnotatePaths
2121 from datalad.interface.annotate_paths import annotated2content_by_ds
2222 from datalad.interface.utils import eval_results
23 from datalad.interface.utils import build_doc
23 from datalad.interface.base import build_doc
2424 from datalad.interface.results import get_status_dict
2525 from datalad.interface.results import results_from_paths
2626 from datalad.interface.results import annexjson2result
2929 from datalad.interface.results import YieldDatasets
3030 from datalad.interface.results import is_result_matching_pathsource_argument
3131 from datalad.interface.utils import eval_results
32 from datalad.interface.utils import build_doc
32 from datalad.interface.base import build_doc
3333 from datalad.support.constraints import EnsureNone
3434 from datalad.support.constraints import EnsureStr
3535 from datalad.support.exceptions import InsufficientArgumentsError
1616 from os.path import sep as dirsep
1717
1818 from datalad.interface.base import Interface
19 from datalad.interface.utils import build_doc
19 from datalad.interface.base import build_doc
2020 from datalad.interface.utils import filter_unmodified
2121 from datalad.interface.common_opts import annex_copy_opts, recursion_flag, \
22 recursion_limit, git_opts, annex_opts
22 recursion_limit, git_opts, annex_opts, jobs_opt
2323 from datalad.interface.common_opts import missing_sibling_opt
2424 from datalad.support.param import Parameter
2525 from datalad.support.constraints import EnsureStr
2929 from datalad.support.exceptions import CommandError
3030
3131 from datalad.utils import assure_list
32 from datalad.dochelpers import exc_str
3233
3334 from .dataset import EnsureDataset
3435 from .dataset import Dataset
5960 return error
6061
6162
62 def _publish_dataset(ds, remote, refspec, paths, annex_copy_options, force=False):
63 def _publish_dataset(ds, remote, refspec, paths, annex_copy_options, force=False, jobs=None):
6364 # TODO: this setup is now quite ugly. The only way `refspec` can come
6465 # in, is when there is a tracking branch, and we get its state via
6566 # `refspec`
6667
68 is_annex_repo = isinstance(ds.repo, AnnexRepo)
69
6770 def _publish_data():
68 remote_wanted = ds.repo.get_preferred_content('wanted', remote)
69 if (paths or annex_copy_options or remote_wanted) and \
70 isinstance(ds.repo, AnnexRepo) and not \
71 ds.config.getbool(
72 'remote.{}'.format(remote),
73 'annex-ignore',
74 False):
71 if ds.repo.is_remote_annex_ignored(remote):
72 return [], [] # Cannot publish any data
73 try:
74 remote_wanted = ds.repo.get_preferred_content('wanted', remote)
75 except CommandError as exc:
76 if "cannot determine uuid" in str(exc):
77 if not ds.repo.is_remote_annex_ignored(remote):
78 lgr.warning(
79 "Annex failed to determine UUID, skipping publishing data for now: %s",
80 exc_str(exc)
81 )
82 return [], []
83 raise
84
85 if (paths or annex_copy_options or remote_wanted) and is_annex_repo:
7586 lgr.info("Publishing {0} data to {1}".format(ds, remote))
7687 # overwrite URL with pushurl if any, reason:
7788 # https://git-annex.branchable.com/bugs/annex_ignores_pushurl_and_uses_only_url_upon___34__copy_--to__34__/
98109 pblshd = ds.repo.copy_to(
99110 files=paths,
100111 remote=remote,
101 options=annex_copy_options_
112 options=annex_copy_options_,
113 jobs=jobs
102114 )
103115 # if ds.submodules:
104116 # # NOTE: we might need to init them on the remote, but needs to
148160 # there was no tracking branch, check the push target
149161 remote_branch_name = ds.repo.get_active_branch()
150162
151 if remote_branch_name in ds.repo.repo.remotes[remote].refs:
152 lgr.debug("Testing for changes with respect to '%s' of remote '%s'",
153 remote_branch_name, remote)
154 current_commit = ds.repo.repo.commit()
155 remote_ref = ds.repo.repo.remotes[remote].refs[remote_branch_name]
156 if paths:
157 # if there were custom paths, we will look at the diff
158 lgr.debug("Since paths provided, looking at diff")
159 diff = current_commit.diff(
160 remote_ref,
161 paths=paths
162 )
163 else:
164 # if commits differ at all
165 lgr.debug("Since no paths provided, comparing commits")
166 diff = current_commit != remote_ref.commit
167 else:
168 lgr.debug("Remote '%s' has no branch matching %r. Will publish",
169 remote, remote_branch_name)
170 # we don't have any remote state, need to push for sure
171 diff = True
163 diff = _get_remote_diff(ds, paths, None, remote, remote_branch_name)
164
165 # We might have got new information in git-annex branch although no other
166 # changes
167 if not diff and is_annex_repo:
168 try:
169 git_annex_commit = next(ds.repo.get_branch_commits('git-annex'))
170 except StopIteration:
171 git_annex_commit = None
172 diff = _get_remote_diff(ds, [], git_annex_commit, remote, 'git-annex')
173 if diff:
174 lgr.info("Will publish updated git-annex")
172175
173176 # # remote might be set to be ignored by annex, or we might not even know yet its uuid
174177 # annex_ignore = ds.config.getbool('remote.{}.annex-ignore'.format(remote), None)
177180 # if annex_uuid is None:
178181 # # most probably not yet 'known' and might require some annex
179182 knew_remote_uuid = None
180 if isinstance(ds.repo, AnnexRepo):
183 if is_annex_repo and not ds.repo.is_remote_annex_ignored(remote):
181184 try:
182185 ds.repo.get_preferred_content('wanted', remote) # could be just checking config.remote.uuid
183186 knew_remote_uuid = True
184187 except CommandError:
185188 knew_remote_uuid = False
189
186190 if knew_remote_uuid:
187191 # we can try publishing right away
188192 published += _publish_data()
206210 None,
207211 paths,
208212 annex_copy_options,
209 force=force)
213 force=force,
214 jobs=jobs
215 )
210216 published.extend(pblsh)
211217 skipped.extend(skp)
218
219 if is_annex_repo and \
220 ds.repo.is_special_annex_remote(remote):
221 # There is nothing else to "publish"
222 lgr.debug(
223 "{0} is a special annex remote, no git push is needed".format(remote)
224 )
225 return published, skipped
212226
213227 lgr.info("Publishing {0} to {1}".format(ds, remote))
214228
216230 # we need to annex merge first. Otherwise a git push might be
217231 # rejected if involving all matching branches for example.
218232 # Once at it, also push the annex branch right here.
219 if isinstance(ds.repo, AnnexRepo):
233 if is_annex_repo:
220234 lgr.debug("Obtain remote annex info from '%s'", remote)
221235 ds.repo.fetch(remote=remote)
222236 ds.repo.merge_annex(remote)
234248 current_branch = ds.repo.get_active_branch()
235249 if current_branch: # possibly make this conditional on a switch
236250 # TODO: this should become it own helper
237 if isinstance(ds.repo, AnnexRepo):
251 if is_annex_repo:
238252 # annex could manage this branch
239253 if current_branch.startswith('annex/direct') \
240254 and ds.config.getbool('annex', 'direct', default=False):
251265 # and thus probably broken -- test me!
252266 current_branch = match_adjusted.group(1)
253267 things2push.append(current_branch)
254 if isinstance(ds.repo, AnnexRepo):
268 if is_annex_repo:
255269 things2push.append('git-annex')
256270 # check that all our magic found valid branches
257271 things2push = [t for t in things2push if t in ds.repo.get_branches()]
273287
274288 published.append(ds)
275289
276 if knew_remote_uuid is False:
290 late_published_data = None
291 if knew_remote_uuid is False and is_annex_repo:
277292 # publish only after we tried to sync/push and if it was annex repo
278 published += _publish_data()
293 late_published_data = _publish_data()
294 published += late_published_data
295
296 # if we published something (data, subdatasets) even though there was no
297 # diff (thus no push), or additional data was published later
298 if ((not diff and published) or late_published_data) \
299 and is_annex_repo:
300 # we need to do the same annex merge dance and push updated git-annex
301 # and this way also trigger post-update hook which might update
302 # web UI meta-data
303 # https://github.com/datalad/datalad/issues/1658
304 lgr.info(
305 "Obtaining remote annex info from '%s' and pushing updated git-annex",
306 remote
307 )
308 ds.repo.fetch(remote=remote)
309 ds.repo.merge_annex(remote)
310 # this will trigger post-update hook if present
311 _log_push_info(ds.repo.push(remote=remote, refspec=['git-annex']))
312
279313 return published, skipped
314
315
316 def _get_remote_diff(ds, paths, current_commit, remote, remote_branch_name):
317 """Helper to check if remote has different state of the branch"""
318 if remote_branch_name in ds.repo.repo.remotes[remote].refs:
319 lgr.debug("Testing for changes with respect to '%s' of remote '%s'",
320 remote_branch_name, remote)
321 if current_commit is None:
322 current_commit = ds.repo.repo.commit()
323 remote_ref = ds.repo.repo.remotes[remote].refs[remote_branch_name]
324 if paths:
325 # if there were custom paths, we will look at the diff
326 lgr.debug("Since paths provided, looking at diff")
327 diff = current_commit.diff(
328 remote_ref,
329 paths=paths
330 )
331 else:
332 # if commits differ at all
333 lgr.debug("Since no paths provided, comparing commits")
334 diff = current_commit != remote_ref.commit
335 else:
336 lgr.debug("Remote '%s' has no branch matching %r. Will publish",
337 remote, remote_branch_name)
338 # we don't have any remote state, need to push for sure
339 diff = True
340
341 return diff
280342
281343
282344 @build_doc
365427 git_opts=git_opts,
366428 annex_opts=annex_opts,
367429 annex_copy_opts=annex_copy_opts,
430 jobs=jobs_opt,
368431 )
369432
370433 @staticmethod
381444 git_opts=None,
382445 annex_opts=None,
383446 annex_copy_opts=None,
447 jobs=None
384448 ):
385449
386450 # if ever we get a mode, for "with-data" we would need this
522586 refspec=remote_info.get('refspec', None),
523587 paths=content_by_ds[ds_path],
524588 annex_copy_options=annex_copy_opts,
525 force=force
589 force=force,
590 jobs=jobs
526591 )
527592 published.extend(pblsh)
528593 skipped.extend(skp)
3131 from datalad.interface.common_opts import recursion_flag
3232 from datalad.interface.utils import path_is_under
3333 from datalad.interface.utils import eval_results
34 from datalad.interface.utils import build_doc
34 from datalad.interface.base import build_doc
3535 from datalad.interface.results import get_status_dict
3636 from datalad.interface.save import Save
3737 from datalad.distribution.drop import _drop_files
6363 subdirectories within a dataset as always done automatically. An optional
6464 recursion limit is applied relative to each given input path.
6565
66 Examples
67 --------
68
69 Permanently remove a subdataset from a dataset and wipe out the subdataset
70 association too::
71
72 ~/some/dataset$ datalad remove somesubdataset1
66 Examples:
67
68 Permanently remove a subdataset from a dataset and wipe out the subdataset
69 association too::
70
71 ~/some/dataset$ datalad remove somesubdataset1
7372 """
7473 _action = 'remove'
7574
1717
1818 from datalad.interface.base import Interface
1919 from datalad.interface.utils import eval_results
20 from datalad.interface.utils import build_doc
20 from datalad.interface.base import build_doc
2121 from datalad.interface.results import get_status_dict
2222 from datalad.support.annexrepo import AnnexRepo
2323 from datalad.support.constraints import EnsureStr
288288 **dict(
289289 res,
290290 path=path,
291 with_annex='+' if 'annex-uuid' in res else '-',
291 with_annex='+' if 'annex-uuid' in res \
292 else ('-' if res.get('annex-ignore', None) else '?'),
292293 spec=spec)))
293294
294295
614615 if annex_description is not None:
615616 info['annex-description'] = annex_description
616617 if get_annex_info and isinstance(ds.repo, AnnexRepo):
617 for prop in ('wanted', 'required', 'group'):
618 var = ds.repo.get_preferred_content(
619 prop, '.' if remote == 'here' else remote)
620 if var:
621 info['annex-{}'.format(prop)] = var
622 groupwanted = ds.repo.get_groupwanted(remote)
623 if groupwanted:
624 info['annex-groupwanted'] = groupwanted
618 if not ds.repo.is_remote_annex_ignored(remote):
619 try:
620 for prop in ('wanted', 'required', 'group'):
621 var = ds.repo.get_preferred_content(
622 prop, '.' if remote == 'here' else remote)
623 if var:
624 info['annex-{}'.format(prop)] = var
625 groupwanted = ds.repo.get_groupwanted(remote)
626 if groupwanted:
627 info['annex-groupwanted'] = groupwanted
628 except CommandError as exc:
629 if 'cannot determine uuid' in str(exc):
630 # not an annex (or no connection), would be marked as
631 # annex-ignore
632 msg = "Failed to determine if %s carries annex." % remote
633 ds.repo.config.reload()
634 if ds.repo.is_remote_annex_ignored(remote):
635 msg += " Remote was marked by annex as annex-ignore. " \
636 "Edit .git/config to reset if you think that was done by mistake due to absent connection etc"
637 lgr.warning(msg)
638 info['annex-ignore'] = True
639 else:
640 raise
641 else:
642 info['annex-ignore'] = True
625643
626644 info['status'] = 'ok'
627645 yield info
2222
2323 from datalad.interface.base import Interface
2424 from datalad.interface.utils import eval_results
25 from datalad.interface.utils import build_doc
25 from datalad.interface.base import build_doc
2626 from datalad.interface.results import get_status_dict
2727 from datalad.support.constraints import EnsureBool
2828 from datalad.support.constraints import EnsureStr
8989 if arg[0] == test_list_4:
9090 result = ds.add('dir', to_git=arg[1], save=False)
9191 else:
92 result = ds.add(arg[0], to_git=arg[1], save=False, result_xfm='relpaths',
92 result = ds.add(arg[0], to_git=arg[1], save=False,
93 result_xfm='relpaths',
9394 return_type='item-or-list')
9495 # order depends on how annex processes it, so let's sort
9596 eq_(sorted(result), sorted(arg[0]))
102103 # ignore the initial config file in index:
103104 indexed.remove(opj('.datalad', 'config'))
104105 indexed.remove(opj('.datalad', '.gitattributes'))
106 indexed.remove('.gitattributes')
105107 if isinstance(arg[0], list):
106108 for x in arg[0]:
107109 unstaged.remove(x)
306308 @with_tree(tree={
307309 'file.txt': 'some text',
308310 'empty': '',
311 'file2.txt': 'some text to go to annex',
309312 '.gitattributes': '* annex.largefiles=(not(mimetype=text/*))'}
310313 )
311314 def test_add_mimetypes(path):
318321 ds.repo.commit('added attributes to git explicitly')
319322 # now test that those files will go into git/annex correspondingly
320323 __not_tested__ = ds.add(['file.txt', 'empty'])
321 ok_clean_git(path)
324 ok_clean_git(path, untracked=['file2.txt'])
322325 # Empty one considered to be application/octet-stream i.e. non-text
323326 ok_file_under_git(path, 'empty', annexed=True)
324327 ok_file_under_git(path, 'file.txt', annexed=False)
328
329 # But we should be able to force adding file to annex when desired
330 ds.add('file2.txt', to_git=False)
331 ok_file_under_git(path, 'file2.txt', annexed=True)
1414 from os.path import exists
1515 from os.path import basename
1616 from os.path import dirname
17 from os import mkdir
18 from os import chmod
19 from os import geteuid
1720
1821 from mock import patch
1922
2023 from datalad.api import create
2124 from datalad.api import clone
2225 from datalad.utils import chpwd
26 from datalad.utils import _path_
27 from datalad.utils import rmtree
2328 from datalad.support.exceptions import IncompleteResultsError
2429 from datalad.support.gitrepo import GitRepo
2530 from datalad.support.annexrepo import AnnexRepo
4449 from datalad.tests.utils import serve_path_via_http
4550 from datalad.tests.utils import use_cassette
4651 from datalad.tests.utils import skip_if_no_network
52 from datalad.tests.utils import skip_if_on_windows
53 from datalad.tests.utils import skip_if
4754
4855 from ..dataset import Dataset
4956
308315 assert clonedsub.path.startswith(path)
309316 # no subdataset relation
310317 eq_(cloned.subdatasets(), [])
318
319
320 @skip_if_on_windows
321 @skip_if(not geteuid(), "Will fail under super-user")
322 @with_tempfile(mkdir=True)
323 def test_clone_report_permission_issue(tdir):
324 pdir = _path_(tdir, 'protected')
325 mkdir(pdir)
326 # make it read-only
327 chmod(pdir, 0o555)
328 with chpwd(pdir):
329 res = clone('///', result_xfm=None, return_type='list', on_failure='ignore')
330 assert_status('error', res)
331 assert_result_count(
332 res, 1, status='error',
333 message="could not create work tree dir '%s/datasets.datalad.org': Permission denied" % pdir)
1010
1111 import os
1212 from os.path import join as opj
13 from os.path import lexists
1314
1415 from ..dataset import Dataset
1516 from datalad.api import create
1617 from datalad.utils import chpwd
18 from datalad.utils import _path_
1719 from datalad.cmd import Runner
1820
1921 from datalad.tests.utils import with_tempfile
22 from datalad.tests.utils import create_tree
2023 from datalad.tests.utils import eq_
2124 from datalad.tests.utils import ok_
2225 from datalad.tests.utils import assert_not_in
2730 from datalad.tests.utils import assert_in_results
2831 from datalad.tests.utils import ok_clean_git
2932 from datalad.tests.utils import with_tree
33 from datalad.tests.utils import ok_file_has_content
34 from datalad.tests.utils import ok_file_under_git
3035
3136
3237 _dataset_hierarchy_template = {
253258 # is committed -- ds2 is already known to git and it just pukes with a bit
254259 # confusing 'ds2' already exists in the index
255260 assert_in('ds2', ds1.subdatasets(result_xfm='relpaths'))
261
262
263 @with_tempfile(mkdir=True)
264 def test_create_withplugin(path):
265 # first without
266 ds = create(path)
267 assert(not lexists(opj(ds.path, 'README.rst')))
268 ds.remove()
269 assert(not lexists(ds.path))
270 # now for reals...
271 ds = create(
272 # needs to identify the dataset, otherwise post-proc
273 # plugin doesn't no what to run on
274 dataset=path,
275 run_after=[['add_readme', 'filename=with hole.txt']])
276 ok_clean_git(path)
277 # README will end up in annex by default
278 # TODO implement `nice_dataset` plugin to give sensible
279 # default and avoid that
280 assert(lexists(opj(ds.path, 'with hole.txt')))
281
282
283 @with_tempfile(mkdir=True)
284 def test_create_text_no_annex(path):
285 ds = create(path, text_no_annex=True)
286 ok_clean_git(path)
287 import re
288 ok_file_has_content(
289 _path_(path, '.gitattributes'),
290 content='\* annex\.largefiles=\(not\(mimetype=text/\*\)\)',
291 re_=True,
292 match=False,
293 flags=re.MULTILINE
294 )
295 # and check that it is really committing text files to git and binaries
296 # to annex
297 create_tree(path,
298 {
299 't': 'some text',
300 'b': '' # empty file is not considered to be a text file
301 # should we adjust the rule to consider only non empty files?
302 }
303 )
304 ds.add(['t', 'b'])
305 ok_file_under_git(path, 't', annexed=False)
306 ok_file_under_git(path, 'b', annexed=True)
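
For reference, the behaviour exercised by this test corresponds roughly to the following command-line invocation (a sketch; `new_dataset` is a placeholder path):

    ~/some$ datalad create --text-no-annex new_dataset

after which files that `file --mime-type` identifies as text go directly to git, while binary files (including empty files, as the test notes) remain under annex.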
1616
1717 from ..dataset import Dataset
1818 from datalad.api import publish, install, create_sibling
19 from datalad.cmd import Runner
1920 from datalad.utils import chpwd
2021 from datalad.tests.utils import create_tree
2122 from datalad.support.gitrepo import GitRepo
3233 from datalad.tests.utils import assert_raises
3334 from datalad.tests.utils import skip_ssh
3435 from datalad.tests.utils import assert_dict_equal
36 from datalad.tests.utils import assert_false
3537 from datalad.tests.utils import assert_set_equal
3638 from datalad.tests.utils import assert_result_count
3739 from datalad.tests.utils import assert_not_equal
7274 assert_false(exists(opj(target_path, path)))
7375
7476 hook_path = _path_(target_path, '.git/hooks/post-update')
75 ok_file_has_content(hook_path,
76 '.*\ndsdir="%s"\n.*' % target_path,
77 re_=True,
78 flags=re.DOTALL)
77 # No longer the case -- we are no longer using absolute path in the
78 # script
79 # ok_file_has_content(hook_path,
80 # '.*\ndsdir="%s"\n.*' % target_path,
81 # re_=True,
82 # flags=re.DOTALL)
83 # No absolute path (so dataset could be moved) in the hook
84 with open(hook_path) as f:
85 assert_not_in(target_path, f.read())
7986 # correct ls_json command in hook content (path wrapped in "quotes)
8087 ok_file_has_content(hook_path,
81 '.*datalad ls -a --json file "\$dsdir".*',
88 '.*datalad ls -a --json file \..*',
8289 re_=True,
8390 flags=re.DOTALL)
8491
418425
419426 @skip_ssh
420427 @with_tempfile(mkdir=True)
428 @with_tempfile
429 def test_replace_and_relative_sshpath(src_path, dst_path):
430 # We need to come up with the path relative to our current home directory
431 # https://github.com/datalad/datalad/issues/1653
432 dst_relpath = os.path.relpath(dst_path, os.path.expanduser('~'))
433 url = 'localhost:%s' % dst_relpath
434 ds = Dataset(src_path).create()
435 create_tree(ds.path, {'sub.dat': 'lots of data'})
436 ds.add('sub.dat')
437
438 ds.create_sibling(url)
439 published = ds.publish('.', to='localhost')
440 assert_in('sub.dat', published[0])
441 # verify that hook runs and there is nothing in stderr
442 # since it exits with 0 even if there was a problem
443 out, err = Runner(cwd=opj(dst_path, '.git'))(_path_('hooks/post-update'))
444 assert_false(out)
445 assert_false(err)
446
447 # Verify that we could replace and publish no problem
448 # https://github.com/datalad/datalad/issues/1656
449 # Strangely it spits out IncompleteResultsError exception atm... so just
450 # checking that it fails somehow
451 assert_raises(Exception, ds.create_sibling, url)
452 ds.create_sibling(url, existing='replace')
453 published2 = ds.publish('.', to='localhost')
454 assert_in('sub.dat', published2[0])
455
456 # and one more test since in above test it would not puke ATM but just
457 # not even try to copy since it assumes that file is already there
458 create_tree(ds.path, {'sub2.dat': 'more data'})
459 ds.add('sub2.dat')
460 published3 = ds.publish(to='localhost') # we publish just git
461 assert_not_in('sub2.dat', published3[0])
462 # now publish "with" data, which should also trigger the hook!
463 # https://github.com/datalad/datalad/issues/1658
464 from glob import glob
465 from datalad.consts import WEB_META_LOG
466 logs_prior = glob(_path_(dst_path, WEB_META_LOG, '*'))
467 published4 = ds.publish('.', to='localhost')
468 assert_in('sub2.dat', published4[0])
469 logs_post = glob(_path_(dst_path, WEB_META_LOG, '*'))
470 eq_(len(logs_post), len(logs_prior) + 1)
471
472
473 @skip_ssh
474 @with_tempfile(mkdir=True)
421475 @with_tempfile(suffix="target")
422476 def _test_target_ssh_inherit(standardgroup, src_path, target_path):
423477 ds = Dataset(src_path).create()
2727 from datalad.tests.utils import assert_raises
2828 from datalad.tests.utils import assert_false
2929 from datalad.tests.utils import assert_result_count
30 from datalad.tests.utils import neq_
3031 from datalad.tests.utils import ok_clean_git
3132 from datalad.tests.utils import swallow_logs
3233 from datalad.tests.utils import create_tree
6162 name='target1')
6263 # source.publish(to='target1')
6364 with chpwd(p1):
64 # since we have only a single commit -- there is no HEAD^
65 assert_raises(ValueError, publish, to='target1', since='HEAD^')
65 # since we have only two commits (set backend, init dataset)
66 # -- there is no HEAD^^
67 assert_raises(ValueError, publish, to='target1', since='HEAD^^')
6668 # but now let's add one more commit, we should be able to publish
6769 source.repo.commit("msg", options=['--allow-empty'])
6870 publish(to='target1', since='HEAD^') # must not fail now
131133
132134
133135 @with_testrepos('submodule_annex', flavors=['local'])
134 @with_tempfile(mkdir=True)
135 @with_tempfile(mkdir=True)
136 @with_tempfile(mkdir=True)
137 @with_tempfile(mkdir=True)
138 def test_publish_recursive(origin, src_path, dst_path, sub1_pub, sub2_pub):
139
136 @with_tempfile
137 @with_tempfile(mkdir=True)
138 @with_tempfile(mkdir=True)
139 @with_tempfile(mkdir=True)
140 @with_tempfile(mkdir=True)
141 def test_publish_recursive(pristine_origin, origin_path, src_path, dst_path, sub1_pub, sub2_pub):
142
143 # we will be publishing back to origin, so to not alter testrepo
144 # we will first clone it
145 origin = install(origin_path, source=pristine_origin, recursive=True)
140146 # prepare src
141 source = install(src_path, source=origin, recursive=True)
147 source = install(src_path, source=origin_path, recursive=True)
142148
143149 # create plain git at target:
144150 target = GitRepo(dst_path, create=True)
193199 eq_(list(sub2_target.get_branch_commits("git-annex")),
194200 list(sub2.get_branch_commits("git-annex")))
195201
202 # we are tracking origin but origin has different git-annex, since we
203 # cloned from it, so it is not aware of our git-annex
204 neq_(list(origin.repo.get_branch_commits("git-annex")),
205 list(source.repo.get_branch_commits("git-annex")))
206 # So if we first publish to it recursively, we would update
207 # all sub-datasets since git-annex branch would need to be pushed
208 res_ = publish(dataset=source, recursive=True)
209 eq_(set(r.path for r in res_[0]),
210 set(opj(*([source.path] + x)) for x in ([], ['subm 1'], ['subm 2'])))
211 # and now should carry the same state for git-annex
212 eq_(list(origin.repo.get_branch_commits("git-annex")),
213 list(source.repo.get_branch_commits("git-annex")))
214
196215 # test for publishing with --since. By default since no changes, nothing pushed
197216 res_ = publish(dataset=source, recursive=True)
198217 eq_(set(r.path for r in res_[0]), set())
335354 # before
336355 eq_({sub1.path, sub2.path},
337356 set(result_paths))
357
358 # if we publish again -- nothing to be published
359 eq_(source.publish(to="target"), ([], []))
360 # if we drop a file and publish again -- dataset should be published
361 # since git-annex branch was updated
362 source.drop('test-annex.dat')
363 eq_(source.publish(to="target"), ([source], []))
364 eq_(source.publish(to="target"), ([], [])) # and empty again if we try again
338365
339366
340367 @skip_ssh
2929 from datalad.interface.common_opts import recursion_flag
3030 from datalad.interface.utils import path_is_under
3131 from datalad.interface.utils import eval_results
32 from datalad.interface.utils import build_doc
32 from datalad.interface.base import build_doc
3333 from datalad.interface.utils import handle_dirty_dataset
3434 from datalad.interface.results import get_status_dict
3535 from datalad.utils import rmtree
9393 subdirectories within a dataset as always done automatically. An optional
9494 recursion limit is applied relative to each given input path.
9595
96 Examples
97 --------
96 Examples:
9897
99 Uninstall a subdataset (undo installation)::
98 Uninstall a subdataset (undo installation)::
10099
101 ~/some/dataset$ datalad uninstall somesubdataset1
100 ~/some/dataset$ datalad uninstall somesubdataset1
102101
103102 """
104103 _action = 'uninstall'
1717
1818 from datalad.interface.base import Interface
1919 from datalad.interface.utils import eval_results
20 from datalad.interface.utils import build_doc
20 from datalad.interface.base import build_doc
2121 from datalad.interface.results import get_status_dict
2222 from datalad.support.constraints import EnsureStr
2323 from datalad.support.constraints import EnsureNone
187187 try:
188188 key = self._bucket.get_key(url_filepath, version_id=params.get('versionId', None))
189189 except S3ResponseError as e:
190 raise DownloadError("S3 refused to provide the key for %s from url %s: %s"
190 raise TargetFileAbsent("S3 refused to provide the key for %s from url %s: %s"
191191 % (url_filepath, url, e))
192192 if key is None:
193193 raise TargetFileAbsent("No key returned for %s from url %s" % (url_filepath, url))
+0
-114
datalad/export/__init__.py
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """
9
10 """
11
12 __docformat__ = 'restructuredtext'
13
14 import logging
15 from glob import glob
16 from os.path import join as opj, basename, dirname
17 from importlib import import_module
18
19 from datalad.support.param import Parameter
20 from datalad.support.constraints import EnsureNone
21 from datalad.distribution.dataset import EnsureDataset
22 from datalad.distribution.dataset import datasetmethod
23 from datalad.distribution.dataset import require_dataset
24 from datalad.dochelpers import exc_str
25
26 from datalad.interface.base import Interface
27 from datalad.interface.utils import build_doc
28
29 lgr = logging.getLogger('datalad.export')
30
31
32 def _get_exporter_names():
33 basepath = dirname(__file__)
34 return [basename(e)[:-3]
35 for e in glob(opj(basepath, '*.py'))
36 if not e.endswith('__init__.py')]
37
38
39 @build_doc
40 class Export(Interface):
41 """Export a dataset to another representation
42 """
43 # XXX prevent common args from being added to the docstring
44 _no_eval_results = True
45
46 _params_ = dict(
47 dataset=Parameter(
48 args=("-d", "--dataset"),
49 doc="""specify the dataset to export. If
50 no dataset is given, an attempt is made to identify the dataset
51 based on the current working directory.""",
52 constraints=EnsureDataset() | EnsureNone()),
53 astype=Parameter(
54 args=("astype",),
55 choices=_get_exporter_names(),
56 doc="""label of the type or format the dataset shall be exported
57 to."""),
58 output=Parameter(
59 args=('-o', '--output'),
60 doc="""output destination specification to be passes to the exporter.
61 The particular semantics of the option value depend on the actual
62 exporter. Typically, this will be a file name or a path to a
63 directory."""),
64 getcmdhelp=Parameter(
65 args=('--help-type',),
66 dest='getcmdhelp',
67 action='store_true',
68 doc="""show help for a specific export type/format"""),
69 )
70
71 @staticmethod
72 @datasetmethod(name='export')
73 def __call__(astype, dataset, getcmdhelp=False, output=None, **kwargs):
74 # get a handle on the relevant plugin module
75 import datalad.export as export_mod
76 try:
77 exmod = import_module('.%s' % (astype,), package=export_mod.__package__)
78 except ImportError as e:
79 raise ValueError("cannot load exporter '{}': {}".format(
80 astype, exc_str(e)))
81 if getcmdhelp:
82 # no result, but return the module to make the renderer do the rest
83 return (exmod, None)
84
85 ds = require_dataset(dataset, check_installed=True, purpose='exporting')
86 # call the plugin, either with the argv array from the cmdline call
87 # or directly with the kwargs
88 if 'datalad_unparsed_args' in kwargs:
89 result = exmod._datalad_export_plugin_call(
90 ds, argv=kwargs['datalad_unparsed_args'], output=output)
91 else:
92 result = exmod._datalad_export_plugin_call(
93 ds, output=output, **kwargs)
94 return (exmod, result)
95
96 @staticmethod
97 def result_renderer_cmdline(res, args):
98 exmod, result = res
99 if args.getcmdhelp:
100 # the function that prints the help was returned as result
101 if not hasattr(exmod, '_datalad_get_cmdline_help'):
102 lgr.error("export plugin '{}' does not provide help".format(exmod))
103 return
104 replacement = []
105 help = exmod._datalad_get_cmdline_help()
106 if isinstance(help, tuple):
107 help, replacement = help
108 if replacement:
109 for in_s, out_s in replacement:
110 help = help.replace(in_s, out_s + ' ' * max(0, len(in_s) - len(out_s)))
111 print(help)
112 return
113 # TODO call exporter function (if any)
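
The whole `datalad/export/` tree removed here is superseded by the plugin mechanism added further down in this diff; the tarball exporter lives on as the `export_tarball` plugin. A hedged sketch of the replacement call (the output name is a placeholder):

    # command-line equivalent: datalad plugin export_tarball output=myexport
    from datalad.api import Dataset

    Dataset('.').plugin(['export_tarball', ('output', 'myexport')])
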
+0
-89
datalad/export/tarball.py
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """
9
10 """
11
12 __docformat__ = 'restructuredtext'
13
14 import logging
15 import tarfile
16 import os
17
18 from mock import patch
19 from os.path import join as opj, dirname, normpath, isabs
20 from datalad.support.annexrepo import AnnexRepo
21 from datalad.utils import file_basename
22
23 lgr = logging.getLogger('datalad.export.tarball')
24
25
26 # PLUGIN API
27 def _datalad_export_plugin_call(dataset, output, argv=None):
28 if argv:
29 lgr.warn("tarball exporter ignores any additional options '{}'".format(
30 argv))
31
32 repo = dataset.repo
33 committed_date = repo.get_committed_date()
34
35 # could be used later on to filter files by some criterion
36 def _filter_tarinfo(ti):
37 # Reset the date to match the one of the last commit, not from the
38 # filesystem since git doesn't track those at all
39 # TODO: use the date of the last commit when any particular
40 # file was changed -- would be the most kosher yoh thinks to the
41 # degree of our abilities
42 ti.mtime = committed_date
43 return ti
44
45 if output is None:
46 output = "datalad_{}.tar.gz".format(dataset.id)
47 else:
48 if not output.endswith('.tar.gz'):
49 output += '.tar.gz'
50
51 root = dataset.path
52 # use dir inside matching the output filename
53 # TODO: could be an option to the export plugin allowing empty value
54 # for no leading dir
55 leading_dir = file_basename(output)
56
57 # workaround for inability to pass down the time stamp
58 with patch('time.time', return_value=committed_date), \
59 tarfile.open(output, "w:gz") as tar:
60 repo_files = sorted(repo.get_indexed_files())
61 if isinstance(repo, AnnexRepo):
62 annexed = repo.is_under_annex(
63 repo_files, allow_quick=True, batch=True)
64 else:
65 annexed = [False] * len(repo_files)
66 for i, rpath in enumerate(repo_files):
67 fpath = opj(root, rpath)
68 if annexed[i]:
69 # resolve to possible link target
70 link_target = os.readlink(fpath)
71 if not isabs(link_target):
72 link_target = normpath(opj(dirname(fpath), link_target))
73 fpath = link_target
74 # name in the tarball
75 aname = normpath(opj(leading_dir, rpath))
76 tar.add(
77 fpath,
78 arcname=aname,
79 recursive=False,
80 filter=_filter_tarinfo)
81
82 # I think it might better return "final" filename where stuff was saved
83 return output
84
85
86 # PLUGIN API
87 def _datalad_get_cmdline_help():
88 return 'Just call it, and it will produce a tarball.'
+0
-13
datalad/export/tests/__init__.py
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """Interfaces tests
9
10 """
11
12 __docformat__ = 'restructuredtext'
+0
-87
datalad/export/tests/test_tarball.py
0 # emacs: -*- mode: python-mode; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # -*- coding: utf-8 -*-
2 # ex: set sts=4 ts=4 sw=4 noet:
3 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
4 #
5 # See COPYING file distributed along with the datalad package for the
6 # copyright and license terms.
7 #
8 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
9 """Test tarball exporter"""
10
11 import os
12 import time
13 from os.path import join as opj
14 from os.path import isabs
15 import tarfile
16
17 from datalad.api import Dataset
18 from datalad.api import export
19 from datalad.utils import chpwd
20 from datalad.utils import md5sum
21
22 from datalad.tests.utils import with_tree
23 from datalad.tests.utils import ok_startswith
24 from datalad.tests.utils import assert_true, assert_not_equal, assert_raises, \
25 assert_false, assert_equal
26
27
28 _dataset_template = {
29 'ds': {
30 'file_up': 'some_content',
31 'dir': {
32 'file1_down': 'one',
33 'file2_down': 'two'}}}
34
35
36 @with_tree(_dataset_template)
37 def test_failure(path):
38 ds = Dataset(opj(path, 'ds')).create(force=True)
39 # unknown exporter
40 assert_raises(ValueError, ds.export, 'nah')
41 # non-existing dataset
42 assert_raises(ValueError, export, 'tarball', Dataset('nowhere'))
43
44
45 @with_tree(_dataset_template)
46 def test_tarball(path):
47 ds = Dataset(opj(path, 'ds')).create(force=True)
48 ds.add('.')
49 committed_date = ds.repo.get_committed_date()
50 with chpwd(path):
51 _mod, tarball1 = ds.export('tarball')
52 assert(not isabs(tarball1))
53 tarball1 = opj(path, tarball1)
54 default_outname = opj(path, 'datalad_{}.tar.gz'.format(ds.id))
55 assert_equal(tarball1, default_outname)
56 assert_true(os.path.exists(default_outname))
57 custom_outname = opj(path, 'myexport.tar.gz')
58 # feed in without extension
59 ds.export('tarball', output=custom_outname[:-7])
60 assert_true(os.path.exists(custom_outname))
61 custom1_md5 = md5sum(custom_outname)
62 # encodes the original tarball filename -> different checksum, despit
63 # same content
64 assert_not_equal(md5sum(default_outname), custom1_md5)
65 # should really sleep so if they stop using time.time - we know
66 time.sleep(1.1)
67 ds.export('tarball', output=custom_outname)
68 # should not encode mtime, so should be identical
69 assert_equal(md5sum(custom_outname), custom1_md5)
70
71 def check_contents(outname, prefix):
72 with tarfile.open(outname) as tf:
73 nfiles = 0
74 for ti in tf:
75 # any annex links resolved
76 assert_false(ti.issym())
77 ok_startswith(ti.name, prefix + '/')
78 assert_equal(ti.mtime, committed_date)
79 if '.datalad' not in ti.name:
80 # ignore any files in .datalad for this test to not be
81 # susceptible to changes in how much we generate a meta info
82 nfiles += 1
83 # we have exactly three files, and expect no content for any directory
84 assert_equal(nfiles, 3)
85 check_contents(default_outname, 'datalad_%s' % ds.id)
86 check_contents(custom_outname, 'myexport')
4040 'create-sibling-github'),
4141 ('datalad.interface.unlock', 'Unlock', 'unlock'),
4242 ('datalad.interface.save', 'Save', 'save'),
43 ('datalad.export', 'Export', 'export'),
43 ('datalad.plugin', 'Plugin', 'plugin'),
4444 ])
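
Swapping the interface registry entry from `Export` to `Plugin` makes `plugin` the generated top-level API function, while `export` disappears. A short sketch, assuming the current working directory is a dataset:

    from datalad.api import plugin

    plugin()                     # no arguments: print the available plugins
    plugin(['export_tarball'])   # run one of them against the dataset detected from cwd
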
4545
4646 _group_metadata = (
2727 from os.path import normpath
2828
2929 from .base import Interface
30 from datalad.interface.utils import build_doc
30 from datalad.interface.base import build_doc
3131 from .common_opts import allow_dirty
3232 from ..consts import ARCHIVES_SPECIAL_REMOTE
3333 from ..support.param import Parameter
2424
2525 from datalad.interface.base import Interface
2626 from datalad.interface.utils import eval_results
27 from datalad.interface.utils import build_doc
27 from datalad.interface.base import build_doc
2828 from datalad.interface.results import get_status_dict
2929 from datalad.support.constraints import EnsureStr
3030 from datalad.support.constraints import EnsureBool
2323 from ..ui import ui
2424 from ..dochelpers import exc_str
2525
26 from datalad.interface.common_opts import eval_params
27 from datalad.interface.common_opts import eval_defaults
2628 from datalad.support.exceptions import InsufficientArgumentsError
2729 from datalad.utils import with_pathsep as _with_sep
2830 from datalad.support.constraints import EnsureKeyChoice
2931 from datalad.distribution.dataset import Dataset
3032 from datalad.distribution.dataset import resolve_path
33
34
35 default_logchannels = {
36 '': 'debug',
37 'ok': 'debug',
38 'notneeded': 'debug',
39 'impossible': 'warning',
40 'error': 'error',
41 }
3142
3243
3344 def get_api_name(intfspec):
241252 # assign the amended docs
242253 func.__doc__ = doc
243254 return func
255
256
257 def build_doc(cls, **kwargs):
258 """Decorator to build docstrings for datalad commands
259
260 It's intended to decorate the class, the __call__-method of which is the
261 actual command. It expects that __call__-method to be decorated by
262 eval_results.
263
264 Parameters
265 ----------
266 cls: Interface
267 class defining a datalad command
268 """
269
270 # Note, that this is a class decorator, which is executed only once when the
271 # class is imported. It builds the docstring for the class' __call__ method
272 # and returns the original class.
273 #
274 # This is because a decorator for the actual function would not be able to
275 # behave like this. To build the docstring we need to access the attribute
276 # _params of the class. From within a function decorator we cannot do this
277 # during import time, since the class is being built in this very moment and
278 # is not yet available in the module. And if we do it from within the part
279 # of a function decorator, that is executed when the function is called, we
280 # would need to actually call the command once in order to build this
281 # docstring.
282
283 lgr.debug("Building doc for {}".format(cls))
284
285 cls_doc = cls.__doc__
286 if hasattr(cls, '_docs_'):
287 # expand docs
288 cls_doc = cls_doc.format(**cls._docs_)
289
290 call_doc = None
291 # suffix for update_docstring_with_parameters:
292 if cls.__call__.__doc__:
293 call_doc = cls.__call__.__doc__
294
295 # build standard doc and insert eval_doc
296 spec = getattr(cls, '_params_', dict())
297 # get docs for eval_results parameters:
298 spec.update(eval_params)
299
300 update_docstring_with_parameters(
301 cls.__call__, spec,
302 prefix=alter_interface_docs_for_api(cls_doc),
303 suffix=alter_interface_docs_for_api(call_doc),
304 add_args=eval_defaults if not hasattr(cls, '_no_eval_results') else None
305 )
306
307 # return original
308 return cls
244309
245310
246311 class Interface(object):
323388 'AddArchiveContent', 'AggregateMetaData',
324389 'CrawlInit', 'Crawl', 'CreateSiblingGithub',
325390 'CreateTestDataset', 'DownloadURL', 'Export', 'Ls', 'Move',
326 'Publish', 'SSHRun', 'Search'):
391 'Publish', 'SSHRun', 'Search', 'Test'):
327392 # set all common args explicitly to override class defaults
328393 # that are tailored towards the Python API
329394 kwargs['return_type'] = 'generator'
482547 return content_by_ds, unavailable_paths
483548
484549
485 def merge_allargs2kwargs(call, args, kwargs):
486 """Generate a kwargs dict from a call signature and *args, **kwargs"""
550 def get_allargs_as_kwargs(call, args, kwargs):
551 """Generate a kwargs dict from a call signature and *args, **kwargs
552
553 Basically resolving the argnames for all positional arguments, and
554 resolving the defaults for all kwargs that are not given in a kwargs
555 dict
556 """
487557 from inspect import getargspec
488558 argspec = getargspec(call)
489559 defaults = argspec.defaults
498568 kwargs_[k] = v
499569 # update with provided kwarg args
500570 kwargs_.update(kwargs)
501 assert (nargs == len(kwargs_))
571 # XXX we cannot assert the following, because our own highlevel
572 # API commands support more kwargs than what is discoverable
573 # from their signature...
574 #assert (nargs == len(kwargs_))
502575 return kwargs_
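
The renamed helper folds positional arguments and signature defaults into a single kwargs dict, so that result filters and the new pre/post plugins can see every effective argument of a call. A self-contained illustration of the idea (not the datalad function itself):

    from inspect import getargspec

    def as_kwargs(call, args, kwargs):
        spec = getargspec(call)
        defaults = spec.defaults or ()
        # defaults align with the trailing argument names
        merged = dict(zip(spec.args[len(spec.args) - len(defaults):], defaults))
        # positional values and explicit keyword arguments override them
        merged.update(zip(spec.args, args))
        merged.update(kwargs)
        return merged

    def demo(path, recursive=False, jobs=None):
        pass

    as_kwargs(demo, ('.',), {'jobs': 2})
    # -> {'path': '.', 'recursive': False, 'jobs': 2}  (key order may vary)
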
2626 from datalad.interface.common_opts import recursion_limit
2727 from datalad.interface.results import get_status_dict
2828 from datalad.interface.utils import eval_results
29 from datalad.interface.utils import build_doc
29 from datalad.interface.base import build_doc
3030
3131 from logging import getLogger
3232 lgr = getLogger('datalad.api.clean')
1212 __docformat__ = 'restructuredtext'
1313
1414 from appdirs import AppDirs
15 from os.path import join as opj
1516 from datalad.support.constraints import EnsureBool
1617 from datalad.support.constraints import EnsureInt
1718
6768 'destination': 'global',
6869 'default': dirs.user_cache_dir,
6970 },
71 'datalad.locations.system-plugins': {
72 'ui': ('question', {
73 'title': 'System plugin directory',
74 'text': 'Where should datalad search for system plugins?'}),
75 'destination': 'global',
76 'default': opj(dirs.site_config_dir, 'plugins'),
77 },
78 'datalad.locations.user-plugins': {
79 'ui': ('question', {
80 'title': 'User plugin directory',
81 'text': 'Where should datalad search for user plugins?'}),
82 'destination': 'global',
83 'default': opj(dirs.user_config_dir, 'plugins'),
84 },
7085 'datalad.exc.str.tblimit': {
7186 'ui': ('question', {
7287 'title': 'This flag is used by the datalad extract_tb function which extracts and formats stack-traces. It caps the number of lines to DATALAD_EXC_STR_TBLIMIT of pre-processed entries from traceback.'}),
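
The two new settings define where `datalad plugin` looks for system-wide and per-user plugins, in addition to the ones shipped with DataLad. Querying them goes through the same `cfg.obtain()` call the plugin loader uses (a sketch; the printed values depend on the platform's appdirs defaults):

    from datalad import cfg

    print(cfg.obtain('datalad.locations.system-plugins'))  # e.g. <site config dir>/plugins
    print(cfg.obtain('datalad.locations.user-plugins'))    # e.g. <user config dir>/plugins
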
1111
1212 __docformat__ = 'restructuredtext'
1313
14 from datalad.interface.results import known_result_xfms
1415 from datalad.support.param import Parameter
1516 from datalad.support.constraints import EnsureInt, EnsureNone, EnsureStr
1617 from datalad.support.constraints import EnsureChoice
18 from datalad.support.constraints import EnsureCallable
1719
1820
1921 location_description = Parameter(
214216 By default it would fail the run ('fail' setting). With 'inherit' a
215217 'create-sibling' with '--inherit-settings' will be used to create sibling
216218 on the remote. With 'skip' - it simply will be skipped.""")
219
220 with_plugin_opt = Parameter(
221 args=('--with-plugin',),
222 nargs='*',
223 action='append',
224 metavar='PLUGINSPEC',
225 doc="""DataLad plugin to run in addition. PLUGINSPEC is a list
226 comprised of a plugin name plus optional `key=value` pairs with arguments
227 for the plugin call (see `plugin` command documentation for details).
228 [PY: PLUGINSPECs must be wrapped in a list where each item configures
229 one plugin call. Plugins are called in the order defined by this list.
230 PY][CMD: This option can be given more than once to run multiple plugins
231 in the order in which they are given. CMD]""")
232
233 # define parameters to be used by eval_results to tune behavior
234 # Note: This is done outside eval_results in order to be available when building
235 # docstrings for the decorated functions
236 # TODO: May be we want to move them to be part of the classes _params. Depends
237 # on when and how eval_results actually has to determine the class.
238 # Alternatively build a callable class with these to even have a fake signature
239 # that matches the parameters, so they can be evaluated and defined the exact
240 # same way.
241
242 eval_params = dict(
243 return_type=Parameter(
244 doc="""return value behavior switch. If 'item-or-list' a single
245 value is returned instead of a one-item return value list, or a
246 list in case of multiple return values. `None` is returned in case
247 of an empty list.""",
248 constraints=EnsureChoice('generator', 'list', 'item-or-list')),
249 result_filter=Parameter(
250 doc="""if given, each to-be-returned
251 status dictionary is passed to this callable, and is only
252 returned if the callable's return value evaluates to True;
253 a raised ValueError likewise excludes the result. If the given
254 callable supports `**kwargs` it will additionally be passed the
255 keyword arguments of the original API call.""",
256 constraints=EnsureCallable() | EnsureNone()),
257 result_xfm=Parameter(
258 doc="""if given, each to-be-returned result
259 status dictionary is passed to this callable, and its return value
260 becomes the result instead. This is different from
261 `result_filter`, as it can perform arbitrary transformation of the
262 result value. This is mostly useful for top-level command invocations
263 that need to provide the results in a particular format. Instead of
264 a callable, a label for a pre-crafted result transformation can be
265 given.""",
266 constraints=EnsureChoice(*list(known_result_xfms.keys())) | EnsureCallable() | EnsureNone()),
267 result_renderer=Parameter(
268 doc="""format of return value rendering on stdout""",
269 constraints=EnsureChoice('default', 'json', 'json_pp', 'tailored') | EnsureNone()),
270 on_failure=Parameter(
271 doc="""behavior to perform on failure: 'ignore' any failure is reported,
272 but does not cause an exception; 'continue' if any failure occurs an
273 exception will be raised at the end, but processing other actions will
274 continue for as long as possible; 'stop': processing will stop on first
275 failure and an exception is raised. A failure is any result with status
276 'impossible' or 'error'. Raised exception is an IncompleteResultsError
277 that carries the result dictionaries of the failures in its `failed`
278 attribute.""",
279 constraints=EnsureChoice('ignore', 'continue', 'stop')),
280 run_before=Parameter(
281 doc="""DataLad plugin to run before the command. PLUGINSPEC is a list
282 comprised of a plugin name plus optional 2-tuples of key-value pairs
283 with arguments for the plugin call (see `plugin` command documentation
284 for details).
285 PLUGINSPECs must be wrapped in a list where each item configures
286 one plugin call. Plugins are called in the order defined by this list.
287 For running plugins that require a `dataset` argument it is important
288 to provide the respective dataset as the `dataset` argument of the main
289 command, if it is not in the list of plugin arguments."""),
290 run_after=Parameter(
291 doc="""Like `run_before`, but plugins are executed after the main command
292 has finished."""),
293 )
294
295 eval_defaults = dict(
296 return_type='list',
297 result_filter=None,
298 result_renderer=None,
299 result_xfm=None,
300 on_failure='continue',
301 run_before=None,
302 run_after=None,
303 )
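
All of these options are accepted as keyword arguments by any command whose `__call__` is wrapped with `eval_results`, and `run_before`/`run_after` take plugin specs in the same nested-list form the `plugin` command uses. A hedged sketch using `uninstall`, which imports `eval_results` above; the path is a placeholder and the lambda stands in for any result transform:

    from datalad.api import uninstall

    for path in uninstall(
            path='somesubdataset1',                # placeholder path
            on_failure='ignore',                   # report failures, do not raise
            return_type='generator',               # iterate results as they come
            result_xfm=lambda res: res['path']):   # reduce each result dict to its path
        print(path)
    # run_after=[['add_readme', ('existing', 'replace')]] could be attached the same way
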
1212
1313 from os.path import exists
1414 from .base import Interface
15 from datalad.interface.utils import build_doc
15 from datalad.interface.base import build_doc
1616
1717 from datalad.support.param import Parameter
1818 from datalad.support.constraints import EnsureStr, EnsureNone
1111
1212 from os.path import curdir
1313 from .base import Interface
14 from datalad.interface.utils import build_doc
14 from datalad.interface.base import build_doc
1515 from collections import OrderedDict
1616 from datalad.distribution.dataset import Dataset
1717
2121 from datalad.interface.annotate_paths import annotated2content_by_ds
2222 from datalad.interface.base import Interface
2323 from datalad.interface.utils import eval_results
24 from datalad.interface.utils import build_doc
24 from datalad.interface.base import build_doc
2525 from datalad.support.constraints import EnsureNone
2626 from datalad.support.constraints import EnsureStr
2727 from datalad.support.constraints import EnsureChoice
1818 from os.path import isdir, curdir
1919
2020 from .base import Interface
21 from datalad.interface.utils import build_doc
21 from datalad.interface.base import build_doc
2222 from ..ui import ui
2323 from ..utils import assure_list_from_str
2424 from ..dochelpers import exc_str
2525 from ..cmdline.helpers import get_repo_instance
2626 from ..utils import auto_repr
2727 from .base import Interface
28 from datalad.interface.utils import build_doc
28 from datalad.interface.base import build_doc
2929 from ..ui import ui
3030 from ..utils import swallow_logs
3131 from ..consts import METADATA_DIR
5353
5454 ATM only s3:// URLs and datasets are supported
5555
56 Examples
57 --------
56 Examples:
5857
5958 $ datalad ls s3://openfmri/tarballs/ds202 # to list S3 bucket
6059 $ datalad ls # to list current dataset
3333 from datalad.interface.common_opts import save_message_opt
3434 from datalad.interface.results import get_status_dict
3535 from datalad.interface.utils import eval_results
36 from datalad.interface.utils import build_doc
36 from datalad.interface.base import build_doc
3737 from datalad.interface.utils import get_tree_roots
3838 from datalad.interface.utils import discover_dataset_trace_to_targets
3939
1313
1414 import datalad
1515 from .base import Interface
16 from datalad.interface.utils import build_doc
16 from datalad.interface.base import build_doc
1717
1818
1919 @build_doc
3434
3535 from ..base import Interface
3636 from ..utils import eval_results
37 from ..utils import build_doc
37 from datalad.interface.base import build_doc
3838 from ..utils import handle_dirty_dataset
3939 from ..utils import get_paths_by_dataset
4040 from ..utils import filter_unmodified
2525 from datalad.interface.annotate_paths import annotated2content_by_ds
2626 from datalad.interface.results import get_status_dict
2727 from datalad.interface.utils import eval_results
28 from datalad.interface.utils import build_doc
28 from datalad.interface.base import build_doc
2929 from datalad.interface.common_opts import recursion_flag
3030 from datalad.interface.common_opts import recursion_limit
3131
1515 import logging
1616 import wrapt
1717 import sys
18 import re
19 import shlex
1820 from os import curdir
1921 from os import pardir
2022 from os import listdir
21 from os import linesep
2223 from os.path import join as opj
2324 from os.path import lexists
2425 from os.path import isdir
4546 from datalad import cfg as dlcfg
4647 from datalad.dochelpers import exc_str
4748
49
4850 from datalad.support.constraints import Constraint
49 from datalad.support.constraints import EnsureChoice
50 from datalad.support.constraints import EnsureNone
51 from datalad.support.constraints import EnsureCallable
52 from datalad.support.param import Parameter
5351
5452 from datalad.ui import ui
55
56 from .base import Interface
57 from .base import update_docstring_with_parameters
58 from .base import alter_interface_docs_for_api
59 from .base import merge_allargs2kwargs
53 import datalad.support.ansi_colors as ac
54
55 from datalad.interface.base import Interface
56 from datalad.interface.base import default_logchannels
57 from datalad.interface.base import get_allargs_as_kwargs
58 from datalad.interface.common_opts import eval_params
59 from datalad.interface.common_opts import eval_defaults
6060 from .results import known_result_xfms
6161
6262
6363 lgr = logging.getLogger('datalad.interface.utils')
64
65
66 def cls2cmdlinename(cls):
67 "Return the cmdline command name from an Interface class"
68 r = re.compile(r'([a-z0-9])([A-Z])')
69 return r.sub('\\1-\\2', cls.__name__).lower()
6470
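
`cls2cmdlinename` derives the command-line name for an interface class by breaking CamelCase at lowercase/digit-to-uppercase boundaries; the same substitution, restated on plain strings for illustration:

    import re

    def name_to_cmdline(name):
        # identical regex and replacement to cls2cmdlinename, minus the class handling
        return re.sub(r'([a-z0-9])([A-Z])', r'\1-\2', name).lower()

    assert name_to_cmdline('CreateSiblingGithub') == 'create-sibling-github'
    assert name_to_cmdline('Publish') == 'publish'
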
6571
6672 def handle_dirty_dataset(ds, mode, msg=None):
507513 return keep
508514
509515
510 # define parameters to be used by eval_results to tune behavior
511 # Note: This is done outside eval_results in order to be available when building
512 # docstrings for the decorated functions
513 # TODO: May be we want to move them to be part of the classes _params. Depends
514 # on when and how eval_results actually has to determine the class.
515 # Alternatively build a callable class with these to even have a fake signature
516 # that matches the parameters, so they can be evaluated and defined the exact
517 # same way.
518
519 eval_params = dict(
520 return_type=Parameter(
521 doc="""return value behavior switch. If 'item-or-list' a single
522 value is returned instead of a one-item return value list, or a
523 list in case of multiple return values. `None` is return in case
524 of an empty list.""",
525 constraints=EnsureChoice('generator', 'list', 'item-or-list')),
526 result_filter=Parameter(
527 doc="""if given, each to-be-returned
528 status dictionary is passed to this callable, and is only
529 returned if the callable's return value does not
530 evaluate to False or a ValueError exception is raised. If the given
531 callable supports `**kwargs` it will additionally be passed the
532 keyword arguments of the original API call.""",
533 constraints=EnsureCallable() | EnsureNone()),
534 result_xfm=Parameter(
535 doc="""if given, each to-be-returned result
536 status dictionary is passed to this callable, and its return value
537 becomes the result instead. This is different from
538 `result_filter`, as it can perform arbitrary transformation of the
539 result value. This is mostly useful for top-level command invocations
540 that need to provide the results in a particular format. Instead of
541 a callable, a label for a pre-crafted result transformation can be
542 given.""",
543 constraints=EnsureChoice(*list(known_result_xfms.keys())) | EnsureCallable() | EnsureNone()),
544 result_renderer=Parameter(
545 doc="""format of return value rendering on stdout""",
546 constraints=EnsureChoice('default', 'json', 'json_pp', 'tailored') | EnsureNone()),
547 on_failure=Parameter(
548 doc="""behavior to perform on failure: 'ignore' any failure is reported,
549 but does not cause an exception; 'continue' if any failure occurs an
550 exception will be raised at the end, but processing other actions will
551 continue for as long as possible; 'stop': processing will stop on first
552 failure and an exception is raised. A failure is any result with status
553 'impossible' or 'error'. Raised exception is an IncompleteResultsError
554 that carries the result dictionaries of the failures in its `failed`
555 attribute.""",
556 constraints=EnsureChoice('ignore', 'continue', 'stop')),
557 )
558 eval_defaults = dict(
559 return_type='list',
560 result_filter=None,
561 result_renderer=None,
562 result_xfm=None,
563 on_failure='continue',
564 )
565
566
567516 def eval_results(func):
568517 """Decorator for return value evaluation of datalad commands.
569518
605554 i.e. a datalad command definition
606555 """
607556
608 default_logchannels = {
609 '': 'debug',
610 'ok': 'debug',
611 'notneeded': 'debug',
612 'impossible': 'warning',
613 'error': 'error',
614 }
615
616557 @wrapt.decorator
617558 def eval_func(wrapped, instance, args, kwargs):
618
559 # for result filters and pre/post plugins
560 # we need to produce a dict with argname/argvalue pairs for all args
561 # incl. defaults and args given as positionals
562 allkwargs = get_allargs_as_kwargs(wrapped, args, kwargs)
619563 # determine class, the __call__ method of which we are decorating:
620564 # Ben: Note, that this is a bit dirty in PY2 and imposes restrictions on
621565 # when and how to use eval_results as well as on how to name a command's
644588 _func_class = mod.__dict__[command_class_name]
645589 lgr.debug("Determined class of decorated function: %s", _func_class)
646590
591 # retrieve common options from kwargs, and fall back on the command
592 # class attributes, or general defaults if needed
647593 common_params = {
648594 p_name: kwargs.pop(
649595 p_name,
650596 getattr(_func_class, p_name, eval_defaults[p_name]))
651597 for p_name in eval_params}
598 # short cuts and configured setup for common options
599 on_failure = common_params['on_failure']
600 return_type = common_params['return_type']
601 # resolve string labels for transformers too
602 result_xfm = common_params['result_xfm']
603 if result_xfm in known_result_xfms:
604 result_xfm = known_result_xfms[result_xfm]
652605 result_renderer = common_params['result_renderer']
653
606 # TODO remove this conditional branch entirely, done outside
607 if not result_renderer:
608 result_renderer = dlcfg.get('datalad.api.result-renderer', None)
609 # wrap the filter into a helper to be able to pass additional arguments
610 # if the filter supports it, but at the same time keep the required interface
611 # as minimal as possible. Also do this here, in order to avoid performing
612 # this test for each return value
613 result_filter = common_params['result_filter']
614 _result_filter = result_filter
615 if result_filter:
616 if isinstance(result_filter, Constraint):
617 _result_filter = result_filter.__call__
618 if (PY2 and inspect.getargspec(_result_filter).keywords) or \
619 (not PY2 and inspect.getfullargspec(_result_filter).varkw):
620
621 def _result_filter(res):
622 return result_filter(res, **allkwargs)
623
624 def _get_plugin_specs(param_key=None, cfg_key=None):
625 spec = common_params.get(param_key, None)
626 if spec is not None:
627 # this is already a list of lists
628 return spec
629
630 spec = dlcfg.get(cfg_key, None)
631 if spec is None:
632 return
633 elif not isinstance(spec, tuple):
634 spec = [spec]
635 return [shlex.split(s) for s in spec]
636
637 # query cfg for defaults
638 cmdline_name = cls2cmdlinename(_func_class)
639 run_before = _get_plugin_specs(
640 'run_before',
641 'datalad.{}.run-before'.format(cmdline_name))
642 run_after = _get_plugin_specs(
643 'run_after',
644 'datalad.{}.run-after'.format(cmdline_name))
645
646 # this internal helper function actually drives the command
647 # generator-style, it may generate an exception if desired,
648 # on incomplete results
654649 def generator_func(*_args, **_kwargs):
655 # obtain results
656 results = wrapped(*_args, **_kwargs)
650 from datalad.plugin import Plugin
651
657652 # flag whether to raise an exception
658 # TODO actually compose a meaningful exception
659653 incomplete_results = []
660 # inspect and render
661 result_filter = common_params['result_filter']
662 # wrap the filter into a helper to be able to pass additional arguments
663 # if the filter supports it, but at the same time keep the required interface
664 # as minimal as possible. Also do this here, in order to avoid this test
665 # to be performed for each return value
666 _result_filter = result_filter
667 if result_filter:
668 if isinstance(result_filter, Constraint):
669 _result_filter = result_filter.__call__
670 if (PY2 and inspect.getargspec(_result_filter).keywords) or \
671 (not PY2 and inspect.getfullargspec(_result_filter).varkw):
672 # we need to produce a dict with argname/argvalue pairs for all args
673 # incl. defaults and args given as positionals
674 fullkwargs_ = merge_allargs2kwargs(wrapped, _args, _kwargs)
675
676 def _result_filter(res):
677 return result_filter(res, **fullkwargs_)
678 result_renderer = common_params['result_renderer']
679 result_xfm = common_params['result_xfm']
680 if result_xfm in known_result_xfms:
681 result_xfm = known_result_xfms[result_xfm]
682 on_failure = common_params['on_failure']
683 if not result_renderer:
684 result_renderer = dlcfg.get('datalad.api.result-renderer', None)
685654 # track what actions were performed how many times
686655 action_summary = {}
687 for res in results:
688 actsum = action_summary.get(res['action'], {})
689 if res['status']:
690 actsum[res['status']] = actsum.get(res['status'], 0) + 1
691 action_summary[res['action']] = actsum
692 ## log message, if a logger was given
693 # remove logger instance from results, as it is no longer useful
694 # after logging was done, it isn't serializable, and generally
695 # pollutes the output
696 res_lgr = res.pop('logger', None)
697 if isinstance(res_lgr, logging.Logger):
698 # didn't get a particular log function, go with default
699 res_lgr = getattr(res_lgr, default_logchannels[res['status']])
700 if res_lgr and 'message' in res:
701 msg = res['message']
702 msgargs = None
703 if isinstance(msg, tuple):
704 msgargs = msg[1:]
705 msg = msg[0]
706 if 'path' in res:
707 msg = '{} [{}({})]'.format(
708 msg, res['action'], res['path'])
709 if msgargs:
710 # support string expansion of logging to avoid runtime cost
711 res_lgr(msg, *msgargs)
712 else:
713 res_lgr(msg)
714 ## error handling
715 # looks for error status, and report at the end via
716 # an exception
717 if on_failure in ('continue', 'stop') \
718 and res['status'] in ('impossible', 'error'):
719 incomplete_results.append(res)
720 if on_failure == 'stop':
721 # first fail -> that's it
722 # raise will happen after the loop
723 break
724 if _result_filter:
725 try:
726 if not _result_filter(res):
727 raise ValueError('excluded by filter')
728 except ValueError as e:
729 lgr.debug('not reporting result (%s)', exc_str(e))
730 continue
731 ## output rendering
732 if result_renderer == 'default':
733 # TODO have a helper that can expand a result message
734 ui.message('{action}({status}): {path}{type}{msg}'.format(
735 action=res['action'],
736 status=res['status'],
737 path=relpath(res['path'],
738 res['refds']) if res.get('refds', None) else res['path'],
739 type=' ({})'.format(res['type']) if 'type' in res else '',
740 msg=' [{}]'.format(
741 res['message'][0] % res['message'][1:]
742 if isinstance(res['message'], tuple) else res['message'])
743 if 'message' in res else ''))
744 elif result_renderer in ('json', 'json_pp'):
745 ui.message(json.dumps(
746 {k: v for k, v in res.items()
747 if k not in ('message', 'logger')},
748 sort_keys=True,
749 indent=2 if result_renderer.endswith('_pp') else None))
750 elif result_renderer == 'tailored':
751 if hasattr(_func_class, 'custom_result_renderer'):
752 _func_class.custom_result_renderer(res, **_kwargs)
753 elif hasattr(result_renderer, '__call__'):
754 result_renderer(res, **_kwargs)
755 if result_xfm:
756 res = result_xfm(res)
757 if res is None:
758 continue
759 yield res
760
656
657 for pluginspec in run_before or []:
658 lgr.debug('Running pre-proc plugin %s', pluginspec)
659 for r in _process_results(
660 Plugin.__call__(
661 pluginspec,
662 dataset=allkwargs.get('dataset', None),
663 return_type='generator'),
664 _func_class, action_summary,
665 on_failure, incomplete_results,
666 result_renderer, result_xfm, result_filter,
667 **_kwargs):
668 yield r
669
670 # process main results
671 for r in _process_results(
672 wrapped(*_args, **_kwargs),
673 _func_class, action_summary,
674 on_failure, incomplete_results,
675 result_renderer, result_xfm, _result_filter, **_kwargs):
676 yield r
677
678 for pluginspec in run_after or []:
679 lgr.debug('Running post-proc plugin %s', pluginspec)
680 for r in _process_results(
681 Plugin.__call__(
682 pluginspec,
683 dataset=allkwargs.get('dataset', None),
684 return_type='generator'),
685 _func_class, action_summary,
686 on_failure, incomplete_results,
687 result_renderer, result_xfm, result_filter,
688 **_kwargs):
689 yield r
690
691 # result summary before a potential exception
761692 if result_renderer == 'default' and action_summary and \
762693 sum(sum(s.values()) for s in action_summary.values()) > 1:
763694 # give a summary in default mode, when there was more than one
770701 for act in sorted(action_summary))))
771702
772703 if incomplete_results:
773 # stupid catch all message <- tailor TODO
774704 raise IncompleteResultsError(
775705 failed=incomplete_results,
776706 msg="Command did not complete successfully")
777707
778 if common_params['return_type'] == 'generator':
708 if return_type == 'generator':
709 # hand over the generator
779710 return generator_func(*args, **kwargs)
780711 else:
781712 @wrapt.decorator
782713 def return_func(wrapped_, instance_, args_, kwargs_):
783714 results = wrapped_(*args_, **kwargs_)
784715 if inspect.isgenerator(results):
716 # unwind generator if there is one, this actually runs
717 # any processing
785718 results = list(results)
786719 # render summaries
787 if not common_params['result_xfm'] and result_renderer == 'tailored':
720 if not result_xfm and result_renderer == 'tailored':
788721 # cannot render transformed results
789722 if hasattr(_func_class, 'custom_result_summary_renderer'):
790723 _func_class.custom_result_summary_renderer(results)
791 if common_params['return_type'] == 'item-or-list' and \
724 if return_type == 'item-or-list' and \
792725 len(results) < 2:
793726 return results[0] if results else None
794727 else:
799732 return eval_func(func)
800733
801734
802 def build_doc(cls, **kwargs):
803 """Decorator to build docstrings for datalad commands
804
805 It's intended to decorate the class, the __call__-method of which is the
806 actual command. It expects that __call__-method to be decorated by
807 eval_results.
808
809 Parameters
810 ----------
811 cls: Interface
812 class defining a datalad command
813 """
814
815 # Note, that this is a class decorator, which is executed only once when the
816 # class is imported. It builds the docstring for the class' __call__ method
817 # and returns the original class.
818 #
819 # This is because a decorator for the actual function would not be able to
820 # behave like this. To build the docstring we need to access the attribute
821 # _params of the class. From within a function decorator we cannot do this
822 # during import time, since the class is being built in this very moment and
823 # is not yet available in the module. And if we do it from within the part
824 # of a function decorator, that is executed when the function is called, we
825 # would need to actually call the command once in order to build this
826 # docstring.
827
828 lgr.debug("Building doc for {}".format(cls))
829
830 cls_doc = cls.__doc__
831 if hasattr(cls, '_docs_'):
832 # expand docs
833 cls_doc = cls_doc.format(**cls._docs_)
834
835 call_doc = None
836 # suffix for update_docstring_with_parameters:
837 if cls.__call__.__doc__:
838 call_doc = cls.__call__.__doc__
839
840 # build standard doc and insert eval_doc
841 spec = getattr(cls, '_params_', dict())
842 # get docs for eval_results parameters:
843 spec.update(eval_params)
844
845 update_docstring_with_parameters(
846 cls.__call__, spec,
847 prefix=alter_interface_docs_for_api(cls_doc),
848 suffix=alter_interface_docs_for_api(call_doc),
849 add_args=eval_defaults if not hasattr(cls, '_no_eval_results') else None
850 )
851
852 # return original
853 return cls
735 def _process_results(
736 results, cmd_class,
737 action_summary, on_failure, incomplete_results,
738 result_renderer, result_xfm, result_filter, **kwargs):
739 # private helper pf @eval_results
740 # loop over results generated from some source and handle each
741 # of them according to the requested behavior (logging, rendering, ...)
742 for res in results:
743 actsum = action_summary.get(res['action'], {})
744 if res['status']:
745 actsum[res['status']] = actsum.get(res['status'], 0) + 1
746 action_summary[res['action']] = actsum
747 ## log message, if a logger was given
748 # remove logger instance from results, as it is no longer useful
749 # after logging was done, it isn't serializable, and generally
750 # pollutes the output
751 res_lgr = res.pop('logger', None)
752 if isinstance(res_lgr, logging.Logger):
753 # didn't get a particular log function, go with default
754 res_lgr = getattr(res_lgr, default_logchannels[res['status']])
755 if res_lgr and 'message' in res:
756 msg = res['message']
757 msgargs = None
758 if isinstance(msg, tuple):
759 msgargs = msg[1:]
760 msg = msg[0]
761 if 'path' in res:
762 msg = '{} [{}({})]'.format(
763 msg, res['action'], res['path'])
764 if msgargs:
765 # support string expansion of logging to avoid runtime cost
766 res_lgr(msg, *msgargs)
767 else:
768 res_lgr(msg)
769 ## error handling
770 # looks for error status, and report at the end via
771 # an exception
772 if on_failure in ('continue', 'stop') \
773 and res['status'] in ('impossible', 'error'):
774 incomplete_results.append(res)
775 if on_failure == 'stop':
776 # first fail -> that's it
777 # raise will happen after the loop
778 break
779 if result_filter:
780 try:
781 if not result_filter(res):
782 raise ValueError('excluded by filter')
783 except ValueError as e:
784 lgr.debug('not reporting result (%s)', exc_str(e))
785 continue
786 ## output rendering
787 # TODO RF this in a simple callable that gets passed into this function
788 if result_renderer == 'default':
789 # TODO have a helper that can expand a result message
790 ui.message('{action}({status}): {path}{type}{msg}'.format(
791 action=ac.color_word(res['action'], ac.BOLD),
792 status=ac.color_status(res['status']),
793 path=relpath(res['path'],
794 res['refds']) if res.get('refds', None) else res['path'],
795 type=' ({})'.format(
796 ac.color_word(res['type'], ac.MAGENTA)
797 ) if 'type' in res else '',
798 msg=' [{}]'.format(
799 res['message'][0] % res['message'][1:]
800 if isinstance(res['message'], tuple) else res['message'])
801 if 'message' in res else ''))
802 elif result_renderer in ('json', 'json_pp'):
803 ui.message(json.dumps(
804 {k: v for k, v in res.items()
805 if k not in ('message', 'logger')},
806 sort_keys=True,
807 indent=2 if result_renderer.endswith('_pp') else None))
808 elif result_renderer == 'tailored':
809 if hasattr(cmd_class, 'custom_result_renderer'):
810 cmd_class.custom_result_renderer(res, **kwargs)
811 elif hasattr(result_renderer, '__call__'):
812 result_renderer(res, **kwargs)
813 if result_xfm:
814 res = result_xfm(res)
815 if res is None:
816 continue
817 yield res
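
`_process_results` pulls the per-result handling (action counting, logging, failure collection, filtering, transformation, rendering) out of `generator_func`, so the exact same loop can serve the `run_before`/`run_after` plugin calls and the main command. The dictionaries it consumes are the ones `get_status_dict` produces; a minimal hedged sketch of one result and of a callable a `result_filter` might be (the field values are illustrative):

    from datalad.interface.results import get_status_dict

    res = get_status_dict(
        action='publish',
        status='ok',              # '', 'ok', 'notneeded', 'impossible', or 'error'
        path='/tmp/ds',           # placeholder path
        type='dataset',
        message='pushed git-annex branch')

    # a result_filter lets a result through only if it returns something truthy for it
    def only_failures(result):
        return result['status'] in ('impossible', 'error')

    assert not only_failures(res)
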
1313 import os
1414 from os.path import join as opj, exists, relpath, dirname
1515 from datalad.interface.base import Interface
16 from datalad.interface.utils import build_doc
16 from datalad.interface.base import build_doc
1717 from datalad.interface.utils import handle_dirty_dataset
1818 from datalad.interface.common_opts import recursion_limit, recursion_flag
1919 from datalad.interface.common_opts import if_dirty_opt
4747 types are configured. Moreover, it is possible to aggregate meta data from
4848 any subdatasets into the superdataset, in order to facilitate data
4949 discovery without having to obtain any subdataset.
50
51 Returns
52 -------
53 List
54 Any datasets where (updated) aggregated meta data was saved.
5550 """
5651 # XXX prevent common args from being added to the docstring
5752 _no_eval_results = True
8479 recursion_limit=None,
8580 save=True,
8681 if_dirty='save-before'):
82 """
83 Returns
84 -------
85 List
86 Any datasets where (updated) aggregated meta data was saved.
87 """
8788 ds = require_dataset(
8889 dataset, check_installed=True, purpose='meta data aggregation')
8990 modified_ds = []
2525 from datalad.interface.save import Save
2626 from datalad.interface.results import get_status_dict
2727 from datalad.interface.utils import eval_results
28 from datalad.interface.utils import build_doc
28 from datalad.interface.base import build_doc
2929 from datalad.support.constraints import EnsureNone
3030 from datalad.support.constraints import EnsureStr
3131 from datalad.support.gitrepo import GitRepo
2222 from six import reraise
2323 from six import PY3
2424 from datalad.interface.base import Interface
25 from datalad.interface.utils import build_doc
25 from datalad.interface.base import build_doc
2626 from datalad.distribution.dataset import Dataset
2727 from datalad.distribution.dataset import datasetmethod, EnsureDataset, \
2828 require_dataset
4444 @build_doc
4545 class Search(Interface):
4646 """Search within available in datasets' meta data
47
48 Yields
49 ------
50 location : str
51 (relative) path to the dataset
52 report : dict
53 fields which were requested by `report` option
54
5547 """
5648 # XXX prevent common args from being added to the docstring
5749 _no_eval_results = True
122114 report_matched=False,
123115 format='custom',
124116 regex=False):
117 """
118 Yields
119 ------
120 location : str
121 (relative) path to the dataset
122 report : dict
123 fields which were requested by `report` option
124 """
125125
126126 lgr.debug("Initiating search for match=%r and dataset %r",
127127 match, dataset)
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """
9
10 """
11
12 __docformat__ = 'restructuredtext'
13
14 import logging
15 from glob import glob
16 import re
17 from os.path import join as opj, basename, dirname
18 from os import curdir
19 import inspect
20
21 from datalad import cfg
22 from datalad.support.param import Parameter
23 from datalad.support.constraints import EnsureNone
24 from datalad.distribution.dataset import EnsureDataset
25 from datalad.distribution.dataset import datasetmethod
26 from datalad.distribution.dataset import require_dataset
27 from datalad.dochelpers import exc_str
28
29 from datalad.interface.base import Interface
30 from datalad.interface.base import dedent_docstring
31 from datalad.interface.base import build_doc
32 from datalad.interface.utils import eval_results
33 from datalad.ui import ui
34
35 lgr = logging.getLogger('datalad.plugin')
36
37 argspec = re.compile(r'^([a-zA-Z][a-zA-Z0-9_]*)=(.*)$')
38
39
40 def _get_plugins():
41 locations = (
42 dirname(__file__),
43 cfg.obtain('datalad.locations.system-plugins'),
44 cfg.obtain('datalad.locations.user-plugins'))
45 return {basename(e)[:-3]: {'file': e}
46 for plugindir in locations
47 for e in glob(opj(plugindir, '[!_]*.py'))}
48
49
50 def _load_plugin(filepath):
51 locals = {}
52 globals = {}
53 try:
54 exec(compile(open(filepath, "rb").read(),
55 filepath, 'exec'),
56 globals,
57 locals)
58 except Exception as e:
59 # any exception means full stop
60 raise ValueError('plugin at {} is broken: {}'.format(
61 filepath, exc_str(e)))
62 if not len(locals) or 'dlplugin' not in locals:
63 raise ValueError(
64 "loading plugin '%s' did not yield a 'dlplugin' symbol, found: %s",
65 filepath, locals.keys() if len(locals) else None)
66 return locals['dlplugin']
67
68
69 @build_doc
70 class Plugin(Interface):
71 """Generic plugin interface
72
73 Using this command, arbitrary DataLad plugins can be executed. Plugins in
74 three different locations are available
75
76 1. official plugins that are part of the local DataLad installation
77
78 2. system-wide plugins, location configuration::
79
80 datalad.locations.system-plugins
81
82 3. user-supplied plugins, location configuration::
83
84 datalad.locations.user-plugins
85
86 Identically named plugins in a later location replace those from locations
87 searched earlier.
88
89 *Using plugins*
90
91 A list of all available plugins can be obtained by running this command
92 without arguments::
93
94 datalad plugin
95
96 To run a specific plugin, provide the plugin name as an argument::
97
98 datalad plugin export_tarball
99
100 A plugin may come with its own documentation which can be displayed upon
101 request::
102
103 datalad plugin export_tarball -H
104
105 If a plugin supports (optional) arguments, they can be passed to the plugin
106 as key=value pairs with the name and the respective value of an argument,
107 e.g.::
108
109 datalad plugin export_tarball output=myfile
110
111 Any number of arguments can be given. Only arguments with names supported
112 by the respective plugin are passed to the plugin. If unsupported arguments
113 are given, a warning is issued.
114
115 When an argument is given multiple times, all values are passed as a list
116 to the respective argument (order of values matches the order in the
117 plugin call)::
118
119 datalad plugin fancy_plugin input=this input=that
120
121 Like in most commands, a dedicated --dataset option is supported that
122 can be used to identify a specific dataset to be passed to a plugin's
123 ``dataset`` argument. If a plugin requires such an argument, and no
124 dataset was given, and none was found in the current working directory,
125 the plugin call will fail. A dataset argument can also be passed alongside
126 all other plugin arguments without using --dataset.
127
128 """
129 _params_ = dict(
130 dataset=Parameter(
131 args=("-d", "--dataset"),
132 doc="""specify the dataset for the plugin to operate on
133 If no dataset is given, but a plugin take a dataset as an argument,
134 an attempt is made to identify the dataset based on the current
135 working directory.""",
136 constraints=EnsureDataset() | EnsureNone()),
137 plugin=Parameter(
138 args=("plugin",),
139 nargs='*',
140 metavar='PLUGINSPEC',
141 doc="""plugin name plus an optional list of `key=value` pairs with
142 arguments for the plugin call"""),
143 showpluginhelp=Parameter(
144 args=('-H', '--show-plugin-help',),
145 dest='showpluginhelp',
146 action='store_true',
147 doc="""show help for a particular"""),
148 showplugininfo=Parameter(
149 args=('--show-plugin-info',),
150 dest='showplugininfo',
151 action='store_true',
152 doc="""show additional information in plugin overview (e.g. plugin file
153 location)"""),
154 )
155
156 @staticmethod
157 @datasetmethod(name='plugin')
158 @eval_results
159 def __call__(plugin=None, dataset=None, showpluginhelp=False, showplugininfo=False, **kwargs):
160 plugins = _get_plugins()
161 if not plugin:
162 max_name_len = max(len(k) for k in plugins.keys())
163 for plname, plinfo in sorted(plugins.items(), key=lambda x: x[0]):
164 spacer = ' ' * (max_name_len - len(plname))
165 synopsis = None
166 try:
167 with open(plinfo['file']) as plf:
168 for line in plf:
169 if line.startswith('"""'):
170 synopsis = line.strip().strip('"').strip()
171 break
172 except Exception as e:
173 ui.message('{}{} [BROKEN] {}'.format(
174 plname, spacer, exc_str(e)))
175 continue
176 if synopsis:
177 msg = '{}{} - {}'.format(
178 plname, spacer, synopsis)
179 else:
180 msg = '{}{} [no synopsis]'.format(plname, spacer)
181 if showplugininfo:
182 msg = '{} ({})'.format(msg, plinfo['file'])
183 ui.message(msg)
184 return
185 args = None
186 if isinstance(plugin, (list, tuple)):
187 args = plugin[1:]
188 plugin = plugin[0]
189 if plugin not in plugins:
190 raise ValueError("unknown plugin '{}', available: {}".format(
191 plugin, ','.join(plugins.keys())))
192 user_supplied_args = set()
193 if args:
194 # we got some arguments in the plugin spec, parse them and add to
195 # kwargs
196 for arg in args:
197 if isinstance(arg, tuple):
198 # came from python item-style
199 argname, argval = arg
200 else:
201 parsed = argspec.match(arg)
202 if parsed is None:
203 raise ValueError("invalid plugin argument: '{}'".format(arg))
204 argname, argval = parsed.groups()
205 if argname in kwargs:
206 # argument was seen at least once before -> make list
207 existing_val = kwargs[argname]
208 if not isinstance(existing_val, list):
209 existing_val = [existing_val]
210 existing_val.append(argval)
211 argval = existing_val
212 kwargs[argname] = argval
213 user_supplied_args.add(argname)
214 plugin_call = _load_plugin(plugins[plugin]['file'])
215
216 if showpluginhelp:
217 # we don't need special docs for the cmdline, standard python ones
218 # should be comprehensible enough
219 ui.message(
220 dedent_docstring(plugin_call.__doc__)
221 if plugin_call.__doc__
222 else 'This plugin has no documentation')
223 return
224
225 #
226 # argument preprocessing
227 #
228 # check the plugin signature and filter out all unsupported args
229 plugin_args, _, _, arg_defaults = inspect.getargspec(plugin_call)
230 supported_args = {k: v for k, v in kwargs.items() if k in plugin_args}
231 excluded_args = user_supplied_args.difference(supported_args.keys())
232 if excluded_args:
233 lgr.warning('ignoring plugin argument(s) %s, not supported by plugin',
234 excluded_args)
235 # always overwrite the dataset arg if one is needed
236 if 'dataset' in plugin_args:
237 supported_args['dataset'] = require_dataset(
238 # use dedicated arg if given, also anything that came with the plugin args
239 # or curdir as the last resort
240 dataset if dataset else kwargs.get('dataset', curdir),
241 # note 'dataset' arg is always first, if we have defaults for all args
242 # we have a default for 'dataset' too -> it is optional
243 check_installed=len(arg_defaults or ()) != len(plugin_args),
244 purpose='handover to plugin')
245
246 # call as a generator
247 for res in plugin_call(**supported_args):
248 if not res:
249 continue
250 if dataset:
251 # enforce standard regardless of what plugin did
252 res['refds'] = getattr(dataset, 'path', dataset)
253 elif 'refds' in res:
254 # no base dataset, results must not have them either
255 del res['refds']
256 if 'logger' not in res:
257 # make sure we have a logger
258 res['logger'] = lgr
259 yield res
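
The loader only requires that a plugin file define a `dlplugin` callable that yields result dictionaries, and the two stock plugins that follow use exactly this contract. A minimal hypothetical user plugin, dropped into whatever directory `datalad.locations.user-plugins` points to (file name and content are illustrative):

    # hello.py -- a hypothetical user plugin
    """say hello from a dataset"""

    # PLUGIN API
    def dlplugin(dataset, name='world'):
        """Yield a single 'ok' result for the given dataset"""
        yield dict(
            action='hello',
            status='ok',
            path=dataset.path,
            type='dataset',
            message='hello %s' % name)
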
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """add a README file to a dataset"""
9
10 __docformat__ = 'restructuredtext'
11
12
13 # PLUGIN API
14 def dlplugin(dataset, filename='README.rst', existing='skip'):
15 """Add basic information about DataLad datasets to a README file
16
17 The README file is added to the dataset and the addition is saved
18 in the dataset.
19
20 Parameters
21 ----------
22 dataset : Dataset
23 dataset to add information to
24 filename : str, optional
25 path of the README file within the dataset. Default: 'README.rst'
26 existing : {'skip', 'append', 'replace'}
27 how to react if a file with the target name already exists:
28 'skip': do nothing; 'append': append information to the existing
29 file; 'replace': replace the existing file with new content.
30 Default: 'skip'
31
32 """
33
34 from os.path import lexists
35 from os.path import join as opj
36
37 default_content="""\
38 About this dataset
39 ==================
40
41 This is a DataLad dataset{id}.
42
43 For more information on DataLad and on how to work with its datasets,
44 see the DataLad documentation at: http://docs.datalad.org
45 """.format(
46 id=' (id: {})'.format(dataset.id) if dataset.id else '')
47 filename = opj(dataset.path, filename)
48 res_kwargs = dict(action='add_readme', path=filename)
49
50 if lexists(filename) and existing == 'skip':
51 yield dict(
52 res_kwargs,
53 status='notneeded',
54 message='file already exists, and not appending content')
55 return
56
57 # unlock, file could be annexed
58 # TODO yield
59 if lexists(filename):
60 dataset.unlock(filename)
61
62 with open(filename, 'a' if existing == 'append' else 'w') as fp:
63 fp.write(default_content)
64 yield dict(
65 status='ok',
66 path=filename,
67 type='file',
68 action='add_readme')
69
70 for r in dataset.add(
71 filename,
72 message='[DATALAD] added README',
73 result_filter=None,
74 result_xfm=None):
75 yield r
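For orientation, a hedged sketch of invoking this plugin through the Python API, mirroring how the plugin tests further below call ``plugin()``; the dataset path is hypothetical::

    from datalad.api import create, plugin

    ds = create('/tmp/demo-ds')  # hypothetical path
    # add (or append to) a README.rst and save the change in the dataset
    for res in plugin(['add_readme', 'filename=README.rst', 'existing=append'],
                      dataset=ds):
        print(res['status'], res.get('path'))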
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """export a dataset to a tarball"""
9
10 __docformat__ = 'restructuredtext'
11
12
13 # PLUGIN API
14 def dlplugin(dataset, output=None):
15 import os
16 import tarfile
17 from mock import patch
18 from os.path import join as opj, dirname, normpath, isabs
19 from datalad.utils import file_basename
20 from datalad.support.annexrepo import AnnexRepo
21
22 import logging
23 lgr = logging.getLogger('datalad.plugin.tarball')
24
25 repo = dataset.repo
26 committed_date = repo.get_committed_date()
27
28 # could be used later on to filter files by some criterion
29 def _filter_tarinfo(ti):
30 # Reset the date to match the one of the last commit, not from the
31 # filesystem since git doesn't track those at all
32 # TODO: use the date of the last commit when any particular
33 # file was changed -- would be the most kosher yoh thinks to the
34 # degree of our abilities
35 ti.mtime = committed_date
36 return ti
37
38 if output is None:
39 output = "datalad_{}.tar.gz".format(dataset.id)
40 else:
41 if not output.endswith('.tar.gz'):
42 output += '.tar.gz'
43
44 root = dataset.path
45 # use dir inside matching the output filename
46 # TODO: could be an option to the export plugin allowing empty value
47 # for no leading dir
48 leading_dir = file_basename(output)
49
50 # workaround for inability to pass down the time stamp
51 with patch('time.time', return_value=committed_date), \
52 tarfile.open(output, "w:gz") as tar:
53 repo_files = sorted(repo.get_indexed_files())
54 if isinstance(repo, AnnexRepo):
55 annexed = repo.is_under_annex(
56 repo_files, allow_quick=True, batch=True)
57 else:
58 annexed = [False] * len(repo_files)
59 for i, rpath in enumerate(repo_files):
60 fpath = opj(root, rpath)
61 if annexed[i]:
62 # resolve to possible link target
63 link_target = os.readlink(fpath)
64 if not isabs(link_target):
65 link_target = normpath(opj(dirname(fpath), link_target))
66 fpath = link_target
67 # name in the tarball
68 aname = normpath(opj(leading_dir, rpath))
69 tar.add(
70 fpath,
71 arcname=aname,
72 recursive=False,
73 filter=_filter_tarinfo)
74
75 if not isabs(output):
76 output = opj(os.getcwd(), output)
77
78 yield dict(
79 status='ok',
80 path=output,
81 type='file',
82 action='export_tarball',
83 logger=lgr)
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """configure which dataset parts to never put in the annex"""
9
10
11 __docformat__ = 'restructuredtext'
12
13
14 # PLUGIN API
15 def dlplugin(dataset, pattern, ref_dir='.', makedirs='no'):
16 # could be extended to accept actual largefile expressions
17 """Configure a dataset to never put some content into the dataset's annex
18
19 This can be useful in mixed datasets that also contain textual data, such
20 as source code, which can be efficiently and more conveniently managed
21 directly in Git.
22
23 Patterns generally look like this::
24
25 code/*
26
27 which would match all files in the code directory. In order to match all
28 files under ``code/``, including all its subdirectories, use a pattern
29 such as::
30
31 code/**
32
33 Note that the plugin works incrementally, hence any existing configuration
34 (e.g. from a previous plugin run) is amended, not replaced.
35
36 Parameters
37 ----------
38 dataset : Dataset
39 dataset to configure
40 pattern : list
41 list of path patterns. Any content whose path matches any pattern
42 will not be annexed when added to a dataset, but instead will be
43 tracked directly in Git. Path patterns have to be relative to the
44 directory given by the `ref_dir` option. By default, patterns should
45 be relative to the root of the dataset.
46 ref_dir : str, optional
47 Relative path (within the dataset) to the directory that is to be
48 configured. All patterns are interpreted relative to this path,
49 and configuration is written to a ``.gitattributes`` file in this
50 directory.
51 makedirs : bool, optional
52 If set, any missing directories will be created in order to be able
53 to place a file into ``ref_dir``. Default: False.
54 """
55 from os.path import join as opj
56 from os.path import isabs
57 from os.path import exists
58 from os import makedirs as makedirsfx
59 from datalad.distribution.dataset import require_dataset
60 from datalad.support.annexrepo import AnnexRepo
61 from datalad.support.constraints import EnsureBool
62 from datalad.utils import assure_list
63
64 makedirs = EnsureBool()(makedirs)
65 pattern = assure_list(pattern)
66 ds = require_dataset(dataset, check_installed=True,
67 purpose='no_annex configuration')
68
69 res_kwargs = dict(
70 path=ds.path,
71 type='dataset',
72 action='no_annex',
73 )
74
75 # all the ways we refused to cooperate
76 if not isinstance(ds.repo, AnnexRepo):
77 yield dict(
78 res_kwargs,
79 status='notneeded',
80 message='dataset has no annex')
81 return
82 if any(isabs(p) for p in pattern):
83 yield dict(
84 res_kwargs,
85 status='error',
86 message=('path pattern for `no_annex` configuration must be relative paths: %s',
87 pattern))
88 return
89 if isabs(ref_dir):
90 yield dict(
91 res_kwargs,
92 status='error',
93 message=('`ref_dir` for `no_annex` configuration must be a relative path: %s',
94 ref_dir))
95 return
96
97 gitattr_dir = opj(ds.path, ref_dir)
98 if not exists(gitattr_dir):
99 if makedirs:
100 makedirsfx(gitattr_dir)
101 else:
102 yield dict(
103 res_kwargs,
104 status='error',
105 message='target directory for `no_annex` does not exist (consider makedirs=True)')
106 return
107
108 gitattr_file = opj(gitattr_dir, '.gitattributes')
109 with open(gitattr_file, 'a') as fp:
110 for p in pattern:
111 fp.write('{} annex.largefiles=nothing\n'.format(p))
112 yield dict(res_kwargs, status='ok')
113
114 for r in dataset.add(
115 gitattr_file,
116 to_git=True,
117 message="[DATALAD] exclude paths from annex'ing",
118 result_filter=None,
119 result_xfm=None):
120 yield r
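For orientation, a hedged sketch of invoking this plugin from Python (mirroring the plugin tests further below) and of the ``.gitattributes`` entry it appends; the dataset path is hypothetical::

    from datalad.api import create, plugin

    ds = create('/tmp/mixed-ds')  # hypothetical path
    # keep everything under code/ (recursively) out of the annex
    plugin(['no_annex', 'pattern=code/**'], dataset=ds)
    # .gitattributes in the dataset root now contains (roughly):
    #   code/** annex.largefiles=nothing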
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """Plugin tests
9
10 """
11
12 __docformat__ = 'restructuredtext'
0 # emacs: -*- mode: python-mode; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # -*- coding: utf-8 -*-
2 # ex: set sts=4 ts=4 sw=4 noet:
3 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
4 #
5 # See COPYING file distributed along with the datalad package for the
6 # copyright and license terms.
7 #
8 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
9 """Test plugin interface mechanics"""
10
11
12 import logging
13 from os.path import join as opj
14 from os.path import exists
15 from mock import patch
16
17 from datalad.config import ConfigManager
18 from datalad.api import plugin
19 from datalad.api import create
20
21 from datalad.tests.utils import swallow_logs
22 from datalad.tests.utils import swallow_outputs
23 from datalad.tests.utils import with_tempfile
24 from datalad.tests.utils import chpwd
25 from datalad.tests.utils import create_tree
26 from datalad.tests.utils import assert_raises
27 from datalad.tests.utils import assert_status
28 from datalad.tests.utils import assert_in
29 from datalad.tests.utils import assert_not_in
30 from datalad.tests.utils import eq_
31 from datalad.tests.utils import ok_clean_git
32
33 broken_plugin = """garbage"""
34
35 nodocs_plugin = """\
36 def dlplugin():
37 pass
38 """
39
40 # functioning plugin dummy
41 dummy_plugin = '''\
42 """real dummy"""
43
44 def dlplugin(dataset, noval, withval='test'):
45 "mydocstring"
46 yield dict(
47 status='ok',
48 action='dummy',
49 args=dict(
50 dataset=dataset,
51 noval=noval,
52 withval=withval))
53 '''
54
55
56 @with_tempfile()
57 @with_tempfile(mkdir=True)
58 def test_plugin_call(path, dspath):
59 # make plugins
60 create_tree(
61 path,
62 {
63 'dlplugin_dummy.py': dummy_plugin,
64 'dlplugin_nodocs.py': nodocs_plugin,
65 'dlplugin_broken.py': broken_plugin,
66 })
67 fake_dummy_spec = {
68 'dummy': {'file': opj(path, 'dlplugin_dummy.py')},
69 'nodocs': {'file': opj(path, 'dlplugin_nodocs.py')},
70 'broken': {'file': opj(path, 'dlplugin_broken.py')},
71 }
72
73 with patch('datalad.plugin._get_plugins', return_value=fake_dummy_spec):
74 with swallow_outputs() as cmo:
75 plugin(showplugininfo=True)
76 # hyphen spacing depends on the longest plugin name!
77 # sorted
78 # summary list generation doesn't actually load plugins for speed,
79 # hence broken is not known to be broken here
80 eq_(cmo.out,
81 "broken [no synopsis] ({})\ndummy - real dummy ({})\nnodocs [no synopsis] ({})\n".format(
82 fake_dummy_spec['broken']['file'],
83 fake_dummy_spec['dummy']['file'],
84 fake_dummy_spec['nodocs']['file']))
85 with swallow_outputs() as cmo:
86 plugin(['dummy'], showpluginhelp=True)
87 eq_(cmo.out.rstrip(), "mydocstring")
88 with swallow_outputs() as cmo:
89 plugin(['nodocs'], showpluginhelp=True)
90 eq_(cmo.out.rstrip(), "This plugin has no documentation")
91 # loading fails, no docs
92 assert_raises(ValueError, plugin, ['broken'], showpluginhelp=True)
93
94 # assume this most obscure plugin name is not used
95 assert_raises(ValueError, plugin, '32sdfhvz984--^^')
96
97 # broken plugin argument, must match Python keyword arg
98 # specs
99 assert_raises(ValueError, plugin, ['dummy', '1245'])
100
101 with patch('datalad.plugin._get_plugins', return_value=fake_dummy_spec):
102 # does not trip over unsupported arguments; they get filtered out, because
103 # we carry all kinds of stuff
104 with swallow_logs(new_level=logging.WARNING) as cml:
105 res = list(plugin(['dummy', 'noval=one', 'obscure=some']))
106 assert_status('ok', res)
107 cml.assert_logged(
108 msg=".*ignoring plugin argument\\(s\\).*obscure.*, not supported by plugin.*",
109 regex=True, level='WARNING')
110 # fails on missing positional arg
111 assert_raises(TypeError, plugin, ['dummy'])
112 # positional and kwargs actually make it into the plugin
113 res = list(plugin(['dummy', 'noval=one', 'withval=two']))[0]
114 eq_('one', res['args']['noval'])
115 eq_('two', res['args']['withval'])
116 # kwarg defaults are preserved
117 res = list(plugin(['dummy', 'noval=one']))[0]
118 eq_('test', res['args']['withval'])
119 # repeated specification yields list input
120 res = list(plugin(['dummy', 'noval=one', 'noval=two']))[0]
121 eq_(['one', 'two'], res['args']['noval'])
122 # can do the same thing while bypassing argument parsing for calls
123 # from within python, and even preserve native python dtypes
124 res = list(plugin(['dummy', ('noval', 1), ('noval', 'two')]))[0]
125 eq_([1, 'two'], res['args']['noval'])
126 # and we can further simplify in this case by passing lists right
127 # away
128 res = list(plugin(['dummy', ('noval', [1, 'two'])]))[0]
129 eq_([1, 'two'], res['args']['noval'])
130
131 # dataset arg handling
132 # run plugin that needs a dataset where there is none
133 with patch('datalad.plugin._get_plugins', return_value=fake_dummy_spec):
134 ds = None
135 with chpwd(dspath):
136 assert_raises(ValueError, plugin, ['dummy', 'noval=one'])
137 # create a dataset here, fixes the error
138 ds = create()
139 res = list(plugin(['dummy', 'noval=one']))[0]
140 # gives dataset instance
141 eq_(ds, res['args']['dataset'])
142 # now do again, giving the dataset path
143 # but careful, `dataset` is a proper argument
144 res = list(plugin(['dummy', 'noval=one'], dataset=dspath))[0]
145 eq_(ds, res['args']['dataset'])
146 # however, if passed alongside the plugins args it also works
147 res = list(plugin(['dummy', 'dataset={}'.format(dspath), 'noval=one']))[0]
148 eq_(ds, res['args']['dataset'])
149 # but if both are given, the proper arg takes precedence
150 assert_raises(ValueError, plugin, ['dummy', 'dataset={}'.format(dspath), 'noval=one'],
151 dataset='rubbish')
152
153
154 # MIH: I failed to replace our config manager instance for this test run
155 # in order to be able to configure a set of plugins to run prior and after
156 # create. A test should not alter a user's config, hence I am disabling this
157 # for now, and hope somebody can fix it up
158 #@with_tempfile(mkdir=True)
159 #def test_plugin_config(path):
160 # with patch.dict('os.environ',
161 # {'HOME': path, 'DATALAD_SNEAKY_ADDITION': 'ignore'}):
162 # with patch('datalad.cfg', ConfigManager()) as cfg:
163 # global_gitconfig = opj(path, '.gitconfig')
164 # assert(not exists(global_gitconfig))
165 # # swap out the actual config for this test
166 # assert_in('datalad.sneaky.addition', cfg)
167 # # now we configure a plugin to run before and twice after `create`
168 # cfg.add('datalad.create.run-before',
169 # 'add_readme filename=before.txt',
170 # where='global')
171 # cfg.add('datalad.create.run-after',
172 # 'add_readme filename=after1.txt',
173 # where='global')
174 # cfg.add('datalad.create.run-after',
175 # 'add_readme filename=after2.txt',
176 # where='global')
177 # # force reload to pick up newly populated .gitconfig
178 # cfg.reload(force=True)
179 # assert_in('datalad.create.run-before', cfg)
180 # # and now we create a dataset and expect the two readme files
181 # # to be part of it
182 # ds = create(dataset=opj(path, 'ds'))
183 # ok_clean_git(ds.path)
184 # assert(exists(opj(ds.path, 'before.txt')))
185 # assert(exists(opj(ds.path, 'after1.txt')))
186 # assert(exists(opj(ds.path, 'after2.txt')))
187
188
189 @with_tempfile(mkdir=True)
190 def test_wtf(path):
191 # smoke test for now
192 with swallow_outputs() as cmo:
193 plugin(['wtf'], dataset=path)
194 assert_not_in('Dataset information', cmo.out)
195 assert_in('Configuration', cmo.out)
196 with chpwd(path):
197 with swallow_outputs() as cmo:
198 plugin(['wtf'])
199 assert_not_in('Dataset information', cmo.out)
200 assert_in('Configuration', cmo.out)
201 # now with a dataset
202 ds = create(path)
203 with swallow_outputs() as cmo:
204 plugin(['wtf'], dataset=ds.path)
205 assert_in('Configuration', cmo.out)
206 assert_in('Dataset information', cmo.out)
207 assert_in('path: {}'.format(ds.path), cmo.out)
208
209
210 @with_tempfile(mkdir=True)
211 def test_no_annex(path):
212 ds = create(path)
213 ok_clean_git(ds.path)
214 create_tree(
215 ds.path,
216 {'code': {
217 'inannex': 'content',
218 'notinannex': 'othercontent'}})
219 # add two files, pre and post configuration
220 ds.add(opj('code', 'inannex'))
221 plugin(['no_annex', 'pattern=code/**'], dataset=ds)
222 ds.add(opj('code', 'notinannex'))
223 ok_clean_git(ds.path)
224 # one is annex'ed, the other is not, despite no change in add call
225 # importantly, also .gitattributes is not annexed
226 eq_([opj('code', 'inannex')],
227 ds.repo.get_annexed_files())
0 # emacs: -*- mode: python-mode; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # -*- coding: utf-8 -*-
2 # ex: set sts=4 ts=4 sw=4 noet:
3 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
4 #
5 # See COPYING file distributed along with the datalad package for the
6 # copyright and license terms.
7 #
8 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
9 """Test tarball exporter"""
10
11 import os
12 import time
13 from os.path import join as opj
14 from os.path import isabs
15 import tarfile
16
17 from datalad.api import Dataset
18 from datalad.api import plugin
19 from datalad.utils import chpwd
20 from datalad.utils import md5sum
21
22 from datalad.tests.utils import with_tree
23 from datalad.tests.utils import ok_startswith
24 from datalad.tests.utils import assert_true, assert_not_equal, assert_raises, \
25 assert_false, assert_equal
26 from datalad.tests.utils import assert_status
27 from datalad.tests.utils import assert_result_count
28
29
30 _dataset_template = {
31 'ds': {
32 'file_up': 'some_content',
33 'dir': {
34 'file1_down': 'one',
35 'file2_down': 'two'}}}
36
37
38 @with_tree(_dataset_template)
39 def test_failure(path):
40 ds = Dataset(opj(path, 'ds')).create(force=True)
41 # unknown plugin
42 assert_raises(ValueError, ds.plugin, 'nah')
43 # non-existing dataset
44 assert_raises(ValueError, plugin, 'export_tarball', Dataset('nowhere'))
45
46
47 @with_tree(_dataset_template)
48 def test_tarball(path):
49 ds = Dataset(opj(path, 'ds')).create(force=True)
50 ds.add('.')
51 committed_date = ds.repo.get_committed_date()
52 default_outname = opj(path, 'datalad_{}.tar.gz'.format(ds.id))
53 with chpwd(path):
54 res = list(ds.plugin('export_tarball'))
55 assert_status('ok', res)
56 assert_result_count(res, 1)
57 assert(isabs(res[0]['path']))
58 assert_true(os.path.exists(default_outname))
59 custom_outname = opj(path, 'myexport.tar.gz')
60 # feed in without extension
61 ds.plugin('export_tarball', output=custom_outname[:-7])
62 assert_true(os.path.exists(custom_outname))
63 custom1_md5 = md5sum(custom_outname)
64 # encodes the original tarball filename -> different checksum, despite
65 # same content
66 assert_not_equal(md5sum(default_outname), custom1_md5)
67 # should really sleep so if they stop using time.time - we know
68 time.sleep(1.1)
69 ds.plugin('export_tarball', output=custom_outname)
70 # should not encode mtime, so should be identical
71 assert_equal(md5sum(custom_outname), custom1_md5)
72
73 def check_contents(outname, prefix):
74 with tarfile.open(outname) as tf:
75 nfiles = 0
76 for ti in tf:
77 # any annex links resolved
78 assert_false(ti.issym())
79 ok_startswith(ti.name, prefix + '/')
80 assert_equal(ti.mtime, committed_date)
81 if '.datalad' not in ti.name:
82 # ignore any files in .datalad for this test, to not be
83 # susceptible to changes in how much meta info we generate
84 nfiles += 1
85 # we have exactly four files (includes .gitattributes for default
86 # MD5E backend), and expect no content for any directory
87 assert_equal(nfiles, 4)
88 check_contents(default_outname, 'datalad_%s' % ds.id)
89 check_contents(custom_outname, 'myexport')
0 # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
1 # ex: set sts=4 ts=4 sw=4 noet:
2 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
3 #
4 # See COPYING file distributed along with the datalad package for the
5 # copyright and license terms.
6 #
7 # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
8 """provide information about this DataLad installation"""
9
10 __docformat__ = 'restructuredtext'
11
12
13 # PLUGIN API
14 def dlplugin(dataset=None):
15 """Generate a report about the DataLad installation and configuration
16
17 IMPORTANT: Sharing this report with untrusted parties (e.g. on the web)
18 should be done with care, as it may include identifying information, and/or
19 credentials or access tokens.
20
21 Parameters
22 ----------
23 dataset : Dataset, optional
24 If a dataset is given or found, information on this dataset is provided
25 (if it exists), and its active configuration is reported.
26 """
27 ds = dataset
28 if ds and not ds.is_installed():
29 # we don't deal with absent datasets
30 ds = None
31 if ds is None:
32 from datalad import cfg
33 else:
34 cfg = ds.config
35 from datalad.ui import ui
36 from datalad.api import metadata
37
38 report_template = """\
39 {dataset}
40 Configuration
41 =============
42 {cfg}
43
44 """
45
46 dataset_template = """\
47 Dataset information
48 ===================
49 {basic}
50
51 Metadata
52 --------
53 {meta}
54
55 """
56 ds_meta = None
57 if ds and ds.is_installed():
58 ds_meta = metadata(
59 dataset=ds, dataset_global=True, return_type='item-or-list',
60 result_filter=lambda x: x['action'] == 'metadata')
61 if ds_meta:
62 ds_meta = ds_meta['metadata']
63
64 ui.message(report_template.format(
65 dataset='' if not ds else dataset_template.format(
66 basic='\n'.join(
67 '{}: {}'.format(k, v) for k, v in (
68 ('path', ds.path),
69 ('repo', ds.repo.__class__.__name__ if ds.repo else '[NONE]'),
70 )),
71 meta='\n'.join(
72 '{}: {}'.format(k, v) for k, v in ds_meta)
73 if ds_meta else '[no metadata]'
74 ),
75 cfg='\n'.join(
76 '{}: {}'.format(k, '<HIDDEN>' if k.startswith('user.') or 'token' in k else v)
77 for k, v in sorted(cfg.items(), key=lambda x: x[0])),
78 ))
79 yield
5050 from datalad.utils import on_windows
5151 from datalad.utils import swallow_logs
5252 from datalad.utils import assure_list
53 from datalad.utils import _path_
5354 from datalad.cmd import GitRunner
5455
5556 # imports from same module:
9697 WEB_UUID = "00000000-0000-0000-0000-000000000001"
9798
9899 # To be assigned and checked to be good enough upon first call to AnnexRepo
100 # 6.20160923 -- --json-progress for get
99101 # 6.20161210 -- annex add to add also changes (not only new files) to git
100102 # 6.20170220 -- annex status provides --ignore-submodules
101103 GIT_ANNEX_MIN_VERSION = '6.20170220'
240242 # to use 'git annex unlock' instead.
241243 lgr.warning("direct mode not available for %s. Ignored." % self)
242244
245 self._batched = BatchedAnnexes(batch_size=batch_size)
246
243247 # set default backend for future annex commands:
244248 # TODO: Should the backend option of __init__() also migrate
245249 # the annex, in case there are annexed files already?
246250 if backend:
247 lgr.debug("Setting annex backend to %s", backend)
248 # Must be done with explicit release, otherwise on Python3 would end up
249 # with .git/config wiped out
250 # see https://github.com/gitpython-developers/GitPython/issues/333#issuecomment-126633757
251
252 # TODO: 'annex.backends' actually is a space separated list.
253 # Figure out, whether we want to allow for a list here or what to
254 # do, if there is sth in that setting already
251 self.set_default_backend(backend, persistent=True)
252
253
254 def set_default_backend(self, backend, persistent=True, commit=True):
255 """Set default backend
256
257 Parameters
258 ----------
259 backend : str
260 persistent : bool, optional
261 If persistent, would add/commit to .gitattributes. If not -- would
262 set within .git/config
263 """
264 # TODO: 'annex.backends' actually is a space separated list.
265 # Figure out, whether we want to allow for a list here or what to
266 # do, if there is sth in that setting already
267 if persistent:
268 git_attributes_file = _path_(self.path, '.gitattributes')
269 git_attributes = ''
270 if exists(git_attributes_file):
271 with open(git_attributes_file) as f:
272 git_attributes = f.read()
273 if ' annex.backend=' in git_attributes:
274 lgr.debug(
275 "Not (re)setting backend since seems already set in %s"
276 % git_attributes_file
277 )
278 else:
279 lgr.debug("Setting annex backend to %s (persistently)", backend)
280 self.config.set('annex.backends', backend, where='local')
281 with open(git_attributes_file, 'a') as f:
282 if git_attributes and not git_attributes.endswith(os.linesep):
283 f.write(os.linesep)
284 f.write('* annex.backend=%s%s' % (backend, os.linesep))
285 self.add(git_attributes_file, git=True)
286 if commit:
287 self.commit(
288 "Set default backend for all files to be %s" % backend,
289 _datalad_msg=True,
290 files=[git_attributes_file]
291 )
292 else:
293 lgr.debug("Setting annex backend to %s (in .git/config)", backend)
255294 self.config.set('annex.backends', backend, where='local')
256
257 self._batched = BatchedAnnexes(batch_size=batch_size)
258295
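For orientation, a hedged sketch of what the persistent mode above ends up recording; the repository path and backend name are hypothetical::

    from datalad.support.annexrepo import AnnexRepo

    repo = AnnexRepo('/path/to/annexed/repo')  # hypothetical path
    # record the default backend in .gitattributes and commit the change
    repo.set_default_backend('MD5E', persistent=True)
    # .gitattributes now contains (roughly):
    #   * annex.backend=MD5E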
259296 def __del__(self):
260297 try:
826863 super(AnnexRepo, self).set_remote_url(name, url, push=push)
827864 self._set_shared_connection(name, url)
828865
866 def set_remote_dead(self, name):
867 """Announce to annex that remote is "dead"
868 """
869 return self._annex_custom_command([], ["git", "annex", "dead", name])
870
871 def is_remote_annex_ignored(self, remote):
872 """Return True if remote is explicitly ignored"""
873 return self.config.getbool(
874 'remote.{}'.format(remote), 'annex-ignore',
875 default=False
876 )
877
878 def is_special_annex_remote(self, remote, check_if_known=True):
879 """Return either remote is a special annex remote
880
881 Decides based on the presence of diagnostic annex- options
882 for the remote
883 """
884 if check_if_known:
885 if remote not in self.get_remotes():
886 raise RemoteNotAvailableError(remote)
887 sec = 'remote.{}'.format(remote)
888 for opt in ('annex-externaltype', 'annex-webdav'):
889 if self.config.has_option(sec, opt):
890 return True
891 return False
892
829893 @borrowkwargs(GitRepo)
830 def get_remotes(self, with_refs_only=False, with_urls_only=False,
894 def get_remotes(self,
895 with_urls_only=False,
831896 exclude_special_remotes=False):
832897 """Get known (special-) remotes of the repository
833898
841906 remotes : list of str
842907 List of names of the remotes
843908 """
844 remotes = super(AnnexRepo, self).get_remotes(
845 with_refs_only=with_refs_only, with_urls_only=with_urls_only)
909 remotes = super(AnnexRepo, self).get_remotes(with_urls_only=with_urls_only)
846910
847911 if exclude_special_remotes:
848 return [remote for remote in remotes
849 if not self.config.has_option('remote.{}'.format(remote),
850 'annex-externaltype')]
912 return [
913 remote for remote in remotes
914 if not self.is_special_annex_remote(remote, check_if_known=False)
915 ]
851916 else:
852917 return remotes
853918
11191184 self.config.reload()
11201185
11211186 @normalize_paths
1122 def get(self, files, options=None, jobs=None):
1187 def get(self, files, remote=None, options=None, jobs=None):
11231188 """Get the actual content of files
11241189
11251190 Parameters
11261191 ----------
11271192 files : list of str
11281193 paths to get
1194 remote : str, optional
1195 from which remote to fetch content
11291196 options : list of str, optional
11301197 commandline options for the git annex get command
11311198 jobs : int, optional
11371204 """
11381205 options = options[:] if options else []
11391206
1207 if remote:
1208 if remote not in self.get_remotes():
1209 raise RemoteNotAvailableError(
1210 remote=remote,
1211 cmd="get",
1212 msg="Remote is not known. Known are: %s"
1213 % (self.get_remotes(),)
1214 )
1215 options += ['--from', remote]
1216
11401217 # analyze provided files to decide which actually are needed to be
11411218 # fetched
11421219
11431220 if '--key' not in options:
1144 expected_downloads, fetch_files = self._get_expected_downloads(
1145 files)
1221 expected_downloads, fetch_files = self._get_expected_files(
1222 files, ['--not', '--in', 'here'])
11461223 else:
11471224 fetch_files = files
11481225 assert(len(files) == 1)
11551232 if len(fetch_files) != len(files):
11561233 lgr.info("Actually getting %d files", len(fetch_files))
11571234
1158 # TODO: check annex version and issue a one time warning if not
1159 # old enough for --json-progress
1160
1161 # Without up to date annex, we would still report total! ;)
1162 if self.git_annex_version >= '6.20160923':
1163 # options might be the '--key' which should go last
1164 options = ['--json-progress'] + options
1235 # options might be the '--key' which should go last
1236 options = ['--json-progress'] + options
11651237
11661238 # Note: Currently swallowing logs, due to the workaround to report files
11671239 # not found, but don't fail and report about other files and use JSON,
11791251 # from annex failed ones
11801252 with cm:
11811253 results = self._run_annex_command_json(
1182 'get', args=options + fetch_files,
1254 'get',
1255 args=options + fetch_files,
11831256 jobs=jobs,
11841257 expected_entries=expected_downloads)
11851258 results_list = list(results)
11861259 # TODO: should we here compare fetch_files against result_list
1187 # and womit an exception of incomplete download????
1260 # and vomit an exception of incomplete download????
11881261 return results_list
11891262
1190 def _get_expected_downloads(self, files):
1263 def _get_expected_files(self, files, expr):
11911264 """Given a list of files, figure out what to be downloaded
11921265
11931266 Parameters
11941267 ----------
11951268 files
1269 expr: list
1270 Expression to be passed into annex's find
11961271
11971272 Returns
11981273 -------
1199 expected_downloads : dict
1274 expected_files : dict
12001275 key -> size
12011276 fetch_files : list
12021277 files to be fetched
12031278 """
1204 lgr.debug("Determine what files need to be obtained")
1279 lgr.debug("Determine what files match the query to work with")
12051280 # Let's figure out first which files/keys and of what size to download
1206 expected_downloads = {}
1281 expected_files = {}
12071282 fetch_files = []
12081283 keys_seen = set()
12091284 unknown_sizes = [] # unused atm
12101285 # for now just record total size, and
12111286 for j in self._run_annex_command_json(
1212 'find', args=['--json', '--not', '--in', 'here'] + files
1287 'find', args=['--json'] + expr + files
12131288 ):
1289 # TODO: some files might not even be here. So in current fancy
1290 # output reporting scheme we should then theoretically handle
1291 # those cases here and say 'impossible' or something like that
1292 if not j.get('success', True):
1293 # TODO: I guess do something with yielding and filtering for
1294 # what needs to be done and what not
1295 continue
12141296 key = j['key']
12151297 size = j.get('bytesize')
12161298 if key in keys_seen:
12211303 assert j['file']
12221304 fetch_files.append(j['file'])
12231305 if size and size.isdigit():
1224 expected_downloads[key] = int(size)
1306 expected_files[key] = int(size)
12251307 else:
1226 expected_downloads[key] = None
1308 expected_files[key] = None
12271309 unknown_sizes.append(j['file'])
1228 return expected_downloads, fetch_files
1310 return expected_files, fetch_files
12291311
12301312 @normalize_paths
12311313 def add(self, files, git=None, backend=None, options=None, commit=False,
21592241 json_objects = (json.loads(line)
21602242 for line in out.splitlines() if line.startswith('{'))
21612243 # protect against progress leakage
2162 json_objects = [j for j in json_objects if not 'byte-progress' in j]
2244 json_objects = [j for j in json_objects if 'byte-progress' not in j]
21632245 return json_objects
21642246
21652247 # TODO: reconsider having any magic at all and maybe just return a list/dict always
26932775 # TODO: we probably need to override get_file_content, since it returns the
26942776 # symlink's target instead of the actual content.
26952777
2778 # We need --auto and --fast having exposed TODO
26962779 @normalize_paths(match_return_type=False) # get a list even in case of a single item
2697 def copy_to(self, files, remote, options=None, log_online=True):
2780 def copy_to(self, files, remote, options=None, jobs=None):
26982781 """Copy the actual content of `files` to `remote`
26992782
27002783 Parameters
27032786 path(s) to copy
27042787 remote: str
27052788 name of remote to copy `files` to
2706 log_online: bool
2707 see get()
27082789
27092790 Returns
27102791 -------
27122793 files successfully copied
27132794 """
27142795
2796 # find --in here --not --in remote
27152797 # TODO: full support of annex copy options would lead to `files` being
27162798 # optional. This means to check for whether files or certain options are
27172799 # given and fail or just pass everything as is and try to figure out,
27202802 if remote not in self.get_remotes():
27212803 raise ValueError("Unknown remote '{0}'.".format(remote))
27222804
2805 options = options[:] if options else []
2806
2807 # Note:
27232808 # In case of single path, 'annex copy' will fail, if it cannot copy it.
27242809 # With multiple files, annex will just skip the ones, it cannot deal
27252810 # with. We'll do the same and report back what was successful
27292814 if not isdir(files[0]):
27302815 self.get_file_key(files[0])
27312816
2732 # Note:
2733 # - annex copy fails, if `files` was a single item, that doesn't exist
2734 # - files not in annex or not even in git don't yield a non-zero exit,
2735 # but are ignored
2736 # - in case of multiple items, annex would silently skip those files
2737
2738 annex_options = files + ['--to=%s' % remote]
2817 # TODO: RF -- logic is duplicated with get() -- the only difference
2818 # is the verb (copy, copy) or (get, put) and remote ('here', remote)?
2819 if '--key' not in options:
2820 expected_copys, copy_files = self._get_expected_files(
2821 files, ['--in', 'here', '--not', '--in', remote])
2822 else:
2823 copy_files = files
2824 assert(len(files) == 1)
2825 expected_copys = {files[0]: AnnexRepo.get_size_from_key(files[0])}
2826
2827 if not copy_files:
2828 lgr.debug("No files found needing copying.")
2829 return []
2830
2831 if len(copy_files) != len(files):
2832 lgr.info("Actually copying %d files", len(copy_files))
2833
2834 annex_options = ['--to=%s' % remote, '--json-progress']
27392835 if options:
27402836 annex_options.extend(shlex.split(options))
2741 # Note:
2742 # As of now, there is no --json option for annex copy. Use it once this
2743 # changed.
2744 results = self._run_annex_command_json(
2745 'copy',
2746 args=annex_options,
2747 #log_stdout=True, log_stderr=not log_online,
2748 #log_online=log_online, expect_stderr=True
2749 )
2750 results = list(results)
2837
2838 cm = swallow_logs() \
2839 if lgr.getEffectiveLevel() > logging.DEBUG \
2840 else nothing_cm()
2841 # TODO: provide more meaningful message (possibly aggregating 'note'
2842 # from annex failed ones
2843 with cm:
2844 results = self._run_annex_command_json(
2845 'copy',
2846 args=annex_options + copy_files,
2847 jobs=jobs,
2848 expected_entries=expected_copys
2849 #log_stdout=True, log_stderr=not log_online,
2850 #log_online=log_online, expect_stderr=True
2851 )
2852 results_list = list(results)
2853 # XXX this is the only logic different ATM from get
27512854 # check if any transfer failed since then we should just raise an Exception
27522855 # for now to guarantee consistent behavior with non--json output
27532856 # see https://github.com/datalad/datalad/pull/1349#discussion_r103639456
27542857 from operator import itemgetter
2755 failed_copies = [e['file'] for e in results if not e['success']]
2858 failed_copies = [e['file'] for e in results_list if not e['success']]
27562859 good_copies = [
2757 e['file'] for e in results
2860 e['file'] for e in results_list
27582861 if e['success'] and
27592862 e.get('note', '').startswith('to ') # transfer did happen
27602863 ]
27612864 if failed_copies:
2865 # TODO: RF for new fancy scheme of outputs reporting
27622866 raise IncompleteResultsError(
27632867 results=good_copies, failed=failed_copies,
27642868 msg="Failed to copy %d file(s)" % len(failed_copies))
2525 'ERROR': RED
2626 }
2727
28 RESULT_STATUS_COLORS = {
29 'ok': GREEN,
30 'notneeded': GREEN,
31 'impossible': YELLOW,
32 'error': RED
33 }
34
2835 # Aliases for uniform presentation
2936
3037 DATASET = UNDERLINE
4350 return "%s%s%s" % (COLOR_SEQ % color, s, RESET_SEQ) \
4451 if ui.is_interactive \
4552 else s
53
54
55 def color_status(status):
56 col = RESULT_STATUS_COLORS.get(status, None)
57 return color_word(status, col) if col else status
171171 return False
172172 elif value in ('1', 'yes', 'on', 'enable', 'true'):
173173 return True
174 raise ValueError("value must be converted to boolean")
174 raise ValueError(
175 "value '{}' must be convertible to boolean".format(
176 value))
175177
176178 def long_description(self):
177179 return 'value must be convertible to type bool'
892892 for f in re.findall("'(.*)'[\n$]", stdout)]
893893
894894 @normalize_paths(match_return_type=False)
895 def remove(self, files, **kwargs):
895 def remove(self, files, recursive=False, **kwargs):
896896 """Remove files.
897897
898898 Calls git-rm.
901901 ----------
902902 files: str
903903 list of paths to remove
904 recursive: bool, optional
905 whether to allow recursive removal from subdirectories
904906 kwargs:
905907 see `__init__`
906908
912914
913915 files = _remove_empty_items(files)
914916
917 if recursive:
918 kwargs['r'] = True
915919 stdout, stderr = self._git_custom_command(
916920 files, ['git', 'rm'] + to_options(**kwargs))
917921
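For orientation, a hedged sketch of the new flag in use; the repository path and directory name are hypothetical::

    from datalad.support.gitrepo import GitRepo

    repo = GitRepo('/path/to/repo')  # hypothetical path
    # remove a whole tracked subdirectory; recursive=True maps to `git rm -r`
    repo.remove(['old_data/'], recursive=True)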
11581162 # return [branch.strip() for branch in
11591163 # self.repo.git.branch(r=True).splitlines()]
11601164
1161 def get_remotes(self, with_refs_only=False, with_urls_only=False):
1165 def get_remotes(self, with_urls_only=False):
11621166 """Get known remotes of the repository
11631167
11641168 Parameters
11651169 ----------
1166 with_refs_only : bool, optional
1167 return only remotes with any refs. E.g. annex special remotes
1168 would not have any refs
1170 with_urls_only : bool, optional
1171 return only remotes which have urls
11691172
11701173 Returns
11711174 -------
11721175 remotes : list of str
11731176 List of names of the remotes
11741177 """
1175
1176 # Note: This still uses GitPython and therefore might cause a gitpy.Repo
1177 # instance to be created.
1178 if with_refs_only:
1179 # older versions of GitPython might not tolerate remotes without
1180 # any references at all, so we need to catch
1181 remotes = []
1182 for remote in self.repo.remotes:
1183 try:
1184 if len(remote.refs):
1185 remotes.append(remote.name)
1186 except AssertionError as exc:
1187 if "not have any references" not in str(exc):
1188 # was some other reason
1189 raise
11901178
11911179 # Note: read directly from config and spare instantiation of gitpy.Repo
11921180 # since we need this in AnnexRepo constructor. Furthermore gitpy does it
14171405 return self._git_custom_command(
14181406 '', ['git', 'remote', 'remove', name]
14191407 )
1420
1421 def show_remotes(self, name='', verbose=False):
1422 """
1423 """
1424
1425 options = ["-v"] if verbose else []
1426 name = [name] if name else []
1427 out, err = self._git_custom_command(
1428 '', ['git', 'remote'] + options + ['show'] + name
1429 )
1430 return out.rstrip(linesep).splitlines()
14311408
14321409 def update_remote(self, name=None, verbose=False):
14331410 """
117117 doc.strip()
118118 if len(doc) and not doc.endswith('.'):
119119 doc += '.'
120 if self.constraints is not None:
121 cdoc = self.constraints.long_description()
122 if cdoc[0] == '(' and cdoc[-1] == ')':
123 cdoc = cdoc[1:-1]
124 addinfo = ''
125 if self.cmd_kwargs.get('nargs', None) == '?' \
126 or self.cmd_kwargs.get('action', None) == 'append':
127 addinfo = 'list expected, each '
128 doc += ' Constraints: %s%s.' % (addinfo, cdoc)
129120 if has_default:
130121 doc += " [Default: %r]" % (default,)
131122 # Explicitly deal with multiple spaces, for some reason
2020
2121 from datalad.support.param import Parameter
2222 from datalad.interface.base import Interface
23 from datalad.interface.utils import build_doc
23 from datalad.interface.base import build_doc
2424
2525 from datalad import ssh_manager
2626
284284 ar.get('test-annex.dat', options=["--from=NotExistingRemote"])
285285 eq_(cme.exception.remote, "NotExistingRemote")
286286
287 # and similar one whenever invoking with remote parameter
288 with assert_raises(RemoteNotAvailableError) as cme:
289 ar.get('test-annex.dat', remote="NotExistingRemote")
290 eq_(cme.exception.remote, "NotExistingRemote")
291
287292
288293 # 1 is enough to test file_has_content
289294 @with_batch_direct
482487 @with_tempfile
483488 def test_AnnexRepo_migrating_backends(src, dst):
484489 ar = AnnexRepo.clone(src, dst, backend='MD5')
490 eq_(ar.default_backends, ['MD5'])
485491 # GitPython has a bug which causes .git/config being wiped out
486492 # under Python3, triggered by collecting its config instance I guess
487493 gc.collect()
11611167 # Test that if we pass a list of items and annex processes them nicely,
11621168 # we would obtain a list back. To not stress our tests even more -- let's mock
11631169 def ok_copy(command, **kwargs):
1170 # Check that we do pass to annex call only the list of files which we
1171 # asked to be copied
1172 assert_in('copied1', kwargs['annex_options'])
1173 assert_in('copied2', kwargs['annex_options'])
1174 assert_in('existed', kwargs['annex_options'])
11641175 return """
11651176 {"command":"copy","note":"to target ...", "success":true, "key":"akey1", "file":"copied1"}
11661177 {"command":"copy","note":"to target ...", "success":true, "key":"akey2", "file":"copied2"}
11731184 # now let's test that we are correctly raising the exception in case if
11741185 # git-annex execution fails
11751186 orig_run = repo._run_annex_command
1187
1188 # Kinda a bit off reality, since the nonex* files would not be returned/handled
1189 # by _get_expected_files, so in real life we wouldn't get a report about Incomplete!?
11761190 def fail_to_copy(command, **kwargs):
11771191 if command == 'copy':
11781192 # That is not how annex behaves
11901204 else:
11911205 return orig_run(command, **kwargs)
11921206
1193 with patch.object(repo, '_run_annex_command', fail_to_copy):
1207 def fail_to_copy_get_expected(files, expr):
1208 assert files == ["copied", "existed", "nonex1", "nonex2"]
1209 return {'akey1': 10}, ["copied"]
1210
1211 with patch.object(repo, '_run_annex_command', fail_to_copy), \
1212 patch.object(repo, '_get_expected_files', fail_to_copy_get_expected):
11941213 with assert_raises(IncompleteResultsError) as cme:
11951214 repo.copy_to(["copied", "existed", "nonex1", "nonex2"], "target")
11961215 eq_(cme.exception.results, ["copied"])
21192138 def test_AnnexRepo_flyweight_monitoring_inode(path, store):
21202139 # testing for issue #1512
21212140 check_repo_deals_with_inode_change(AnnexRepo, path, store)
2141
2142
2143 @with_tempfile(mkdir=True)
2144 def test_fake_is_not_special(path):
2145 ar = AnnexRepo(path, create=True)
2146 # doesn't exist -- we fail by default
2147 assert_raises(RemoteNotAvailableError, ar.is_special_annex_remote, "fake")
2148 assert_false(ar.is_special_annex_remote("fake", check_if_known=False))
347347 def test_GitRepo_remote_add(orig_path, path):
348348
349349 gr = GitRepo.clone(orig_path, path)
350 out = gr.show_remotes()
350 out = gr.get_remotes()
351351 assert_in('origin', out)
352352 eq_(len(out), 1)
353353 gr.add_remote('github', 'git://github.com/datalad/testrepo--basic--r1')
354 out = gr.show_remotes()
354 out = gr.get_remotes()
355355 assert_in('origin', out)
356356 assert_in('github', out)
357357 eq_(len(out), 2)
358 out = gr.show_remotes('github')
359 assert_in(' Fetch URL: git://github.com/datalad/testrepo--basic--r1', out)
358 eq_('git://github.com/datalad/testrepo--basic--r1', gr.config['remote.github.url'])
360359
361360
362361 @with_testrepos(flavors=local_testrepo_flavors)
366365 gr = GitRepo.clone(orig_path, path)
367366 gr.add_remote('github', 'git://github.com/datalad/testrepo--basic--r1')
368367 gr.remove_remote('github')
369 out = gr.show_remotes()
368 out = gr.get_remotes()
370369 eq_(len(out), 1)
371370 assert_in('origin', out)
372
373
374 @with_testrepos(flavors=local_testrepo_flavors)
375 @with_tempfile
376 def test_GitRepo_remote_show(orig_path, path):
377
378 gr = GitRepo.clone(orig_path, path)
379 gr.add_remote('github', 'git://github.com/datalad/testrepo--basic--r1')
380 out = gr.show_remotes(verbose=True)
381 eq_(len(out), 4)
382 assert_in('origin\t%s (fetch)' % orig_path, out)
383 assert_in('origin\t%s (push)' % orig_path, out)
384 # Some fellas might have some fancy rewrite rules for pushes, so we can't
385 # just check for specific protocol
386 assert_re_in('github\tgit(://|@)github.com[:/]datalad/testrepo--basic--r1 \(fetch\)',
387 out)
388 assert_re_in('github\tgit(://|@)github.com[:/]datalad/testrepo--basic--r1 \(push\)',
389 out)
390371
391372
392373 @with_testrepos(flavors=local_testrepo_flavors)
6363 # constraints
6464 p = Parameter(doc=doc, constraints=cnstr.EnsureInt() | cnstr.EnsureStr())
6565 autodoc = p.get_autodoc('testname')
66 assert_true("convertible to type 'int'" in autodoc)
67 assert_true('must be a string' in autodoc)
6866 assert_true('int or str' in autodoc)
6967
7068 with assert_raises(ValueError) as cmr:
1212 from os.path import lexists, dirname, join as opj, curdir
1313
1414 # Hard coded version, to be done by release process
15 __version__ = '0.6.0.dev1'
15 __version__ = '0.8.0'
1616
1717 # NOTE: might cause problems with "python setup.py develop" deployments
1818 # so I have even changed buildbot to use pip install -e .
2525 generated/man/datalad-create-sibling
2626 generated/man/datalad-create-sibling-github
2727 generated/man/datalad-drop
28 generated/man/datalad-export
28 generated/man/datalad-plugin
2929 generated/man/datalad-get
3030 generated/man/datalad-install
3131 generated/man/datalad-publish
0 .. -*- mode: rst -*-
1 .. vi: set ft=rst sts=4 ts=4 sw=4 et tw=79:
2
3 .. _chap_customization:
4
5 ********************************************
6 Customization and extension of functionality
7 ********************************************
8
9 DataLad provides numerous commands that cover many use cases. However, there will
10 always be a demand for further customization at a particular site, or for an
11 individual user. DataLad addresses this need by providing a generic plugin
12 interface.
13
14 First of all, DataLad plugins can be executed via the :ref:`man_datalad-plugin`
15 command. This allows for executing arbitrary plugins (on a particular dataset)
16 at any point in time.
17
18 In addition, DataLad can be configured to run any number of plugins prior to
19 or after particular commands. For example, it is possible to execute a plugin
20 each time DataLad has created a dataset to configure it so that all files
21 that are added to its ``code/`` subdirectory will always be managed directly
22 with Git and not be put into the dataset's annex. In order to achieve this,
23 adjust your Git configuration in the following way::
24
25 git config --global --add datalad.create.run-after 'no_annex pattern=code/**'
26
27 This will cause DataLad to run the ``no_annex`` plugin to add the given pattern
28 to the dataset's ``.gitattributes`` file, which in turn instructs git annex to
29 send any matching files directly to Git. The same functionality is available
30 for ad-hoc adjustments via the ``--run-after`` option supported by most
31 commands.
32
33 Analogous to ``--run-after``, DataLad also supports ``--run-before`` to execute
34 plugins prior to a command.
35
36 DataLad will discover plugins at three locations:
37
38 1. official plugins that are part of the local DataLad installation
39
40 2. system-wide plugins, provided by the local admin
41
42 The location where plugins need to be placed depends on the platform.
43 On GNU/Linux systems this will be ``/etc/xdg/datalad/plugins``, whereas
44 on Windows it will be ``C:\ProgramData\datalad.org\datalad\plugins``.
45
46 This default location can be overridden by setting the
47 ``datalad.locations.system-plugins`` configuration variable in the local or
48 global Git configuration.
49
50 3. user-supplied plugins, customizable by each user
51
52 Again, the location will depend on the platform. On GNU/Linux systems this
53 will be ``$HOME/.config/datalad/plugins``, whereas on Windows it will be
54 ``C:\Users\<username>\AppData\Local\datalad.org\datalad\plugins``.
55
56 This default location can be overridden by setting the
57 ``datalad.locations.user-plugins`` configuration variable in the local or
58 global Git configuration.
59
60 Identically named plugins in a later location replace those in locations
61 searched before. This can be used to alter the behavior of plugins provided
62 with DataLad, and enables users to adjust a site-wide configuration.
63
64
65 Writing own plugins
66 ===================
67
68 Plugins are written in Python. In order for DataLad to be able to find
69 them, plugins need to be placed in one of the supported locations described
70 above. Plugin file names have to have a '.py' extension and must not start
71 with an underscore ('_').
72
73 Plugin source files must define a function named::
74
75 dlplugin
76
77 This function is executed as the plugin. It can have any number of
78 arguments (positional, or keyword arguments with defaults), or none at
79 all. All arguments, except ``dataset`` must expect any value to
80 be a string.
81
82 The plugin function must be self-contained, i.e. all needed imports
83 and definitions must be done within the body of the function.
84
85 The doc string of the plugin function is displayed when the plugin
86 documentation is requested. The first line in a plugin file that starts
87 with triple double-quotes will be used as the plugin short description
88 (this will typically be the docstring of the module file). This short
89 description is displayed as the plugin synopsis in the plugin overview
90 list.
91
92 Plugin functions must yield their results as a Python generator. Results are
93 DataLad status dictionaries. There are no constraints on the number of results,
94 or the number and nature of result properties. However, conventions exist and
95 must be followed for compatibility with the result evaluation and rendering
96 performed by DataLad.
97
98 The following property keys must exist:
99
100 "status"
101 {'ok', 'notneeded', 'impossible', 'error'}
102
103 "action"
104 label for the action performed by the plugin. In many cases this
105 could be the plugin's name.
106
107 The following keys should exist if possible:
108
109 "path"
110 absolute path to a result on the file system
111
112 "type"
113 label indicating the nature of a result (e.g. 'file', 'dataset',
114 'directory', etc.)
115
116 "message"
117 string message annotating the result, particularly important for
118 non-ok results. This can be a tuple with 'logging'-style string
119 expansion.
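As a hedged illustration (this plugin is not shipped with DataLad, and all names in it are made up), a minimal plugin following these conventions could look like this::

    """add an empty marker file to a dataset"""

    __docformat__ = 'restructuredtext'


    # PLUGIN API
    def dlplugin(dataset, filename='MARKER.txt'):
        """Create an empty marker file and save it in the dataset

        Parameters
        ----------
        dataset : Dataset
            dataset to operate on
        filename : str, optional
            name of the marker file. Default: 'MARKER.txt'
        """
        from os.path import join as opj

        fpath = opj(dataset.path, filename)
        with open(fpath, 'w') as fp:
            fp.write('')
        # let the dataset record the new file
        for r in dataset.add(fpath, message='[DATALAD] added marker file'):
            yield r
        # report on the plugin's own action, following the result conventions
        yield dict(
            action='add_marker',
            path=fpath,
            type='file',
            status='ok')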
1111
1212 Datalad is a Python package and can be installed via pip_, which is the
1313 preferred method unless system packages are available for the target platform
14 (see below)::
14 (see below). To automatically install datalad and all its software dependencies
15 type::
1516
1617 pip install datalad
1718
1819 .. _pip: https://pip.pypa.io
1920
20 This will automatically install all software dependencies necessary to provide
21 core functionality. Several additional installation schemes are supported
22 (e.g., ``publish``, ``metadata``, ``tests``, ``crawl``)::
21 Several additional installation schemes are supported (``[SCHEME]`` can be e.g.
22 ``publish``, ``metadata``, ``tests`` or ``crawl``)::
2323
24 pip install datalad[SCHEME]
25
26 where ``SCHEME`` can be any supported scheme, such as the ones listed above.
24 pip install datalad[SCHEME]
25
26 .. cool, but why should I (or a first-time reader) even bother about the schemes?
2727
2828 In addition, it is necessary to have a working installation of git-annex_,
2929 which is not set up automatically at this point.
3838 package::
3939
4040 sudo apt-get install datalad
41
42 A current version of git-annex (as also provided by the NeuroDebian_
43 repository) can be installed by typing::
44
45 sudo apt-get install git-annex
4146
4247 .. _neurodebian: http://neuro.debian.net
4348
5964 First steps
6065 ===========
6166
62 After datalad is installed it can be queried for information about known
63 datasets. For example, we might want to look for dataset thats were funded by,
64 or acknowledge the US National Science Foundation (NSF)::
67 Datalad can be queried for information about known datasets. On the first search
68 query, datalad automatically offers assistance to obtain a :term:`superdataset` first.
69 The superdataset is a lightweight container that holds meta information about known datasets but does not contain actual data itself.
70
71 For example, we might want to look for datasets that were funded by, or acknowledge, the US National Science Foundation (NSF)::
6572
6673 ~ % datalad search NSF
6774 No DataLad dataset found at current location
7582 ~/datalad/openfmri/ds000003
7683 ...
7784
78 On first attempt, datalad offers assistence to obtain a :term:`superdataset`
79 with information on all datasets it knows about. This is a lightweight
80 container that does not actually contain data, but meta information only. Once
81 downloaded queries can be made offline.
82
8385 Any known dataset can now be installed inside the local superdataset with a
8486 command like this::
8587
1818 basics
1919 usecases/index
2020 metadata
21 customization
2122 faq
2223 glossary
2324
2727 api.create_sibling
2828 api.create_sibling_github
2929 api.drop
30 api.export
30 api.plugin
3131 api.get
3232 api.install
3333 api.publish
7676 api.crawl
7777 api.crawl_init
7878 api.test
79
80 Plugins
81 -------
82
83 DataLad can be customized by plugins. The following plugins are shipped
84 with DataLad.
85
86 .. currentmodule:: datalad.plugin
87 .. autosummary::
88 :toctree: generated
89
90 add_readme
91 export_tarball
92 no_annex
93 wtf
7994
8095
8196 Support functionality
00 # Theoretically we don't want -e here but ATM pip would puke if just .[full] is provided
11 # Since we use requirements.txt ATM only for development IMHO it is ok but
22 # we need to figure out/complaint to pip folks
3 # For now, until https://github.com/GrahamDumpleton/wrapt/issues/98 is resolved
4 # we should use our version which allows disabling its extension(s)
5 git+https://github.com/yarikoptic/wrapt@develop
36 -e .[devel]
7