Merge tag '0.8.0' into debian
A variety of fixes and enhancements
- [publish] now pushes the merged `git-annex` branch even if no other changes
were made
- [publish] should be able to publish using a relative path within an SSH URI
(the git hook now uses relative paths)
- [publish] should better tolerate publishing to pure git and `git-annex`
special remotes
- The [plugin] mechanism replaces [export]. See [export_tarball] for the
replacement of [export]. It should now be easy to extend datalad's interface
with custom functionality to be invoked along with other commands.
- Minimalistic coloring of the results rendering
- [publish]/`copy_to` now report progress and support `--jobs` (see the sketch
after this list)
- minor fixes and enhancements to the crawler (e.g. support for recursive removes)
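The parallel transfer is exposed both on the command line (`datalad publish --jobs N`) and through the Python API. A minimal sketch, assuming a dataset with an already-configured sibling named 'target' (path and sibling name are placeholders, not from this commit):

```python
from datalad.api import Dataset

ds = Dataset('/tmp/some/dataset')
# `jobs` is forwarded down to `git annex copy`, and a progress bar
# is rendered while file content is transferred to the sibling
ds.publish(to='target', jobs=4)
```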
* tag '0.8.0': (76 commits)
Changelog for 0.8.0
BF: fixed test_publish for assuming that there is no need to push git-annex, which was fixed in prior commit
BF/RF: mv is_remote_annex_ignored to AnnexRepo, make siblings command not puke if not yet annex-ignored
BF: publish if only updates to git-annex, do not puke if remote is ignored by annex
ENH: add --to-annex (reuse to_git Python interface though) to force adding to annex
RF: --text-to-git -> --text-no-annex, and handled by create, not AnnexRepo
ENH: allow for "recursive" flag for remove (needed while crawling s3 where prefix is a directory)
BF: fixing up a test and a hook more for now using relative path(s)
BF: call set_remote_dead only for annexrepo
RF: removed the comment
BF: push updated git-annex branch upon publishing data (only)
BF: use relative dspath in the hook (Closes #1653), dead/remove remote upon replace (Closes #1656)
BF: for copy_to report only # of files present locally and use correct verb in msg
BF: Test is old fashion -- doesn't accept rendering options etc
ENH: create --text-to-git to establish .gitattributes so that text file go to git
BF: fixing url for pip -- must have git+ prefix
DOC: little cleaning of gettingstarted.rst
BF(workaround): use patched wrapt disabling its extensions
BF: providing guarding against non-existing paths in checking on what to copy
ENH: --jobs and progress for copy_to/publish
...
Yaroslav Halchenko
6 years ago
177 | 177 | # Verify that setup.py build doesn't puke |
178 | 178 | - python setup.py build |
179 | 179 | # Run tests |
180 | - PATH=$PWD/tools/coverage-bin:$PATH $NOSE_WRAPPER `which nosetests` $NOSE_OPTS -v -A "$NOSE_SELECTION_OP(integration or usecase or slow)" --with-doctest --doctest-tests --with-cov --cover-package datalad --logging-level=INFO $TESTS_TO_PERFORM | |
180 | - WRAPT_DISABLE_EXTENSIONS=1 PATH=$PWD/tools/coverage-bin:$PATH $NOSE_WRAPPER `which nosetests` $NOSE_OPTS -v -A "$NOSE_SELECTION_OP(integration or usecase or slow)" --with-doctest --doctest-tests --with-cov --cover-package datalad --logging-level=INFO $TESTS_TO_PERFORM | |
181 | 181 | # Generate documentation and run doctests |
182 | 182 | # but do only when we do not have obnoxious logging turned on -- something screws up sphinx on travis |
183 | 183 | - if [ ! "${DATALAD_LOG_LEVEL:-}" = 2 ]; then PYTHONPATH=$PWD $NOSE_WRAPPER make -C docs html doctest; fi |
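For context, `WRAPT_DISABLE_EXTENSIONS` is an environment variable honored by the `wrapt` library itself: when set, `wrapt` falls back to its pure-Python implementation instead of the C extension that was misbehaving here. A hedged sketch of applying the same workaround programmatically:

```python
import os

# must be set before wrapt is first imported anywhere in the process
os.environ['WRAPT_DISABLE_EXTENSIONS'] = '1'

import wrapt  # now uses the pure-Python implementation
```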
8 | 8 | We would recommend consulting the log of the |
9 | 9 | [DataLad git repository](http://github.com/datalad/datalad) for more details. |
10 | 10 | |
11 | # 0.7.0 (Jun 25, 2017) -- when it works - it is quite awesome! | |
12 | ||
13 | I bet we will fix some bugs and make the world an even better place. | |
11 | ||
12 | ## 0.8.0 (Jul 31, 2017) -- it is better than ever | |
13 | ||
14 | A variety of fixes and enhancements | |
15 | ||
16 | ### Fixes | |
17 | ||
18 | - [publish] now pushes the merged `git-annex` branch even if no other changes | |
19 | were made | |
20 | - [publish] should be able to publish using a relative path within an SSH URI | |
21 | (the git hook now uses relative paths) | |
22 | - [publish] should better tolerate publishing to pure git and `git-annex` | |
23 | special remotes | |
24 | ||
25 | ### Enhancements and new features | |
26 | ||
27 | - The [plugin] mechanism replaces [export]. See [export_tarball] for the | |
28 | replacement of [export]. It should now be easy to extend datalad's interface | |
29 | with custom functionality to be invoked along with other commands. | |
30 | - Minimalistic coloring of the results rendering | |
31 | - [publish]/`copy_to` now report progress and support `--jobs` | |
32 | - minor fixes and enhancements to the crawler (e.g. support for recursive removes) | |
33 | ||
34 | ||
35 | ## 0.7.0 (Jun 25, 2017) -- when it works - it is quite awesome! | |
36 | ||
37 | New features, refactorings, and bug fixes. | |
14 | 38 | |
15 | 39 | ### Major refactoring and deprecations |
16 | 40 | |
18 | 42 | - [create-sibling], and [unlock] have been re-written to support the |
19 | 43 | same common API as most other commands |
20 | 44 | |
21 | ## Enhancements and new features | |
45 | ### Enhancements and new features | |
22 | 46 | |
23 | 47 | - [siblings] can now be used to query and configure a local repository by |
24 | 48 | using the sibling name ``here`` |
30 | 54 | - Significant parts of the documentation of been updated |
31 | 55 | - Instantiate GitPython's Repo instances lazily |
32 | 56 | |
33 | ## Fixes | |
57 | ### Fixes | |
34 | 58 | |
35 | 59 | - API documentation is now rendered properly as HTML, and is easier to browse by |
36 | 60 | having more compact pages |
358 | 382 | [datalad]: http://docs.datalad.org/en/latest/generated/man/datalad.html |
359 | 383 | [drop]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-drop.html |
360 | 384 | [export]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-export.html |
385 | [export_tarball]: http://docs.datalad.org/en/latest/generated/datalad.plugin.export_tarball.html | |
361 | 386 | [get]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-get.html |
362 | 387 | [install]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-install.html |
363 | 388 | [ls]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-ls.html |
364 | 389 | [metadata]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-metadata.html |
365 | 390 | [publish]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-publish.html |
391 | [plugin]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-plugin.html | |
366 | 392 | [remove]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-remove.html |
367 | 393 | [save]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-save.html |
368 | 394 | [search]: http://datalad.readthedocs.io/en/latest/generated/man/datalad-search.html |
413 | 413 | Any new DATALAD_CMD_PROTOCOL has to implement datalad.support.protocol.ProtocolInterface |
414 | 414 | - *DATALAD_CMD_PROTOCOL_PREFIX*: |
415 | 415 | Sets a prefix to add before the command call times are noted by DATALAD_CMD_PROTOCOL. |
416 | ||
417 | ||
418 | # Changelog section | |
419 | ||
420 | For the upcoming release use this template | |
421 | ||
422 | ## 0.8.1 (??? ??, 2017) -- will be better than ever | |
423 | ||
424 | bet we will fix some bugs and make the world an even better place. | |
425 | ||
426 | ### Major refactoring and deprecations | |
427 | ||
428 | - hopefully none | |
429 | ||
430 | ### Fixes | |
431 | ||
432 | ? | |
433 | ||
434 | ### Enhancements and new features | |
435 | ||
436 | ? | |
437 |
26 | 26 | from .log import lgr |
27 | 27 | import atexit |
28 | 28 | from datalad.utils import on_windows |
29 | ||
29 | 30 | if not on_windows: |
30 | 31 | lgr.log(5, "Instantiating ssh manager") |
31 | 32 | from .support.sshconnector import SSHManager |
33 | 34 | atexit.register(ssh_manager.close, allow_fail=False) |
34 | 35 | else: |
35 | 36 | ssh_manager = None |
37 | ||
38 | try: | |
39 | # this will fix the rendering of ANSI escape sequences | |
40 | # for colored terminal output on windows | |
41 | # it will do nothing on any other platform, hence it | |
42 | # is safe to call unconditionally | |
43 | import colorama | |
44 | colorama.init() | |
45 | atexit.register(colorama.deinit) | |
46 | except ImportError as e: | |
47 | if on_windows: | |
48 | from datalad.dochelpers import exc_str | |
49 | lgr.warning( | |
50 | "'colorama' Python module missing, terminal output may look garbled [%s]", | |
51 | exc_str(e)) | |
52 | pass | |
36 | 53 | |
37 | 54 | atexit.register(lgr.log, 5, "Exiting") |
38 | 55 |
20 | 20 | from collections import namedtuple |
21 | 21 | from functools import wraps |
22 | 22 | |
23 | from datalad import cfg | |
24 | ||
25 | from .interface.base import update_docstring_with_parameters | |
26 | 23 | from .interface.base import get_interface_groups |
27 | 24 | from .interface.base import get_api_name |
28 | from .interface.base import alter_interface_docs_for_api | |
29 | from .interface.base import merge_allargs2kwargs | |
25 | from .interface.base import get_allargs_as_kwargs | |
30 | 26 | |
31 | 27 | def _kwargs_to_namespace(call, args, kwargs): |
32 | 28 | """ |
33 | 29 | Given a __call__, args and kwargs passed, prepare a cmdlineargs-like |
34 | 30 | thing |
35 | 31 | """ |
36 | kwargs_ = merge_allargs2kwargs(call, args, kwargs) | |
32 | kwargs_ = get_allargs_as_kwargs(call, args, kwargs) | |
37 | 33 | # Get all arguments removing those possible ones used internally and |
38 | 34 | # which shouldn't be exposed outside anyways |
39 | 35 | [kwargs_.pop(k) for k in list(kwargs_) if k.startswith('_')]  # copy keys: popping while iterating would error
141 | 141 | of the command; 'continue' works like 'ignore', but an error causes a |
142 | 142 | non-zero exit code; 'stop' halts on first failure and yields non-zero exit |
143 | 143 | code. A failure is any result with status 'impossible' or 'error'.""") |
144 | parser.add_argument( | |
145 | '--run-before', dest='common_run_before', | |
146 | nargs='+', | |
147 | action='append', | |
148 | metavar='PLUGINSPEC', | |
149 | help="""DataLad plugin to run before the main command. PLUGINSPEC is a list | |
150 | comprised of a plugin name plus optional `key=value` pairs with arguments | |
151 | for the plugin call (see `plugin` command documentation for details). | |
152 | This option can be given more than once to run multiple plugins | |
153 | in the order in which they were given. | |
154 | For running plugins that require a --dataset argument it is important | |
155 | to provide the respective dataset as the --dataset argument of the main | |
156 | command, if it is not in the list of plugin arguments."""), | |
157 | parser.add_argument( | |
158 | '--run-after', dest='common_run_after', | |
159 | nargs='+', | |
160 | action='append', | |
161 | metavar='PLUGINSPEC', | |
162 | help="""Like --run-before, but plugins are executed after the main command | |
163 | has finished."""), | |
164 | parser.add_argument( | |
165 | '--cmd', dest='_', action='store_true', | |
166 | help="""syntactical helper that can be used to end the list of global | |
167 | command line options before the subcommand label. Options like | |
168 | --run-before can take an arbitrary number of arguments and may require | |
169 | to be followed by a single --cmd in order to enable identification | |
170 | of the subcommand.""") | |
144 | 171 | |
145 | 172 | # yoh: atm we only dump to console. Might adopt the same separation later on |
146 | 173 | # and for consistency will call it --verbose-level as well for now |
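The same hooks are reachable from the Python API, where each PLUGINSPEC becomes a list of the plugin name plus `key=value` strings. A hedged sketch mirroring `test_create_withplugin` further down in this commit (the path is a placeholder); on the command line this corresponds roughly to `datalad --run-after add_readme filename=README.rst --cmd create /tmp/newds`:

```python
from datalad.api import create

ds = create(
    dataset='/tmp/newds',
    # run the add_readme plugin once `create` has finished
    run_after=[['add_readme', 'filename=README.rst']],
)
```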
36 | 36 | from ...utils import lmtime |
37 | 37 | from ...utils import find_files |
38 | 38 | from ...utils import auto_repr |
39 | from ...utils import _path_ | |
39 | 40 | from ...utils import getpwd |
40 | 41 | from ...utils import try_multiple |
41 | 42 | from ...tests.utils import put_file_under_git |
176 | 177 | "Was instructed to add to super dataset but no super dataset " |
177 | 178 | "was found for %s" % ds |
178 | 179 | ) |
179 | ||
180 | # create/AnnexRepo specification of backend does it non-persistently in .git/config | |
181 | if backend: | |
182 | put_file_under_git(path, '.gitattributes', '* annex.backend=%s' % backend, annexed=False) | |
183 | 180 | |
184 | 181 | return ds |
185 | 182 | |
853 | 850 | if self.repo.dirty and not exists(opj(path, '.gitattributes')) and isinstance(self.repo, AnnexRepo): |
854 | 851 | backends = self.repo.default_backends |
855 | 852 | if backends: |
856 | # then record default backend into the .gitattributes | |
857 | put_file_under_git(path, '.gitattributes', '* annex.backend=%s' % backends[0], | |
858 | annexed=False) | |
853 | self.repo.set_default_backend(backends[0], commit=False) | |
859 | 854 | |
860 | 855 | # at least use repo._git_custom_command |
861 | 856 | def _commit(self, msg=None, options=[]): |
1302 | 1297 | stats = data.get('datalad_stats', None) |
1303 | 1298 | if self.repo.dirty: # or self.tracker.dirty # for dry run |
1304 | 1299 | lgr.info("Repository found dirty -- adding and committing") |
1305 | _call(self.repo.add, '.', options=self.options) # so everything is committed | |
1300 | _call(self.repo.add, '.', git_options=self.options) # so everything is committed | |
1306 | 1301 | |
1307 | 1302 | stats_str = ('\n\n' + stats.as_str(mode='full')) if stats else '' |
1308 | 1303 | _call(self._commit, "%s%s" % (', '.join(self._states), stats_str), options=["-a"]) |
1394 | 1389 | |
1395 | 1390 | return _remove_obsolete() |
1396 | 1391 | |
1397 | def remove(self, data): | |
1392 | def remove(self, data, recursive=False): | |
1398 | 1393 | """Removed passed along file name from git/annex""" |
1399 | 1394 | stats = data.get('datalad_stats', None) |
1400 | 1395 | self._states.add("Removed files") |
1402 | 1397 | # TODO: not sure if we should maybe check if exists, and skip/just complain if not
1403 | 1398 | if stats: |
1404 | 1399 | _call(stats.increment, 'removed') |
1405 | if lexists(opj(self.repo.path, filename)): | |
1406 | _call(self.repo.remove, filename) | |
1400 | filepath = opj(self.repo.path, filename) | |
1401 | if lexists(filepath): | |
1402 | if os.path.isdir(filepath): | |
1403 | if recursive: | |
1404 | _call(self.repo.remove, filename, recursive=True) | |
1405 | else: | |
1406 | lgr.warning("Do not removing %s recursively, skipping", filepath) | |
1407 | else: | |
1408 | _call(self.repo.remove, filename) | |
1407 | 1409 | else: |
1408 | 1410 | lgr.warning("Was asked to remove non-existing path %s", filename) |
1409 | 1411 | yield data |
219 | 219 | commits = {b: list(repo.get_branch_commits(b)) for b in branches} |
220 | 220 | eq_(len(commits['incoming']), 1) |
221 | 221 | eq_(len(commits['incoming-processed']), 2) |
222 | eq_(len(commits['master']), 5) # all commits out there -- init ds + init crawler + 1*(incoming, processed, merge) | |
222 | eq_(len(commits['master']), 6) # all commits out there -- init ds + init crawler + 1*(incoming, processed, merge) | |
223 | 223 | |
224 | 224 | with chpwd(outd): |
225 | 225 | eq_(set(glob('*')), {'dir1', 'file1.nii'}) |
249 | 249 | |
250 | 250 | |
251 | 251 | @with_tree(tree={ |
252 | ||
253 | 252 | 'study': { |
254 | 253 | 'show': { |
255 | 254 | 'WG33': { |
258 | 257 | <a href="/file/show/JX5V">file1.nii</a> |
259 | 258 | <a href="/file/show/RIBX">dir1 / file2.nii</a> |
260 | 259 | <a href="/file/show/GSRD">file1b.nii</a> |
261 | ||
262 | 260 | %s |
263 | 261 | </body></html>""" % _PLUG_HERE, |
264 | 262 | }, |
272 | 270 | } |
273 | 271 | } |
274 | 272 | }, |
275 | ||
276 | 273 | 'file': { |
277 | 274 | 'show': { |
278 | 275 | 'JX5V': { |
292 | 289 | } |
293 | 290 | |
294 | 291 | }, |
295 | ||
296 | 292 | 'download': { |
297 | 293 | 'file1.nii': "content of file1.nii is different", |
298 | 294 | 'file1b.nii': "content of file1b.nii", |
342 | 338 | './.datalad/crawl/crawl.cfg', |
343 | 339 | './.datalad/crawl/statuses/incoming.json', |
344 | 340 | './.datalad/meta/balsa.json', |
345 | './file1.nii', './dir1/file2.nii', | |
341 | './file1.nii', | |
342 | './dir1/file2.nii', | |
346 | 343 | } |
347 | 344 | |
348 | 345 | eq_(set(all_files), target_files) |
264 | 264 | eq_(len(commits_l['incoming']), 3) |
265 | 265 | eq_(len(commits['incoming-processed']), 6) |
266 | 266 | eq_(len(commits_l['incoming-processed']), 4) # because original merge has only 1 parent - incoming |
267 | eq_(len(commits['master']), 12) # all commits out there -- dataset init, crawler init + 3*(incoming, processed, meta data aggregation, merge) | |
268 | eq_(len(commits_l['master']), 6) | |
267 | eq_(len(commits['master']), 13) # all commits out there -- dataset init, crawler init + 3*(incoming, processed, meta data aggregation, merge) | |
268 | eq_(len(commits_l['master']), 7) | |
269 | 269 | |
270 | 270 | # Check tags for the versions |
271 | 271 | eq_(out[0]['datalad_stats'].get_total().versions, ['1.0.0', '1.0.1']) |
272 | 272 | # +1 because original "release" was assumed to be 1.0.0 |
273 | 273 | repo_tags = repo.get_tags() |
274 | 274 | eq_(repo.get_tags(output='name'), ['1.0.0', '1.0.0+1', '1.0.1']) |
275 | eq_(repo_tags[0]['hexsha'], commits_l['master'][-4].hexsha) # next to the last one | |
275 | eq_(repo_tags[0]['hexsha'], commits_l['master'][-5].hexsha) # next to the last one | |
276 | 276 | eq_(repo_tags[-1]['hexsha'], commits_l['master'][0].hexsha) # the last one |
277 | 277 | |
278 | 278 | def hexsha(l): |
468 | 468 | eq_(len(commits['incoming-processed']), 2) |
469 | 469 | eq_(len(commits_l['incoming-processed']), 2) # because original merge has only 1 parent - incoming |
470 | 470 | # to avoid 'dataset init' commit create() needs save=False |
471 | eq_(len(commits['master']), 6) # all commits out there, dataset init, crawler, init, incoming, incoming-processed, meta data aggregation, merge | |
472 | eq_(len(commits_l['master']), 4) # dataset init, init, meta data aggregation, merge | |
471 | eq_(len(commits['master']), 7) # all commits out there, backend, dataset init, crawler, init, incoming, incoming-processed, meta data aggregation, merge | |
472 | eq_(len(commits_l['master']), 5) # backend, dataset init, init, meta data aggregation, merge | |
473 | 473 | |
474 | 474 | # rerun pipeline -- make sure we are on the same in all branches! |
475 | 475 | with chpwd(outd): |
42 | 42 | ['encryption=none', 'type=external', 'externaltype=%s' % ARCHIVES_SPECIAL_REMOTE, |
43 | 43 | 'autoenable=true' |
44 | 44 | ]) |
45 | assert annex.is_special_annex_remote(ARCHIVES_SPECIAL_REMOTE) | |
45 | 46 | # We want two maximally obscure names, which are also different |
46 | 47 | assert(fn_extracted != fn_inarchive_obscure) |
47 | 48 | annex.add(fn_archive, commit=True, msg="Added tarball") |
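`is_special_annex_remote` is a new `AnnexRepo` helper in this release; besides this test it also guards the `git push` logic in `publish.py` below. A hedged usage sketch (path and remote name are placeholders):

```python
from datalad.support.annexrepo import AnnexRepo

repo = AnnexRepo('/tmp/some/dataset')  # placeholder path
# pure special remotes carry no git branches, so there is nothing to push
if repo.is_special_annex_remote('myarchive'):
    print("special annex remote: publish will skip `git push`")
```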
37 | 37 | from datalad.interface.results import results_from_annex_noinfo |
38 | 38 | from datalad.interface.utils import discover_dataset_trace_to_targets |
39 | 39 | from datalad.interface.utils import eval_results |
40 | from datalad.interface.utils import build_doc | |
40 | from datalad.interface.base import build_doc | |
41 | 41 | from datalad.interface.save import Save |
42 | 42 | from datalad.distribution.utils import _fixup_submodule_dotgit_setup |
43 | 43 | from datalad.support.constraints import EnsureStr |
140 | 140 | as it inflates dataset sizes and impacts flexibility of data |
141 | 141 | transport. If not specified - it will be up to git-annex to |
142 | 142 | decide, possibly on .gitattributes options."""), |
143 | to_annex=Parameter( | |
144 | args=("--to-annex",), | |
145 | action='store_false', | |
146 | dest='to_git', | |
147 | doc="""flag whether to force adding data to Annex, instead of | |
148 | git. It might be that .gitattributes instructs for a file to be | |
149 | added to git, but for some particular files it is desired to be | |
150 | added to annex (e.g. sensitive files etc). | |
151 | If not specified - it will be up to git-annex to | |
152 | decide, possibly on .gitattributes options."""), | |
143 | 153 | recursive=recursion_flag, |
144 | 154 | recursion_limit=recursion_limit, |
145 | 155 | # TODO not functional anymore |
177 | 187 | annex_opts=None, |
178 | 188 | annex_add_opts=None, |
179 | 189 | jobs=None): |
180 | ||
181 | 190 | # parameter constraints: |
182 | 191 | if not path: |
183 | 192 | raise InsufficientArgumentsError( |
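In the Python interface the new `--to-annex` flag is expressed through the existing `to_git` parameter (note `action='store_false', dest='to_git'` above), so forcing a file into the annex is simply `to_git=False`. A sketch mirroring the new assertions in `test_add.py` (paths and file names are placeholders):

```python
from datalad.api import Dataset

ds = Dataset('/tmp/ds')
ds.add('notes.txt', to_git=True)    # force into git, overriding .gitattributes
ds.add('secret.txt', to_git=False)  # --to-annex: force into the annex
```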
9 | 9 | |
10 | 10 | |
11 | 11 | import logging |
12 | import re | |
12 | 13 | from os import listdir |
13 | 14 | from os.path import relpath |
14 | 15 | from os.path import pardir |
16 | 17 | |
17 | 18 | from datalad.interface.base import Interface |
18 | 19 | from datalad.interface.utils import eval_results |
19 | from datalad.interface.utils import build_doc | |
20 | from datalad.interface.base import build_doc | |
20 | 21 | from datalad.interface.results import get_status_dict |
21 | 22 | from datalad.interface.common_opts import location_description |
22 | 23 | # from datalad.interface.common_opts import git_opts |
100 | 101 | reckless=reckless_opt, |
101 | 102 | alt_sources=Parameter( |
102 | 103 | args=('--alternative-sources',), |
104 | dest='alt_sources', | |
103 | 105 | metavar='SOURCE', |
104 | 106 | nargs='+', |
105 | 107 | doc="""Alternative sources to be tried if a dataset cannot |
235 | 237 | lgr.debug("Wiping out unsuccessful clone attempt at: %s", |
236 | 238 | dest_path) |
237 | 239 | rmtree(dest_path) |
240 | if 'could not create work tree' in e.stderr.lower(): | |
241 | # this cannot be fixed by trying another URL | |
242 | yield get_status_dict( | |
243 | status='error', | |
244 | message=re.match(r".*fatal: (.*)\n", | |
245 | e.stderr, | |
246 | flags=re.MULTILINE | re.DOTALL).group(1), | |
247 | **status_kwargs) | |
248 | return | |
238 | 249 | |
239 | 250 | if not destination_dataset.is_installed(): |
240 | 251 | yield get_status_dict( |
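The regular expression above extracts git's `fatal:` message from the captured stderr so it can be reported as the result message. A self-contained illustration on a synthetic stderr string (not real output from this code path):

```python
import re

stderr = ("Cloning into '/protected/ds'...\n"
          "fatal: could not create work tree dir '/protected/ds': "
          "Permission denied\n")
match = re.match(r".*fatal: (.*)\n", stderr, flags=re.MULTILINE | re.DOTALL)
print(match.group(1))
# -> could not create work tree dir '/protected/ds': Permission denied
```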
19 | 19 | from datalad.interface.base import Interface |
20 | 20 | from datalad.interface.annotate_paths import AnnotatePaths |
21 | 21 | from datalad.interface.utils import eval_results |
22 | from datalad.interface.utils import build_doc | |
22 | from datalad.interface.base import build_doc | |
23 | 23 | from datalad.interface.common_opts import git_opts |
24 | 24 | from datalad.interface.common_opts import annex_opts |
25 | 25 | from datalad.interface.common_opts import annex_init_opts |
111 | 111 | doc="""enforce creation of a dataset in a non-empty directory""", |
112 | 112 | action='store_true'), |
113 | 113 | description=location_description, |
114 | # TODO could move into cfg_annex plugin | |
114 | 115 | no_annex=Parameter( |
115 | 116 | args=("--no-annex",), |
116 | 117 | doc="""if set, a plain Git repository will be created without any |
117 | 118 | annex""", |
118 | 119 | action='store_true'), |
120 | text_no_annex=Parameter( | |
121 | args=("--text-no-annex",), | |
122 | doc="""if set, all text files in the future would be added to Git, | |
123 | not annex. Achieved by adding an entry to `.gitattributes` file. See | |
124 | http://git-annex.branchable.com/tips/largefiles/ and `no_annex` | |
125 | DataLad plugin to establish even more detailed control over which | |
126 | files are placed under annex control.""", | |
127 | action='store_true'), | |
119 | 128 | save=nosave_opt, |
129 | # TODO could move into cfg_annex plugin | |
120 | 130 | annex_version=Parameter( |
121 | 131 | args=("--annex-version",), |
122 | 132 | doc="""select a particular annex repository version. The |
124 | 134 | version. This should be left untouched, unless you know what |
125 | 135 | you are doing""", |
126 | 136 | constraints=EnsureDType(int) | EnsureNone()), |
137 | # TODO could move into cfg_annex plugin | |
127 | 138 | annex_backend=Parameter( |
128 | 139 | args=("--annex-backend",), |
129 | 140 | constraints=EnsureStr() | EnsureNone(), |
132 | 143 | For a list of supported backends see the git-annex |
133 | 144 | documentation. The default is optimized for maximum compatibility |
134 | 145 | of datasets across platforms (especially those with limited |
135 | path lengths)""", | |
136 | nargs=1), | |
146 | path lengths)"""), | |
147 | # TODO could move into cfg_metadata plugin | |
137 | 148 | native_metadata_type=Parameter( |
138 | 149 | args=('--native-metadata-type',), |
139 | 150 | metavar='LABEL', |
142 | 153 | doc="""Metadata type label. Must match the name of the respective |
143 | 154 | parser implementation in Datalad (e.g. "bids").[CMD: This option |
144 | 155 | can be given multiple times CMD]"""), |
156 | # TODO could move into cfg_access/permissions plugin | |
145 | 157 | shared_access=shared_access_opt, |
146 | 158 | git_opts=git_opts, |
147 | 159 | annex_opts=annex_opts, |
164 | 176 | shared_access=None, |
165 | 177 | git_opts=None, |
166 | 178 | annex_opts=None, |
167 | annex_init_opts=None): | |
179 | annex_init_opts=None, | |
180 | text_no_annex=None | |
181 | ): | |
168 | 182 | |
169 | 183 | # two major cases |
170 | 184 | # 1. we got a `dataset` -> we either want to create it (path is None), |
206 | 220 | unavailable_path_msg=None, |
207 | 221 | # if we have a dataset given that actually exists, we want to |
208 | 222 | # fail if the requested path is not in it |
209 | nondataset_path_status='error' if dataset and dataset.is_installed() else '', | |
223 | nondataset_path_status='error' \ | |
224 | if isinstance(dataset, Dataset) and dataset.is_installed() else '', | |
210 | 225 | on_failure='ignore') |
211 | 226 | path = None |
212 | 227 | for r in annotated_paths: |
251 | 266 | |
252 | 267 | # important to use the given Dataset object to avoid spurious ID |
253 | 268 | # changes with not-yet-materialized Datasets |
254 | tbds = dataset if dataset is not None and dataset.path == path['path'] \ | |
269 | tbds = dataset if isinstance(dataset, Dataset) and dataset.path == path['path'] \ | |
255 | 270 | else Dataset(path['path']) |
256 | 271 | |
257 | 272 | # don't create in non-empty directory without `force`: |
274 | 289 | else: |
275 | 290 | # always come with annex when created from scratch |
276 | 291 | lgr.info("Creating a new annex repo at %s", tbds.path) |
277 | AnnexRepo( | |
292 | tbrepo = AnnexRepo( | |
278 | 293 | tbds.path, |
279 | 294 | url=None, |
280 | 295 | create=True, |
283 | 298 | description=description, |
284 | 299 | git_opts=git_opts, |
285 | 300 | annex_opts=annex_opts, |
286 | annex_init_opts=annex_init_opts) | |
301 | annex_init_opts=annex_init_opts | |
302 | ) | |
303 | ||
304 | if text_no_annex: | |
305 | git_attributes_file = opj(tbds.path, '.gitattributes') | |
306 | with open(git_attributes_file, 'a') as f: | |
307 | f.write('* annex.largefiles=(not(mimetype=text/*))\n') | |
308 | tbrepo.add([git_attributes_file], git=True) | |
309 | tbrepo.commit( | |
310 | "Instructed annex to add text files to git", | |
311 | _datalad_msg=True, | |
312 | files=[git_attributes_file] | |
313 | ) | |
287 | 314 | |
288 | 315 | if native_metadata_type is not None: |
289 | 316 | if not isinstance(native_metadata_type, list): |
306 | 333 | with open(opj(tbds.path, '.datalad', '.gitattributes'), 'a') as gitattr: |
307 | 334 | # TODO this will need adjusting, when annex'ed aggregate meta data |
308 | 335 | # comes around |
336 | gitattr.write('# Text files (according to file --mime-type) are added directly to git.\n') | |
337 | gitattr.write('# See http://git-annex.branchable.com/tips/largefiles/ for more info.\n') | |
309 | 338 | gitattr.write('** annex.largefiles=nothing\n') |
310 | 339 | |
311 | 340 | # save everything, we need to do this now and cannot merge with the |
317 | 346 | # the next only makes sense if we saved the created dataset, |
318 | 347 | # otherwise we have no committed state to be registered |
319 | 348 | # in the parent |
320 | if save and dataset is not None and dataset.path != tbds.path: | |
349 | if save and isinstance(dataset, Dataset) and dataset.path != tbds.path: | |
321 | 350 | # we created a dataset in another dataset |
322 | 351 | # -> make submodule |
323 | 352 | for r in dataset.add( |
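Putting the new `create` pieces together, a minimal sketch of `text_no_annex` in action, mirroring `test_create_text_no_annex` further down in this commit (the path is a placeholder):

```python
from datalad.api import create

ds = create('/tmp/textds', text_no_annex=True)
# .gitattributes now contains: * annex.largefiles=(not(mimetype=text/*))
# so subsequent ds.add() calls send text files to git, binaries to the annex
```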
29 | 29 | datasetmethod, require_dataset |
30 | 30 | from datalad.interface.annotate_paths import AnnotatePaths |
31 | 31 | from datalad.interface.base import Interface |
32 | from datalad.interface.utils import build_doc | |
32 | from datalad.interface.base import build_doc | |
33 | from datalad.interface.utils import eval_results | |
33 | 34 | from datalad.interface.common_opts import recursion_limit, recursion_flag |
34 | 35 | from datalad.interface.common_opts import as_common_datasrc |
35 | 36 | from datalad.interface.common_opts import publish_by_default |
38 | 39 | from datalad.interface.common_opts import annex_wanted_opt |
39 | 40 | from datalad.interface.common_opts import annex_group_opt |
40 | 41 | from datalad.interface.common_opts import annex_groupwanted_opt |
41 | from datalad.interface.utils import eval_results | |
42 | from datalad.interface.utils import build_doc | |
43 | 42 | from datalad.support.annexrepo import AnnexRepo |
44 | 43 | from datalad.support.constraints import EnsureStr, EnsureNone, EnsureBool |
45 | 44 | from datalad.support.constraints import EnsureChoice |
171 | 170 | ssh("rm -rf {}".format(sh_quote(remoteds_path))) |
172 | 171 | # if we succeeded in removing it |
173 | 172 | path_exists = False |
173 | # Since it is gone now, git-annex also should forget about it | |
174 | remotes = ds.repo.get_remotes() | |
175 | if name in remotes: | |
176 | # so we had this remote already, we should announce it dead | |
177 | # XXX what if there was some kind of mismatch and this name | |
178 | # isn't matching the actual remote UUID? should have we | |
179 | # checked more carefully? | |
180 | lgr.info( | |
181 | "Announcing existing remote %s dead to annex and removing", | |
182 | name | |
183 | ) | |
184 | if isinstance(ds.repo, AnnexRepo): | |
185 | ds.repo.set_remote_dead(name) | |
186 | ds.repo.remove_remote(name) | |
174 | 187 | elif existing == 'reconfigure': |
175 | 188 | lgr.info(_msg + " Will only reconfigure") |
176 | 189 | only_reconfigure = True |
716 | 729 | # DataLad |
717 | 730 | # |
718 | 731 | # (Re)generate meta-data for DataLad Web UI and possibly init new submodules |
719 | dsdir="{path}" | |
732 | dsdir="$(dirname $0)/../.." | |
720 | 733 | logfile="$dsdir/{WEB_META_LOG}/{log_filename}" |
721 | 734 | |
735 | if [ ! -e "$dsdir/.git" ]; then | |
736 | echo Assumption of being under .git has failed >&2 | |
737 | exit 1 | |
738 | fi | |
739 | ||
722 | 740 | mkdir -p "$dsdir/{WEB_META_LOG}" # assure logs directory exists |
723 | 741 | |
724 | 742 | ( which datalad > /dev/null \ |
725 | && ( cd ..; GIT_DIR="$PWD/.git" datalad ls -a --json file "$dsdir"; ) \ | |
743 | && ( cd "$dsdir"; GIT_DIR="$PWD/.git" datalad ls -a --json file .; ) \ | |
726 | 744 | || echo "E: no datalad found - skipping generation of indexes for web frontend"; \ |
727 | 745 | ) &> "$logfile" |
728 | 746 | |
729 | 747 | # Some submodules might have been added and thus we better init them |
730 | ( cd ..; git submodule update --init >> "$logfile" 2>&1 || : ; ) | |
748 | ( cd "$dsdir"; git submodule update --init || : ; ) >> "$logfile" 2>&1 | |
731 | 749 | '''.format(WEB_META_LOG=WEB_META_LOG, **locals()) |
732 | 750 | |
733 | 751 | with make_tempfile(content=hook_content) as tempf: |
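The hook now derives the dataset root from its own location (`$(dirname $0)/../..`) instead of embedding an absolute path, so a published dataset keeps working after being moved on the server (the #1653 fix). A pure-Python illustration of the same resolution logic (hook path is a placeholder):

```python
import os

hook = '/srv/datasets/ds1/.git/hooks/post-update'  # placeholder path
dsdir = os.path.normpath(os.path.join(os.path.dirname(hook), '..', '..'))
print(dsdir)  # -> /srv/datasets/ds1
```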
28 | 28 | from datalad.support.constraints import EnsureChoice |
29 | 29 | from datalad.support.exceptions import MissingExternalDependency |
30 | 30 | from ..interface.base import Interface |
31 | from datalad.interface.utils import build_doc | |
31 | from datalad.interface.base import build_doc | |
32 | 32 | from datalad.distribution.dataset import EnsureDataset, datasetmethod, \ |
33 | 33 | require_dataset, Dataset |
34 | 34 | from datalad.distribution.siblings import Siblings |
25 | 25 | from datalad.support.gitrepo import GitRepo |
26 | 26 | from datalad.support.annexrepo import AnnexRepo |
27 | 27 | from datalad.interface.base import Interface |
28 | from datalad.interface.utils import build_doc | |
28 | from datalad.interface.base import build_doc | |
29 | 29 | |
30 | 30 | lgr = logging.getLogger('datalad.distribution.tests') |
31 | 31 |
36 | 36 | from datalad.interface.results import results_from_annex_noinfo |
37 | 37 | from datalad.interface.utils import handle_dirty_dataset |
38 | 38 | from datalad.interface.utils import eval_results |
39 | from datalad.interface.utils import build_doc | |
39 | from datalad.interface.base import build_doc | |
40 | 40 | |
41 | 41 | lgr = logging.getLogger('datalad.distribution.drop') |
42 | 42 | |
128 | 128 | before file content is dropped. As these checks could lead to slow |
129 | 129 | operation (network latencies, etc), they can be disabled. |
130 | 130 | |
131 | ||
132 | Examples | |
133 | -------- | |
134 | ||
135 | Drop all file content in a dataset:: | |
136 | ||
137 | ~/some/dataset$ datalad drop | |
138 | ||
139 | Drop all file content in a dataset and all its subdatasets:: | |
140 | ||
141 | ~/some/dataset$ datalad drop --recursive | |
131 | Examples: | |
132 | ||
133 | Drop all file content in a dataset:: | |
134 | ||
135 | ~/some/dataset$ datalad drop | |
136 | ||
137 | Drop all file content in a dataset and all its subdatasets:: | |
138 | ||
139 | ~/some/dataset$ datalad drop --recursive | |
142 | 140 | |
143 | 141 | """ |
144 | 142 | _action = 'drop' |
20 | 20 | from datalad.interface.annotate_paths import AnnotatePaths |
21 | 21 | from datalad.interface.annotate_paths import annotated2content_by_ds |
22 | 22 | from datalad.interface.utils import eval_results |
23 | from datalad.interface.utils import build_doc | |
23 | from datalad.interface.base import build_doc | |
24 | 24 | from datalad.interface.results import get_status_dict |
25 | 25 | from datalad.interface.results import results_from_paths |
26 | 26 | from datalad.interface.results import annexjson2result |
29 | 29 | from datalad.interface.results import YieldDatasets |
30 | 30 | from datalad.interface.results import is_result_matching_pathsource_argument |
31 | 31 | from datalad.interface.utils import eval_results |
32 | from datalad.interface.utils import build_doc | |
32 | from datalad.interface.base import build_doc | |
33 | 33 | from datalad.support.constraints import EnsureNone |
34 | 34 | from datalad.support.constraints import EnsureStr |
35 | 35 | from datalad.support.exceptions import InsufficientArgumentsError |
16 | 16 | from os.path import sep as dirsep |
17 | 17 | |
18 | 18 | from datalad.interface.base import Interface |
19 | from datalad.interface.utils import build_doc | |
19 | from datalad.interface.base import build_doc | |
20 | 20 | from datalad.interface.utils import filter_unmodified |
21 | 21 | from datalad.interface.common_opts import annex_copy_opts, recursion_flag, \ |
22 | recursion_limit, git_opts, annex_opts | |
22 | recursion_limit, git_opts, annex_opts, jobs_opt | |
23 | 23 | from datalad.interface.common_opts import missing_sibling_opt |
24 | 24 | from datalad.support.param import Parameter |
25 | 25 | from datalad.support.constraints import EnsureStr |
29 | 29 | from datalad.support.exceptions import CommandError |
30 | 30 | |
31 | 31 | from datalad.utils import assure_list |
32 | from datalad.dochelpers import exc_str | |
32 | 33 | |
33 | 34 | from .dataset import EnsureDataset |
34 | 35 | from .dataset import Dataset |
59 | 60 | return error |
60 | 61 | |
61 | 62 | |
62 | def _publish_dataset(ds, remote, refspec, paths, annex_copy_options, force=False): | |
63 | def _publish_dataset(ds, remote, refspec, paths, annex_copy_options, force=False, jobs=None): | |
63 | 64 | # TODO: this setup is now quite ugly. The only way `refspec` can come |
64 | 65 | # in, is when there is a tracking branch, and we get its state via |
65 | 66 | # `refspec` |
66 | 67 | |
68 | is_annex_repo = isinstance(ds.repo, AnnexRepo) | |
69 | ||
67 | 70 | def _publish_data(): |
68 | remote_wanted = ds.repo.get_preferred_content('wanted', remote) | |
69 | if (paths or annex_copy_options or remote_wanted) and \ | |
70 | isinstance(ds.repo, AnnexRepo) and not \ | |
71 | ds.config.getbool( | |
72 | 'remote.{}'.format(remote), | |
73 | 'annex-ignore', | |
74 | False): | |
71 | if ds.repo.is_remote_annex_ignored(remote): | |
72 | return [], [] # Cannot publish any data | |
73 | try: | |
74 | remote_wanted = ds.repo.get_preferred_content('wanted', remote) | |
75 | except CommandError as exc: | |
76 | if "cannot determine uuid" in str(exc): | |
77 | if not ds.repo.is_remote_annex_ignored(remote): | |
78 | lgr.warning( | |
79 | "Annex failed to determine UUID, skipping publishing data for now: %s", | |
80 | exc_str(exc) | |
81 | ) | |
82 | return [], [] | |
83 | raise | |
84 | ||
85 | if (paths or annex_copy_options or remote_wanted) and is_annex_repo: | |
75 | 86 | lgr.info("Publishing {0} data to {1}".format(ds, remote)) |
76 | 87 | # overwrite URL with pushurl if any, reason: |
77 | 88 | # https://git-annex.branchable.com/bugs/annex_ignores_pushurl_and_uses_only_url_upon___34__copy_--to__34__/ |
98 | 109 | pblshd = ds.repo.copy_to( |
99 | 110 | files=paths, |
100 | 111 | remote=remote, |
101 | options=annex_copy_options_ | |
112 | options=annex_copy_options_, | |
113 | jobs=jobs | |
102 | 114 | ) |
103 | 115 | # if ds.submodules: |
104 | 116 | # # NOTE: we might need to init them on the remote, but needs to |
148 | 160 | # there was no tracking branch, check the push target |
149 | 161 | remote_branch_name = ds.repo.get_active_branch() |
150 | 162 | |
151 | if remote_branch_name in ds.repo.repo.remotes[remote].refs: | |
152 | lgr.debug("Testing for changes with respect to '%s' of remote '%s'", | |
153 | remote_branch_name, remote) | |
154 | current_commit = ds.repo.repo.commit() | |
155 | remote_ref = ds.repo.repo.remotes[remote].refs[remote_branch_name] | |
156 | if paths: | |
157 | # if there were custom paths, we will look at the diff | |
158 | lgr.debug("Since paths provided, looking at diff") | |
159 | diff = current_commit.diff( | |
160 | remote_ref, | |
161 | paths=paths | |
162 | ) | |
163 | else: | |
164 | # if commits differ at all | |
165 | lgr.debug("Since no paths provided, comparing commits") | |
166 | diff = current_commit != remote_ref.commit | |
167 | else: | |
168 | lgr.debug("Remote '%s' has no branch matching %r. Will publish", | |
169 | remote, remote_branch_name) | |
170 | # we don't have any remote state, need to push for sure | |
171 | diff = True | |
163 | diff = _get_remote_diff(ds, paths, None, remote, remote_branch_name) | |
164 | ||
165 | # We might have got new information in git-annex branch although no other | |
166 | # changes | |
167 | if not diff and is_annex_repo: | |
168 | try: | |
169 | git_annex_commit = next(ds.repo.get_branch_commits('git-annex')) | |
170 | except StopIteration: | |
171 | git_annex_commit = None | |
172 | diff = _get_remote_diff(ds, [], git_annex_commit, remote, 'git-annex') | |
173 | if diff: | |
174 | lgr.info("Will publish updated git-annex") | |
172 | 175 | |
173 | 176 | # # remote might be set to be ignored by annex, or we might not even know yet its uuid |
174 | 177 | # annex_ignore = ds.config.getbool('remote.{}.annex-ignore'.format(remote), None) |
177 | 180 | # if annex_uuid is None: |
178 | 181 | # # most probably not yet 'known' and might require some annex |
179 | 182 | knew_remote_uuid = None |
180 | if isinstance(ds.repo, AnnexRepo): | |
183 | if is_annex_repo and not ds.repo.is_remote_annex_ignored(remote): | |
181 | 184 | try: |
182 | 185 | ds.repo.get_preferred_content('wanted', remote) # could be just checking config.remote.uuid |
183 | 186 | knew_remote_uuid = True |
184 | 187 | except CommandError: |
185 | 188 | knew_remote_uuid = False |
189 | ||
186 | 190 | if knew_remote_uuid: |
187 | 191 | # we can try publishing right away |
188 | 192 | published += _publish_data() |
206 | 210 | None, |
207 | 211 | paths, |
208 | 212 | annex_copy_options, |
209 | force=force) | |
213 | force=force, | |
214 | jobs=jobs | |
215 | ) | |
210 | 216 | published.extend(pblsh) |
211 | 217 | skipped.extend(skp) |
218 | ||
219 | if is_annex_repo and \ | |
220 | ds.repo.is_special_annex_remote(remote): | |
221 | # There is nothing else to "publish" | |
222 | lgr.debug( | |
223 | "{0} is a special annex remote, no git push is needed".format(remote) | |
224 | ) | |
225 | return published, skipped | |
212 | 226 | |
213 | 227 | lgr.info("Publishing {0} to {1}".format(ds, remote)) |
214 | 228 | |
216 | 230 | # we need to annex merge first. Otherwise a git push might be |
217 | 231 | # rejected if involving all matching branches for example. |
218 | 232 | # Once at it, also push the annex branch right here. |
219 | if isinstance(ds.repo, AnnexRepo): | |
233 | if is_annex_repo: | |
220 | 234 | lgr.debug("Obtain remote annex info from '%s'", remote) |
221 | 235 | ds.repo.fetch(remote=remote) |
222 | 236 | ds.repo.merge_annex(remote) |
234 | 248 | current_branch = ds.repo.get_active_branch() |
235 | 249 | if current_branch: # possibly make this conditional on a switch |
236 | 250 | # TODO: this should become it own helper |
237 | if isinstance(ds.repo, AnnexRepo): | |
251 | if is_annex_repo: | |
238 | 252 | # annex could manage this branch |
239 | 253 | if current_branch.startswith('annex/direct') \ |
240 | 254 | and ds.config.getbool('annex', 'direct', default=False): |
251 | 265 | # and thus probably broken -- test me! |
252 | 266 | current_branch = match_adjusted.group(1) |
253 | 267 | things2push.append(current_branch) |
254 | if isinstance(ds.repo, AnnexRepo): | |
268 | if is_annex_repo: | |
255 | 269 | things2push.append('git-annex') |
256 | 270 | # check that all our magic found valid branches |
257 | 271 | things2push = [t for t in things2push if t in ds.repo.get_branches()] |
273 | 287 | |
274 | 288 | published.append(ds) |
275 | 289 | |
276 | if knew_remote_uuid is False: | |
290 | late_published_data = None | |
291 | if knew_remote_uuid is False and is_annex_repo: | |
277 | 292 | # publish only after we tried to sync/push and if it was annex repo |
278 | published += _publish_data() | |
293 | late_published_data = _publish_data() | |
294 | published += late_published_data | |
295 | ||
296 | # if we published something (data, subdatasets) even though there were no | |
297 | # diff (thus no push), or there was an additional data published later | |
298 | if ((not diff and published) or late_published_data) \ | |
299 | and is_annex_repo: | |
300 | # we need to do the same annex merge dance and push updated git-annex | |
301 | # and this way also trigger post-update hook which might update | |
302 | # web UI meta-data | |
303 | # https://github.com/datalad/datalad/issues/1658 | |
304 | lgr.info( | |
305 | "Obtaining remote annex info from '%s' and pushing updated", | |
306 | remote | |
307 | ) | |
308 | ds.repo.fetch(remote=remote) | |
309 | ds.repo.merge_annex(remote) | |
310 | # this will trigger post-update hook if present | |
311 | _log_push_info(ds.repo.push(remote=remote, refspec=['git-annex'])) | |
312 | ||
279 | 313 | return published, skipped |
314 | ||
315 | ||
316 | def _get_remote_diff(ds, paths, current_commit, remote, remote_branch_name): | |
317 | """Helper to check if remote has different state of the branch""" | |
318 | if remote_branch_name in ds.repo.repo.remotes[remote].refs: | |
319 | lgr.debug("Testing for changes with respect to '%s' of remote '%s'", | |
320 | remote_branch_name, remote) | |
321 | if current_commit is None: | |
322 | current_commit = ds.repo.repo.commit() | |
323 | remote_ref = ds.repo.repo.remotes[remote].refs[remote_branch_name] | |
324 | if paths: | |
325 | # if there were custom paths, we will look at the diff | |
326 | lgr.debug("Since paths provided, looking at diff") | |
327 | diff = current_commit.diff( | |
328 | remote_ref, | |
329 | paths=paths | |
330 | ) | |
331 | else: | |
332 | # if commits differ at all | |
333 | lgr.debug("Since no paths provided, comparing commits") | |
334 | diff = current_commit != remote_ref.commit | |
335 | else: | |
336 | lgr.debug("Remote '%s' has no branch matching %r. Will publish", | |
337 | remote, remote_branch_name) | |
338 | # we don't have any remote state, need to push for sure | |
339 | diff = True | |
340 | ||
341 | return diff | |
280 | 342 | |
281 | 343 | |
282 | 344 | @build_doc |
365 | 427 | git_opts=git_opts, |
366 | 428 | annex_opts=annex_opts, |
367 | 429 | annex_copy_opts=annex_copy_opts, |
430 | jobs=jobs_opt, | |
368 | 431 | ) |
369 | 432 | |
370 | 433 | @staticmethod |
381 | 444 | git_opts=None, |
382 | 445 | annex_opts=None, |
383 | 446 | annex_copy_opts=None, |
447 | jobs=None | |
384 | 448 | ): |
385 | 449 | |
386 | 450 | # if ever we get a mode, for "with-data" we would need this |
522 | 586 | refspec=remote_info.get('refspec', None), |
523 | 587 | paths=content_by_ds[ds_path], |
524 | 588 | annex_copy_options=annex_copy_opts, |
525 | force=force | |
589 | force=force, | |
590 | jobs=jobs | |
526 | 591 | ) |
527 | 592 | published.extend(pblsh) |
528 | 593 | skipped.extend(skp) |
31 | 31 | from datalad.interface.common_opts import recursion_flag |
32 | 32 | from datalad.interface.utils import path_is_under |
33 | 33 | from datalad.interface.utils import eval_results |
34 | from datalad.interface.utils import build_doc | |
34 | from datalad.interface.base import build_doc | |
35 | 35 | from datalad.interface.results import get_status_dict |
36 | 36 | from datalad.interface.save import Save |
37 | 37 | from datalad.distribution.drop import _drop_files |
63 | 63 | subdirectories within a dataset as always done automatically. An optional |
64 | 64 | recursion limit is applied relative to each given input path. |
65 | 65 | |
66 | Examples | |
67 | -------- | |
68 | ||
69 | Permanently remove a subdataset from a dataset and wipe out the subdataset | |
70 | association too:: | |
71 | ||
72 | ~/some/dataset$ datalad remove somesubdataset1 | |
66 | Examples: | |
67 | ||
68 | Permanently remove a subdataset from a dataset and wipe out the subdataset | |
69 | association too:: | |
70 | ||
71 | ~/some/dataset$ datalad remove somesubdataset1 | |
73 | 72 | """ |
74 | 73 | _action = 'remove' |
75 | 74 |
17 | 17 | |
18 | 18 | from datalad.interface.base import Interface |
19 | 19 | from datalad.interface.utils import eval_results |
20 | from datalad.interface.utils import build_doc | |
20 | from datalad.interface.base import build_doc | |
21 | 21 | from datalad.interface.results import get_status_dict |
22 | 22 | from datalad.support.annexrepo import AnnexRepo |
23 | 23 | from datalad.support.constraints import EnsureStr |
288 | 288 | **dict( |
289 | 289 | res, |
290 | 290 | path=path, |
291 | with_annex='+' if 'annex-uuid' in res else '-', | |
291 | with_annex='+' if 'annex-uuid' in res \ | |
292 | else ('-' if res.get('annex-ignore', None) else '?'), | |
292 | 293 | spec=spec))) |
293 | 294 | |
294 | 295 | |
614 | 615 | if annex_description is not None: |
615 | 616 | info['annex-description'] = annex_description |
616 | 617 | if get_annex_info and isinstance(ds.repo, AnnexRepo): |
617 | for prop in ('wanted', 'required', 'group'): | |
618 | var = ds.repo.get_preferred_content( | |
619 | prop, '.' if remote == 'here' else remote) | |
620 | if var: | |
621 | info['annex-{}'.format(prop)] = var | |
622 | groupwanted = ds.repo.get_groupwanted(remote) | |
623 | if groupwanted: | |
624 | info['annex-groupwanted'] = groupwanted | |
618 | if not ds.repo.is_remote_annex_ignored(remote): | |
619 | try: | |
620 | for prop in ('wanted', 'required', 'group'): | |
621 | var = ds.repo.get_preferred_content( | |
622 | prop, '.' if remote == 'here' else remote) | |
623 | if var: | |
624 | info['annex-{}'.format(prop)] = var | |
625 | groupwanted = ds.repo.get_groupwanted(remote) | |
626 | if groupwanted: | |
627 | info['annex-groupwanted'] = groupwanted | |
628 | except CommandError as exc: | |
629 | if 'cannot determine uuid' in str(exc): | |
630 | # not an annex (or no connection), would be marked as | |
631 | # annex-ignore | |
632 | msg = "Failed to determine if %s carries annex." % remote | |
633 | ds.repo.config.reload() | |
634 | if ds.repo.is_remote_annex_ignored(remote): | |
635 | msg += " Remote was marked by annex as annex-ignore. " \ | |
636 | "Edit .git/config to reset if you think that was done by mistake due to absent connection etc" | |
637 | lgr.warning(msg) | |
638 | info['annex-ignore'] = True | |
639 | else: | |
640 | raise | |
641 | else: | |
642 | info['annex-ignore'] = True | |
625 | 643 | |
626 | 644 | info['status'] = 'ok' |
627 | 645 | yield info |
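With these changes `siblings` degrades gracefully when a remote's annex status cannot be determined: the result record gains an `annex-ignore` flag, and the rendered `with_annex` column above shows `+`, `-`, or `?`. A hedged query sketch (placeholder path; result keys as used in this file):

```python
from datalad.api import Dataset

ds = Dataset('/tmp/ds')
for info in ds.siblings():
    # '?' in the rendered output corresponds to records lacking
    # 'annex-uuid' that are not (yet) marked annex-ignore
    print(info['name'], 'annex-ignore:', info.get('annex-ignore', False))
```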
22 | 22 | |
23 | 23 | from datalad.interface.base import Interface |
24 | 24 | from datalad.interface.utils import eval_results |
25 | from datalad.interface.utils import build_doc | |
25 | from datalad.interface.base import build_doc | |
26 | 26 | from datalad.interface.results import get_status_dict |
27 | 27 | from datalad.support.constraints import EnsureBool |
28 | 28 | from datalad.support.constraints import EnsureStr |
89 | 89 | if arg[0] == test_list_4: |
90 | 90 | result = ds.add('dir', to_git=arg[1], save=False) |
91 | 91 | else: |
92 | result = ds.add(arg[0], to_git=arg[1], save=False, result_xfm='relpaths', | |
92 | result = ds.add(arg[0], to_git=arg[1], save=False, | |
93 | result_xfm='relpaths', | |
93 | 94 | return_type='item-or-list') |
94 | 95 | # order depends on how annex processes it, so let's sort |
95 | 96 | eq_(sorted(result), sorted(arg[0])) |
102 | 103 | # ignore the initial config file in index: |
103 | 104 | indexed.remove(opj('.datalad', 'config')) |
104 | 105 | indexed.remove(opj('.datalad', '.gitattributes')) |
106 | indexed.remove('.gitattributes') | |
105 | 107 | if isinstance(arg[0], list): |
106 | 108 | for x in arg[0]: |
107 | 109 | unstaged.remove(x) |
306 | 308 | @with_tree(tree={ |
307 | 309 | 'file.txt': 'some text', |
308 | 310 | 'empty': '', |
311 | 'file2.txt': 'some text to go to annex', | |
309 | 312 | '.gitattributes': '* annex.largefiles=(not(mimetype=text/*))'} |
310 | 313 | ) |
311 | 314 | def test_add_mimetypes(path): |
318 | 321 | ds.repo.commit('added attributes to git explicitly') |
319 | 322 | # now test that those files will go into git/annex correspondingly |
320 | 323 | __not_tested__ = ds.add(['file.txt', 'empty']) |
321 | ok_clean_git(path) | |
324 | ok_clean_git(path, untracked=['file2.txt']) | |
322 | 325 | # Empty one considered to be application/octet-stream i.e. non-text |
323 | 326 | ok_file_under_git(path, 'empty', annexed=True) |
324 | 327 | ok_file_under_git(path, 'file.txt', annexed=False) |
328 | ||
329 | # But we should be able to force adding file to annex when desired | |
330 | ds.add('file2.txt', to_git=False) | |
331 | ok_file_under_git(path, 'file2.txt', annexed=True)
14 | 14 | from os.path import exists |
15 | 15 | from os.path import basename |
16 | 16 | from os.path import dirname |
17 | from os import mkdir | |
18 | from os import chmod | |
19 | from os import geteuid | |
17 | 20 | |
18 | 21 | from mock import patch |
19 | 22 | |
20 | 23 | from datalad.api import create |
21 | 24 | from datalad.api import clone |
22 | 25 | from datalad.utils import chpwd |
26 | from datalad.utils import _path_ | |
27 | from datalad.utils import rmtree | |
23 | 28 | from datalad.support.exceptions import IncompleteResultsError |
24 | 29 | from datalad.support.gitrepo import GitRepo |
25 | 30 | from datalad.support.annexrepo import AnnexRepo |
44 | 49 | from datalad.tests.utils import serve_path_via_http |
45 | 50 | from datalad.tests.utils import use_cassette |
46 | 51 | from datalad.tests.utils import skip_if_no_network |
52 | from datalad.tests.utils import skip_if_on_windows | |
53 | from datalad.tests.utils import skip_if | |
47 | 54 | |
48 | 55 | from ..dataset import Dataset |
49 | 56 | |
308 | 315 | assert clonedsub.path.startswith(path) |
309 | 316 | # no subdataset relation |
310 | 317 | eq_(cloned.subdatasets(), []) |
318 | ||
319 | ||
320 | @skip_if_on_windows | |
321 | @skip_if(not geteuid(), "Will fail under super-user") | |
322 | @with_tempfile(mkdir=True) | |
323 | def test_clone_report_permission_issue(tdir): | |
324 | pdir = _path_(tdir, 'protected') | |
325 | mkdir(pdir) | |
326 | # make it read-only | |
327 | chmod(pdir, 0o555) | |
328 | with chpwd(pdir): | |
329 | res = clone('///', result_xfm=None, return_type='list', on_failure='ignore') | |
330 | assert_status('error', res) | |
331 | assert_result_count( | |
332 | res, 1, status='error', | |
333 | message="could not create work tree dir '%s/datasets.datalad.org': Permission denied" % pdir) |
10 | 10 | |
11 | 11 | import os |
12 | 12 | from os.path import join as opj |
13 | from os.path import lexists | |
13 | 14 | |
14 | 15 | from ..dataset import Dataset |
15 | 16 | from datalad.api import create |
16 | 17 | from datalad.utils import chpwd |
18 | from datalad.utils import _path_ | |
17 | 19 | from datalad.cmd import Runner |
18 | 20 | |
19 | 21 | from datalad.tests.utils import with_tempfile |
22 | from datalad.tests.utils import create_tree | |
20 | 23 | from datalad.tests.utils import eq_ |
21 | 24 | from datalad.tests.utils import ok_ |
22 | 25 | from datalad.tests.utils import assert_not_in |
27 | 30 | from datalad.tests.utils import assert_in_results |
28 | 31 | from datalad.tests.utils import ok_clean_git |
29 | 32 | from datalad.tests.utils import with_tree |
33 | from datalad.tests.utils import ok_file_has_content | |
34 | from datalad.tests.utils import ok_file_under_git | |
30 | 35 | |
31 | 36 | |
32 | 37 | _dataset_hierarchy_template = { |
253 | 258 | # is committed -- ds2 is already known to git and it just pukes with a bit |
254 | 259 | # confusing 'ds2' already exists in the index |
255 | 260 | assert_in('ds2', ds1.subdatasets(result_xfm='relpaths')) |
261 | ||
262 | ||
263 | @with_tempfile(mkdir=True) | |
264 | def test_create_withplugin(path): | |
265 | # first without | |
266 | ds = create(path) | |
267 | assert(not lexists(opj(ds.path, 'README.rst'))) | |
268 | ds.remove() | |
269 | assert(not lexists(ds.path)) | |
270 | # now for reals... | |
271 | ds = create( | |
272 | # needs to identify the dataset, otherwise post-proc | |
273 | # plugin doesn't know what to run on | |
274 | dataset=path, | |
275 | run_after=[['add_readme', 'filename=with hole.txt']]) | |
276 | ok_clean_git(path) | |
277 | # README will end up in annex by default | |
278 | # TODO implement `nice_dataset` plugin to give sensible | |
279 | # default and avoid that | |
280 | assert(lexists(opj(ds.path, 'with hole.txt'))) | |
281 | ||
282 | ||
283 | @with_tempfile(mkdir=True) | |
284 | def test_create_text_no_annex(path): | |
285 | ds = create(path, text_no_annex=True) | |
286 | ok_clean_git(path) | |
287 | import re | |
288 | ok_file_has_content( | |
289 | _path_(path, '.gitattributes'), | |
290 | content='\* annex\.largefiles=\(not\(mimetype=text/\*\)\)', | |
291 | re_=True, | |
292 | match=False, | |
293 | flags=re.MULTILINE | |
294 | ) | |
295 | # and check that it is really committing text files to git and binaries | |
296 | # to annex | |
297 | create_tree(path, | |
298 | { | |
299 | 't': 'some text', | |
300 | 'b': '' # empty file is not considered to be a text file | |
301 | # should we adjust the rule to consider only non empty files? | |
302 | } | |
303 | ) | |
304 | ds.add(['t', 'b']) | |
305 | ok_file_under_git(path, 't', annexed=False) | |
306 | ok_file_under_git(path, 'b', annexed=True) |
16 | 16 | |
17 | 17 | from ..dataset import Dataset |
18 | 18 | from datalad.api import publish, install, create_sibling |
19 | from datalad.cmd import Runner | |
19 | 20 | from datalad.utils import chpwd |
20 | 21 | from datalad.tests.utils import create_tree |
21 | 22 | from datalad.support.gitrepo import GitRepo |
32 | 33 | from datalad.tests.utils import assert_raises |
33 | 34 | from datalad.tests.utils import skip_ssh |
34 | 35 | from datalad.tests.utils import assert_dict_equal |
36 | from datalad.tests.utils import assert_false | |
35 | 37 | from datalad.tests.utils import assert_set_equal |
36 | 38 | from datalad.tests.utils import assert_result_count |
37 | 39 | from datalad.tests.utils import assert_not_equal |
72 | 74 | assert_false(exists(opj(target_path, path))) |
73 | 75 | |
74 | 76 | hook_path = _path_(target_path, '.git/hooks/post-update') |
75 | ok_file_has_content(hook_path, | |
76 | '.*\ndsdir="%s"\n.*' % target_path, | |
77 | re_=True, | |
78 | flags=re.DOTALL) | |
77 | # No longer the case -- we are no longer using absolute path in the | |
78 | # script | |
79 | # ok_file_has_content(hook_path, | |
80 | # '.*\ndsdir="%s"\n.*' % target_path, | |
81 | # re_=True, | |
82 | # flags=re.DOTALL) | |
83 | # No absolute path (so dataset could be moved) in the hook | |
84 | with open(hook_path) as f: | |
85 | assert_not_in(target_path, f.read()) | |
79 | 86 | # correct ls_json command in hook content (path wrapped in "quotes) |
80 | 87 | ok_file_has_content(hook_path, |
81 | '.*datalad ls -a --json file "\$dsdir".*', | |
88 | '.*datalad ls -a --json file \..*', | |
82 | 89 | re_=True, |
83 | 90 | flags=re.DOTALL) |
84 | 91 | |
418 | 425 | |
419 | 426 | @skip_ssh |
420 | 427 | @with_tempfile(mkdir=True) |
428 | @with_tempfile | |
429 | def test_replace_and_relative_sshpath(src_path, dst_path): | |
430 | # We need to come up with the path relative to our current home directory | |
431 | # https://github.com/datalad/datalad/issues/1653 | |
432 | dst_relpath = os.path.relpath(dst_path, os.path.expanduser('~')) | |
433 | url = 'localhost:%s' % dst_relpath | |
434 | ds = Dataset(src_path).create() | |
435 | create_tree(ds.path, {'sub.dat': 'lots of data'}) | |
436 | ds.add('sub.dat') | |
437 | ||
438 | ds.create_sibling(url) | |
439 | published = ds.publish('.', to='localhost') | |
440 | assert_in('sub.dat', published[0]) | |
441 | # verify that hook runs and there is nothing in stderr | |
442 | # since it exits with 0 exit even if there was a problem | |
443 | out, err = Runner(cwd=opj(dst_path, '.git'))(_path_('hooks/post-update')) | |
444 | assert_false(out) | |
445 | assert_false(err) | |
446 | ||
447 | # Verify that we could replace and publish no problem | |
448 | # https://github.com/datalad/datalad/issues/1656 | |
449 | # Strangely it spits out an IncompleteResultsError exception ATM... so just | 
450 | # checking that it fails somehow | |
451 | assert_raises(Exception, ds.create_sibling, url) | |
452 | ds.create_sibling(url, existing='replace') | |
453 | published2 = ds.publish('.', to='localhost') | |
454 | assert_in('sub.dat', published2[0]) | |
455 | ||
456 | # and one more test since in the above test it would not puke ATM but just | 
457 | # not even try to copy since it assumes that the file is already there | 
458 | create_tree(ds.path, {'sub2.dat': 'more data'}) | |
459 | ds.add('sub2.dat') | |
460 | published3 = ds.publish(to='localhost') # we publish just git | |
461 | assert_not_in('sub2.dat', published3[0]) | |
462 | # now publish "with" data, which should also trigger the hook! | |
463 | # https://github.com/datalad/datalad/issues/1658 | |
464 | from glob import glob | |
465 | from datalad.consts import WEB_META_LOG | |
466 | logs_prior = glob(_path_(dst_path, WEB_META_LOG, '*')) | |
467 | published4 = ds.publish('.', to='localhost') | |
468 | assert_in('sub2.dat', published4[0]) | |
469 | logs_post = glob(_path_(dst_path, WEB_META_LOG, '*')) | |
470 | eq_(len(logs_post), len(logs_prior) + 1) | |
471 | ||
472 | ||
473 | @skip_ssh | |
474 | @with_tempfile(mkdir=True) | |
421 | 475 | @with_tempfile(suffix="target") |
422 | 476 | def _test_target_ssh_inherit(standardgroup, src_path, target_path): |
423 | 477 | ds = Dataset(src_path).create() |
27 | 27 | from datalad.tests.utils import assert_raises |
28 | 28 | from datalad.tests.utils import assert_false |
29 | 29 | from datalad.tests.utils import assert_result_count |
30 | from datalad.tests.utils import neq_ | |
30 | 31 | from datalad.tests.utils import ok_clean_git |
31 | 32 | from datalad.tests.utils import swallow_logs |
32 | 33 | from datalad.tests.utils import create_tree |
61 | 62 | name='target1') |
62 | 63 | # source.publish(to='target1') |
63 | 64 | with chpwd(p1): |
64 | # since we have only a single commit -- there is no HEAD^ | |
65 | assert_raises(ValueError, publish, to='target1', since='HEAD^') | |
65 | # since we have only two commits (set backend, init dataset) | |
66 | # -- there is no HEAD^^ | |
67 | assert_raises(ValueError, publish, to='target1', since='HEAD^^') | |
66 | 68 | # but now let's add one more commit, we should be able to publish
67 | 69 | source.repo.commit("msg", options=['--allow-empty']) |
68 | 70 | publish(to='target1', since='HEAD^') # must not fail now |
131 | 133 | |
132 | 134 | |
133 | 135 | @with_testrepos('submodule_annex', flavors=['local']) |
134 | @with_tempfile(mkdir=True) | |
135 | @with_tempfile(mkdir=True) | |
136 | @with_tempfile(mkdir=True) | |
137 | @with_tempfile(mkdir=True) | |
138 | def test_publish_recursive(origin, src_path, dst_path, sub1_pub, sub2_pub): | |
139 | ||
136 | @with_tempfile | |
137 | @with_tempfile(mkdir=True) | |
138 | @with_tempfile(mkdir=True) | |
139 | @with_tempfile(mkdir=True) | |
140 | @with_tempfile(mkdir=True) | |
141 | def test_publish_recursive(pristine_origin, origin_path, src_path, dst_path, sub1_pub, sub2_pub): | |
142 | ||
143 | # we will be publishing back to origin, so to not alter testrepo | |
144 | # we will first clone it | |
145 | origin = install(origin_path, source=pristine_origin, recursive=True) | |
140 | 146 | # prepare src |
141 | source = install(src_path, source=origin, recursive=True) | |
147 | source = install(src_path, source=origin_path, recursive=True) | |
142 | 148 | |
143 | 149 | # create plain git at target: |
144 | 150 | target = GitRepo(dst_path, create=True) |
193 | 199 | eq_(list(sub2_target.get_branch_commits("git-annex")), |
194 | 200 | list(sub2.get_branch_commits("git-annex"))) |
195 | 201 | |
202 | # we are tracking origin but origin has different git-annex, since we | |
203 | # cloned from it, so it is not aware of our git-annex | |
204 | neq_(list(origin.repo.get_branch_commits("git-annex")), | |
205 | list(source.repo.get_branch_commits("git-annex"))) | |
206 | # So if we first publish to it recursively, we would update | |
207 | # all sub-datasets since git-annex branch would need to be pushed | |
208 | res_ = publish(dataset=source, recursive=True) | |
209 | eq_(set(r.path for r in res_[0]), | |
210 | set(opj(*([source.path] + x)) for x in ([], ['subm 1'], ['subm 2']))) | |
211 | # and now should carry the same state for git-annex | |
212 | eq_(list(origin.repo.get_branch_commits("git-annex")), | |
213 | list(source.repo.get_branch_commits("git-annex"))) | |
214 | ||
196 | 215 | # test for publishing with --since. By default since no changes, nothing pushed |
197 | 216 | res_ = publish(dataset=source, recursive=True) |
198 | 217 | eq_(set(r.path for r in res_[0]), set()) |
335 | 354 | # before |
336 | 355 | eq_({sub1.path, sub2.path}, |
337 | 356 | set(result_paths)) |
357 | ||
358 | # if we publish again -- nothing to be published | |
359 | eq_(source.publish(to="target"), ([], [])) | |
360 | # if we drop a file and publish again -- dataset should be published | |
361 | # since git-annex branch was updated | |
362 | source.drop('test-annex.dat') | |
363 | eq_(source.publish(to="target"), ([source], [])) | |
364 | eq_(source.publish(to="target"), ([], [])) # and empty again if we try again | |
338 | 365 | |
339 | 366 | |
340 | 367 | @skip_ssh |
29 | 29 | from datalad.interface.common_opts import recursion_flag |
30 | 30 | from datalad.interface.utils import path_is_under |
31 | 31 | from datalad.interface.utils import eval_results |
32 | from datalad.interface.utils import build_doc | |
32 | from datalad.interface.base import build_doc | |
33 | 33 | from datalad.interface.utils import handle_dirty_dataset |
34 | 34 | from datalad.interface.results import get_status_dict |
35 | 35 | from datalad.utils import rmtree |
93 | 93 | subdirectories within a dataset as always done automatically. An optional |
94 | 94 | recursion limit is applied relative to each given input path. |
95 | 95 | |
96 | Examples | |
97 | -------- | |
96 | Examples: | |
98 | 97 | |
99 | Uninstall a subdataset (undo installation):: | |
98 | Uninstall a subdataset (undo installation):: | |
100 | 99 | |
101 | ~/some/dataset$ datalad uninstall somesubdataset1 | |
100 | ~/some/dataset$ datalad uninstall somesubdataset1 | |
102 | 101 | |
103 | 102 | """ |
104 | 103 | _action = 'uninstall' |
17 | 17 | |
18 | 18 | from datalad.interface.base import Interface |
19 | 19 | from datalad.interface.utils import eval_results |
20 | from datalad.interface.utils import build_doc | |
20 | from datalad.interface.base import build_doc | |
21 | 21 | from datalad.interface.results import get_status_dict |
22 | 22 | from datalad.support.constraints import EnsureStr |
23 | 23 | from datalad.support.constraints import EnsureNone |
187 | 187 | try: |
188 | 188 | key = self._bucket.get_key(url_filepath, version_id=params.get('versionId', None)) |
189 | 189 | except S3ResponseError as e: |
190 | raise DownloadError("S3 refused to provide the key for %s from url %s: %s" | |
190 | raise TargetFileAbsent("S3 refused to provide the key for %s from url %s: %s" | |
191 | 191 | % (url_filepath, url, e)) |
192 | 192 | if key is None: |
193 | 193 | raise TargetFileAbsent("No key returned for %s from url %s" % (url_filepath, url)) |
0 | # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # ex: set sts=4 ts=4 sw=4 noet: | |
2 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
3 | # | |
4 | # See COPYING file distributed along with the datalad package for the | |
5 | # copyright and license terms. | |
6 | # | |
7 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
8 | """ | |
9 | ||
10 | """ | |
11 | ||
12 | __docformat__ = 'restructuredtext' | |
13 | ||
14 | import logging | |
15 | from glob import glob | |
16 | from os.path import join as opj, basename, dirname | |
17 | from importlib import import_module | |
18 | ||
19 | from datalad.support.param import Parameter | |
20 | from datalad.support.constraints import EnsureNone | |
21 | from datalad.distribution.dataset import EnsureDataset | |
22 | from datalad.distribution.dataset import datasetmethod | |
23 | from datalad.distribution.dataset import require_dataset | |
24 | from datalad.dochelpers import exc_str | |
25 | ||
26 | from datalad.interface.base import Interface | |
27 | from datalad.interface.utils import build_doc | |
28 | ||
29 | lgr = logging.getLogger('datalad.export') | |
30 | ||
31 | ||
32 | def _get_exporter_names(): | |
33 | basepath = dirname(__file__) | |
34 | return [basename(e)[:-3] | |
35 | for e in glob(opj(basepath, '*.py')) | |
36 | if not e.endswith('__init__.py')] | |
37 | ||
38 | ||
39 | @build_doc | |
40 | class Export(Interface): | |
41 | """Export a dataset to another representation | |
42 | """ | |
43 | # XXX prevent common args from being added to the docstring | |
44 | _no_eval_results = True | |
45 | ||
46 | _params_ = dict( | |
47 | dataset=Parameter( | |
48 | args=("-d", "--dataset"), | |
49 | doc="""specify the dataset to export. If | |
50 | no dataset is given, an attempt is made to identify the dataset | |
51 | based on the current working directory.""", | |
52 | constraints=EnsureDataset() | EnsureNone()), | |
53 | astype=Parameter( | |
54 | args=("astype",), | |
55 | choices=_get_exporter_names(), | |
56 | doc="""label of the type or format the dataset shall be exported | |
57 | to."""), | |
58 | output=Parameter( | |
59 | args=('-o', '--output'), | |
60 | doc="""output destination specification to be passed to the exporter. | 
61 | The particular semantics of the option value depend on the actual | |
62 | exporter. Typically, this will be a file name or a path to a | |
63 | directory."""), | |
64 | getcmdhelp=Parameter( | |
65 | args=('--help-type',), | |
66 | dest='getcmdhelp', | |
67 | action='store_true', | |
68 | doc="""show help for a specific export type/format"""), | |
69 | ) | |
70 | ||
71 | @staticmethod | |
72 | @datasetmethod(name='export') | |
73 | def __call__(astype, dataset, getcmdhelp=False, output=None, **kwargs): | |
74 | # get a handle on the relevant plugin module | |
75 | import datalad.export as export_mod | |
76 | try: | |
77 | exmod = import_module('.%s' % (astype,), package=export_mod.__package__) | |
78 | except ImportError as e: | |
79 | raise ValueError("cannot load exporter '{}': {}".format( | |
80 | astype, exc_str(e))) | |
81 | if getcmdhelp: | |
82 | # no result, but return the module to make the renderer do the rest | |
83 | return (exmod, None) | |
84 | ||
85 | ds = require_dataset(dataset, check_installed=True, purpose='exporting') | |
86 | # call the plugin, either with the argv array from the cmdline call | |
87 | # or directly with the kwargs | |
88 | if 'datalad_unparsed_args' in kwargs: | |
89 | result = exmod._datalad_export_plugin_call( | |
90 | ds, argv=kwargs['datalad_unparsed_args'], output=output) | |
91 | else: | |
92 | result = exmod._datalad_export_plugin_call( | |
93 | ds, output=output, **kwargs) | |
94 | return (exmod, result) | |
95 | ||
96 | @staticmethod | |
97 | def result_renderer_cmdline(res, args): | |
98 | exmod, result = res | |
99 | if args.getcmdhelp: | |
100 | # the function that prints the help was returned as result | |
101 | if not hasattr(exmod, '_datalad_get_cmdline_help'): | |
102 | lgr.error("export plugin '{}' does not provide help".format(exmod)) | |
103 | return | |
104 | replacement = [] | |
105 | help = exmod._datalad_get_cmdline_help() | |
106 | if isinstance(help, tuple): | |
107 | help, replacement = help | |
108 | if replacement: | |
109 | for in_s, out_s in replacement: | |
110 | help = help.replace(in_s, out_s + ' ' * max(0, len(in_s) - len(out_s))) | |
111 | print(help) | |
112 | return | |
113 | # TODO call exporter function (if any) |
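A minimal usage sketch for this interface, mirroring `test_tarball` further below (the dataset path is hypothetical; the return value pairs the exporter module with its result, here the tarball filename):

    from datalad.api import Dataset

    ds = Dataset('/tmp/ds')   # hypothetical, existing dataset
    exmod, tarball = ds.export('tarball', output='/tmp/myexport')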
0 | # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # ex: set sts=4 ts=4 sw=4 noet: | |
2 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
3 | # | |
4 | # See COPYING file distributed along with the datalad package for the | |
5 | # copyright and license terms. | |
6 | # | |
7 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
8 | """ | |
9 | ||
10 | """ | |
11 | ||
12 | __docformat__ = 'restructuredtext' | |
13 | ||
14 | import logging | |
15 | import tarfile | |
16 | import os | |
17 | ||
18 | from mock import patch | |
19 | from os.path import join as opj, dirname, normpath, isabs | |
20 | from datalad.support.annexrepo import AnnexRepo | |
21 | from datalad.utils import file_basename | |
22 | ||
23 | lgr = logging.getLogger('datalad.export.tarball') | |
24 | ||
25 | ||
26 | # PLUGIN API | |
27 | def _datalad_export_plugin_call(dataset, output, argv=None): | |
28 | if argv: | |
29 | lgr.warn("tarball exporter ignores any additional options '{}'".format( | |
30 | argv)) | |
31 | ||
32 | repo = dataset.repo | |
33 | committed_date = repo.get_committed_date() | |
34 | ||
35 | # could be used later on to filter files by some criterion | |
36 | def _filter_tarinfo(ti): | |
37 | # Reset the date to match the one of the last commit, not from the | |
38 | # filesystem since git doesn't track those at all | |
39 | # TODO: use the date of the last commit when any particular | |
40 | # file was changed -- would be the most kosher yoh thinks to the | |
41 | # degree of our abilities | |
42 | ti.mtime = committed_date | |
43 | return ti | |
44 | ||
45 | if output is None: | |
46 | output = "datalad_{}.tar.gz".format(dataset.id) | |
47 | else: | |
48 | if not output.endswith('.tar.gz'): | |
49 | output += '.tar.gz' | |
50 | ||
51 | root = dataset.path | |
52 | # use dir inside matching the output filename | |
53 | # TODO: could be an option to the export plugin allowing empty value | |
54 | # for no leading dir | |
55 | leading_dir = file_basename(output) | |
56 | ||
57 | # workaround for inability to pass down the time stamp | |
58 | with patch('time.time', return_value=committed_date), \ | |
59 | tarfile.open(output, "w:gz") as tar: | |
60 | repo_files = sorted(repo.get_indexed_files()) | |
61 | if isinstance(repo, AnnexRepo): | |
62 | annexed = repo.is_under_annex( | |
63 | repo_files, allow_quick=True, batch=True) | |
64 | else: | |
65 | annexed = [False] * len(repo_files) | |
66 | for i, rpath in enumerate(repo_files): | |
67 | fpath = opj(root, rpath) | |
68 | if annexed[i]: | |
69 | # resolve to possible link target | |
70 | link_target = os.readlink(fpath) | |
71 | if not isabs(link_target): | |
72 | link_target = normpath(opj(dirname(fpath), link_target)) | |
73 | fpath = link_target | |
74 | # name in the tarball | |
75 | aname = normpath(opj(leading_dir, rpath)) | |
76 | tar.add( | |
77 | fpath, | |
78 | arcname=aname, | |
79 | recursive=False, | |
80 | filter=_filter_tarinfo) | |
81 | ||
82 | # I think it might be better to return the "final" filename where stuff was saved | 
83 | return output | |
84 | ||
85 | ||
86 | # PLUGIN API | |
87 | def _datalad_get_cmdline_help(): | |
88 | return 'Just call it, and it will produce a tarball.' |
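The symlink handling above is the heart of the exporter: annexed files are symlinks into .git/annex/objects, so each link must be resolved before tar.add(), or the tarball would contain dangling links. A standalone sketch of that resolution step, with a hypothetical path:

    import os
    from os.path import dirname, isabs, normpath, join as opj

    fpath = '/tmp/ds/dir/file1_down'       # hypothetical annexed file
    link_target = os.readlink(fpath)       # e.g. '../.git/annex/objects/...'
    if not isabs(link_target):
        # resolve relative to the symlink's own directory
        link_target = normpath(opj(dirname(fpath), link_target))
    # link_target now names the regular file to put into the tarball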
0 | # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # ex: set sts=4 ts=4 sw=4 noet: | |
2 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
3 | # | |
4 | # See COPYING file distributed along with the datalad package for the | |
5 | # copyright and license terms. | |
6 | # | |
7 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
8 | """Interfaces tests | |
9 | ||
10 | """ | |
11 | ||
12 | __docformat__ = 'restructuredtext' |
0 | # emacs: -*- mode: python-mode; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # -*- coding: utf-8 -*- | |
2 | # ex: set sts=4 ts=4 sw=4 noet: | |
3 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
4 | # | |
5 | # See COPYING file distributed along with the datalad package for the | |
6 | # copyright and license terms. | |
7 | # | |
8 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
9 | """Test tarball exporter""" | |
10 | ||
11 | import os | |
12 | import time | |
13 | from os.path import join as opj | |
14 | from os.path import isabs | |
15 | import tarfile | |
16 | ||
17 | from datalad.api import Dataset | |
18 | from datalad.api import export | |
19 | from datalad.utils import chpwd | |
20 | from datalad.utils import md5sum | |
21 | ||
22 | from datalad.tests.utils import with_tree | |
23 | from datalad.tests.utils import ok_startswith | |
24 | from datalad.tests.utils import assert_true, assert_not_equal, assert_raises, \ | |
25 | assert_false, assert_equal | |
26 | ||
27 | ||
28 | _dataset_template = { | |
29 | 'ds': { | |
30 | 'file_up': 'some_content', | |
31 | 'dir': { | |
32 | 'file1_down': 'one', | |
33 | 'file2_down': 'two'}}} | |
34 | ||
35 | ||
36 | @with_tree(_dataset_template) | |
37 | def test_failure(path): | |
38 | ds = Dataset(opj(path, 'ds')).create(force=True) | |
39 | # unknown exporter | |
40 | assert_raises(ValueError, ds.export, 'nah') | |
41 | # non-existing dataset | |
42 | assert_raises(ValueError, export, 'tarball', Dataset('nowhere')) | |
43 | ||
44 | ||
45 | @with_tree(_dataset_template) | |
46 | def test_tarball(path): | |
47 | ds = Dataset(opj(path, 'ds')).create(force=True) | |
48 | ds.add('.') | |
49 | committed_date = ds.repo.get_committed_date() | |
50 | with chpwd(path): | |
51 | _mod, tarball1 = ds.export('tarball') | |
52 | assert(not isabs(tarball1)) | |
53 | tarball1 = opj(path, tarball1) | |
54 | default_outname = opj(path, 'datalad_{}.tar.gz'.format(ds.id)) | |
55 | assert_equal(tarball1, default_outname) | |
56 | assert_true(os.path.exists(default_outname)) | |
57 | custom_outname = opj(path, 'myexport.tar.gz') | |
58 | # feed in without extension | |
59 | ds.export('tarball', output=custom_outname[:-7]) | |
60 | assert_true(os.path.exists(custom_outname)) | |
61 | custom1_md5 = md5sum(custom_outname) | |
62 | # encodes the original tarball filename -> different checksum, despite | 
63 | # same content | |
64 | assert_not_equal(md5sum(default_outname), custom1_md5) | |
65 | # should really sleep so that if they stop using time.time -- we would know | 
66 | time.sleep(1.1) | |
67 | ds.export('tarball', output=custom_outname) | |
68 | # should not encode mtime, so should be identical | |
69 | assert_equal(md5sum(custom_outname), custom1_md5) | |
70 | ||
71 | def check_contents(outname, prefix): | |
72 | with tarfile.open(outname) as tf: | |
73 | nfiles = 0 | |
74 | for ti in tf: | |
75 | # any annex links resolved | |
76 | assert_false(ti.issym()) | |
77 | ok_startswith(ti.name, prefix + '/') | |
78 | assert_equal(ti.mtime, committed_date) | |
79 | if '.datalad' not in ti.name: | |
80 | # ignore any files in .datalad for this test to not be | |
81 | # susceptible to changes in how much meta info we generate | 
82 | nfiles += 1 | |
83 | # we have exactly three files, and expect no content for any directory | |
84 | assert_equal(nfiles, 3) | |
85 | check_contents(default_outname, 'datalad_%s' % ds.id) | |
86 | check_contents(custom_outname, 'myexport') |
40 | 40 | 'create-sibling-github'), |
41 | 41 | ('datalad.interface.unlock', 'Unlock', 'unlock'), |
42 | 42 | ('datalad.interface.save', 'Save', 'save'), |
43 | ('datalad.export', 'Export', 'export'), | |
43 | ('datalad.plugin', 'Plugin', 'plugin'), | |
44 | 44 | ]) |
45 | 45 | |
46 | 46 | _group_metadata = ( |
27 | 27 | from os.path import normpath |
28 | 28 | |
29 | 29 | from .base import Interface |
30 | from datalad.interface.utils import build_doc | |
30 | from datalad.interface.base import build_doc | |
31 | 31 | from .common_opts import allow_dirty |
32 | 32 | from ..consts import ARCHIVES_SPECIAL_REMOTE |
33 | 33 | from ..support.param import Parameter |
24 | 24 | |
25 | 25 | from datalad.interface.base import Interface |
26 | 26 | from datalad.interface.utils import eval_results |
27 | from datalad.interface.utils import build_doc | |
27 | from datalad.interface.base import build_doc | |
28 | 28 | from datalad.interface.results import get_status_dict |
29 | 29 | from datalad.support.constraints import EnsureStr |
30 | 30 | from datalad.support.constraints import EnsureBool |
23 | 23 | from ..ui import ui |
24 | 24 | from ..dochelpers import exc_str |
25 | 25 | |
26 | from datalad.interface.common_opts import eval_params | |
27 | from datalad.interface.common_opts import eval_defaults | |
26 | 28 | from datalad.support.exceptions import InsufficientArgumentsError |
27 | 29 | from datalad.utils import with_pathsep as _with_sep |
28 | 30 | from datalad.support.constraints import EnsureKeyChoice |
29 | 31 | from datalad.distribution.dataset import Dataset |
30 | 32 | from datalad.distribution.dataset import resolve_path |
33 | ||
34 | ||
35 | default_logchannels = { | |
36 | '': 'debug', | |
37 | 'ok': 'debug', | |
38 | 'notneeded': 'debug', | |
39 | 'impossible': 'warning', | |
40 | 'error': 'error', | |
41 | } | |
31 | 42 | |
32 | 43 | |
33 | 44 | def get_api_name(intfspec): |
241 | 252 | # assign the amended docs |
242 | 253 | func.__doc__ = doc |
243 | 254 | return func |
255 | ||
256 | ||
257 | def build_doc(cls, **kwargs): | |
258 | """Decorator to build docstrings for datalad commands | |
259 | ||
260 | It's intended to decorate the class, the __call__-method of which is the | |
261 | actual command. It expects that __call__-method to be decorated by | |
262 | eval_results. | |
263 | ||
264 | Parameters | |
265 | ---------- | |
266 | cls: Interface | |
267 | class defining a datalad command | |
268 | """ | |
269 | ||
270 | # Note, that this is a class decorator, which is executed only once when the | |
271 | # class is imported. It builds the docstring for the class' __call__ method | |
272 | # and returns the original class. | |
273 | # | |
274 | # This is because a decorator for the actual function would not be able to | |
275 | # behave like this. To build the docstring we need to access the attribute | |
276 | # _params of the class. From within a function decorator we cannot do this | |
277 | # during import time, since the class is being built in this very moment and | |
278 | # is not yet available in the module. And if we do it from within the part | |
279 | # of a function decorator, that is executed when the function is called, we | |
280 | # would need to actually call the command once in order to build this | |
281 | # docstring. | |
282 | ||
283 | lgr.debug("Building doc for {}".format(cls)) | |
284 | ||
285 | cls_doc = cls.__doc__ | |
286 | if hasattr(cls, '_docs_'): | |
287 | # expand docs | |
288 | cls_doc = cls_doc.format(**cls._docs_) | |
289 | ||
290 | call_doc = None | |
291 | # suffix for update_docstring_with_parameters: | |
292 | if cls.__call__.__doc__: | |
293 | call_doc = cls.__call__.__doc__ | |
294 | ||
295 | # build standard doc and insert eval_doc | |
296 | spec = getattr(cls, '_params_', dict()) | |
297 | # get docs for eval_results parameters: | |
298 | spec.update(eval_params) | |
299 | ||
300 | update_docstring_with_parameters( | |
301 | cls.__call__, spec, | |
302 | prefix=alter_interface_docs_for_api(cls_doc), | |
303 | suffix=alter_interface_docs_for_api(call_doc), | |
304 | add_args=eval_defaults if not hasattr(cls, '_no_eval_results') else None | |
305 | ) | |
306 | ||
307 | # return original | |
308 | return cls | |
244 | 309 | |
245 | 310 | |
246 | 311 | class Interface(object): |
323 | 388 | 'AddArchiveContent', 'AggregateMetaData', |
324 | 389 | 'CrawlInit', 'Crawl', 'CreateSiblingGithub', |
325 | 390 | 'CreateTestDataset', 'DownloadURL', 'Export', 'Ls', 'Move', |
326 | 'Publish', 'SSHRun', 'Search'): | |
391 | 'Publish', 'SSHRun', 'Search', 'Test'): | |
327 | 392 | # set all common args explicitly to override class defaults |
328 | 393 | # that are tailored towards the Python API
329 | 394 | kwargs['return_type'] = 'generator' |
482 | 547 | return content_by_ds, unavailable_paths |
483 | 548 | |
484 | 549 | |
485 | def merge_allargs2kwargs(call, args, kwargs): | |
486 | """Generate a kwargs dict from a call signature and *args, **kwargs""" | |
550 | def get_allargs_as_kwargs(call, args, kwargs): | |
551 | """Generate a kwargs dict from a call signature and *args, **kwargs | |
552 | ||
553 | Basically resolving the argnames for all positional arguments, and | |
554 | resolvin the defaults for all kwargs that are not given in a kwargs | |
555 | dict | |
556 | """ | |
487 | 557 | from inspect import getargspec |
488 | 558 | argspec = getargspec(call) |
489 | 559 | defaults = argspec.defaults |
498 | 568 | kwargs_[k] = v |
499 | 569 | # update with provided kwarg args |
500 | 570 | kwargs_.update(kwargs) |
501 | assert (nargs == len(kwargs_)) | |
571 | # XXX we cannot assert the following, because our own highlevel | |
572 | # API commands support more kwargs than what is discoverable | |
573 | # from their signature... | |
574 | #assert (nargs == len(kwargs_)) | |
502 | 575 | return kwargs_ |
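A small illustration of what this helper is meant to return (function and values hypothetical):

    def f(a, b=2, c=3):
        pass

    get_allargs_as_kwargs(f, (1,), {'c': 5})
    # -> {'a': 1, 'b': 2, 'c': 5}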
26 | 26 | from datalad.interface.common_opts import recursion_limit |
27 | 27 | from datalad.interface.results import get_status_dict |
28 | 28 | from datalad.interface.utils import eval_results |
29 | from datalad.interface.utils import build_doc | |
29 | from datalad.interface.base import build_doc | |
30 | 30 | |
31 | 31 | from logging import getLogger |
32 | 32 | lgr = getLogger('datalad.api.clean') |
12 | 12 | __docformat__ = 'restructuredtext' |
13 | 13 | |
14 | 14 | from appdirs import AppDirs |
15 | from os.path import join as opj | |
15 | 16 | from datalad.support.constraints import EnsureBool |
16 | 17 | from datalad.support.constraints import EnsureInt |
17 | 18 | |
67 | 68 | 'destination': 'global', |
68 | 69 | 'default': dirs.user_cache_dir, |
69 | 70 | }, |
71 | 'datalad.locations.system-plugins': { | |
72 | 'ui': ('question', { | |
73 | 'title': 'System plugin directory', | |
74 | 'text': 'Where should datalad search for system plugins?'}), | |
75 | 'destination': 'global', | |
76 | 'default': opj(dirs.site_config_dir, 'plugins'), | |
77 | }, | |
78 | 'datalad.locations.user-plugins': { | |
79 | 'ui': ('question', { | |
80 | 'title': 'User plugin directory', | |
81 | 'text': 'Where should datalad search for user plugins?'}), | |
82 | 'destination': 'global', | |
83 | 'default': opj(dirs.user_config_dir, 'plugins'), | |
84 | }, | |
70 | 85 | 'datalad.exc.str.tblimit': { |
71 | 86 | 'ui': ('question', { |
72 | 87 | 'title': 'This flag is used by the datalad extract_tb function which extracts and formats stack-traces. It caps the number of lines to DATALAD_EXC_STR_TBLIMIT of pre-processed entries from traceback.'}), |
11 | 11 | |
12 | 12 | __docformat__ = 'restructuredtext' |
13 | 13 | |
14 | from datalad.interface.results import known_result_xfms | |
14 | 15 | from datalad.support.param import Parameter |
15 | 16 | from datalad.support.constraints import EnsureInt, EnsureNone, EnsureStr |
16 | 17 | from datalad.support.constraints import EnsureChoice |
18 | from datalad.support.constraints import EnsureCallable | |
17 | 19 | |
18 | 20 | |
19 | 21 | location_description = Parameter( |
214 | 216 | By default it would fail the run ('fail' setting). With 'inherit' a |
215 | 217 | 'create-sibling' with '--inherit-settings' will be used to create sibling |
216 | 218 | on the remote. With 'skip' - it simply will be skipped.""") |
219 | ||
220 | with_plugin_opt = Parameter( | |
221 | args=('--with-plugin',), | |
222 | nargs='*', | |
223 | action='append', | |
224 | metavar='PLUGINSPEC', | |
225 | doc="""DataLad plugin to run in addition. PLUGINSPEC is a list | |
226 | comprised of a plugin name plus optional `key=value` pairs with arguments | |
227 | for the plugin call (see `plugin` command documentation for details). | |
228 | [PY: PLUGINSPECs must be wrapped in list where each item configures | |
229 | one plugin call. Plugins are called in the order defined by this list. | |
230 | PY][CMD: This option can be given more than once to run multiple plugins | |
231 | in the order in which they are given. CMD]""") | |
232 | ||
233 | # define parameters to be used by eval_results to tune behavior | |
234 | # Note: This is done outside eval_results in order to be available when building | |
235 | # docstrings for the decorated functions | |
236 | # TODO: May be we want to move them to be part of the classes _params. Depends | |
237 | # on when and how eval_results actually has to determine the class. | |
238 | # Alternatively build a callable class with these to even have a fake signature | |
239 | # that matches the parameters, so they can be evaluated and defined the exact | |
240 | # same way. | |
241 | ||
242 | eval_params = dict( | |
243 | return_type=Parameter( | |
244 | doc="""return value behavior switch. If 'item-or-list' a single | |
245 | value is returned instead of a one-item return value list, or a | |
246 | list in case of multiple return values. `None` is returned in case | 
247 | of an empty list.""", | |
248 | constraints=EnsureChoice('generator', 'list', 'item-or-list')), | |
249 | result_filter=Parameter( | |
250 | doc="""if given, each to-be-returned | |
251 | status dictionary is passed to this callable, and is only | |
252 | returned if the callable's return value does not | |
253 | evaluate to False or a ValueError exception is raised. If the given | |
254 | callable supports `**kwargs` it will additionally be passed the | |
255 | keyword arguments of the original API call.""", | |
256 | constraints=EnsureCallable() | EnsureNone()), | |
257 | result_xfm=Parameter( | |
258 | doc="""if given, each to-be-returned result | |
259 | status dictionary is passed to this callable, and its return value | |
260 | becomes the result instead. This is different from | |
261 | `result_filter`, as it can perform arbitrary transformation of the | |
262 | result value. This is mostly useful for top-level command invocations | |
263 | that need to provide the results in a particular format. Instead of | |
264 | a callable, a label for a pre-crafted result transformation can be | |
265 | given.""", | |
266 | constraints=EnsureChoice(*list(known_result_xfms.keys())) | EnsureCallable() | EnsureNone()), | |
267 | result_renderer=Parameter( | |
268 | doc="""format of return value rendering on stdout""", | |
269 | constraints=EnsureChoice('default', 'json', 'json_pp', 'tailored') | EnsureNone()), | |
270 | on_failure=Parameter( | |
271 | doc="""behavior to perform on failure: 'ignore' any failure is reported, | |
272 | but does not cause an exception; 'continue' if any failure occurs an | |
273 | exception will be raised at the end, but processing other actions will | |
274 | continue for as long as possible; 'stop': processing will stop on first | |
275 | failure and an exception is raised. A failure is any result with status | |
276 | 'impossible' or 'error'. The raised exception is an IncompleteResultsError | 
277 | that carries the result dictionaries of the failures in its `failed` | |
278 | attribute.""", | |
279 | constraints=EnsureChoice('ignore', 'continue', 'stop')), | |
280 | run_before=Parameter( | |
281 | doc="""DataLad plugin to run before the command. PLUGINSPEC is a list | |
282 | comprised of a plugin name plus optional 2-tuples of key-value pairs | |
283 | with arguments for the plugin call (see `plugin` command documentation | |
284 | for details). | |
285 | PLUGINSPECs must be wrapped in a list where each item configures | 
286 | one plugin call. Plugins are called in the order defined by this list. | |
287 | For running plugins that require a `dataset` argument it is important | |
288 | to provide the respective dataset as the `dataset` argument of the main | |
289 | command, if it is not in the list of plugin arguments."""), | |
290 | run_after=Parameter( | |
291 | doc="""Like `run_before`, but plugins are executed after the main command | |
292 | has finished."""), | |
293 | ) | |
294 | ||
295 | eval_defaults = dict( | |
296 | return_type='list', | |
297 | result_filter=None, | |
298 | result_renderer=None, | |
299 | result_xfm=None, | |
300 | on_failure='continue', | |
301 | run_before=None, | |
302 | run_after=None, | |
303 | ) |
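A hedged sketch of supplying these pre/post hooks from the Python API, patterned on `test_create_withplugin` earlier in this diff (the plugin name and argument are taken from that test; the dataset path is hypothetical):

    from datalad.api import create

    ds = create(
        dataset='/tmp/ds',
        run_after=[['add_readme', 'filename=with hole.txt']])

Each inner list is one plugin call: the plugin name followed by `key=value` argument strings.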
12 | 12 | |
13 | 13 | from os.path import exists |
14 | 14 | from .base import Interface |
15 | from datalad.interface.utils import build_doc | |
15 | from datalad.interface.base import build_doc | |
16 | 16 | |
17 | 17 | from datalad.support.param import Parameter |
18 | 18 | from datalad.support.constraints import EnsureStr, EnsureNone |
11 | 11 | |
12 | 12 | from os.path import curdir |
13 | 13 | from .base import Interface |
14 | from datalad.interface.utils import build_doc | |
14 | from datalad.interface.base import build_doc | |
15 | 15 | from collections import OrderedDict |
16 | 16 | from datalad.distribution.dataset import Dataset |
17 | 17 |
21 | 21 | from datalad.interface.annotate_paths import annotated2content_by_ds |
22 | 22 | from datalad.interface.base import Interface |
23 | 23 | from datalad.interface.utils import eval_results |
24 | from datalad.interface.utils import build_doc | |
24 | from datalad.interface.base import build_doc | |
25 | 25 | from datalad.support.constraints import EnsureNone |
26 | 26 | from datalad.support.constraints import EnsureStr |
27 | 27 | from datalad.support.constraints import EnsureChoice |
18 | 18 | from os.path import isdir, curdir |
19 | 19 | |
20 | 20 | from .base import Interface |
21 | from datalad.interface.utils import build_doc | |
21 | from datalad.interface.base import build_doc | |
22 | 22 | from ..ui import ui |
23 | 23 | from ..utils import assure_list_from_str |
24 | 24 | from ..dochelpers import exc_str |
25 | 25 | from ..cmdline.helpers import get_repo_instance |
26 | 26 | from ..utils import auto_repr |
27 | 27 | from .base import Interface |
28 | from datalad.interface.utils import build_doc | |
28 | from datalad.interface.base import build_doc | |
29 | 29 | from ..ui import ui |
30 | 30 | from ..utils import swallow_logs |
31 | 31 | from ..consts import METADATA_DIR |
53 | 53 | |
54 | 54 | ATM only s3:// URLs and datasets are supported |
55 | 55 | |
56 | Examples | |
57 | -------- | |
56 | Examples: | |
58 | 57 | |
59 | 58 | $ datalad ls s3://openfmri/tarballs/ds202 # to list S3 bucket |
60 | 59 | $ datalad ls # to list current dataset |
33 | 33 | from datalad.interface.common_opts import save_message_opt |
34 | 34 | from datalad.interface.results import get_status_dict |
35 | 35 | from datalad.interface.utils import eval_results |
36 | from datalad.interface.utils import build_doc | |
36 | from datalad.interface.base import build_doc | |
37 | 37 | from datalad.interface.utils import get_tree_roots |
38 | 38 | from datalad.interface.utils import discover_dataset_trace_to_targets |
39 | 39 |
13 | 13 | |
14 | 14 | import datalad |
15 | 15 | from .base import Interface |
16 | from datalad.interface.utils import build_doc | |
16 | from datalad.interface.base import build_doc | |
17 | 17 | |
18 | 18 | |
19 | 19 | @build_doc |
34 | 34 | |
35 | 35 | from ..base import Interface |
36 | 36 | from ..utils import eval_results |
37 | from ..utils import build_doc | |
37 | from datalad.interface.base import build_doc | |
38 | 38 | from ..utils import handle_dirty_dataset |
39 | 39 | from ..utils import get_paths_by_dataset |
40 | 40 | from ..utils import filter_unmodified |
25 | 25 | from datalad.interface.annotate_paths import annotated2content_by_ds |
26 | 26 | from datalad.interface.results import get_status_dict |
27 | 27 | from datalad.interface.utils import eval_results |
28 | from datalad.interface.utils import build_doc | |
28 | from datalad.interface.base import build_doc | |
29 | 29 | from datalad.interface.common_opts import recursion_flag |
30 | 30 | from datalad.interface.common_opts import recursion_limit |
31 | 31 |
15 | 15 | import logging |
16 | 16 | import wrapt |
17 | 17 | import sys |
18 | import re | |
19 | import shlex | |
18 | 20 | from os import curdir |
19 | 21 | from os import pardir |
20 | 22 | from os import listdir |
21 | from os import linesep | |
22 | 23 | from os.path import join as opj |
23 | 24 | from os.path import lexists |
24 | 25 | from os.path import isdir |
45 | 46 | from datalad import cfg as dlcfg |
46 | 47 | from datalad.dochelpers import exc_str |
47 | 48 | |
49 | ||
48 | 50 | from datalad.support.constraints import Constraint |
49 | from datalad.support.constraints import EnsureChoice | |
50 | from datalad.support.constraints import EnsureNone | |
51 | from datalad.support.constraints import EnsureCallable | |
52 | from datalad.support.param import Parameter | |
53 | 51 | |
54 | 52 | from datalad.ui import ui |
55 | ||
56 | from .base import Interface | |
57 | from .base import update_docstring_with_parameters | |
58 | from .base import alter_interface_docs_for_api | |
59 | from .base import merge_allargs2kwargs | |
53 | import datalad.support.ansi_colors as ac | |
54 | ||
55 | from datalad.interface.base import Interface | |
56 | from datalad.interface.base import default_logchannels | |
57 | from datalad.interface.base import get_allargs_as_kwargs | |
58 | from datalad.interface.common_opts import eval_params | |
59 | from datalad.interface.common_opts import eval_defaults | |
60 | 60 | from .results import known_result_xfms |
61 | 61 | |
62 | 62 | |
63 | 63 | lgr = logging.getLogger('datalad.interface.utils') |
64 | ||
65 | ||
66 | def cls2cmdlinename(cls): | |
67 | "Return the cmdline command name from an Interface class" | |
68 | r = re.compile(r'([a-z0-9])([A-Z])') | |
69 | return r.sub('\\1-\\2', cls.__name__).lower() | |
64 | 70 | |
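The substitution inserts a dash at every lower-to-upper case boundary and lowercases the result; a self-contained illustration with a stand-in class:

    class CreateSibling(object):   # stand-in for the real Interface class
        pass

    cls2cmdlinename(CreateSibling)   # -> 'create-sibling'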
65 | 71 | |
66 | 72 | def handle_dirty_dataset(ds, mode, msg=None): |
507 | 513 | return keep |
508 | 514 | |
509 | 515 | |
510 | # define parameters to be used by eval_results to tune behavior | |
511 | # Note: This is done outside eval_results in order to be available when building | |
512 | # docstrings for the decorated functions | |
513 | # TODO: May be we want to move them to be part of the classes _params. Depends | |
514 | # on when and how eval_results actually has to determine the class. | |
515 | # Alternatively build a callable class with these to even have a fake signature | |
516 | # that matches the parameters, so they can be evaluated and defined the exact | |
517 | # same way. | |
518 | ||
519 | eval_params = dict( | |
520 | return_type=Parameter( | |
521 | doc="""return value behavior switch. If 'item-or-list' a single | |
522 | value is returned instead of a one-item return value list, or a | |
523 | list in case of multiple return values. `None` is return in case | |
524 | of an empty list.""", | |
525 | constraints=EnsureChoice('generator', 'list', 'item-or-list')), | |
526 | result_filter=Parameter( | |
527 | doc="""if given, each to-be-returned | |
528 | status dictionary is passed to this callable, and is only | |
529 | returned if the callable's return value does not | |
530 | evaluate to False or a ValueError exception is raised. If the given | |
531 | callable supports `**kwargs` it will additionally be passed the | |
532 | keyword arguments of the original API call.""", | |
533 | constraints=EnsureCallable() | EnsureNone()), | |
534 | result_xfm=Parameter( | |
535 | doc="""if given, each to-be-returned result | |
536 | status dictionary is passed to this callable, and its return value | |
537 | becomes the result instead. This is different from | |
538 | `result_filter`, as it can perform arbitrary transformation of the | |
539 | result value. This is mostly useful for top-level command invocations | |
540 | that need to provide the results in a particular format. Instead of | |
541 | a callable, a label for a pre-crafted result transformation can be | |
542 | given.""", | |
543 | constraints=EnsureChoice(*list(known_result_xfms.keys())) | EnsureCallable() | EnsureNone()), | |
544 | result_renderer=Parameter( | |
545 | doc="""format of return value rendering on stdout""", | |
546 | constraints=EnsureChoice('default', 'json', 'json_pp', 'tailored') | EnsureNone()), | |
547 | on_failure=Parameter( | |
548 | doc="""behavior to perform on failure: 'ignore' any failure is reported, | |
549 | but does not cause an exception; 'continue' if any failure occurs an | |
550 | exception will be raised at the end, but processing other actions will | |
551 | continue for as long as possible; 'stop': processing will stop on first | |
552 | failure and an exception is raised. A failure is any result with status | |
553 | 'impossible' or 'error'. Raised exception is an IncompleteResultsError | |
554 | that carries the result dictionaries of the failures in its `failed` | |
555 | attribute.""", | |
556 | constraints=EnsureChoice('ignore', 'continue', 'stop')), | |
557 | ) | |
558 | eval_defaults = dict( | |
559 | return_type='list', | |
560 | result_filter=None, | |
561 | result_renderer=None, | |
562 | result_xfm=None, | |
563 | on_failure='continue', | |
564 | ) | |
565 | ||
566 | ||
567 | 516 | def eval_results(func): |
568 | 517 | """Decorator for return value evaluation of datalad commands. |
569 | 518 | |
605 | 554 | i.e. a datalad command definition |
606 | 555 | """ |
607 | 556 | |
608 | default_logchannels = { | |
609 | '': 'debug', | |
610 | 'ok': 'debug', | |
611 | 'notneeded': 'debug', | |
612 | 'impossible': 'warning', | |
613 | 'error': 'error', | |
614 | } | |
615 | ||
616 | 557 | @wrapt.decorator |
617 | 558 | def eval_func(wrapped, instance, args, kwargs): |
618 | ||
559 | # for result filters and pre/post plugins | |
560 | # we need to produce a dict with argname/argvalue pairs for all args | |
561 | # incl. defaults and args given as positionals | |
562 | allkwargs = get_allargs_as_kwargs(wrapped, args, kwargs) | |
619 | 563 | # determine class, the __call__ method of which we are decorating: |
620 | 564 | # Ben: Note, that this is a bit dirty in PY2 and imposes restrictions on |
621 | 565 | # when and how to use eval_results as well as on how to name a command's |
644 | 588 | _func_class = mod.__dict__[command_class_name] |
645 | 589 | lgr.debug("Determined class of decorated function: %s", _func_class) |
646 | 590 | |
591 | # retrieve common options from kwargs, and fall back on the command | |
592 | # class attributes, or general defaults if needed | |
647 | 593 | common_params = { |
648 | 594 | p_name: kwargs.pop( |
649 | 595 | p_name, |
650 | 596 | getattr(_func_class, p_name, eval_defaults[p_name])) |
651 | 597 | for p_name in eval_params} |
598 | # short cuts and configured setup for common options | |
599 | on_failure = common_params['on_failure'] | |
600 | return_type = common_params['return_type'] | |
601 | # resolve string labels for transformers too | |
602 | result_xfm = common_params['result_xfm'] | |
603 | if result_xfm in known_result_xfms: | |
604 | result_xfm = known_result_xfms[result_xfm] | |
652 | 605 | result_renderer = common_params['result_renderer'] |
653 | ||
606 | # TODO remove this conditional branch entirely, done outside | |
607 | if not result_renderer: | |
608 | result_renderer = dlcfg.get('datalad.api.result-renderer', None) | |
609 | # wrap the filter into a helper to be able to pass additional arguments | |
610 | # if the filter supports it, but at the same time keep the required interface | |
611 | # as minimal as possible. Also do this here, in order to avoid this test | |
612 | # to be performed for each return value | |
613 | result_filter = common_params['result_filter'] | |
614 | _result_filter = result_filter | |
615 | if result_filter: | |
616 | if isinstance(result_filter, Constraint): | |
617 | _result_filter = result_filter.__call__ | |
618 | if (PY2 and inspect.getargspec(_result_filter).keywords) or \ | |
619 | (not PY2 and inspect.getfullargspec(_result_filter).varkw): | |
620 | ||
621 | def _result_filter(res): | |
622 | return result_filter(res, **allkwargs) | |
623 | ||
624 | def _get_plugin_specs(param_key=None, cfg_key=None): | |
625 | spec = common_params.get(param_key, None) | |
626 | if spec is not None: | |
627 | # this is already a list of lists | |
628 | return spec | |
629 | ||
630 | spec = dlcfg.get(cfg_key, None) | |
631 | if spec is None: | |
632 | return | |
633 | elif not isinstance(spec, tuple): | |
634 | spec = [spec] | |
635 | return [shlex.split(s) for s in spec] | |
636 | ||
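Configured values arrive as plain strings, so `shlex.split` gives plugin specifications shell-like quoting rules, e.g. (hypothetical config value):

    import shlex

    shlex.split('add_readme filename="with hole.txt"')
    # -> ['add_readme', 'filename=with hole.txt']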
637 | # query cfg for defaults | |
638 | cmdline_name = cls2cmdlinename(_func_class) | |
639 | run_before = _get_plugin_specs( | |
640 | 'run_before', | |
641 | 'datalad.{}.run-before'.format(cmdline_name)) | |
642 | run_after = _get_plugin_specs( | |
643 | 'run_after', | |
644 | 'datalad.{}.run-after'.format(cmdline_name)) | |
645 | ||
646 | # this internal helper function actually drives the command | |
647 | # generator-style, it may generate an exception if desired, | |
648 | # on incomplete results | |
654 | 649 | def generator_func(*_args, **_kwargs): |
655 | # obtain results | |
656 | results = wrapped(*_args, **_kwargs) | |
650 | from datalad.plugin import Plugin | |
651 | ||
657 | 652 | # flag whether to raise an exception |
658 | # TODO actually compose a meaningful exception | |
659 | 653 | incomplete_results = [] |
660 | # inspect and render | |
661 | result_filter = common_params['result_filter'] | |
662 | # wrap the filter into a helper to be able to pass additional arguments | |
663 | # if the filter supports it, but at the same time keep the required interface | |
664 | # as minimal as possible. Also do this here, in order to avoid this test | |
665 | # to be performed for each return value | |
666 | _result_filter = result_filter | |
667 | if result_filter: | |
668 | if isinstance(result_filter, Constraint): | |
669 | _result_filter = result_filter.__call__ | |
670 | if (PY2 and inspect.getargspec(_result_filter).keywords) or \ | |
671 | (not PY2 and inspect.getfullargspec(_result_filter).varkw): | |
672 | # we need to produce a dict with argname/argvalue pairs for all args | |
673 | # incl. defaults and args given as positionals | |
674 | fullkwargs_ = merge_allargs2kwargs(wrapped, _args, _kwargs) | |
675 | ||
676 | def _result_filter(res): | |
677 | return result_filter(res, **fullkwargs_) | |
678 | result_renderer = common_params['result_renderer'] | |
679 | result_xfm = common_params['result_xfm'] | |
680 | if result_xfm in known_result_xfms: | |
681 | result_xfm = known_result_xfms[result_xfm] | |
682 | on_failure = common_params['on_failure'] | |
683 | if not result_renderer: | |
684 | result_renderer = dlcfg.get('datalad.api.result-renderer', None) | |
685 | 654 | # track what actions were performed how many times |
686 | 655 | action_summary = {} |
687 | for res in results: | |
688 | actsum = action_summary.get(res['action'], {}) | |
689 | if res['status']: | |
690 | actsum[res['status']] = actsum.get(res['status'], 0) + 1 | |
691 | action_summary[res['action']] = actsum | |
692 | ## log message, if a logger was given | |
693 | # remove logger instance from results, as it is no longer useful | |
694 | # after logging was done, it isn't serializable, and generally | |
695 | # pollutes the output | |
696 | res_lgr = res.pop('logger', None) | |
697 | if isinstance(res_lgr, logging.Logger): | |
698 | # didn't get a particular log function, go with default | |
699 | res_lgr = getattr(res_lgr, default_logchannels[res['status']]) | |
700 | if res_lgr and 'message' in res: | |
701 | msg = res['message'] | |
702 | msgargs = None | |
703 | if isinstance(msg, tuple): | |
704 | msgargs = msg[1:] | |
705 | msg = msg[0] | |
706 | if 'path' in res: | |
707 | msg = '{} [{}({})]'.format( | |
708 | msg, res['action'], res['path']) | |
709 | if msgargs: | |
710 | # support string expansion of logging to avoid runtime cost | |
711 | res_lgr(msg, *msgargs) | |
712 | else: | |
713 | res_lgr(msg) | |
714 | ## error handling | |
715 | # looks for error status, and report at the end via | |
716 | # an exception | |
717 | if on_failure in ('continue', 'stop') \ | |
718 | and res['status'] in ('impossible', 'error'): | |
719 | incomplete_results.append(res) | |
720 | if on_failure == 'stop': | |
721 | # first fail -> that's it | |
722 | # raise will happen after the loop | |
723 | break | |
724 | if _result_filter: | |
725 | try: | |
726 | if not _result_filter(res): | |
727 | raise ValueError('excluded by filter') | |
728 | except ValueError as e: | |
729 | lgr.debug('not reporting result (%s)', exc_str(e)) | |
730 | continue | |
731 | ## output rendering | |
732 | if result_renderer == 'default': | |
733 | # TODO have a helper that can expand a result message | |
734 | ui.message('{action}({status}): {path}{type}{msg}'.format( | |
735 | action=res['action'], | |
736 | status=res['status'], | |
737 | path=relpath(res['path'], | |
738 | res['refds']) if res.get('refds', None) else res['path'], | |
739 | type=' ({})'.format(res['type']) if 'type' in res else '', | |
740 | msg=' [{}]'.format( | |
741 | res['message'][0] % res['message'][1:] | |
742 | if isinstance(res['message'], tuple) else res['message']) | |
743 | if 'message' in res else '')) | |
744 | elif result_renderer in ('json', 'json_pp'): | |
745 | ui.message(json.dumps( | |
746 | {k: v for k, v in res.items() | |
747 | if k not in ('message', 'logger')}, | |
748 | sort_keys=True, | |
749 | indent=2 if result_renderer.endswith('_pp') else None)) | |
750 | elif result_renderer == 'tailored': | |
751 | if hasattr(_func_class, 'custom_result_renderer'): | |
752 | _func_class.custom_result_renderer(res, **_kwargs) | |
753 | elif hasattr(result_renderer, '__call__'): | |
754 | result_renderer(res, **_kwargs) | |
755 | if result_xfm: | |
756 | res = result_xfm(res) | |
757 | if res is None: | |
758 | continue | |
759 | yield res | |
760 | ||
656 | ||
657 | for pluginspec in run_before or []: | |
658 | lgr.debug('Running pre-proc plugin %s', pluginspec) | |
659 | for r in _process_results( | |
660 | Plugin.__call__( | |
661 | pluginspec, | |
662 | dataset=allkwargs.get('dataset', None), | |
663 | return_type='generator'), | |
664 | _func_class, action_summary, | |
665 | on_failure, incomplete_results, | |
666 | result_renderer, result_xfm, result_filter, | |
667 | **_kwargs): | |
668 | yield r | |
669 | ||
670 | # process main results | |
671 | for r in _process_results( | |
672 | wrapped(*_args, **_kwargs), | |
673 | _func_class, action_summary, | |
674 | on_failure, incomplete_results, | |
675 | result_renderer, result_xfm, _result_filter, **_kwargs): | |
676 | yield r | |
677 | ||
678 | for pluginspec in run_after or []: | |
679 | lgr.debug('Running post-proc plugin %s', pluginspec) | |
680 | for r in _process_results( | |
681 | Plugin.__call__( | |
682 | pluginspec, | |
683 | dataset=allkwargs.get('dataset', None), | |
684 | return_type='generator'), | |
685 | _func_class, action_summary, | |
686 | on_failure, incomplete_results, | |
687 | result_renderer, result_xfm, result_filter, | |
688 | **_kwargs): | |
689 | yield r | |
690 | ||
691 | # result summary before a potential exception | |
761 | 692 | if result_renderer == 'default' and action_summary and \ |
762 | 693 | sum(sum(s.values()) for s in action_summary.values()) > 1: |
763 | 694 | # give a summary in default mode, when there was more than one |
770 | 701 | for act in sorted(action_summary)))) |
771 | 702 | |
772 | 703 | if incomplete_results: |
773 | # stupid catch all message <- tailor TODO | |
774 | 704 | raise IncompleteResultsError( |
775 | 705 | failed=incomplete_results, |
776 | 706 | msg="Command did not complete successfully") |
777 | 707 | |
778 | if common_params['return_type'] == 'generator': | |
708 | if return_type == 'generator': | |
709 | # hand over the generator | |
779 | 710 | return generator_func(*args, **kwargs) |
780 | 711 | else: |
781 | 712 | @wrapt.decorator |
782 | 713 | def return_func(wrapped_, instance_, args_, kwargs_): |
783 | 714 | results = wrapped_(*args_, **kwargs_) |
784 | 715 | if inspect.isgenerator(results): |
716 | # unwind generator if there is one, this actually runs | |
717 | # any processing | |
785 | 718 | results = list(results) |
786 | 719 | # render summaries |
787 | if not common_params['result_xfm'] and result_renderer == 'tailored': | |
720 | if not result_xfm and result_renderer == 'tailored': | |
788 | 721 | # cannot render transformed results |
789 | 722 | if hasattr(_func_class, 'custom_result_summary_renderer'): |
790 | 723 | _func_class.custom_result_summary_renderer(results) |
791 | if common_params['return_type'] == 'item-or-list' and \ | |
724 | if return_type == 'item-or-list' and \ | |
792 | 725 | len(results) < 2: |
793 | 726 | return results[0] if results else None |
794 | 727 | else: |
799 | 732 | return eval_func(func) |
800 | 733 | |
801 | 734 | |
802 | def build_doc(cls, **kwargs): | |
803 | """Decorator to build docstrings for datalad commands | |
804 | ||
805 | It's intended to decorate the class, the __call__-method of which is the | |
806 | actual command. It expects that __call__-method to be decorated by | |
807 | eval_results. | |
808 | ||
809 | Parameters | |
810 | ---------- | |
811 | cls: Interface | |
812 | class defining a datalad command | |
813 | """ | |
814 | ||
815 | # Note, that this is a class decorator, which is executed only once when the | |
816 | # class is imported. It builds the docstring for the class' __call__ method | |
817 | # and returns the original class. | |
818 | # | |
819 | # This is because a decorator for the actual function would not be able to | |
820 | # behave like this. To build the docstring we need to access the attribute | |
821 | # _params of the class. From within a function decorator we cannot do this | |
822 | # during import time, since the class is being built in this very moment and | |
823 | # is not yet available in the module. And if we do it from within the part | |
824 | # of a function decorator, that is executed when the function is called, we | |
825 | # would need to actually call the command once in order to build this | |
826 | # docstring. | |
827 | ||
828 | lgr.debug("Building doc for {}".format(cls)) | |
829 | ||
830 | cls_doc = cls.__doc__ | |
831 | if hasattr(cls, '_docs_'): | |
832 | # expand docs | |
833 | cls_doc = cls_doc.format(**cls._docs_) | |
834 | ||
835 | call_doc = None | |
836 | # suffix for update_docstring_with_parameters: | |
837 | if cls.__call__.__doc__: | |
838 | call_doc = cls.__call__.__doc__ | |
839 | ||
840 | # build standard doc and insert eval_doc | |
841 | spec = getattr(cls, '_params_', dict()) | |
842 | # get docs for eval_results parameters: | |
843 | spec.update(eval_params) | |
844 | ||
845 | update_docstring_with_parameters( | |
846 | cls.__call__, spec, | |
847 | prefix=alter_interface_docs_for_api(cls_doc), | |
848 | suffix=alter_interface_docs_for_api(call_doc), | |
849 | add_args=eval_defaults if not hasattr(cls, '_no_eval_results') else None | |
850 | ) | |
851 | ||
852 | # return original | |
853 | return cls | |
735 | def _process_results( | |
736 | results, cmd_class, | |
737 | action_summary, on_failure, incomplete_results, | |
738 | result_renderer, result_xfm, result_filter, **kwargs): | |
739 | # private helper pf @eval_results | |
740 | # loop over results generated from some source and handle each | |
741 | # of them according to the requested behavior (logging, rendering, ...) | |
742 | for res in results: | |
743 | actsum = action_summary.get(res['action'], {}) | |
744 | if res['status']: | |
745 | actsum[res['status']] = actsum.get(res['status'], 0) + 1 | |
746 | action_summary[res['action']] = actsum | |
747 | ## log message, if a logger was given | |
748 | # remove logger instance from results, as it is no longer useful | |
749 | # after logging was done, it isn't serializable, and generally | |
750 | # pollutes the output | |
751 | res_lgr = res.pop('logger', None) | |
752 | if isinstance(res_lgr, logging.Logger): | |
753 | # didn't get a particular log function, go with default | |
754 | res_lgr = getattr(res_lgr, default_logchannels[res['status']]) | |
755 | if res_lgr and 'message' in res: | |
756 | msg = res['message'] | |
757 | msgargs = None | |
758 | if isinstance(msg, tuple): | |
759 | msgargs = msg[1:] | |
760 | msg = msg[0] | |
761 | if 'path' in res: | |
762 | msg = '{} [{}({})]'.format( | |
763 | msg, res['action'], res['path']) | |
764 | if msgargs: | |
765 | # support string expansion of logging to avoid runtime cost | |
766 | res_lgr(msg, *msgargs) | |
767 | else: | |
768 | res_lgr(msg) | |
769 | ## error handling | |
770 | # look for error status, and report at the end via | 
771 | # an exception | |
772 | if on_failure in ('continue', 'stop') \ | |
773 | and res['status'] in ('impossible', 'error'): | |
774 | incomplete_results.append(res) | |
775 | if on_failure == 'stop': | |
776 | # first fail -> that's it | |
777 | # raise will happen after the loop | |
778 | break | |
779 | if result_filter: | |
780 | try: | |
781 | if not result_filter(res): | |
782 | raise ValueError('excluded by filter') | |
783 | except ValueError as e: | |
784 | lgr.debug('not reporting result (%s)', exc_str(e)) | |
785 | continue | |
786 | ## output rendering | |
787 | # TODO RF this in a simple callable that gets passed into this function | |
788 | if result_renderer == 'default': | |
789 | # TODO have a helper that can expand a result message | |
790 | ui.message('{action}({status}): {path}{type}{msg}'.format( | |
791 | action=ac.color_word(res['action'], ac.BOLD), | |
792 | status=ac.color_status(res['status']), | |
793 | path=relpath(res['path'], | |
794 | res['refds']) if res.get('refds', None) else res['path'], | |
795 | type=' ({})'.format( | |
796 | ac.color_word(res['type'], ac.MAGENTA) | |
797 | ) if 'type' in res else '', | |
798 | msg=' [{}]'.format( | |
799 | res['message'][0] % res['message'][1:] | |
800 | if isinstance(res['message'], tuple) else res['message']) | |
801 | if 'message' in res else '')) | |
802 | elif result_renderer in ('json', 'json_pp'): | |
803 | ui.message(json.dumps( | |
804 | {k: v for k, v in res.items() | |
805 | if k not in ('message', 'logger')}, | |
806 | sort_keys=True, | |
807 | indent=2 if result_renderer.endswith('_pp') else None)) | |
808 | elif result_renderer == 'tailored': | |
809 | if hasattr(cmd_class, 'custom_result_renderer'): | |
810 | cmd_class.custom_result_renderer(res, **kwargs) | |
811 | elif hasattr(result_renderer, '__call__'): | |
812 | result_renderer(res, **kwargs) | |
813 | if result_xfm: | |
814 | res = result_xfm(res) | |
815 | if res is None: | |
816 | continue | |
817 | yield res |
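To make the record format handled above concrete, here is an illustrative result dictionary and the line the `'default'` renderer would print for it (all values made up):

```python
# Illustrative result record as consumed by _process_results()
res = dict(
    action='get',             # what was attempted
    status='ok',              # one of: ok, notneeded, impossible, error
    path='/tmp/ds/file.dat',  # subject of the result
    type='file',              # optional
    message='already here',   # optional; str or (fmt, arg, ...) tuple
)
# 'default' renderer output (colors omitted):
#   get(ok): /tmp/ds/file.dat (file) [already here]
```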
13 | 13 | import os |
14 | 14 | from os.path import join as opj, exists, relpath, dirname |
15 | 15 | from datalad.interface.base import Interface |
16 | from datalad.interface.utils import build_doc | |
16 | from datalad.interface.base import build_doc | |
17 | 17 | from datalad.interface.utils import handle_dirty_dataset |
18 | 18 | from datalad.interface.common_opts import recursion_limit, recursion_flag |
19 | 19 | from datalad.interface.common_opts import if_dirty_opt |
47 | 47 | types are configured. Moreover, it is possible to aggregate meta data from | 
48 | 48 | any subdatasets into the superdataset, in order to facilitate data |
49 | 49 | discovery without having to obtain any subdataset. |
50 | ||
51 | Returns | |
52 | ------- | |
53 | List | |
54 | Any datasets where (updated) aggregated meta data was saved. | |
55 | 50 | """ |
56 | 51 | # XXX prevent common args from being added to the docstring |
57 | 52 | _no_eval_results = True |
84 | 79 | recursion_limit=None, |
85 | 80 | save=True, |
86 | 81 | if_dirty='save-before'): |
82 | """ | |
83 | Returns | |
84 | ------- | |
85 | List | |
86 | Any datasets where (updated) aggregated meta data was saved. | |
87 | """ | |
87 | 88 | ds = require_dataset( |
88 | 89 | dataset, check_installed=True, purpose='meta data aggregation') |
89 | 90 | modified_ds = [] |
25 | 25 | from datalad.interface.save import Save |
26 | 26 | from datalad.interface.results import get_status_dict |
27 | 27 | from datalad.interface.utils import eval_results |
28 | from datalad.interface.utils import build_doc | |
28 | from datalad.interface.base import build_doc | |
29 | 29 | from datalad.support.constraints import EnsureNone |
30 | 30 | from datalad.support.constraints import EnsureStr |
31 | 31 | from datalad.support.gitrepo import GitRepo |
22 | 22 | from six import reraise |
23 | 23 | from six import PY3 |
24 | 24 | from datalad.interface.base import Interface |
25 | from datalad.interface.utils import build_doc | |
25 | from datalad.interface.base import build_doc | |
26 | 26 | from datalad.distribution.dataset import Dataset |
27 | 27 | from datalad.distribution.dataset import datasetmethod, EnsureDataset, \ |
28 | 28 | require_dataset |
44 | 44 | @build_doc |
45 | 45 | class Search(Interface): |
46 | 46 | """Search within available in datasets' meta data |
47 | ||
48 | Yields | |
49 | ------ | |
50 | location : str | |
51 | (relative) path to the dataset | |
52 | report : dict | |
53 | fields which were requested by `report` option | |
54 | ||
55 | 47 | """ |
56 | 48 | # XXX prevent common args from being added to the docstring |
57 | 49 | _no_eval_results = True |
122 | 114 | report_matched=False, |
123 | 115 | format='custom', |
124 | 116 | regex=False): |
117 | """ | |
118 | Yields | |
119 | ------ | |
120 | location : str | |
121 | (relative) path to the dataset | |
122 | report : dict | |
123 | fields which were requested by `report` option | |
124 | """ | |
125 | 125 | |
126 | 126 | lgr.debug("Initiating search for match=%r and dataset %r", |
127 | 127 | match, dataset) |
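A minimal, illustrative sketch of consuming the `(location, report)` tuples this command yields from the Python API (the query string and report fields here are made up):

```python
from datalad.api import search

# yields (location, report) tuples as documented above
for location, report in search('haxby', report=['name', 'author']):
    print(location, report)
```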
0 | # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # ex: set sts=4 ts=4 sw=4 noet: | |
2 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
3 | # | |
4 | # See COPYING file distributed along with the datalad package for the | |
5 | # copyright and license terms. | |
6 | # | |
7 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
8 | """ | |
9 | ||
10 | """ | |
11 | ||
12 | __docformat__ = 'restructuredtext' | |
13 | ||
14 | import logging | |
15 | from glob import glob | |
16 | import re | |
17 | from os.path import join as opj, basename, dirname | |
18 | from os import curdir | |
19 | import inspect | |
20 | ||
21 | from datalad import cfg | |
22 | from datalad.support.param import Parameter | |
23 | from datalad.support.constraints import EnsureNone | |
24 | from datalad.distribution.dataset import EnsureDataset | |
25 | from datalad.distribution.dataset import datasetmethod | |
26 | from datalad.distribution.dataset import require_dataset | |
27 | from datalad.dochelpers import exc_str | |
28 | ||
29 | from datalad.interface.base import Interface | |
30 | from datalad.interface.base import dedent_docstring | |
31 | from datalad.interface.base import build_doc | |
32 | from datalad.interface.utils import eval_results | |
33 | from datalad.ui import ui | |
34 | ||
35 | lgr = logging.getLogger('datalad.plugin') | |
36 | ||
37 | argspec = re.compile(r'^([a-zA-Z][a-zA-Z0-9_]*)=(.*)$') | 
38 | ||
39 | ||
40 | def _get_plugins(): | |
41 | locations = ( | |
42 | dirname(__file__), | |
43 | cfg.obtain('datalad.locations.system-plugins'), | |
44 | cfg.obtain('datalad.locations.user-plugins')) | |
45 | return {basename(e)[:-3]: {'file': e} | |
46 | for plugindir in locations | |
47 | for e in glob(opj(plugindir, '[!_]*.py'))} | |
48 | ||
49 | ||
50 | def _load_plugin(filepath): | |
51 | locals = {} | |
52 | globals = {} | |
53 | try: | |
54 | exec(compile(open(filepath, "rb").read(), | |
55 | filepath, 'exec'), | |
56 | globals, | |
57 | locals) | |
58 | except Exception as e: | |
59 | # any exception means full stop | |
60 | raise ValueError('plugin at {} is broken: {}'.format( | |
61 | filepath, exc_str(e))) | |
62 | if not len(locals) or 'dlplugin' not in locals: | |
63 | raise ValueError( | |
64 | "loading plugin '%s' did not yield a 'dlplugin' symbol, found: %s", | |
65 | filepath, locals.keys() if len(locals) else None) | |
66 | return locals['dlplugin'] | |
67 | ||
68 | ||
69 | @build_doc | |
70 | class Plugin(Interface): | |
71 | """Generic plugin interface | |
72 | ||
73 | Using this command, arbitrary DataLad plugins can be executed. Plugins from | 
74 | three different locations are available: | 
75 | ||
76 | 1. official plugins that are part of the local DataLad installation | |
77 | ||
78 | 2. system-wide plugins, location configuration:: | |
79 | ||
80 | datalad.locations.system-plugins | |
81 | ||
82 | 3. user-supplied plugins, location configuration:: | |
83 | ||
84 | datalad.locations.user-plugins | |
85 | ||
86 | Identically named plugins in a later location replace those found in | 
87 | locations searched earlier. | 
88 | ||
89 | *Using plugins* | |
90 | ||
91 | A list of all available plugins can be obtained by running this command | |
92 | without arguments:: | |
93 | ||
94 | datalad plugin | |
95 | ||
96 | To run a specific plugin, provide the plugin name as an argument:: | |
97 | ||
98 | datalad plugin export_tarball | |
99 | ||
100 | A plugin may come with its own documentation which can be displayed upon | |
101 | request:: | |
102 | ||
103 | datalad plugin export_tarball -H | |
104 | ||
105 | If a plugin supports (optional) arguments, they can be passed to the plugin | |
106 | as key=value pairs with the name and the respective value of an argument, | |
107 | e.g.:: | |
108 | ||
109 | datalad plugin export_tarball output=myfile | |
110 | ||
111 | Any number of arguments can be given. Only arguments with names supported | |
112 | by the respective plugin are passed to the plugin. If unsupported arguments | |
113 | are given, a warning is issued. | |
114 | ||
115 | When an argument is given multiple times, all values are passed as a list | |
116 | to the respective argument (order of values matches the order in the | 
117 | plugin call):: | |
118 | ||
119 | datalad plugin fancy_plugin input=this input=that | |
120 | ||
121 | Like in most commands, a dedicated --dataset option is supported that | |
122 | can be used to identify a specific dataset to be passed to a plugin's | |
123 | ``dataset`` argument. If a plugin requires such an argument, and no | |
124 | dataset was given, and none was found in the current working directory, | |
125 | the plugin call will fail. A dataset argument can also be passed alongside | |
126 | all other plugin arguments without using --dataset. | |
127 | ||
128 | """ | |
129 | _params_ = dict( | |
130 | dataset=Parameter( | |
131 | args=("-d", "--dataset"), | |
132 | doc="""specify the dataset for the plugin to operate on | |
133 | If no dataset is given, but a plugin takes a dataset as an argument, | 
134 | an attempt is made to identify the dataset based on the current | |
135 | working directory.""", | |
136 | constraints=EnsureDataset() | EnsureNone()), | |
137 | plugin=Parameter( | |
138 | args=("plugin",), | |
139 | nargs='*', | |
140 | metavar='PLUGINSPEC', | |
141 | doc="""plugin name plus an optional list of `key=value` pairs with | |
142 | arguments for the plugin call"""), | |
143 | showpluginhelp=Parameter( | |
144 | args=('-H', '--show-plugin-help',), | |
145 | dest='showpluginhelp', | |
146 | action='store_true', | |
147 | doc="""show help for a particular"""), | |
148 | showplugininfo=Parameter( | |
149 | args=('--show-plugin-info',), | |
150 | dest='showplugininfo', | |
151 | action='store_true', | |
152 | doc="""show additional information in plugin overview (e.g. plugin file | |
153 | location)"""), | 
154 | ) | |
155 | ||
156 | @staticmethod | |
157 | @datasetmethod(name='plugin') | |
158 | @eval_results | |
159 | def __call__(plugin=None, dataset=None, showpluginhelp=False, showplugininfo=False, **kwargs): | |
160 | plugins = _get_plugins() | |
161 | if not plugin: | |
162 | max_name_len = max(len(k) for k in plugins.keys()) | |
163 | for plname, plinfo in sorted(plugins.items(), key=lambda x: x[0]): | |
164 | spacer = ' ' * (max_name_len - len(plname)) | |
165 | synopsis = None | |
166 | try: | |
167 | with open(plinfo['file']) as plf: | |
168 | for line in plf: | |
169 | if line.startswith('"""'): | |
170 | synopsis = line.strip().strip('"').strip() | |
171 | break | |
172 | except Exception as e: | |
173 | ui.message('{}{} [BROKEN] {}'.format( | |
174 | plname, spacer, exc_str(e))) | |
175 | continue | |
176 | if synopsis: | |
177 | msg = '{}{} - {}'.format( | |
178 | plname, spacer, synopsis) | |
179 | else: | |
180 | msg = '{}{} [no synopsis]'.format(plname, spacer) | |
181 | if showplugininfo: | |
182 | msg = '{} ({})'.format(msg, plinfo['file']) | |
183 | ui.message(msg) | |
184 | return | |
185 | args = None | |
186 | if isinstance(plugin, (list, tuple)): | |
187 | args = plugin[1:] | |
188 | plugin = plugin[0] | |
189 | if plugin not in plugins: | |
190 | raise ValueError("unknown plugin '{}', available: {}".format( | |
191 | plugin, ','.join(plugins.keys()))) | |
192 | user_supplied_args = set() | |
193 | if args: | |
194 | # we got some arguments in the plugin spec, parse them and add to | |
195 | # kwargs | |
196 | for arg in args: | |
197 | if isinstance(arg, tuple): | |
198 | # came from python item-style | |
199 | argname, argval = arg | |
200 | else: | |
201 | parsed = argspec.match(arg) | |
202 | if parsed is None: | |
203 | raise ValueError("invalid plugin argument: '{}'".format(arg)) | |
204 | argname, argval = parsed.groups() | |
205 | if argname in kwargs: | |
206 | # argument was seen at least once before -> make list | |
207 | existing_val = kwargs[argname] | |
208 | if not isinstance(existing_val, list): | |
209 | existing_val = [existing_val] | |
210 | existing_val.append(argval) | |
211 | argval = existing_val | |
212 | kwargs[argname] = argval | |
213 | user_supplied_args.add(argname) | |
214 | plugin_call = _load_plugin(plugins[plugin]['file']) | |
215 | ||
216 | if showpluginhelp: | |
217 | # we don't need special docs for the cmdline, standard python ones | |
218 | # should be comprehensible enough | |
219 | ui.message( | |
220 | dedent_docstring(plugin_call.__doc__) | |
221 | if plugin_call.__doc__ | |
222 | else 'This plugin has no documentation') | |
223 | return | |
224 | ||
225 | # | |
226 | # argument preprocessing | |
227 | # | |
228 | # check the plugin signature and filter out all unsupported args | |
229 | plugin_args, _, _, arg_defaults = inspect.getargspec(plugin_call) | |
230 | supported_args = {k: v for k, v in kwargs.items() if k in plugin_args} | |
231 | excluded_args = user_supplied_args.difference(supported_args.keys()) | |
232 | if excluded_args: | |
233 | lgr.warning('ignoring plugin argument(s) %s, not supported by plugin', | |
234 | excluded_args) | |
235 | # always overwrite the dataset arg if one is needed | |
236 | if 'dataset' in plugin_args: | |
237 | supported_args['dataset'] = require_dataset( | |
238 | # use dedicated arg if given, also anything that came with the plugin args | 
239 | # or curdir as the last resort | |
240 | dataset if dataset else kwargs.get('dataset', curdir), | |
241 | # note 'dataset' arg is always first, if we have defaults for all args | |
242 | # we have a default for 'dataset' too -> it is optional | 
243 | check_installed=len(arg_defaults) != len(plugin_args), | |
244 | purpose='handover to plugin') | |
245 | ||
246 | # call as a generator | |
247 | for res in plugin_call(**supported_args): | |
248 | if not res: | |
249 | continue | |
250 | if dataset: | |
251 | # enforce standard regardless of what plugin did | |
252 | res['refds'] = getattr(dataset, 'path', dataset) | |
253 | elif 'refds' in res: | |
254 | # no base dataset, results must not have them either | |
255 | del res['refds'] | |
256 | if 'logger' not in res: | |
257 | # make sure we have a logger | |
258 | res['logger'] = lgr | |
259 | yield res |
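From the Python API the same plugin spec can also be passed with arguments pre-parsed as `(name, value)` tuples, skipping the `key=value` string parsing handled above (plugin name and paths here are illustrative):

```python
from datalad.api import plugin

# string form, exactly as on the command line
res = list(plugin(['add_readme', 'filename=README.rst'], dataset='.'))
# tuple form, preserving native Python types for argument values
res = list(plugin(['add_readme', ('filename', 'README.rst')], dataset='.'))
```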
0 | # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # ex: set sts=4 ts=4 sw=4 noet: | |
2 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
3 | # | |
4 | # See COPYING file distributed along with the datalad package for the | |
5 | # copyright and license terms. | |
6 | # | |
7 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
8 | """add a README file to a dataset""" | |
9 | ||
10 | __docformat__ = 'restructuredtext' | |
11 | ||
12 | ||
13 | # PLUGIN API | |
14 | def dlplugin(dataset, filename='README.rst', existing='skip'): | |
15 | """Add basic information about DataLad datasets to a README file | |
16 | ||
17 | The README file is added to the dataset and the addition is saved | |
18 | in the dataset. | |
19 | ||
20 | Parameters | |
21 | ---------- | |
22 | dataset : Dataset | |
23 | dataset to add information to | |
24 | filename : str, optional | |
25 | path of the README file within the dataset. Default: 'README.rst' | |
26 | existing : {'skip', 'append', 'replace'} | |
27 | how to react if a file with the target name already exists: | |
28 | 'skip': do nothing; 'append': append information to the existing | |
29 | file; 'replace': replace the existing file with new content. | |
30 | Default: 'skip' | |
31 | ||
32 | """ | |
33 | ||
34 | from os.path import lexists | |
35 | from os.path import join as opj | |
36 | ||
37 | default_content = """\ | 
38 | About this dataset | |
39 | ================== | |
40 | ||
41 | This is a DataLad dataset{id}. | |
42 | ||
43 | For more information on DataLad and on how to work with its datasets, | |
44 | see the DataLad documentation at: http://docs.datalad.org | |
45 | """.format( | |
46 | id=' (id: {})'.format(dataset.id) if dataset.id else '') | |
47 | filename = opj(dataset.path, filename) | |
48 | res_kwargs = dict(action='add_readme', path=filename) | |
49 | ||
50 | if lexists(filename) and existing == 'skip': | |
51 | yield dict( | |
52 | res_kwargs, | |
53 | status='notneeded', | |
54 | message='file already exists, and not appending content') | |
55 | return | |
56 | ||
57 | # unlock, file could be annexed | |
58 | # TODO yield | |
59 | if lexists(filename): | |
60 | dataset.unlock(filename) | |
61 | ||
62 | with open(filename, 'a' if existing == 'append' else 'w') as fp: | |
63 | fp.write(default_content) | |
64 | yield dict( | |
65 | status='ok', | |
66 | path=filename, | |
67 | type='file', | |
68 | action='add_readme') | |
69 | ||
70 | for r in dataset.add( | |
71 | filename, | |
72 | message='[DATALAD] added README', | |
73 | result_filter=None, | |
74 | result_xfm=None): | |
75 | yield r |
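The file above doubles as a template for the plugin contract: a module-level `dlplugin()` generator that yields result dictionaries. A bare-bones, entirely hypothetical sketch of the same contract:

```python
# PLUGIN API -- hypothetical minimal plugin
def dlplugin(dataset, greeting='hello'):
    """say hello from a dataset"""
    yield dict(
        action='greet',
        path=dataset.path,
        type='dataset',
        status='ok',
        message=greeting)
```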
0 | # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # ex: set sts=4 ts=4 sw=4 noet: | |
2 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
3 | # | |
4 | # See COPYING file distributed along with the datalad package for the | |
5 | # copyright and license terms. | |
6 | # | |
7 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
8 | """export a dataset to a tarball""" | |
9 | ||
10 | __docformat__ = 'restructuredtext' | |
11 | ||
12 | ||
13 | # PLUGIN API | |
14 | def dlplugin(dataset, output=None): | |
15 | import os | |
16 | import tarfile | |
17 | from mock import patch | |
18 | from os.path import join as opj, dirname, normpath, isabs | |
19 | from datalad.utils import file_basename | |
20 | from datalad.support.annexrepo import AnnexRepo | |
21 | ||
22 | import logging | |
23 | lgr = logging.getLogger('datalad.plugin.tarball') | |
24 | ||
25 | repo = dataset.repo | |
26 | committed_date = repo.get_committed_date() | |
27 | ||
28 | # could be used later on to filter files by some criterion | |
29 | def _filter_tarinfo(ti): | |
30 | # Reset the date to match the one of the last commit, not from the | |
31 | # filesystem since git doesn't track those at all | |
32 | # TODO: use the date of the last commit when any particular | |
33 | # file was changed -- would be the most kosher yoh thinks to the | |
34 | # degree of our abilities | |
35 | ti.mtime = committed_date | |
36 | return ti | |
37 | ||
38 | if output is None: | |
39 | output = "datalad_{}.tar.gz".format(dataset.id) | |
40 | else: | |
41 | if not output.endswith('.tar.gz'): | |
42 | output += '.tar.gz' | |
43 | ||
44 | root = dataset.path | |
45 | # use dir inside matching the output filename | |
46 | # TODO: could be an option to the export plugin allowing empty value | |
47 | # for no leading dir | |
48 | leading_dir = file_basename(output) | |
49 | ||
50 | # workaround for inability to pass down the time stamp | |
51 | with patch('time.time', return_value=committed_date), \ | |
52 | tarfile.open(output, "w:gz") as tar: | |
53 | repo_files = sorted(repo.get_indexed_files()) | |
54 | if isinstance(repo, AnnexRepo): | |
55 | annexed = repo.is_under_annex( | |
56 | repo_files, allow_quick=True, batch=True) | |
57 | else: | |
58 | annexed = [False] * len(repo_files) | |
59 | for i, rpath in enumerate(repo_files): | |
60 | fpath = opj(root, rpath) | |
61 | if annexed[i]: | |
62 | # resolve to possible link target | |
63 | link_target = os.readlink(fpath) | |
64 | if not isabs(link_target): | |
65 | link_target = normpath(opj(dirname(fpath), link_target)) | |
66 | fpath = link_target | |
67 | # name in the tarball | |
68 | aname = normpath(opj(leading_dir, rpath)) | |
69 | tar.add( | |
70 | fpath, | |
71 | arcname=aname, | |
72 | recursive=False, | |
73 | filter=_filter_tarinfo) | |
74 | ||
75 | if not isabs(output): | |
76 | output = opj(os.getcwd(), output) | |
77 | ||
78 | yield dict( | |
79 | status='ok', | |
80 | path=output, | |
81 | type='file', | |
82 | action='export_tarball', | |
83 | logger=lgr) |
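Usage sketch for this exporter (the output name is illustrative; per the code above, the `.tar.gz` extension is appended when missing):

```python
from datalad.api import plugin

# writes myexport.tar.gz into the current working directory
plugin(['export_tarball', 'output=myexport'], dataset='.')
```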
0 | # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # ex: set sts=4 ts=4 sw=4 noet: | |
2 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
3 | # | |
4 | # See COPYING file distributed along with the datalad package for the | |
5 | # copyright and license terms. | |
6 | # | |
7 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
8 | """configure which dataset parts to never put in the annex""" | |
9 | ||
10 | ||
11 | __docformat__ = 'restructuredtext' | |
12 | ||
13 | ||
14 | # PLUGIN API | |
15 | def dlplugin(dataset, pattern, ref_dir='.', makedirs='no'): | |
16 | # could be extended to accept actual largefile expressions | |
17 | """Configure a dataset to never put some content into the dataset's annex | |
18 | ||
19 | This can be useful in mixed datasets that also contain textual data, such | |
20 | as source code, which can be efficiently and more conveniently managed | |
21 | directly in Git. | |
22 | ||
23 | Patterns generally look like this:: | |
24 | ||
25 | code/* | |
26 | ||
27 | which would match all files in the code directory. In order to match all | 
28 | files under ``code/``, including all its subdirectories, use such a | 
29 | pattern:: | |
30 | ||
31 | code/** | |
32 | ||
33 | Note that the plugin works incrementally, hence any existing configuration | |
34 | (e.g. from a previous plugin run) is amended, not replaced. | |
35 | ||
36 | Parameters | |
37 | ---------- | |
38 | dataset : Dataset | |
39 | dataset to configure | |
40 | pattern : list | |
41 | list of path patterns. Any content whose path is matching any pattern | |
42 | will not be annexed when added to a dataset, but instead will be | |
43 | tracked directly in Git. Path patterns have to be relative to the | 
44 | directory given by the `ref_dir` option. By default, patterns should | |
45 | be relative to the root of the dataset. | |
46 | ref_dir : str, optional | |
47 | Relative path (within the dataset) to the directory that is to be | |
48 | configured. All patterns are interpreted relative to this path, | |
49 | and configuration is written to a ``.gitattributes`` file in this | |
50 | directory. | |
51 | makedirs : bool, optional | |
52 | If set, any missing directories will be created in order to be able | |
53 | to place a file into ``ref_dir``. Default: False. | |
54 | """ | |
55 | from os.path import join as opj | |
56 | from os.path import isabs | |
57 | from os.path import exists | |
58 | from os import makedirs as makedirsfx | |
59 | from datalad.distribution.dataset import require_dataset | |
60 | from datalad.support.annexrepo import AnnexRepo | |
61 | from datalad.support.constraints import EnsureBool | |
62 | from datalad.utils import assure_list | |
63 | ||
64 | makedirs = EnsureBool()(makedirs) | |
65 | pattern = assure_list(pattern) | |
66 | ds = require_dataset(dataset, check_installed=True, | |
67 | purpose='no_annex configuration') | |
68 | ||
69 | res_kwargs = dict( | |
70 | path=ds.path, | |
71 | type='dataset', | |
72 | action='no_annex', | |
73 | ) | |
74 | ||
75 | # all the ways we refused to cooperate | |
76 | if not isinstance(ds.repo, AnnexRepo): | |
77 | yield dict( | |
78 | res_kwargs, | |
79 | status='notneeded', | |
80 | message='dataset has no annex') | |
81 | return | |
82 | if any(isabs(p) for p in pattern): | |
83 | yield dict( | |
84 | res_kwargs, | |
85 | status='error', | |
86 | message=('path pattern for `no_annex` configuration must be relative paths: %s', | |
87 | pattern)) | |
88 | return | |
89 | if isabs(ref_dir): | |
90 | yield dict( | |
91 | res_kwargs, | |
92 | status='error', | |
93 | message=('`ref_dir` for `no_annex` configuration must be a relative path: %s', | |
94 | ref_dir)) | |
95 | return | |
96 | ||
97 | gitattr_dir = opj(ds.path, ref_dir) | |
98 | if not exists(gitattr_dir): | |
99 | if makedirs: | |
100 | makedirsfx(gitattr_dir) | |
101 | else: | |
102 | yield dict( | |
103 | res_kwargs, | |
104 | status='error', | |
105 | message='target directory for `no_annex` does not exist (consider makedirs=True)') | |
106 | return | |
107 | ||
108 | gitattr_file = opj(gitattr_dir, '.gitattributes') | |
109 | with open(gitattr_file, 'a') as fp: | |
110 | for p in pattern: | |
111 | fp.write('{} annex.largefiles=nothing\n'.format(p))  # one line per pattern | 
112 | yield dict(res_kwargs, status='ok') | |
113 | ||
114 | for r in dataset.add( | |
115 | gitattr_file, | |
116 | to_git=True, | |
117 | message="[DATALAD] exclude paths from annex'ing", | |
118 | result_filter=None, | |
119 | result_xfm=None): | |
120 | yield r |
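For example, a call like `datalad plugin no_annex pattern=code/**` would leave one line per pattern in the targeted `.gitattributes`:

```
code/** annex.largefiles=nothing
```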
0 | # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # ex: set sts=4 ts=4 sw=4 noet: | |
2 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
3 | # | |
4 | # See COPYING file distributed along with the datalad package for the | |
5 | # copyright and license terms. | |
6 | # | |
7 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
8 | """Plugin tests | |
9 | ||
10 | """ | |
11 | ||
12 | __docformat__ = 'restructuredtext' |
0 | # emacs: -*- mode: python-mode; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # -*- coding: utf-8 -*- | |
2 | # ex: set sts=4 ts=4 sw=4 noet: | |
3 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
4 | # | |
5 | # See COPYING file distributed along with the datalad package for the | |
6 | # copyright and license terms. | |
7 | # | |
8 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
9 | """Test plugin interface mechanics""" | |
10 | ||
11 | ||
12 | import logging | |
13 | from os.path import join as opj | |
14 | from os.path import exists | |
15 | from mock import patch | |
16 | ||
17 | from datalad.config import ConfigManager | |
18 | from datalad.api import plugin | |
19 | from datalad.api import create | |
20 | ||
21 | from datalad.tests.utils import swallow_logs | |
22 | from datalad.tests.utils import swallow_outputs | |
23 | from datalad.tests.utils import with_tempfile | |
24 | from datalad.tests.utils import chpwd | |
25 | from datalad.tests.utils import create_tree | |
26 | from datalad.tests.utils import assert_raises | |
27 | from datalad.tests.utils import assert_status | |
28 | from datalad.tests.utils import assert_in | |
29 | from datalad.tests.utils import assert_not_in | |
30 | from datalad.tests.utils import eq_ | |
31 | from datalad.tests.utils import ok_clean_git | |
32 | ||
33 | broken_plugin = """garbage""" | |
34 | ||
35 | nodocs_plugin = """\ | |
36 | def dlplugin(): | |
37 | pass | |
38 | """ | |
39 | ||
40 | # functioning plugin dummy | |
41 | dummy_plugin = '''\ | |
42 | """real dummy""" | |
43 | ||
44 | def dlplugin(dataset, noval, withval='test'): | |
45 | "mydocstring" | |
46 | yield dict( | |
47 | status='ok', | |
48 | action='dummy', | |
49 | args=dict( | |
50 | dataset=dataset, | |
51 | noval=noval, | |
52 | withval=withval)) | |
53 | ''' | |
54 | ||
55 | ||
56 | @with_tempfile() | |
57 | @with_tempfile(mkdir=True) | |
58 | def test_plugin_call(path, dspath): | |
59 | # make plugins | |
60 | create_tree( | |
61 | path, | |
62 | { | |
63 | 'dlplugin_dummy.py': dummy_plugin, | |
64 | 'dlplugin_nodocs.py': nodocs_plugin, | |
65 | 'dlplugin_broken.py': broken_plugin, | |
66 | }) | |
67 | fake_dummy_spec = { | |
68 | 'dummy': {'file': opj(path, 'dlplugin_dummy.py')}, | |
69 | 'nodocs': {'file': opj(path, 'dlplugin_nodocs.py')}, | |
70 | 'broken': {'file': opj(path, 'dlplugin_broken.py')}, | |
71 | } | |
72 | ||
73 | with patch('datalad.plugin._get_plugins', return_value=fake_dummy_spec): | |
74 | with swallow_outputs() as cmo: | |
75 | plugin(showplugininfo=True) | |
76 | # hyphen spacing depends on the longest plugin name! | |
77 | # sorted | |
78 | # summary list generation doesn't actually load plugins for speed, | |
79 | # hence broken is not known to be broken here | |
80 | eq_(cmo.out, | |
81 | "broken [no synopsis] ({})\ndummy - real dummy ({})\nnodocs [no synopsis] ({})\n".format( | |
82 | fake_dummy_spec['broken']['file'], | |
83 | fake_dummy_spec['dummy']['file'], | |
84 | fake_dummy_spec['nodocs']['file'])) | |
85 | with swallow_outputs() as cmo: | |
86 | plugin(['dummy'], showpluginhelp=True) | |
87 | eq_(cmo.out.rstrip(), "mydocstring") | |
88 | with swallow_outputs() as cmo: | |
89 | plugin(['nodocs'], showpluginhelp=True) | |
90 | eq_(cmo.out.rstrip(), "This plugin has no documentation") | |
91 | # loading fails, no docs | |
92 | assert_raises(ValueError, plugin, ['broken'], showpluginhelp=True) | |
93 | ||
94 | # assume this most obscure plugin name is not used | |
95 | assert_raises(ValueError, plugin, '32sdfhvz984--^^') | |
96 | ||
97 | # broken plugin argument, must match Python keyword arg | |
98 | # specs | |
99 | assert_raises(ValueError, plugin, ['dummy', '1245']) | |
100 | ||
101 | with patch('datalad.plugin._get_plugins', return_value=fake_dummy_spec): | |
102 | # does not trip over unsupported argument, they get filtered out, because | |
103 | # we carry all kinds of stuff | |
104 | with swallow_logs(new_level=logging.WARNING) as cml: | |
105 | res = list(plugin(['dummy', 'noval=one', 'obscure=some'])) | |
106 | assert_status('ok', res) | |
107 | cml.assert_logged( | |
108 | msg=".*ignoring plugin argument\\(s\\).*obscure.*, not supported by plugin.*", | |
109 | regex=True, level='WARNING') | |
110 | # fails on missing positional arg | |
111 | assert_raises(TypeError, plugin, ['dummy']) | |
112 | # positional and kwargs actually make it into the plugin | |
113 | res = list(plugin(['dummy', 'noval=one', 'withval=two']))[0] | |
114 | eq_('one', res['args']['noval']) | |
115 | eq_('two', res['args']['withval']) | |
116 | # kwarg defaults are preserved | |
117 | res = list(plugin(['dummy', 'noval=one']))[0] | |
118 | eq_('test', res['args']['withval']) | |
119 | # repeated specification yields list input | |
120 | res = list(plugin(['dummy', 'noval=one', 'noval=two']))[0] | |
121 | eq_(['one', 'two'], res['args']['noval']) | |
122 | # can do the same thing while bypassing argument parsing for calls | |
123 | # from within python, and even preserve native python dtypes | |
124 | res = list(plugin(['dummy', ('noval', 1), ('noval', 'two')]))[0] | |
125 | eq_([1, 'two'], res['args']['noval']) | |
126 | # and we can further simplify in this case by passing lists right | |
127 | # away | |
128 | res = list(plugin(['dummy', ('noval', [1, 'two'])]))[0] | |
129 | eq_([1, 'two'], res['args']['noval']) | |
130 | ||
131 | # dataset arg handling | |
132 | # run plugin that needs a dataset where there is none | |
133 | with patch('datalad.plugin._get_plugins', return_value=fake_dummy_spec): | |
134 | ds = None | |
135 | with chpwd(dspath): | |
136 | assert_raises(ValueError, plugin, ['dummy', 'noval=one']) | |
137 | # create a dataset here, fixes the error | |
138 | ds = create() | |
139 | res = list(plugin(['dummy', 'noval=one']))[0] | |
140 | # gives dataset instance | |
141 | eq_(ds, res['args']['dataset']) | |
142 | # now do it again, giving the dataset path | 
143 | # but careful, `dataset` is a proper argument | |
144 | res = list(plugin(['dummy', 'noval=one'], dataset=dspath))[0] | |
145 | eq_(ds, res['args']['dataset']) | |
146 | # however, if passed alongside the plugins args it also works | |
147 | res = list(plugin(['dummy', 'dataset={}'.format(dspath), 'noval=one']))[0] | |
148 | eq_(ds, res['args']['dataset']) | |
149 | # but if both are given, the proper arg takes precedence | 
150 | assert_raises(ValueError, plugin, ['dummy', 'dataset={}'.format(dspath), 'noval=one'], | |
151 | dataset='rubbish') | |
152 | ||
153 | ||
154 | # MIH: I failed to replace our config manager instance for this test run | |
155 | # in order to be able to configure a set of plugins to run prior and after | |
156 | # create. A test should not alter a users config, hence I am disabling this | |
157 | # for now, and hope somebody can fix it up | |
158 | #@with_tempfile(mkdir=True) | |
159 | #def test_plugin_config(path): | |
160 | # with patch.dict('os.environ', | |
161 | # {'HOME': path, 'DATALAD_SNEAKY_ADDITION': 'ignore'}): | |
162 | # with patch('datalad.cfg', ConfigManager()) as cfg: | |
163 | # global_gitconfig = opj(path, '.gitconfig') | |
164 | # assert(not exists(global_gitconfig)) | |
165 | # # swap out the actual config for this test | |
166 | # assert_in('datalad.sneaky.addition', cfg) | |
167 | # # now we configure a plugin to run before and twice after `create` | |
168 | # cfg.add('datalad.create.run-before', | |
169 | # 'add_readme filename=before.txt', | |
170 | # where='global') | |
171 | # cfg.add('datalad.create.run-after', | |
172 | # 'add_readme filename=after1.txt', | |
173 | # where='global') | |
174 | # cfg.add('datalad.create.run-after', | |
175 | # 'add_readme filename=after2.txt', | |
176 | # where='global') | |
177 | # # force reload to pick up newly populated .gitconfig | |
178 | # cfg.reload(force=True) | |
179 | # assert_in('datalad.create.run-before', cfg) | |
180 | # # and now we create a dataset and expect the two readme files | |
181 | # # to be part of it | |
182 | # ds = create(dataset=opj(path, 'ds')) | |
183 | # ok_clean_git(ds.path) | |
184 | # assert(exists(opj(ds.path, 'before.txt'))) | |
185 | # assert(exists(opj(ds.path, 'after1.txt'))) | |
186 | # assert(exists(opj(ds.path, 'after2.txt'))) | |
187 | ||
188 | ||
189 | @with_tempfile(mkdir=True) | |
190 | def test_wtf(path): | |
191 | # smoke test for now | |
192 | with swallow_outputs() as cmo: | |
193 | plugin(['wtf'], dataset=path) | |
194 | assert_not_in('Dataset information', cmo.out) | |
195 | assert_in('Configuration', cmo.out) | |
196 | with chpwd(path): | |
197 | with swallow_outputs() as cmo: | |
198 | plugin(['wtf']) | |
199 | assert_not_in('Dataset information', cmo.out) | |
200 | assert_in('Configuration', cmo.out) | |
201 | # now with a dataset | |
202 | ds = create(path) | |
203 | with swallow_outputs() as cmo: | |
204 | plugin(['wtf'], dataset=ds.path) | |
205 | assert_in('Configuration', cmo.out) | |
206 | assert_in('Dataset information', cmo.out) | |
207 | assert_in('path: {}'.format(ds.path), cmo.out) | |
208 | ||
209 | ||
210 | @with_tempfile(mkdir=True) | |
211 | def test_no_annex(path): | |
212 | ds = create(path) | |
213 | ok_clean_git(ds.path) | |
214 | create_tree( | |
215 | ds.path, | |
216 | {'code': { | |
217 | 'inannex': 'content', | |
218 | 'notinannex': 'othercontent'}}) | |
219 | # add two files, pre and post configuration | |
220 | ds.add(opj('code', 'inannex')) | |
221 | plugin(['no_annex', 'pattern=code/**'], dataset=ds) | |
222 | ds.add(opj('code', 'notinannex')) | |
223 | ok_clean_git(ds.path) | |
224 | # one is annex'ed, the other is not, despite no change in add call | |
225 | # importantly, also .gitattribute is not annexed | |
226 | eq_([opj('code', 'inannex')], | |
227 | ds.repo.get_annexed_files()) |
0 | # emacs: -*- mode: python-mode; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # -*- coding: utf-8 -*- | |
2 | # ex: set sts=4 ts=4 sw=4 noet: | |
3 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
4 | # | |
5 | # See COPYING file distributed along with the datalad package for the | |
6 | # copyright and license terms. | |
7 | # | |
8 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
9 | """Test tarball exporter""" | |
10 | ||
11 | import os | |
12 | import time | |
13 | from os.path import join as opj | |
14 | from os.path import isabs | |
15 | import tarfile | |
16 | ||
17 | from datalad.api import Dataset | |
18 | from datalad.api import plugin | |
19 | from datalad.utils import chpwd | |
20 | from datalad.utils import md5sum | |
21 | ||
22 | from datalad.tests.utils import with_tree | |
23 | from datalad.tests.utils import ok_startswith | |
24 | from datalad.tests.utils import assert_true, assert_not_equal, assert_raises, \ | |
25 | assert_false, assert_equal | |
26 | from datalad.tests.utils import assert_status | |
27 | from datalad.tests.utils import assert_result_count | |
28 | ||
29 | ||
30 | _dataset_template = { | |
31 | 'ds': { | |
32 | 'file_up': 'some_content', | |
33 | 'dir': { | |
34 | 'file1_down': 'one', | |
35 | 'file2_down': 'two'}}} | |
36 | ||
37 | ||
38 | @with_tree(_dataset_template) | |
39 | def test_failure(path): | |
40 | ds = Dataset(opj(path, 'ds')).create(force=True) | |
41 | # unknown plugin | 
42 | assert_raises(ValueError, ds.plugin, 'nah') | |
43 | # non-existing dataset | |
44 | assert_raises(ValueError, plugin, 'export_tarball', Dataset('nowhere')) | |
45 | ||
46 | ||
47 | @with_tree(_dataset_template) | |
48 | def test_tarball(path): | |
49 | ds = Dataset(opj(path, 'ds')).create(force=True) | |
50 | ds.add('.') | |
51 | committed_date = ds.repo.get_committed_date() | |
52 | default_outname = opj(path, 'datalad_{}.tar.gz'.format(ds.id)) | |
53 | with chpwd(path): | |
54 | res = list(ds.plugin('export_tarball')) | |
55 | assert_status('ok', res) | |
56 | assert_result_count(res, 1) | |
57 | assert(isabs(res[0]['path'])) | |
58 | assert_true(os.path.exists(default_outname)) | |
59 | custom_outname = opj(path, 'myexport.tar.gz') | |
60 | # feed in without extension | |
61 | ds.plugin('export_tarball', output=custom_outname[:-7]) | |
62 | assert_true(os.path.exists(custom_outname)) | |
63 | custom1_md5 = md5sum(custom_outname) | |
64 | # encodes the original tarball filename -> different checksum, despite | 
65 | # same content | |
66 | assert_not_equal(md5sum(default_outname), custom1_md5) | |
67 | # really sleep, so that if the export stops using time.time we would know | 
68 | time.sleep(1.1) | |
69 | ds.plugin('export_tarball', output=custom_outname) | |
70 | # should not encode mtime, so should be identical | |
71 | assert_equal(md5sum(custom_outname), custom1_md5) | |
72 | ||
73 | def check_contents(outname, prefix): | |
74 | with tarfile.open(outname) as tf: | |
75 | nfiles = 0 | |
76 | for ti in tf: | |
77 | # any annex links resolved | |
78 | assert_false(ti.issym()) | |
79 | ok_startswith(ti.name, prefix + '/') | |
80 | assert_equal(ti.mtime, committed_date) | |
81 | if '.datalad' not in ti.name: | |
82 | # ignore any files in .datalad for this test to not be | |
83 | # susceptible to changes in how much we generate a meta info | |
84 | nfiles += 1 | |
85 | # we have exactly four files (includes .gitattributes for default | |
86 | # MD5E backend), and expect no content for any directory | |
87 | assert_equal(nfiles, 4) | |
88 | check_contents(default_outname, 'datalad_%s' % ds.id) | |
89 | check_contents(custom_outname, 'myexport') |
0 | # emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*- | |
1 | # ex: set sts=4 ts=4 sw=4 noet: | |
2 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
3 | # | |
4 | # See COPYING file distributed along with the datalad package for the | |
5 | # copyright and license terms. | |
6 | # | |
7 | # ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ## | |
8 | """provide information about this DataLad installation""" | |
9 | ||
10 | __docformat__ = 'restructuredtext' | |
11 | ||
12 | ||
13 | # PLUGIN API | |
14 | def dlplugin(dataset=None): | |
15 | """Generate a report about the DataLad installation and configuration | |
16 | ||
17 | IMPORTANT: Sharing this report with untrusted parties (e.g. on the web) | |
18 | should be done with care, as it may include identifying information, and/or | |
19 | credentials or access tokens. | |
20 | ||
21 | Parameters | |
22 | ---------- | |
23 | dataset : Dataset, optional | |
24 | If a dataset is given or found, information on this dataset is provided | |
25 | (if it exists), and its active configuration is reported. | |
26 | """ | |
27 | ds = dataset | |
28 | if ds and not ds.is_installed(): | |
29 | # we don't deal with absent datasets | |
30 | ds = None | |
31 | if ds is None: | |
32 | from datalad import cfg | |
33 | else: | |
34 | cfg = ds.config | |
35 | from datalad.ui import ui | |
36 | from datalad.api import metadata | |
37 | ||
38 | report_template = """\ | |
39 | {dataset} | |
40 | Configuration | |
41 | ============= | |
42 | {cfg} | |
43 | ||
44 | """ | |
45 | ||
46 | dataset_template = """\ | |
47 | Dataset information | |
48 | =================== | |
49 | {basic} | |
50 | ||
51 | Metadata | |
52 | -------- | |
53 | {meta} | |
54 | ||
55 | """ | |
56 | ds_meta = None | |
57 | if ds and ds.is_installed(): | |
58 | ds_meta = metadata( | |
59 | dataset=ds, dataset_global=True, return_type='item-or-list', | |
60 | result_filter=lambda x: x['action'] == 'metadata') | |
61 | if ds_meta: | |
62 | ds_meta = ds_meta['metadata'] | |
63 | ||
64 | ui.message(report_template.format( | |
65 | dataset='' if not ds else dataset_template.format( | |
66 | basic='\n'.join( | |
67 | '{}: {}'.format(k, v) for k, v in ( | |
68 | ('path', ds.path), | |
69 | ('repo', ds.repo.__class__.__name__ if ds.repo else '[NONE]'), | |
70 | )), | |
71 | meta='\n'.join( | |
72 | '{}: {}'.format(k, v) for k, v in ds_meta) | |
73 | if ds_meta else '[no metadata]' | |
74 | ), | |
75 | cfg='\n'.join( | |
76 | '{}: {}'.format(k, '<HIDDEN>' if k.startswith('user.') or 'token' in k else v) | |
77 | for k, v in sorted(cfg.items(), key=lambda x: x[0])), | |
78 | )) | |
79 | yield |
50 | 50 | from datalad.utils import on_windows |
51 | 51 | from datalad.utils import swallow_logs |
52 | 52 | from datalad.utils import assure_list |
53 | from datalad.utils import _path_ | |
53 | 54 | from datalad.cmd import GitRunner |
54 | 55 | |
55 | 56 | # imports from same module: |
96 | 97 | WEB_UUID = "00000000-0000-0000-0000-000000000001" |
97 | 98 | |
98 | 99 | # To be assigned and checked to be good enough upon first call to AnnexRepo |
100 | # 6.20160923 -- --json-progress for get | |
99 | 101 | # 6.20161210 -- annex add to add also changes (not only new files) to git |
100 | 102 | # 6.20170220 -- annex status provides --ignore-submodules |
101 | 103 | GIT_ANNEX_MIN_VERSION = '6.20170220' |
240 | 242 | # to use 'git annex unlock' instead. |
241 | 243 | lgr.warning("direct mode not available for %s. Ignored." % self) |
242 | 244 | |
245 | self._batched = BatchedAnnexes(batch_size=batch_size) | |
246 | ||
243 | 247 | # set default backend for future annex commands: |
244 | 248 | # TODO: Should the backend option of __init__() also migrate |
245 | 249 | # the annex, in case there are annexed files already? |
246 | 250 | if backend: |
247 | lgr.debug("Setting annex backend to %s", backend) | |
248 | # Must be done with explicit release, otherwise on Python3 would end up | |
249 | # with .git/config wiped out | |
250 | # see https://github.com/gitpython-developers/GitPython/issues/333#issuecomment-126633757 | |
251 | ||
252 | # TODO: 'annex.backends' actually is a space separated list. | |
253 | # Figure out, whether we want to allow for a list here or what to | |
254 | # do, if there is sth in that setting already | |
251 | self.set_default_backend(backend, persistent=True) | |
252 | ||
253 | ||
254 | def set_default_backend(self, backend, persistent=True, commit=True): | |
255 | """Set default backend | |
256 | ||
257 | Parameters | |
258 | ---------- | |
259 | backend : str | |
260 | persistent : bool, optional | |
261 | If persistent, would add/commit to .gitattributes. If not -- would | |
262 | set it within .git/config | 
263 | """ | |
264 | # TODO: 'annex.backends' actually is a space separated list. | |
265 | # Figure out, whether we want to allow for a list here or what to | |
266 | # do, if there is sth in that setting already | |
267 | if persistent: | |
268 | git_attributes_file = _path_(self.path, '.gitattributes') | |
269 | git_attributes = '' | |
270 | if exists(git_attributes_file): | |
271 | with open(git_attributes_file) as f: | |
272 | git_attributes = f.read() | |
273 | if ' annex.backend=' in git_attributes: | |
274 | lgr.debug( | |
275 | "Not (re)setting backend since seems already set in %s" | |
276 | % git_attributes_file | |
277 | ) | |
278 | else: | |
279 | lgr.debug("Setting annex backend to %s (persistently)", backend) | |
280 | self.config.set('annex.backends', backend, where='local') | |
281 | with open(git_attributes_file, 'a') as f: | |
282 | if git_attributes and not git_attributes.endswith(os.linesep): | |
283 | f.write(os.linesep) | |
284 | f.write('* annex.backend=%s%s' % (backend, os.linesep)) | |
285 | self.add(git_attributes_file, git=True) | |
286 | if commit: | |
287 | self.commit( | |
288 | "Set default backend for all files to be %s" % backend, | |
289 | _datalad_msg=True, | |
290 | files=[git_attributes_file] | |
291 | ) | |
292 | else: | |
293 | lgr.debug("Setting annex backend to %s (in .git/config)", backend) | |
255 | 294 | self.config.set('annex.backends', backend, where='local') |
256 | ||
257 | self._batched = BatchedAnnexes(batch_size=batch_size) | |
258 | 295 | |
259 | 296 | def __del__(self): |
260 | 297 | try: |
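A sketch of the persistent branch of the new `set_default_backend()` above (backend name illustrative):

```python
# hypothetical call on an AnnexRepo instance; appends
# '* annex.backend=MD5E' to .gitattributes, adds it to git, and commits
repo.set_default_backend('MD5E', persistent=True)
```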
826 | 863 | super(AnnexRepo, self).set_remote_url(name, url, push=push) |
827 | 864 | self._set_shared_connection(name, url) |
828 | 865 | |
866 | def set_remote_dead(self, name): | |
867 | """Announce to annex that remote is "dead" | |
868 | """ | |
869 | return self._annex_custom_command([], ["git", "annex", "dead", name]) | |
870 | ||
871 | def is_remote_annex_ignored(self, remote): | |
872 | """Return True if remote is explicitly ignored""" | |
873 | return self.config.getbool( | |
874 | 'remote.{}'.format(remote), 'annex-ignore', | |
875 | default=False | |
876 | ) | |
877 | ||
878 | def is_special_annex_remote(self, remote, check_if_known=True): | |
879 | """Return either remote is a special annex remote | |
880 | ||
881 | Decides based on the presence of diagnostic annex- options | |
882 | for the remote | |
883 | """ | |
884 | if check_if_known: | |
885 | if remote not in self.get_remotes(): | |
886 | raise RemoteNotAvailableError(remote) | |
887 | sec = 'remote.{}'.format(remote) | |
888 | for opt in ('annex-externaltype', 'annex-webdav'): | |
889 | if self.config.has_option(sec, opt): | |
890 | return True | |
891 | return False | |
892 | ||
829 | 893 | @borrowkwargs(GitRepo) |
830 | def get_remotes(self, with_refs_only=False, with_urls_only=False, | |
894 | def get_remotes(self, | |
895 | with_urls_only=False, | |
831 | 896 | exclude_special_remotes=False): |
832 | 897 | """Get known (special-) remotes of the repository |
833 | 898 | |
841 | 906 | remotes : list of str |
842 | 907 | List of names of the remotes |
843 | 908 | """ |
844 | remotes = super(AnnexRepo, self).get_remotes( | |
845 | with_refs_only=with_refs_only, with_urls_only=with_urls_only) | |
909 | remotes = super(AnnexRepo, self).get_remotes(with_urls_only=with_urls_only) | |
846 | 910 | |
847 | 911 | if exclude_special_remotes: |
848 | return [remote for remote in remotes | |
849 | if not self.config.has_option('remote.{}'.format(remote), | |
850 | 'annex-externaltype')] | |
912 | return [ | |
913 | remote for remote in remotes | |
914 | if not self.is_special_annex_remote(remote, check_if_known=False) | |
915 | ] | |
851 | 916 | else: |
852 | 917 | return remotes |
853 | 918 | |
1119 | 1184 | self.config.reload() |
1120 | 1185 | |
1121 | 1186 | @normalize_paths |
1122 | def get(self, files, options=None, jobs=None): | |
1187 | def get(self, files, remote=None, options=None, jobs=None): | |
1123 | 1188 | """Get the actual content of files |
1124 | 1189 | |
1125 | 1190 | Parameters |
1126 | 1191 | ---------- |
1127 | 1192 | files : list of str |
1128 | 1193 | paths to get |
1194 | remote : str, optional | |
1195 | from which remote to fetch content | |
1129 | 1196 | options : list of str, optional |
1130 | 1197 | commandline options for the git annex get command |
1131 | 1198 | jobs : int, optional |
1137 | 1204 | """ |
1138 | 1205 | options = options[:] if options else [] |
1139 | 1206 | |
1207 | if remote: | |
1208 | if remote not in self.get_remotes(): | |
1209 | raise RemoteNotAvailableError( | |
1210 | remote=remote, | |
1211 | cmd="get", | |
1212 | msg="Remote is not known. Known are: %s" | |
1213 | % (self.get_remotes(),) | |
1214 | ) | |
1215 | options += ['--from', remote] | |
1216 | ||
1140 | 1217 | # analyze provided files to decide which actually are needed to be |
1141 | 1218 | # fetched |
1142 | 1219 | |
1143 | 1220 | if '--key' not in options: |
1144 | expected_downloads, fetch_files = self._get_expected_downloads( | |
1145 | files) | |
1221 | expected_downloads, fetch_files = self._get_expected_files( | |
1222 | files, ['--not', '--in', 'here']) | |
1146 | 1223 | else: |
1147 | 1224 | fetch_files = files |
1148 | 1225 | assert(len(files) == 1) |
1155 | 1232 | if len(fetch_files) != len(files): |
1156 | 1233 | lgr.info("Actually getting %d files", len(fetch_files)) |
1157 | 1234 | |
1158 | # TODO: check annex version and issue a one time warning if not | |
1159 | # old enough for --json-progress | |
1160 | ||
1161 | # Without up to date annex, we would still report total! ;) | |
1162 | if self.git_annex_version >= '6.20160923': | |
1163 | # options might be the '--key' which should go last | |
1164 | options = ['--json-progress'] + options | |
1235 | # options might be the '--key' which should go last | |
1236 | options = ['--json-progress'] + options | |
1165 | 1237 | |
1166 | 1238 | # Note: Currently swallowing logs, due to the workaround to report files |
1167 | 1239 | # not found, but don't fail and report about other files and use JSON, |
1179 | 1251 | # from annex failed ones |
1180 | 1252 | with cm: |
1181 | 1253 | results = self._run_annex_command_json( |
1182 | 'get', args=options + fetch_files, | |
1254 | 'get', | |
1255 | args=options + fetch_files, | |
1183 | 1256 | jobs=jobs, |
1184 | 1257 | expected_entries=expected_downloads) |
1185 | 1258 | results_list = list(results) |
1186 | 1259 | # TODO: should we here compare fetch_files against result_list |
1187 | # and womit an exception of incomplete download???? | |
1260 | # and vomit an exception of incomplete download???? | |
1188 | 1261 | return results_list |
1189 | 1262 | |
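With the new `remote` parameter, a specific source can now be requested through the Python API (repo, file, and remote names are illustrative):

```python
# hypothetical call on an AnnexRepo instance: fetch from 'origin'
# with two parallel annex jobs and JSON progress reporting
repo.get(['file1.dat', 'file2.dat'], remote='origin', jobs=2)
```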
1190 | def _get_expected_downloads(self, files): | |
1263 | def _get_expected_files(self, files, expr): | |
1191 | 1264 | """Given a list of files, figure out what to be downloaded |
1192 | 1265 | |
1193 | 1266 | Parameters |
1194 | 1267 | ---------- |
1195 | 1268 | files |
1269 | expr: list | |
1270 | Expression to be passed into annex's find | |
1196 | 1271 | |
1197 | 1272 | Returns |
1198 | 1273 | ------- |
1199 | expected_downloads : dict | |
1274 | expected_files : dict | |
1200 | 1275 | key -> size |
1201 | 1276 | fetch_files : list |
1202 | 1277 | files to be fetched |
1203 | 1278 | """ |
1204 | lgr.debug("Determine what files need to be obtained") | |
1279 | lgr.debug("Determine what files match the query to work with") | |
1205 | 1280 | # Let's figure out first which files/keys and of what size to download |
1206 | expected_downloads = {} | |
1281 | expected_files = {} | |
1207 | 1282 | fetch_files = [] |
1208 | 1283 | keys_seen = set() |
1209 | 1284 | unknown_sizes = [] # unused atm |
1210 | 1285 | # for now just record total size, and |
1211 | 1286 | for j in self._run_annex_command_json( |
1212 | 'find', args=['--json', '--not', '--in', 'here'] + files | |
1287 | 'find', args=['--json'] + expr + files | |
1213 | 1288 | ): |
1289 | # TODO: some files might not even be here. So in current fancy | |
1290 | # output reporting scheme we should then theoretically handle | |
1291 | # those cases here and say 'impossible' or something like that | |
1292 | if not j.get('success', True): | |
1293 | # TODO: I guess do something with yielding and filtering for | |
1294 | # what needs to be done and what not | 
1295 | continue | |
1214 | 1296 | key = j['key'] |
1215 | 1297 | size = j.get('bytesize') |
1216 | 1298 | if key in keys_seen: |
1221 | 1303 | assert j['file'] |
1222 | 1304 | fetch_files.append(j['file']) |
1223 | 1305 | if size and size.isdigit(): |
1224 | expected_downloads[key] = int(size) | |
1306 | expected_files[key] = int(size) | |
1225 | 1307 | else: |
1226 | expected_downloads[key] = None | |
1308 | expected_files[key] = None | |
1227 | 1309 | unknown_sizes.append(j['file']) |
1228 | return expected_downloads, fetch_files | |
1310 | return expected_files, fetch_files | |
1229 | 1311 | |
1230 | 1312 | @normalize_paths |
1231 | 1313 | def add(self, files, git=None, backend=None, options=None, commit=False, |
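The generalized `_get_expected_files()` is what lets `get()` and `copy_to()` share one sizing pass; the two call sites differ only in the find expression passed in (sketch of a private helper; `repo`, `files`, and `remote` are illustrative):

```python
# what get() asks for: annexed content not yet present locally
expected, to_fetch = repo._get_expected_files(
    files, ['--not', '--in', 'here'])
# what copy_to() asks for: content present here but absent on the remote
expected, to_copy = repo._get_expected_files(
    files, ['--in', 'here', '--not', '--in', remote])
```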
2159 | 2241 | json_objects = (json.loads(line) |
2160 | 2242 | for line in out.splitlines() if line.startswith('{')) |
2161 | 2243 | # protect against progress leakage |
2162 | json_objects = [j for j in json_objects if not 'byte-progress' in j] | |
2244 | json_objects = [j for j in json_objects if 'byte-progress' not in j] | |
2163 | 2245 | return json_objects |
2164 | 2246 | |
2165 | 2247 | # TODO: reconsider having any magic at all and maybe just return a list/dict always |
2693 | 2775 | # TODO: we probably need to override get_file_content, since it returns the |
2694 | 2776 | # symlink's target instead of the actual content. |
2695 | 2777 | |
2778 | # TODO: we need to expose --auto and --fast | 
2696 | 2779 | @normalize_paths(match_return_type=False) # get a list even in case of a single item |
2697 | def copy_to(self, files, remote, options=None, log_online=True): | |
2780 | def copy_to(self, files, remote, options=None, jobs=None): | |
2698 | 2781 | """Copy the actual content of `files` to `remote` |
2699 | 2782 | |
2700 | 2783 | Parameters |
2703 | 2786 | path(s) to copy |
2704 | 2787 | remote: str |
2705 | 2788 | name of remote to copy `files` to |
2706 | log_online: bool | |
2707 | see get() | |
2708 | 2789 | |
2709 | 2790 | Returns |
2710 | 2791 | ------- |
2712 | 2793 | files successfully copied |
2713 | 2794 | """ |
2714 | 2795 | |
2796 | # find --in here --not --in remote | |
2715 | 2797 | # TODO: full support of annex copy options would lead to `files` being |
2716 | 2798 | # optional. This means to check for whether files or certain options are |
2717 | 2799 | # given and fail or just pass everything as is and try to figure out, |
2720 | 2802 | if remote not in self.get_remotes(): |
2721 | 2803 | raise ValueError("Unknown remote '{0}'.".format(remote)) |
2722 | 2804 | |
2805 | options = options[:] if options else [] | |
2806 | ||
2807 | # Note: | |
2723 | 2808 | # In case of single path, 'annex copy' will fail, if it cannot copy it. |
2724 | 2809 | # With multiple files, annex will just skip the ones, it cannot deal |
2725 | 2810 | # with. We'll do the same and report back what was successful |
2729 | 2814 | if not isdir(files[0]): |
2730 | 2815 | self.get_file_key(files[0]) |
2731 | 2816 | |
2732 | # Note: | |
2733 | # - annex copy fails, if `files` was a single item, that doesn't exist | |
2734 | # - files not in annex or not even in git don't yield a non-zero exit, | |
2735 | # but are ignored | |
2736 | # - in case of multiple items, annex would silently skip those files | |
2737 | ||
2738 | annex_options = files + ['--to=%s' % remote] | |
2817 | # TODO: RF -- logic is duplicated with get() -- the only difference | |
2818 | # is the verb (copy, copy) or (get, put) and remote ('here', remote)? | |
2819 | if '--key' not in options: | |
2820 | expected_copys, copy_files = self._get_expected_files( | |
2821 | files, ['--in', 'here', '--not', '--in', remote]) | |
2822 | else: | |
2823 | copy_files = files | |
2824 | assert(len(files) == 1) | |
2825 | expected_copys = {files[0]: AnnexRepo.get_size_from_key(files[0])} | |
2826 | ||
2827 | if not copy_files: | |
2828 | lgr.debug("No files found needing copying.") | |
2829 | return [] | |
2830 | ||
2831 | if len(copy_files) != len(files): | |
2832 | lgr.info("Actually copying %d files", len(copy_files)) | |
2833 | ||
2834 | annex_options = ['--to=%s' % remote, '--json-progress'] | |
2739 | 2835 | if options: |
2740 | 2836 | annex_options.extend(shlex.split(options)) |
2741 | # Note: | |
2742 | # As of now, there is no --json option for annex copy. Use it once this | |
2743 | # changed. | |
2744 | results = self._run_annex_command_json( | |
2745 | 'copy', | |
2746 | args=annex_options, | |
2747 | #log_stdout=True, log_stderr=not log_online, | |
2748 | #log_online=log_online, expect_stderr=True | |
2749 | ) | |
2750 | results = list(results) | |
2837 | ||
2838 | cm = swallow_logs() \ | |
2839 | if lgr.getEffectiveLevel() > logging.DEBUG \ | |
2840 | else nothing_cm() | |
2841 | # TODO: provide more meaningful message (possibly aggregating 'note' | |
2842 | # from annex failed ones) | 
2843 | with cm: | |
2844 | results = self._run_annex_command_json( | |
2845 | 'copy', | |
2846 | args=annex_options + copy_files, | |
2847 | jobs=jobs, | |
2848 | expected_entries=expected_copys | |
2849 | #log_stdout=True, log_stderr=not log_online, | |
2850 | #log_online=log_online, expect_stderr=True | |
2851 | ) | |
2852 | results_list = list(results) | |
2853 | # XXX this is the only logic that currently differs from get | 
2751 | 2854 | # check if any transfer failed since then we should just raise an Exception |
2752 | 2855 | # for now to guarantee consistent behavior with non--json output |
2753 | 2856 | # see https://github.com/datalad/datalad/pull/1349#discussion_r103639456 |
2754 | 2857 | from operator import itemgetter |
2755 | failed_copies = [e['file'] for e in results if not e['success']] | |
2858 | failed_copies = [e['file'] for e in results_list if not e['success']] | |
2756 | 2859 | good_copies = [ |
2757 | e['file'] for e in results | |
2860 | e['file'] for e in results_list | |
2758 | 2861 | if e['success'] and |
2759 | 2862 | e.get('note', '').startswith('to ') # transfer did happen |
2760 | 2863 | ] |
2761 | 2864 | if failed_copies: |
2865 | # TODO: RF for new fancy scheme of outputs reporting | |
2762 | 2866 | raise IncompleteResultsError( |
2763 | 2867 | results=good_copies, failed=failed_copies, |
2764 | 2868 | msg="Failed to copy %d file(s)" % len(failed_copies)) |
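Taken together, a hypothetical invocation against a sibling named ``target``, assuming an ``AnnexRepo`` instance ``ar``; ``jobs`` is forwarded to git-annex for parallel transfers::

    copied = ar.copy_to(['big.dat', 'raw.dat'], remote='target', jobs=4)
    # On failures an IncompleteResultsError is raised, with .results
    # listing the files that were copied successfully.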
25 | 25 | 'ERROR': RED |
26 | 26 | } |
27 | 27 | |
28 | RESULT_STATUS_COLORS = { | |
29 | 'ok': GREEN, | |
30 | 'notneeded': GREEN, | |
31 | 'impossible': YELLOW, | |
32 | 'error': RED | |
33 | } | |
34 | ||
28 | 35 | # Aliases for uniform presentation |
29 | 36 | |
30 | 37 | DATASET = UNDERLINE |
43 | 50 | return "%s%s%s" % (COLOR_SEQ % color, s, RESET_SEQ) \ |
44 | 51 | if ui.is_interactive \ |
45 | 52 | else s |
53 | ||
54 | ||
55 | def color_status(status): | |
56 | col = RESULT_STATUS_COLORS.get(status, None) | |
57 | return color_word(status, col) if col else status |
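For instance, in an interactive session (``ui.is_interactive``) the new mapping colors the standard result statuses and passes anything unmapped through unchanged::

    print(color_status('ok'))          # rendered green
    print(color_status('impossible'))  # rendered yellow
    print(color_status('unknown'))     # no mapping -- returned as-is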
171 | 171 | return False |
172 | 172 | elif value in ('1', 'yes', 'on', 'enable', 'true'): |
173 | 173 | return True |
174 | raise ValueError("value must be converted to boolean") | |
174 | raise ValueError( | |
175 | "value '{}' must be convertible to boolean".format( | |
176 | value)) | |
175 | 177 | |
176 | 178 | def long_description(self): |
177 | 179 | return 'value must be convertible to type bool' |
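A sketch of the sharpened error message, assuming this is the ``EnsureBool`` constraint from ``datalad.support.constraints``::

    from datalad.support.constraints import EnsureBool

    c = EnsureBool()
    c('yes')    # -> True
    c('off')    # -> False
    c('maybe')  # -> ValueError: value 'maybe' must be convertible to boolean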
892 | 892 | for f in re.findall("'(.*)'[\n$]", stdout)] |
893 | 893 | |
894 | 894 | @normalize_paths(match_return_type=False) |
895 | def remove(self, files, **kwargs): | |
895 | def remove(self, files, recursive=False, **kwargs): | |
896 | 896 | """Remove files. |
897 | 897 | |
898 | 898 | Calls git-rm. |
901 | 901 | ---------- |
902 | 902 | files: str |
903 | 903 | list of paths to remove |
904 | recursive: bool, optional | 
905 | whether to allow recursive removal from subdirectories | 
904 | 906 | kwargs: |
905 | 907 | see `__init__` |
906 | 908 | |
912 | 914 | |
913 | 915 | files = _remove_empty_items(files) |
914 | 916 | |
917 | if recursive: | |
918 | kwargs['r'] = True | |
915 | 919 | stdout, stderr = self._git_custom_command( |
916 | 920 | files, ['git', 'rm'] + to_options(**kwargs)) |
917 | 921 | |
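With the new flag a recursive removal could look like this (hypothetical path; ``recursive=True`` is translated into git's ``-r`` option via ``to_options``)::

    repo.remove(['data/raw'], recursive=True)
    # effectively runs: git rm -r data/raw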
1158 | 1162 | # return [branch.strip() for branch in |
1159 | 1163 | # self.repo.git.branch(r=True).splitlines()] |
1160 | 1164 | |
1161 | def get_remotes(self, with_refs_only=False, with_urls_only=False): | |
1165 | def get_remotes(self, with_urls_only=False): | |
1162 | 1166 | """Get known remotes of the repository |
1163 | 1167 | |
1164 | 1168 | Parameters |
1165 | 1169 | ---------- |
1166 | with_refs_only : bool, optional | |
1167 | return only remotes with any refs. E.g. annex special remotes | |
1168 | would not have any refs | |
1170 | with_urls_only : bool, optional | |
1171 | return only remotes which have urls | |
1169 | 1172 | |
1170 | 1173 | Returns |
1171 | 1174 | ------- |
1172 | 1175 | remotes : list of str |
1173 | 1176 | List of names of the remotes |
1174 | 1177 | """ |
1175 | ||
1176 | # Note: This still uses GitPython and therefore might cause a gitpy.Repo | |
1177 | # instance to be created. | |
1178 | if with_refs_only: | |
1179 | # older versions of GitPython might not tolerate remotes without | |
1180 | # any references at all, so we need to catch | |
1181 | remotes = [] | |
1182 | for remote in self.repo.remotes: | |
1183 | try: | |
1184 | if len(remote.refs): | |
1185 | remotes.append(remote.name) | |
1186 | except AssertionError as exc: | |
1187 | if "not have any references" not in str(exc): | |
1188 | # was some other reason | |
1189 | raise | |
1190 | 1178 | |
1191 | 1179 | # Note: read directly from config and spare instantiation of gitpy.Repo |
1192 | 1180 | # since we need this in AnnexRepo constructor. Furthermore gitpy does it |
1417 | 1405 | return self._git_custom_command( |
1418 | 1406 | '', ['git', 'remote', 'remove', name] |
1419 | 1407 | ) |
1420 | ||
1421 | def show_remotes(self, name='', verbose=False): | |
1422 | """ | |
1423 | """ | |
1424 | ||
1425 | options = ["-v"] if verbose else [] | |
1426 | name = [name] if name else [] | |
1427 | out, err = self._git_custom_command( | |
1428 | '', ['git', 'remote'] + options + ['show'] + name | |
1429 | ) | |
1430 | return out.rstrip(linesep).splitlines() | |
1431 | 1408 | |
1432 | 1409 | def update_remote(self, name=None, verbose=False): |
1433 | 1410 | """ |
117 | 117 | doc.strip() |
118 | 118 | if len(doc) and not doc.endswith('.'): |
119 | 119 | doc += '.' |
120 | if self.constraints is not None: | |
121 | cdoc = self.constraints.long_description() | |
122 | if cdoc[0] == '(' and cdoc[-1] == ')': | |
123 | cdoc = cdoc[1:-1] | |
124 | addinfo = '' | |
125 | if self.cmd_kwargs.get('nargs', None) == '?' \ | |
126 | or self.cmd_kwargs.get('action', None) == 'append': | |
127 | addinfo = 'list expected, each ' | |
128 | doc += ' Constraints: %s%s.' % (addinfo, cdoc) | |
129 | 120 | if has_default: |
130 | 121 | doc += " [Default: %r]" % (default,) |
131 | 122 | # Explicitly deal with multiple spaces, for some reason |
20 | 20 | |
21 | 21 | from datalad.support.param import Parameter |
22 | 22 | from datalad.interface.base import Interface |
23 | from datalad.interface.utils import build_doc | |
23 | from datalad.interface.base import build_doc | |
24 | 24 | |
25 | 25 | from datalad import ssh_manager |
26 | 26 |
284 | 284 | ar.get('test-annex.dat', options=["--from=NotExistingRemote"]) |
285 | 285 | eq_(cme.exception.remote, "NotExistingRemote") |
286 | 286 | |
287 | # and similar one whenever invoking with remote parameter | |
288 | with assert_raises(RemoteNotAvailableError) as cme: | |
289 | ar.get('test-annex.dat', remote="NotExistingRemote") | |
290 | eq_(cme.exception.remote, "NotExistingRemote") | |
291 | ||
287 | 292 | |
288 | 293 | # 1 is enough to test file_has_content |
289 | 294 | @with_batch_direct |
482 | 487 | @with_tempfile |
483 | 488 | def test_AnnexRepo_migrating_backends(src, dst): |
484 | 489 | ar = AnnexRepo.clone(src, dst, backend='MD5') |
490 | eq_(ar.default_backends, ['MD5']) | |
485 | 491 | # GitPython has a bug which causes .git/config being wiped out |
486 | 492 | # under Python3, triggered by collecting its config instance I guess |
487 | 493 | gc.collect() |
1161 | 1167 | # Test that if we pass a list of items and annex processes them nicely, |
1162 | 1168 | # we would obtain a list back. To not stress our tests even more -- let's mock |
1163 | 1169 | def ok_copy(command, **kwargs): |
1170 | # Check that we pass to the annex call only the list of files which we | 
1171 | # asked to be copied | 
1172 | assert_in('copied1', kwargs['annex_options']) | |
1173 | assert_in('copied2', kwargs['annex_options']) | |
1174 | assert_in('existed', kwargs['annex_options']) | |
1164 | 1175 | return """ |
1165 | 1176 | {"command":"copy","note":"to target ...", "success":true, "key":"akey1", "file":"copied1"} |
1166 | 1177 | {"command":"copy","note":"to target ...", "success":true, "key":"akey2", "file":"copied2"} |
1173 | 1184 | # now let's test that we are correctly raising the exception in case if |
1174 | 1185 | # git-annex execution fails |
1175 | 1186 | orig_run = repo._run_annex_command |
1187 | ||
1188 | # A bit detached from reality, since the nonex* files would not be returned/handled | 
1189 | # by _get_expected_files, so in real life we wouldn't get a report about Incomplete!? | 
1176 | 1190 | def fail_to_copy(command, **kwargs): |
1177 | 1191 | if command == 'copy': |
1178 | 1192 | # That is not how annex behaves |
1190 | 1204 | else: |
1191 | 1205 | return orig_run(command, **kwargs) |
1192 | 1206 | |
1193 | with patch.object(repo, '_run_annex_command', fail_to_copy): | |
1207 | def fail_to_copy_get_expected(files, expr): | |
1208 | assert files == ["copied", "existed", "nonex1", "nonex2"] | |
1209 | return {'akey1': 10}, ["copied"] | |
1210 | ||
1211 | with patch.object(repo, '_run_annex_command', fail_to_copy), \ | |
1212 | patch.object(repo, '_get_expected_files', fail_to_copy_get_expected): | |
1194 | 1213 | with assert_raises(IncompleteResultsError) as cme: |
1195 | 1214 | repo.copy_to(["copied", "existed", "nonex1", "nonex2"], "target") |
1196 | 1215 | eq_(cme.exception.results, ["copied"]) |
2119 | 2138 | def test_AnnexRepo_flyweight_monitoring_inode(path, store): |
2120 | 2139 | # testing for issue #1512 |
2121 | 2140 | check_repo_deals_with_inode_change(AnnexRepo, path, store) |
2141 | ||
2142 | ||
2143 | @with_tempfile(mkdir=True) | |
2144 | def test_fake_is_not_special(path): | |
2145 | ar = AnnexRepo(path, create=True) | |
2146 | # doesn't exist -- we fail by default | |
2147 | assert_raises(RemoteNotAvailableError, ar.is_special_annex_remote, "fake") | |
2148 | assert_false(ar.is_special_annex_remote("fake", check_if_known=False)) |
347 | 347 | def test_GitRepo_remote_add(orig_path, path): |
348 | 348 | |
349 | 349 | gr = GitRepo.clone(orig_path, path) |
350 | out = gr.show_remotes() | |
350 | out = gr.get_remotes() | |
351 | 351 | assert_in('origin', out) |
352 | 352 | eq_(len(out), 1) |
353 | 353 | gr.add_remote('github', 'git://github.com/datalad/testrepo--basic--r1') |
354 | out = gr.show_remotes() | |
354 | out = gr.get_remotes() | |
355 | 355 | assert_in('origin', out) |
356 | 356 | assert_in('github', out) |
357 | 357 | eq_(len(out), 2) |
358 | out = gr.show_remotes('github') | |
359 | assert_in(' Fetch URL: git://github.com/datalad/testrepo--basic--r1', out) | |
358 | eq_('git://github.com/datalad/testrepo--basic--r1', gr.config['remote.github.url']) | |
360 | 359 | |
361 | 360 | |
362 | 361 | @with_testrepos(flavors=local_testrepo_flavors) |
366 | 365 | gr = GitRepo.clone(orig_path, path) |
367 | 366 | gr.add_remote('github', 'git://github.com/datalad/testrepo--basic--r1') |
368 | 367 | gr.remove_remote('github') |
369 | out = gr.show_remotes() | |
368 | out = gr.get_remotes() | |
370 | 369 | eq_(len(out), 1) |
371 | 370 | assert_in('origin', out) |
372 | ||
373 | ||
374 | @with_testrepos(flavors=local_testrepo_flavors) | |
375 | @with_tempfile | |
376 | def test_GitRepo_remote_show(orig_path, path): | |
377 | ||
378 | gr = GitRepo.clone(orig_path, path) | |
379 | gr.add_remote('github', 'git://github.com/datalad/testrepo--basic--r1') | |
380 | out = gr.show_remotes(verbose=True) | |
381 | eq_(len(out), 4) | |
382 | assert_in('origin\t%s (fetch)' % orig_path, out) | |
383 | assert_in('origin\t%s (push)' % orig_path, out) | |
384 | # Some fellas might have some fancy rewrite rules for pushes, so we can't | |
385 | # just check for specific protocol | |
386 | assert_re_in('github\tgit(://|@)github.com[:/]datalad/testrepo--basic--r1 \(fetch\)', | |
387 | out) | |
388 | assert_re_in('github\tgit(://|@)github.com[:/]datalad/testrepo--basic--r1 \(push\)', | |
389 | out) | |
390 | 371 | |
391 | 372 | |
392 | 373 | @with_testrepos(flavors=local_testrepo_flavors) |
63 | 63 | # constraints |
64 | 64 | p = Parameter(doc=doc, constraints=cnstr.EnsureInt() | cnstr.EnsureStr()) |
65 | 65 | autodoc = p.get_autodoc('testname') |
66 | assert_true("convertible to type 'int'" in autodoc) | |
67 | assert_true('must be a string' in autodoc) | |
68 | 66 | assert_true('int or str' in autodoc) |
69 | 67 | |
70 | 68 | with assert_raises(ValueError) as cmr: |
12 | 12 | from os.path import lexists, dirname, join as opj, curdir |
13 | 13 | |
14 | 14 | # Hard coded version, to be done by release process |
15 | __version__ = '0.6.0.dev1' | |
15 | __version__ = '0.8.0' | |
16 | 16 | |
17 | 17 | # NOTE: might cause problems with "python setup.py develop" deployments |
18 | 18 | # so I have even changed buildbot to use pip install -e . |
25 | 25 | generated/man/datalad-create-sibling |
26 | 26 | generated/man/datalad-create-sibling-github |
27 | 27 | generated/man/datalad-drop |
28 | generated/man/datalad-export | |
28 | generated/man/datalad-plugin | |
29 | 29 | generated/man/datalad-get |
30 | 30 | generated/man/datalad-install |
31 | 31 | generated/man/datalad-publish |
0 | .. -*- mode: rst -*- | |
1 | .. vi: set ft=rst sts=4 ts=4 sw=4 et tw=79: | |
2 | ||
3 | .. _chap_customization: | |
4 | ||
5 | ******************************************** | |
6 | Customization and extension of functionality | |
7 | ******************************************** | |
8 | ||
9 | DataLad provides numerous commands that cover many use cases. However, there will | |
10 | always be a demand for further customization at a particular site, or for an | |
11 | individual user. DataLad addresses this need by providing a generic plugin | |
12 | interface. | |
13 | ||
14 | First of all, DataLad plugins can be executed via the :ref:`man_datalad-plugin` | |
15 | command. This allows for executing arbitrary plugins (on a particular dataset) | 
16 | at any point in time. | |
17 | ||
18 | In addition, DataLad can be configured to run any number of plugins prior to, | 
19 | or after, particular commands. For example, it is possible to execute a plugin | 
20 | each time DataLad has created a dataset to configure it so that all files | |
21 | that are added to its ``code/`` subdirectory will always be managed directly | |
22 | with Git and not be put into the dataset's annex. In order to achieve this, | |
23 | adjust your Git configuration in the following way:: | |
24 | ||
25 | git config --global --add datalad.create.run-after 'no_annex pattern=code/**' | |
26 | ||
27 | This will cause DataLad to run the ``no_annex`` plugin to add the given pattern | |
28 | to the dataset's ``.gitattributes`` file, which in turn instructs git annex to | 
29 | send any matching files directly to Git. The same functionality is available | |
30 | for ad-hoc adjustments via the ``--run-after`` option supported by most | |
31 | commands. | |
32 | ||
33 | Analogous to ``--run-after``, DataLad also supports ``--run-before`` to execute | 
34 | plugins prior to a command. | 
35 | ||
36 | DataLad will discover plugins at three locations: | |
37 | ||
38 | 1. official plugins that are part of the local DataLad installation | |
39 | ||
40 | 2. system-wide plugins, provided by the local admin | |
41 | ||
42 | The location where plugins need to be placed depends on the platform. | |
43 | On GNU/Linux systems this will be ``/etc/xdg/datalad/plugins``, whereas | |
44 | on Windows it will be ``C:\ProgramData\datalad.org\datalad\plugins``. | |
45 | ||
46 | This default location can be overridden by setting the | |
47 | ``datalad.locations.system-plugins`` configuration variable in the local or | |
48 | global Git configuration. | |
49 | ||
50 | 3. user-supplied plugins, customizable by each user | |
51 | ||
52 | Again, the location will depend on the platform. On GNU/Linux systems this | |
53 | will be ``$HOME/.config/datalad/plugins``, whereas on Windows it will be | |
54 | ``C:\Users\<username>\AppData\Local\datalad.org\datalad\plugins``. | |
55 | ||
56 | This default location can be overridden by setting the | |
57 | ``datalad.locations.user-plugins`` configuration variable in the local or | |
58 | global Git configuration. | |
59 | ||
60 | Identically named plugins in a later location replace those found in locations | 
61 | searched earlier. This can be used to alter the behavior of plugins provided | 
62 | with DataLad, and enables users to adjust a site-wide configuration. | |
63 | ||
64 | ||
65 | Writing your own plugins | 
66 | ======================== | 
67 | ||
68 | Plugins are written in Python. In order for DataLad to be able to find | |
69 | them, plugins need to be placed in one of the supported locations described | |
70 | above. Plugin file names have to have a '.py' extension and must not start | 
71 | with an underscore ('_'). | |
72 | ||
73 | Plugin source files must define a function named:: | |
74 | ||
75 | dlplugin | |
76 | ||
77 | This function is executed as the plugin. It can have any number of | |
78 | arguments (positional, or keyword arguments with defaults), or none at | |
79 | all. All arguments, except ``dataset`` must expect any value to | |
80 | be a string. | |
81 | ||
82 | The plugin function must be self-contained, i.e. all needed imports | |
83 | and definitions must happen within the body of the function. | 
84 | ||
85 | The doc string of the plugin function is displayed when the plugin | |
86 | documentation is requested. The first line in a plugin file that starts | |
87 | with triple double-quotes will be used as the plugin short description | |
88 | (this will typically be the docstring of the module file). This short | |
89 | description is displayed as the plugin synopsis in the plugin overview | |
90 | list. | |
91 | ||
92 | Plugin functions must yield their results as a Python generator. Results are | |
93 | DataLad status dictionaries. There are no constraints on the number of results, | |
94 | or the number and nature of result properties. However, conventions exist and | 
95 | must be followed for compatibility with the result evaluation and rendering | |
96 | performed by DataLad. | |
97 | ||
98 | The following property keys must exist: | |
99 | ||
100 | "status" | |
101 | {'ok', 'notneeded', 'impossible', 'error'} | |
102 | ||
103 | "action" | |
104 | label for the action performed by the plugin. In many cases this | |
105 | could be the plugin's name. | |
106 | ||
107 | The following keys should exist if possible: | 
108 | ||
109 | "path" | |
110 | absolute path to a result on the file system | |
111 | ||
112 | "type" | |
113 | label indicating the nature of a result (e.g. 'file', 'dataset', | |
114 | 'directory', etc.) | |
115 | ||
116 | "message" | |
117 | string message annotating the result, particularly important for | |
118 | non-ok results. This can be a tuple with 'logging'-style string | |
119 | expansion. |
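To illustrate these conventions, a minimal hypothetical plugin file ``hello.py`` (not one shipped with DataLad) could look like this::

    """say hello to a dataset"""


    def dlplugin(dataset=None, greeting='hello'):
        """yield a single 'ok' result for the given dataset"""
        # all imports must happen within the function body
        import os
        yield dict(
            action='hello',
            status='ok',
            type='dataset',
            path=os.path.abspath(dataset.path) if dataset else os.getcwd(),
            # message tuples support 'logging'-style expansion
            message=('%s from a demo plugin', greeting),
        )

Placed into one of the locations described above, it would become available
to the :ref:`man_datalad-plugin` command (e.g. ``datalad plugin hello
greeting=hi``).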
11 | 11 | |
12 | 12 | Datalad is a Python package and can be installed via pip_, which is the |
13 | 13 | preferred method unless system packages are available for the target platform |
14 | (see below):: | |
14 | (see below). To automatically install datalad and all its software dependencies | |
15 | type:: | |
15 | 16 | |
16 | 17 | pip install datalad |
17 | 18 | |
18 | 19 | .. _pip: https://pip.pypa.io |
19 | 20 | |
20 | This will automatically install all software dependencies necessary to provide | |
21 | core functionality. Several additional installation schemes are supported | |
22 | (e.g., ``publish``, ``metadata``, ``tests``, ``crawl``):: | |
21 | Several additional installation schemes are supported (``SCHEME`` can be e.g. | 
22 | ``publish``, ``metadata``, ``tests`` or ``crawl``):: | |
23 | 23 | |
24 | pip install datalad[SCHEME] | |
25 | ||
26 | where ``SCHEME`` can be any supported scheme, such as the ones listed above. | |
24 | pip install datalad[SCHEME] | 
25 | ||
26 | .. cool, but why should I (or a first-time reader) even bother about the schemes? | |
27 | 27 | |
28 | 28 | In addition, it is necessary to have a working installation of git-annex_, |
29 | 29 | which is not set up automatically at this point. |
38 | 38 | package:: |
39 | 39 | |
40 | 40 | sudo apt-get install datalad |
41 | ||
42 | A current version of git-annex (as also provided by the NeuroDebian_ | |
43 | repository) can be installed by typing:: | |
44 | ||
45 | sudo apt-get install git-annex | |
41 | 46 | |
42 | 47 | .. _neurodebian: http://neuro.debian.net |
43 | 48 | |
59 | 64 | First steps |
60 | 65 | =========== |
61 | 66 | |
62 | After datalad is installed it can be queried for information about known | |
63 | datasets. For example, we might want to look for dataset thats were funded by, | |
64 | or acknowledge the US National Science Foundation (NSF):: | |
67 | Datalad can be queried for information about known datasets. On the first search | 
68 | query, datalad automatically offers assistance to obtain a :term:`superdataset` first. | 
69 | The superdataset is a lightweight container that carries meta-information about known datasets but does not contain actual data itself. | 
70 | 
71 | For example, we might want to look for datasets that were funded by, or acknowledge, the US National Science Foundation (NSF):: | 
65 | 72 | |
66 | 73 | ~ % datalad search NSF |
67 | 74 | No DataLad dataset found at current location |
75 | 82 | ~/datalad/openfmri/ds000003 |
76 | 83 | ... |
77 | 84 | |
78 | On first attempt, datalad offers assistence to obtain a :term:`superdataset` | |
79 | with information on all datasets it knows about. This is a lightweight | |
80 | container that does not actually contain data, but meta information only. Once | |
81 | downloaded queries can be made offline. | |
82 | ||
83 | 85 | Any known dataset can now be installed inside the local superdataset with a |
84 | 86 | command like this:: |
85 | 87 |
27 | 27 | api.create_sibling |
28 | 28 | api.create_sibling_github |
29 | 29 | api.drop |
30 | api.export | |
30 | api.plugin | |
31 | 31 | api.get |
32 | 32 | api.install |
33 | 33 | api.publish |
76 | 76 | api.crawl |
77 | 77 | api.crawl_init |
78 | 78 | api.test |
79 | ||
80 | Plugins | |
81 | ------- | |
82 | ||
83 | DataLad can be customized by plugins. The following plugins are shipped | |
84 | with DataLad. | |
85 | ||
86 | .. currentmodule:: datalad.plugin | |
87 | .. autosummary:: | |
88 | :toctree: generated | |
89 | ||
90 | add_readme | |
91 | export_tarball | |
92 | no_annex | |
93 | wtf | |
79 | 94 | |
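Like any other command, plugins can also be run from Python — a hypothetical call, assuming the ``plugin`` command mirrors its command-line form::

    from datalad.api import plugin
    plugin('wtf')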
80 | 95 | |
81 | 96 | Support functionality |
0 | 0 | # Theoretically we don't want -e here but ATM pip would puke if just .[full] is provided |
1 | 1 | # Since we use requirements.txt ATM only for development IMHO it is ok but |
2 | 2 | # we need to figure out / complain to pip folks
3 | # For now, until https://github.com/GrahamDumpleton/wrapt/issues/98 is resolved | 
4 | # we should use our version which allows disabling extension(s) | 
5 | git+https://github.com/yarikoptic/wrapt@develop | |
3 | 6 | -e .[devel] |
7 |