Commit 4cddfbdf31b8f9dffe315f9124a74e376a927c9c - ctdconverter

Merge pull request #30 from WorkflowConversion/cwl_support Added CWL Support chahuistle authored 6 years ago GitHub committed 6 years ago

16 changed file(s) with 1899 addition(s) and 1655 deletion(s).

+160

-13

README.md

0	0	# CTDConverter
1
2	1	Given one or more CTD files, `CTDConverter` generates the wrappers needed to include them in workflow engines such as Galaxy and CWL.
3	2
4	3	## Dependencies
	4	`CTDConverter` has the following python dependencies:
5	5
6		`CTDConverter` relies on [CTDopts]. The dependencies of each of the converters are as follows:
	6	- [CTDopts]
	7	- `lxml`
	8	- `ruamel.yaml`
7	9
8		### Galaxy Converter
	10	### Installing Dependencies
	11	We recommend using `conda` to manage all dependencies. If you're not sure what `conda` is, make sure to read the [conda documentation][using-conda].
9	12
10		- Generation of Galaxy ToolConfig files relies on `lxml` to generate nice-looking XML files.
	13	The easiest way to get started with CTD conversion is to create a `conda` environment in which you'll install all dependencies. Using environments in `conda` allows you to have parallel, independent python environments, thus avoiding conflicts between libraries. If you haven't installed `conda`, check [conda's installation guide][conda-install].
11	14
12		## Installing Dependencies
13		You can install the [CTDopts] and `lxml` modules via `conda`, like so:
	15	Once you've installed `conda`, create an environment named `ctd-converter`, like so:
14	16
15	17	```sh
16		$ conda install lxml
17		$ conda install -c workflowconversion ctdopts
	18	$ conda create --name ctd-converter
18	19	```
19	20
20		Note that the [CTDopts] module is available on the `workflowconversion` channel.
	21	You will now need to activate the environment by executing the following command:
21	22
22		Of course, you can just download [CTDopts] and make it available through your `PYTHONPATH` environment variable. To get more information about how to install python modules, visit: https://docs.python.org/2/install/.
	23	```sh
	24	$ source activate ctd-converter
	25	```
23	26
	27	Install the required dependencies as follows (the order of execution is actually important, due to transitive dependencies):
24	28
25		## How to install CTDConverter
	29	```sh
	30	$ conda install --channel workflowconversion ctdopts
	31	$ conda install lxml
	32	$ conda install --channel conda-forge ruamel.yaml
	33	$ conda install libxml2=2.9.2
	34	```
26	35
27		1. Download the source code from https://github.com/genericworkflownodes/CTDConverter.
	36	`lxml` depends on `libxml2`. When you install `lxml` you'll get the latest version of `libxml2` (2.9.4) by default. You would usually want the latest version, but this version of `libxml2` has a bug in validating XML files against a schema.
	37
	38	If you require validation of input CTDs against a schema (which we recommend), you will need to downgrade to the latest known version of `libxml2` that works, namely, 2.9.2.
	39
	40	You could just download dependencies manually and make them available through your `PYTHONPATH` environment variable, if you're into that. To get more information about how to install python modules without using `conda`, visit: https://docs.python.org/2/install/.
	41
	42	## How to install `CTDConverter`
	43	`CTDConverter` is not a python module, rather, a series of scripts, so installing it is as easy as downloading the source code from https://github.com/genericworkflownodes/CTDConverter. Once you've installed all dependencies, downloaded `CTDConverter` and activated your `conda` environment, you're good to go.
28	44
29	45	## Usage
	46	The first thing that you need to tell `CTDConverter` is the output format of the converted wrappers. `CTDConverter` supports conversion of CTDs into Galaxy and CWL. Invoking it is as simple as follows:
30	47
31		Check the detailed documentation for each of the converters:
	48	$ python convert.py [FORMAT] [ADDITIONAL_PARAMETERS ...]
	49
	50	Here `[FORMAT]` can be any of the supported formats (i.e., `cwl`, `galaxy`). `CTDConverter` offers a series of format-specific scripts and we've designed these scripts to behave somewhat similarly. All converter scripts have the same core functionality, that is, read CTD files, parse them using [CTDopts], validate against a schema, etc. Of course, each converter script might add extra functionality that is not present in other engines. Only the Galaxy converter script supports generation of a `tool_conf.xml` file, for instance.
	51
	52	The following sections in this file describe the parameters that all converter scripts share.
	53
	54	Please refer to the detailed documentation for each of the converters for more information:
32	55
33	56	- [Generation of Galaxy ToolConfig files](galaxy/README.md)
	57	- [Generation of CWL task files](cwl/README.md)
34	58
	59	## Fail Policy while processing several Files
	60	`CTDConverter` can parse several CTDs and convert them. However, the process will be interrupted and an error code will be returned at the first encountered error (e.g., a CTD is not valid, there are missing support files, etc.).
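
The fail policy can be pictured with a minimal Python 3 sketch. The names `convert_all` and `convert_one` are hypothetical and are not part of `CTDConverter`; the sketch only assumes that a failed conversion raises an exception.

```python
def convert_all(ctd_paths, convert_one):
    """Convert each CTD in order and stop at the first failure.

    `convert_one` is a hypothetical callable that raises on an invalid CTD.
    Returns 0 on success and 1 as soon as any conversion fails.
    """
    for path in ctd_paths:
        try:
            convert_one(path)
        except Exception as error:
            print("ERROR: could not convert %s: %s" % (path, error))
            return 1  # the remaining CTDs are not processed
    return 0
```

A caller would typically hand the returned value to `sys.exit` so that the shell sees a non-zero code.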
	61
	62	## Converting a single CTD
	63	In its simplest form, the converter takes an input CTD file and generates an output file. The following usage of `CTDConverter`:
	64
	65	$ python convert.py [FORMAT] -i /data/sample_input.ctd -o /data/sample_output.xml
	66
	67	will parse `/data/sample_input.ctd` and generate an appropriate converted file under `/data/sample_output.xml`. The generated file can be added to your workflow engine as usual.
	68
	69	## Converting several CTDs
	70	When converting several CTDs, the expected value for the `-o`/`--output-destination` parameter is a folder. For example:
	71
	72	$ python convert.py [FORMAT] -i /data/ctds/one.ctd /data/ctds/two.ctd -o /data/converted-files
	73
	74	Will convert `/data/ctds/one.ctd` into `/data/converted-files/one.[EXT]` and `/data/ctds/two.ctd` into `/data/converted-files/two.[EXT]`. Each converter has a preferred extension, here shown as a variable (`[EXT]`). Galaxy prefers `xml`, while CWL prefers `cwl`.
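
As a rough illustration of that naming rule, the following Python 3 sketch derives the output path from the input file name. The function name `suggested_output_path` and the `PREFERRED_EXTENSION` dictionary are assumptions made for this example only.

```python
import os

# Preferred extensions as stated above; the dictionary itself is illustrative.
PREFERRED_EXTENSION = {"galaxy": "xml", "cwl": "cwl"}


def suggested_output_path(input_ctd, output_folder, output_format):
    """Build <output_folder>/<input name without suffix>.<EXT>."""
    base_name = os.path.splitext(os.path.basename(input_ctd))[0]
    return os.path.join(output_folder, base_name + "." + PREFERRED_EXTENSION[output_format])
```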
	75
	76	You can use wildcard expansion, as supported by most modern operating systems:
	77
	78	$ python convert.py [FORMAT] -i /data/ctds/*.ctd -o /data/converted-files
	79
	80	## Common Parameters
	81	### Input File(s)
	82	* Purpose: Provide input CTD file(s) to convert.
	83	* Short/long version: `-i` / `--input`
	84	* Required: yes.
	85	* Taken values: a list of input CTD files.
	86
	87	Examples:
	88
	89	Any of the following invocations will convert `/data/input_one.ctd` and `/data/input_two.ctd`:
	90
	91	$ python convert.py [FORMAT] -i /data/input_one.ctd -i /data/input_two.ctd -o /data/generated
	92	$ python convert.py [FORMAT] -i /data/input_one.ctd /data/input_two.ctd -o /data/generated
	93	$ python convert.py [FORMAT] --input /data/input_one.ctd /data/input_two.ctd -o /data/generated
	94	$ python convert.py [FORMAT] --input /data/input_one.ctd --input /data/input_two.ctd -o /data/generated
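
All four forms are equivalent because of how `argparse` treats `nargs="+"` combined with `action="append"`: every occurrence of the flag yields its own list, and the lists are flattened afterwards. The Python 3 sketch below reproduces only that behaviour; `parse_inputs` is a hypothetical helper, not the converter's actual entry point.

```python
from argparse import ArgumentParser

parser = ArgumentParser()
# nargs="+" collects one or more values per flag; action="append" keeps one list per flag
parser.add_argument("-i", "--input", dest="input_files", nargs="+", action="append", required=True)


def parse_inputs(argv):
    """Return a flat list of input files, however `-i`/`--input` was repeated."""
    args = parser.parse_args(argv)
    return [item for sub_list in args.input_files for item in sub_list]
```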
	95
	96	The following invocation will convert `/data/input.ctd` into `/data/output.xml`:
	97
	98	$ python convert.py [FORMAT] -i /data/input.ctd -o /data/output.xml
	99
	100	Of course, you can also use wildcards, which will be automatically expanded by any modern operating system. This is extremely useful if you want to convert several files at a time. Let's assume that the folder `/data/ctds` contains three files: `input_one.ctd`, `input_two.ctd` and `input_three.ctd`. The following two invocations will produce the same output in the `/data/wrappers` folder:
	101
	102	$ python convert.py [FORMAT] -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd /data/ctds/input_three.ctd -o /data/wrappers
	103	$ python convert.py [FORMAT] -i /data/ctds/*.ctd -o /data/wrappers
	104
	105	### Output Destination
	106	* Purpose: Provide output destination for the converted wrapper files.
	107	* Short/long version: `-o` / `--output-destination`
	108	* Required: yes.
	109	* Taken values: if a single input file is given, then a single output file is expected. If multiple input files are given, then an existing folder, into which all converted CTDs will be written, is expected.
	110
	111	Examples:
	112
	113	A single input is given, and the output will be generated into `/data/output.xml`:
	114
	115	$ python convert.py [FORMAT] -i /data/input.ctd -o /data/output.xml
	116
	117	Several inputs are given. The output is the already existing folder `/data/wrappers`, and at the end of the operation the files `/data/wrappers/input_one.[EXT]` and `/data/wrappers/input_two.[EXT]` will have been generated:
	118
	119	$ python convert.py [FORMAT] -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd -o /data/wrappers
	120
	121	Please note that the output file name is not taken from the name of the input file, rather from the name of the tool, that is, from the `name` attribute in the `<tool>` element in its corresponding CTD. By convention, the name of the CTD file and the name of the tool match.
	122
	123	### Blacklisting Parameters
	124	* Purpose: Some parameters present in the CTD are not to be exposed in the output files. Think of parameters such as `--help` or `--debug`, which wouldn't make much sense to expose to end users of a workflow management system.
	125	* Short/long version: `-b` / `--blacklist-parameters`
	126	* Required: no.
	127	* Taken values: A list of parameters to be blacklisted.
	128
	129	Example:
	130
	131	$ python convert.py [FORMAT] ... -b h help quiet
	132
	133	In this case, `CTDConverter` will not process any of the parameters named `h`, `help`, or `quiet`, that is, they will not appear in the generated output files.
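
Conceptually, blacklisting is a filter over parameter names. The Python 3 sketch below shows the idea on plain strings; `drop_blacklisted` is a hypothetical helper, and the individual converters may implement the filtering differently.

```python
def drop_blacklisted(parameter_names, blacklist):
    """Return the parameter names that are not blacklisted, preserving their order."""
    blocked = set(blacklist)
    return [name for name in parameter_names if name not in blocked]
```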
	134
	135	### Schema Validation
	136	* Purpose: Provide validation of input CTDs against a schema file (i.e., an XSD file).
	137	* Short/long version: `-V` / `--validation-schema`
	138	* Required: no.
	139	* Taken values: location of the schema file (e.g., CTD.xsd).
	140
	141	CTDs can be validated against a schema. The master version of the schema can be found on [CTDSchema].
	142
	143	If a schema is provided, all input CTDs will be validated against it.
	144
	145	NOTE: Please make sure to read the [section on issues with schema validation](#issues-with-libxml2-and-schema-validation) if you require validation of CTDs against a schema.
	146
	147	### Hardcoding Parameters
	148	* Purpose: Fix the value of a parameter and hide it from the end user.
	149	* Short/long version: `-p` / `--hardcoded-parameters`
	150	* Required: no.
	151	* Taken values: The path of a file containing the mapping between parameter names and hardcoded values to use.
	152
	153	It is sometimes required that parameters are hidden from the end user in workflow systems and that they take a predetermined, fixed value. Allowing end users to control parameters such as `--verbosity` or `--threads` might create more problems than it solves. For this purpose, the parameter `-p`/`--hardcoded-parameters` takes the path of a file that contains up to three columns, separated by whitespace, that map parameter names to hardcoded values. The first column contains the name of the parameter and the second one the hardcoded value. Only the first two columns are mandatory.
	154
	155	If the parameter is to be hardcoded only for certain tools, a third column containing a comma separated list of tool names for which the hardcoding will apply can be added.
	156
	157	Lines starting with `#` will be ignored. The following is an example of a valid file:
	158
	159	# Parameter name # Value # Tool(s)
	160	threads 8
	161	mode quiet
	162	xtandem_executable xtandem XTandemAdapter
	163	verbosity high Foo, Bar
	164
	165	The parameters `threads` and `mode` will be set to `8` and `quiet`, respectively, for all parsed CTDs. However, the `xtandem_executable` parameter will be set to `xtandem` only for the `XTandemAdapter` tool. Similarly, the parameter `verbosity` will be set to `high` for the `Foo` and `Bar` tools only.
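
The file format can be illustrated with a small Python 3 sketch. It is a simplified reading of the rules above, not the converter's own parser; `parse_hardcoded_lines` and `lookup` are hypothetical names.

```python
def parse_hardcoded_lines(lines):
    """Map (parameter, tool) -> value; tool is None when the value applies to all tools."""
    hardcoded = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        # split into at most three columns so that "Foo, Bar" stays in one piece
        columns = stripped.split(None, 2)
        if len(columns) < 2:
            continue  # a parameter name without a value is not usable
        name, value = columns[0], columns[1]
        tools = [t.strip() for t in columns[2].split(",")] if len(columns) == 3 else [None]
        for tool in tools:
            hardcoded[(name, tool)] = value
    return hardcoded


def lookup(hardcoded, name, tool):
    """Prefer the tool-specific value; fall back to the value for all tools."""
    return hardcoded.get((name, tool), hardcoded.get((name, None)))
```

Keying the table on the pair (parameter, tool) keeps the "most specific value wins" rule to a single dictionary lookup with a fallback.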
	166
	167	### Providing a default executable Path
	168	* Purpose: Help workflow engines locate tools by providing a path.
	169	* Short/long version: `-x` / `--default-executable-path`
	170	* Required: no.
	171	* Taken values: The default executable path of the tools in the target workflow engine.
	172
	173	CTDs can contain an `<executablePath>` element that will be used when executing the tool binary. If this element is missing, the value provided by this parameter will be used as a prefix when building the appropriate sections in the output files.
	174
	175	The following invocation of the converter will use `/opt/suite/bin` as a prefix when providing the executable path in the output files for any input CTD that lacks the `<executablePath>` section:
	176
	177	$ python convert.py [FORMAT] -x /opt/suite/bin ...
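
How the prefix is combined with the tool name can be sketched in Python 3 as follows. `build_command` is a hypothetical function; it assumes that a missing executable name falls back to the name of the tool and that a `/` separates path and name.

```python
def build_command(tool_name, executable_path=None, executable_name=None, default_executable_path=None):
    """Combine path and name; missing pieces fall back to the default path and the tool name."""
    path = executable_path if executable_path is not None else default_executable_path
    name = executable_name if executable_name is not None else tool_name
    if path is None:
        return name
    path = path.strip()
    if not path.endswith("/"):
        path += "/"  # make sure exactly one separator joins path and name
    return path + name
```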
	178
35	179
36	180	[CTDopts]: https://github.com/genericworkflownodes/CTDopts
	181	[CTDSchema]: https://github.com/WorkflowConversion/CTDSchema
	182	[conda-install]: https://conda.io/docs/install/quick.html
	183	[using-conda]: https://conda.io/docs/using/envs.html⏎

-0

common/__init__.py

(New empty file)

+45

-0

common/exceptions.py

	0	#!/usr/bin/env python
	1	# encoding: utf-8
	2
	3	"""
	4	@author: delagarza
	5	"""
	6
	7	from CTDopts.CTDopts import ModelError
	8
	9
	10	class CLIError(Exception):
	11	    # Generic exception to raise and log different fatal errors.
	12	    def __init__(self, msg):
	13	        super(CLIError, self).__init__(msg)
	14	        self.msg = "E: %s" % msg
	15
	16	    def __str__(self):
	17	        return self.msg
	18
	19	    def __unicode__(self):
	20	        return self.msg
	21
	22
	23	class InvalidModelException(ModelError):
	24	    def __init__(self, message):
	25	        super(InvalidModelException, self).__init__()
	26	        self.message = message
	27
	28	    def __str__(self):
	29	        return self.message
	30
	31	    def __repr__(self):
	32	        return self.message
	33
	34
	35	class ApplicationException(Exception):
	36	    def __init__(self, msg):
	37	        super(ApplicationException, self).__init__(msg)
	38	        self.msg = msg
	39
	40	    def __str__(self):
	41	        return self.msg
	42
	43	    def __unicode__(self):
	44	        return self.msg⏎

+23

-0

common/logger.py

	0	#!/usr/bin/env python
	1	# encoding: utf-8
	2	import sys
	3
	4	MESSAGE_INDENTATION_INCREMENT = 2
	5
	6
	7	def _get_indented_text(text, indentation_level):
	8	    return ("%(indentation)s%(text)s" %
	9	            {"indentation": " " * (MESSAGE_INDENTATION_INCREMENT * indentation_level),
	10	             "text": text})
	11
	12
	13	def warning(warning_text, indentation_level=0):
	14	    sys.stdout.write(_get_indented_text("WARNING: %s\n" % warning_text, indentation_level))
	15
	16
	17	def error(error_text, indentation_level=0):
	18	    sys.stderr.write(_get_indented_text("ERROR: %s\n" % error_text, indentation_level))
	19
	20
	21	def info(info_text, indentation_level=0):
	22	    sys.stdout.write(_get_indented_text("INFO: %s\n" % info_text, indentation_level))

+318

-0

common/utils.py

	0	#!/usr/bin/env python
	1	# encoding: utf-8
	2	import ntpath
	3	import os
	4
	5	from lxml import etree
	6	from string import strip
	7	from logger import info, error, warning
	8
	9	from common.exceptions import ApplicationException
	10	from CTDopts.CTDopts import CTDModel, ParameterGroup
	11
	12
	13	MESSAGE_INDENTATION_INCREMENT = 2
	14
	15
	16	# simple struct-class containing a tuple with input/output location and the in-memory CTDModel
	17	class ParsedCTD:
	18	    def __init__(self, ctd_model=None, input_file=None, suggested_output_file=None):
	19	        self.ctd_model = ctd_model
	20	        self.input_file = input_file
	21	        self.suggested_output_file = suggested_output_file
	22
	23
	24	class ParameterHardcoder:
	25	    def __init__(self):
	26	        # map whose keys are the composite names of parameters and tools in the following pattern:
	27	        # [ParameterName][separator][ToolName] -> HardcodedValue
	28	        # if the parameter applies to all tools, then the following pattern is used:
	29	        # [ParameterName] -> HardcodedValue
	30
	31	        # examples (with the separator '!' set below):
	32	        # threads -> 24
	33	        # adapter!XtandemAdapter -> xtandem.exe
	34	        # adapter -> adapter.exe
	35	        self.separator = "!"
	36	        self.parameter_map = {}
	37
	38	    # the most specific value will be returned in case of overlap
	39	    def get_hardcoded_value(self, parameter_name, tool_name):
	40	        # look for the value that would apply for all tools
	41	        generic_value = self.parameter_map.get(parameter_name, None)
	42	        specific_value = self.parameter_map.get(self.build_key(parameter_name, tool_name), None)
	43	        if specific_value is not None:
	44	            return specific_value
	45
	46	        return generic_value
	47
	48	    def register_parameter(self, parameter_name, parameter_value, tool_name=None):
	49	        self.parameter_map[self.build_key(parameter_name, tool_name)] = parameter_value
	50
	51	    def build_key(self, parameter_name, tool_name):
	52	        if tool_name is None:
	53	            return parameter_name
	54	        return "%s%s%s" % (parameter_name, self.separator, tool_name)
	55
	56
	57	def validate_path_exists(path):
	58	    if not os.path.isfile(path) or not os.path.exists(path):
	59	        raise ApplicationException("The provided path (%s) does not exist or is not a valid file path." % path)
	60
	61
	62	def validate_argument_is_directory(args, argument_name):
	63	    file_name = getattr(args, argument_name)
	64	    if file_name is not None and os.path.isdir(file_name):
	65	        raise ApplicationException("The provided output file name (%s) points to a directory." % file_name)
	66
	67
	68	def validate_argument_is_valid_path(args, argument_name):
	69	    paths_to_check = []
	70	    # check if we are handling a single file or a list of files
	71	    member_value = getattr(args, argument_name)
	72	    if member_value is not None:
	73	        if isinstance(member_value, list):
	74	            for file_name in member_value:
	75	                paths_to_check.append(strip(str(file_name)))
	76	        else:
	77	            paths_to_check.append(strip(str(member_value)))
	78
	79	    for path_to_check in paths_to_check:
	80	        validate_path_exists(path_to_check)
	81
	82
	83	# taken from
	84	# http://stackoverflow.com/questions/8384737/python-extract-file-name-from-path-no-matter-what-the-os-path-format
	85	def get_filename(path):
	86	    head, tail = ntpath.split(path)
	87	    return tail or ntpath.basename(head)
	88
	89
	90	def get_filename_without_suffix(path):
	91	    root, ext = os.path.splitext(os.path.basename(path))
	92	    return root
	93
	94
	95	def parse_input_ctds(xsd_location, input_ctds, output_destination, output_file_extension):
	96	    is_converting_multiple_ctds = len(input_ctds) > 1
	97	    parsed_ctds = []
	98	    schema = None
	99	    if xsd_location is not None:
	100	        try:
	101	            info("Loading validation schema from %s" % xsd_location, 0)
	102	            schema = etree.XMLSchema(etree.parse(xsd_location))
	103	        except Exception, e:
	104	            error("Could not load validation schema %s. Reason: %s" % (xsd_location, str(e)), 0)
	105	    else:
	106	        warning("Validation against a schema has not been enabled.", 0)
	107
	108	    for input_ctd in input_ctds:
	109	        if schema is not None:
	110	            validate_against_schema(input_ctd, schema)
	111
	112	        output_file = output_destination
	113	        # if multiple inputs are being converted, we need to generate a different output_file for each input
	114	        if is_converting_multiple_ctds:
	115	            output_file = os.path.join(output_file, get_filename_without_suffix(input_ctd) + "." + output_file_extension)
	116	        info("Parsing %s" % input_ctd)
	117	        parsed_ctds.append(ParsedCTD(CTDModel(from_file=input_ctd), input_ctd, output_file))
	118
	119	    return parsed_ctds
	120
	121
	122	def flatten_list_of_lists(args, list_name):
	123	    setattr(args, list_name, [item for sub_list in getattr(args, list_name) for item in sub_list])
	124
	125
	126	def validate_against_schema(ctd_file, schema):
	127	    try:
	128	        parser = etree.XMLParser(schema=schema)
	129	        etree.parse(ctd_file, parser=parser)
	130	    except etree.XMLSyntaxError, e:
	131	        raise ApplicationException("Invalid CTD file %s. Reason: %s" % (ctd_file, str(e)))
	132
	133
	134	def add_common_parameters(parser, version, last_updated):
	135	    parser.add_argument("FORMAT", default=None, help="Output format (mandatory). Can be one of: cwl, galaxy.")
	136	    parser.add_argument("-i", "--input", dest="input_files", default=[], required=True, nargs="+", action="append",
	137	                        help="List of CTD files to convert.")
	138	    parser.add_argument("-o", "--output-destination", dest="output_destination", required=True,
	139	                        help="If multiple input files are given, then a folder in which all converted "
	140	                             "files will be generated is expected; "
	141	                             "if a single input file is given, then a destination file is expected.")
	142	    parser.add_argument("-x", "--default-executable-path", dest="default_executable_path",
	143	                        help="Use this executable path when <executablePath> is not present in the CTD",
	144	                        default=None, required=False)
	145	    parser.add_argument("-b", "--blacklist-parameters", dest="blacklisted_parameters", default=[], nargs="+",
	146	                        action="append",
	147	                        help="List of parameters that will be ignored and won't appear on the galaxy stub",
	148	                        required=False)
	149	    parser.add_argument("-p", "--hardcoded-parameters", dest="hardcoded_parameters", default=None, required=False,
	150	                        help="File containing hardcoded values for the given parameters. Run with '-h' or '--help' "
	151	                             "to see a brief example on the format of this file.")
	152	    parser.add_argument("-V", "--validation-schema", dest="xsd_location", default=None, required=False,
	153	                        help="Location of the schema to use to validate CTDs. If not provided, no schema validation "
	154	                             "will take place.")
	155
	156	    # TODO: add verbosity, maybe?
	157	    program_version = "v%s" % version
	158	    program_build_date = str(last_updated)
	159	    program_version_message = "%%(prog)s %s (%s)" % (program_version, program_build_date)
	160	    parser.add_argument("-v", "--version", action="version", version=program_version_message)
	161
	162
	163	def parse_hardcoded_parameters(hardcoded_parameters_file):
	164	    parameter_hardcoder = ParameterHardcoder()
	165	    if hardcoded_parameters_file is not None:
	166	        line_number = 0
	167	        with open(hardcoded_parameters_file) as f:
	168	            for line in f:
	169	                line_number += 1
	170	                if line is None or not line.strip() or line.strip().startswith("#"):
	171	                    pass
	172	                else:
	173	                    # the third column must be obtained as a whole, and not split
	174	                    parsed_hardcoded_parameter = line.strip().split(None, 2)
	175	                    # valid lines contain two or three columns
	176	                    if len(parsed_hardcoded_parameter) != 2 and len(parsed_hardcoded_parameter) != 3:
	177	                        warning("Invalid line at line number %d of the given hardcoded parameters file. Line will be "
	178	                                "ignored:\n%s" % (line_number, line), 0)
	179	                        continue
	180
	181	                    parameter_name = parsed_hardcoded_parameter[0]
	182	                    hardcoded_value = parsed_hardcoded_parameter[1]
	183	                    tool_names = None
	184	                    if len(parsed_hardcoded_parameter) == 3:
	185	                        tool_names = parsed_hardcoded_parameter[2].split(',')
	186	                    if tool_names:
	187	                        for tool_name in tool_names:
	188	                            parameter_hardcoder.register_parameter(parameter_name, hardcoded_value, tool_name.strip())
	189	                    else:
	190	                        parameter_hardcoder.register_parameter(parameter_name, hardcoded_value)
	191
	192	    return parameter_hardcoder
	193
	194
	195	def extract_tool_help_text(ctd_model):
	196	    manual = None
	197	    doc_url = None
	198	    if "manual" in ctd_model.opt_attribs.keys():
	199	        manual = "%s\n\n" % ctd_model.opt_attribs["manual"]
	200	    if "docurl" in ctd_model.opt_attribs.keys():
	201	        doc_url = ctd_model.opt_attribs["docurl"]
	202
	203	    help_text = "No help available"
	204	    if manual is not None:
	205	        help_text = manual
	206	    if doc_url is not None:
	207	        help_text = ("" if manual is None else manual) + "\nFor more information, visit %s" % doc_url
	208
	209	    return help_text
	210
	211
	212	def extract_tool_executable_path(model, default_executable_path):
	213	    # rules to build the executable path:
	214	    # if executablePath is null, then use default_executable_path
	215	    # if executablePath is null and executableName is null, then the name of the tool will be used
	216	    # if executablePath is null and executableName is not null, then executableName will be used
	217	    # if executablePath is not null and executableName is null,
	218	    # then executablePath and the name of the tool will be used
	219	    # if executablePath is not null and executableName is not null, then both will be used
	220
	221	    # first, check if the model has executablePath / executableName defined
	222	    executable_path = model.opt_attribs.get("executablePath", None)
	223	    executable_name = model.opt_attribs.get("executableName", None)
	224
	225	    # check if we need to use the default_executable_path
	226	    if executable_path is None:
	227	        executable_path = default_executable_path
	228
	229	    # fix the executablePath to make sure that there is a '/' in the end
	230	    if executable_path is not None:
	231	        executable_path = executable_path.strip()
	232	        if not executable_path.endswith("/"):
	233	            executable_path += "/"
	234
	235	    # assume that we have all information present
	236	    command = str(executable_path) + str(executable_name)
	237	    if executable_path is None:
	238	        if executable_name is None:
	239	            command = model.name
	240	        else:
	241	            command = executable_name
	242	    else:
	243	        if executable_name is None:
	244	            command = executable_path + model.name
	245	    return command
	246
	247
	248	def extract_and_flatten_parameters(ctd_model):
	249	    parameters = []
	250	    if len(ctd_model.parameters.parameters) > 0:
	251	        # use this to put parameters that are to be processed
	252	        # we know that CTDModel has one parent ParameterGroup
	253	        pending = [ctd_model.parameters]
	254	        while len(pending) > 0:
	255	            # take one element from 'pending'
	256	            parameter = pending.pop()
	257	            if type(parameter) is not ParameterGroup:
	258	                parameters.append(parameter)
	259	            else:
	260	                # append the first-level children of this ParameterGroup
	261	                pending.extend(parameter.parameters.values())
	262	    # return the reversed list of parameters (as it is now,
	263	    # we have the last parameter in the CTD as first in the list)
	264	    return reversed(parameters)
	265
	266
	267	# some parameters are mapped to command line options, this method helps resolve those mappings, if any
	268	def resolve_param_mapping(param, ctd_model):
	269	    # go through all mappings and find if the given param appears as a reference name in a mapping element
	270	    param_mapping = None
	271	    for cli_element in ctd_model.cli:
	272	        for mapping_element in cli_element.mappings:
	273	            if mapping_element.reference_name == param.name:
	274	                if param_mapping is not None:
	275	                    warning("The parameter %s has more than one mapping in the <cli> section. "
	276	                            "The first found mapping, %s, will be used." % (param.name, param_mapping), 1)
	277	                else:
	278	                    param_mapping = cli_element.option_identifier
	279
	280	    return param_mapping if param_mapping is not None else param.name
	281
	282
	283	def _extract_param_cli_name(param, ctd_model):
	284	    # we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
	285	    if type(param.parent) == ParameterGroup:
	286	        if not hasattr(param.parent.parent, 'parent'):
	287	            return resolve_param_mapping(param, ctd_model)
	288	        elif not hasattr(param.parent.parent.parent, 'parent'):
	289	            return resolve_param_mapping(param, ctd_model)
	290	        else:
	291	            if ctd_model.cli:
	292	                warning("Using nested parameter sections (NODE elements) is not compatible with <cli>", 1)
	293	            return extract_param_name(param.parent) + ":" + resolve_param_mapping(param, ctd_model)
	294	    else:
	295	        return resolve_param_mapping(param, ctd_model)
	296
	297
	298	def extract_param_name(param):
	299	    # we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
	300	    if type(param.parent) == ParameterGroup:
	301	        if not hasattr(param.parent.parent, "parent"):
	302	            return param.name
	303	        elif not hasattr(param.parent.parent.parent, "parent"):
	304	            return param.name
	305	        else:
	306	            return extract_param_name(param.parent) + ":" + param.name
	307	    else:
	308	        return param.name
	309
	310
	311	def extract_command_line_prefix(param, ctd_model):
	312	    param_name = extract_param_name(param)
	313	    param_cli_name = _extract_param_cli_name(param, ctd_model)
	314	    if param_name == param_cli_name:
	315	        # there was no mapping, so for the cli name we will use a '-' in the prefix
	316	        param_cli_name = "-" + param_name
	317	    return param_cli_name
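
The stack-based traversal used by `extract_and_flatten_parameters` in `common/utils.py` can be tried in isolation with the Python 3 sketch below. `Group` is a hypothetical stand-in for the `ParameterGroup` class of CTDopts, and leaves are plain strings. Note that this variant pushes children in reverse order rather than reversing the result at the end; both approaches yield the parameters in document order.

```python
class Group(object):
    """Stand-in for a parameter group: a name plus an ordered list of children."""

    def __init__(self, name, children):
        self.name = name
        self.children = children


def flatten(root):
    """Return the leaf parameters in document order using an explicit stack."""
    leaves = []
    pending = [root]
    while pending:
        node = pending.pop()
        if isinstance(node, Group):
            # push children in reverse so that the first child is popped first
            pending.extend(reversed(node.children))
        else:
            leaves.append(node)
    return leaves
```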

+272

-0

convert.py

	0	import os
	1	import sys
	2	import traceback
	3	import common.utils as utils
	4
	5	from argparse import ArgumentParser
	6	from argparse import RawDescriptionHelpFormatter
	7	from common.exceptions import ApplicationException, ModelError
	8
	9	__all__ = []
	10	__version__ = 2.0
	11	__date__ = '2014-09-17'
	12	__updated__ = '2017-08-09'
	13
	14	program_version = "v%s" % __version__
	15	program_build_date = str(__updated__)
	16	program_version_message = '%%(prog)s %s (%s)' % (program_version, program_build_date)
	17	program_short_description = "CTDConverter - A project from the WorkflowConversion family " \
	18	"(https://github.com/WorkflowConversion/CTDConverter)"
	19	program_usage = '''
	20	USAGE:
	21
	22	$ python convert.py [FORMAT] [ARGUMENTS ...]
	23
	24	FORMAT can be either one of the supported output formats: cwl, galaxy.
	25
	26	There is one converter for each supported FORMAT, each taking a different set of arguments. Please consult the detailed
	27	documentation for each of the converters. Nevertheless, all converters have the following common parameters/options:
	28
	29
	30	I - Parsing a single CTD file and converting it:
	31
	32	$ python convert.py [FORMAT] -i [INPUT_FILE] -o [OUTPUT_FILE]
	33
	34
	35	II - Parsing several CTD files, output converted wrappers in a given folder:
	36
	37	$ python convert.py [FORMAT] -i [INPUT_FILES] -o [OUTPUT_DIRECTORY]
	38
	39
	40	III - Hardcoding parameters
	41
	42	It is possible to hardcode parameters. This makes sense if you want to set a tool in 'quiet' mode or if your tools
	43	support multi-threading and accept the number of threads via a parameter, without giving end users the chance to
	44	change the values for these parameters.
	45
	46	In order to generate hardcoded parameters, you need to provide a simple file. Each line of this file contains
	47	two or three columns separated by whitespace. Any line starting with a '#' will be ignored. The first column contains
	48	the name of the parameter, the second column contains the value that will always be set for this parameter. Only the
	49	first two columns are mandatory.
	50
	51	If the parameter is to be hardcoded only for a set of tools, then a third column can be added. This column contains
	52	a comma-separated list of tool names for which the parameter will be hardcoded. If a third column is not present,
	53	then all processed tools containing the given parameter will get a hardcoded value for it.
	54
	55	The following is an example of a valid file:
	56
	57	##################################### HARDCODED PARAMETERS example #####################################
	58	# Every line starting with a # will be handled as a comment and will not be parsed.
	59	# The first column is the name of the parameter and the second column is the value that will be used.
	60
	61	# Parameter name # Value # Tool(s)
	62	threads 8
	63	mode quiet
	64	xtandem_executable xtandem XTandemAdapter
	65	verbosity high Foo, Bar
	66
	67	#########################################################################################################
	68
	69	Using the above file will produce a command-line similar to:
	70
	71	[TOOL] ... -threads 8 -mode quiet ...
	72
	73	for all tools. For XTandemAdapter, however, the command-line will look like:
	74
	75	XTandemAdapter ... -threads 8 -mode quiet -xtandem_executable xtandem ...
	76
	77	And for tools Foo and Bar, the command-line will be similar to:
	78
	79	Foo -threads 8 -mode quiet -verbosity high ...
	80
	81
	82	IV - Engine-specific parameters
	83
	84	i - Galaxy
	85
	86	a. Providing file formats, mimetypes
	87
	88	Galaxy supports the concept of file format in order to connect compatible ports, that is, input ports of a
	89	certain data format will be able to receive data from a port from the same format. This converter allows you
	90	to provide a personalized file in which you can relate the CTD data formats with supported Galaxy data formats.
	91	The layout of this file consists of lines, each of either one, three or four columns separated by any amount of
	92	whitespace. The content of each column is as follows:
	93
	94	* 1st column: file extension
	95	* 2nd column: data type, as listed in Galaxy
	96	* 3rd column: full-named Galaxy data type, as it will appear on datatypes_conf.xml
	97	* 4th column: mimetype (optional)
	98
	99	The following is an example of a valid "file formats" file:
	100
	101	########################################## FILE FORMATS example ##########################################
	102	# Every line starting with a # will be handled as a comment and will not be parsed.
	103	# The first column is the file format as given in the CTD and the second column is the Galaxy data format. The
	104	# second, third and fourth columns can be left empty if the data type has already been registered
	105	# in Galaxy, otherwise, all but the mimetype must be provided.
	106
	107	# CTD type # Galaxy type # Long Galaxy data type # Mimetype
	108	csv tabular galaxy.datatypes.data:Text
	109	fasta
	110	ini txt galaxy.datatypes.data:Text
	111	txt
	112	idxml txt galaxy.datatypes.xml:GenericXml application/xml
	113	options txt galaxy.datatypes.data:Text
	114	grid grid galaxy.datatypes.data:Grid
	115	##########################################################################################################
	116
	117	Note that each line consists precisely of either one, three or four columns. In the case of data types already
	118	registered in Galaxy (such as fasta and txt in the above example), only the first column is needed. In the
	119	case of data types that haven't been yet registered in Galaxy, the first three columns are needed
	120	(mimetype is optional).
	121
	122	For information about Galaxy data types and subclasses, see the following page:
	123	https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes
	124
	125
	126	b. Finer control over which tools will be converted
	127
	128	Sometimes only a subset of CTDs needs to be converted. It is possible to either explicitly specify which tools
	129	will be converted or which tools will not be converted.
	130
	131	The value of the -s/--skip-tools parameter is a file in which each line will be interpreted as the name of a
	132	tool that will not be converted. Conversely, the value of the -r/--required-tools is a file in which each line
	133	will be interpreted as a tool that is required. Only one of these parameters can be specified at a given time.
	134
	135	The format of both files is exactly the same. As stated before, each line will be interpreted as the name of a
	136	tool. Any line starting with a '#' will be ignored.
	137
	138
	139	ii - CWL
	140
	141	There are, for now, no CWL-specific parameters or options.
	142
	143	'''
	144
	145	program_license = '''%(short_description)s
	146
	147	Copyright 2017, WorkflowConversion
	148
	149	Licensed under the Apache License, Version 2.0 (the "License");
	150	you may not use this file except in compliance with the License.
	151	You may obtain a copy of the License at
	152
	153	http://www.apache.org/licenses/LICENSE-2.0
	154
	155	Unless required by applicable law or agreed to in writing, software
	156	distributed under the License is distributed on an "AS IS" BASIS,
	157	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	158	See the License for the specific language governing permissions and
	159	limitations under the License.
	160
	161	%(usage)s
	162	''' % {'short_description': program_short_description, 'usage': program_usage}
	163
	164
	165	def main(argv=None):
	166	if argv is None:
	167	argv = sys.argv
	168	else:
	169	sys.argv.extend(argv)
	170
	171	# check that we have, at least, one argument provided
	172	# at this point we cannot parse the arguments, because each converter takes different arguments, meaning each
	173	# converter will register its own parameters after we've registered the basic ones... we have to do it old school
	174	if len(argv) < 2:
	175	utils.error("Not enough arguments provided")
	176	print("\nUsage: $ python convert.py [TARGET] [ARGUMENTS]\n\n" +
	177	"Where:\n" +
	178	" target: one of 'cwl' or 'galaxy'\n\n" +
	179	"Run again using the -h/--help option to print more detailed help.\n")
	180	return 1
	181
	182	# TODO: at some point this should look like real software engineering and use a map containing converter instances
	183	# whose keys would be the name of the converter (e.g., cwl, galaxy), but for the time being, only two formats
	184	# are supported
	185	target = str.lower(argv[1])
	186	if target == 'cwl':
	187	from cwl import converter
	188	elif target == 'galaxy':
	189	from galaxy import converter
	190	elif target == '-h' or target == '--help' or target == '--h' or target == 'help':
	191	print(program_license)
	192	return 0
	193	else:
	194	utils.error("Unrecognized target engine. Supported targets are 'cwl' and 'galaxy'.")
	195	return 1
	196
	197	utils.info("Using %s converter" % target)
	198
	199	try:
	200	# Setup argument parser
	201	parser = ArgumentParser(prog="CTDConverter", description=program_license,
	202	formatter_class=RawDescriptionHelpFormatter, add_help=True)
	203	utils.add_common_parameters(parser, program_version_message, program_build_date)
	204
	205	# add tool-specific arguments
	206	converter.add_specific_args(parser)
	207
	208	# parse arguments and perform some basic, common validation
	209	args = parser.parse_args()
	210	validate_and_prepare_common_arguments(args)
	211
	212	# parse the input CTD files into CTDModels
	213	parsed_ctds = utils.parse_input_ctds(args.xsd_location, args.input_files, args.output_destination,
	214	converter.get_preferred_file_extension())
	215
	216	# let the converter do its own thing
	217	converter.convert_models(args, parsed_ctds)
	218	return 0
	219
	220	except KeyboardInterrupt:
	221	print("Interrupted...")
	222	return 0
	223
	224	except ApplicationException as e:
	225	traceback.print_exc()
	226	utils.error("CTDConverter could not complete the requested operation.", 0)
	227	utils.error("Reason: " + e.msg, 0)
	228	return 1
	229
	230	except ModelError as e:
	231	traceback.print_exc()
	232	utils.error("There seems to be a problem with one of your input CTDs.", 0)
	233	utils.error("Reason: " + e.msg, 0)
	234	return 1
	235
	236	except Exception as e:
	237	traceback.print_exc()
	238	utils.error("CTDConverter could not complete the requested operation.", 0)
	239	utils.error("Reason: " + str(e), 0)
	240	return 2
	241
	242
	243	def validate_and_prepare_common_arguments(args):
	244	# flatten lists of lists to a list containing elements
	245	lists_to_flatten = ["input_files", "blacklisted_parameters"]
	246	for list_to_flatten in lists_to_flatten:
	247	utils.flatten_list_of_lists(args, list_to_flatten)
	248
	249	# if input is a single file, we expect output to be a file (and not a dir that already exists)
	250	if len(args.input_files) == 1:
	251	if os.path.isdir(args.output_destination):
	252	raise ApplicationException("If a single input file is provided, output (%s) is expected to be a file "
	253	"and not a folder.\n" % args.output_destination)
	254
	255	# if input is a list of files, we expect output to be a folder
	256	if len(args.input_files) > 1:
	257	if not os.path.isdir(args.output_destination):
	258	raise ApplicationException("If several input files are provided, output (%s) is expected to be an "
	259	"existing directory.\n" % args.output_destination)
	260
	261	# check that the provided input files, if provided, contain a valid file path
	262	input_arguments_to_check = ["xsd_location", "input_files", "hardcoded_parameters"]
	263	for argument_name in input_arguments_to_check:
	264	utils.validate_argument_is_valid_path(args, argument_name)
	265
	266	# add the parameter hardcoder
	267	args.parameter_hardcoder = utils.parse_hardcoded_parameters(args.hardcoded_parameters)
	268
	269
	270	if __name__ == "__main__":
	271	sys.exit(main())
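The hardcoded-parameters file format described in the usage text of `convert.py` (two mandatory columns, an optional comma-separated list of tool names, `#` comments) can be illustrated with a small parser. This is only a sketch under stated assumptions: `read_hardcoded_table`, `lookup_hardcoded` and `HardcodedValue` are hypothetical names, not the functions that `utils.parse_hardcoded_parameters` actually provides.

```python
from collections import namedtuple

# Hypothetical record type used only in this sketch.
# `tools` is None when the value applies to every tool.
HardcodedValue = namedtuple("HardcodedValue", ["value", "tools"])


def read_hardcoded_table(lines):
    """Turn the lines of a hardcoded-parameters file into a dictionary."""
    table = {}
    for raw_line in lines:
        line = raw_line.strip()
        # blank lines and comments are ignored
        if not line or line.startswith("#"):
            continue
        # split into at most three columns, so "Foo, Bar" stays in one piece
        columns = line.split(None, 2)
        if len(columns) < 2:
            raise ValueError("Expected at least two columns: %r" % raw_line)
        tools = None
        if len(columns) == 3:
            tools = frozenset(t.strip() for t in columns[2].split(",") if t.strip())
        table[columns[0]] = HardcodedValue(columns[1], tools)
    return table


def lookup_hardcoded(table, parameter_name, tool_name):
    """Return the hardcoded value for this parameter and tool, or None."""
    entry = table.get(parameter_name)
    if entry is None:
        return None
    if entry.tools is None or tool_name in entry.tools:
        return entry.value
    return None


example_lines = """
# Parameter name  # Value  # Tool(s)
threads             8
mode                quiet
xtandem_executable  xtandem  XTandemAdapter
verbosity           high     Foo, Bar
""".splitlines()

table = read_hardcoded_table(example_lines)
print(lookup_hardcoded(table, "threads", "AnyTool"))
print(lookup_hardcoded(table, "verbosity", "Foo"))
print(lookup_hardcoded(table, "verbosity", "XTandemAdapter"))
```

Splitting into at most three columns keeps a tool list such as `Foo, Bar` intact even though it contains whitespace.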

-0

cwl/README.md

	0	# Conversion of CTD Files to CWL
	1
	2	## How to use: Parameters in Detail
	3	The CWL converter has, for now, only the basic parameters described in the [top README file](../README.md).
	4

-0

cwl/__init__.py

(New empty file)

+184

-0

cwl/converter.py

	0	#!/usr/bin/env python
	1	# encoding: utf-8
	2
	3	# instead of using cwlgen, we decided to use PyYAML directly
	4	# we promptly found a problem with cwlgen, namely, it is not possible to construct something like:
	5	# some_parameter:
	6	# type: ['null', string]
	7	# which kind of sucks, because this seems to be the way to state that a parameter is truly optional and has no default
	8	# since cwlgen is just "fancy classes" around the yaml.dump() method, we implemented our own generation of yaml
	9
	10
	11	import ruamel.yaml as yaml
	12
	13	from CTDopts.CTDopts import _InFile, _OutFile, ParameterGroup, _Choices, _NumericRange, _FileFormat, ModelError, _Null
	14	from common import utils, logger
	15
	16	# all cwl-related properties are defined here
	17
	18	CWL_SHEBANG = "#!/usr/bin/env cwl-runner"
	19	CURRENT_CWL_VERSION = 'v1.0'
	20	CWL_VERSION = 'cwlVersion'
	21	CLASS = 'class'
	22	BASE_COMMAND = 'baseCommand'
	23	INPUTS = 'inputs'
	24	ID = 'id'
	25	TYPE = 'type'
	26	INPUT_BINDING = 'inputBinding'
	27	OUTPUT_BINDING = 'outputBinding'
	28	PREFIX = 'prefix'
	29	OUTPUTS = 'outputs'
	30	VALUE_FROM = 'valueFrom'
	31	GLOB = 'glob'
	32	LABEL = 'label'
	33	DOC = 'doc'
	34	DEFAULT = 'default'
	35
	36	# types
	37	TYPE_NULL = 'null'
	38	TYPE_BOOLEAN = 'boolean'
	39	TYPE_INT = 'int'
	40	TYPE_LONG = 'long'
	41	TYPE_FLOAT = 'float'
	42	TYPE_DOUBLE = 'double'
	43	TYPE_STRING = 'string'
	44	TYPE_FILE = 'File'
	45	TYPE_DIRECTORY = 'Directory'
	46
	47	TYPE_TO_CWL_TYPE = {int: TYPE_INT, float: TYPE_DOUBLE, str: TYPE_STRING, bool: TYPE_BOOLEAN, _InFile: TYPE_FILE,
	48	_OutFile: TYPE_FILE, _Choices: TYPE_STRING}
	49
	50
	51	def add_specific_args(parser):
	52	# no specific arguments for CWL conversion, for now
	53	# however, this method has to be defined, otherwise ../convert.py won't work for CWL
	54	pass
	55
	56
	57	def get_preferred_file_extension():
	58	return "cwl"
	59
	60
	61	def convert_models(args, parsed_ctds):
	62	# go through each ctd model and perform the conversion, easy as pie!
	63	for parsed_ctd in parsed_ctds:
	64	model = parsed_ctd.ctd_model
	65	origin_file = parsed_ctd.input_file
	66	output_file = parsed_ctd.suggested_output_file
	67
	68	logger.info("Converting %s (source %s)" % (model.name, utils.get_filename(origin_file)))
	69	cwl_tool = convert_to_cwl(model, args)
	70
	71	logger.info("Writing to %s" % utils.get_filename(output_file), 1)
	72
	73	stream = open(output_file, 'w')
	74	stream.write(CWL_SHEBANG + '\n\n')
	75	stream.write("# This CWL file was automatically generated using CTDConverter.\n")
	76	stream.write("# Visit https://github.com/WorkflowConversion/CTDConverter for more information.\n\n")
	77	yaml.dump(cwl_tool, stream, default_flow_style=False)
	78	stream.close()
	79
	80
	81	# returns a dictionary
	82	def convert_to_cwl(ctd_model, args):
	83	# create cwl_tool object with the basic information
	84	base_command = utils.extract_tool_executable_path(ctd_model, args.default_executable_path)
	85
	86	# add basic properties
	87	cwl_tool = {}
	88	cwl_tool[CWL_VERSION] = CURRENT_CWL_VERSION
	89	cwl_tool[CLASS] = 'CommandLineTool'
	90	cwl_tool[LABEL] = ctd_model.opt_attribs["description"]
	91	cwl_tool[DOC] = utils.extract_tool_help_text(ctd_model)
	92	cwl_tool[BASE_COMMAND] = base_command
	93
	94	# TODO: test with optional output files
	95
	96	# add inputs/outputs
	97	for param in utils.extract_and_flatten_parameters(ctd_model):
	98	if param.name in args.blacklisted_parameters:
	99	continue
	100
	101	param_name = utils.extract_param_name(param)
	102	cwl_fixed_param_name = fix_param_name(param_name)
	103	hardcoded_value = args.parameter_hardcoder.get_hardcoded_value(param_name, ctd_model.name)
	104	param_default = str(param.default) if param.default is not _Null and param.default is not None else None
	105
	106	if param.type is _OutFile:
	107	create_lists_if_missing(cwl_tool, [INPUTS, OUTPUTS])
	108	# we know the only outputs are of type _OutFile
	109	# we need an input of type string that will contain the name of the output file
	110	input_binding = {}
	111	input_binding[PREFIX] = utils.extract_command_line_prefix(param, ctd_model)
	112	if hardcoded_value is not None:
	113	input_binding[VALUE_FROM] = hardcoded_value
	114
	115	label = "Filename for %s output file" % param_name
	116	input_name_for_output_filename = get_input_name_for_output_filename(param)
	117	input_param = {}
	118	input_param[ID] = input_name_for_output_filename
	119	input_param[INPUT_BINDING] = input_binding
	120	input_param[DOC] = label
	121	input_param[LABEL] = label
	122	if param_default is not None:
	123	input_param[DEFAULT] = param_default
	124	input_param[TYPE] = generate_cwl_param_type(param, TYPE_STRING)
	125
	126	output_binding = {}
	127	output_binding[GLOB] = "$(inputs.%s)" % input_name_for_output_filename
	128
	129	output_param = {}
	130	output_param[ID] = cwl_fixed_param_name
	131	output_param[OUTPUT_BINDING] = output_binding
	132	output_param[DOC] = param.description
	133	output_param[LABEL] = param.description
	134	output_param[TYPE] = generate_cwl_param_type(param)
	135
	136	cwl_tool[INPUTS].append(input_param)
	137	cwl_tool[OUTPUTS].append(output_param)
	138
	139	else:
	140	create_lists_if_missing(cwl_tool, [INPUTS])
	141	# we know that anything that is not an _OutFile is an input
	142	input_binding = {}
	143	input_binding[PREFIX] = utils.extract_command_line_prefix(param, ctd_model)
	144	if hardcoded_value is not None:
	145	input_binding[VALUE_FROM] = hardcoded_value
	146
	147	input_param = {}
	148	input_param[ID] = cwl_fixed_param_name
	149	input_param[DOC] = param.description
	150	input_param[LABEL] = param.description
	151	if param_default is not None:
	152	input_param[DEFAULT] = param_default
	153	input_param[INPUT_BINDING] = input_binding
	154	input_param[TYPE] = generate_cwl_param_type(param)
	155
	156	cwl_tool[INPUTS].append(input_param)
	157
	158	return cwl_tool
	159
	160
	161	def create_lists_if_missing(cwl_tool, keys):
	162	for key in keys:
	163	if key not in cwl_tool:
	164	cwl_tool[key] = []
	165
	166
	167	def get_input_name_for_output_filename(param):
	168	assert param.type is _OutFile, "Only output files can get a generated filename input parameter."
	169	return fix_param_name(utils.extract_param_name(param)) + "_filename"
	170
	171
	172	def fix_param_name(param_name):
	173	# IMPORTANT: there seems to be a problem in CWL if the prefix and the parameter name are the same, so we need to
	174	# prepend something to the parameter name that will be registered in CWL, also, using colons in parameter
	175	# names seems to bring all sorts of problems for cwl-runner
	176	return 'param_' + param_name.replace(":", "_")
	177
	178
	179	# in order to provide "true" optional params, the parameter type should be something like ['null', <CWLType>],
	180	# for instance ['null', int]
	181	def generate_cwl_param_type(param, forced_type=None):
	182	cwl_type = TYPE_TO_CWL_TYPE[param.type] if forced_type is None else forced_type
	183	return cwl_type if param.required else ['null', cwl_type]
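The way `fix_param_name` and `generate_cwl_param_type` shape a CWL input can be tried out in isolation. The following is a minimal sketch, not the converter itself: `Param` is a hypothetical stand-in for a CTDopts parameter, the `-name` prefix is an assumption about what `utils.extract_command_line_prefix` returns, and `json` is used for printing so that the example needs nothing outside the standard library.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for a CTDopts parameter; only the fields used below.
Param = namedtuple("Param", ["name", "type", "required"])

# Subset of the TYPE_TO_CWL_TYPE mapping, restricted to built-in Python types.
TYPE_TO_CWL_TYPE = {int: "int", float: "double", str: "string", bool: "boolean"}


def fix_param_name(param_name):
    # same rule as in the converter: add a prefix and avoid colons
    return "param_" + param_name.replace(":", "_")


def cwl_param_type(param):
    # optional parameters are expressed as ['null', <type>]
    cwl_type = TYPE_TO_CWL_TYPE[param.type]
    return cwl_type if param.required else ["null", cwl_type]


def build_input(param):
    return {
        "id": fix_param_name(param.name),
        "type": cwl_param_type(param),
        # assumption: the command-line prefix is a dash followed by the name
        "inputBinding": {"prefix": "-" + param.name},
    }


required_input = build_input(Param("threads", int, True))
optional_input = build_input(Param("algorithm:mode", str, False))
print(json.dumps([required_input, optional_input], indent=2))
```

A required parameter gets a plain type such as `int`, while an optional one gets the two-element list, which is the form the header comment says `cwlgen` could not produce.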

+25

-222

galaxy/README.md

0	0	# Conversion of CTD Files to Galaxy ToolConfigs
	1	## Generating a `tool_conf.xml` File
	2	* Purpose: Galaxy uses a file `tool_conf.xml` in which other tools can be included. `CTDConverter` can also generate this file. Categories will be extracted from the provided input CTDs and for each category, a different `<section>` will be generated. Any input CTD lacking a category will be sorted under the provided default category.
	3	* Short/long version: `-t` / `--tool-conf-destination`
	4	* Required: no.
	5	* Taken values: The destination of the file.
1	6
2		## How to use: most common Tasks
	7	$ python convert.py galaxy -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -t /data/generated-galaxy-stubs/tool_conf.xml
	8
3	9
4		The Galaxy ToolConfig generator takes several parameters and a varying number of inputs and outputs. The following sub-sections show how to perform the most common operations.
5
6		Running the generator with the `-h/--help` parameter will print extended information about each of the parameters.
7
8		### Macros
9
10		Galaxy supports the use of macros via a `macros.xml` file (we provide a sample macros file in [macros.xml]). Instead of repeating sections, macros can be used and expanded. If you want fine control over the macros, you can use the `-m` / `--macros` parameter to provide your own macros file.
11
12		Please note that the used macros file must be copied to your Galaxy installation on the same location in which you place the generated ToolConfig files, otherwise Galaxy will not be able to parse the generated ToolConfig files!
13
14		### One input, one Output
15
16		In its simplest form, the converter takes an input CTD file and generates an output Galaxy ToolConfig file. The following usage of `generator.py`:
17
18		$ python generator.py -i /data/sample_input.ctd -o /data/sample_output.xml
19
20		will parse `/data/sample_input.ctd` and generate a Galaxy tool wrapper under `/data/sample_output.xml`. The generated file can be added to your Galaxy instance like any other tool.
21
22		### Converting several CTDs at once
23
24		When converting several CTDs, the expected value for the `-o`/`--output` parameter is a folder. For example:
25
26		$ python generator.py -i /data/ctds/one.ctd /data/ctds/two.ctd -o /data/generated-galaxy-stubs
27
28		Will convert `/data/ctds/one.ctd` into `/data/generated-galaxy-stubs/one.xml` and `/data/ctds/two.ctd` into `/data/generated-galaxy-stubs/two.xml`.
29
30		You can use wildcard expansion, as supported by most modern operating systems:
31
32		$ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs
33
34		### Generating a tool_conf.xml File
35
36		The generator supports generation of a `tool_conf.xml` file which you can later use in your local Galaxy installation. The parameter `-t`/`--tool-conf-destination` contains the path of a file in which a `tool_conf.xml` file will be generated.
37
38		$ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -t /data/generated-galaxy-stubs/tool_conf.xml
39
40
41		## How to use: Parameters in Detail
42
43		### A Word about Parameters taking Lists of Values
44
45		All parameters have a short and a long option and some parameters take list of values. Using either the long or the short option of the parameter will produce the same output. The following examples show how to pass values using the `-f` / `--foo` parameter:
46
47		The following uses of the parameter will pass the list of values containing `bar`, `blah` and `blu`:
48
49		-f bar blah blu
50		--foo bar blah blu
51		-f bar -f blah -f blu
52		--foo bar --foo blah --foo blu
53		-f bar --foo blah blu
54
55		The following uses of the parameter will pass a single value `bar`:
56
57		-f bar
58		--foo bar
59
60		### Schema Validation
61
62		* Purpose: Provide validation of input CTDs against a schema file (i.e, a XSD file).
63		* Short/long version: `-v` / `--validation-schema`
64		* Required: no.
65		* Taken values: location of the schema file (e.g., CTD.xsd).
66
67		CTDs can be validated against a schema. The master version of the schema can be found under [CTDSchema].
68
69		If a schema is provided, all input CTDs will be validated against it.
70
71		### Input File(s)
72
73		* Purpose: Provide input CTD file(s) to convert.
74		* Short/long version: `-i` / `--input`
75		* Required: yes.
76		* Taken values: a list of input CTD files.
77
78		Example:
79
80		Any of the following invocations will convert `/data/input_one.ctd` and `/data/input_two.ctd`:
81
82		$ python generator.py -i /data/input_one.ctd -i /data/input_two.ctd -o /data/generated
83		$ python generator.py -i /data/input_one.ctd /data/input_two.ctd -o /data/generated
84		$ python generator.py --input /data/input_one.ctd /data/input_two.ctd -o /data/generated
85		$ python generator.py --input /data/input_one.ctd --input /data/input_two.ctd -o /data/generated
86
87		The following invocation will convert `/data/input.ctd` into `/data/output.xml`:
88
89		$ python generator.py -i /data/input.ctd -o /data/output.xml -m sample_files/macros.xml
90
91		Of course, you can also use wildcards, which will be automatically expanded by any modern operating system. This is extremely useful if you want to convert several files at a time. Imagine that the folder `/data/ctds` contains three files, `input_one.ctd`, `input_two.ctd` and `input_three.ctd`. The following two invocations will produce the same output in the `/data/galaxy`:
92
93		$ python generator.py -i /data/input_one.ctd /data/input_two.ctd /data/input_three.ctd -o /data/galaxy
94		$ python generator.py -i /data/*.ctd -o /data/galaxy
95
96		### Finer Control over the Tools to be converted
97
98		Sometimes only a set of CTDs in a folder need to be converted. The parameter `-r`/`--required-tools` takes the path of a file containing the names of tools that will be converted.
99
100		$ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -r required_tools.txt
101
102		On the other hand, if you want the generator to skip conversion of some CTDs, the parameter `-s`/`--skip-tools` will take the path of a file containing the names of tools that will not be converted.
103
104		$ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -s skipped_tools.txt
105
106		The format of these files (`required_tools.txt`, `skipped_tools.txt` in the examples above) is straightforward. Each line contains the name of a tool and any line starting with `#` will be ignored.
107
108		### Output Destination
109
110		* Purpose: Provide output destination for the generated Galaxy ToolConfig files.
111		* Short/long version: `-o` / `--output-destination`
112		* Required: yes.
113		* Taken values: if a single input file is given, then a single output file is expected. If multiple input files are given, then an existent folder, in which all generated Galaxy ToolConfig will be written, is expected.
114
115		Example:
116
117		A single input is given, and the output will be generated into `/data/output.xml`:
118
119		$ python generator.py -i /data/input.ctd -o /data/output.xml
120
121		Several inputs are given. The output is the already existent folder, `/data/stubs`, and at the end of the operation, the files `/data/stubs/input_one.ctd.xml` and `/data/stubs/input_two.ctd.xml` will be generated:
122
123		$ python generator.py -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd -o /data/stubs
124
125
126		### Adding Parameters to the Command-line
127
	10	## Adding Parameters to the Command-line
128	11	* Purpose: Galaxy ToolConfig files include a `<command>` element in which the command line to invoke the tool can be given. Sometimes it is needed to invoke your tools in a certain way (i.e., passing certain parameters). For instance, some tools offer the possibility to be invoked in a verbose or quiet way or even to be invoked in a headless way (i.e., without GUI).
129	12	* Short/long version: `-a` / `--add-to-command-line`
130	13	* Required: no.

132	15
133	16	Example:
134	17
135		$ python generator.py ... -a "--quiet --no-gui"
	18	$ python convert.py galaxy ... -a "--quiet --no-gui"
136	19
137	20	Will generate the following `<command>` element in the generated Galaxy ToolConfig:
138	21
139	22	<command>TOOL_NAME --quiet --no-gui ...</command>
140	23
141
142		### Blacklisting Parameters
143
144		* Purpose: Some parameters present in the CTD are not to be exposed on Galaxy. Think of parameters such as `--help`, `--debug`, that might not make much sense to expose to final users in a workflow management system such as Galaxy.
145		* Short/long version: `-b` / `--blacklist-parameters`
146		* Required: no.
147		* Taken values: A list of parameters to be blacklisted.
148
149		Example:
150
151		$ python generator.py ... -b h help quiet
152
153		Will not process any of the parameters named `h`, `help`, or `quiet` and will not appear in the generated Galaxy ToolConfig.
154
155		### Generating a tool_conf.xml file
156
157		* Purpose: Galaxy uses a file `tool_conf.xml` in which other tools can be included. `generator.py` can also generate this file. Categories will be extracted from the provided input CTDs and for each category, a different `<section>` will be generated. Any input CTD lacking a category will be sorted under the provided default category.
158		* Short/long version: `-t` / `--tool-conf-destination`
159		* Required: no.
160		* Taken values: The destination of the file.
161
162		### Providing a default Category
163
164		* Purpose: Input CTDs that lack a category will be sorted under the value given to this parameter. If this parameter is not given, then the category `DEFAULT` will be used.
	24	## Providing a default Category
	25	* Purpose: Input CTDs that lack a category will be sorted under the value given to this parameter. If this parameter is not provided, then the category `DEFAULT` will be used.
165	26	* Short/long version: `-c` / `--default-category`
166	27	* Required: no.
167	28	* Taken values: The value for the default category to use for input CTDs lacking a category.

170	31
171	32	Suppose there is a folder containing several CTD files. Some of those CTDs don't have the optional attribute `category` and the rest belong to the `Data Processing` category. The following invocation:
172	33
173		$ python generator.py ... -c Other
	34	$ python convert.py galaxy ... -c Other
174	35
175	36	will generate, for each of the categories, a different section. Additionally, CTDs lacking a category will be sorted under the given category, `Other`, as shown:
176	37

186	47	...
187	48	</section>
188	49
189		### Providing a Path for the Location of the ToolConfig Files
190
191		* Purpose: The `tool_conf.xml` file contains references to files which in turn contain Galaxy ToolConfig files. Using this parameter, you can provide information about the location of your tools.
	50	## Providing a Path for the Location of the ToolConfig Files
	51	* Purpose: The `tool_conf.xml` file contains references to files which in turn contain Galaxy ToolConfig files. Using this parameter, you can provide information about the location of your wrappers on your Galaxy instance.
192	52	* Short/long version: `-g` / `--galaxy-tool-path`
193	53	* Required: no.
194	54	* Taken values: The path relative to your `$GALAXY_ROOT/tools` folder on which your tools are located.
195	55
196	56	Example:
197	57
198		$ python generator.py ... -g my_tools_folder
	58	$ python convert.py galaxy ... -g my_tools_folder
199	59
200	60	Will generate `<tool>` elements in the generated `tool_conf.xml` as follows:
201	61

203	63
204	64	In this example, `tool_conf.xml` refers to a file located on `$GALAXY_ROOT/tools/my_tools_folder/some_tool.xml`.
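How the default category (`-c`) and the tool path (`-g`) interact when a `tool_conf.xml` is built can be sketched with the standard library. This is an illustration under stated assumptions, not the converter's own code: `build_tool_conf` is a hypothetical helper, and the `id` scheme for sections is invented for the example.

```python
import posixpath
import xml.etree.ElementTree as ET


def build_tool_conf(tools, galaxy_tool_path, default_category="DEFAULT"):
    """Build a <toolbox> element from (file_name, category_or_None) pairs."""
    toolbox = ET.Element("toolbox")
    sections = {}
    for file_name, category in tools:
        # tools lacking a category are sorted under the default one
        category = category or default_category
        if category not in sections:
            # assumption: the section id is derived from the category name
            section_id = "section-id-" + "".join(category.split())
            sections[category] = ET.SubElement(
                toolbox, "section", {"id": section_id, "name": category})
        # paths are relative to $GALAXY_ROOT/tools
        tool_path = posixpath.join(galaxy_tool_path, file_name)
        ET.SubElement(sections[category], "tool", {"file": tool_path})
    return toolbox


toolbox = build_tool_conf(
    [("some_tool.xml", "Data Processing"), ("other_tool.xml", None)],
    galaxy_tool_path="my_tools_folder",
    default_category="Other",
)
print(ET.tostring(toolbox).decode("utf-8"))
```

Here `some_tool.xml` lands in the `Data Processing` section and `other_tool.xml`, which has no category, lands in `Other`; both `file` attributes start with `my_tools_folder/`.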
205	65
206
207		### Hardcoding Parameters
208
209		* Purpose: Fixing the value of a parameter and hide it from the end user.
210		* Short/long version: `-p` / `--hardcoded-parameters`
211		* Required: no.
212		* Taken values: The path of a file containing the mapping between parameter names and hardcoded values to use in the `<command>` section.
213
214		It is sometimes required that parameters are hidden from the end user in workflow systems such as Galaxy and that they take a predetermined value. Allowing end users to control parameters similar to `--verbosity`, `--threads`, etc., might create more problems than it solves. For this purpose, the parameter `-p`/`--hardcoded-parameters` takes the path of a file that contains up to three columns separated by whitespace that map parameter names to the hardcoded value. The first column contains the name of the parameter and the second one the hardcoded value. The first two columns are mandatory.
215
216		If the parameter is to be hardcoded only for certain tools, a third column containing a comma separated list of tool names for which the hardcoding will apply can be added.
217
218		Lines starting with `#` will be ignored. The following is an example of a valid file:
219
220		# Parameter name # Value # Tool(s)
221		threads \${GALAXY_SLOTS:-24}
222		mode quiet
223		xtandem_executable xtandem XTandemAdapter
224		verbosity high Foo, Bar
225
226		This will produce a `<command>` section similar to the following one for all tools but `XTandemAdapter`, `Foo` and `Bar`:
227
228		<command>TOOL_NAME -threads \${GALAXY_SLOTS:-24} -mode quiet ...</command>
229
230		For `XTandemAdapter`, the `<command>` will be similar to:
231
232		<command>XtandemAdapter ... -threads \${GALAXY_SLOTS:-24} -mode quiet -xtandem_executable xtandem ...</command>
233
234		And for tools `Foo` and `Bar`, the `<command>` will be similar to:
235
236		<command>Foo ... ... -threads \${GALAXY_SLOTS:-24} -mode quiet -verbosity high ...</command>
237
238
239		### Including additional Macros Files
240
	66	## Including additional Macros Files
241	67	* Purpose: Include external macros files.
242	68	* Short/long version: `-m` / `--macros`
243	69	* Required: no.

246	72
247	73	ToolConfig supports elaborate sections such as `<stdio>`, `<requirements>`, etc., that are identical across tools of the same suite. Macros files assist in the task of including external xml sections into ToolConfig files. For more information about the syntax of macros files, see: https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax#Reusing_Repeated_Configuration_Elements
248	74
249		There are some macros that are required, namely `stdio`, `requirements` and `advanced_options`. A template macro file is included in [macros.xml]. It can be edited to suit your needs and you could add extra macros or leave it as it is and include additional files.
	75	There are some macros that are required, namely `stdio`, `requirements` and `advanced_options`. A template macro file is included in [macros.xml]. It can be edited to suit your needs and you could add extra macros or leave it as it is and include additional files. Every macro found in the provided files will be expanded.
250	76
251		Every macro found in the included files and in `support_files/macros.xml` will be expanded. Users are responsible for copying the given macros files in their corresponding galaxy folders.
	77	Please note that the used macros files must be copied to your Galaxy installation on the same location in which you place the generated ToolConfig files, otherwise Galaxy will not be able to parse the generated ToolConfig files!
252	78
253		### Providing a default executable Path
254
255		* Purpose: Help Galaxy locate tools by providing a path.
256		* Short/long version: `-x` / `--default-executable-path`
257		* Required: no.
258		* Taken values: The default executable path of the tools in the Galaxy installation.
259
260		CTDs can contain an `<executablePath>` element that will be used when executing the tool binary. If this element is missing, the value provided by this parameter will be used as a prefix when building the `<command>` section. Suppose that you have installed a tool suite in your local Galaxy instance under `/opt/suite/bin`. The following invocation of the converter:
261
262		$ python generator.py -x /opt/suite/bin ...
263
264		Will produce a `<command>` section similar to:
265
266		<command>/opt/suite/bin/Foo ...</command>
267
268		For those CTDs in which no `<executablePath>` could be found.
269
270
271		### Generating a `datatypes_conf.xml` File
272
	79	## Generating a `datatypes_conf.xml` File
273	80	* Purpose: Specify the destination of a generated `datatypes_conf.xml` file.
274	81	* Short/long version: `-d` / `--datatypes-destination`
275	82	* Required: no.

277	84
278	85	It is likely that your tools use file formats or mimetypes that have not been registered in Galaxy. The generator allows you to specify a path in which an automatically generated `datatypes_conf.xml` file will be created. Consult the next section to get information about how to register file formats and mimetypes.
279	86
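As a rough sketch of what a generated `datatypes_conf.xml` could contain, the snippet below builds the `<datatypes>`/`<registration>`/`<datatype>` structure. It assumes Python 3 and the standard library's `xml.etree.ElementTree` rather than `lxml`. The helper name `build_datatypes_conf` and the sample data type are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def build_datatypes_conf(data_types):
    """Build datatypes_conf.xml content.

    data_types is an iterable of (galaxy_extension, galaxy_type, mimetype) tuples;
    galaxy_type and mimetype may be None.
    """
    root = ET.Element("datatypes")
    registration = ET.SubElement(root, "registration", {
        "converters_path": "lib/galaxy/datatypes/converters",
        "display_path": "display_applications",
    })
    for extension, galaxy_type, mimetype in data_types:
        # Entries without a Galaxy type are assumed to be known to Galaxy already.
        if galaxy_type is None:
            continue
        node = ET.SubElement(registration, "datatype",
                             {"extension": extension, "type": galaxy_type})
        if mimetype is not None:
            node.set("mimetype", mimetype)
    return ET.tostring(root, encoding="unicode")


# Hypothetical entries: one new data type and one that needs no registration.
xml_text = build_datatypes_conf([
    ("myformat", "galaxy.datatypes.data:Text", "text/plain"),
    ("txt", None, None),
])
print(xml_text)
```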
280
281		### Providing Galaxy File Formats
282
	87	## Providing Galaxy File Formats
283	88	* Purpose: Register new file formats and mimetypes.
284	89	* Short/long version: `-f` / `--formats-file`
285	90	* Required: no.

307	112
308	113	For information about Galaxy data types and subclasses, consult the following page: https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes
309	114
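The way a line of the formats file is interpreted can be sketched like this. The sketch follows the one, three or four column layout that `parse_file_formats` in `galaxy/converter.py` accepts. It assumes Python 3, the function name `parse_format_line` is hypothetical, and unlike the converter (which logs a warning and skips the line) it raises an error on an invalid line.

```python
def parse_format_line(line):
    """Parse one line of a formats file; return None for blank or comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    columns = stripped.split()
    if len(columns) == 1:
        # A single column: the CTD extension is also used as the Galaxy extension.
        return {"extension": columns[0], "galaxy_extension": columns[0],
                "galaxy_type": None, "mimetype": None}
    if len(columns) in (3, 4):
        # Columns: CTD extension, Galaxy extension, Galaxy type, optional mimetype.
        return {"extension": columns[0], "galaxy_extension": columns[1],
                "galaxy_type": columns[2],
                "mimetype": columns[3] if len(columns) == 4 else None}
    raise ValueError("Expected 1, 3 or 4 columns, got %d: %r" % (len(columns), line))


# Hypothetical sample lines.
print(parse_format_line("# a comment"))
print(parse_format_line("csv"))
print(parse_format_line("mzML mzml galaxy.datatypes.proteomics:MzML application/xml"))
```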
310
311		## Notes about some of the OpenMS Tools
312
313		* Most of the tools can be generated automatically. Some of the tools need some extra work (for now).
314		* These adapters need to be changed, such that you provide the path to the executable:
	115	## Remarks about some of the OpenMS Tools
	116	* Most of the tools can be generated automatically. However, some of the tools need some extra work (for now).
	117	* The following adapters need to be changed, such that you provide the path to the executable:
315	118	* FidoAdapter (add `-exe fido` in the command tag, delete the `$param_exe` in the command tag, delete the parameter from the input list).
316	119	* MSGFPlusAdapter (add `-executable msgfplus.jar` in the command tag, delete the `$param_executable` in the command tag, delete the parameter from the input list).
317	120	* MyriMatchAdapter (add `-myrimatch_executable myrimatch` in the command tag, delete the `$param_myrimatch_executable` in the command tag, delete the parameter from the input list).

320	123	* XTandemAdapter (add `-xtandem_executable xtandem` in the command tag, delete the `$param_xtandem_executable` in the command tag, delete the parameter from the input list).
321	124	* To avoid deleting these parameters from the inputs, you can also add them to the blacklist:
322	125
323		$ python generator.py -b exe executable myrimatch_executable omssa_executable pepnovo_executable xtandem_executable
	126	$ python convert.py galaxy -b exe executable myrimatch_executable omssa_executable pepnovo_executable xtandem_executable
324	127
325		* These tools have multiple outputs (number of inputs = number of outputs) which is not yet supported in Galaxy-stable:
	128	* The following tools have multiple outputs (number of inputs = number of outputs) which is not yet supported in Galaxy-stable:
326	129	* SeedListGenerator
327	130	* SpecLibSearcher
328	131	* MapAlignerIdentification

-0

galaxy/__init__.py

(New empty file)

+867

-0

galaxy/converter.py

	0	#!/usr/bin/env python
	1	# encoding: utf-8
	2	import os
	3	import string
	4
	5	from collections import OrderedDict
	6	from string import strip
	7	from lxml import etree
	8	from lxml.etree import SubElement, Element, ElementTree, ParseError, parse
	9
	10	from common import utils, logger
	11	from common.exceptions import ApplicationException, InvalidModelException
	12
	13	from CTDopts.CTDopts import _InFile, _OutFile, ParameterGroup, _Choices, _NumericRange, _FileFormat, ModelError, _Null
	14
	15
	16	TYPE_TO_GALAXY_TYPE = {int: 'integer', float: 'float', str: 'text', bool: 'boolean', _InFile: 'data',
	17	_OutFile: 'data', _Choices: 'select'}
	18	STDIO_MACRO_NAME = "stdio"
	19	REQUIREMENTS_MACRO_NAME = "requirements"
	20	ADVANCED_OPTIONS_MACRO_NAME = "advanced_options"
	21
	22	REQUIRED_MACROS = [STDIO_MACRO_NAME, REQUIREMENTS_MACRO_NAME, ADVANCED_OPTIONS_MACRO_NAME]
	23
	24
	25	class ExitCode:
	26	def __init__(self, code_range="", level="", description=None):
	27	self.range = code_range
	28	self.level = level
	29	self.description = description
	30
	31
	32	class DataType:
	33	def __init__(self, extension, galaxy_extension=None, galaxy_type=None, mimetype=None):
	34	self.extension = extension
	35	self.galaxy_extension = galaxy_extension
	36	self.galaxy_type = galaxy_type
	37	self.mimetype = mimetype
	38
	39
	40	def add_specific_args(parser):
	41	parser.add_argument("-f", "--formats-file", dest="formats_file",
	42	help="File containing the supported file formats. Run with '-h' or '--help' to see a "
	43	"brief example on the layout of this file.", default=None, required=False)
	44	parser.add_argument("-a", "--add-to-command-line", dest="add_to_command_line",
	45	help="Adds content to the command line", default="", required=False)
	46	parser.add_argument("-d", "--datatypes-destination", dest="data_types_destination",
	47	help="Specify the location of a datatypes_conf.xml to modify and add the registered "
	48	"data types. If the provided destination does not exist, a new file will be created.",
	49	default=None, required=False)
	50	parser.add_argument("-c", "--default-category", dest="default_category", default="DEFAULT", required=False,
	51	help="Default category to use for tools lacking a category when generating tool_conf.xml")
	52	parser.add_argument("-t", "--tool-conf-destination", dest="tool_conf_destination", default=None, required=False,
	53	help="Specify the location of an existing tool_conf.xml that will be modified to include "
	54	"the converted tools. If the provided destination does not exist, a new file will "
	55	"be created.")
	56	parser.add_argument("-g", "--galaxy-tool-path", dest="galaxy_tool_path", default=None, required=False,
	57	help="The path that will be prepended to the file names when generating tool_conf.xml")
	58	parser.add_argument("-r", "--required-tools", dest="required_tools_file", default=None, required=False,
	59	help="Each line of the file will be interpreted as a tool name that needs translation. "
	60	"Run with '-h' or '--help' to see a brief example on the format of this file.")
	61	parser.add_argument("-s", "--skip-tools", dest="skip_tools_file", default=None, required=False,
	62	help="File containing a list of tools for which a Galaxy stub will not be generated. "
	63	"Run with '-h' or '--help' to see a brief example on the format of this file.")
	64	parser.add_argument("-m", "--macros", dest="macros_files", default=[], nargs="*",
	65	action="append", required=None, help="Import the additional given file(s) as macros. "
	66	"The macros stdio, requirements and advanced_options are "
	67	"required. Please see galaxy/macros.xml for an example of a "
	68	"valid macros file. All defined macros will be imported.")
	69
	70
	71	def convert_models(args, parsed_ctds):
	72	# validate and prepare the passed arguments
	73	validate_and_prepare_args(args)
	74
	75	# extract the names of the macros and check that we have found the ones we need
	76	macros_to_expand = parse_macros_files(args.macros_files)
	77
	78	# parse the given supported file-formats file
	79	supported_file_formats = parse_file_formats(args.formats_file)
	80
	81	# parse the skip/required tools files
	82	skip_tools = parse_tools_list_file(args.skip_tools_file)
	83	required_tools = parse_tools_list_file(args.required_tools_file)
	84
	85	_convert_internal(parsed_ctds,
	86	supported_file_formats=supported_file_formats,
	87	default_executable_path=args.default_executable_path,
	88	add_to_command_line=args.add_to_command_line,
	89	blacklisted_parameters=args.blacklisted_parameters,
	90	required_tools=required_tools,
	91	skip_tools=skip_tools,
	92	macros_file_names=args.macros_files,
	93	macros_to_expand=macros_to_expand,
	94	parameter_hardcoder=args.parameter_hardcoder)
	95
	96	# generation of galaxy stubs is ready... now, let's see if we need to generate a tool_conf.xml
	97	if args.tool_conf_destination is not None:
	98	generate_tool_conf(parsed_ctds, args.tool_conf_destination,
	99	args.galaxy_tool_path, args.default_category)
	100
	101	# generate datatypes_conf.xml
	102	if args.data_types_destination is not None:
	103	generate_data_type_conf(supported_file_formats, args.data_types_destination)
	104
	105
	106	def parse_tools_list_file(tools_list_file):
	107	tools_list = None
	108	if tools_list_file is not None:
	109	tools_list = []
	110	with open(tools_list_file) as f:
	111	for line in f:
	112	if line is None or not line.strip() or line.strip().startswith("#"):
	113	continue
	114	else:
	115	tools_list.append(line.strip())
	116
	117	return tools_list
	118
	119
	120	def parse_macros_files(macros_file_names):
	121	macros_to_expand = set()
	122
	123	for macros_file_name in macros_file_names:
	124	try:
	125	macros_file = open(macros_file_name)
	126	logger.info("Loading macros from %s" % macros_file_name, 0)
	127	root = parse(macros_file).getroot()
	128	for xml_element in root.findall("xml"):
	129	name = xml_element.attrib["name"]
	130	if name in macros_to_expand:
	131	logger.warning("Macro %s has already been found. Duplicate found in file %s." %
	132	(name, macros_file_name), 0)
	133	else:
	134	logger.info("Macro %s found" % name, 1)
	135	macros_to_expand.add(name)
	136	except ParseError, e:
	137	raise ApplicationException("The macros file " + macros_file_name + " could not be parsed. Cause: " +
	138	str(e))
	139	except IOError, e:
	140	raise ApplicationException("The macros file " + macros_file_name + " could not be opened. Cause: " +
	141	str(e))
	142
	143	# we need "stdio", "requirements" and "advanced_options" to be defined in at least one of the given macros files
	144	missing_needed_macros = []
	145	for required_macro in REQUIRED_MACROS:
	146	if required_macro not in macros_to_expand:
	147	missing_needed_macros.append(required_macro)
	148
	149	if missing_needed_macros:
	150	raise ApplicationException(
	151	"The following required macro(s) were not found in any of the given macros files: %s, "
	152	"see galaxy/macros.xml for an example of a valid macros file."
	153	% ", ".join(missing_needed_macros))
	154
	155	# we do not need to "expand" the advanced_options macro
	156	macros_to_expand.remove(ADVANCED_OPTIONS_MACRO_NAME)
	157	return macros_to_expand
	158
	159
	160	def parse_file_formats(formats_file):
	161	supported_formats = {}
	162	if formats_file is not None:
	163	line_number = 0
	164	with open(formats_file) as f:
	165	for line in f:
	166	line_number += 1
	167	if line is None or not line.strip() or line.strip().startswith("#"):
	168	# ignore (it'd be weird to have something like:
	169	# if line is not None and not (not line.strip()) ...
	170	pass
	171	else:
	172	# not an empty line, no comment
	173	# strip the line and split by whitespace
	174	parsed_formats = line.strip().split()
	175	# valid lines contain one, three or four columns
	176	if not (len(parsed_formats) == 1 or len(parsed_formats) == 3 or len(parsed_formats) == 4):
	177	logger.warning(
	178	"Invalid line at line number %d of the given formats file. Line will be ignored:\n%s" %
	179	(line_number, line), 0)
	180	# ignore the line
	181	continue
	182	elif len(parsed_formats) == 1:
	183	supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[0])
	184	else:
	185	mimetype = None
	186	# check if mimetype was provided
	187	if len(parsed_formats) == 4:
	188	mimetype = parsed_formats[3]
	189	supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[1],
	190	parsed_formats[2], mimetype)
	191	return supported_formats
	192
	193
	194	def validate_and_prepare_args(args):
	195	# check that only one of skip_tools_file and required_tools_file has been provided
	196	if args.skip_tools_file is not None and args.required_tools_file is not None:
	197	raise ApplicationException(
	198	"You have provided both a file with tools to ignore and a file with required tools.\n"
	199	"Only one of -s/--skip-tools, -r/--required-tools can be provided.")
	200
	201	# flatten macros_files to make sure that we have a list containing file names and not a list of lists
	202	utils.flatten_list_of_lists(args, "macros_files")
	203
	204	# check that the arguments point to a valid, existing path
	205	input_variables_to_check = ["skip_tools_file", "required_tools_file", "macros_files", "formats_file"]
	206	for variable_name in input_variables_to_check:
	207	utils.validate_argument_is_valid_path(args, variable_name)
	208
	209	# check that the provided output files, if provided, contain a valid file path (i.e., not a folder)
	210	output_variables_to_check = ["data_types_destination", "tool_conf_destination"]
	211	for variable_name in output_variables_to_check:
	212	file_name = getattr(args, variable_name)
	213	if file_name is not None and os.path.isdir(file_name):
	214	raise ApplicationException("The provided output file name (%s) points to a directory." % file_name)
	215
	216	if not args.macros_files:
	217	# list is empty, provide the default value
	218	logger.warning("Using default macros from galaxy/macros.xml", 0)
	219	args.macros_files = ["galaxy/macros.xml"]
	220
	221
	222	def get_preferred_file_extension():
	223	return "xml"
	224
	225
	226	def _convert_internal(parsed_ctds, **kwargs):
	227	# parse all input files into models using CTDopts (via utils)
	228	# the output is a tuple containing the model, output destination, origin file
	229	for parsed_ctd in parsed_ctds:
	230	model = parsed_ctd.ctd_model
	231	origin_file = parsed_ctd.input_file
	232	output_file = parsed_ctd.suggested_output_file
	233
	234	if kwargs["skip_tools"] is not None and model.name in kwargs["skip_tools"]:
	235	logger.info("Skipping tool %s" % model.name, 0)
	236	continue
	237	elif kwargs["required_tools"] is not None and model.name not in kwargs["required_tools"]:
	238	logger.info("Tool %s is not required, skipping it" % model.name, 0)
	239	continue
	240	else:
	241	logger.info("Converting %s (source %s)" % (model.name, utils.get_filename(origin_file)), 0)
	242	tool = create_tool(model)
	243	write_header(tool, model)
	244	create_description(tool, model)
	245	expand_macros(tool, model, **kwargs)
	246	create_command(tool, model, **kwargs)
	247	create_inputs(tool, model, **kwargs)
	248	create_outputs(tool, model, **kwargs)
	249	create_help(tool, model)
	250
	251	# wrap our tool element into a tree to be able to serialize it
	252	tree = ElementTree(tool)
	253	logger.info("Writing to %s" % utils.get_filename(output_file), 1)
	254	tree.write(open(output_file, 'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
	255
	256
	257	def write_header(tool, model):
	258	tool.addprevious(etree.Comment(
	259	"This is a configuration file for the integration of a tool into Galaxy (https://galaxyproject.org/). "
	260	"This file was automatically generated using CTDConverter."))
	261	tool.addprevious(etree.Comment('Proposed Tool Section: [%s]' % model.opt_attribs.get("category", "")))
	262
	263
	264	def generate_tool_conf(parsed_ctds, tool_conf_destination, galaxy_tool_path, default_category):
	265	# for each category, we keep a list of models corresponding to it
	266	categories_to_tools = dict()
	267	for parsed_ctd in parsed_ctds:
	268	category = strip(parsed_ctd.ctd_model.opt_attribs.get("category", ""))
	269	if not category.strip():
	270	category = default_category
	271	if category not in categories_to_tools:
	272	categories_to_tools[category] = []
	273	categories_to_tools[category].append(utils.get_filename(parsed_ctd.suggested_output_file))
	274
	275	# at this point, we should have a map for all categories->tools
	276	toolbox_node = Element("toolbox")
	277
	278	if galaxy_tool_path is not None and not galaxy_tool_path.strip().endswith("/"):
	279	galaxy_tool_path = galaxy_tool_path.strip() + "/"
	280	if galaxy_tool_path is None:
	281	galaxy_tool_path = ""
	282
	283	for category, file_names in categories_to_tools.iteritems():
	284	section_node = add_child_node(toolbox_node, "section")
	285	section_node.attrib["id"] = "section-id-" + "".join(category.split())
	286	section_node.attrib["name"] = category
	287
	288	for filename in file_names:
	289	tool_node = add_child_node(section_node, "tool")
	290	tool_node.attrib["file"] = galaxy_tool_path + filename
	291
	292	toolconf_tree = ElementTree(toolbox_node)
	293	toolconf_tree.write(open(tool_conf_destination,'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
	294	logger.info("Generated Galaxy tool_conf.xml in %s" % tool_conf_destination, 0)
	295
	296
	297	def generate_data_type_conf(supported_file_formats, data_types_destination):
	298	data_types_node = Element("datatypes")
	299	registration_node = add_child_node(data_types_node, "registration")
	300	registration_node.attrib["converters_path"] = "lib/galaxy/datatypes/converters"
	301	registration_node.attrib["display_path"] = "display_applications"
	302
	303	for format_name in supported_file_formats:
	304	data_type = supported_file_formats[format_name]
	305	# add only if it's a data type that does not exist in Galaxy
	306	if data_type.galaxy_type is not None:
	307	data_type_node = add_child_node(registration_node, "datatype")
	308	# we know galaxy_extension is not None
	309	data_type_node.attrib["extension"] = data_type.galaxy_extension
	310	data_type_node.attrib["type"] = data_type.galaxy_type
	311	if data_type.mimetype is not None:
	312	data_type_node.attrib["mimetype"] = data_type.mimetype
	313
	314	data_types_tree = ElementTree(data_types_node)
	315	data_types_tree.write(open(data_types_destination,'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
	316	logger.info("Generated Galaxy datatypes_conf.xml in %s" % data_types_destination, 0)
	317
	318
	319	def create_tool(model):
	320	return Element("tool", OrderedDict([("id", model.name), ("name", model.name), ("version", model.version)]))
	321
	322
	323	def create_description(tool, model):
	324	if "description" in model.opt_attribs.keys() and model.opt_attribs["description"] is not None:
	325	description = SubElement(tool,"description")
	326	description.text = model.opt_attribs["description"]
	327
	328
	329	def create_command(tool, model, **kwargs):
	330	final_command = utils.extract_tool_executable_path(model, kwargs["default_executable_path"]) + '\n'
	331	final_command += kwargs["add_to_command_line"] + '\n'
	332	advanced_command_start = "#if $adv_opts.adv_opts_selector=='advanced':\n"
	333	advanced_command_end = "#end if"
	334	advanced_command = ""
	335	parameter_hardcoder = kwargs["parameter_hardcoder"]
	336
	337	found_output_parameter = False
	338	for param in utils.extract_and_flatten_parameters(model):
	339	if param.type is _OutFile:
	340	found_output_parameter = True
	341	command = ""
	342	param_name = utils.extract_param_name(param)
	343	command_line_prefix = utils.extract_command_line_prefix(param, model)
	344
	345	if param.name in kwargs["blacklisted_parameters"]:
	346	continue
	347
	348	hardcoded_value = parameter_hardcoder.get_hardcoded_value(param_name, model.name)
	349	if hardcoded_value:
	350	command += "%s %s\n" % (command_line_prefix, hardcoded_value)
	351	else:
	352	# parameter is neither blacklisted nor hardcoded...
	353	galaxy_parameter_name = get_galaxy_parameter_name(param)
	354	repeat_galaxy_parameter_name = get_repeat_galaxy_parameter_name(param)
	355
	356	# logic for ITEMLISTs
	357	if param.is_list:
	358	if param.type is _InFile:
	359	command += command_line_prefix + "\n"
	360	command += " #for token in $" + galaxy_parameter_name + ":\n"
	361	command += " $token\n"
	362	command += " #end for\n"
	363	else:
	364	command += "\n#if $" + repeat_galaxy_parameter_name + ":\n"
	365	command += command_line_prefix + "\n"
	366	command += " #for token in $" + repeat_galaxy_parameter_name + ":\n"
	367	command += " #if \" \" in str(token):\n"
	368	command += " \"$token." + galaxy_parameter_name + "\"\n"
	369	command += " #else\n"
	370	command += " $token." + galaxy_parameter_name + "\n"
	371	command += " #end if\n"
	372	command += " #end for\n"
	373	command += "#end if\n"
	374	# logic for other ITEMs
	375	else:
	376	if param.advanced and param.type is not _OutFile:
	377	actual_parameter = "$adv_opts.%s" % galaxy_parameter_name
	378	else:
	379	actual_parameter = "$%s" % galaxy_parameter_name
	380	# TODO only useful for text fields, integers or floats
	381	# not useful for choices, input fields ...
	382
	383	if not is_boolean_parameter(param) and type(param.restrictions) is _Choices :
	384	command += "#if " + actual_parameter + ":\n"
	385	command += " %s\n" % command_line_prefix
	386	command += " #if \" \" in str(" + actual_parameter + "):\n"
	387	command += " \"" + actual_parameter + "\"\n"
	388	command += " #else\n"
	389	command += " " + actual_parameter + "\n"
	390	command += " #end if\n"
	391	command += "#end if\n"
	392	elif is_boolean_parameter(param):
	393	command += "#if " + actual_parameter + ":\n"
	394	command += " %s\n" % command_line_prefix
	395	command += "#end if\n"
	396	elif TYPE_TO_GALAXY_TYPE[param.type] == 'text':
	397	command += "#if " + actual_parameter + ":\n"
	398	command += " %s " % command_line_prefix
	399	command += " \"" + actual_parameter + "\"\n"
	400	command += "#end if\n"
	401	else:
	402	command += "#if " + actual_parameter + ":\n"
	403	command += " %s " % command_line_prefix
	404	command += actual_parameter + "\n"
	405	command += "#end if\n"
	406
	407	if param.advanced and param.type is not _OutFile:
	408	advanced_command += " %s" % command
	409	else:
	410	final_command += command
	411
	412	if advanced_command:
	413	final_command += "%s%s%s\n" % (advanced_command_start, advanced_command, advanced_command_end)
	414
	415	if not found_output_parameter:
	416	final_command += "> $param_stdout\n"
	417
	418	command_node = add_child_node(tool, "command")
	419	command_node.text = final_command
	420
	421
	422	# creates the xml elements needed to import the needed macros files
	423	# and to "expand" the macros
	424	def expand_macros(tool, model, **kwargs):
	425	macros_node = add_child_node(tool, "macros")
	426	token_node = add_child_node(macros_node, "token")
	427	token_node.attrib["name"] = "@EXECUTABLE@"
	428	token_node.text = utils.extract_tool_executable_path(model, kwargs["default_executable_path"])
	429
	430	# add <import> nodes
	431	for macro_file_name in kwargs["macros_file_names"]:
	432	macro_file = open(macro_file_name)
	433	import_node = add_child_node(macros_node, "import")
	434	# do not add the path of the file, rather, just its basename
	435	import_node.text = os.path.basename(macro_file.name)
	436
	437	# add <expand> nodes
	438	for expand_macro in kwargs["macros_to_expand"]:
	439	expand_node = add_child_node(tool, "expand")
	440	expand_node.attrib["macro"] = expand_macro
	441
	442
	443	def get_galaxy_parameter_name(param):
	444	return "param_%s" % utils.extract_param_name(param).replace(":", "_").replace("-", "_")
	445
	446
	447	def get_input_with_same_restrictions(out_param, model, supported_file_formats):
	448	for param in utils.extract_and_flatten_parameters(model):
	449	if param.type is _InFile:
	450	if param.restrictions is not None:
	451	in_param_formats = get_supported_file_types(param.restrictions.formats, supported_file_formats)
	452	out_param_formats = get_supported_file_types(out_param.restrictions.formats, supported_file_formats)
	453	if in_param_formats == out_param_formats:
	454	return param
	455
	456
	457	def create_inputs(tool, model, **kwargs):
	458	inputs_node = SubElement(tool, "inputs")
	459
	460	# some suites (such as OpenMS) need some advanced options when handling inputs
	461	expand_advanced_node = add_child_node(tool, "expand", OrderedDict([("macro", ADVANCED_OPTIONS_MACRO_NAME)]))
	462	parameter_hardcoder = kwargs["parameter_hardcoder"]
	463
	464	# treat all non output-file parameters as inputs
	465	for param in utils.extract_and_flatten_parameters(model):
	466	# no need to show hardcoded parameters
	467	hardcoded_value = parameter_hardcoder.get_hardcoded_value(param.name, model.name)
	468	if param.name in kwargs["blacklisted_parameters"] or hardcoded_value:
	469	# let's not use an extra level of indentation and use NOP
	470	continue
	471	if param.type is not _OutFile:
	472	if param.advanced:
	473	if expand_advanced_node is not None:
	474	parent_node = expand_advanced_node
	475	else:
	476	# something went wrong... we are handling an advanced parameter and the
	477	# advanced input macro was not set... inform the user about it
	478	logger.info("The parameter %s has been set as advanced, but advanced_input_macro has "
	479	"not been set." % param.name, 1)
	480	# there is not much we can do, other than use the inputs_node as a parent node!
	481	parent_node = inputs_node
	482	else:
	483	parent_node = inputs_node
	484
	485	# for lists we need a repeat tag
	486	if param.is_list and param.type is not _InFile:
	487	rep_node = add_child_node(parent_node, "repeat")
	488	create_repeat_attribute_list(rep_node, param)
	489	parent_node = rep_node
	490
	491	param_node = add_child_node(parent_node, "param")
	492	create_param_attribute_list(param_node, param, kwargs["supported_file_formats"])
	493
	494	# advanced parameter selection should be at the end
	495	# and only available if an advanced parameter exists
	496	if expand_advanced_node is not None and len(expand_advanced_node) > 0:
	497	inputs_node.append(expand_advanced_node)
	498
	499
	500	def get_repeat_galaxy_parameter_name(param):
	501	return "rep_" + get_galaxy_parameter_name(param)
	502
	503
	504	def create_repeat_attribute_list(rep_node, param):
	505	rep_node.attrib["name"] = get_repeat_galaxy_parameter_name(param)
	506	if param.required:
	507	rep_node.attrib["min"] = "1"
	508	else:
	509	rep_node.attrib["min"] = "0"
	510	# for the ITEMLISTs which have LISTITEM children we only
	511	# need one parameter as it is given as a string
	512	if param.default is not None:
	513	rep_node.attrib["max"] = "1"
	514	rep_node.attrib["title"] = get_galaxy_parameter_name(param)
	515
	516
	517	def create_param_attribute_list(param_node, param, supported_file_formats):
	518	param_node.attrib["name"] = get_galaxy_parameter_name(param)
	519
	520	param_type = TYPE_TO_GALAXY_TYPE.get(param.type)
	521	if param_type is None:
	522	raise ModelError("Unrecognized parameter type %(type)s for parameter %(name)s"
	523	% {"type": param.type, "name": param.name})
	524
	525	if param.is_list:
	526	param_type = "text"
	527
	528	if is_selection_parameter(param):
	529	param_type = "select"
	530	if len(param.restrictions.choices) < 5:
	531	param_node.attrib["display"] = "radio"
	532
	533	if is_boolean_parameter(param):
	534	param_type = "boolean"
	535
	536	if param.type is _InFile:
	537	# assume it's just text unless restrictions are provided
	538	param_format = "txt"
	539	if param.restrictions is not None:
	540	# join all formats of the file, take mapping from supported_file if available for an entry
	541	if type(param.restrictions) is _FileFormat:
	542	param_format = ",".join([get_supported_file_type(i, supported_file_formats) if
	543	get_supported_file_type(i, supported_file_formats)
	544	else i for i in param.restrictions.formats])
	545	else:
	546	raise InvalidModelException("Expected 'file type' restrictions for input file [%(name)s], "
	547	"but instead got [%(type)s]"
	548	% {"name": param.name, "type": type(param.restrictions)})
	549
	550	param_node.attrib["type"] = "data"
	551	param_node.attrib["format"] = param_format
	552	# in the case of multiple input set multiple flag
	553	if param.is_list:
	554	param_node.attrib["multiple"] = "true"
	555
	556	else:
	557	param_node.attrib["type"] = param_type
	558
	559	# check for parameters with restricted values (which will correspond to a "select" in galaxy)
	560	if param.restrictions is not None:
	561	# it could be either _Choices or _NumericRange, with special case for boolean types
	562	if param_type == "boolean":
	563	create_boolean_parameter(param_node, param)
	564	elif type(param.restrictions) is _Choices:
	565	# create as many <option> elements as restriction values
	566	for choice in param.restrictions.choices:
	567	option_node = add_child_node(param_node, "option", OrderedDict([("value", str(choice))]))
	568	option_node.text = str(choice)
	569
	570	# preselect the default value
	571	if param.default == choice:
	572	option_node.attrib["selected"] = "true"
	573
	574	elif type(param.restrictions) is _NumericRange:
	575	if param.type is not int and param.type is not float:
	576	raise InvalidModelException("Expected either 'int' or 'float' in the numeric range restriction for "
	577	"parameter [%(name)s], but instead got [%(type)s]" %
	578	{"name": param.name, "type": type(param.restrictions)})
	579	# extract the min and max values and add them as attributes
	580	# validate the provided min and max values
	581	if param.restrictions.n_min is not None:
	582	param_node.attrib["min"] = str(param.restrictions.n_min)
	583	if param.restrictions.n_max is not None:
	584	param_node.attrib["max"] = str(param.restrictions.n_max)
	585	elif type(param.restrictions) is _FileFormat:
	586	param_node.attrib["format"] = ','.join([get_supported_file_type(i, supported_file_formats) if
	587	get_supported_file_type(i, supported_file_formats)
	588	else i for i in param.restrictions.formats])
	589	else:
	590	raise InvalidModelException("Unrecognized restriction type [%(type)s] for parameter [%(name)s]"
	591	% {"type": type(param.restrictions), "name": param.name})
	592
	593	if param_type == "select" and param.default in param.restrictions.choices:
	594	param_node.attrib["optional"] = "False"
	595	else:
	596	param_node.attrib["optional"] = str(not param.required)
	597
	598	if param_type == "text":
	599	# add size attribute... this is the length of a textbox field in Galaxy (it could also be 15x2, for instance)
	600	param_node.attrib["size"] = "30"
	601	# add sanitizer nodes, this is needed for special character like "["
	602	# which are used for example by FeatureFinderMultiplex
	603	sanitizer_node = SubElement(param_node, "sanitizer")
	604
	605	valid_node = SubElement(sanitizer_node, "valid", OrderedDict([("initial", "string.printable")]))
	606	add_child_node(valid_node, "remove", OrderedDict([("value", '\'')]))
	607	add_child_node(valid_node, "remove", OrderedDict([("value", '"')]))
	608
	609	# check for default value
	610	if param.default is not None and param.default is not _Null:
	611	if type(param.default) is list:
	612	# we ASSUME that a list of parameters looks like:
	613	# $ tool -ignore He Ar Xe
	614	# meaning, that, for example, Helium, Argon and Xenon will be ignored
	615	param_node.attrib["value"] = ' '.join(map(str, param.default))
	616
	617	elif param_type != "boolean":
	618	param_node.attrib["value"] = str(param.default)
	619
	620	else:
	621	# simple boolean with a default
	622	if param.default is True:
	623	param_node.attrib["checked"] = "true"
	624	else:
	625	if param.type is int or param.type is float:
	626	# galaxy requires "value" to be included for int/float
	627	# since no default was included, we need to figure out one in a clever way... but let the user know
	628	# that we are "thinking" for him/her
	629	logger.warning("Generating default value for parameter [%s]. "
	630	"Galaxy requires the attribute 'value' to be set for integer/floats. "
	631	"Edit the CTD file and provide a suitable default value." % param.name, 1)
	632	# check if there's a min/max and try to use them
	633	default_value = None
	634	if param.restrictions is not None:
	635	if type(param.restrictions) is _NumericRange:
	636	default_value = param.restrictions.n_min
	637	if default_value is None:
	638	default_value = param.restrictions.n_max
	639	if default_value is None:
	640	# no min/max provided... just use 0 and see what happens
	641	default_value = 0
	642	else:
	643	# should never be here, since we have validated this anyway...
	644	# this code is here just for documentation purposes
	645	# however, better safe than sorry!
	646	# (it could be that the code changes and then we have an ugly scenario)
	647	raise InvalidModelException("Expected a numeric range for parameter [%(name)s], "
	648	"but instead got [%(type)s]"
	649	% {"name": param.name, "type": type(param.restrictions)})
	650	else:
	651	# no restrictions and no default value provided...
	652	# make up something
	653	default_value = 0
	654	param_node.attrib["value"] = str(default_value)
	655
	656	label = "%s parameter" % param.name
	657	help_text = ""
	658
	659	if param.description is not None:
	660	label, help_text = generate_label_and_help(param.description)
	661
	662	param_node.attrib["label"] = label
	663	param_node.attrib["help"] = "(-%s)" % param.name + " " + help_text
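
The fallback order used above for an integer/float parameter without a default (range minimum, then range maximum, then 0) can be sketched in isolation. This is a minimal Python 3 sketch under stated assumptions: `NumericRange` is a hypothetical stand-in for CTDopts' `_NumericRange`, and `pick_default` is an illustrative name that does not exist in the converter.

```python
from collections import namedtuple

# Hypothetical stand-in for CTDopts' _NumericRange (only the two bounds).
NumericRange = namedtuple("NumericRange", ["n_min", "n_max"])


def pick_default(restrictions):
    """Return a fallback default: range minimum, else range maximum, else 0."""
    if restrictions is None:
        # no restrictions and no default: make something up
        return 0
    if not isinstance(restrictions, NumericRange):
        raise ValueError("expected a numeric range, got %r" % type(restrictions))
    for candidate in (restrictions.n_min, restrictions.n_max):
        if candidate is not None:
            return candidate
    # a range without bounds: fall back to 0 as well
    return 0


print(pick_default(NumericRange(n_min=None, n_max=10)))  # 10
```

Comparing against `None` rather than testing truthiness matters here: a lower bound of `0` is a valid default and must not be skipped.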
	664
	665
	666	def generate_label_and_help(desc):
	667	help_text = ""
	668	# This tag is found in some descriptions
	669	if not isinstance(desc, basestring):
	670	desc = str(desc)
	671	desc = desc.encode("utf8").replace("#br#", " <br>")
	672	# Get rid of dots in the end
	673	if desc.endswith("."):
	674	desc = desc.rstrip(".")
	675	# Check if first word is a normal word and make it uppercase
	676	if str(desc).find(" ") > -1:
	677	first_word, rest = str(desc).split(" ", 1)
	678	if str(first_word).islower():
	679	# check if label has a quotient of the form a/b
	680	if first_word.find("/") != 1:
	681	first_word = first_word.capitalize()
	682	desc = first_word + " " + rest
	683	label = desc.decode("utf8")
	684
	685	# Try to split the label if it is too long
	686	if len(desc) > 50:
	687	# find an example and put everything before in the label and the e.g. in the help
	688	if desc.find("e.g.") > 1 :
	689	label, help_text = desc.split("e.g.",1)
	690	help_text = "e.g." + help_text
	691	else:
	692	# find the end of the first sentence
	693	# look for ". " because some labels contain .file or something similar
	694	delimiter = ""
	695	if desc.find(". ") > 1 and desc.find("? ") > 1:
	696	if desc.find(". ") < desc.find("? "):
	697	delimiter = ". "
	698	else:
	699	delimiter = "? "
	700	elif desc.find(". ") > 1:
	701	delimiter = ". "
	702	elif desc.find("? ") > 1:
	703	delimiter = "? "
	704	if delimiter != "":
	705	label, help_text = desc.split(delimiter, 1)
	706
	707	# add the question mark back
	708	if delimiter == "? ":
	709	label += "? "
	710
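	711	# remove trailing linebreaks; note that rstrip('<br>') strips any trailing '<', 'b', 'r' or '>' character, not only the literal tag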
	712	label = label.rstrip().rstrip('<br>').rstrip()
	713	return label, help_text
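
`str.rstrip` treats its argument as a set of characters, so `rstrip('<br>')` also removes a trailing `b` or `r` that belongs to the label itself. The following Python 3 sketch shows one way to strip only the literal tag; `strip_trailing_br` is a hypothetical helper, not part of the converter.

```python
def strip_trailing_br(label):
    """Remove trailing whitespace and literal '<br>' tags, nothing else."""
    label = label.rstrip()
    while label.endswith("<br>"):
        label = label[:-len("<br>")].rstrip()
    return label


# rstrip with a character set eats the final "r" of "Number"
print("Number".rstrip("<br>"))            # Numbe
print(strip_trailing_br("Number <br>"))   # Number
```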
	714
	715
	716	# determines if the given choices are boolean (basically, if the possible values are yes/no, true/false)
	717	def is_boolean_parameter(param):
	718	# detect boolean selects of OpenMS
	719	if is_selection_parameter(param):
	720	if len(param.restrictions.choices) == 2:
	721	# check that default value is false to make sure it is an actual flag
	722	if "false" in param.restrictions.choices and \
	723	"true" in param.restrictions.choices and \
	724	param.default == "false":
	725	return True
	726	else:
	727	return param.type is bool
	728
	729
	730	# determines if there are choices for the parameter
	731	def is_selection_parameter(param):
	732	return type(param.restrictions) is _Choices
	733
	734
	735	def get_lowercase_list(some_list):
	736	lowercase_list = map(str, some_list)
	737	lowercase_list = map(string.lower, lowercase_list)
	738	lowercase_list = map(strip, lowercase_list)
	739	return lowercase_list
	740
	741
	742	# creates a galaxy boolean parameter type
	743	# this method assumes that param has restrictions, and that only two restrictions are present
	744	# (either yes/no or true/false)
	745	def create_boolean_parameter(param_node, param):
	746	# first, determine the 'truevalue' and the 'falsevalue'
	747	"""TODO: true and false values can be way more than 'true' and 'false'
	748	but for that we need CTD support
	749	"""
	750	# by default, 'true' and 'false' are handled as flags, like the verbose flag (i.e., -v)
	751	true_value = "-%s" % utils.extract_param_name(param)
	752	false_value = ""
	753	choices = get_lowercase_list(param.restrictions.choices)
	754	if "yes" in choices:
	755	true_value = "yes"
	756	false_value = "no"
	757	param_node.attrib["truevalue"] = true_value
	758	param_node.attrib["falsevalue"] = false_value
	759
	760	# set the checked attribute
	761	if param.default is not None:
	762	checked_value = "false"
	763	default = strip(string.lower(param.default))
	764	if default == "yes" or default == "true":
	765	checked_value = "true"
	766	param_node.attrib["checked"] = checked_value
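
The rule implemented by `create_boolean_parameter` can be restated as a pure function. The original relies on Python 2 (`string.lower`, `map` returning a list); the sketch below is Python 3, and `boolean_values` is a hypothetical name used only for illustration.

```python
def boolean_values(param_name, choices, default=None):
    """Return (truevalue, falsevalue, checked) for a two-choice parameter.

    checked is None when no default is given, meaning the attribute is omitted.
    """
    normalized = [str(choice).strip().lower() for choice in choices]
    if "yes" in normalized:
        true_value, false_value = "yes", "no"
    else:
        # true/false choices behave like a flag, such as -v
        true_value, false_value = "-%s" % param_name, ""
    checked = None
    if default is not None:
        is_true = str(default).strip().lower() in ("yes", "true")
        checked = "true" if is_true else "false"
    return true_value, false_value, checked


print(boolean_values("verbose", ["true", "false"], "false"))  # ('-verbose', '', 'false')
```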
	767
	768
	769	def create_outputs(parent, model, **kwargs):
	770	outputs_node = add_child_node(parent, "outputs")
	771	parameter_hardcoder = kwargs["parameter_hardcoder"]
	772
	773	for param in utils.extract_and_flatten_parameters(model):
	774
	775	# no need to show hardcoded parameters
	776	hardcoded_value = parameter_hardcoder.get_hardcoded_value(param.name, model.name)
	777	if param.name in kwargs["blacklisted_parameters"] or hardcoded_value:
	778	# let's not use an extra level of indentation and use NOP
	779	continue
	780	if param.type is _OutFile:
	781	create_output_node(outputs_node, param, model, kwargs["supported_file_formats"])
	782
	783	# If there are no outputs defined in the ctd the node will have no children
	784	# and the stdout will be used as output
	785	if len(outputs_node) == 0:
	786	add_child_node(outputs_node, "data",
	787	OrderedDict([("name", "param_stdout"), ("format", "txt"), ("label", "Output from stdout")]))
	788
	789
	790	def create_output_node(parent, param, model, supported_file_formats):
	791	data_node = add_child_node(parent, "data")
	792	data_node.attrib["name"] = get_galaxy_parameter_name(param)
	793
	794	data_format = "data"
	795	if param.restrictions is not None:
	796	if type(param.restrictions) is _FileFormat:
	797	# set the first data output node to the first file format
	798
	799	# check if there are formats that have not been registered yet...
	800	output = list()
	801	for format_name in param.restrictions.formats:
	802	if not format_name in supported_file_formats.keys():
	803	output.append(str(format_name))
	804
	805	# warn only if there is something to complain about
	806	if output:
	807	logger.warning("Parameter " + param.name + " has the following unsupported format(s):"
	808	+ ','.join(output), 1)
	809	data_format = ','.join(output)
	810
	811	formats = get_supported_file_types(param.restrictions.formats, supported_file_formats)
	812	try:
	813	data_format = formats.pop()
	814	except KeyError:
	815	# there is not much we can do, other than catching the exception
	816	pass
	817	# if there is more than one output file format, try to take the format from the input parameter
	818	if formats:
	819	corresponding_input = get_input_with_same_restrictions(param, model, supported_file_formats)
	820	if corresponding_input is not None:
	821	data_format = "input"
	822	data_node.attrib["metadata_source"] = get_galaxy_parameter_name(corresponding_input)
	823	else:
	824	raise InvalidModelException("Unrecognized restriction type [%(type)s] "
	825	"for output [%(name)s]" % {"type": type(param.restrictions),
	826	"name": param.name})
	827	data_node.attrib["format"] = data_format
	828
	829	# TODO: find a smarter label ?
	830	return data_node
	831
	832
	833	# Get the supported file format for one given format
	834	def get_supported_file_type(format_name, supported_file_formats):
	835	if format_name in supported_file_formats.keys():
	836	return supported_file_formats.get(format_name, DataType(format_name, format_name)).galaxy_extension
	837	else:
	838	return None
	839
	840
	841	def get_supported_file_types(formats, supported_file_formats):
	842	return set([supported_file_formats.get(format_name, DataType(format_name, format_name)).galaxy_extension
	843	for format_name in formats if format_name in supported_file_formats.keys()])
	844
	845
	846	def create_change_format_node(parent, data_formats, input_ref):
	847	# <change_format>
	848	# <when input="secondary_structure" value="true" format="txt"/>
	849	# </change_format>
	850	change_format_node = add_child_node(parent, "change_format")
	851	for data_format in data_formats:
	852	add_child_node(change_format_node, "when",
	853	OrderedDict([("input", input_ref), ("value", data_format), ("format", data_format)]))
	854
	855
	856	# adds a <help> node containing the help text extracted from the CTD model
	857	def create_help(tool, model):
	858	help_node = add_child_node(tool, "help")
	859	# TODO: do we need CDATA Section here?
	860	help_node.text = utils.extract_tool_help_text(model)
	861
	862
	863	# adds and returns a child node using the given name to the given parent node
	864	def add_child_node(parent_node, child_node_name, attributes=OrderedDict([])):
	865	child_node = SubElement(parent_node, child_node_name, attributes)
	866	return child_node

-2

~~galaxy/dist/conda/bld.bat~~

0		"%PYTHON%" setup.py install
1		if errorlevel 1 exit 1

-1

~~galaxy/dist/conda/build.sh~~

$PYTHON setup.py install

-28

~~galaxy/dist/conda/meta.yaml~~

0		package:
1		name: ctd2galaxy
2		version: "1.0"
3
4		source:
5		git_rev: v1.0
6		git_url: https://github.com/WorkflowConversion/CTD2Galaxy.git
7
8		build:
9		noarch_python: True
10
11		requirements:
12		build:
13		- python
14		- setuptools
15
16		run:
17		- python
18		- lxml
19		- ctdopts 1.0
20
21		test:
22		imports:
23		- CTDopts.CTDopts
24
25		about:
26		home: https://github.com/WorkflowConversion/CTD2Galaxy
27		license_file: LICENSE

-1389

~~galaxy/generator.py~~

0		#!/usr/bin/env python
1		# encoding: utf-8
2
3		"""
4		@author: delagarza
5		"""
6
7
8		import sys
9		import os
10		import traceback
11		import ntpath
12		import string
13
14		from argparse import ArgumentParser
15		from argparse import RawDescriptionHelpFormatter
16		from collections import OrderedDict
17		from string import strip
18		from lxml import etree
19		from lxml.etree import SubElement, Element, ElementTree, ParseError, parse
20
21		from CTDopts.CTDopts import CTDModel, _InFile, _OutFile, ParameterGroup, _Choices, _NumericRange, _FileFormat, \
22		ModelError, _Null
23
24		__all__ = []
25		__version__ = 1.0
26		__date__ = '2014-09-17'
27		__updated__ = '2016-05-09'
28
29		MESSAGE_INDENTATION_INCREMENT = 2
30
31		TYPE_TO_GALAXY_TYPE = {int: 'integer', float: 'float', str: 'text', bool: 'boolean', _InFile: 'data',
32		_OutFile: 'data', _Choices: 'select'}
33
34		STDIO_MACRO_NAME = "stdio"
35		REQUIREMENTS_MACRO_NAME = "requirements"
36		ADVANCED_OPTIONS_MACRO_NAME = "advanced_options"
37
38		REQUIRED_MACROS = [STDIO_MACRO_NAME, REQUIREMENTS_MACRO_NAME, ADVANCED_OPTIONS_MACRO_NAME]
39
40
41		class CLIError(Exception):
42		# Generic exception to raise and log different fatal errors.
43		def __init__(self, msg):
44		super(CLIError, self).__init__(msg)
45		self.msg = "E: %s" % msg
46
47		def __str__(self):
48		return self.msg
49
50		def __unicode__(self):
51		return self.msg
52
53
54		class InvalidModelException(ModelError):
55		def __init__(self, message):
56		super(InvalidModelException, self).__init__()
57		self.message = message
58
59		def __str__(self):
60		return self.message
61
62		def __repr__(self):
63		return self.message
64
65
66		class ApplicationException(Exception):
67		def __init__(self, msg):
68		super(ApplicationException, self).__init__(msg)
69		self.msg = msg
70
71		def __str__(self):
72		return self.msg
73
74		def __unicode__(self):
75		return self.msg
76
77
78		class ExitCode:
79		def __init__(self, code_range="", level="", description=None):
80		self.range = code_range
81		self.level = level
82		self.description = description
83
84
85		class DataType:
86		def __init__(self, extension, galaxy_extension=None, galaxy_type=None, mimetype=None):
87		self.extension = extension
88		self.galaxy_extension = galaxy_extension
89		self.galaxy_type = galaxy_type
90		self.mimetype = mimetype
91
92
93		class ParameterHardcoder:
94		def __init__(self):
95		# map whose keys are the composite names of parameters and tools in the following pattern:
96		# [ParameterName][separator][ToolName] -> HardcodedValue
97		# if the parameter applies to all tools, then the following pattern is used:
98		# [ParameterName] -> HardcodedValue
99
100		# examples (with the separator '!' set below):
101		# threads -> 24
102		# adapter!XtandemAdapter -> xtandem.exe
103		# adapter -> adapter.exe
104		self.separator = "!"
105		self.parameter_map = {}
106
107		# the most specific value will be returned in case of overlap
108		def get_hardcoded_value(self, parameter_name, tool_name):
109		# look for the value that would apply for all tools
110		generic_value = self.parameter_map.get(parameter_name, None)
111		specific_value = self.parameter_map.get(self.build_key(parameter_name, tool_name), None)
112		if specific_value is not None:
113		return specific_value
114
115		return generic_value
116
117		def register_parameter(self, parameter_name, parameter_value, tool_name=None):
118		self.parameter_map[self.build_key(parameter_name, tool_name)] = parameter_value
119
120		def build_key(self, parameter_name, tool_name):
121		if tool_name is None:
122		return parameter_name
123		return "%s%s%s" % (parameter_name, self.separator, tool_name)
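
The lookup rule of `ParameterHardcoder` (a tool-specific value wins over a generic one) can be restated as a small Python 3 sketch. `HardcodedValues` and its method names are hypothetical and exist only for this illustration; the key layout follows `build_key` above.

```python
class HardcodedValues:
    """Minimal restatement of the lookup rule: tool-specific beats generic."""

    def __init__(self, separator="!"):
        self.separator = separator
        self.values = {}

    def _key(self, parameter_name, tool_name=None):
        # generic entries are stored under the bare parameter name
        if tool_name is None:
            return parameter_name
        return "%s%s%s" % (parameter_name, self.separator, tool_name)

    def register(self, parameter_name, value, tool_name=None):
        self.values[self._key(parameter_name, tool_name)] = value

    def lookup(self, parameter_name, tool_name):
        specific = self.values.get(self._key(parameter_name, tool_name))
        if specific is not None:
            return specific
        return self.values.get(parameter_name)


hardcoded = HardcodedValues()
hardcoded.register("threads", "24")
hardcoded.register("threads", "8", tool_name="XTandemAdapter")
print(hardcoded.lookup("threads", "XTandemAdapter"))  # 8
print(hardcoded.lookup("threads", "OtherTool"))       # 24
```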
124
125
126		def main(argv=None): # IGNORE:C0111
127		# Command line options.
128		if argv is None:
129		argv = sys.argv
130		else:
131		sys.argv.extend(argv)
132
133		program_version = "v%s" % __version__
134		program_build_date = str(__updated__)
135		program_version_message = '%%(prog)s %s (%s)' % (program_version, program_build_date)
136		program_short_description = "CTD2Galaxy - A project from the GenericWorkflowNodes family " \
137		"(https://github.com/orgs/genericworkflownodes)"
138		program_usage = '''
139		USAGE:
140
141		I - Parsing a single CTD file and generating a Galaxy wrapper:
142
143		$ python generator.py -i input.ctd -o output.xml
144
145
146		II - Parsing all CTD files (files with .ctd and .xml extension) found in a given folder and
147		writing the converted Galaxy wrappers to a given folder:
148
149		$ python generator.py -i /home/user/*.ctd -o /home/user/galaxywrappers
150
151
152		III - Providing file formats, mimetypes
153
154		Galaxy supports the concept of file format in order to connect compatible ports, that is, input ports of a certain
155		data format will be able to receive data from a port from the same format. This converter allows you to provide
156		a personalized file in which you can relate the CTD data formats with supported Galaxy data formats. The layout of
157		this file consists of lines, each of one, three or four columns separated by any amount of whitespace. The content
158		of each column is as follows:
159
160		* 1st column: file extension
161		* 2nd column: data type, as listed in Galaxy
162		* 3rd column: full-named Galaxy data type, as it will appear on datatypes_conf.xml
163		* 4th column: mimetype (optional)
164
165		The following is an example of a valid "file formats" file:
166
167		########################################## FILE FORMATS example ##########################################
168		# Every line starting with a # will be handled as a comment and will not be parsed.
169		# The first column is the file format as given in the CTD and second column is the Galaxy data format.
170		# The second, third and fourth columns can be left empty if the data type has already been registered
171		# in Galaxy, otherwise, all but the mimetype must be provided.
172
173		# CTD type # Galaxy type # Long Galaxy data type # Mimetype
174		csv tabular galaxy.datatypes.data:Text
175		fasta
176		ini txt galaxy.datatypes.data:Text
177		txt
178		idxml txt galaxy.datatypes.xml:GenericXml application/xml
179		options txt galaxy.datatypes.data:Text
180		grid grid galaxy.datatypes.data:Grid
181
182		##########################################################################################################
183
184		Note that each line consists precisely of either one, three or four columns. In the case of data types already
185		registered in Galaxy (such as fasta and txt in the above example), only the first column is needed. In the case of
186		data types that haven't been yet registered in Galaxy, the first three columns are needed (mimetype is optional).
187
188		For information about Galaxy data types and subclasses, see the following page:
189		https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes
190
191
192		IV - Hardcoding parameters
193
194		It is possible to hardcode parameters. This makes sense if you want to set a tool in Galaxy in 'quiet' mode or if
195		your tools support multi-threading and accept the number of threads via a parameter, without giving the end user the
196		chance to change the values for these parameters.
197
198		In order to generate hardcoded parameters, you need to provide a simple file. Each line of this file contains two
199		or three columns separated by whitespace. Any line starting with a '#' will be ignored. The first column contains
200		the name of the parameter, the second column contains the value that will always be set for this parameter. The
201		first two columns are mandatory.
202
203		If the parameter is to be hardcoded only for a set of tools, then a third column can be added. This column includes
204		a comma-separated list of tool names for which the parameter will be hardcoded. If a third column is not included,
205		then all processed tools containing the given parameter will get a hardcoded value for it.
206
207		The following is an example of a valid file:
208
209		##################################### HARDCODED PARAMETERS example #####################################
210		# Every line starting with a # will be handled as a comment and will not be parsed.
211		# The first column is the name of the parameter and the second column is the value that will be used.
212
213		# Parameter name # Value # Tool(s)
214		threads \${GALAXY_SLOTS:-24}
215		mode quiet
216		xtandem_executable xtandem XTandemAdapter
217		verbosity high Foo, Bar
218
219		#########################################################################################################
220
221		For all tools, using the above file will produce a <command> similar to:
222
223		[tool_name] ... -threads \${GALAXY_SLOTS:-24} -mode quiet ...
224
225		For XTandemAdapter, the <command> will be similar to:
226
227		XTandemAdapter ... -threads \${GALAXY_SLOTS:-24} -mode quiet -xtandem_executable xtandem ...
228
229		And for tools Foo and Bar, the <command> will be similar to:
230
231		Foo ... -threads \${GALAXY_SLOTS:-24} -mode quiet -verbosity high ...
232
233
234		V - Control which tools will be converted
235
236		Sometimes only a subset of CTDs needs to be converted. It is possible to either explicitly specify which tools will
237		be converted or which tools will not be converted.
238
239		The value of the -s/--skip-tools parameter is a file in which each line will be interpreted as the name of a tool
240		that will not be converted. Conversely, the value of the -r/--required-tools is a file in which each line will be
241		interpreted as a tool that is required. Only one of these parameters can be specified at a given time.
242
243		The format of both files is exactly the same. As stated before, each line will be interpreted as the name of a tool;
244		any line starting with a '#' will be ignored.
245
246		'''
247		program_license = '''%(short_description)s
248		Copyright 2015, Luis de la Garza
249
250		Licensed under the Apache License, Version 2.0 (the "License");
251		you may not use this file except in compliance with the License.
252		You may obtain a copy of the License at
253
254		http://www.apache.org/licenses/LICENSE-2.0
255
256		Unless required by applicable law or agreed to in writing, software
257		distributed under the License is distributed on an "AS IS" BASIS,
258		WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
259		See the License for the specific language governing permissions and
260		limitations under the License.
261
262		%(usage)s
263		''' % {'short_description': program_short_description, 'usage': program_usage}
264
265		try:
266		# Setup argument parser
267		parser = ArgumentParser(prog="CTD2Galaxy", description=program_license,
268		formatter_class=RawDescriptionHelpFormatter, add_help=True)
269		parser.add_argument("-i", "--input", dest="input_files", default=[], required=True, nargs="+", action="append",
270		help="List of CTD files to convert.")
271		parser.add_argument("-o", "--output-destination", dest="output_destination", required=True,
272		help="If multiple input files are given, then a folder in which all generated "
273		"XMLs will be written is expected; "
274		"if a single input file is given, then a destination file is expected.")
275		parser.add_argument("-f", "--formats-file", dest="formats_file",
276		help="File containing the supported file formats. Run with '-h' or '--help' to see a "
277		"brief example on the layout of this file.", default=None, required=False)
278		parser.add_argument("-a", "--add-to-command-line", dest="add_to_command_line",
279		help="Adds content to the command line", default="", required=False)
280		parser.add_argument("-d", "--datatypes-destination", dest="data_types_destination",
281		help="Specify the location of a datatypes_conf.xml to modify and add the registered "
282		"data types. If the provided destination does not exist, a new file will be created.",
283		default=None, required=False)
284		parser.add_argument("-x", "--default-executable-path", dest="default_executable_path",
285		help="Use this executable path when <executablePath> is not present in the CTD",
286		default=None, required=False)
287		parser.add_argument("-b", "--blacklist-parameters", dest="blacklisted_parameters", default=[], nargs="+", action="append",
288		help="List of parameters that will be ignored and won't appear on the galaxy stub",
289		required=False)
290		parser.add_argument("-c", "--default-category", dest="default_category", default="DEFAULT", required=False,
291		help="Default category to use for tools lacking a category when generating tool_conf.xml")
292		parser.add_argument("-t", "--tool-conf-destination", dest="tool_conf_destination", default=None, required=False,
293		help="Specify the location of an existing tool_conf.xml that will be modified to include "
294		"the converted tools. If the provided destination does not exist, a new file will "
295		"be created.")
296		parser.add_argument("-g", "--galaxy-tool-path", dest="galaxy_tool_path", default=None, required=False,
297		help="The path that will be prepended to the file names when generating tool_conf.xml")
298		parser.add_argument("-r", "--required-tools", dest="required_tools_file", default=None, required=False,
299		help="Each line of the file will be interpreted as a tool name that needs translation. "
300		"Run with '-h' or '--help' to see a brief example on the format of this file.")
301		parser.add_argument("-s", "--skip-tools", dest="skip_tools_file", default=None, required=False,
302		help="File containing a list of tools for which a Galaxy stub will not be generated. "
303		"Run with '-h' or '--help' to see a brief example on the format of this file.")
304		parser.add_argument("-m", "--macros", dest="macros_files", default=[], nargs="*",
305		action="append", required=None, help="Import the additional given file(s) as macros. "
306		"The macros stdio, requirements and advanced_options are required. Please see "
307		"macros.xml for an example of a valid macros file. All defined macros will be imported.")
308		parser.add_argument("-p", "--hardcoded-parameters", dest="hardcoded_parameters", default=None, required=False,
309		help="File containing hardcoded values for the given parameters. Run with '-h' or '--help' "
310		"to see a brief example on the format of this file.")
311		parser.add_argument("-v", "--validation-schema", dest="xsd_location", default=None, required=False,
312		help="Location of the schema to use to validate CTDs.")
313
314		# TODO: add verbosity, maybe?
315		parser.add_argument("-V", "--version", action='version', version=program_version_message)
316
317		# Process arguments
318		args = parser.parse_args()
319
320		# validate and prepare the passed arguments
321		validate_and_prepare_args(args)
322
323		# extract the names of the macros and check that we have found the ones we need
324		macros_to_expand = parse_macros_files(args.macros_files)
325
326		# parse the given supported file-formats file
327		supported_file_formats = parse_file_formats(args.formats_file)
328
329		# parse the hardcoded parameters file
330		parameter_hardcoder = parse_hardcoded_parameters(args.hardcoded_parameters)
331
332		# parse the skip/required tools files
333		skip_tools = parse_tools_list_file(args.skip_tools_file)
334		required_tools = parse_tools_list_file(args.required_tools_file)
335
336		#if verbose > 0:
337		# print("Verbose mode on")
338		parsed_models = convert(args.input_files,
339		args.output_destination,
340		supported_file_formats=supported_file_formats,
341		default_executable_path=args.default_executable_path,
342		add_to_command_line=args.add_to_command_line,
343		blacklisted_parameters=args.blacklisted_parameters,
344		required_tools=required_tools,
345		skip_tools=skip_tools,
346		macros_file_names=args.macros_files,
347		macros_to_expand=macros_to_expand,
348		parameter_hardcoder=parameter_hardcoder,
349		xsd_location=args.xsd_location)
350
351		#TODO: add some sort of warning if a macro that doesn't exist is to be expanded
352
353		# it is not needed to copy the macros files, since the user has provided them
354
355		# generation of galaxy stubs is ready... now, let's see if we need to generate a tool_conf.xml
356		if args.tool_conf_destination is not None:
357		generate_tool_conf(parsed_models, args.tool_conf_destination,
358		args.galaxy_tool_path, args.default_category)
359
360		# now datatypes_conf.xml
361		if args.data_types_destination is not None:
362		generate_data_type_conf(supported_file_formats, args.data_types_destination)
363
364		return 0
365
366		except KeyboardInterrupt:
367		# handle keyboard interrupt
368		return 0
369		except ApplicationException, e:
370		error("CTD2Galaxy could not complete the requested operation.", 0)
371		error("Reason: " + e.msg, 0)
372		return 1
373		except ModelError, e:
374		error("There seems to be a problem with one of your input CTDs.", 0)
375		error("Reason: " + e.msg, 0)
376		return 1
377		except Exception, e:
378		traceback.print_exc()
379		return 2
380
381
382		def parse_tools_list_file(tools_list_file):
383		tools_list = None
384		if tools_list_file is not None:
385		tools_list = []
386		with open(tools_list_file) as f:
387		for line in f:
388		if line is None or not line.strip() or line.strip().startswith("#"):
389		continue
390		else:
391		tools_list.append(line.strip())
392
393		return tools_list
394
395
396		def parse_macros_files(macros_file_names):
397		macros_to_expand = set()
398
399		for macros_file_name in macros_file_names:
400		try:
401		macros_file = open(macros_file_name)
402		info("Loading macros from %s" % macros_file_name, 0)
403		root = parse(macros_file).getroot()
404		for xml_element in root.findall("xml"):
405		name = xml_element.attrib["name"]
406		if name in macros_to_expand:
407		warning("Macro %s has already been found. Duplicate found in file %s." %
408		(name, macros_file_name), 0)
409		else:
410		info("Macro %s found" % name, 1)
411		macros_to_expand.add(name)
412		except ParseError, e:
413		raise ApplicationException("The macros file " + macros_file_name + " could not be parsed. Cause: " +
414		str(e))
415		except IOError, e:
416		raise ApplicationException("The macros file " + macros_file_name + " could not be opened. Cause: " +
417		str(e))
418
419		# we depend on "stdio", "requirements" and "advanced_options" to exist in at least one of the given macros files
420		missing_needed_macros = []
421		for required_macro in REQUIRED_MACROS:
422		if required_macro not in macros_to_expand:
423		missing_needed_macros.append(required_macro)
424
425		if missing_needed_macros:
426		raise ApplicationException(
427		"The following required macro(s) were not found in any of the given macros files: %s, "
428		"see sample_files/macros.xml for an example of a valid macros file."
429		% ", ".join(missing_needed_macros))
430
431		# we do not need to "expand" the advanced_options macro
432		macros_to_expand.remove(ADVANCED_OPTIONS_MACRO_NAME)
433		return macros_to_expand
434
435
436		def parse_hardcoded_parameters(hardcoded_parameters_file):
437		parameter_hardcoder = ParameterHardcoder()
438		if hardcoded_parameters_file is not None:
439		line_number = 0
440		with open(hardcoded_parameters_file) as f:
441		for line in f:
442		line_number += 1
443		if line is None or not line.strip() or line.strip().startswith("#"):
444		pass
445		else:
446		# the third column must be obtained as a whole, and not split
447		parsed_hardcoded_parameter = line.strip().split(None, 2)
448		# valid lines contain two or three columns
449		if len(parsed_hardcoded_parameter) != 2 and len(parsed_hardcoded_parameter) != 3:
450		warning("Invalid line at line number %d of the given hardcoded parameters file. Line will be "
451		"ignored:\n%s" % (line_number, line), 0)
452		continue
453
454		parameter_name = parsed_hardcoded_parameter[0]
455		hardcoded_value = parsed_hardcoded_parameter[1]
456		tool_names = None
457		if len(parsed_hardcoded_parameter) == 3:
458		tool_names = parsed_hardcoded_parameter[2].split(',')
459		if tool_names:
460		for tool_name in tool_names:
461		parameter_hardcoder.register_parameter(parameter_name, hardcoded_value, tool_name.strip())
462		else:
463		parameter_hardcoder.register_parameter(parameter_name, hardcoded_value)
464
465		return parameter_hardcoder
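
The per-line parsing done by `parse_hardcoded_parameters` hinges on `split(None, 2)`, which yields at most three columns and keeps the tool list (which may contain spaces after commas) in one piece. A minimal Python 3 sketch of that step follows; `parse_hardcoded_line` is a hypothetical helper, not a function of the converter.

```python
def parse_hardcoded_line(line):
    """Return (name, value, tool_names), or None for blank, comment or invalid lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # at most three columns; the third one keeps its internal whitespace
    columns = stripped.split(None, 2)
    if len(columns) < 2:
        return None
    tool_names = None
    if len(columns) == 3:
        tool_names = [name.strip() for name in columns[2].split(",")]
    return columns[0], columns[1], tool_names


print(parse_hardcoded_line("verbosity high Foo, Bar"))
# ('verbosity', 'high', ['Foo', 'Bar'])
```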
466
467
468		def parse_file_formats(formats_file):
469		supported_formats = {}
470		if formats_file is not None:
471		line_number = 0
472		with open(formats_file) as f:
473		for line in f:
474		line_number += 1
475		if line is None or not line.strip() or line.strip().startswith("#"):
476		# ignore (it'd be weird to have something like:
477		# if line is not None and not (not line.strip()) ...
478		pass
479		else:
480		# not an empty line, no comment
481		# strip the line and split by whitespace
482		parsed_formats = line.strip().split()
483		# valid lines contain either one, three or four columns
484		if not (len(parsed_formats) == 1 or len(parsed_formats) == 3 or len(parsed_formats) == 4):
485		warning("Invalid line at line number %d of the given formats file. Line will be ignored:\n%s" %
486		(line_number, line), 0)
487		# ignore the line
488		continue
489		elif len(parsed_formats) == 1:
490		supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[0])
491		else:
492		mimetype = None
493		# check if mimetype was provided
494		if len(parsed_formats) == 4:
495		mimetype = parsed_formats[3]
496		supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[1],
497		parsed_formats[2], mimetype)
498		return supported_formats
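
The column rules of the formats file (one, three or four columns per line) can be sketched as a single-line parser. This is a Python 3 sketch under stated assumptions: the `DataType` namedtuple below is a hypothetical stand-in for the `DataType` class of this file, and `parse_format_line` is an illustrative name.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the fields of the DataType class above.
DataType = namedtuple("DataType", ["extension", "galaxy_extension", "galaxy_type", "mimetype"])


def parse_format_line(line):
    """Return a DataType, or None for blank, comment or invalid lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    columns = stripped.split()
    if len(columns) == 1:
        # already registered in Galaxy: the extension doubles as Galaxy type
        return DataType(columns[0], columns[0], None, None)
    if len(columns) in (3, 4):
        mimetype = columns[3] if len(columns) == 4 else None
        return DataType(columns[0], columns[1], columns[2], mimetype)
    return None


print(parse_format_line("idxml txt galaxy.datatypes.xml:GenericXml application/xml").mimetype)
# application/xml
```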
499
500
501		def validate_and_prepare_args(args):
502		# check that only one of skip_tools_file and required_tools_file has been provided
503		if args.skip_tools_file is not None and args.required_tools_file is not None:
504		raise ApplicationException(
505		"You have provided both a file with tools to ignore and a file with required tools.\n"
506		"Only one of -s/--skip-tools, -r/--required-tools can be provided.")
507
508		# first, we convert all list of lists in args to flat lists
509		lists_to_flatten = ["input_files", "blacklisted_parameters", "macros_files"]
510		for list_to_flatten in lists_to_flatten:
511		setattr(args, list_to_flatten, [item for sub_list in getattr(args, list_to_flatten) for item in sub_list])
512
513		# if input is a single file, we expect output to be a file (and not a dir that already exists)
514		if len(args.input_files) == 1:
515		if os.path.isdir(args.output_destination):
516		raise ApplicationException("If a single input file is provided, output (%s) is expected to be a file "
517		"and not a folder.\n" % args.output_destination)
518
519		# if input is a list of files, we expect output to be a folder
520		if len(args.input_files) > 1:
521		if not os.path.isdir(args.output_destination):
522		raise ApplicationException("If several input files are provided, output (%s) is expected to be an "
523		"existing directory.\n" % args.output_destination)
524
525		# check that the provided input files, if provided, contain a valid file path
526		input_variables_to_check = ["skip_tools_file", "required_tools_file", "macros_files", "xsd_location",
527		"input_files", "formats_file", "hardcoded_parameters"]
528
529		for variable_name in input_variables_to_check:
530		paths_to_check = []
531		# check if we are handling a single file or a list of files
532		member_value = getattr(args, variable_name)
533		if member_value is not None:
534		if isinstance(member_value, list):
535		for file_name in member_value:
536		paths_to_check.append(strip(str(file_name)))
537		else:
538		paths_to_check.append(strip(str(member_value)))
539
540		for path_to_check in paths_to_check:
541		if not os.path.isfile(path_to_check) or not os.path.exists(path_to_check):
542		raise ApplicationException(
543		"The provided input file (%s) does not exist or is not a valid file path."
544		% path_to_check)
545
546		# check that the provided output files, if provided, contain a valid file path (i.e., not a folder)
547		output_variables_to_check = ["data_types_destination", "tool_conf_destination"]
548
549		for variable_name in output_variables_to_check:
550		file_name = getattr(args, variable_name)
551		if file_name is not None and os.path.isdir(file_name):
552		raise ApplicationException("The provided output file name (%s) points to a directory." % file_name)
553
554		if not args.macros_files:
555		# list is empty, provide the default value
556		warning("Using default macros from macros.xml", 0)
557		args.macros_files = ["macros.xml"]
558
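The flattening step in `validate_and_prepare_args` is needed because options that can be repeated and also take several values each arrive as a list of lists. A small sketch of that situation follows; the `-i/--input` option and the helper names are assumptions made for illustration, not taken from this diff.

```python
import argparse


def flatten(list_of_lists):
    """Turn [[a, b], [c]] into [a, b, c], as done for input_files and friends."""
    return [item for sub_list in list_of_lists for item in sub_list]


def parse_input_files(argv):
    parser = argparse.ArgumentParser()
    # hypothetical option: repeatable (append) and multi-valued (nargs="+")
    parser.add_argument("-i", "--input", dest="input_files",
                        action="append", nargs="+", default=[])
    args = parser.parse_args(argv)
    # args.input_files is now a list of lists, one inner list per -i occurrence
    return flatten(args.input_files)
```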
559
560		def convert(input_files, output_destination, **kwargs):
561		# first, generate a model
562		is_converting_multiple_ctds = len(input_files) > 1
563		parsed_models = []
564		schema = None
565		if kwargs["xsd_location"] is not None:
566		try:
567		info("Loading validation schema from %s" % kwargs["xsd_location"], 0)
568		schema = etree.XMLSchema(etree.parse(kwargs["xsd_location"]))
569		except Exception, e:
570		error("Could not load validation schema %s. Reason: %s" % (kwargs["xsd_location"], str(e)), 0)
571		else:
572		info("Validation against a schema has not been enabled.", 0)
573		for input_file in input_files:
574		try:
575		if schema is not None:
576		validate_against_schema(input_file, schema)
577		model = CTDModel(from_file=input_file)
578		except Exception, e:
579		error(str(e), 1)
580		continue
581
582		if kwargs["skip_tools"] is not None and model.name in kwargs["skip_tools"]:
583		info("Skipping tool %s" % model.name, 0)
584		continue
585		elif kwargs["required_tools"] is not None and model.name not in kwargs["required_tools"]:
586		info("Tool %s is not required, skipping it" % model.name, 0)
587		continue
588		else:
589		info("Converting from %s " % input_file, 0)
590		tool = create_tool(model)
591		write_header(tool, model)
592		create_description(tool, model)
593		expand_macros(tool, model, **kwargs)
594		create_command(tool, model, **kwargs)
595		create_inputs(tool, model, **kwargs)
596		create_outputs(tool, model, **kwargs)
597		create_help(tool, model)
598
599		# finally, serialize the tool
600		output_file = output_destination
601		# if multiple inputs are being converted,
602		# then we need to generate a different output_file for each input
603		if is_converting_multiple_ctds:
604		output_file = os.path.join(output_file, get_filename_without_suffix(input_file) + ".xml")
605		# wrap our tool element into a tree to be able to serialize it
606		tree = ElementTree(tool)
607		tree.write(open(output_file, 'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
608		# let's use model to hold the name of the output file
609		parsed_models.append([model, get_filename(output_file)])
610
611		return parsed_models
612
613
614		# validates a ctd file against the schema
615		def validate_against_schema(ctd_file, schema):
616		try:
617		parser = etree.XMLParser(schema=schema)
618		etree.parse(ctd_file, parser=parser)
619		except etree.XMLSyntaxError, e:
620		raise ApplicationException("Input ctd file %s is not valid. Reason: %s" % (ctd_file, str(e)))
621
622
623		def write_header(tool, model):
624		tool.addprevious(etree.Comment(
625		"This is a configuration file for the integration of a tool into Galaxy (https://galaxyproject.org/). "
626		"This file was automatically generated using CTD2Galaxy."))
627		tool.addprevious(etree.Comment('Proposed Tool Section: [%s]' % model.opt_attribs.get("category", "")))
628
629
630		def generate_tool_conf(parsed_models, tool_conf_destination, galaxy_tool_path, default_category):
631		# for each category, we keep a list of models corresponding to it
632		categories_to_tools = dict()
633		for model in parsed_models:
634		category = strip(model[0].opt_attribs.get("category", ""))
635		if not category.strip():
636		category = default_category
637		if category not in categories_to_tools:
638		categories_to_tools[category] = []
639		categories_to_tools[category].append(model[1])
640
641		# at this point, we should have a map for all categories->tools
642		toolbox_node = Element("toolbox")
643
644		if galaxy_tool_path is not None and not galaxy_tool_path.strip().endswith("/"):
645		galaxy_tool_path = galaxy_tool_path.strip() + "/"
646		if galaxy_tool_path is None:
647		galaxy_tool_path = ""
648
649		for category, file_names in categories_to_tools.iteritems():
650		section_node = add_child_node(toolbox_node, "section")
651		section_node.attrib["id"] = "section-id-" + "".join(category.split())
652		section_node.attrib["name"] = category
653
654		for filename in file_names:
655		tool_node = add_child_node(section_node, "tool")
656		tool_node.attrib["file"] = galaxy_tool_path + filename
657
658		toolconf_tree = ElementTree(toolbox_node)
659		toolconf_tree.write(open(tool_conf_destination,'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
660		info("Generated Galaxy tool_conf.xml in %s" % tool_conf_destination, 0)
661
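The category-to-section grouping done by `generate_tool_conf` can be tried out with the standard library alone. The sketch below uses `xml.etree.ElementTree` instead of `lxml`, and `build_toolbox` is a hypothetical name; it takes plain `(category, file_name)` pairs rather than parsed models.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict


def build_toolbox(tools, default_category="Default"):
    """Group (category, file_name) pairs into <section> elements of a <toolbox>."""
    categories = OrderedDict()
    for category, file_name in tools:
        # empty or missing categories fall back to the default one
        category = (category or "").strip() or default_category
        categories.setdefault(category, []).append(file_name)

    toolbox = ET.Element("toolbox")
    for category, file_names in categories.items():
        section = ET.SubElement(toolbox, "section")
        # the id is the category name with all whitespace removed
        section.set("id", "section-id-" + "".join(category.split()))
        section.set("name", category)
        for file_name in file_names:
            ET.SubElement(section, "tool").set("file", file_name)
    return toolbox
```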
662
663		def generate_data_type_conf(supported_file_formats, data_types_destination):
664		data_types_node = Element("datatypes")
665		registration_node = add_child_node(data_types_node, "registration")
666		registration_node.attrib["converters_path"] = "lib/galaxy/datatypes/converters"
667		registration_node.attrib["display_path"] = "display_applications"
668
669		for format_name in supported_file_formats:
670		data_type = supported_file_formats[format_name]
671		# add only if it's a data type that does not exist in Galaxy
672		if data_type.galaxy_type is not None:
673		data_type_node = add_child_node(registration_node, "datatype")
674		# we know galaxy_extension is not None
675		data_type_node.attrib["extension"] = data_type.galaxy_extension
676		data_type_node.attrib["type"] = data_type.galaxy_type
677		if data_type.mimetype is not None:
678		data_type_node.attrib["mimetype"] = data_type.mimetype
679
680		data_types_tree = ElementTree(data_types_node)
681		data_types_tree.write(open(data_types_destination,'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
682		info("Generated Galaxy datatypes_conf.xml in %s" % data_types_destination, 0)
683
684
685		# taken from
686		# http://stackoverflow.com/questions/8384737/python-extract-file-name-from-path-no-matter-what-the-os-path-format
687		def get_filename(path):
688		head, tail = ntpath.split(path)
689		return tail or ntpath.basename(head)
690
691
692		def get_filename_without_suffix(path):
693		root, ext = os.path.splitext(os.path.basename(path))
694		return root
695
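The two filename helpers above behave as follows. `ntpath` is used because it accepts both `/` and `\` as separators, so the same call works for POSIX-style and Windows-style paths; the example paths are made up.

```python
import ntpath
import os


def get_filename(path):
    # ntpath.split understands both "/" and "\\" separators
    head, tail = ntpath.split(path)
    # a trailing separator leaves tail empty, so fall back to the last part of head
    return tail or ntpath.basename(head)


def get_filename_without_suffix(path):
    root, _ext = os.path.splitext(os.path.basename(path))
    return root


print(get_filename("/tools/FileFilter.xml"))
print(get_filename("C:\\tools\\FileFilter.xml"))
print(get_filename("/tools/out/"))
print(get_filename_without_suffix("/ctds/FileFilter.ctd"))
```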
696
697		def create_tool(model):
698		return Element("tool", OrderedDict([("id", model.name), ("name", model.name), ("version", model.version)]))
699
700
701		def create_description(tool, model):
702		if "description" in model.opt_attribs.keys() and model.opt_attribs["description"] is not None:
703		description = SubElement(tool,"description")
704		description.text = model.opt_attribs["description"]
705
706
707		def get_param_cli_name(param, model):
708		# we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
709		if type(param.parent) == ParameterGroup:
710		if not hasattr(param.parent.parent, 'parent'):
711		return resolve_param_mapping(param, model)
712		elif not hasattr(param.parent.parent.parent, 'parent'):
713		return resolve_param_mapping(param, model)
714		else:
715		if model.cli:
716		warning("Using nested parameter sections (NODE elements) is not compatible with <cli>", 1)
717		return get_param_name(param.parent) + ":" + resolve_param_mapping(param, model)
718		else:
719		return resolve_param_mapping(param, model)
720
721
722		def get_param_name(param):
723		# we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
724		if type(param.parent) == ParameterGroup:
725		if not hasattr(param.parent.parent, 'parent'):
726		return param.name
727		elif not hasattr(param.parent.parent.parent, 'parent'):
728		return param.name
729		else:
730		return get_param_name(param.parent) + ":" + param.name
731		else:
732		return param.name
733
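The colon-separated naming rule in `get_param_name` is easier to follow on a toy tree. The sketch below is a simplification under one stated assumption: the root group's `parent` is `None`, which is what makes the `hasattr(..., 'parent')` checks above fail for the two topmost levels. `Node` is a hypothetical stand-in for both parameters and parameter groups.

```python
class Node(object):
    """Hypothetical stand-in for a CTD parameter or parameter group."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def get_param_name(node):
    parent = node.parent
    grandparent = getattr(parent, "parent", None)
    # directly under the root group, or under a group whose parent is the root:
    # the two topmost levels never show up in the name
    if grandparent is None or getattr(grandparent, "parent", None) is None:
        return node.name
    # deeper nesting: prefix with the (recursively resolved) group name
    return get_param_name(parent) + ":" + node.name


root = Node("root")
tool = Node("FileFilter", root)
algorithm = Node("algorithm", tool)
peak = Node("peak", algorithm)

print(get_param_name(Node("threshold", tool)))
print(get_param_name(Node("window", algorithm)))
print(get_param_name(Node("width", peak)))
```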
734
735		# some parameters are mapped to command line options, this method helps resolve those mappings, if any
736		def resolve_param_mapping(param, model):
737		# go through all mappings and find if the given param appears as a reference name in a mapping element
738		param_mapping = None
739		for cli_element in model.cli:
740		for mapping_element in cli_element.mappings:
741		if mapping_element.reference_name == param.name:
742		if param_mapping is not None:
743		warning("The parameter %s has more than one mapping in the <cli> section. "
744		"The first found mapping, %s, will be used." % (param.name, param_mapping), 1)
745		else:
746		param_mapping = cli_element.option_identifier
747
748		return param_mapping if param_mapping is not None else param.name
749
750		def create_command(tool, model, **kwargs):
751		final_command = get_tool_executable_path(model, kwargs["default_executable_path"]) + '\n'
752		final_command += kwargs["add_to_command_line"] + '\n'
753		advanced_command_start = "#if $adv_opts.adv_opts_selector=='advanced':\n"
754		advanced_command_end = '#end if'
755		advanced_command = ''
756		parameter_hardcoder = kwargs["parameter_hardcoder"]
757
758		found_output_parameter = False
759		for param in extract_parameters(model):
760		if param.type is _OutFile:
761		found_output_parameter = True
762		command = ''
763		param_name = get_param_name(param)
764		param_cli_name = get_param_cli_name(param, model)
765		if param_name == param_cli_name:
766		# there was no mapping, so for the cli name we will use a '-' in the prefix
767		param_cli_name = '-' + param_name
768
769		if param.name in kwargs["blacklisted_parameters"]:
770		continue
771
772		hardcoded_value = parameter_hardcoder.get_hardcoded_value(param_name, model.name)
773		if hardcoded_value:
774		command += '%s %s\n' % (param_cli_name, hardcoded_value)
775		else:
776		# parameter is neither blacklisted nor hardcoded...
777		galaxy_parameter_name = get_galaxy_parameter_name(param)
778		repeat_galaxy_parameter_name = get_repeat_galaxy_parameter_name(param)
779
780		# logic for ITEMLISTs
781		if param.is_list:
782		if param.type is _InFile:
783		command += param_cli_name + "\n"
784		command += " #for token in $" + galaxy_parameter_name + ":\n"
785		command += " $token\n"
786		command += " #end for\n"
787		else:
788		command += "\n#if $" + repeat_galaxy_parameter_name + ":\n"
789		command += param_cli_name + "\n"
790		command += " #for token in $" + repeat_galaxy_parameter_name + ":\n"
791		command += " #if \" \" in str(token):\n"
792		command += " \"$token." + galaxy_parameter_name + "\"\n"
793		command += " #else\n"
794		command += " $token." + galaxy_parameter_name + "\n"
795		command += " #end if\n"
796		command += " #end for\n"
797		command += "#end if\n"
798		# logic for other ITEMs
799		else:
800		if param.advanced and param.type is not _OutFile:
801		actual_parameter = "$adv_opts.%s" % galaxy_parameter_name
802		else:
803		actual_parameter = "$%s" % galaxy_parameter_name
804		## if whitespace_validation has been set, we need to generate, for each parameter:
805		## #if str( $t ).split() != '':
806		## -t "$t"
807		## #end if
808		## TODO only useful for text fields, integers or floats
809		## not useful for choices, input fields ...
810
811		if not is_boolean_parameter(param) and type(param.restrictions) is _Choices :
812		command += "#if " + actual_parameter + ":\n"
813		command += ' %s\n' % param_cli_name
814		command += " #if \" \" in str(" + actual_parameter + "):\n"
815		command += " \"" + actual_parameter + "\"\n"
816		command += " #else\n"
817		command += " " + actual_parameter + "\n"
818		command += " #end if\n"
819		command += "#end if\n"
820		elif is_boolean_parameter(param):
821		command += "#if " + actual_parameter + ":\n"
822		command += ' %s\n' % param_cli_name
823		command += "#end if\n"
824		elif TYPE_TO_GALAXY_TYPE[param.type] == 'text':
825		command += "#if " + actual_parameter + ":\n"
826		command += " %s " % param_cli_name
827		command += " \"" + actual_parameter + "\"\n"
828		command += "#end if\n"
829		else:
830		command += "#if " + actual_parameter + ":\n"
831		command += ' %s ' % param_cli_name
832		command += actual_parameter + "\n"
833		command += "#end if\n"
834
835		if param.advanced and param.type is not _OutFile:
836		advanced_command += " %s" % command
837		else:
838		final_command += command
839
840		if advanced_command:
841		final_command += "%s%s%s\n" % (advanced_command_start, advanced_command, advanced_command_end)
842
843		if not found_output_parameter:
844		final_command += "> $param_stdout\n"
845
846		command_node = add_child_node(tool, "command")
847		command_node.text = final_command
848
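For plain (non-list, non-boolean) parameters, `create_command` wraps each argument in a Cheetah `#if` block so that unset values are left off the command line. A minimal sketch of that fragment is shown below; `optional_argument` is a hypothetical helper and the exact whitespace is an assumption, since indentation did not survive in this view.

```python
def optional_argument(cli_name, galaxy_name, quoted=False):
    """Build the Cheetah fragment that emits an argument only when it is set."""
    reference = "$" + galaxy_name
    # text parameters are quoted so that values with spaces stay one argument
    value = '"%s"' % reference if quoted else reference
    return "#if %s:\n  %s %s\n#end if\n" % (reference, cli_name, value)


print(optional_argument("-threshold", "param_threshold"))
print(optional_argument("-label", "param_label", quoted=True))
```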
849
850		# creates the xml elements needed to import the needed macros files
851		# and to "expand" the macros
852		def expand_macros(tool, model, **kwargs):
853		macros_node = add_child_node(tool, "macros")
854		token_node = add_child_node(macros_node, "token")
855		token_node.attrib["name"] = "@EXECUTABLE@"
856		token_node.text = get_tool_executable_path(model, kwargs["default_executable_path"])
857
858		# add <import> nodes
859		for macro_file_name in kwargs["macros_file_names"]:
860		macro_file = open(macro_file_name)
861		import_node = add_child_node(macros_node, "import")
862		# do not add the path of the file, rather, just its basename
863		import_node.text = os.path.basename(macro_file.name)
864
865		# add <expand> nodes
866		for expand_macro in kwargs["macros_to_expand"]:
867		expand_node = add_child_node(tool, "expand")
868		expand_node.attrib["macro"] = expand_macro
869
870
871		def get_tool_executable_path(model, default_executable_path):
872		# rules to build the galaxy executable path:
873		# if executablePath is null, then use default_executable_path and store it in executablePath
874		# if executablePath is null and executableName is null, then the name of the tool will be used
875		# if executablePath is null and executableName is not null, then executableName will be used
876		# if executablePath is not null and executableName is null,
877		# then executablePath and the name of the tool will be used
878		# if executablePath is not null and executableName is not null, then both will be used
879
880		# first, check if the model has executablePath / executableName defined
881		executable_path = model.opt_attribs.get("executablePath", None)
882		executable_name = model.opt_attribs.get("executableName", None)
883
884		# check if we need to use the default_executable_path
885		if executable_path is None:
886		executable_path = default_executable_path
887
888		# fix the executablePath to make sure that there is a '/' in the end
889		if executable_path is not None:
890		executable_path = executable_path.strip()
891		if not executable_path.endswith('/'):
892		executable_path += '/'
893
894		# assume that we have all information present
895		command = str(executable_path) + str(executable_name)
896		if executable_path is None:
897		if executable_name is None:
898		command = model.name
899		else:
900		command = executable_name
901		else:
902		if executable_name is None:
903		command = executable_path + model.name
904		return command
905
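The rules listed in the comments of `get_tool_executable_path` reduce to "optional directory plus executable name, falling back to the tool name". The sketch below restates them with plain arguments instead of a model object; the function name and signature are hypothetical.

```python
def tool_executable_path(tool_name, executable_path=None, executable_name=None,
                         default_executable_path=None):
    # fall back to the default path if the CTD does not define executablePath
    if executable_path is None:
        executable_path = default_executable_path
    # make sure a defined path ends with exactly one trailing '/'
    if executable_path is not None:
        executable_path = executable_path.strip()
        if not executable_path.endswith("/"):
            executable_path += "/"
    # use the tool name when executableName is missing
    name = executable_name if executable_name is not None else tool_name
    return (executable_path or "") + name
```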
906
907		def get_galaxy_parameter_name(param):
908		return "param_%s" % get_param_name(param).replace(':', '_').replace('-', '_')
909
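As a quick illustration of the naming scheme: Galaxy parameter names are derived from the CTD name by replacing `:` and `-` with `_` and adding a `param_` prefix, and repeat blocks get an extra `rep_` prefix. The helpers below take the name as a string, which is a simplification of the functions in this file.

```python
def galaxy_parameter_name(param_name):
    # ':' and '-' are replaced because the names are later referenced as Cheetah variables
    return "param_%s" % param_name.replace(":", "_").replace("-", "_")


def repeat_galaxy_parameter_name(param_name):
    return "rep_" + galaxy_parameter_name(param_name)


print(galaxy_parameter_name("algorithm:peak-width"))
print(repeat_galaxy_parameter_name("algorithm:peak-width"))
```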
910
911		def get_input_with_same_restrictions(out_param, model, supported_file_formats):
912		for param in extract_parameters(model):
913		if param.type is _InFile:
914		if param.restrictions is not None:
915		in_param_formats = get_supported_file_types(param.restrictions.formats, supported_file_formats)
916		out_param_formats = get_supported_file_types(out_param.restrictions.formats, supported_file_formats)
917		if in_param_formats == out_param_formats:
918		return param
919
920
921		def create_inputs(tool, model, **kwargs):
922		inputs_node = SubElement(tool, "inputs")
923
924		# some suites (such as OpenMS) need some advanced options when handling inputs
925		expand_advanced_node = add_child_node(tool, "expand", OrderedDict([("macro", ADVANCED_OPTIONS_MACRO_NAME)]))
926		parameter_hardcoder = kwargs["parameter_hardcoder"]
927
928		# treat all non output-file parameters as inputs
929		for param in extract_parameters(model):
930		# no need to show hardcoded parameters
931		hardcoded_value = parameter_hardcoder.get_hardcoded_value(param.name, model.name)
932		if param.name in kwargs["blacklisted_parameters"] or hardcoded_value:
933		# let's not use an extra level of indentation and use NOP
934		continue
935		if param.type is not _OutFile:
936		if param.advanced:
937		if expand_advanced_node is not None:
938		parent_node = expand_advanced_node
939		else:
940		# something went wrong... we are handling an advanced parameter and the
941		# advanced input macro was not set... inform the user about it
942		info("The parameter %s has been set as advanced, but advanced_input_macro has "
943		"not been set." % param.name, 1)
944		# there is not much we can do, other than use the inputs_node as a parent node!
945		parent_node = inputs_node
946		else:
947		parent_node = inputs_node
948
949		# for lists we need a repeat tag
950		if param.is_list and param.type is not _InFile:
951		rep_node = add_child_node(parent_node, "repeat")
952		create_repeat_attribute_list(rep_node, param)
953		parent_node = rep_node
954
955		param_node = add_child_node(parent_node, "param")
956		create_param_attribute_list(param_node, param, kwargs["supported_file_formats"])
957
958		# advanced parameter selection should be at the end
959		# and only available if an advanced parameter exists
960		if expand_advanced_node is not None and len(expand_advanced_node) > 0:
961		inputs_node.append(expand_advanced_node)
962
963
964		def get_repeat_galaxy_parameter_name(param):
965		return "rep_" + get_galaxy_parameter_name(param)
966
967
968		def create_repeat_attribute_list(rep_node, param):
969		rep_node.attrib["name"] = get_repeat_galaxy_parameter_name(param)
970		if param.required:
971		rep_node.attrib["min"] = "1"
972		else:
973		rep_node.attrib["min"] = "0"
974		# for the ITEMLISTs which have LISTITEM children we only
975		# need one parameter as it is given as a string
976		if param.default is not None:
977		rep_node.attrib["max"] = "1"
978		rep_node.attrib["title"] = get_galaxy_parameter_name(param)
979
980
981		def create_param_attribute_list(param_node, param, supported_file_formats):
982		param_node.attrib["name"] = get_galaxy_parameter_name(param)
983
984		param_type = TYPE_TO_GALAXY_TYPE[param.type]
985		if param_type is None:
986		raise ModelError("Unrecognized parameter type %(type)s for parameter %(name)s"
987		% {"type": param.type, "name": param.name})
988
989		if param.is_list:
990		param_type = "text"
991
992		if is_selection_parameter(param):
993		param_type = "select"
994		if len(param.restrictions.choices) < 5:
995		param_node.attrib["display"] = "radio"
996
997		if is_boolean_parameter(param):
998		param_type = "boolean"
999
1000		if param.type is _InFile:
1001		# assume it's just text unless restrictions are provided
1002		param_format = "txt"
1003		if param.restrictions is not None:
1004		# join all formats of the file, take mapping from supported_file if available for an entry
1005		if type(param.restrictions) is _FileFormat:
1006		param_format = ','.join([get_supported_file_type(i, supported_file_formats) if
1007		get_supported_file_type(i, supported_file_formats)
1008		else i for i in param.restrictions.formats])
1009		else:
1010		raise InvalidModelException("Expected 'file type' restrictions for input file [%(name)s], "
1011		"but instead got [%(type)s]"
1012		% {"name": param.name, "type": type(param.restrictions)})
1013
1014		param_node.attrib["type"] = "data"
1015		param_node.attrib["format"] = param_format
1016		# in the case of multiple input set multiple flag
1017		if param.is_list:
1018		param_node.attrib["multiple"] = "true"
1019
1020		else:
1021		param_node.attrib["type"] = param_type
1022
1023		# check for parameters with restricted values (which will correspond to a "select" in galaxy)
1024		if param.restrictions is not None:
1025		# it could be either _Choices or _NumericRange, with special case for boolean types
1026		if param_type == "boolean":
1027		create_boolean_parameter(param_node, param)
1028		elif type(param.restrictions) is _Choices:
1029		# create as many <option> elements as restriction values
1030		for choice in param.restrictions.choices:
1031		option_node = add_child_node(param_node, "option", OrderedDict([("value", str(choice))]))
1032		option_node.text = str(choice)
1033
1034		# preselect the default value
1035		if param.default == choice:
1036		option_node.attrib["selected"] = "true"
1037
1038		elif type(param.restrictions) is _NumericRange:
1039		if param.type is not int and param.type is not float:
1040		raise InvalidModelException("Expected either 'int' or 'float' in the numeric range restriction for "
1041		"parameter [%(name)s], but instead got [%(type)s]" %
1042		{"name": param.name, "type": param.type})
1043		# extract the min and max values and add them as attributes
1044		# validate the provided min and max values
1045		if param.restrictions.n_min is not None:
1046		param_node.attrib["min"] = str(param.restrictions.n_min)
1047		if param.restrictions.n_max is not None:
1048		param_node.attrib["max"] = str(param.restrictions.n_max)
1049		elif type(param.restrictions) is _FileFormat:
1050		param_node.attrib["format"] = ','.join([get_supported_file_type(i, supported_file_formats) if
1051		get_supported_file_type(i, supported_file_formats)
1052		else i for i in param.restrictions.formats])
1053		else:
1054		raise InvalidModelException("Unrecognized restriction type [%(type)s] for parameter [%(name)s]"
1055		% {"type": type(param.restrictions), "name": param.name})
1056
1057		if param_type == "select" and param.default in param.restrictions.choices:
1058		param_node.attrib["optional"] = "False"
1059		else:
1060		param_node.attrib["optional"] = str(not param.required)
1061
1062		if param_type == "text":
1063		# add size attribute... this is the length of a textbox field in Galaxy (it could also be 15x2, for instance)
1064		param_node.attrib["size"] = "30"
1065		# add sanitizer nodes, this is needed for special character like "["
1066		# which are used for example by FeatureFinderMultiplex
1067		sanitizer_node = SubElement(param_node, "sanitizer")
1068
1069		valid_node = SubElement(sanitizer_node, "valid", OrderedDict([("initial", "string.printable")]))
1070		add_child_node(valid_node, "remove", OrderedDict([("value", '\'')]))
1071		add_child_node(valid_node, "remove", OrderedDict([("value", '"')]))
1072
1073		# check for default value
1074		if param.default is not None and param.default is not _Null:
1075		if type(param.default) is list:
1076		# we ASSUME that a list of parameters looks like:
1077		# $ tool -ignore He Ar Xe
1078		# meaning, that, for example, Helium, Argon and Xenon will be ignored
1079		param_node.attrib["value"] = ' '.join(map(str, param.default))
1080
1081		elif param_type != "boolean":
1082		param_node.attrib["value"] = str(param.default)
1083
1084		else:
1085		# simple boolean with a default
1086		if param.default is True:
1087		param_node.attrib["checked"] = "true"
1088		else:
1089		if param.type is int or param.type is float:
1090		# galaxy requires "value" to be included for int/float
1091		# since no default was included, we need to figure out one in a clever way... but let the user know
1092		# that we are "thinking" for him/her
1093		warning("Generating default value for parameter [%s]. "
1094		"Galaxy requires the attribute 'value' to be set for integer/floats. "
1095		"Edit the CTD file and provide a suitable default value." % param.name, 1)
1096		# check if there's a min/max and try to use them
1097		default_value = None
1098		if param.restrictions is not None:
1099		if type(param.restrictions) is _NumericRange:
1100		default_value = param.restrictions.n_min
1101		if default_value is None:
1102		default_value = param.restrictions.n_max
1103		if default_value is None:
1104		# no min/max provided... just use 0 and see what happens
1105		default_value = 0
1106		else:
1107		# should never be here, since we have validated this anyway...
1108		# this code is here just for documentation purposes
1109		# however, better safe than sorry!
1110		# (it could be that the code changes and then we have an ugly scenario)
1111		raise InvalidModelException("Expected a numeric range for parameter [%(name)s], "
1112		"but instead got [%(type)s]"
1113		% {"name": param.name, "type": type(param.restrictions)})
1114		else:
1115		# no restrictions and no default value provided...
1116		# make up something
1117		default_value = 0
1118		param_node.attrib["value"] = str(default_value)
1119
1120		label = "%s parameter" % param.name
1121		help_text = ""
1122
1123		if param.description is not None:
1124		label, help_text = generate_label_and_help(param.description)
1125
1126		param_node.attrib["label"] = label
1127		param_node.attrib["help"] = "(-%s)" % param.name + " " + help_text
1128
1129
1130		def generate_label_and_help(desc):
1131		label = ""
1132		help_text = ""
1133		# This tag is found in some descriptions
1134		if not isinstance(desc, basestring):
1135		desc = str(desc)
1136		desc = desc.encode("utf8").replace("#br#", " <br>")
1137		# Get rid of dots in the end
1138		if desc.endswith("."):
1139		desc = desc.rstrip(".")
1140		# Check if first word is a normal word and make it uppercase
1141		if str(desc).find(" ") > -1:
1142		first_word, rest = str(desc).split(" ", 1)
1143		if str(first_word).islower():
1144		# check if label has a quotient of the form a/b
1145		if first_word.find("/") != 1 :
1146		first_word = first_word.capitalize()
1147		desc = first_word + " " + rest
1148		label = desc.decode("utf8")
1149
1150		# Try to split the label if it is too long
1151		if len(desc) > 50:
1152		# find an example and put everything before in the label and the e.g. in the help
1153		if desc.find("e.g.") > 1 :
1154		label, help_text = desc.split("e.g.",1)
1155		help_text = "e.g." + help_text
1156		else:
1157		# find the end of the first sentence
1158		# look for ". " because some labels contain .file or something similar
1159		delimiter = ""
1160		if desc.find(". ") > 1 and desc.find("? ") > 1:
1161		if desc.find(". ") < desc.find("? "):
1162		delimiter = ". "
1163		else:
1164		delimiter = "? "
1165		elif desc.find(". ") > 1:
1166		delimiter = ". "
1167		elif desc.find("? ") > 1:
1168		delimiter = "? "
1169		if delimiter != "":
1170		label, help_text = desc.split(delimiter, 1)
1171
1172		# add the question mark back
1173		if delimiter == "? ":
1174		label += "? "
1175
1176		# remove all linebreaks
1177		label = label.rstrip().rstrip('<br>').rstrip()
1178		return label, help_text
1179
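The sentence-splitting part of `generate_label_and_help` can be sketched in isolation. This is a simplified version, not the function above: it skips the `#br#` handling, the capitalization step, and the `e.g.` branch, and `split_label_and_help` is a hypothetical name.

```python
def split_label_and_help(desc, max_label_length=50):
    """Split a long description at the end of its first sentence."""
    if len(desc) <= max_label_length:
        return desc, ""
    # look for ". " and "? " rather than "." so that ".file" and similar stay intact
    candidates = [(desc.find(d), d) for d in (". ", "? ") if desc.find(d) > 1]
    if not candidates:
        return desc, ""
    # the delimiter that occurs first ends the first sentence
    _position, delimiter = min(candidates)
    label, help_text = desc.split(delimiter, 1)
    if delimiter == "? ":
        # put the question mark back on the label
        label += "?"
    return label, help_text
```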
1180
1181		def get_indented_text(text, indentation_level):
1182		return ("%(indentation)s%(text)s" %
1183		{"indentation": " " * (MESSAGE_INDENTATION_INCREMENT * indentation_level),
1184		"text": text})
1185
1186
1187		def warning(warning_text, indentation_level):
1188		sys.stdout.write(get_indented_text("WARNING: %s\n" % warning_text, indentation_level))
1189
1190
1191		def error(error_text, indentation_level):
1192		sys.stderr.write(get_indented_text("ERROR: %s\n" % error_text, indentation_level))
1193
1194
1195		def info(info_text, indentation_level):
1196		sys.stdout.write(get_indented_text("INFO: %s\n" % info_text, indentation_level))
1197
1198
1199		# determines if the given choices are boolean (basically, if the possible values are yes/no, true/false)
1200		def is_boolean_parameter(param):
1201		## detect boolean selects of OpenMS
1202		if is_selection_parameter(param):
1203		if len(param.restrictions.choices) == 2:
1204		# check that default value is false to make sure it is an actual flag
1205		if "false" in param.restrictions.choices and \
1206		"true" in param.restrictions.choices and \
1207		param.default == "false":
1208		return True
1209		else:
1210		return param.type is bool
1211
1212
1213		# determines if there are choices for the parameter
1214		def is_selection_parameter(param):
1215		return type(param.restrictions) is _Choices
1216
1217
1218		def get_lowercase_list(some_list):
1219		lowercase_list = map(str, some_list)
1220		lowercase_list = map(string.lower, lowercase_list)
1221		lowercase_list = map(strip, lowercase_list)
1222		return lowercase_list
1223
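`get_lowercase_list` relies on `string.lower` and `strip` from the `string` module, which exist only in Python 2. A sketch of an equivalent that runs on both Python 2 and Python 3, using string methods instead:

```python
def get_lowercase_list(some_list):
    # str() first, so that non-string choices such as True or 1 are handled too
    return [str(item).lower().strip() for item in some_list]


print(get_lowercase_list([" Yes", "NO ", True]))
```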
1224
1225		# creates a galaxy boolean parameter type
1226		# this method assumes that param has restrictions, and that only two restrictions are present
1227		# (either yes/no or true/false)
1228		def create_boolean_parameter(param_node, param):
1229		# first, determine the 'truevalue' and the 'falsevalue'
1230		"""TODO: true and false values can be way more than 'true' and 'false'
1231		but for that we need CTD support
1232		"""
1233		# by default, 'true' and 'false' are handled as flags, like the verbose flag (i.e., -v)
1234		true_value = "-%s" % get_param_name(param)
1235		false_value = ""
1236		choices = get_lowercase_list(param.restrictions.choices)
1237		if "yes" in choices:
1238		true_value = "yes"
1239		false_value = "no"
1240		param_node.attrib["truevalue"] = true_value
1241		param_node.attrib["falsevalue"] = false_value
1242
1243		# set the checked attribute
1244		if param.default is not None:
1245		checked_value = "false"
1246		default = strip(string.lower(param.default))
1247		if default == "yes" or default == "true":
1248		checked_value = "true"
1249		#attribute_list["checked"] = checked_value
1250		param_node.attrib["checked"] = checked_value
1251
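The mapping from a two-choice CTD parameter to a Galaxy boolean can be sketched with the standard library. `boolean_param` is a hypothetical helper that takes plain values rather than a CTD parameter object and uses `xml.etree.ElementTree` instead of `lxml`; the attribute names are the ones set by `create_boolean_parameter` above.

```python
import xml.etree.ElementTree as ET


def boolean_param(name, choices, default=None):
    node = ET.Element("param", {"name": "param_" + name, "type": "boolean"})
    lowered = [str(choice).lower().strip() for choice in choices]
    if "yes" in lowered:
        # yes/no parameters pass their value explicitly
        true_value, false_value = "yes", "no"
    else:
        # true/false parameters are treated as flags, e.g. -force
        true_value, false_value = "-" + name, ""
    node.set("truevalue", true_value)
    node.set("falsevalue", false_value)
    if default is not None:
        checked = str(default).lower().strip() in ("yes", "true")
        node.set("checked", "true" if checked else "false")
    return node
```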
1252
1253		def create_outputs(parent, model, **kwargs):
1254		outputs_node = add_child_node(parent, "outputs")
1255		parameter_hardcoder = kwargs["parameter_hardcoder"]
1256
1257		for param in extract_parameters(model):
1258
1259		# no need to show hardcoded parameters
1260		hardcoded_value = parameter_hardcoder.get_hardcoded_value(param.name, model.name)
1261		if param.name in kwargs["blacklisted_parameters"] or hardcoded_value:
1262		# let's not use an extra level of indentation and use NOP
1263		continue
1264		if param.type is _OutFile:
1265		create_output_node(outputs_node, param, model, kwargs["supported_file_formats"])
1266
1267		# If there are no outputs defined in the ctd the node will have no children
1268		# and the stdout will be used as output
1269		if len(outputs_node) == 0:
1270		add_child_node(outputs_node, "data",
1271		OrderedDict([("name", "param_stdout"), ("format", "txt"), ("label", "Output from stdout")]))
1272
1273
1274		def create_output_node(parent, param, model, supported_file_formats):
1275		data_node = add_child_node(parent, "data")
1276		data_node.attrib["name"] = get_galaxy_parameter_name(param)
1277
1278		data_format = "data"
1279		if param.restrictions is not None:
1280		if type(param.restrictions) is _FileFormat:
1281		# set the first data output node to the first file format
1282
1283		# check if there are formats that have not been registered yet...
1284		output = list()
1285		for format_name in param.restrictions.formats:
1286		if not format_name in supported_file_formats.keys():
1287		output.append(str(format_name))
1288
1289		# warn only if there is something to complain about
1290		if output:
1291		warning("Parameter " + param.name + " has the following unsupported format(s):" + ','.join(output), 1)
1292		data_format = ','.join(output)
1293
1294		formats = get_supported_file_types(param.restrictions.formats, supported_file_formats)
1295		try:
1296		data_format = formats.pop()
1297		except KeyError:
1298		# there is not much we can do, other than catching the exception
1299		pass
1300		# if there is more than one output file format, try to take the format from the input parameter
1301		if formats:
1302		corresponding_input = get_input_with_same_restrictions(param, model, supported_file_formats)
1303		if corresponding_input is not None:
1304		data_format = "input"
1305		data_node.attrib["metadata_source"] = get_galaxy_parameter_name(corresponding_input)
1306		else:
1307		raise InvalidModelException("Unrecognized restriction type [%(type)s] "
1308		"for output [%(name)s]" % {"type": type(param.restrictions),
1309		"name": param.name})
1310		data_node.attrib["format"] = data_format
1311
1312		#TODO: find a smarter label ?
1313		#if param.description is not None:
1314		# data_node.setAttribute("label", param.description)
1315		return data_node
1316
1317
1318		# Get the supported file format for one given format
1319		def get_supported_file_type(format_name, supported_file_formats):
1320		if format_name in supported_file_formats.keys():
1321		return supported_file_formats.get(format_name, DataType(format_name, format_name)).galaxy_extension
1322		else:
1323		return None
1324
1325
1326		def get_supported_file_types(formats, supported_file_formats):
1327		return set([supported_file_formats.get(format_name, DataType(format_name, format_name)).galaxy_extension
1328		for format_name in formats if format_name in supported_file_formats.keys()])
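
The two helpers above reduce a list of CTD format names to the set of Galaxy extensions that are registered. A minimal, self-contained sketch of that lookup follows. The `DataType` namedtuple, its `extension` field, the sample `supported` mapping, and the function names `supported_galaxy_types` and `unsupported_formats` are assumptions made for illustration; only the `galaxy_extension` attribute appears in the code above.

```python
from collections import namedtuple

# Hypothetical stand-in for the converter's DataType; only the
# galaxy_extension attribute is taken from the code above.
DataType = namedtuple("DataType", ["extension", "galaxy_extension"])

# Hypothetical registry mapping CTD format names to data types.
supported = {
    "mzML": DataType("mzML", "mzml"),
    "fasta": DataType("fasta", "fasta"),
}


def supported_galaxy_types(formats, supported_file_formats):
    """Return the set of Galaxy extensions for the registered formats."""
    return {
        supported_file_formats[name].galaxy_extension
        for name in formats
        if name in supported_file_formats
    }


def unsupported_formats(formats, supported_file_formats):
    """Return the format names that have not been registered, in order."""
    return [name for name in formats if name not in supported_file_formats]


known = supported_galaxy_types(["mzML", "xyz"], supported)
unknown = unsupported_formats(["mzML", "xyz"], supported)
```

Returning a set mirrors the original helper, so duplicate extensions collapse and the caller can `pop()` one arbitrary format when several remain.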
1329
1330
1331		def create_change_format_node(parent, data_formats, input_ref):
1332		# <change_format>
1333		# <when input="secondary_structure" value="true" format="txt"/>
1334		# </change_format>
1335		change_format_node = add_child_node(parent, "change_format")
1336		for data_format in data_formats:
1337		add_child_node(change_format_node, "when",
1338		OrderedDict([("input", input_ref), ("value", data_format), ("format", data_format)]))
1339
1340
1341		# Creates the <help> node from the 'manual' and 'docurl' attributes of the model.
1342		def create_help(tool, model):
1343		manual = ''
1344		doc_url = None
1345		if 'manual' in model.opt_attribs.keys():
1346		manual += '%s\n\n' % model.opt_attribs["manual"]
1347		if 'docurl' in model.opt_attribs.keys():
1348		doc_url = model.opt_attribs["docurl"]
1349
1350		help_text = "No help available"
1351		if manual is not None:
1352		help_text = manual
1353		if doc_url is not None:
1354		help_text = ("" if manual is None else manual) + "\nFor more information, visit %s" % doc_url
1355		help_node = add_child_node(tool, "help")
1356		# TODO: do we need CDATA Section here?
1357		help_node.text = help_text
1358
1359
1360		# since a model might contain several ParameterGroup elements,
1361		# we want to simply 'flatten' the parameters to generate the Galaxy wrapper
1362		def extract_parameters(model):
1363		parameters = []
1364		if len(model.parameters.parameters) > 0:
1365		# use this to put parameters that are to be processed
1366		# we know that CTDModel has one parent ParameterGroup
1367		pending = [model.parameters]
1368		while len(pending) > 0:
1369		# take one element from 'pending'
1370		parameter = pending.pop()
1371		if type(parameter) is not ParameterGroup:
1372		parameters.append(parameter)
1373		else:
1374		# append the first-level children of this ParameterGroup
1375		pending.extend(parameter.parameters.values())
1376		# return the reversed list of parameters (as it is now,
1377		# we have the last parameter in the CTD as first in the list)
1378		return reversed(parameters)
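
The flattening in `extract_parameters` can be tried in isolation. The sketch below assumes two hypothetical classes, `Param` and `Group`, standing in for the CTDopts parameter and `ParameterGroup` types; it is not the library's API. It uses the same stack-based walk and the same final reversal.

```python
from collections import OrderedDict


class Param:
    """Hypothetical stand-in for a leaf parameter."""

    def __init__(self, name):
        self.name = name


class Group:
    """Hypothetical stand-in for a ParameterGroup with named children."""

    def __init__(self, name, children):
        self.name = name
        self.parameters = OrderedDict((child.name, child) for child in children)


def flatten(root):
    """Return the leaf parameters under root in document order."""
    leaves = []
    pending = [root]
    while pending:
        # take the most recently added element
        node = pending.pop()
        if isinstance(node, Group):
            # push the first-level children of this group
            pending.extend(node.parameters.values())
        else:
            leaves.append(node)
    # popping from the end visits children last-to-first, so reverse once
    leaves.reverse()
    return leaves


root = Group("root", [Param("a"), Group("g", [Param("b"), Param("c")]), Param("d")])
flat_names = [p.name for p in flatten(root)]
```

Because every group's children are pushed in order and popped in reverse, a single reversal at the end restores document order even for nested groups.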
1379
1380
1381		# adds and returns a child node using the given name to the given parent node
1382		def add_child_node(parent_node, child_node_name, attributes=OrderedDict([])):
1383		child_node = SubElement(parent_node, child_node_name, attributes)
1384		return child_node
1385
1386
1387		if __name__ == "__main__":
1388		sys.exit(main())