Package list ctdconverter / 4cddfbd
Merge pull request #30 from WorkflowConversion/cwl_support Added CWL Support chahuistle authored 4 years ago GitHub committed 4 years ago
16 changed file(s) with 1899 addition(s) and 1655 deletion(s).
00 # CTDConverter
1
21 Given one or more CTD files, `CTDConverter` generates the needed wrappers to include them in workflow engines, such as Galaxy and CWL.
32
43 ## Dependencies
4 `CTDConverter` has the following python dependencies:
55
6 `CTDConverter` relies on [CTDopts]. The dependencies of each of the converters are as follows:
6 - [CTDopts]
7 - `lxml`
8 - `ruamel.yaml`
79
8 ### Galaxy Converter
10 ### Installing Dependencies
11 We recommend the use of `conda` to manage all dependencies. If you're not sure what `conda` is, make sure to read the [conda documentation][using-conda].
912
10 - Generation of Galaxy ToolConfig files relies on `lxml` to generate nice-looking XML files.
13 The easiest way to get started with CTD conversion is to create a `conda` environment in which you'll install all dependencies. Using environments in `conda` allows you to have parallel, independent python environments, thus avoiding conflicts between libraries. If you haven't installed `conda`, check [conda's installation guide][conda-install].
1114
12 ## Installing Dependencies
13 You can install the [CTDopts] and `lxml` modules via `conda`, like so:
15 Once you've installed `conda`, create an environment named `ctd-converter`, like so:
1416
1517 ```sh
16 $ conda install lxml
17 $ conda install -c workflowconversion ctdopts
18 $ conda create --name ctd-converter
1819 ```
1920
20 Note that the [CTDopts] module is available on the `workflowconversion` channel.
21 You will now need to *activate* the environment by executing the following command:
2122
22 Of course, you can just download [CTDopts] and make it available through your `PYTHONPATH` environment variable. To get more information about how to install python modules, visit: https://docs.python.org/2/install/.
23 ```sh
24 $ source activate ctd-converter
25 ```
2326
27 Install the required dependencies as follows (the order of execution **is actually important**, due to transitive dependencies):
2428
25 ## How to install CTDConverter
29 ```sh
30 $ conda install --channel workflowconversion ctdopts
31 $ conda install lxml
32 $ conda install --channel conda-forge ruamel.yaml
33 $ conda install libxml2=2.9.2
34 ```
2635
27 1. Download the source code from https://github.com/genericworkflownodes/CTDConverter.
36 `lxml` depends on `libxml2`. When you install `lxml`, you get the latest version of `libxml2` (2.9.4) by default. You would usually want the latest version; however, this version of `libxml2` has a bug in validating XML files against a schema.
37
38 If you require validation of input CTDs against a schema (which we recommend), you will need to downgrade to the latest known version of `libxml2` that works, namely, 2.9.2.
39
40 You could just download dependencies manually and make them available through your `PYTHONPATH` environment variable, if you're into that. To get more information about how to install python modules without using `conda`, visit: https://docs.python.org/2/install/.
41
42 ## How to install `CTDConverter`
43 `CTDConverter` is not a python module, rather, a series of scripts, so installing it is as easy as downloading the source code from https://github.com/genericworkflownodes/CTDConverter. Once you've installed all dependencies, downloaded `CTDConverter` and activated your `conda` environment, you're good to go.
2844
2945 ## Usage
46 The first thing that you need to tell `CTDConverter` is the output format of the converted wrappers. `CTDConverter` supports conversion of CTDs into Galaxy and CWL. Invoking it is as simple as follows:
3047
31 Check the detailed documentation for each of the converters:
48 $ python convert.py [FORMAT] [ADDITIONAL_PARAMETERS ...]
49
50 Here `[FORMAT]` can be any of the supported formats (i.e., `cwl`, `galaxy`). `CTDConverter` offers a series of format-specific scripts and we've designed these scripts to behave *somewhat* similarly. All converter scripts have the same core functionality, that is, read CTD files, parse them using [CTDopts], validate against a schema, etc. Of course, each converter script might add extra functionality that is not present in other engines. Only the Galaxy converter script supports generation of a `tool_conf.xml` file, for instance.
51
52 The following sections in this file describe the parameters that all converter scripts share.
53
54 Please refer to the detailed documentation for each of the converters for more information:
3255
3356 - [Generation of Galaxy ToolConfig files](galaxy/README.md)
57 - [Generation of CWL task files](cwl/README.md)
3458
59 ## Fail Policy while processing several Files
60 `CTDConverter` can parse several CTDs and convert them. However, the process will be interrupted and an error code will be returned at the first encountered error (e.g., a CTD is not valid, there are missing support files, etc.).
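
The fail-fast policy described above can be sketched as follows. This is a minimal illustration written for Python 3, not the converter's actual code: `convert_all` and the `convert_one` callback are hypothetical names, and the exit code `1` is an assumption.

```python
def convert_all(ctd_paths, convert_one):
    """Convert each CTD in order; stop at the first failure and return an exit code."""
    for path in ctd_paths:
        try:
            # convert_one is a hypothetical callback that converts a single CTD
            convert_one(path)
        except Exception as error:
            # the first error interrupts the whole run; remaining CTDs are not processed
            print("ERROR: %s: %s" % (path, error))
            return 1
    return 0
```

With this policy, CTDs listed after the failing one are left unconverted, so it may be worth validating inputs before starting a long batch.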
61
62 ## Converting a single CTD
63 In its simplest form, the converter takes an input CTD file and generates an output file. The following usage of `CTDConverter`:
64
65 $ python convert.py [FORMAT] -i /data/sample_input.ctd -o /data/sample_output.xml
66
67 will parse `/data/sample_input.ctd` and generate an appropriate converted file under `/data/sample_output.xml`. The generated file can be added to your workflow engine as usual.
68
69 ## Converting several CTDs
70 When converting several CTDs, the expected value for the `-o`/`--output-destination` parameter is a folder. For example:
71
72 $ python convert.py [FORMAT] -i /data/ctds/one.ctd /data/ctds/two.ctd -o /data/converted-files
73
74 Will convert `/data/ctds/one.ctd` into `/data/converted-files/one.[EXT]` and `/data/ctds/two.ctd` into `/data/converted-files/two.[EXT]`. Each converter has a preferred extension, here shown as a variable (`[EXT]`). Galaxy prefers `xml`, while CWL prefers `cwl`.
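
As a rough sketch of the naming rule above, the output path for each input can be derived from the input file name and the converter's preferred extension. The helper name `suggest_output_path` is hypothetical, and `posixpath` is used only to keep the example independent of the host operating system; the snippet targets Python 3.

```python
import posixpath


def suggest_output_path(input_ctd, output_destination, extension, multiple_inputs):
    """Return the output path for one input CTD (illustrative helper)."""
    if not multiple_inputs:
        # a single input is written to the destination exactly as given
        return output_destination
    # several inputs: reuse the input file name, swapping in the new extension
    stem = posixpath.splitext(posixpath.basename(input_ctd))[0]
    return posixpath.join(output_destination, stem + "." + extension)


print(suggest_output_path("/data/ctds/one.ctd", "/data/converted-files", "cwl", True))
print(suggest_output_path("/data/ctds/two.ctd", "/data/converted-files", "xml", True))
```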
75
76 You can use wildcard expansion, as supported by most modern operating systems:
77
78 $ python convert.py [FORMAT] -i /data/ctds/*.ctd -o /data/converted-files
79
80 ## Common Parameters
81 ### Input File(s)
82 * Purpose: Provide input CTD file(s) to convert.
83 * Short/long version: `-i` / `--input`
84 * Required: yes.
85 * Taken values: a list of input CTD files.
86
87 Examples:
88
89 Any of the following invocations will convert `/data/input_one.ctd` and `/data/input_two.ctd`:
90
91 $ python convert.py [FORMAT] -i /data/input_one.ctd -i /data/input_two.ctd -o /data/generated
92 $ python convert.py [FORMAT] -i /data/input_one.ctd /data/input_two.ctd -o /data/generated
93 $ python convert.py [FORMAT] --input /data/input_one.ctd /data/input_two.ctd -o /data/generated
94 $ python convert.py [FORMAT] --input /data/input_one.ctd --input /data/input_two.ctd -o /data/generated
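
All four forms are equivalent because of how `argparse` combines `nargs="+"` with `action="append"`: each occurrence of `-i` yields one list, and the lists are then flattened. The following is a minimal sketch for Python 3; the parser below is a stand-in for illustration, not the converter's full argument parser, and `parse_inputs` is a hypothetical helper.

```python
from argparse import ArgumentParser

parser = ArgumentParser()
# every "-i"/"--input" occurrence contributes one list of file names
parser.add_argument("-i", "--input", dest="input_files", nargs="+",
                    action="append", required=True)


def parse_inputs(argv):
    """Return a flat list of input files, however they were grouped on the command line."""
    args = parser.parse_args(argv)
    return [item for sub_list in args.input_files for item in sub_list]


print(parse_inputs(["-i", "/data/input_one.ctd", "-i", "/data/input_two.ctd"]))
print(parse_inputs(["--input", "/data/input_one.ctd", "/data/input_two.ctd"]))
```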
95
96 The following invocation will convert `/data/input.ctd` into `/data/output.xml`:
97
98 $ python convert.py [FORMAT] -i /data/input.ctd -o /data/output.xml
99
100 Of course, you can also use wildcards, which will be automatically expanded by any modern operating system. This is extremely useful if you want to convert several files at a time. Let's assume that the folder `/data/ctds` contains three files: `input_one.ctd`, `input_two.ctd` and `input_three.ctd`. The following two invocations will produce the same output in the `/data/wrappers` folder:
101
102 $ python convert.py [FORMAT] -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd /data/ctds/input_three.ctd -o /data/wrappers
103 $ python convert.py [FORMAT] -i /data/ctds/*.ctd -o /data/wrappers
104
105 ### Output Destination
106 * Purpose: Provide output destination for the converted wrapper files.
107 * Short/long version: `-o` / `--output-destination`
108 * Required: yes.
109 * Taken values: if a single input file is given, then a single output file is expected. If multiple input files are given, then an existent folder in which all converted CTDs will be written is expected.
110
111 Examples:
112
113 A single input is given, and the output will be generated into `/data/output.xml`:
114
115 $ python convert.py [FORMAT] -i /data/input.ctd -o /data/output.xml
116
117 Several inputs are given. The output is the already existent folder, `/data/wrappers`, and at the end of the operation, the files `/data/wrappers/input_one.[EXT]` and `/data/wrappers/input_two.[EXT]` will be generated:
118
119 $ python convert.py [FORMAT] -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd -o /data/wrappers
120
121 Please note that the output file name is **not** taken from the name of the input file, rather from the name of the tool, that is, from the `name` attribute in the `<tool>` element in its corresponding CTD. By convention, the name of the CTD file and the name of the tool match.
122
123 ### Blacklisting Parameters
124 * Purpose: Some parameters present in the CTD are not to be exposed in the output files. Think of parameters such as `--help` or `--debug`, which would not make much sense to expose to end users of a workflow management system.
125 * Short/long version: `-b` / `--blacklist-parameters`
126 * Required: no.
127 * Taken values: A list of parameters to be blacklisted.
128
129 Example:
130
131 $ python convert.py [FORMAT] ... -b h help quiet
132
133 In this case, `CTDConverter` will not process any of the parameters named `h`, `help`, or `quiet`, that is, they will not appear in the generated output files.
134
135 ### Schema Validation
136 * Purpose: Provide validation of input CTDs against a schema file (i.e., an XSD file).
137 * Short/long version: `-V` / `--validation-schema`
138 * Required: no.
139 * Taken values: location of the schema file (e.g., CTD.xsd).
140
141 CTDs can be validated against a schema. The master version of the schema can be found on [CTDSchema].
142
143 If a schema is provided, all input CTDs will be validated against it.
144
145 **NOTE:** Please make sure to read the [section on issues with schema validation](#issues-with-libxml2-and-schema-validation) if you require validation of CTDs against a schema.
146
147 ### Hardcoding Parameters
148 * Purpose: Fix the value of a parameter and hide it from the end user.
149 * Short/long version: `-p` / `--hardcoded-parameters`
150 * Required: no.
151 * Taken values: The path of a file containing the mapping between parameter names and hardcoded values to use.
152
153 It is sometimes required that parameters are hidden from the end user in workflow systems and that they take a predetermined, fixed value. Allowing end users to control parameters such as `--verbosity`, `--threads`, etc., might create more problems than it solves. For this purpose, the parameter `-p`/`--hardcoded-parameters` takes the path of a file that contains up to three columns separated by whitespace. The first column contains the name of the parameter and the second one the hardcoded value. Only the first two columns are mandatory.
154
155 If the parameter is to be hardcoded only for certain tools, a third column containing a comma separated list of tool names for which the hardcoding will apply can be added.
156
157 Lines starting with `#` will be ignored. The following is an example of a valid file:
158
159 # Parameter name # Value # Tool(s)
160 threads 8
161 mode quiet
162 xtandem_executable xtandem XTandemAdapter
163 verbosity high Foo, Bar
164
165 The parameters `threads` and `mode` will be set to `8` and `quiet`, respectively, for all parsed CTDs. However, the `xtandem_executable` parameter will be set to `xtandem` only for the `XTandemAdapter` tool. Similarly, the parameter `verbosity` will be set to `high` for the `Foo` and `Bar` tools only.
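
To make the file format concrete, here is a small sketch of how such a file could be parsed and queried. It is written for Python 3 and is an illustration only: `parse_hardcoded_lines` and `lookup` are hypothetical names, and the dictionary keyed by `(parameter, tool)` is an assumption made for this example.

```python
def parse_hardcoded_lines(lines):
    """Map (parameter, tool) -> value; tool is None when the value applies to all tools."""
    mapping = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are ignored
        # split into at most three columns so that the tool list stays in one piece
        columns = stripped.split(None, 2)
        if len(columns) < 2:
            continue  # the first two columns are mandatory
        name, value = columns[0], columns[1]
        tools = [tool.strip() for tool in columns[2].split(",")] if len(columns) == 3 else [None]
        for tool in tools:
            mapping[(name, tool)] = value
    return mapping


def lookup(mapping, name, tool):
    # a tool-specific value takes precedence over a value that applies to all tools
    return mapping.get((name, tool), mapping.get((name, None)))


example = """\
# Parameter name   # Value   # Tool(s)
threads            8
mode               quiet
xtandem_executable xtandem   XTandemAdapter
verbosity          high      Foo, Bar
"""
hardcoded = parse_hardcoded_lines(example.splitlines())
print(lookup(hardcoded, "threads", "Foo"))
print(lookup(hardcoded, "verbosity", "Bar"))
print(lookup(hardcoded, "verbosity", "Baz"))
```

Note that values are read as strings, so `threads` maps to `"8"` rather than the integer `8`.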
166
167 ### Providing a default executable Path
168 * Purpose: Help workflow engines locate tools by providing a path.
169 * Short/long version: `-x` / `--default-executable-path`
170 * Required: no.
171 * Taken values: The default executable path of the tools in the target workflow engine.
172
173 CTDs can contain an `<executablePath>` element that will be used when executing the tool binary. If this element is missing, the value provided by this parameter will be used as a prefix when building the appropriate sections in the output files.
174
175 The following invocation of the converter will use `/opt/suite/bin` as a prefix when providing the executable path in the output files for any input CTD that lacks the `<executablePath>` section:
176
177 $ python convert.py [FORMAT] -x /opt/suite/bin ...
178
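The way the prefix is combined with the tool name can be sketched as follows. This is a simplified illustration for Python 3 that takes plain arguments instead of a parsed CTD; `build_command` and its parameter names are hypothetical, and the fallback to the tool name when no executable name is given is an assumption of this sketch.

```python
def build_command(tool_name, executable_path=None, executable_name=None,
                  default_executable_path=None):
    """Return the command used to invoke a tool (illustrative helper)."""
    # the path given in the CTD wins; otherwise fall back to the default path
    path = executable_path if executable_path is not None else default_executable_path
    # without an explicit executable name, the tool name is used
    name = executable_name if executable_name is not None else tool_name
    if path is None:
        return name
    path = path.strip()
    if not path.endswith("/"):
        path += "/"
    return path + name


print(build_command("FooTool", default_executable_path="/opt/suite/bin"))
print(build_command("FooTool", executable_path="/usr/local/bin/",
                    default_executable_path="/opt/suite/bin"))
print(build_command("FooTool"))
```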
35179
36180 [CTDopts]: https://github.com/genericworkflownodes/CTDopts
181 [CTDSchema]: https://github.com/WorkflowConversion/CTDSchema
182 [conda-install]: https://conda.io/docs/install/quick.html
183 [using-conda]: https://conda.io/docs/using/envs.html
(New empty file)
0 #!/usr/bin/env python
1 # encoding: utf-8
2
3 """
4 @author: delagarza
5 """
6
7 from CTDopts.CTDopts import ModelError
8
9
10 class CLIError(Exception):
11     # Generic exception to raise and log different fatal errors.
12     def __init__(self, msg):
13         super(CLIError, self).__init__(msg)
14         self.msg = "E: %s" % msg
15
16     def __str__(self):
17         return self.msg
18
19     def __unicode__(self):
20         return self.msg
21
22
23 class InvalidModelException(ModelError):
24     def __init__(self, message):
25         super(InvalidModelException, self).__init__()
26         self.message = message
27
28     def __str__(self):
29         return self.message
30
31     def __repr__(self):
32         return self.message
33
34
35 class ApplicationException(Exception):
36     def __init__(self, msg):
37         super(ApplicationException, self).__init__(msg)
38         self.msg = msg
39
40     def __str__(self):
41         return self.msg
42
43     def __unicode__(self):
44         return self.msg
0 #!/usr/bin/env python
1 # encoding: utf-8
2 import sys
3
4 MESSAGE_INDENTATION_INCREMENT = 2
5
6
7 def _get_indented_text(text, indentation_level):
8     return ("%(indentation)s%(text)s" %
9             {"indentation": " " * (MESSAGE_INDENTATION_INCREMENT * indentation_level),
10              "text": text})
11
12
13 def warning(warning_text, indentation_level=0):
14     sys.stdout.write(_get_indented_text("WARNING: %s\n" % warning_text, indentation_level))
15
16
17 def error(error_text, indentation_level=0):
18     sys.stderr.write(_get_indented_text("ERROR: %s\n" % error_text, indentation_level))
19
20
21 def info(info_text, indentation_level=0):
22     sys.stdout.write(_get_indented_text("INFO: %s\n" % info_text, indentation_level))
0 #!/usr/bin/env python
1 # encoding: utf-8
2 import ntpath
3 import os
4
5 from lxml import etree
6 from string import strip
7 from logger import info, error, warning
8
9 from common.exceptions import ApplicationException
10 from CTDopts.CTDopts import CTDModel, ParameterGroup
11
12
13 MESSAGE_INDENTATION_INCREMENT = 2
14
15
16 # simple struct-class containing a tuple with input/output location and the in-memory CTDModel
17 class ParsedCTD:
18     def __init__(self, ctd_model=None, input_file=None, suggested_output_file=None):
19         self.ctd_model = ctd_model
20         self.input_file = input_file
21         self.suggested_output_file = suggested_output_file
22
23
24 class ParameterHardcoder:
25     def __init__(self):
26         # map whose keys are the composite names of parameters and tools in the following pattern:
27         # [ParameterName][separator][ToolName] -> HardcodedValue
28         # if the parameter applies to all tools, then the following pattern is used:
29         # [ParameterName] -> HardcodedValue
30
31         # examples (assuming separator is '!'):
32         # threads -> 24
33         # adapter!XtandemAdapter -> xtandem.exe
34         # adapter -> adapter.exe
35         self.separator = "!"
36         self.parameter_map = {}
37
38     # the most specific value will be returned in case of overlap
39     def get_hardcoded_value(self, parameter_name, tool_name):
40         # look for the value that would apply for all tools
41         generic_value = self.parameter_map.get(parameter_name, None)
42         specific_value = self.parameter_map.get(self.build_key(parameter_name, tool_name), None)
43         if specific_value is not None:
44             return specific_value
45
46         return generic_value
47
48     def register_parameter(self, parameter_name, parameter_value, tool_name=None):
49         self.parameter_map[self.build_key(parameter_name, tool_name)] = parameter_value
50
51     def build_key(self, parameter_name, tool_name):
52         if tool_name is None:
53             return parameter_name
54         return "%s%s%s" % (parameter_name, self.separator, tool_name)
55
56
57 def validate_path_exists(path):
58     if not os.path.isfile(path) or not os.path.exists(path):
59         raise ApplicationException("The provided path (%s) does not exist or is not a valid file path." % path)
60
61
62 def validate_argument_is_directory(args, argument_name):
63     file_name = getattr(args, argument_name)
64     if file_name is not None and os.path.isdir(file_name):
65         raise ApplicationException("The provided output file name (%s) points to a directory." % file_name)
66
67
68 def validate_argument_is_valid_path(args, argument_name):
69     paths_to_check = []
70     # check if we are handling a single file or a list of files
71     member_value = getattr(args, argument_name)
72     if member_value is not None:
73         if isinstance(member_value, list):
74             for file_name in member_value:
75                 paths_to_check.append(strip(str(file_name)))
76         else:
77             paths_to_check.append(strip(str(member_value)))
78
79         for path_to_check in paths_to_check:
80             validate_path_exists(path_to_check)
81
82
83 # taken from
84 # http://stackoverflow.com/questions/8384737/python-extract-file-name-from-path-no-matter-what-the-os-path-format
85 def get_filename(path):
86     head, tail = ntpath.split(path)
87     return tail or ntpath.basename(head)
88
89
90 def get_filename_without_suffix(path):
91     root, ext = os.path.splitext(os.path.basename(path))
92     return root
93
94
95 def parse_input_ctds(xsd_location, input_ctds, output_destination, output_file_extension):
96     is_converting_multiple_ctds = len(input_ctds) > 1
97     parsed_ctds = []
98     schema = None
99     if xsd_location is not None:
100         try:
101             info("Loading validation schema from %s" % xsd_location, 0)
102             schema = etree.XMLSchema(etree.parse(xsd_location))
103         except Exception, e:
104             error("Could not load validation schema %s. Reason: %s" % (xsd_location, str(e)), 0)
105     else:
106         warning("Validation against a schema has not been enabled.", 0)
107
108     for input_ctd in input_ctds:
109         if schema is not None:
110             validate_against_schema(input_ctd, schema)
111
112         output_file = output_destination
113         # if multiple inputs are being converted, we need to generate a different output_file for each input
114         if is_converting_multiple_ctds:
115             output_file = os.path.join(output_file, get_filename_without_suffix(input_ctd) + "." + output_file_extension)
116         info("Parsing %s" % input_ctd)
117         parsed_ctds.append(ParsedCTD(CTDModel(from_file=input_ctd), input_ctd, output_file))
118
119     return parsed_ctds
120
121
122 def flatten_list_of_lists(args, list_name):
123     setattr(args, list_name, [item for sub_list in getattr(args, list_name) for item in sub_list])
124
125
126 def validate_against_schema(ctd_file, schema):
127     try:
128         parser = etree.XMLParser(schema=schema)
129         etree.parse(ctd_file, parser=parser)
130     except etree.XMLSyntaxError, e:
131         raise ApplicationException("Invalid CTD file %s. Reason: %s" % (ctd_file, str(e)))
132
133
134 def add_common_parameters(parser, version, last_updated):
135     parser.add_argument("FORMAT", default=None, help="Output format (mandatory). Can be one of: cwl, galaxy.")
136     parser.add_argument("-i", "--input", dest="input_files", default=[], required=True, nargs="+", action="append",
137                         help="List of CTD files to convert.")
138     parser.add_argument("-o", "--output-destination", dest="output_destination", required=True,
139                         help="If multiple input files are given, then a folder in which all converted "
140                              "files will be generated is expected; "
141                              "if a single input file is given, then a destination file is expected.")
142     parser.add_argument("-x", "--default-executable-path", dest="default_executable_path",
143                         help="Use this executable path when <executablePath> is not present in the CTD",
144                         default=None, required=False)
145     parser.add_argument("-b", "--blacklist-parameters", dest="blacklisted_parameters", default=[], nargs="+",
146                         action="append",
147                         help="List of parameters that will be ignored and won't appear on the galaxy stub",
148                         required=False)
149     parser.add_argument("-p", "--hardcoded-parameters", dest="hardcoded_parameters", default=None, required=False,
150                         help="File containing hardcoded values for the given parameters. Run with '-h' or '--help' "
151                              "to see a brief example on the format of this file.")
152     parser.add_argument("-V", "--validation-schema", dest="xsd_location", default=None, required=False,
153                         help="Location of the schema to use to validate CTDs. If not provided, no schema validation "
154                              "will take place.")
155
156     # TODO: add verbosity, maybe?
157     program_version = "v%s" % version
158     program_build_date = str(last_updated)
159     program_version_message = "%%(prog)s %s (%s)" % (program_version, program_build_date)
160     parser.add_argument("-v", "--version", action="version", version=program_version_message)
161
162
163 def parse_hardcoded_parameters(hardcoded_parameters_file):
164     parameter_hardcoder = ParameterHardcoder()
165     if hardcoded_parameters_file is not None:
166         line_number = 0
167         with open(hardcoded_parameters_file) as f:
168             for line in f:
169                 line_number += 1
170                 if line is None or not line.strip() or line.strip().startswith("#"):
171                     pass
172                 else:
173                     # the third column must be obtained as a whole, and not split
174                     parsed_hardcoded_parameter = line.strip().split(None, 2)
175                     # valid lines contain two or three columns
176                     if len(parsed_hardcoded_parameter) != 2 and len(parsed_hardcoded_parameter) != 3:
177                         warning("Invalid line at line number %d of the given hardcoded parameters file. Line will be "
178                                 "ignored:\n%s" % (line_number, line), 0)
179                         continue
180
181                     parameter_name = parsed_hardcoded_parameter[0]
182                     hardcoded_value = parsed_hardcoded_parameter[1]
183                     tool_names = None
184                     if len(parsed_hardcoded_parameter) == 3:
185                         tool_names = parsed_hardcoded_parameter[2].split(',')
186                     if tool_names:
187                         for tool_name in tool_names:
188                             parameter_hardcoder.register_parameter(parameter_name, hardcoded_value, tool_name.strip())
189                     else:
190                         parameter_hardcoder.register_parameter(parameter_name, hardcoded_value)
191
192     return parameter_hardcoder
193
194
195 def extract_tool_help_text(ctd_model):
196     manual = None
197     doc_url = None
198     if "manual" in ctd_model.opt_attribs.keys():
199         manual = "%s\n\n" % ctd_model.opt_attribs["manual"]
200     if "docurl" in ctd_model.opt_attribs.keys():
201         doc_url = ctd_model.opt_attribs["docurl"]
202
203     help_text = "No help available"
204     if manual is not None:
205         help_text = manual
206     if doc_url is not None:
207         help_text = ("" if manual is None else manual) + "\nFor more information, visit %s" % doc_url
208
209     return help_text
210
211
212 def extract_tool_executable_path(model, default_executable_path):
213     # rules to build the executable path:
214     # if executablePath is null, then use default_executable_path
215     # if executablePath is null and executableName is null, then the name of the tool will be used
216     # if executablePath is null and executableName is not null, then executableName will be used
217     # if executablePath is not null and executableName is null,
218     # then executablePath and the name of the tool will be used
219     # if executablePath is not null and executableName is not null, then both will be used
220
221     # first, check if the model has executablePath / executableName defined
222     executable_path = model.opt_attribs.get("executablePath", None)
223     executable_name = model.opt_attribs.get("executableName", None)
224
225     # check if we need to use the default_executable_path
226     if executable_path is None:
227         executable_path = default_executable_path
228
229     # fix the executablePath to make sure that there is a '/' in the end
230     if executable_path is not None:
231         executable_path = executable_path.strip()
232         if not executable_path.endswith("/"):
233             executable_path += "/"
234
235     # assume that we have all information present
236     command = str(executable_path) + str(executable_name)
237     if executable_path is None:
238         if executable_name is None:
239             command = model.name
240         else:
241             command = executable_name
242     else:
243         if executable_name is None:
244             command = executable_path + model.name
245     return command
246
247
248 def extract_and_flatten_parameters(ctd_model):
249     parameters = []
250     if len(ctd_model.parameters.parameters) > 0:
251         # use this to put parameters that are to be processed
252         # we know that CTDModel has one parent ParameterGroup
253         pending = [ctd_model.parameters]
254         while len(pending) > 0:
255             # take one element from 'pending'
256             parameter = pending.pop()
257             if type(parameter) is not ParameterGroup:
258                 parameters.append(parameter)
259             else:
260                 # append the first-level children of this ParameterGroup
261                 pending.extend(parameter.parameters.values())
262     # return the reversed list of parameters (as it is now,
263     # we have the last parameter in the CTD as first in the list)
264     return reversed(parameters)
265
266
267 # some parameters are mapped to command line options, this method helps resolve those mappings, if any
268 def resolve_param_mapping(param, ctd_model):
269     # go through all mappings and find if the given param appears as a reference name in a mapping element
270     param_mapping = None
271     for cli_element in ctd_model.cli:
272         for mapping_element in cli_element.mappings:
273             if mapping_element.reference_name == param.name:
274                 if param_mapping is not None:
275                     warning("The parameter %s has more than one mapping in the <cli> section. "
276                             "The first found mapping, %s, will be used." % (param.name, param_mapping), 1)
277                 else:
278                     param_mapping = cli_element.option_identifier
279
280     return param_mapping if param_mapping is not None else param.name
281
282
283 def _extract_param_cli_name(param, ctd_model):
284     # we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
285     if type(param.parent) == ParameterGroup:
286         if not hasattr(param.parent.parent, 'parent'):
287             return resolve_param_mapping(param, ctd_model)
288         elif not hasattr(param.parent.parent.parent, 'parent'):
289             return resolve_param_mapping(param, ctd_model)
290         else:
291             if ctd_model.cli:
292                 warning("Using nested parameter sections (NODE elements) is not compatible with <cli>", 1)
293             return extract_param_name(param.parent) + ":" + resolve_param_mapping(param, ctd_model)
294     else:
295         return resolve_param_mapping(param, ctd_model)
296
297
298 def extract_param_name(param):
299     # we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
300     if type(param.parent) == ParameterGroup:
301         if not hasattr(param.parent.parent, "parent"):
302             return param.name
303         elif not hasattr(param.parent.parent.parent, "parent"):
304             return param.name
305         else:
306             return extract_param_name(param.parent) + ":" + param.name
307     else:
308         return param.name
309
310
311 def extract_command_line_prefix(param, ctd_model):
312     param_name = extract_param_name(param)
313     param_cli_name = _extract_param_cli_name(param, ctd_model)
314     if param_name == param_cli_name:
315         # there was no mapping, so for the cli name we will use a '-' in the prefix
316         param_cli_name = "-" + param_name
317     return param_cli_name
0 import os
1 import sys
2 import traceback
3 import common.utils as utils
4
5 from argparse import ArgumentParser
6 from argparse import RawDescriptionHelpFormatter
7 from common.exceptions import ApplicationException, ModelError
8
9 __all__ = []
10 __version__ = 2.0
11 __date__ = '2014-09-17'
12 __updated__ = '2017-08-09'
13
14 program_version = "v%s" % __version__
15 program_build_date = str(__updated__)
16 program_version_message = '%%(prog)s %s (%s)' % (program_version, program_build_date)
17 program_short_description = "CTDConverter - A project from the WorkflowConversion family " \
18 "(https://github.com/WorkflowConversion/CTDConverter)"
19 program_usage = '''
20 USAGE:
21
22 $ python convert.py [FORMAT] [ARGUMENTS ...]
23
24 FORMAT can be either one of the supported output formats: cwl, galaxy.
25
26 There is one converter for each supported FORMAT, each taking a different set of arguments. Please consult the detailed
27 documentation for each of the converters. Nevertheless, all converters have the following common parameters/options:
28
29
30 I - Parsing a single CTD file and convert it:
31
32 $ python convert.py [FORMAT] -i [INPUT_FILE] -o [OUTPUT_FILE]
33
34
35 II - Parsing several CTD files, output converted wrappers in a given folder:
36
37 $ python convert.py [FORMAT] -i [INPUT_FILES] -o [OUTPUT_DIRECTORY]
38
39
40 III - Hardcoding parameters
41
42 It is possible to hardcode parameters. This makes sense if you want to set a tool in 'quiet' mode or if your tools
43 support multi-threading and accept the number of threads via a parameter, without giving end users the chance to
44 change the values for these parameters.
45
46 In order to generate hardcoded parameters, you need to provide a simple file. Each line of this file contains
47 two or three columns separated by whitespace. Any line starting with a '#' will be ignored. The first column contains
48 the name of the parameter, the second column contains the value that will always be set for this parameter. Only the
49 first two columns are mandatory.
50
51 If the parameter is to be hardcoded only for a set of tools, then a third column can be added. This column contains
52 a comma-separated list of tool names for which the parameter will be hardcoded. If a third column is not present,
53 then all processed tools containing the given parameter will get a hardcoded value for it.
54
55 The following is an example of a valid file:
56
57 ##################################### HARDCODED PARAMETERS example #####################################
58 # Every line starting with a # will be handled as a comment and will not be parsed.
59 # The first column is the name of the parameter and the second column is the value that will be used.
60
61 # Parameter name # Value # Tool(s)
62 threads 8
63 mode quiet
64 xtandem_executable xtandem XTandemAdapter
65 verbosity high Foo, Bar
66
67 #########################################################################################################
68
69 Using the above file will produce a command-line similar to:
70
71 [TOOL] ... -threads 8 -mode quiet ...
72
73 for all tools. For XTandemAdapter, however, the command-line will look like:
74
75 XTandemAdapter ... -threads 8 -mode quiet -xtandem_executable xtandem ...
76
77 And for tools Foo and Bar, the command-line will be similar to:
78
79 Foo -threads 8 -mode quiet -verbosity high ...
80
81
82 IV - Engine-specific parameters
83
84 i - Galaxy
85
86 a. Providing file formats, mimetypes
87
88 Galaxy supports the concept of file format in order to connect compatible ports, that is, input ports of a
89 certain data format will be able to receive data from a port from the same format. This converter allows you
90 to provide a personalized file in which you can relate the CTD data formats with supported Galaxy data formats.
91 The layout of this file consists of lines, each of either one, three or four columns separated by any amount of
92 whitespace. The content of each column is as follows:
93
94 * 1st column: file extension
95 * 2nd column: data type, as listed in Galaxy
96 * 3rd column: full-named Galaxy data type, as it will appear on datatypes_conf.xml
97 * 4th column: mimetype (optional)
98
99 The following is an example of a valid "file formats" file:
100
101 ########################################## FILE FORMATS example ##########################################
102 # Every line starting with a # will be handled as a comment and will not be parsed.
103 # The first column is the file format as given in the CTD and second column is the Galaxy data format. The
104 # second, third and fourth columns can be left empty if the data type has already been registered
105 # in Galaxy, otherwise, all but the mimetype must be provided.
106
107 # CTD type # Galaxy type # Long Galaxy data type # Mimetype
108 csv tabular galaxy.datatypes.data:Text
109 fasta
110 ini txt galaxy.datatypes.data:Text
111 txt
112 idxml txt galaxy.datatypes.xml:GenericXml application/xml
113 options txt galaxy.datatypes.data:Text
114 grid grid galaxy.datatypes.data:Grid
115 ##########################################################################################################
116
117 Note that each line consists precisely of either one, three or four columns. In the case of data types already
118 registered in Galaxy (such as fasta and txt in the above example), only the first column is needed. In the
119 case of data types that haven't been yet registered in Galaxy, the first three columns are needed
120 (mimetype is optional).
121
122 For information about Galaxy data types and subclasses, see the following page:
123 https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes
124
125
126 b. Finer control over which tools will be converted
127
128 Sometimes only a subset of CTDs needs to be converted. It is possible to either explicitly specify which tools
129 will be converted or which tools will not be converted.
130
131 The value of the -s/--skip-tools parameter is a file in which each line will be interpreted as the name of a
132 tool that will not be converted. Conversely, the value of the -r/--required-tools is a file in which each line
133 will be interpreted as a tool that is required. Only one of these parameters can be specified at a given time.
134
135 The format of both files is exactly the same. As stated before, each line will be interpreted as the name of a
136 tool. Any line starting with a '#' will be ignored.
137
138
139 ii - CWL
140
141 There are, for now, no CWL-specific parameters or options.
142
143 '''
144
145 program_license = '''%(short_description)s
146
147 Copyright 2017, WorkflowConversion
148
149 Licensed under the Apache License, Version 2.0 (the "License");
150 you may not use this file except in compliance with the License.
151 You may obtain a copy of the License at
152
153 http://www.apache.org/licenses/LICENSE-2.0
154
155 Unless required by applicable law or agreed to in writing, software
156 distributed under the License is distributed on an "AS IS" BASIS,
157 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
158 See the License for the specific language governing permissions and
159 limitations under the License.
160
161 %(usage)s
162 ''' % {'short_description': program_short_description, 'usage': program_usage}
163
164
165 def main(argv=None):
166 if argv is None:
167 argv = sys.argv
168 else:
169 sys.argv.extend(argv)
170
171 # check that we have, at least, one argument provided
172 # at this point we cannot parse the arguments, because each converter takes different arguments, meaning each
173 # converter will register its own parameters after we've registered the basic ones... we have to do it old school
174 if len(argv) < 2:
175 utils.error("Not enough arguments provided")
176 print("\nUsage: $ python convert.py [TARGET] [ARGUMENTS]\n\n" +
177 "Where:\n" +
178 " target: one of 'cwl' or 'galaxy'\n\n" +
179 "Run again using the -h/--help option to print more detailed help.\n")
180 return 1
181
182 # TODO: at some point this should look like real software engineering and use a map containing converter instances
183 # whose keys would be the name of the converter (e.g., cwl, galaxy), but for the time being, only two formats
184 # are supported
185 target = str.lower(argv[1])
186 if target == 'cwl':
187 from cwl import converter
188 elif target == 'galaxy':
189 from galaxy import converter
190 elif target == '-h' or target == '--help' or target == '--h' or target == 'help':
191 print(program_license)
192 return 0
193 else:
194 utils.error("Unrecognized target engine. Supported targets are 'cwl' and 'galaxy'.")
195 return 1
196
197 utils.info("Using %s converter" % target)
198
199 try:
200 # Setup argument parser
201 parser = ArgumentParser(prog="CTDConverter", description=program_license,
202 formatter_class=RawDescriptionHelpFormatter, add_help=True)
203 utils.add_common_parameters(parser, program_version_message, program_build_date)
204
205 # add tool-specific arguments
206 converter.add_specific_args(parser)
207
208 # parse arguments and perform some basic, common validation
209 args = parser.parse_args()
210 validate_and_prepare_common_arguments(args)
211
212 # parse the input CTD files into CTDModels
213 parsed_ctds = utils.parse_input_ctds(args.xsd_location, args.input_files, args.output_destination,
214 converter.get_preferred_file_extension())
215
216 # let the converter do its own thing
217 converter.convert_models(args, parsed_ctds)
218 return 0
219
220 except KeyboardInterrupt:
221 print("Interrupted...")
222 return 0
223
224 except ApplicationException, e:
225 traceback.print_exc()
226 utils.error("CTDConverter could not complete the requested operation.", 0)
227 utils.error("Reason: " + e.msg, 0)
228 return 1
229
230 except ModelError, e:
231 traceback.print_exc()
232 utils.error("There seems to be a problem with one of your input CTDs.", 0)
233 utils.error("Reason: " + e.msg, 0)
234 return 1
235
236 except Exception, e:
237 traceback.print_exc()
238 utils.error("CTDConverter could not complete the requested operation.", 0)
239 utils.error("Reason: " + str(e), 0)  # a generic Exception has no 'msg' attribute
240 return 2
241
242
243 def validate_and_prepare_common_arguments(args):
244 # flatten lists of lists to a list containing elements
245 lists_to_flatten = ["input_files", "blacklisted_parameters"]
246 for list_to_flatten in lists_to_flatten:
247 utils.flatten_list_of_lists(args, list_to_flatten)
248
249 # if input is a single file, we expect output to be a file (and not a dir that already exists)
250 if len(args.input_files) == 1:
251 if os.path.isdir(args.output_destination):
252 raise ApplicationException("If a single input file is provided, output (%s) is expected to be a file "
253 "and not a folder.\n" % args.output_destination)
254
255 # if input is a list of files, we expect output to be a folder
256 if len(args.input_files) > 1:
257 if not os.path.isdir(args.output_destination):
258 raise ApplicationException("If several input files are provided, output (%s) is expected to be an "
259 "existing directory.\n" % args.output_destination)
260
261 # check that the provided input files, if provided, contain a valid file path
262 input_arguments_to_check = ["xsd_location", "input_files", "hardcoded_parameters"]
263 for argument_name in input_arguments_to_check:
264 utils.validate_argument_is_valid_path(args, argument_name)
265
266 # add the parameter hardcoder
267 args.parameter_hardcoder = utils.parse_hardcoded_parameters(args.hardcoded_parameters)
268
269
270 if __name__ == "__main__":
271 sys.exit(main())
0 # Conversion of CTD Files to CWL
1
2 ## How to use: Parameters in Detail
3 The CWL converter has, for now, only the basic parameters described in the [top README file](../README.md).
4
(New empty file)
0 #!/usr/bin/env python
1 # encoding: utf-8
2
3 # instead of using cwlgen, we decided to use ruamel.yaml (a PyYAML derivative) directly
4 # we promptly found a problem with cwlgen, namely, it is not possible to construct something like:
5 # some_parameter:
6 # type: ['null', string]
7 # which kind of sucks, because this seems to be the way to state that a parameter is truly optional and has no default
8 # since cwlgen is just "fancy classes" around the yaml.dump() method, we implemented our own generation of yaml
9
10
11 import ruamel.yaml as yaml
12
13 from CTDopts.CTDopts import _InFile, _OutFile, ParameterGroup, _Choices, _NumericRange, _FileFormat, ModelError, _Null
14 from common import utils, logger
15
16 # all cwl-related properties are defined here
17
18 CWL_SHEBANG = "#!/usr/bin/env cwl-runner"
19 CURRENT_CWL_VERSION = 'v1.0'
20 CWL_VERSION = 'cwlVersion'
21 CLASS = 'class'
22 BASE_COMMAND = 'baseCommand'
23 INPUTS = 'inputs'
24 ID = 'id'
25 TYPE = 'type'
26 INPUT_BINDING = 'inputBinding'
27 OUTPUT_BINDING = 'outputBinding'
28 PREFIX = 'prefix'
29 OUTPUTS = 'outputs'
30 VALUE_FROM = 'valueFrom'
31 GLOB = 'glob'
32 LABEL = 'label'
33 DOC = 'doc'
34 DEFAULT = 'default'
35
36 # types
37 TYPE_NULL = 'null'
38 TYPE_BOOLEAN = 'boolean'
39 TYPE_INT = 'int'
40 TYPE_LONG = 'long'
41 TYPE_FLOAT = 'float'
42 TYPE_DOUBLE = 'double'
43 TYPE_STRING = 'string'
44 TYPE_FILE = 'File'
45 TYPE_DIRECTORY = 'Directory'
46
47 TYPE_TO_CWL_TYPE = {int: TYPE_INT, float: TYPE_DOUBLE, str: TYPE_STRING, bool: TYPE_BOOLEAN, _InFile: TYPE_FILE,
48 _OutFile: TYPE_FILE, _Choices: TYPE_STRING}
49
50
51 def add_specific_args(parser):
52 # no specific arguments for CWL conversion, for now
53 # however, this method has to be defined, otherwise ../convert.py won't work for CWL
54 pass
55
56
57 def get_preferred_file_extension():
58 return "cwl"
59
60
61 def convert_models(args, parsed_ctds):
62 # go through each ctd model and perform the conversion, easy as pie!
63 for parsed_ctd in parsed_ctds:
64 model = parsed_ctd.ctd_model
65 origin_file = parsed_ctd.input_file
66 output_file = parsed_ctd.suggested_output_file
67
68 logger.info("Converting %s (source %s)" % (model.name, utils.get_filename(origin_file)))
69 cwl_tool = convert_to_cwl(model, args)
70
71 logger.info("Writing to %s" % utils.get_filename(output_file), 1)
72
73 stream = open(output_file, 'w')
74 stream.write(CWL_SHEBANG + '\n\n')
75 stream.write("# This CWL file was automatically generated using CTDConverter.\n")
76 stream.write("# Visit https://github.com/WorkflowConversion/CTDConverter for more information.\n\n")
77 yaml.dump(cwl_tool, stream, default_flow_style=False)
78 stream.close()
79
80
81 # returns a dictionary
82 def convert_to_cwl(ctd_model, args):
83 # create cwl_tool object with the basic information
84 base_command = utils.extract_tool_executable_path(ctd_model, args.default_executable_path)
85
86 # add basic properties
87 cwl_tool = {}
88 cwl_tool[CWL_VERSION] = CURRENT_CWL_VERSION
89 cwl_tool[CLASS] = 'CommandLineTool'
90 cwl_tool[LABEL] = ctd_model.opt_attribs["description"]
91 cwl_tool[DOC] = utils.extract_tool_help_text(ctd_model)
92 cwl_tool[BASE_COMMAND] = base_command
93
94 # TODO: test with optional output files
95
96 # add inputs/outputs
97 for param in utils.extract_and_flatten_parameters(ctd_model):
98 if param.name in args.blacklisted_parameters:
99 continue
100
101 param_name = utils.extract_param_name(param)
102 cwl_fixed_param_name = fix_param_name(param_name)
103 hardcoded_value = args.parameter_hardcoder.get_hardcoded_value(param_name, ctd_model.name)
104 param_default = str(param.default) if param.default is not _Null and param.default is not None else None
105
106 if param.type is _OutFile:
107 create_lists_if_missing(cwl_tool, [INPUTS, OUTPUTS])
108 # we know the only outputs are of type _OutFile
109 # we need an input of type string that will contain the name of the output file
110 input_binding = {}
111 input_binding[PREFIX] = utils.extract_command_line_prefix(param, ctd_model)
112 if hardcoded_value is not None:
113 input_binding[VALUE_FROM] = hardcoded_value
114
115 label = "Filename for %s output file" % param_name
116 input_name_for_output_filename = get_input_name_for_output_filename(param)
117 input_param = {}
118 input_param[ID] = input_name_for_output_filename
119 input_param[INPUT_BINDING] = input_binding
120 input_param[DOC] = label
121 input_param[LABEL] = label
122 if param_default is not None:
123 input_param[DEFAULT] = param_default
124 input_param[TYPE] = generate_cwl_param_type(param, TYPE_STRING)
125
126 output_binding = {}
127 output_binding[GLOB] = "$(inputs.%s)" % input_name_for_output_filename
128
129 output_param = {}
130 output_param[ID] = cwl_fixed_param_name
131 output_param[OUTPUT_BINDING] = output_binding
132 output_param[DOC] = param.description
133 output_param[LABEL] = param.description
134 output_param[TYPE] = generate_cwl_param_type(param)
135
136 cwl_tool[INPUTS].append(input_param)
137 cwl_tool[OUTPUTS].append(output_param)
138
139 else:
140 create_lists_if_missing(cwl_tool, [INPUTS])
141 # we know that anything that is not an _OutFile is an input
142 input_binding = {}
143 input_binding[PREFIX] = utils.extract_command_line_prefix(param, ctd_model)
144 if hardcoded_value is not None:
145 input_binding[VALUE_FROM] = hardcoded_value
146
147 input_param = {}
148 input_param[ID] = cwl_fixed_param_name
149 input_param[DOC] = param.description
150 input_param[LABEL] = param.description
151 if param_default is not None:
152 input_param[DEFAULT] = param_default
153 input_param[INPUT_BINDING] = input_binding
154 input_param[TYPE] = generate_cwl_param_type(param)
155
156 cwl_tool[INPUTS].append(input_param)
157
158 return cwl_tool
159
160
161 def create_lists_if_missing(cwl_tool, keys):
162 for key in keys:
163 if key not in cwl_tool:
164 cwl_tool[key] = []
165
166
167 def get_input_name_for_output_filename(param):
168 assert param.type is _OutFile, "Only output files can get a generated filename input parameter."
169 return fix_param_name(utils.extract_param_name(param)) + "_filename"
170
171
172 def fix_param_name(param_name):
173 # IMPORTANT: there seems to be a problem in CWL if the prefix and the parameter name are the same, so we need to
174 # prepend something to the parameter name that will be registered in CWL, also, using colons in parameter
175 # names seems to bring all sorts of problems for cwl-runner
176 return 'param_' + param_name.replace(":", "_")
177
178
179 # in order to provide "true" optional params, the parameter type should be something like ['null', <CWLType>],
180 # for instance ['null', int]
181 def generate_cwl_param_type(param, forced_type=None):
182 cwl_type = TYPE_TO_CWL_TYPE[param.type] if forced_type is None else forced_type
183 return cwl_type if param.required else ['null', cwl_type]
00 # Conversion of CTD Files to Galaxy ToolConfigs
1 ## Generating a `tool_conf.xml` File
2 * Purpose: Galaxy uses a file `tool_conf.xml` in which other tools can be included. `CTDConverter` can also generate this file. Categories will be extracted from the provided input CTDs and for each category, a different `<section>` will be generated. Any input CTD lacking a category will be sorted under the provided default category.
3 * Short/long version: `-t` / `--tool-conf-destination`
4 * Required: no.
5 * Taken values: The destination of the file.
16
2 ## How to use: most common Tasks
7 $ python convert.py galaxy -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -t /data/generated-galaxy-stubs/tool_conf.xml
8
39
4 The Galaxy ToolConfig generator takes several parameters and a varying number of inputs and outputs. The following sub-sections show how to perform the most common operations.
5
6 Running the generator with the `-h/--help` parameter will print extended information about each of the parameters.
7
8 ### Macros
9
10 Galaxy supports the use of macros via a `macros.xml` file (we provide a sample macros file in [macros.xml]). Instead of repeating sections, macros can be used and expanded. If you want fine control over the macros, you can use the `-m` / `--macros` parameter to provide your own macros file.
11
12 Please note that the used macros file **must** be copied to your Galaxy installation on the same location in which you place the generated *ToolConfig* files, otherwise Galaxy will not be able to parse the generated *ToolConfig* files!
13
14 ### One input, one Output
15
16 In its simplest form, the converter takes an input CTD file and generates an output Galaxy *ToolConfig* file. The following usage of `generator.py`:
17
18 $ python generator.py -i /data/sample_input.ctd -o /data/sample_output.xml
19
20 will parse `/data/sample_input.ctd` and generate a Galaxy tool wrapper under `/data/sample_output.xml`. The generated file can be added to your Galaxy instance like any other tool.
21
22 ### Converting several CTDs at once
23
24 When converting several CTDs, the expected value for the `-o`/`--output` parameter is a folder. For example:
25
26 $ python generator.py -i /data/ctds/one.ctd /data/ctds/two.ctd -o /data/generated-galaxy-stubs
27
28 Will convert `/data/ctds/one.ctd` into `/data/generated-galaxy-stubs/one.xml` and `/data/ctds/two.ctd` into `/data/generated-galaxy-stubs/two.xml`.
29
30 You can use wildcard expansion, as supported by most modern operating systems:
31
32 $ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs
33
34 ### Generating a tool_conf.xml File
35
36 The generator supports generation of a `tool_conf.xml` file which you can later use in your local Galaxy installation. The parameter `-t`/`--tool-conf-destination` contains the path of a file in which a `tool_conf.xml` file will be generated.
37
38 $ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -t /data/generated-galaxy-stubs/tool_conf.xml
39
40
41 ## How to use: Parameters in Detail
42
43 ### A Word about Parameters taking Lists of Values
44
45 All parameters have a short and a long option and some parameters take lists of values. Using either the long or the short option of the parameter will produce the same output. The following examples show how to pass values using the `-f` / `--foo` parameter:
46
47 The following uses of the parameter will pass the list of values containing `bar`, `blah` and `blu`:
48
49 -f bar blah blu
50 --foo bar blah blu
51 -f bar -f blah -f blu
52 --foo bar --foo blah --foo blu
53 -f bar --foo blah blu
54
55 The following uses of the parameter will pass a single value `bar`:
56
57 -f bar
58 --foo bar
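
The equivalence of these invocations can be sketched with `argparse`. This is a minimal illustration rather than the converter's actual argument setup: it assumes the option is registered with `nargs="+"` and `action="append"` and that the resulting list of lists is flattened afterwards. `-f`/`--foo` is the placeholder parameter from the examples above, and `parse_foo` is a hypothetical helper.

```python
import argparse
from itertools import chain


def build_parser():
    # "-f"/"--foo" is the placeholder parameter used in the examples above.
    parser = argparse.ArgumentParser(prog="list-demo")
    # nargs="+" accepts "-f bar blah blu"; action="append" accepts repeated "-f".
    parser.add_argument("-f", "--foo", dest="foo", nargs="+", action="append")
    return parser


def parse_foo(argv):
    """Return the flat list of values passed through -f/--foo (hypothetical helper)."""
    args = build_parser().parse_args(argv)
    # Each occurrence of the option contributes one inner list; flatten them.
    return list(chain.from_iterable(args.foo or []))
```

With this setup, mixing the short and long options or repeating the option yields the same flat list, which is why all five invocations above are interchangeable.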
59
60 ### Schema Validation
61
62 * Purpose: Provide validation of input CTDs against a schema file (i.e., an XSD file).
63 * Short/long version: `v` / `--validation-schema`
64 * Required: no.
65 * Taken values: location of the schema file (e.g., CTD.xsd).
66
67 CTDs can be validated against a schema. The master version of the schema can be found under [CTDSchema].
68
69 If a schema is provided, all input CTDs will be validated against it.
70
71 ### Input File(s)
72
73 * Purpose: Provide input CTD file(s) to convert.
74 * Short/long version: `-i` / `--input`
75 * Required: yes.
76 * Taken values: a list of input CTD files.
77
78 Example:
79
80 Any of the following invocations will convert `/data/input_one.ctd` and `/data/input_two.ctd`:
81
82 $ python generator.py -i /data/input_one.ctd -i /data/input_two.ctd -o /data/generated
83 $ python generator.py -i /data/input_one.ctd /data/input_two.ctd -o /data/generated
84 $ python generator.py --input /data/input_one.ctd /data/input_two.ctd -o /data/generated
85 $ python generator.py --input /data/input_one.ctd --input /data/input_two.ctd -o /data/generated
86
87 The following invocation will convert `/data/input.ctd` into `/data/output.xml`:
88
89 $ python generator.py -i /data/input.ctd -o /data/output.xml -m sample_files/macros.xml
90
91 Of course, you can also use wildcards, which will be automatically expanded by most shells. This is extremely useful if you want to convert several files at a time. Imagine that the folder `/data/ctds` contains three files, `input_one.ctd`, `input_two.ctd` and `input_three.ctd`. The following two invocations will produce the same output in `/data/galaxy`:
92
93 $ python generator.py -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd /data/ctds/input_three.ctd -o /data/galaxy
94 $ python generator.py -i /data/ctds/*.ctd -o /data/galaxy
95
96 ### Finer Control over the Tools to be converted
97
98 Sometimes only a subset of the CTDs in a folder needs to be converted. The parameter `-r`/`--required-tools` takes the path of a file containing the names of tools that will be converted.
99
100 $ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -r required_tools.txt
101
102 On the other hand, if you want the generator to skip conversion of some CTDs, the parameter `-s`/`--skip-tools` will take the path of a file containing the names of tools that will not be converted.
103
104 $ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -s skipped_tools.txt
105
106 The format of these files (`required_tools.txt`, `skipped_tools.txt` in the examples above) is straightforward. Each line contains the name of a tool and any line starting with `#` will be ignored.
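
The file format and the required/skipped semantics can be sketched as follows. This is an illustrative reading of the description above, not the converter's own code: `read_tool_names` and `should_convert` are hypothetical helpers, and skipping blank lines and stripping surrounding whitespace are assumptions.

```python
def read_tool_names(lines):
    """Collect tool names from an iterable of lines, ignoring comments."""
    names = []
    for line in lines:
        stripped = line.strip()
        # Assumption: blank lines are skipped; lines starting with '#' are comments.
        if not stripped or stripped.startswith("#"):
            continue
        names.append(stripped)
    return names


def should_convert(tool_name, required=None, skipped=None):
    """Decide whether a tool is converted; only one of the two lists may be given."""
    if required is not None and skipped is not None:
        raise ValueError("only one of required/skipped may be specified")
    if required is not None:
        return tool_name in required
    if skipped is not None:
        return tool_name not in skipped
    # Neither list given: every tool is converted.
    return True
```

Passing an open file object as `lines` works as well, since iterating over a file yields its lines.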
107
108 ### Output Destination
109
110 * Purpose: Provide output destination for the generated Galaxy *ToolConfig* files.
111 * Short/long version: `-o` / `--output-destination`
112 * Required: yes.
113 * Taken values: if a single input file is given, then a single output file is expected. If multiple input files are given, then an existent folder, in which all generated Galaxy *ToolConfig* will be written, is expected.
114
115 Example:
116
117 A single input is given, and the output will be generated into `/data/output.xml`:
118
119 $ python generator.py -i /data/input.ctd -o /data/output.xml
120
121 Several inputs are given. The output is the already existent folder, `/data/stubs`, and at the end of the operation, the files `/data/stubs/input_one.ctd.xml` and `/data/stubs/input_two.ctd.xml` will be generated:
122
123 $ python generator.py -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd -o /data/stubs
124
125
126 ### Adding Parameters to the Command-line
127
10 ## Adding Parameters to the Command-line
12811 * Purpose: Galaxy *ToolConfig* files include a `<command>` element in which the command line to invoke the tool can be given. Sometimes tools must be invoked in a certain way (i.e., passing certain parameters). For instance, some tools can be invoked in a verbose or quiet way, or even in a headless way (i.e., without GUI).
12912 * Short/long version: `-a` / `--add-to-command-line`
13013 * Required: no.
13215
13316 Example:
13417
135 $ python generator.py ... -a "--quiet --no-gui"
18 $ python convert.py galaxy ... -a "--quiet --no-gui"
13619
13720 Will generate the following `<command>` element in the generated Galaxy *ToolConfig*:
13821
13922 <command>TOOL_NAME --quiet --no-gui ...</command>
14023
141
142 ### Blacklisting Parameters
143
144 * Purpose: Some parameters present in the CTD are not to be exposed on Galaxy. Think of parameters such as `--help` or `--debug`, which might not make much sense to expose to end users in a workflow management system such as Galaxy.
145 * Short/long version: `-b` / `--blacklist-parameters`
146 * Required: no.
147 * Taken values: A list of parameters to be blacklisted.
148
149 Example:
150
151 $ python generator.py ... -b h help quiet
152
153 Will not process any of the parameters named `h`, `help`, or `quiet`, and they will not appear in the generated Galaxy *ToolConfig*.
154
155 ### Generating a tool_conf.xml file
156
157 * Purpose: Galaxy uses a file `tool_conf.xml` in which other tools can be included. `generator.py` can also generate this file. Categories will be extracted from the provided input CTDs and for each category, a different `<section>` will be generated. Any input CTD lacking a category will be sorted under the provided default category.
158 * Short/long version: `-t` / `--tool-conf-destination`
159 * Required: no.
160 * Taken values: The destination of the file.
161
162 ### Providing a default Category
163
164 * Purpose: Input CTDs that lack a category will be sorted under the value given to this parameter. If this parameter is not given, then the category `DEFAULT` will be used.
24 ## Providing a default Category
25 * Purpose: Input CTDs that lack a category will be sorted under the value given to this parameter. If this parameter is not provided, then the category `DEFAULT` will be used.
16526 * Short/long version: `-c` / `--default-category`
16627 * Required: no.
16728 * Taken values: The value for the default category to use for input CTDs lacking a category.
17031
17132 Suppose there is a folder containing several CTD files. Some of those CTDs don't have the optional attribute `category` and the rest belong to the `Data Processing` category. The following invocation:
17233
173 $ python generator.py ... -c Other
34 $ python convert.py galaxy ... -c Other
17435
17536 will generate, for each of the categories, a different section. Additionally, CTDs lacking a category will be sorted under the given category, `Other`, as shown:
17637
18647 ...
18748 </section>
18849
189 ### Providing a Path for the Location of the ToolConfig Files
190
191 * Purpose: The `tool_conf.xml` file contains references to files which in turn contain Galaxy *ToolConfig* files. Using this parameter, you can provide information about the location of your tools.
50 ## Providing a Path for the Location of the *ToolConfig* Files
51 * Purpose: The `tool_conf.xml` file contains references to files which in turn contain Galaxy *ToolConfig* files. Using this parameter, you can provide information about the location of your wrappers on your Galaxy instance.
19252 * Short/long version: `-g` / `--galaxy-tool-path`
19353 * Required: no.
19454 * Taken values: The path relative to your `$GALAXY_ROOT/tools` folder on which your tools are located.
19555
19656 Example:
19757
198 $ python generator.py ... -g my_tools_folder
58 $ python convert.py galaxy ... -g my_tools_folder
19959
20060 Will generate `<tool>` elements in the generated `tool_conf.xml` as follows:
20161
20363
20464 In this example, `tool_conf.xml` refers to a file located at `$GALAXY_ROOT/tools/my_tools_folder/some_tool.xml`.
20565
206
207 ### Hardcoding Parameters
208
209 * Purpose: Fix the value of a parameter and hide it from the end user.
210 * Short/long version: `-p` / `--hardcoded-parameters`
211 * Required: no.
212 * Taken values: The path of a file containing the mapping between parameter names and hardcoded values to use in the `<command>` section.
213
214 It is sometimes required that parameters are hidden from the end user in workflow systems such as Galaxy and that they take a predetermined value. Allowing end users to control parameters such as `--verbosity` or `--threads` might create more problems than it solves. For this purpose, the parameter `-p`/`--hardcoded-parameters` takes the path of a file that contains up to three whitespace-separated columns mapping parameter names to hardcoded values. The first column contains the name of the parameter and the second one the hardcoded value. The first two columns are mandatory.
215
216 If the parameter is to be hardcoded only for certain tools, a third column containing a comma separated list of tool names for which the hardcoding will apply can be added.
217
218 Lines starting with `#` will be ignored. The following is an example of a valid file:
219
220 # Parameter name # Value # Tool(s)
221 threads \${GALAXY_SLOTS:-24}
222 mode quiet
223 xtandem_executable xtandem XTandemAdapter
224 verbosity high Foo, Bar
225
226 This will produce a `<command>` section similar to the following one for all tools but `XTandemAdapter`, `Foo` and `Bar`:
227
228 <command>TOOL_NAME -threads \${GALAXY_SLOTS:-24} -mode quiet ...</command>
229
230 For `XTandemAdapter`, the `<command>` will be similar to:
231
232 <command>XTandemAdapter ... -threads \${GALAXY_SLOTS:-24} -mode quiet -xtandem_executable xtandem ...</command>
233
234 And for tools `Foo` and `Bar`, the `<command>` will be similar to:
235
236 <command>Foo ... -threads \${GALAXY_SLOTS:-24} -mode quiet -verbosity high ...</command>
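
The lookup rules described in this section can be sketched as below. This is an assumption-laden illustration and not the converter's own parser: `load_hardcoded_table` and `lookup_hardcoded_value` are hypothetical names, and the sketch assumes that everything after the second column is the comma-separated tool list (so that `Foo, Bar` is read as one column).

```python
def load_hardcoded_table(lines):
    """Map parameter name -> list of (value, tools); tools is None for 'all tools'."""
    table = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Split into at most three columns; the third keeps any inner whitespace.
        columns = stripped.split(None, 2)
        if len(columns) < 2:
            raise ValueError("expected at least a name and a value: %r" % line)
        name, value = columns[0], columns[1]
        tools = None
        if len(columns) == 3:
            tools = set(t.strip() for t in columns[2].split(",") if t.strip())
        table.setdefault(name, []).append((value, tools))
    return table


def lookup_hardcoded_value(table, parameter_name, tool_name):
    """Return the hardcoded value that applies to this tool, or None."""
    for value, tools in table.get(parameter_name, []):
        if tools is None or tool_name in tools:
            return value
    return None
```

A value without a tool column applies to every tool, while a value with a tool column applies only to the listed tools, which matches the three `<command>` variants shown above.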
237
238
239 ### Including additional Macros Files
240
66 ## Including additional Macros Files
24167 * Purpose: Include external macros files.
24268 * Short/long version: `-m` / `--macros`
24369 * Required: no.
24672
24773 *ToolConfig* supports elaborate sections such as `<stdio>`, `<requirements>`, etc., that are identical across tools of the same suite. Macros files assist in the task of including external xml sections into *ToolConfig* files. For more information about the syntax of macros files, see: https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax#Reusing_Repeated_Configuration_Elements
24874
249 There are some macros that are required, namely `stdio`, `requirements` and `advanced_options`. A template macro file is included in [macros.xml]. It can be edited to suit your needs and you could add extra macros or leave it as it is and include additional files.
75 There are some macros that are required, namely `stdio`, `requirements` and `advanced_options`. A template macro file is included in [macros.xml]. It can be edited to suit your needs and you could add extra macros or leave it as it is and include additional files. Every macro found in the provided files will be expanded.
25076
251 Every macro found in the included files and in `support_files/macros.xml` will be expanded. Users are responsible for copying the given macros files in their corresponding galaxy folders.
77 Please note that the used macros files **must** be copied to your Galaxy installation on the same location in which you place the generated *ToolConfig* files, otherwise Galaxy will not be able to parse the generated *ToolConfig* files!
25278
253 ### Providing a default executable Path
254
255 * Purpose: Help Galaxy locate tools by providing a path.
256 * Short/long version: `-x` / `--default-executable-path`
257 * Required: no.
258 * Taken values: The default executable path of the tools in the Galaxy installation.
259
260 CTDs can contain an `<executablePath>` element that will be used when executing the tool binary. If this element is missing, the value provided by this parameter will be used as a prefix when building the `<command>` section. Suppose that you have installed a tool suite in your local Galaxy instance under `/opt/suite/bin`. The following invocation of the converter:
261
262 $ python generator.py -x /opt/suite/bin ...
263
264 Will produce a `<command>` section similar to:
265
266 <command>/opt/suite/bin/Foo ...</command>
267
268 For those CTDs in which no `<executablePath>` could be found.
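
The fallback described above can be sketched as a small function. This is only an illustration: `resolve_executable` is a hypothetical helper, and it assumes that a value taken from `<executablePath>` is the full path of the binary, which the text above does not state explicitly.

```python
import posixpath


def resolve_executable(tool_name, executable_path=None, default_executable_path=None):
    """Pick the command used to invoke a tool (hypothetical helper)."""
    # Assumption: a value from the CTD's <executablePath> element wins when present.
    if executable_path:
        return executable_path
    # Otherwise the -x/--default-executable-path value is used as a prefix.
    if default_executable_path:
        return posixpath.join(default_executable_path, tool_name)
    # No path information at all: rely on the tool being found via PATH.
    return tool_name
```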
269
270
271 ### Generating a `datatypes_conf.xml` File
272
79 ## Generating a `datatypes_conf.xml` File
27380 * Purpose: Specify the destination of a generated `datatypes_conf.xml` file.
27481 * Short/long version: `-d` / `--datatypes-destination`
27582 * Required: no.
27784
27885 It is likely that your tools use file formats or mimetypes that have not been registered in Galaxy. The generator allows you to specify a path in which an automatically generated `datatypes_conf.xml` file will be created. Consult the next section to get information about how to register file formats and mimetypes.
27986
280
281 ### Providing Galaxy File Formats
282
87 ## Providing Galaxy File Formats
28388 * Purpose: Register new file formats and mimetypes.
28489 * Short/long version: `-f` / `--formats-file`
28590 * Required: no.
307112
308113 For information about Galaxy data types and subclasses, consult the following page: https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes
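
A formats file consists of lines with one, three or four whitespace-separated columns (file extension, Galaxy data type, full Galaxy data type, optional mimetype). The following sketch shows one way such a file could be read. It is not the converter's own implementation: `FormatEntry` and `parse_formats` are hypothetical names, and rejecting any other column count is an assumption.

```python
from collections import namedtuple

# One row of the formats file; columns that were omitted are stored as None.
FormatEntry = namedtuple("FormatEntry", "extension galaxy_type long_type mimetype")


def parse_formats(lines):
    """Parse formats-file lines into FormatEntry tuples, ignoring comments."""
    entries = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        columns = stripped.split()
        if len(columns) not in (1, 3, 4):
            raise ValueError("expected 1, 3 or 4 columns, got %d: %r" % (len(columns), line))
        # Pad the missing trailing columns so every entry has four fields.
        columns += [None] * (4 - len(columns))
        entries.append(FormatEntry(*columns))
    return entries
```

Entries with a single column correspond to data types already registered in Galaxy, so only entries with a `long_type` would need to appear in a generated `datatypes_conf.xml`.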
309114
310
311 ## Notes about some of the *OpenMS* Tools
312
313 * Most of the tools can be generated automatically. Some of the tools need some extra work (for now).
314 * These adapters need to be changed, such that you provide the path to the executable:
115 ## Remarks about some of the *OpenMS* Tools
116 * Most of the tools can be generated automatically. However, some of the tools need some extra work (for now).
117 * The following adapters need to be changed, such that you provide the path to the executable:
315118 * FidoAdapter (add `-exe fido` in the command tag, delete the `$param_exe` in the command tag, delete the parameter from the input list).
316119 * MSGFPlusAdapter (add `-executable msgfplus.jar` in the command tag, delete the `$param_executable` in the command tag, delete the parameter from the input list).
317120 * MyriMatchAdapter (add `-myrimatch_executable myrimatch` in the command tag, delete the `$param_myrimatch_executable` in the command tag, delete the parameter from the input list).
320123 * XTandemAdapter (add `-xtandem_executable xtandem` in the command tag, delete the `$param_xtandem_executable` in the command tag, delete the parameter from the input list).
321124 * To avoid the deletion in the inputs you can also add these parameters to the blacklist
322125
323 $ python generator.py -b exe executable myrimatch_excutable omssa_executable pepnovo_executable xtandem_executable
126 $ python convert.py galaxy -b exe executable myrimatch_executable omssa_executable pepnovo_executable xtandem_executable
324127
325 * These tools have multiple outputs (number of inputs = number of outputs) which is not yet supported in Galaxy-stable:
128 * The following tools have multiple outputs (number of inputs = number of outputs) which is not yet supported in Galaxy-stable:
326129 * SeedListGenerator
327130 * SpecLibSearcher
328131 * MapAlignerIdentification
(New empty file)
0 #!/usr/bin/env python
1 # encoding: utf-8
2 import os
3 import string
4
5 from collections import OrderedDict
6 from string import strip
7 from lxml import etree
8 from lxml.etree import SubElement, Element, ElementTree, ParseError, parse
9
10 from common import utils, logger
11 from common.exceptions import ApplicationException, InvalidModelException
12
13 from CTDopts.CTDopts import _InFile, _OutFile, ParameterGroup, _Choices, _NumericRange, _FileFormat, ModelError, _Null
14
15
16 TYPE_TO_GALAXY_TYPE = {int: 'integer', float: 'float', str: 'text', bool: 'boolean', _InFile: 'data',
17 _OutFile: 'data', _Choices: 'select'}
18 STDIO_MACRO_NAME = "stdio"
19 REQUIREMENTS_MACRO_NAME = "requirements"
20 ADVANCED_OPTIONS_MACRO_NAME = "advanced_options"
21
22 REQUIRED_MACROS = [STDIO_MACRO_NAME, REQUIREMENTS_MACRO_NAME, ADVANCED_OPTIONS_MACRO_NAME]
23
24
25 class ExitCode:
26 def __init__(self, code_range="", level="", description=None):
27 self.range = code_range
28 self.level = level
29 self.description = description
30
31
32 class DataType:
33 def __init__(self, extension, galaxy_extension=None, galaxy_type=None, mimetype=None):
34 self.extension = extension
35 self.galaxy_extension = galaxy_extension
36 self.galaxy_type = galaxy_type
37 self.mimetype = mimetype
38
39
40 def add_specific_args(parser):
41 parser.add_argument("-f", "--formats-file", dest="formats_file",
42 help="File containing the supported file formats. Run with '-h' or '--help' to see a "
43 "brief example on the layout of this file.", default=None, required=False)
44 parser.add_argument("-a", "--add-to-command-line", dest="add_to_command_line",
45 help="Adds content to the command line", default="", required=False)
46 parser.add_argument("-d", "--datatypes-destination", dest="data_types_destination",
47 help="Specify the location of a datatypes_conf.xml to modify and add the registered "
48 "data types. If the provided destination does not exist, a new file will be created.",
49 default=None, required=False)
50 parser.add_argument("-c", "--default-category", dest="default_category", default="DEFAULT", required=False,
51 help="Default category to use for tools lacking a category when generating tool_conf.xml")
52 parser.add_argument("-t", "--tool-conf-destination", dest="tool_conf_destination", default=None, required=False,
53 help="Specify the location of an existing tool_conf.xml that will be modified to include "
54 "the converted tools. If the provided destination does not exist, a new file will "
55 "be created.")
56 parser.add_argument("-g", "--galaxy-tool-path", dest="galaxy_tool_path", default=None, required=False,
57 help="The path that will be prepended to the file names when generating tool_conf.xml")
58 parser.add_argument("-r", "--required-tools", dest="required_tools_file", default=None, required=False,
59 help="Each line of the file will be interpreted as a tool name that needs translation. "
60 "Run with '-h' or '--help' to see a brief example on the format of this file.")
61 parser.add_argument("-s", "--skip-tools", dest="skip_tools_file", default=None, required=False,
62 help="File containing a list of tools for which a Galaxy stub will not be generated. "
63 "Run with '-h' or '--help' to see a brief example on the format of this file.")
64 parser.add_argument("-m", "--macros", dest="macros_files", default=[], nargs="*",
65 action="append", required=None, help="Import the additional given file(s) as macros. "
66 "The macros stdio, requirements and advanced_options are "
67 "required. Please see galaxy/macros.xml for an example of a "
68 "valid macros file. All defined macros will be imported.")
69
70
71 def convert_models(args, parsed_ctds):
72 # validate and prepare the passed arguments
73 validate_and_prepare_args(args)
74
75 # extract the names of the macros and check that we have found the ones we need
76 macros_to_expand = parse_macros_files(args.macros_files)
77
78 # parse the given supported file-formats file
79 supported_file_formats = parse_file_formats(args.formats_file)
80
81 # parse the skip/required tools files
82 skip_tools = parse_tools_list_file(args.skip_tools_file)
83 required_tools = parse_tools_list_file(args.required_tools_file)
84
85 _convert_internal(parsed_ctds,
86 supported_file_formats=supported_file_formats,
87 default_executable_path=args.default_executable_path,
88 add_to_command_line=args.add_to_command_line,
89 blacklisted_parameters=args.blacklisted_parameters,
90 required_tools=required_tools,
91 skip_tools=skip_tools,
92 macros_file_names=args.macros_files,
93 macros_to_expand=macros_to_expand,
94 parameter_hardcoder=args.parameter_hardcoder)
95
96 # generation of galaxy stubs is ready... now, let's see if we need to generate a tool_conf.xml
97 if args.tool_conf_destination is not None:
98 generate_tool_conf(parsed_ctds, args.tool_conf_destination,
99 args.galaxy_tool_path, args.default_category)
100
101 # generate datatypes_conf.xml
102 if args.data_types_destination is not None:
103 generate_data_type_conf(supported_file_formats, args.data_types_destination)
104
105
106 def parse_tools_list_file(tools_list_file):
107 tools_list = None
108 if tools_list_file is not None:
109 tools_list = []
110 with open(tools_list_file) as f:
111 for line in f:
112 if line is None or not line.strip() or line.strip().startswith("#"):
113 continue
114 else:
115 tools_list.append(line.strip())
116
117 return tools_list
118
119
120 def parse_macros_files(macros_file_names):
121 macros_to_expand = set()
122
123 for macros_file_name in macros_file_names:
124 try:
125 macros_file = open(macros_file_name)
126 logger.info("Loading macros from %s" % macros_file_name, 0)
127 root = parse(macros_file).getroot()
128 for xml_element in root.findall("xml"):
129 name = xml_element.attrib["name"]
130 if name in macros_to_expand:
131 logger.warning("Macro %s has already been found. Duplicate found in file %s." %
132 (name, macros_file_name), 0)
133 else:
134 logger.info("Macro %s found" % name, 1)
135 macros_to_expand.add(name)
136 except ParseError as e:
137 raise ApplicationException("The macros file " + macros_file_name + " could not be parsed. Cause: " +
138 str(e))
139 except IOError as e:
140 raise ApplicationException("The macros file " + macros_file_name + " could not be opened. Cause: " +
141 str(e))
142
143 # we depend on "stdio", "requirements" and "advanced_options" being defined in at least one of the given macros files
144 missing_needed_macros = []
145 for required_macro in REQUIRED_MACROS:
146 if required_macro not in macros_to_expand:
147 missing_needed_macros.append(required_macro)
148
149 if missing_needed_macros:
150 raise ApplicationException(
151 "The following required macro(s) were not found in any of the given macros files: %s, "
152 "see galaxy/macros.xml for an example of a valid macros file."
153 % ", ".join(missing_needed_macros))
154
155 # we do not need to "expand" the advanced_options macro
156 macros_to_expand.remove(ADVANCED_OPTIONS_MACRO_NAME)
157 return macros_to_expand
158
159
160 def parse_file_formats(formats_file):
161 supported_formats = {}
162 if formats_file is not None:
163 line_number = 0
164 with open(formats_file) as f:
165 for line in f:
166 line_number += 1
167 if line is None or not line.strip() or line.strip().startswith("#"):
168 # ignore (it'd be weird to have something like:
169 # if line is not None and not (not line.strip()) ...
170 pass
171 else:
172 # not an empty line, no comment
173 # strip the line and split by whitespace
174 parsed_formats = line.strip().split()
175 # valid lines contain one, three or four columns (the fourth column, a mimetype, is optional)
176 if not (len(parsed_formats) == 1 or len(parsed_formats) == 3 or len(parsed_formats) == 4):
177 logger.warning(
178 "Invalid line at line number %d of the given formats file. Line will be ignored:\n%s" %
179 (line_number, line), 0)
180 # ignore the line
181 continue
182 elif len(parsed_formats) == 1:
183 supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[0])
184 else:
185 mimetype = None
186 # check if mimetype was provided
187 if len(parsed_formats) == 4:
188 mimetype = parsed_formats[3]
189 supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[1],
190 parsed_formats[2], mimetype)
191 return supported_formats
192
193
194 def validate_and_prepare_args(args):
195 # check that only one of skip_tools_file and required_tools_file has been provided
196 if args.skip_tools_file is not None and args.required_tools_file is not None:
197 raise ApplicationException(
198 "You have provided both a file with tools to ignore and a file with required tools.\n"
199 "Only one of -s/--skip-tools, -r/--required-tools can be provided.")
200
201 # flatten macros_files to make sure that we have a list containing file names and not a list of lists
202 utils.flatten_list_of_lists(args, "macros_files")
203
204 # check that the arguments point to a valid, existing path
205 input_variables_to_check = ["skip_tools_file", "required_tools_file", "macros_files", "formats_file"]
206 for variable_name in input_variables_to_check:
207 utils.validate_argument_is_valid_path(args, variable_name)
208
209 # check that the provided output files, if provided, contain a valid file path (i.e., not a folder)
210 output_variables_to_check = ["data_types_destination", "tool_conf_destination"]
211 for variable_name in output_variables_to_check:
212 file_name = getattr(args, variable_name)
213 if file_name is not None and os.path.isdir(file_name):
214 raise ApplicationException("The provided output file name (%s) points to a directory." % file_name)
215
216 if not args.macros_files:
217 # list is empty, provide the default value
218 logger.warning("Using default macros from galaxy/macros.xml", 0)
219 args.macros_files = ["galaxy/macros.xml"]
220
221
222 def get_preferred_file_extension():
223 return "xml"
224
225
226 def _convert_internal(parsed_ctds, **kwargs):
227 # parse all input files into models using CTDopts (via utils)
228 # the output is a tuple containing the model, output destination, origin file
229 for parsed_ctd in parsed_ctds:
230 model = parsed_ctd.ctd_model
231 origin_file = parsed_ctd.input_file
232 output_file = parsed_ctd.suggested_output_file
233
234 if kwargs["skip_tools"] is not None and model.name in kwargs["skip_tools"]:
235 logger.info("Skipping tool %s" % model.name, 0)
236 continue
237 elif kwargs["required_tools"] is not None and model.name not in kwargs["required_tools"]:
238 logger.info("Tool %s is not required, skipping it" % model.name, 0)
239 continue
240 else:
241 logger.info("Converting %s (source %s)" % (model.name, utils.get_filename(origin_file)), 0)
242 tool = create_tool(model)
243 write_header(tool, model)
244 create_description(tool, model)
245 expand_macros(tool, model, **kwargs)
246 create_command(tool, model, **kwargs)
247 create_inputs(tool, model, **kwargs)
248 create_outputs(tool, model, **kwargs)
249 create_help(tool, model)
250
251 # wrap our tool element into a tree to be able to serialize it
252 tree = ElementTree(tool)
253 logger.info("Writing to %s" % utils.get_filename(output_file), 1)
254 tree.write(open(output_file, 'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
255
256
257 def write_header(tool, model):
258 tool.addprevious(etree.Comment(
259 "This is a configuration file for the integration of a tool into Galaxy (https://galaxyproject.org/). "
260 "This file was automatically generated using CTDConverter."))
261 tool.addprevious(etree.Comment('Proposed Tool Section: [%s]' % model.opt_attribs.get("category", "")))
262
263
264 def generate_tool_conf(parsed_ctds, tool_conf_destination, galaxy_tool_path, default_category):
265 # for each category, we keep a list of models corresponding to it
266 categories_to_tools = dict()
267 for parsed_ctd in parsed_ctds:
268 category = strip(parsed_ctd.ctd_model.opt_attribs.get("category", ""))
269 if not category.strip():
270 category = default_category
271 if category not in categories_to_tools:
272 categories_to_tools[category] = []
273 categories_to_tools[category].append(utils.get_filename(parsed_ctd.suggested_output_file))
274
275 # at this point, we should have a map for all categories->tools
276 toolbox_node = Element("toolbox")
277
278 if galaxy_tool_path is not None and not galaxy_tool_path.strip().endswith("/"):
279 galaxy_tool_path = galaxy_tool_path.strip() + "/"
280 if galaxy_tool_path is None:
281 galaxy_tool_path = ""
282
283 for category, file_names in categories_to_tools.iteritems():
284 section_node = add_child_node(toolbox_node, "section")
285 section_node.attrib["id"] = "section-id-" + "".join(category.split())
286 section_node.attrib["name"] = category
287
288 for filename in file_names:
289 tool_node = add_child_node(section_node, "tool")
290 tool_node.attrib["file"] = galaxy_tool_path + filename
291
292 toolconf_tree = ElementTree(toolbox_node)
293 toolconf_tree.write(open(tool_conf_destination,'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
294 logger.info("Generated Galaxy tool_conf.xml in %s" % tool_conf_destination, 0)
295
296
297 def generate_data_type_conf(supported_file_formats, data_types_destination):
298 data_types_node = Element("datatypes")
299 registration_node = add_child_node(data_types_node, "registration")
300 registration_node.attrib["converters_path"] = "lib/galaxy/datatypes/converters"
301 registration_node.attrib["display_path"] = "display_applications"
302
303 for format_name in supported_file_formats:
304 data_type = supported_file_formats[format_name]
305 # add only if it's a data type that does not exist in Galaxy
306 if data_type.galaxy_type is not None:
307 data_type_node = add_child_node(registration_node, "datatype")
308 # we know galaxy_extension is not None
309 data_type_node.attrib["extension"] = data_type.galaxy_extension
310 data_type_node.attrib["type"] = data_type.galaxy_type
311 if data_type.mimetype is not None:
312 data_type_node.attrib["mimetype"] = data_type.mimetype
313
314 data_types_tree = ElementTree(data_types_node)
315 data_types_tree.write(open(data_types_destination,'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
316 logger.info("Generated Galaxy datatypes_conf.xml in %s" % data_types_destination, 0)
317
318
319 def create_tool(model):
320 return Element("tool", OrderedDict([("id", model.name), ("name", model.name), ("version", model.version)]))
321
322
323 def create_description(tool, model):
324 if "description" in model.opt_attribs.keys() and model.opt_attribs["description"] is not None:
325 description = SubElement(tool,"description")
326 description.text = model.opt_attribs["description"]
327
328
329 def create_command(tool, model, **kwargs):
330 final_command = utils.extract_tool_executable_path(model, kwargs["default_executable_path"]) + '\n'
331 final_command += kwargs["add_to_command_line"] + '\n'
332 advanced_command_start = "#if $adv_opts.adv_opts_selector=='advanced':\n"
333 advanced_command_end = "#end if"
334 advanced_command = ""
335 parameter_hardcoder = kwargs["parameter_hardcoder"]
336
337 found_output_parameter = False
338 for param in utils.extract_and_flatten_parameters(model):
339 if param.type is _OutFile:
340 found_output_parameter = True
341 command = ""
342 param_name = utils.extract_param_name(param)
343 command_line_prefix = utils.extract_command_line_prefix(param, model)
344
345 if param.name in kwargs["blacklisted_parameters"]:
346 continue
347
348 hardcoded_value = parameter_hardcoder.get_hardcoded_value(param_name, model.name)
349 if hardcoded_value:
350 command += "%s %s\n" % (command_line_prefix, hardcoded_value)
351 else:
352 # parameter is neither blacklisted nor hardcoded...
353 galaxy_parameter_name = get_galaxy_parameter_name(param)
354 repeat_galaxy_parameter_name = get_repeat_galaxy_parameter_name(param)
355
356 # logic for ITEMLISTs
357 if param.is_list:
358 if param.type is _InFile:
359 command += command_line_prefix + "\n"
360 command += " #for token in $" + galaxy_parameter_name + ":\n"
361 command += " $token\n"
362 command += " #end for\n"
363 else:
364 command += "\n#if $" + repeat_galaxy_parameter_name + ":\n"
365 command += command_line_prefix + "\n"
366 command += " #for token in $" + repeat_galaxy_parameter_name + ":\n"
367 command += " #if \" \" in str(token):\n"
368 command += " \"$token." + galaxy_parameter_name + "\"\n"
369 command += " #else\n"
370 command += " $token." + galaxy_parameter_name + "\n"
371 command += " #end if\n"
372 command += " #end for\n"
373 command += "#end if\n"
374 # logic for other ITEMs
375 else:
376 if param.advanced and param.type is not _OutFile:
377 actual_parameter = "$adv_opts.%s" % galaxy_parameter_name
378 else:
379 actual_parameter = "$%s" % galaxy_parameter_name
380 # TODO only useful for text fields, integers or floats
381 # not useful for choices, input fields ...
382
383 if not is_boolean_parameter(param) and type(param.restrictions) is _Choices :
384 command += "#if " + actual_parameter + ":\n"
385 command += " %s\n" % command_line_prefix
386 command += " #if \" \" in str(" + actual_parameter + "):\n"
387 command += " \"" + actual_parameter + "\"\n"
388 command += " #else\n"
389 command += " " + actual_parameter + "\n"
390 command += " #end if\n"
391 command += "#end if\n"
392 elif is_boolean_parameter(param):
393 command += "#if " + actual_parameter + ":\n"
394 command += " %s\n" % command_line_prefix
395 command += "#end if\n"
396 elif TYPE_TO_GALAXY_TYPE[param.type] == 'text':
397 command += "#if " + actual_parameter + ":\n"
398 command += " %s " % command_line_prefix
399 command += " \"" + actual_parameter + "\"\n"
400 command += "#end if\n"
401 else:
402 command += "#if " + actual_parameter + ":\n"
403 command += " %s " % command_line_prefix
404 command += actual_parameter + "\n"
405 command += "#end if\n"
406
407 if param.advanced and param.type is not _OutFile:
408 advanced_command += " %s" % command
409 else:
410 final_command += command
411
412 if advanced_command:
413 final_command += "%s%s%s\n" % (advanced_command_start, advanced_command, advanced_command_end)
414
415 if not found_output_parameter:
416 final_command += "> $param_stdout\n"
417
418 command_node = add_child_node(tool, "command")
419 command_node.text = final_command
420
421
422 # creates the xml elements needed to import the needed macros files
423 # and to "expand" the macros
424 def expand_macros(tool, model, **kwargs):
425 macros_node = add_child_node(tool, "macros")
426 token_node = add_child_node(macros_node, "token")
427 token_node.attrib["name"] = "@EXECUTABLE@"
428 token_node.text = utils.extract_tool_executable_path(model, kwargs["default_executable_path"])
429
430 # add <import> nodes
431 for macro_file_name in kwargs["macros_file_names"]:
432 macro_file = open(macro_file_name)
433 import_node = add_child_node(macros_node, "import")
434 # do not add the path of the file, rather, just its basename
435 import_node.text = os.path.basename(macro_file.name)
436
437 # add <expand> nodes
438 for expand_macro in kwargs["macros_to_expand"]:
439 expand_node = add_child_node(tool, "expand")
440 expand_node.attrib["macro"] = expand_macro
441
442
443 def get_galaxy_parameter_name(param):
444 return "param_%s" % utils.extract_param_name(param).replace(":", "_").replace("-", "_")
445
446
447 def get_input_with_same_restrictions(out_param, model, supported_file_formats):
448 for param in utils.extract_and_flatten_parameters(model):
449 if param.type is _InFile:
450 if param.restrictions is not None:
451 in_param_formats = get_supported_file_types(param.restrictions.formats, supported_file_formats)
452 out_param_formats = get_supported_file_types(out_param.restrictions.formats, supported_file_formats)
453 if in_param_formats == out_param_formats:
454 return param
455
456
457 def create_inputs(tool, model, **kwargs):
458 inputs_node = SubElement(tool, "inputs")
459
460 # some suites (such as OpenMS) need some advanced options when handling inputs
461 expand_advanced_node = add_child_node(tool, "expand", OrderedDict([("macro", ADVANCED_OPTIONS_MACRO_NAME)]))
462 parameter_hardcoder = kwargs["parameter_hardcoder"]
463
464 # treat all non output-file parameters as inputs
465 for param in utils.extract_and_flatten_parameters(model):
466 # no need to show hardcoded parameters
467 hardcoded_value = parameter_hardcoder.get_hardcoded_value(param.name, model.name)
468 if param.name in kwargs["blacklisted_parameters"] or hardcoded_value:
469 # let's not use an extra level of indentation and use NOP
470 continue
471 if param.type is not _OutFile:
472 if param.advanced:
473 if expand_advanced_node is not None:
474 parent_node = expand_advanced_node
475 else:
476 # something went wrong... we are handling an advanced parameter and the
477 # advanced input macro was not set... inform the user about it
478 logger.info("The parameter %s has been set as advanced, but advanced_input_macro has "
479 "not been set." % param.name, 1)
480 # there is not much we can do, other than use the inputs_node as a parent node!
481 parent_node = inputs_node
482 else:
483 parent_node = inputs_node
484
485 # for lists we need a repeat tag
486 if param.is_list and param.type is not _InFile:
487 rep_node = add_child_node(parent_node, "repeat")
488 create_repeat_attribute_list(rep_node, param)
489 parent_node = rep_node
490
491 param_node = add_child_node(parent_node, "param")
492 create_param_attribute_list(param_node, param, kwargs["supported_file_formats"])
493
494 # advanced parameter selection should be at the end
495 # and only available if an advanced parameter exists
496 if expand_advanced_node is not None and len(expand_advanced_node) > 0:
497 inputs_node.append(expand_advanced_node)
498
499
500 def get_repeat_galaxy_parameter_name(param):
501 return "rep_" + get_galaxy_parameter_name(param)
502
503
504 def create_repeat_attribute_list(rep_node, param):
505 rep_node.attrib["name"] = get_repeat_galaxy_parameter_name(param)
506 if param.required:
507 rep_node.attrib["min"] = "1"
508 else:
509 rep_node.attrib["min"] = "0"
510 # for the ITEMLISTs which have LISTITEM children we only
511 # need one parameter as it is given as a string
512 if param.default is not None:
513 rep_node.attrib["max"] = "1"
514 rep_node.attrib["title"] = get_galaxy_parameter_name(param)
515
516
517 def create_param_attribute_list(param_node, param, supported_file_formats):
518 param_node.attrib["name"] = get_galaxy_parameter_name(param)
519
520 param_type = TYPE_TO_GALAXY_TYPE.get(param.type)
521 if param_type is None:
522 raise ModelError("Unrecognized parameter type %(type)s for parameter %(name)s"
523 % {"type": param.type, "name": param.name})
524
525 if param.is_list:
526 param_type = "text"
527
528 if is_selection_parameter(param):
529 param_type = "select"
530 if len(param.restrictions.choices) < 5:
531 param_node.attrib["display"] = "radio"
532
533 if is_boolean_parameter(param):
534 param_type = "boolean"
535
536 if param.type is _InFile:
537 # assume it's just text unless restrictions are provided
538 param_format = "txt"
539 if param.restrictions is not None:
540 # join all formats of the file, take mapping from supported_file if available for an entry
541 if type(param.restrictions) is _FileFormat:
542 param_format = ",".join([get_supported_file_type(i, supported_file_formats) if
543 get_supported_file_type(i, supported_file_formats)
544 else i for i in param.restrictions.formats])
545 else:
546 raise InvalidModelException("Expected 'file type' restrictions for input file [%(name)s], "
547 "but instead got [%(type)s]"
548 % {"name": param.name, "type": type(param.restrictions)})
549
550 param_node.attrib["type"] = "data"
551 param_node.attrib["format"] = param_format
552 # in the case of multiple input set multiple flag
553 if param.is_list:
554 param_node.attrib["multiple"] = "true"
555
556 else:
557 param_node.attrib["type"] = param_type
558
559 # check for parameters with restricted values (which will correspond to a "select" in galaxy)
560 if param.restrictions is not None:
561 # it could be either _Choices or _NumericRange, with special case for boolean types
562 if param_type == "boolean":
563 create_boolean_parameter(param_node, param)
564 elif type(param.restrictions) is _Choices:
565 # create as many <option> elements as restriction values
566 for choice in param.restrictions.choices:
567 option_node = add_child_node(param_node, "option", OrderedDict([("value", str(choice))]))
568 option_node.text = str(choice)
569
570 # preselect the default value
571 if param.default == choice:
572 option_node.attrib["selected"] = "true"
573
574 elif type(param.restrictions) is _NumericRange:
575 if param.type is not int and param.type is not float:
576 raise InvalidModelException("Expected either 'int' or 'float' in the numeric range restriction for "
577 "parameter [%(name)s], but instead got [%(type)s]" %
578 {"name": param.name, "type": param.type})
579 # extract the min and max values and add them as attributes
580 # validate the provided min and max values
581 if param.restrictions.n_min is not None:
582 param_node.attrib["min"] = str(param.restrictions.n_min)
583 if param.restrictions.n_max is not None:
584 param_node.attrib["max"] = str(param.restrictions.n_max)
585 elif type(param.restrictions) is _FileFormat:
586 param_node.attrib["format"] = ','.join([get_supported_file_type(i, supported_file_formats) if
587 get_supported_file_type(i, supported_file_formats)
588 else i for i in param.restrictions.formats])
589 else:
590 raise InvalidModelException("Unrecognized restriction type [%(type)s] for parameter [%(name)s]"
591 % {"type": type(param.restrictions), "name": param.name})
592
593 if param_type == "select" and param.default in param.restrictions.choices:
594 param_node.attrib["optional"] = "False"
595 else:
596 param_node.attrib["optional"] = str(not param.required)
597
598 if param_type == "text":
599 # add size attribute... this is the length of a textbox field in Galaxy (it could also be 15x2, for instance)
600 param_node.attrib["size"] = "30"
601 # add sanitizer nodes, this is needed for special characters such as "["
602 # which are used for example by FeatureFinderMultiplex
603 sanitizer_node = SubElement(param_node, "sanitizer")
604
605 valid_node = SubElement(sanitizer_node, "valid", OrderedDict([("initial", "string.printable")]))
606 add_child_node(valid_node, "remove", OrderedDict([("value", '\'')]))
607 add_child_node(valid_node, "remove", OrderedDict([("value", '"')]))
608
609 # check for default value
610 if param.default is not None and param.default is not _Null:
611 if type(param.default) is list:
612 # we ASSUME that a list of parameters looks like:
613 # $ tool -ignore He Ar Xe
614 # meaning, that, for example, Helium, Argon and Xenon will be ignored
615 param_node.attrib["value"] = ' '.join(map(str, param.default))
616
617 elif param_type != "boolean":
618 param_node.attrib["value"] = str(param.default)
619
620 else:
621 # simple boolean with a default
622 if param.default is True:
623 param_node.attrib["checked"] = "true"
624 else:
625 if param.type is int or param.type is float:
626 # galaxy requires "value" to be included for int/float
627 # since no default was included, we need to figure out one in a clever way... but let the user know
628 # that we are "thinking" for him/her
629 logger.warning("Generating default value for parameter [%s]. "
630 "Galaxy requires the attribute 'value' to be set for integer/floats. "
631 "Edit the CTD file and provide a suitable default value." % param.name, 1)
632 # check if there's a min/max and try to use them
633 default_value = None
634 if param.restrictions is not None:
635 if type(param.restrictions) is _NumericRange:
636 default_value = param.restrictions.n_min
637 if default_value is None:
638 default_value = param.restrictions.n_max
639 if default_value is None:
640 # no min/max provided... just use 0 and see what happens
641 default_value = 0
642 else:
643 # should never be here, since we have validated this anyway...
644 # this code is here just for documentation purposes
645 # however, better safe than sorry!
646 # (it could be that the code changes and then we have an ugly scenario)
647 raise InvalidModelException("Expected a numeric range for parameter [%(name)s], "
648 "but instead got [%(type)s]"
649 % {"name": param.name, "type": type(param.restrictions)})
650 else:
651 # no restrictions and no default value provided...
652 # make up something
653 default_value = 0
654 param_node.attrib["value"] = str(default_value)
655
656 label = "%s parameter" % param.name
657 help_text = ""
658
659 if param.description is not None:
660 label, help_text = generate_label_and_help(param.description)
661
662 param_node.attrib["label"] = label
663 param_node.attrib["help"] = "(-%s)" % param.name + " " + help_text
664
665
666 def generate_label_and_help(desc):
667 help_text = ""
668 # This tag is found in some descriptions
669 if not isinstance(desc, basestring):
670 desc = str(desc)
671 desc = desc.encode("utf8").replace("#br#", " <br>")
672 # Get rid of dots in the end
673 if desc.endswith("."):
674 desc = desc.rstrip(".")
675 # Check if first word is a normal word and make it uppercase
676 if str(desc).find(" ") > -1:
677 first_word, rest = str(desc).split(" ", 1)
678 if str(first_word).islower():
679 # check if label has a quotient of the form a/b
680 if first_word.find("/") != 1 :
681 first_word = first_word.capitalize()
682 desc = first_word + " " + rest
683 label = desc.decode("utf8")
684
685 # Try to split the label if it is too long
686 if len(desc) > 50:
687 # find an example and put everything before in the label and the e.g. in the help
688 if desc.find("e.g.") > 1 :
689 label, help_text = desc.split("e.g.",1)
690 help_text = "e.g." + help_text
691 else:
692 # find the end of the first sentence
693 # look for ". " because some labels contain .file or something similar
694 delimiter = ""
695 if desc.find(". ") > 1 and desc.find("? ") > 1:
696 if desc.find(". ") < desc.find("? "):
697 delimiter = ". "
698 else:
699 delimiter = "? "
700 elif desc.find(". ") > 1:
701 delimiter = ". "
702 elif desc.find("? ") > 1:
703 delimiter = "? "
704 if delimiter != "":
705 label, help_text = desc.split(delimiter, 1)
706
707 # add the question mark back
708 if delimiter == "? ":
709 label += "? "
710
711 # remove all linebreaks
712 label = label.rstrip().rstrip('<br>').rstrip()
713 return label, help_text
714
715
716 # determines if the given choices are boolean (basically, if the possible values are yes/no, true/false)
717 def is_boolean_parameter(param):
718 # detect boolean selects of OpenMS
719 if is_selection_parameter(param):
720 if len(param.restrictions.choices) == 2:
721 # check that default value is false to make sure it is an actual flag
722 if "false" in param.restrictions.choices and \
723 "true" in param.restrictions.choices and \
724 param.default == "false":
725 return True
726 else:
727 return param.type is bool
728
729
730 # determines if there are choices for the parameter
731 def is_selection_parameter(param):
732 return type(param.restrictions) is _Choices
733
734
735 def get_lowercase_list(some_list):
736 lowercase_list = map(str, some_list)
737 lowercase_list = map(string.lower, lowercase_list)
738 lowercase_list = map(strip, lowercase_list)
739 return lowercase_list
740
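`get_lowercase_list` relies on `string.lower` and `string.strip`, which exist only in Python 2. A possible Python 3 equivalent, shown here as a sketch rather than as part of this commit, uses the `str` methods directly:

```python
def get_lowercase_list(some_list):
    # convert every item to a string, lowercase it and strip surrounding whitespace
    return [str(item).lower().strip() for item in some_list]
```

Unlike the Python 2 `map` chain, this always returns a list, so membership tests such as `"yes" in choices` can be repeated safely.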
741
742 # creates a galaxy boolean parameter type
743 # this method assumes that param has restrictions, and that only two restrictions are present
744 # (either yes/no or true/false)
745 def create_boolean_parameter(param_node, param):
746 # first, determine the 'truevalue' and the 'falsevalue'
747 """TODO: true and false values can be way more than 'true' and 'false'
748 but for that we need CTD support
749 """
750 # by default, 'true' and 'false' are handled as flags, like the verbose flag (i.e., -v)
751 true_value = "-%s" % utils.extract_param_name(param)
752 false_value = ""
753 choices = get_lowercase_list(param.restrictions.choices)
754 if "yes" in choices:
755 true_value = "yes"
756 false_value = "no"
757 param_node.attrib["truevalue"] = true_value
758 param_node.attrib["falsevalue"] = false_value
759
760 # set the checked attribute
761 if param.default is not None:
762 checked_value = "false"
763 default = strip(string.lower(param.default))
764 if default == "yes" or default == "true":
765 checked_value = "true"
766 param_node.attrib["checked"] = checked_value
767
768
769 def create_outputs(parent, model, **kwargs):
770 outputs_node = add_child_node(parent, "outputs")
771 parameter_hardcoder = kwargs["parameter_hardcoder"]
772
773 for param in utils.extract_and_flatten_parameters(model):
774
775 # no need to show hardcoded parameters
776 hardcoded_value = parameter_hardcoder.get_hardcoded_value(param.name, model.name)
777 if param.name in kwargs["blacklisted_parameters"] or hardcoded_value:
778 # let's not use an extra level of indentation and use NOP
779 continue
780 if param.type is _OutFile:
781 create_output_node(outputs_node, param, model, kwargs["supported_file_formats"])
782
783 # If there are no outputs defined in the ctd the node will have no children
784 # and the stdout will be used as output
785 if len(outputs_node) == 0:
786 add_child_node(outputs_node, "data",
787 OrderedDict([("name", "param_stdout"), ("format", "txt"), ("label", "Output from stdout")]))
788
789
790 def create_output_node(parent, param, model, supported_file_formats):
791 data_node = add_child_node(parent, "data")
792 data_node.attrib["name"] = get_galaxy_parameter_name(param)
793
794 data_format = "data"
795 if param.restrictions is not None:
796 if type(param.restrictions) is _FileFormat:
797 # set the first data output node to the first file format
798
799 # check if there are formats that have not been registered yet...
800 output = list()
801 for format_name in param.restrictions.formats:
802 if not format_name in supported_file_formats.keys():
803 output.append(str(format_name))
804
805 # warn only if there is something to complain about
806 if output:
807 logger.warning("Parameter " + param.name + " has the following unsupported format(s):"
808 + ','.join(output), 1)
809 data_format = ','.join(output)
810
811 formats = get_supported_file_types(param.restrictions.formats, supported_file_formats)
812 try:
813 data_format = formats.pop()
814 except KeyError:
815 # there is not much we can do, other than catching the exception
816 pass
817 # if there is more than one output file format, try to take the format from the input parameter
818 if formats:
819 corresponding_input = get_input_with_same_restrictions(param, model, supported_file_formats)
820 if corresponding_input is not None:
821 data_format = "input"
822 data_node.attrib["metadata_source"] = get_galaxy_parameter_name(corresponding_input)
823 else:
824 raise InvalidModelException("Unrecognized restriction type [%(type)s] "
825 "for output [%(name)s]" % {"type": type(param.restrictions),
826 "name": param.name})
827 data_node.attrib["format"] = data_format
828
829 # TODO: find a smarter label ?
830 return data_node
831
832
833 # Get the supported file format for one given format
834 def get_supported_file_type(format_name, supported_file_formats):
835 if format_name in supported_file_formats.keys():
836 return supported_file_formats.get(format_name, DataType(format_name, format_name)).galaxy_extension
837 else:
838 return None
839
840
841 def get_supported_file_types(formats, supported_file_formats):
842 return set([supported_file_formats.get(format_name, DataType(format_name, format_name)).galaxy_extension
843 for format_name in formats if format_name in supported_file_formats.keys()])
844
845
846 def create_change_format_node(parent, data_formats, input_ref):
847 # <change_format>
848 # <when input="secondary_structure" value="true" format="txt"/>
849 # </change_format>
850 change_format_node = add_child_node(parent, "change_format")
851 for data_format in data_formats:
852 add_child_node(change_format_node, "when",
853 OrderedDict([("input", input_ref), ("value", data_format), ("format", data_format)]))
854
855
856 # Shows basic information about the file, such as data ranges and file type.
857 def create_help(tool, model):
858 help_node = add_child_node(tool, "help")
859 # TODO: do we need CDATA Section here?
860 help_node.text = utils.extract_tool_help_text(model)
861
862
863 # adds and returns a child node using the given name to the given parent node
864 def add_child_node(parent_node, child_node_name, attributes=OrderedDict([])):
865 child_node = SubElement(parent_node, child_node_name, attributes)
866 return child_node
+0
-2
galaxy/dist/conda/bld.bat
0 "%PYTHON%" setup.py install
1 if errorlevel 1 exit 1
+0
-1
galaxy/dist/conda/build.sh
0 $PYTHON setup.py install
+0
-28
galaxy/dist/conda/meta.yaml
0 package:
1 name: ctd2galaxy
2 version: "1.0"
3
4 source:
5 git_rev: v1.0
6 git_url: https://github.com/WorkflowConversion/CTD2Galaxy.git
7
8 build:
9 noarch_python: True
10
11 requirements:
12 build:
13 - python
14 - setuptools
15
16 run:
17 - python
18 - lxml
19 - ctdopts 1.0
20
21 test:
22 imports:
23 - CTDopts.CTDopts
24
25 about:
26 home: https://github.com/WorkflowConversion/CTD2Galaxy
27 license_file: LICENSE
+0
-1389
galaxy/generator.py
0 #!/usr/bin/env python
1 # encoding: utf-8
2
3 """
4 @author: delagarza
5 """
6
7
8 import sys
9 import os
10 import traceback
11 import ntpath
12 import string
13
14 from argparse import ArgumentParser
15 from argparse import RawDescriptionHelpFormatter
16 from collections import OrderedDict
17 from string import strip
18 from lxml import etree
19 from lxml.etree import SubElement, Element, ElementTree, ParseError, parse
20
21 from CTDopts.CTDopts import CTDModel, _InFile, _OutFile, ParameterGroup, _Choices, _NumericRange, _FileFormat, \
22 ModelError, _Null
23
24 __all__ = []
25 __version__ = 1.0
26 __date__ = '2014-09-17'
27 __updated__ = '2016-05-09'
28
29 MESSAGE_INDENTATION_INCREMENT = 2
30
31 TYPE_TO_GALAXY_TYPE = {int: 'integer', float: 'float', str: 'text', bool: 'boolean', _InFile: 'data',
32 _OutFile: 'data', _Choices: 'select'}
33
34 STDIO_MACRO_NAME = "stdio"
35 REQUIREMENTS_MACRO_NAME = "requirements"
36 ADVANCED_OPTIONS_MACRO_NAME = "advanced_options"
37
38 REQUIRED_MACROS = [STDIO_MACRO_NAME, REQUIREMENTS_MACRO_NAME, ADVANCED_OPTIONS_MACRO_NAME]
39
40
41 class CLIError(Exception):
42 # Generic exception to raise and log different fatal errors.
43 def __init__(self, msg):
44 super(CLIError, self).__init__(msg)
45 self.msg = "E: %s" % msg
46
47 def __str__(self):
48 return self.msg
49
50 def __unicode__(self):
51 return self.msg
52
53
54 class InvalidModelException(ModelError):
55 def __init__(self, message):
56 super(InvalidModelException, self).__init__()
57 self.message = message
58
59 def __str__(self):
60 return self.message
61
62 def __repr__(self):
63 return self.message
64
65
66 class ApplicationException(Exception):
67 def __init__(self, msg):
68 super(ApplicationException, self).__init__(msg)
69 self.msg = msg
70
71 def __str__(self):
72 return self.msg
73
74 def __unicode__(self):
75 return self.msg
76
77
78 class ExitCode:
79 def __init__(self, code_range="", level="", description=None):
80 self.range = code_range
81 self.level = level
82 self.description = description
83
84
85 class DataType:
86 def __init__(self, extension, galaxy_extension=None, galaxy_type=None, mimetype=None):
87 self.extension = extension
88 self.galaxy_extension = galaxy_extension
89 self.galaxy_type = galaxy_type
90 self.mimetype = mimetype
91
92
93 class ParameterHardcoder:
94 def __init__(self):
95 # map whose keys are the composite names of parameters and tools in the following pattern:
96 # [ParameterName][separator][ToolName] -> HardcodedValue
97 # if the parameter applies to all tools, then the following pattern is used:
98 # [ParameterName] -> HardcodedValue
99
100 # examples (assuming separator is '!'):
101 # threads -> 24
102 # adapter!XtandemAdapter -> xtandem.exe
103 # adapter -> adapter.exe
104 self.separator = "!"
105 self.parameter_map = {}
106
107 # the most specific value will be returned in case of overlap
108 def get_hardcoded_value(self, parameter_name, tool_name):
109 # look for the value that would apply for all tools
110 generic_value = self.parameter_map.get(parameter_name, None)
111 specific_value = self.parameter_map.get(self.build_key(parameter_name, tool_name), None)
112 if specific_value is not None:
113 return specific_value
114
115 return generic_value
116
117 def register_parameter(self, parameter_name, parameter_value, tool_name=None):
118 self.parameter_map[self.build_key(parameter_name, tool_name)] = parameter_value
119
120 def build_key(self, parameter_name, tool_name):
121 if tool_name is None:
122 return parameter_name
123 return "%s%s%s" % (parameter_name, self.separator, tool_name)
124
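The lookup order of `ParameterHardcoder` (tool-specific value first, generic value as a fallback) is easier to see with a small usage example. The class below is a condensed Python 3 restatement of the one above, written only so that the example is self-contained; the registered values are made up for illustration.

```python
class ParameterHardcoder:
    def __init__(self, separator="!"):
        self.separator = separator
        self.parameter_map = {}

    def build_key(self, parameter_name, tool_name=None):
        # generic entries are keyed by the parameter name alone
        if tool_name is None:
            return parameter_name
        return "%s%s%s" % (parameter_name, self.separator, tool_name)

    def register_parameter(self, parameter_name, parameter_value, tool_name=None):
        self.parameter_map[self.build_key(parameter_name, tool_name)] = parameter_value

    def get_hardcoded_value(self, parameter_name, tool_name):
        # the most specific value wins in case of overlap
        specific = self.parameter_map.get(self.build_key(parameter_name, tool_name))
        if specific is not None:
            return specific
        return self.parameter_map.get(parameter_name)


hardcoder = ParameterHardcoder()
hardcoder.register_parameter("threads", "24")
hardcoder.register_parameter("adapter", "adapter.exe")
hardcoder.register_parameter("adapter", "xtandem.exe", "XTandemAdapter")

generic = hardcoder.get_hardcoded_value("adapter", "SomeOtherTool")
specific = hardcoder.get_hardcoded_value("adapter", "XTandemAdapter")
missing = hardcoder.get_hardcoded_value("verbosity", "XTandemAdapter")
```

Here `generic` falls back to the value registered for all tools, `specific` picks the `XTandemAdapter` override, and `missing` is `None` because nothing was registered under that name.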
125
126 def main(argv=None): # IGNORE:C0111
127 # Command line options.
128 if argv is None:
129 argv = sys.argv
130 else:
131 sys.argv.extend(argv)
132
133 program_version = "v%s" % __version__
134 program_build_date = str(__updated__)
135 program_version_message = '%%(prog)s %s (%s)' % (program_version, program_build_date)
136 program_short_description = "CTD2Galaxy - A project from the GenericWorkflowNodes family " \
137 "(https://github.com/orgs/genericworkflownodes)"
138 program_usage = '''
139 USAGE:
140
141 I - Parsing a single CTD file and generating a Galaxy wrapper:
142
143 $ python generator.py -i input.ctd -o output.xml
144
145
146 II - Parsing all found CTD files (files with .ctd and .xml extension) in a given folder and
147 writing the converted Galaxy wrappers to a given folder:
148
149 $ python generator.py -i /home/user/*.ctd -o /home/user/galaxywrappers
150
151
152 III - Providing file formats, mimetypes
153
154 Galaxy supports the concept of file format in order to connect compatible ports, that is, input ports of a certain
155 data format will be able to receive data from a port from the same format. This converter allows you to provide
156 a personalized file in which you can relate the CTD data formats with supported Galaxy data formats. The layout of
157 this file consists of lines, each of either one, three or four columns separated by any amount of whitespace. The content
158 of each column is as follows:
159
160 * 1st column: file extension
161 * 2nd column: data type, as listed in Galaxy
162 * 3rd column: full-named Galaxy data type, as it will appear on datatypes_conf.xml
163 * 4th column: mimetype (optional)
164
165 The following is an example of a valid "file formats" file:
166
167 ########################################## FILE FORMATS example ##########################################
168 # Every line starting with a # will be handled as a comment and will not be parsed.
169 # The first column is the file format as given in the CTD and second column is the Galaxy data format.
170 # The second, third and fourth columns can be left empty if the data type has already been registered
171 # in Galaxy, otherwise, all but the mimetype must be provided.
172
173 # CTD type # Galaxy type # Long Galaxy data type # Mimetype
174 csv tabular galaxy.datatypes.data:Text
175 fasta
176 ini txt galaxy.datatypes.data:Text
177 txt
178 idxml txt galaxy.datatypes.xml:GenericXml application/xml
179 options txt galaxy.datatypes.data:Text
180 grid grid galaxy.datatypes.data:Grid
181
182 ##########################################################################################################
183
184 Note that each line consists precisely of either one, three or four columns. In the case of data types already
185 registered in Galaxy (such as fasta and txt in the above example), only the first column is needed. In the case of
186 data types that haven't been yet registered in Galaxy, the first three columns are needed (mimetype is optional).
187
188 For information about Galaxy data types and subclasses, see the following page:
189 https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes
190
191
192 IV - Hardcoding parameters
193
194 It is possible to hardcode parameters. This makes sense if you want to set a tool in Galaxy in 'quiet' mode or if
195 your tools support multi-threading and accept the number of threads via a parameter, without giving the end user the
196 chance to change the values for these parameters.
197
198 In order to generate hardcoded parameters, you need to provide a simple file. Each line of this file contains two
199 or three columns separated by whitespace. Any line starting with a '#' will be ignored. The first column contains
200 the name of the parameter, the second column contains the value that will always be set for this parameter. The
201 first two columns are mandatory.
202
203 If the parameter is to be hardcoded only for a set of tools, then a third column can be added. This column includes
204 a comma-separated list of tool names for which the parameter will be hardcoded. If a third column is not included,
205 then all processed tools containing the given parameter will get a hardcoded value for it.
206
207 The following is an example of a valid file:
208
209 ##################################### HARDCODED PARAMETERS example #####################################
210 # Every line starting with a # will be handled as a comment and will not be parsed.
211 # The first column is the name of the parameter and the second column is the value that will be used.
212
213 # Parameter name # Value # Tool(s)
214 threads \${GALAXY_SLOTS:-24}
215 mode quiet
216 xtandem_executable xtandem XTandemAdapter
217 verbosity high Foo, Bar
218
219 #########################################################################################################
220
221 Using the above file will produce a <command> similar to:
222
223 [tool_name] ... -threads \${GALAXY_SLOTS:-24} -mode quiet ...
224
225 For all tools. For XTandemAdapter, the <command> will be similar to:
226
227 XTandemAdapter ... -threads \${GALAXY_SLOTS:-24} -mode quiet -xtandem_executable xtandem ...
228
229 And for tools Foo and Bar, the <command> will be similar to:
230
231 Foo ... -threads \${GALAXY_SLOTS:-24} -mode quiet -verbosity high ...
232
233
234 V - Control which tools will be converted
235
236 Sometimes only a subset of CTDs needs to be converted. It is possible to either explicitly specify which tools will
237 be converted or which tools will not be converted.
238
239 The value of the -s/--skip-tools parameter is a file in which each line will be interpreted as the name of a tool
240 that will not be converted. Conversely, the value of the -r/--required-tools is a file in which each line will be
241 interpreted as a tool that is required. Only one of these parameters can be specified at a given time.
242
243 The format of both files is exactly the same. As stated before, each line will be interpreted as the name of a tool;
244 any line starting with a '#' will be ignored.
245
246 '''
247 program_license = '''%(short_description)s
248 Copyright 2015, Luis de la Garza
249
250 Licensed under the Apache License, Version 2.0 (the "License");
251 you may not use this file except in compliance with the License.
252 You may obtain a copy of the License at
253
254 http://www.apache.org/licenses/LICENSE-2.0
255
256 Unless required by applicable law or agreed to in writing, software
257 distributed under the License is distributed on an "AS IS" BASIS,
258 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
259 See the License for the specific language governing permissions and
260 limitations under the License.
261
262 %(usage)s
263 ''' % {'short_description': program_short_description, 'usage': program_usage}
264
265 try:
266 # Setup argument parser
267 parser = ArgumentParser(prog="CTD2Galaxy", description=program_license,
268 formatter_class=RawDescriptionHelpFormatter, add_help=True)
269 parser.add_argument("-i", "--input", dest="input_files", default=[], required=True, nargs="+", action="append",
270 help="List of CTD files to convert.")
271 parser.add_argument("-o", "--output-destination", dest="output_destination", required=True,
272 help="If multiple input files are given, then a folder in which all generated "
273 "XMLs will be written is expected; "
274 "if a single input file is given, then a destination file is expected.")
275 parser.add_argument("-f", "--formats-file", dest="formats_file",
276 help="File containing the supported file formats. Run with '-h' or '--help' to see a "
277 "brief example on the layout of this file.", default=None, required=False)
278 parser.add_argument("-a", "--add-to-command-line", dest="add_to_command_line",
279 help="Adds content to the command line", default="", required=False)
280 parser.add_argument("-d", "--datatypes-destination", dest="data_types_destination",
281 help="Specify the location of a datatypes_conf.xml to modify and add the registered "
282 "data types. If the provided destination does not exist, a new file will be created.",
283 default=None, required=False)
284 parser.add_argument("-x", "--default-executable-path", dest="default_executable_path",
285 help="Use this executable path when <executablePath> is not present in the CTD",
286 default=None, required=False)
287 parser.add_argument("-b", "--blacklist-parameters", dest="blacklisted_parameters", default=[], nargs="+", action="append",
288 help="List of parameters that will be ignored and won't appear on the galaxy stub",
289 required=False)
290 parser.add_argument("-c", "--default-category", dest="default_category", default="DEFAULT", required=False,
291 help="Default category to use for tools lacking a category when generating tool_conf.xml")
292 parser.add_argument("-t", "--tool-conf-destination", dest="tool_conf_destination", default=None, required=False,
293 help="Specify the location of an existing tool_conf.xml that will be modified to include "
294 "the converted tools. If the provided destination does not exist, a new file will "
295 "be created.")
296 parser.add_argument("-g", "--galaxy-tool-path", dest="galaxy_tool_path", default=None, required=False,
297 help="The path that will be prepended to the file names when generating tool_conf.xml")
298 parser.add_argument("-r", "--required-tools", dest="required_tools_file", default=None, required=False,
299 help="Each line of the file will be interpreted as a tool name that needs translation. "
300 "Run with '-h' or '--help' to see a brief example on the format of this file.")
301 parser.add_argument("-s", "--skip-tools", dest="skip_tools_file", default=None, required=False,
302 help="File containing a list of tools for which a Galaxy stub will not be generated. "
303 "Run with '-h' or '--help' to see a brief example on the format of this file.")
304 parser.add_argument("-m", "--macros", dest="macros_files", default=[], nargs="*",
305 action="append", required=None, help="Import the additional given file(s) as macros. "
306 "The macros stdio, requirements and advanced_options are required. Please see "
307 "macros.xml for an example of a valid macros file. All defined macros will be imported.")
308 parser.add_argument("-p", "--hardcoded-parameters", dest="hardcoded_parameters", default=None, required=False,
309 help="File containing hardcoded values for the given parameters. Run with '-h' or '--help' "
310 "to see a brief example on the format of this file.")
311 parser.add_argument("-v", "--validation-schema", dest="xsd_location", default=None, required=False,
312 help="Location of the schema to use to validate CTDs.")
313
314 # TODO: add verbosity, maybe?
315 parser.add_argument("-V", "--version", action='version', version=program_version_message)
316
317 # Process arguments
318 args = parser.parse_args()
319
320 # validate and prepare the passed arguments
321 validate_and_prepare_args(args)
322
323 # extract the names of the macros and check that we have found the ones we need
324 macros_to_expand = parse_macros_files(args.macros_files)
325
326 # parse the given supported file-formats file
327 supported_file_formats = parse_file_formats(args.formats_file)
328
329 # parse the hardcoded parameters file
330 parameter_hardcoder = parse_hardcoded_parameters(args.hardcoded_parameters)
331
332 # parse the skip/required tools files
333 skip_tools = parse_tools_list_file(args.skip_tools_file)
334 required_tools = parse_tools_list_file(args.required_tools_file)
335
336 #if verbose > 0:
337 # print("Verbose mode on")
338 parsed_models = convert(args.input_files,
339 args.output_destination,
340 supported_file_formats=supported_file_formats,
341 default_executable_path=args.default_executable_path,
342 add_to_command_line=args.add_to_command_line,
343 blacklisted_parameters=args.blacklisted_parameters,
344 required_tools=required_tools,
345 skip_tools=skip_tools,
346 macros_file_names=args.macros_files,
347 macros_to_expand=macros_to_expand,
348 parameter_hardcoder=parameter_hardcoder,
349 xsd_location=args.xsd_location)
350
351 #TODO: add some sort of warning if a macro that doesn't exist is to be expanded
352
353 # it is not needed to copy the macros files, since the user has provided them
354
355 # generation of galaxy stubs is ready... now, let's see if we need to generate a tool_conf.xml
356 if args.tool_conf_destination is not None:
357 generate_tool_conf(parsed_models, args.tool_conf_destination,
358 args.galaxy_tool_path, args.default_category)
359
360 # now datatypes_conf.xml
361 if args.data_types_destination is not None:
362 generate_data_type_conf(supported_file_formats, args.data_types_destination)
363
364 return 0
365
366 except KeyboardInterrupt:
367 # handle keyboard interrupt
368 return 0
369 except ApplicationException, e:
370 error("CTD2Galaxy could not complete the requested operation.", 0)
371 error("Reason: " + e.msg, 0)
372 return 1
373 except ModelError, e:
374 error("There seems to be a problem with one of your input CTDs.", 0)
375 error("Reason: " + e.msg, 0)
376 return 1
377 except Exception, e:
378 traceback.print_exc()
379 return 2
380
381
382 def parse_tools_list_file(tools_list_file):
383 tools_list = None
384 if tools_list_file is not None:
385 tools_list = []
386 with open(tools_list_file) as f:
387 for line in f:
388 if line is None or not line.strip() or line.strip().startswith("#"):
389 continue
390 else:
391 tools_list.append(line.strip())
392
393 return tools_list
394
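The skip/required tools files follow a simple convention: one tool name per line, with blank lines and lines starting with `#` ignored. A minimal Python 3 sketch of that filtering, operating on an iterable of lines instead of a file path (the name `parse_tools_list` is an assumption for this example):

```python
def parse_tools_list(lines):
    # keep non-empty lines that are not comments, without surrounding whitespace
    return [line.strip() for line in lines
            if line.strip() and not line.strip().startswith("#")]
```

Passing an open file object works as well, since iterating over a file yields its lines.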
395
396 def parse_macros_files(macros_file_names):
397 macros_to_expand = set()
398
399 for macros_file_name in macros_file_names:
400 try:
401 macros_file = open(macros_file_name)
402 info("Loading macros from %s" % macros_file_name, 0)
403 root = parse(macros_file).getroot()
404 for xml_element in root.findall("xml"):
405 name = xml_element.attrib["name"]
406 if name in macros_to_expand:
407 warning("Macro %s has already been found. Duplicate found in file %s." %
408 (name, macros_file_name), 0)
409 else:
410 info("Macro %s found" % name, 1)
411 macros_to_expand.add(name)
412 except ParseError, e:
413 raise ApplicationException("The macros file " + macros_file_name + " could not be parsed. Cause: " +
414 str(e))
415 except IOError, e:
416 raise ApplicationException("The macros file " + macros_file_name + " could not be opened. Cause: " +
417 str(e))
418
419 # we depend on "stdio", "requirements" and "advanced_options" being defined in at least one of the given macros files
420 missing_needed_macros = []
421 for required_macro in REQUIRED_MACROS:
422 if required_macro not in macros_to_expand:
423 missing_needed_macros.append(required_macro)
424
425 if missing_needed_macros:
426 raise ApplicationException(
427 "The following required macro(s) were not found in any of the given macros files: %s, "
428 "see sample_files/macros.xml for an example of a valid macros file."
429 % ", ".join(missing_needed_macros))
430
431 # we do not need to "expand" the advanced_options macro
432 macros_to_expand.remove(ADVANCED_OPTIONS_MACRO_NAME)
433 return macros_to_expand
434
435
436 def parse_hardcoded_parameters(hardcoded_parameters_file):
437 parameter_hardcoder = ParameterHardcoder()
438 if hardcoded_parameters_file is not None:
439 line_number = 0
440 with open(hardcoded_parameters_file) as f:
441 for line in f:
442 line_number += 1
443 if line is None or not line.strip() or line.strip().startswith("#"):
444 pass
445 else:
446 # the third column must be obtained as a whole, and not split
447 parsed_hardcoded_parameter = line.strip().split(None, 2)
448 # valid lines contain two or three columns
449 if len(parsed_hardcoded_parameter) != 2 and len(parsed_hardcoded_parameter) != 3:
450 warning("Invalid line at line number %d of the given hardcoded parameters file. Line will be "
451 "ignored:\n%s" % (line_number, line), 0)
452 continue
453
454 parameter_name = parsed_hardcoded_parameter[0]
455 hardcoded_value = parsed_hardcoded_parameter[1]
456 tool_names = None
457 if len(parsed_hardcoded_parameter) == 3:
458 tool_names = parsed_hardcoded_parameter[2].split(',')
459 if tool_names:
460 for tool_name in tool_names:
461 parameter_hardcoder.register_parameter(parameter_name, hardcoded_value, tool_name.strip())
462 else:
463 parameter_hardcoder.register_parameter(parameter_name, hardcoded_value)
464
465 return parameter_hardcoder
466
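The key detail in `parse_hardcoded_parameters` is `split(None, 2)`: it splits on whitespace at most twice, so a tool list such as `Foo, Bar` stays in one piece and can then be split on commas. A Python 3 sketch of the per-line logic, with the hypothetical helper name `parse_hardcoded_line`:

```python
def parse_hardcoded_line(line):
    """Return (parameter_name, value, tool_names) or None for ignored lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # at most two splits: the third column keeps any embedded whitespace
    columns = stripped.split(None, 2)
    if len(columns) < 2:
        return None  # a parameter name without a value is invalid
    tool_names = []
    if len(columns) == 3:
        tool_names = [name.strip() for name in columns[2].split(",")]
    return columns[0], columns[1], tool_names
```

An empty `tool_names` list corresponds to a parameter that is hardcoded for all tools.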
467
468 def parse_file_formats(formats_file):
469 supported_formats = {}
470 if formats_file is not None:
471 line_number = 0
472 with open(formats_file) as f:
473 for line in f:
474 line_number += 1
475 if line is None or not line.strip() or line.strip().startswith("#"):
476 # ignore (it'd be weird to have something like:
477 # if line is not None and not (not line.strip()) ...
478 pass
479 else:
480 # not an empty line, no comment
481 # strip the line and split by whitespace
482 parsed_formats = line.strip().split()
483 # valid lines contain either one, three or four columns
484 if not (len(parsed_formats) == 1 or len(parsed_formats) == 3 or len(parsed_formats) == 4):
485 warning("Invalid line at line number %d of the given formats file. Line will be ignored:\n%s" %
486 (line_number, line), 0)
487 # ignore the line
488 continue
489 elif len(parsed_formats) == 1:
490 supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[0])
491 else:
492 mimetype = None
493 # check if mimetype was provided
494 if len(parsed_formats) == 4:
495 mimetype = parsed_formats[3]
496 supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[1],
497 parsed_formats[2], mimetype)
498 return supported_formats
499
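The column rules of the formats file (one column for types Galaxy already knows, three or four otherwise) can be sketched per line as follows. This is a Python 3 illustration, not the commit's code: `parse_format_line` is a hypothetical name and a `namedtuple` stands in for the `DataType` class.

```python
from collections import namedtuple

# stand-in for the DataType class defined in generator.py
DataType = namedtuple("DataType", "extension galaxy_extension galaxy_type mimetype")


def parse_format_line(line):
    """Return a DataType for a valid line, None for comments and invalid lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    columns = stripped.split()
    if len(columns) == 1:
        # already registered in Galaxy: the extension doubles as the Galaxy type
        return DataType(columns[0], columns[0], None, None)
    if len(columns) in (3, 4):
        mimetype = columns[3] if len(columns) == 4 else None
        return DataType(columns[0], columns[1], columns[2], mimetype)
    return None  # two columns, or more than four, are not valid
```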
500
501 def validate_and_prepare_args(args):
502 # check that only one of skip_tools_file and required_tools_file has been provided
503 if args.skip_tools_file is not None and args.required_tools_file is not None:
504 raise ApplicationException(
505 "You have provided both a file with tools to ignore and a file with required tools.\n"
506 "Only one of -s/--skip-tools, -r/--required-tools can be provided.")
507
508 # first, we convert all list of lists in args to flat lists
509 lists_to_flatten = ["input_files", "blacklisted_parameters", "macros_files"]
510 for list_to_flatten in lists_to_flatten:
511 setattr(args, list_to_flatten, [item for sub_list in getattr(args, list_to_flatten) for item in sub_list])
512
513 # if input is a single file, we expect output to be a file (and not a dir that already exists)
514 if len(args.input_files) == 1:
515 if os.path.isdir(args.output_destination):
516 raise ApplicationException("If a single input file is provided, output (%s) is expected to be a file "
517 "and not a folder.\n" % args.output_destination)
518
519 # if input is a list of files, we expect output to be a folder
520 if len(args.input_files) > 1:
521 if not os.path.isdir(args.output_destination):
522 raise ApplicationException("If several input files are provided, output (%s) is expected to be an "
523 "existing directory.\n" % args.output_destination)
524
525 # check that the provided input files, if provided, contain a valid file path
526 input_variables_to_check = ["skip_tools_file", "required_tools_file", "macros_files", "xsd_location",
527 "input_files", "formats_file", "hardcoded_parameters"]
528
529 for variable_name in input_variables_to_check:
530 paths_to_check = []
531 # check if we are handling a single file or a list of files
532 member_value = getattr(args, variable_name)
533 if member_value is not None:
534 if isinstance(member_value, list):
535 for file_name in member_value:
536 paths_to_check.append(strip(str(file_name)))
537 else:
538 paths_to_check.append(strip(str(member_value)))
539
540 for path_to_check in paths_to_check:
541 if not os.path.isfile(path_to_check) or not os.path.exists(path_to_check):
542 raise ApplicationException(
543 "The provided input file (%s) does not exist or is not a valid file path."
544 % path_to_check)
545
546 # check that the provided output files, if provided, contain a valid file path (i.e., not a folder)
547 output_variables_to_check = ["data_types_destination", "tool_conf_destination"]
548
549 for variable_name in output_variables_to_check:
550 file_name = getattr(args, variable_name)
551 if file_name is not None and os.path.isdir(file_name):
552 raise ApplicationException("The provided output file name (%s) points to a directory." % file_name)
553
554 if not args.macros_files:
555 # list is empty, provide the default value
556 warning("Using default macros from macros.xml", 0)
557 args.macros_files = ["macros.xml"]
558
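The flattening step in `validate_and_prepare_args` is needed because combining `nargs="+"` with `action="append"` makes `argparse` collect one list per occurrence of the option. A short, self-contained demonstration using only the standard library (the file names are made up):

```python
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("-i", "--input", dest="input_files", default=[],
                    nargs="+", action="append")

# "-i" is given twice, so argparse stores one sub-list per occurrence
args = parser.parse_args(["-i", "a.ctd", "b.ctd", "-i", "c.ctd"])
nested = args.input_files

# same comprehension as in validate_and_prepare_args
flat = [item for sub_list in nested for item in sub_list]
```

After this, `nested` holds two sub-lists while `flat` holds the three file names in order.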
559
560 def convert(input_files, output_destination, **kwargs):
561 # first, generate a model
562 is_converting_multiple_ctds = len(input_files) > 1
563 parsed_models = []
564 schema = None
565 if kwargs["xsd_location"] is not None:
566 try:
567 info("Loading validation schema from %s" % kwargs["xsd_location"], 0)
568 schema = etree.XMLSchema(etree.parse(kwargs["xsd_location"]))
569 except Exception, e:
570 error("Could not load validation schema %s. Reason: %s" % (kwargs["xsd_location"], str(e)), 0)
571 else:
572 info("Validation against a schema has not been enabled.", 0)
573 for input_file in input_files:
574 try:
575 if schema is not None:
576 validate_against_schema(input_file, schema)
577 model = CTDModel(from_file=input_file)
578 except Exception, e:
579 error(str(e), 1)
580 continue
581
582 if kwargs["skip_tools"] is not None and model.name in kwargs["skip_tools"]:
583 info("Skipping tool %s" % model.name, 0)
584 continue
585 elif kwargs["required_tools"] is not None and model.name not in kwargs["required_tools"]:
586 info("Tool %s is not required, skipping it" % model.name, 0)
587 continue
588 else:
589 info("Converting from %s " % input_file, 0)
590 tool = create_tool(model)
591 write_header(tool, model)
592 create_description(tool, model)
593 expand_macros(tool, model, **kwargs)
594 create_command(tool, model, **kwargs)
595 create_inputs(tool, model, **kwargs)
596 create_outputs(tool, model, **kwargs)
597 create_help(tool, model)
598
599 # finally, serialize the tool
600 output_file = output_destination
601 # if multiple inputs are being converted,
602 # then we need to generate a different output_file for each input
603 if is_converting_multiple_ctds:
604 output_file = os.path.join(output_file, get_filename_without_suffix(input_file) + ".xml")
605 # wrap our tool element into a tree to be able to serialize it
606 tree = ElementTree(tool)
607 tree.write(open(output_file, 'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
608 # let's use model to hold the name of the output file
609 parsed_models.append([model, get_filename(output_file)])
610
611 return parsed_models
612
613
614 # validates a ctd file against the schema
615 def validate_against_schema(ctd_file, schema):
616 try:
617 parser = etree.XMLParser(schema=schema)
618 etree.parse(ctd_file, parser=parser)
619 except etree.XMLSyntaxError as e:
620 raise ApplicationException("Input ctd file %s is not valid. Reason: %s" % (ctd_file, str(e)))
621
622
623 def write_header(tool, model):
624 tool.addprevious(etree.Comment(
625 "This is a configuration file for the integration of a tool into Galaxy (https://galaxyproject.org/). "
626 "This file was automatically generated using CTD2Galaxy."))
627 tool.addprevious(etree.Comment('Proposed Tool Section: [%s]' % model.opt_attribs.get("category", "")))
628
629
630 def generate_tool_conf(parsed_models, tool_conf_destination, galaxy_tool_path, default_category):
631 # for each category, we keep a list of models corresponding to it
632 categories_to_tools = dict()
633 for model in parsed_models:
634 category = strip(model[0].opt_attribs.get("category", ""))
635 if not category.strip():
636 category = default_category
637 if category not in categories_to_tools:
638 categories_to_tools[category] = []
639 categories_to_tools[category].append(model[1])
640
641 # at this point, we should have a map for all categories->tools
642 toolbox_node = Element("toolbox")
643
644 if galaxy_tool_path is not None and not galaxy_tool_path.strip().endswith("/"):
645 galaxy_tool_path = galaxy_tool_path.strip() + "/"
646 if galaxy_tool_path is None:
647 galaxy_tool_path = ""
648
649 for category, file_names in categories_to_tools.iteritems():
650 section_node = add_child_node(toolbox_node, "section")
651 section_node.attrib["id"] = "section-id-" + "".join(category.split())
652 section_node.attrib["name"] = category
653
654 for filename in file_names:
655 tool_node = add_child_node(section_node, "tool")
656 tool_node.attrib["file"] = galaxy_tool_path + filename
657
658 toolconf_tree = ElementTree(toolbox_node)
659 toolconf_tree.write(open(tool_conf_destination, 'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
660 info("Generated Galaxy tool_conf.xml in %s" % tool_conf_destination, 0)
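The grouping performed by `generate_tool_conf` can be tried in isolation. The following is a minimal sketch, not the converter's code: it assumes plain `(category, file_name)` pairs instead of the `[model, file_name]` entries used above, swaps `lxml` for the standard library's `xml.etree.ElementTree`, and introduces a hypothetical helper name, `build_toolbox`. The file names and the `"DEFAULT"` category are made up for illustration.

```python
from collections import OrderedDict
import xml.etree.ElementTree as ET


def build_toolbox(entries, galaxy_tool_path="", default_category="DEFAULT"):
    """Group tool files by category and build a <toolbox> element (hypothetical helper)."""
    # map each category to the list of tool files that belong to it
    categories_to_tools = OrderedDict()
    for category, file_name in entries:
        # an empty or whitespace-only category falls back to the default one
        category = (category or "").strip() or default_category
        categories_to_tools.setdefault(category, []).append(file_name)

    # make sure a non-empty tool path ends with a slash
    if galaxy_tool_path and not galaxy_tool_path.endswith("/"):
        galaxy_tool_path += "/"

    toolbox = ET.Element("toolbox")
    for category, file_names in categories_to_tools.items():
        section = ET.SubElement(toolbox, "section")
        # section ids must not contain whitespace
        section.set("id", "section-id-" + "".join(category.split()))
        section.set("name", category)
        for file_name in file_names:
            ET.SubElement(section, "tool").set("file", galaxy_tool_path + file_name)
    return toolbox


toolbox = build_toolbox(
    [("File Handling", "a.xml"), ("", "b.xml"), ("File Handling", "c.xml")],
    galaxy_tool_path="tools",
)
print(ET.tostring(toolbox).decode("utf-8"))
```

Using an `OrderedDict` here keeps the sections in the order in which categories are first seen; the plain `dict` in the function above makes no such promise under Python 2.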
661
662
663 def generate_data_type_conf(supported_file_formats, data_types_destination):
664 data_types_node = Element("datatypes")
665 registration_node = add_child_node(data_types_node, "registration")
666 registration_node.attrib["converters_path"] = "lib/galaxy/datatypes/converters"
667 registration_node.attrib["display_path"] = "display_applications"
668
669 for format_name in supported_file_formats:
670 data_type = supported_file_formats[format_name]
671 # add only if it's a data type that does not exist in Galaxy
672 if data_type.galaxy_type is not None:
673 data_type_node = add_child_node(registration_node, "datatype")
674 # we know galaxy_extension is not None
675 data_type_node.attrib["extension"] = data_type.galaxy_extension
676 data_type_node.attrib["type"] = data_type.galaxy_type
677 if data_type.mimetype is not None:
678 data_type_node.attrib["mimetype"] = data_type.mimetype
679
680 data_types_tree = ElementTree(data_types_node)
681 data_types_tree.write(open(data_types_destination,'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
682 info("Generated Galaxy datatypes_conf.xml in %s" % data_types_destination, 0)
683
684
685 # taken from
686 # http://stackoverflow.com/questions/8384737/python-extract-file-name-from-path-no-matter-what-the-os-path-format
687 def get_filename(path):
688 head, tail = ntpath.split(path)
689 return tail or ntpath.basename(head)
690
691
692 def get_filename_without_suffix(path):
693 root, ext = os.path.splitext(os.path.basename(path))
694 return root
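As a quick illustration of the two helpers above, the sketch below repeats them verbatim and runs them on a few example paths. The paths, including the `FileFilter.ctd` file name, are invented for this example. `ntpath` understands both `/` and `\` as separators, which is why `get_filename` also handles Windows-style paths when run on other platforms.

```python
import ntpath
import os


def get_filename(path):
    # ntpath.split treats both "/" and "\\" as path separators
    head, tail = ntpath.split(path)
    # a trailing separator leaves tail empty, so fall back to the last component of head
    return tail or ntpath.basename(head)


def get_filename_without_suffix(path):
    # drop the directory part first, then the extension
    root, ext = os.path.splitext(os.path.basename(path))
    return root


print(get_filename("tools/ctd/FileFilter.ctd"))
print(get_filename("C:\\tools\\ctd\\"))
print(get_filename_without_suffix("tools/ctd/FileFilter.ctd"))
```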
695
696
697 def create_tool(model):
698 return Element("tool", OrderedDict([("id", model.name), ("name", model.name), ("version", model.version)]))
699
700
701 def create_description(tool, model):
702 if "description" in model.opt_attribs.keys() and model.opt_attribs["description"] is not None:
703 description = SubElement(tool,"description")
704 description.text = model.opt_attribs["description"]
705
706
707 def get_param_cli_name(param, model):
708 # we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
709 if type(param.parent) == ParameterGroup:
710 if not hasattr(param.parent.parent, 'parent'):
711 return resolve_param_mapping(param, model)
712 elif not hasattr(param.parent.parent.parent, 'parent'):
713 return resolve_param_mapping(param, model)
714 else:
715 if model.cli:
716 warning("Using nested parameter sections (NODE elements) is not compatible with <cli>", 1)
717 return get_param_name(param.parent) + ":" + resolve_param_mapping(param, model)
718 else:
719 return resolve_param_mapping(param, model)
720
721
722 def get_param_name(param):
723 # we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
724 if type(param.parent) == ParameterGroup:
725 if not hasattr(param.parent.parent, 'parent'):
726 return param.name
727 elif not hasattr(param.parent.parent.parent, 'parent'):
728 return param.name
729 else:
730 return get_param_name(param.parent) + ":" + param.name
731 else:
732 return param.name
733
734
735 # some parameters are mapped to command line options, this method helps resolve those mappings, if any
736 def resolve_param_mapping(param, model):
737 # go through all mappings and find if the given param appears as a reference name in a mapping element
738 param_mapping = None
739 for cli_element in model.cli:
740 for mapping_element in cli_element.mappings:
741 if mapping_element.reference_name == param.name:
742 if param_mapping is not None:
743 warning("The parameter %s has more than one mapping in the <cli> section. "
744 "The first found mapping, %s, will be used." % (param.name, param_mapping), 1)
745 else:
746 param_mapping = cli_element.option_identifier
747
748 return param_mapping if param_mapping is not None else param.name
749
750 def create_command(tool, model, **kwargs):
751 final_command = get_tool_executable_path(model, kwargs["default_executable_path"]) + '\n'
752 final_command += kwargs["add_to_command_line"] + '\n'
753 advanced_command_start = "#if $adv_opts.adv_opts_selector=='advanced':\n"
754 advanced_command_end = '#end if'
755 advanced_command = ''
756 parameter_hardcoder = kwargs["parameter_hardcoder"]
757
758 found_output_parameter = False
759 for param in extract_parameters(model):
760 if param.type is _OutFile:
761 found_output_parameter = True
762 command = ''
763 param_name = get_param_name(param)
764 param_cli_name = get_param_cli_name(param, model)
765 if param_name == param_cli_name:
766 # there was no mapping, so for the cli name we will use a '-' in the prefix
767 param_cli_name = '-' + param_name
768
769 if param.name in kwargs["blacklisted_parameters"]:
770 continue
771
772 hardcoded_value = parameter_hardcoder.get_hardcoded_value(param_name, model.name)
773 if hardcoded_value:
774 command += '%s %s\n' % (param_cli_name, hardcoded_value)
775 else:
776 # parameter is neither blacklisted nor hardcoded...
777 galaxy_parameter_name = get_galaxy_parameter_name(param)
778 repeat_galaxy_parameter_name = get_repeat_galaxy_parameter_name(param)
779
780 # logic for ITEMLISTs
781 if param.is_list:
782 if param.type is _InFile:
783 command += param_cli_name + "\n"
784 command += " #for token in $" + galaxy_parameter_name + ":\n"
785 command += " $token\n"
786 command += " #end for\n"
787 else:
788 command += "\n#if $" + repeat_galaxy_parameter_name + ":\n"
789 command += param_cli_name + "\n"
790 command += " #for token in $" + repeat_galaxy_parameter_name + ":\n"
791 command += " #if \" \" in str(token):\n"
792 command += " \"$token." + galaxy_parameter_name + "\"\n"
793 command += " #else\n"
794 command += " $token." + galaxy_parameter_name + "\n"
795 command += " #end if\n"
796 command += " #end for\n"
797 command += "#end if\n"
798 # logic for other ITEMs
799 else:
800 if param.advanced and param.type is not _OutFile:
801 actual_parameter = "$adv_opts.%s" % galaxy_parameter_name
802 else:
803 actual_parameter = "$%s" % galaxy_parameter_name
804 ## if whitespace_validation has been set, we need to generate, for each parameter:
805 ## #if str( $t ).split() != '':
806 ## -t "$t"
807 ## #end if
808 ## TODO only useful for text fields, integers or floats
809 ## not useful for choices, input fields ...
810
811 if not is_boolean_parameter(param) and type(param.restrictions) is _Choices:
812 command += "#if " + actual_parameter + ":\n"
813 command += ' %s\n' % param_cli_name
814 command += " #if \" \" in str(" + actual_parameter + "):\n"
815 command += " \"" + actual_parameter + "\"\n"
816 command += " #else\n"
817 command += " " + actual_parameter + "\n"
818 command += " #end if\n"
819 command += "#end if\n"
820 elif is_boolean_parameter(param):
821 command += "#if " + actual_parameter + ":\n"
822 command += ' %s\n' % param_cli_name
823 command += "#end if\n"
824 elif TYPE_TO_GALAXY_TYPE[param.type] == 'text':
825 command += "#if " + actual_parameter + ":\n"
826 command += " %s " % param_cli_name
827 command += " \"" + actual_parameter + "\"\n"
828 command += "#end if\n"
829 else:
830 command += "#if " + actual_parameter + ":\n"
831 command += ' %s ' % param_cli_name
832 command += actual_parameter + "\n"
833 command += "#end if\n"
834
835 if param.advanced and param.type is not _OutFile:
836 advanced_command += " %s" % command
837 else:
838 final_command += command
839
840 if advanced_command:
841 final_command += "%s%s%s\n" % (advanced_command_start, advanced_command, advanced_command_end)
842
843 if not found_output_parameter:
844 final_command += "> $param_stdout\n"
845
846 command_node = add_child_node(tool, "command")
847 command_node.text = final_command
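To make the shape of the generated Cheetah snippets easier to see, here is a simplified sketch of the non-list, non-choice branches of `create_command`. It is an illustration under assumptions, not the converter's code: `render_parameter` is a hypothetical helper, the `kind` argument stands in for the checks done with `is_boolean_parameter` and `TYPE_TO_GALAXY_TYPE`, the parameter names are made up, and the exact whitespace of the real output may differ.

```python
def render_parameter(cli_name, galaxy_name, kind):
    """Return the Cheetah block for one parameter (hypothetical helper).

    kind is one of "boolean", "text" or anything else (treated as a plain value).
    """
    actual_parameter = "$" + galaxy_name
    # every block is only emitted if the Galaxy parameter has a value
    command = "#if " + actual_parameter + ":\n"
    if kind == "boolean":
        # flags are passed without a value
        command += "  %s\n" % cli_name
    elif kind == "text":
        # text values are quoted so that embedded spaces survive
        command += '  %s "%s"\n' % (cli_name, actual_parameter)
    else:
        command += "  %s %s\n" % (cli_name, actual_parameter)
    command += "#end if\n"
    return command


print(render_parameter("-force", "param_force", "boolean"))
print(render_parameter("-comment", "param_comment", "text"))
print(render_parameter("-threads", "param_threads", "integer"))
```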
848
849
850 # creates the xml elements needed to import the needed macros files
851 # and to "expand" the macros
852 def expand_macros(tool, model, **kwargs):
853 macros_node = add_child_node(tool, "macros")
854 token_node = add_child_node(macros_node, "token")
855 token_node.attrib["name"] = "@EXECUTABLE@"
856 token_node.text = get_tool_executable_path(model, kwargs["default_executable_path"])
857
858 # add <import> nodes
859 for macro_file_name in kwargs["macros_file_names"]:
860 macro_file = open(macro_file_name)
861 import_node = add_child_node(macros_node, "import")
862 # do not add the path of the file, rather, just its basename
863 import_node.text = os.path.basename(macro_file.name)
864
865 # add <expand> nodes
866 for expand_macro in kwargs["macros_to_expand"]:
867 expand_node = add_child_node(tool, "expand")
868