Codebase list ctdconverter / 8b006e2
Created scripts to house core functionality, refactored the Galaxy converter to use these new scripts, updated README files. Luis de la Garza 6 years ago
13 changed file(s) with 1688 addition(s) and 1658 deletion(s).
00 # CTDConverter
1
21 Given one or more CTD files, `CTDConverter` generates the wrappers needed to include them in workflow engines, such as Galaxy and CWL.
32
43 ## Dependencies
4 `CTDConverter` has the following python dependencies:
55
6 `CTDConverter` relies on [CTDopts]. The dependencies of each of the converters are as follows:
6 - [CTDopts]
7 - `lxml`
8 - `pyyaml`
79
8 ### Galaxy Converter
9
10 - Generation of Galaxy ToolConfig files relies on `lxml` to generate nice-looking XML files.
11
12 ## Installing Dependencies
13 You can install the [CTDopts] and `lxml` modules via `conda`, like so:
10 ### Installing Dependencies
11 The easiest way is to install [CTDopts] and all required dependencies modules via `conda`, like so:
1412
1513 ```sh
16 $ conda install lxml
14 $ conda install lxml pyyaml
1715 $ conda install -c workflowconversion ctdopts
1816 ```
1917
20 Note that the [CTDopts] module is available on the `workflowconversion` channel.
18 Note that [CTDopts] is a python module available on the `workflowconversion` channel. Of course, you can just download [CTDopts] and make it available through your `PYTHONPATH` environment variable. To get more information about how to install python modules, visit: https://docs.python.org/2/install/.
2119
22 Of course, you can just download [CTDopts] and make it available through your `PYTHONPATH` environment variable. To get more information about how to install python modules, visit: https://docs.python.org/2/install/.
20 ### Issues with `libxml2` and Schema Validation
21 `lxml` depends on `libxml2`. When you install `lxml`, you'll get the latest version of `libxml2` (2.9.4) by default. You would usually want the latest version; however, this version of `libxml2` has a bug in validating XML files against a schema.
22
23 If you require validation of input CTDs against a schema (which we recommend), you will need to downgrade to the latest known version of `libxml2` that works, namely, 2.9.2. You can do it by executing the following command **after** you've installed all other dependencies:
24
25 ```sh
26 $ conda install -y libxml2=2.9.2
27 ```
28
29 You will be warned that this command will downgrade some packages; this is expected. The `-y` flag tells `conda` to perform the installation without asking for confirmation.
30
31 ## How to install `CTDConverter`
32 `CTDConverter` is not a Python module but a collection of scripts, so installing it only requires downloading the source code from https://github.com/genericworkflownodes/CTDConverter.
33
34 ## Usage
35 The first thing that you need to tell `CTDConverter` is the output format of the converted wrappers. `CTDConverter` supports conversion of CTDs into Galaxy and CWL. Invoking it is as simple as follows:
36
37 $ python convert.py [FORMAT] [ADDITIONAL_PARAMETERS ...]
38
39 Here `[FORMAT]` can be any of the supported formats (i.e., `cwl`, `galaxy`). `CTDConverter` offers a series of format-specific scripts and we've designed these scripts to behave *somewhat* similarly. All converter scripts have the same core functionality, that is, read CTD files, parse them using [CTDopts], validate against a schema, etc. Of course, each converter script might add extra functionality that is not present in other engines, for instance, only the Galaxy converter script supports generation of a `tool_conf.xml` file.
40
41 The following sections in this file describe the parameters that all converter scripts share.
42
43 Please refer to the detailed documentation for each of the converters for more information:
44
45 - [Generation of Galaxy ToolConfig files](galaxy/README.md)
46 - [Generation of CWL task files](cwl/README.md)
2347
2448
25 ## How to install CTDConverter
49 ## Converting a single CTD
50 In its simplest form, the converter takes an input CTD file and generates an output file. The following usage of `CTDConverter`:
2651
27 1. Download the source code from https://github.com/genericworkflownodes/CTDConverter.
52 $ python convert.py [FORMAT] -i /data/sample_input.ctd -o /data/sample_output.xml
2853
29 ## Usage
54 will parse `/data/sample_input.ctd` and generate an appropriate converted file under `/data/sample_output.xml`. The generated file can be added to your workflow engine as usual.
3055
31 Check the detailed documentation for each of the converters:
56 ## Converting several CTDs
57 When converting several CTDs, the expected value for the `-o`/`--output-destination` parameter is an existing folder. For example:
3258
33 - [Generation of Galaxy ToolConfig files](galaxy/README.md)
59 $ python convert.py [FORMAT] -i /data/ctds/one.ctd /data/ctds/two.ctd -o /data/converted-files
60
61 This will convert `/data/ctds/one.ctd` into `/data/converted-files/one.[EXT]` and `/data/ctds/two.ctd` into `/data/converted-files/two.[EXT]`. Each converter has a preferred extension, here shown as a variable (`[EXT]`). Galaxy prefers `xml`, while CWL prefers `cwl`.
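
The naming rule described above can be sketched in a few lines of Python. This is only an illustration under the assumption that the output name is the input's base name with its suffix swapped for the converter's preferred extension; the function name `suggested_output_path` is hypothetical and not part of `CTDConverter`.

```python
import os


def suggested_output_path(input_ctd, output_folder, extension):
    # Hypothetical helper: keep the base name of the input CTD, replace its
    # suffix with the preferred extension and place it in the output folder.
    stem, _ = os.path.splitext(os.path.basename(input_ctd))
    return os.path.join(output_folder, stem + "." + extension)


for ctd in ["/data/ctds/one.ctd", "/data/ctds/two.ctd"]:
    print(suggested_output_path(ctd, "/data/converted-files", "xml"))
```

Passing `cwl` instead of `xml` as the extension would give the names a CWL conversion is expected to produce.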
62
63 You can use wildcard expansion, as supported by most modern operating systems:
64
65 $ python convert.py [FORMAT] -i /data/ctds/*.ctd -o /data/converted-files
66
67 ## Common Parameters
68 ### Input File(s)
69 * Purpose: Provide input CTD file(s) to convert.
70 * Short/long version: `-i` / `--input`
71 * Required: yes.
72 * Taken values: a list of input CTD files.
3473
74 Example:
75
76 Any of the following invocations will convert `/data/input_one.ctd` and `/data/input_two.ctd`:
77
78 $ python convert.py [FORMAT] -i /data/input_one.ctd -i /data/input_two.ctd -o /data/generated
79 $ python convert.py [FORMAT] -i /data/input_one.ctd /data/input_two.ctd -o /data/generated
80 $ python convert.py [FORMAT] --input /data/input_one.ctd /data/input_two.ctd -o /data/generated
81 $ python convert.py [FORMAT] --input /data/input_one.ctd --input /data/input_two.ctd -o /data/generated
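
The reason all four forms are equivalent can be illustrated with a small, stand-alone `argparse` sketch. It assumes the input option is declared with `nargs="+"` and `action="append"` and that the resulting list of lists is flattened afterwards; the helper name `parse_inputs` is hypothetical.

```python
from argparse import ArgumentParser

parser = ArgumentParser()
# nargs="+" collects one or more values per occurrence of the flag;
# action="append" stores one sub-list per occurrence.
parser.add_argument("-i", "--input", dest="input_files", nargs="+",
                    action="append", required=True)


def parse_inputs(argv):
    args = parser.parse_args(argv)
    # Flatten the list of lists into a single list of paths.
    return [path for group in args.input_files for path in group]


repeated = parse_inputs(["-i", "/data/input_one.ctd", "-i", "/data/input_two.ctd"])
grouped = parse_inputs(["--input", "/data/input_one.ctd", "/data/input_two.ctd"])
print(repeated == grouped)
```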
82
83 The following invocation will convert `/data/input.ctd` into `/data/output.xml`:
84
85 $ python convert.py [FORMAT] -i /data/input.ctd -o /data/output.xml
86
87 Of course, you can also use wildcards, which most shells expand automatically. This is extremely useful if you want to convert several files at a time. Let's assume that the folder `/data/ctds` contains three files: `input_one.ctd`, `input_two.ctd` and `input_three.ctd`. The following two invocations will produce the same output in the `/data/wrappers` folder:
88
89 $ python convert.py [FORMAT] -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd /data/ctds/input_three.ctd -o /data/wrappers
90 $ python convert.py [FORMAT] -i /data/ctds/*.ctd -o /data/wrappers
91
92 ### Output Destination
93 * Purpose: Provide output destination for the converted wrapper files.
94 * Short/long version: `-o` / `--output-destination`
95 * Required: yes.
96 * Taken values: if a single input file is given, then a single output file is expected. If multiple input files are given, then an existing folder, into which all converted CTDs will be written, is expected.
97
98 Examples:
99
100 A single input is given, and the output will be generated into `/data/output.xml`:
101
102 $ python convert.py [FORMAT] -i /data/input.ctd -o /data/output.xml
103
104 Several inputs are given. The output is the already existing folder `/data/wrappers`, and at the end of the operation, the files `/data/wrappers/input_one.[EXT]` and `/data/wrappers/input_two.[EXT]` will have been generated:
105
106 $ python convert.py [FORMAT] -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd -o /data/wrappers
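
The two rules for the output destination can be summarized in a short sketch. It is a simplified illustration rather than the converter's own validation code; the function name `check_output_destination` and the wording of the messages are assumptions.

```python
import os


def check_output_destination(input_files, output_destination):
    # Returns an error message, or None if the combination is acceptable.
    is_dir = os.path.isdir(output_destination)
    if len(input_files) == 1 and is_dir:
        return "single input: the output must be a file, not a folder"
    if len(input_files) > 1 and not is_dir:
        return "several inputs: the output must be an existing folder"
    return None
```

Note that, under these rules, the output folder is not created on demand when several inputs are given; it has to exist before the conversion starts.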
107
108 ### Blacklisting Parameters
109 * Purpose: Some parameters present in the CTD should not be exposed in the output files. Think of parameters such as `--help` or `--debug`, which make little sense to expose to end users of a workflow management system.
110 * Short/long version: `-b` / `--blacklist-parameters`
111 * Required: no.
112 * Taken values: A list of parameters to be blacklisted.
113
114 Example:
115
116 $ python convert.py [FORMAT] ... -b h help quiet
117
118 In this case, `CTDConverter` will not process any of the parameters named `h`, `help`, or `quiet`, that is, they will not appear in the generated output files.
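
Conceptually, blacklisting is a filter over the parameter names found in each CTD. The following sketch illustrates the idea only; `visible_parameters` is a hypothetical name, and the parameter names other than `h`, `help` and `quiet` are made up for the example.

```python
def visible_parameters(parameter_names, blacklist):
    # Hypothetical filter: drop every blacklisted parameter name while
    # preserving the original order of the remaining ones.
    blocked = set(blacklist)
    return [name for name in parameter_names if name not in blocked]


print(visible_parameters(["in", "out", "h", "help", "quiet", "threads"],
                         ["h", "help", "quiet"]))
```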
119
120 ### Schema Validation
121 * Purpose: Provide validation of input CTDs against a schema file (i.e., an XSD file).
122 * Short/long version: `-V` / `--validation-schema`
123 * Required: no.
124 * Taken values: location of the schema file (e.g., CTD.xsd).
125
126 CTDs can be validated against a schema. The master version of the schema can be found under [CTDSchema].
127
128 If a schema is provided, all input CTDs will be validated against it.
129
130 **NOTE:** Please make sure to read the [section on issues with schema validation](#issues-with-libxml2-and-schema-validation) if you require validation of CTDs against a schema.
131
132 ### Hardcoding Parameters
133 * Purpose: Fix the value of a parameter and hide it from the end user.
134 * Short/long version: `-p` / `--hardcoded-parameters`
135 * Required: no.
136 * Taken values: The path of a file containing the mapping between parameter names and hardcoded values to use.
137
138 It is sometimes required that parameters are hidden from the end user in workflow systems and that they take a predetermined, fixed value. Allowing end users to control parameters such as `--verbosity`, `--threads`, etc., might create more problems than it solves. For this purpose, the parameter `-p`/`--hardcoded-parameters` takes the path of a file that contains up to three columns, separated by whitespace, that map parameter names to hardcoded values. The first column contains the name of the parameter and the second one the hardcoded value. The first two columns are mandatory.
139
140 If the parameter is to be hardcoded only for certain tools, a third column containing a comma separated list of tool names for which the hardcoding will apply can be added.
141
142 Lines starting with `#` will be ignored. The following is an example of a valid file:
143
144 # Parameter name # Value # Tool(s)
145 threads 8
146 mode quiet
147 xtandem_executable xtandem XTandemAdapter
148 verbosity high Foo, Bar
149
150 The parameters `threads` and `mode` will be set to `8` and `quiet`, respectively, for all parsed CTDs. However, the `xtandem_executable` parameter will be set to `xtandem` only for the `XTandemAdapter` tool. Similarly, the parameter `verbosity` will be set to `high` for the `Foo` and `Bar` tools only.
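
The file format and the lookup rule (a tool-specific value takes precedence over a generic one) can be sketched as follows. This is a simplified illustration, not the converter's own parser; the names `parse_hardcoded` and `lookup` and the tuple-keyed dictionary are assumptions made for the example.

```python
def parse_hardcoded(lines):
    # Maps (parameter, tool) -> value; tool is None when the value applies
    # to all tools.
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split at most twice so the third column is kept as a whole.
        columns = line.split(None, 2)
        if len(columns) < 2:
            continue  # malformed line: a name without a value
        name, value = columns[0], columns[1]
        if len(columns) == 3:
            for tool in columns[2].split(","):
                table[(name, tool.strip())] = value
        else:
            table[(name, None)] = value
    return table


def lookup(table, name, tool):
    # A tool-specific entry wins over a generic one.
    return table.get((name, tool), table.get((name, None)))


example = """
# Parameter name   # Value   # Tool(s)
threads            8
mode               quiet
xtandem_executable xtandem   XTandemAdapter
verbosity          high      Foo, Bar
""".splitlines()

table = parse_hardcoded(example)
print(lookup(table, "threads", "Foo"))
print(lookup(table, "verbosity", "Bar"))
print(lookup(table, "verbosity", "Baz"))
```

With the example file, `threads` resolves for every tool, while `verbosity` resolves only for `Foo` and `Bar`; for any other tool the lookup yields `None`, meaning the parameter is not hardcoded.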
151
152 ### Providing a Default Executable Path
153 * Purpose: Help workflow engines locate tools by providing a path.
154 * Short/long version: `-x` / `--default-executable-path`
155 * Required: no.
156 * Taken values: The default executable path of the tools in the target workflow engine.
157
158 CTDs can contain an `<executablePath>` element that will be used when executing the tool binary. If this element is missing, the value provided by this parameter will be used as a prefix when building the appropriate sections in the output files.
159
160 The following invocation of the converter will use `/opt/suite/bin` as a prefix when providing the executable path in the output files for any input CTD that lacks the `<executablePath>` section:
161
162 $ python convert.py [FORMAT] -x /opt/suite/bin ...
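
The fallback described above might look like the following sketch. The resolution order shown here is an assumption based on the description; how each converter actually builds the executable path is not documented in this file, and `resolve_executable` is a hypothetical name.

```python
import posixpath


def resolve_executable(tool_name, ctd_executable_path=None, default_prefix=None):
    # Assumed order: the <executablePath> from the CTD wins; otherwise the
    # default path given via -x is used as a prefix; otherwise the bare
    # tool name is used.
    if ctd_executable_path:
        return ctd_executable_path
    if default_prefix:
        return posixpath.join(default_prefix, tool_name)
    return tool_name


print(resolve_executable("XTandemAdapter", default_prefix="/opt/suite/bin"))
```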
163
35164
36165 [CTDopts]: https://github.com/genericworkflownodes/CTDopts
(New empty file)
0 #!/usr/bin/env python
1 # encoding: utf-8
2
3 """
4 @author: delagarza
5 """
6
7 from CTDopts.CTDopts import ModelError
8
9
10 class CLIError(Exception):
11 # Generic exception to raise and log different fatal errors.
12 def __init__(self, msg):
13 super(CLIError, self).__init__(msg)
14 self.msg = "E: %s" % msg
15
16 def __str__(self):
17 return self.msg
18
19 def __unicode__(self):
20 return self.msg
21
22
23 class InvalidModelException(ModelError):
24 def __init__(self, message):
25 super(InvalidModelException, self).__init__()
26 self.message = message
27
28 def __str__(self):
29 return self.message
30
31 def __repr__(self):
32 return self.message
33
34
35 class ApplicationException(Exception):
36 def __init__(self, msg):
37 super(ApplicationException, self).__init__(msg)
38 self.msg = msg
39
40 def __str__(self):
41 return self.msg
42
43 def __unicode__(self):
44 return self.msg
0 #!/usr/bin/env python
1 # encoding: utf-8
2 import sys
3
4 MESSAGE_INDENTATION_INCREMENT = 2
5
6
7 def _get_indented_text(text, indentation_level):
8 return ("%(indentation)s%(text)s" %
9 {"indentation": " " * (MESSAGE_INDENTATION_INCREMENT * indentation_level),
10 "text": text})
11
12
13 def warning(warning_text, indentation_level=0):
14 sys.stdout.write(_get_indented_text("WARNING: %s\n" % warning_text, indentation_level))
15
16
17 def error(error_text, indentation_level=0):
18 sys.stderr.write(_get_indented_text("ERROR: %s\n" % error_text, indentation_level))
19
20
21 def info(info_text, indentation_level=0):
22 sys.stdout.write(_get_indented_text("INFO: %s\n" % info_text, indentation_level))
0 #!/usr/bin/env python
1 # encoding: utf-8
2 import ntpath
3 import os
4
5 from lxml import etree
6 from string import strip
7 from logger import info, error, warning
8
9 from common.exceptions import ApplicationException
10 from CTDopts.CTDopts import CTDModel
11
12
13 MESSAGE_INDENTATION_INCREMENT = 2
14
15
16 # simple struct-class containing a tuple with input/output location and the in-memory CTDModel
17 class ParsedCTD:
18 def __init__(self, ctd_model=None, input_file=None, suggested_output_file=None):
19 self.ctd_model = ctd_model
20 self.input_file = input_file
21 self.suggested_output_file = suggested_output_file
22
23
24 class ParameterHardcoder:
25 def __init__(self):
26 # map whose keys are the composite names of parameters and tools in the following pattern:
27 # [ParameterName][separator][ToolName] -> HardcodedValue
28 # if the parameter applies to all tools, then the following pattern is used:
29 # [ParameterName] -> HardcodedValue
30
31 # examples (the separator is '!', as set below):
32 # threads -> 24
33 # adapter!XtandemAdapter -> xtandem.exe
34 # adapter -> adapter.exe
35 self.separator = "!"
36 self.parameter_map = {}
37
38 # the most specific value will be returned in case of overlap
39 def get_hardcoded_value(self, parameter_name, tool_name):
40 # look for the value that would apply for all tools
41 generic_value = self.parameter_map.get(parameter_name, None)
42 specific_value = self.parameter_map.get(self.build_key(parameter_name, tool_name), None)
43 if specific_value is not None:
44 return specific_value
45
46 return generic_value
47
48 def register_parameter(self, parameter_name, parameter_value, tool_name=None):
49 self.parameter_map[self.build_key(parameter_name, tool_name)] = parameter_value
50
51 def build_key(self, parameter_name, tool_name):
52 if tool_name is None:
53 return parameter_name
54 return "%s%s%s" % (parameter_name, self.separator, tool_name)
55
56
57 def validate_path_exists(path):
58 if not os.path.isfile(path):
59 raise ApplicationException("The provided path (%s) does not exist or is not a valid file path." % path)
60
61
62 def validate_argument_is_directory(args, argument_name):
63 file_name = getattr(args, argument_name)
64 if file_name is not None and os.path.isdir(file_name):
65 raise ApplicationException("The provided output file name (%s) points to a directory." % file_name)
66
67
68 def validate_argument_is_valid_path(args, argument_name):
69 paths_to_check = []
70 # check if we are handling a single file or a list of files
71 member_value = getattr(args, argument_name)
72 if member_value is not None:
73 if isinstance(member_value, list):
74 for file_name in member_value:
75 paths_to_check.append(strip(str(file_name)))
76 else:
77 paths_to_check.append(strip(str(member_value)))
78
79 for path_to_check in paths_to_check:
80 validate_path_exists(path_to_check)
81
82
83 # taken from
84 # http://stackoverflow.com/questions/8384737/python-extract-file-name-from-path-no-matter-what-the-os-path-format
85 def get_filename(path):
86 head, tail = ntpath.split(path)
87 return tail or ntpath.basename(head)
88
89
90 def get_filename_without_suffix(path):
91 root, ext = os.path.splitext(os.path.basename(path))
92 return root
93
94
95 def parse_input_ctds(xsd_location, input_ctds, output_destination, output_file_extension):
96 is_converting_multiple_ctds = len(input_ctds) > 1
97 parsed_ctds = []
98 schema = None
99 if xsd_location is not None:
100 try:
101 info("Loading validation schema from %s" % xsd_location, 0)
102 schema = etree.XMLSchema(etree.parse(xsd_location))
103 except Exception as e:
104 error("Could not load validation schema %s. Reason: %s" % (xsd_location, str(e)), 0)
105 else:
106 info("Validation against a schema has not been enabled.", 0)
107 for input_ctd in input_ctds:
108 try:
109 if schema is not None:
110 validate_against_schema(input_ctd, schema)
111 output_file = output_destination
112 # if multiple inputs are being converted, we need to generate a different output_file for each input
113 if is_converting_multiple_ctds:
114 output_file = os.path.join(output_file,
115 get_filename_without_suffix(input_ctd) + '.' + output_file_extension)
116 parsed_ctds.append(ParsedCTD(CTDModel(from_file=input_ctd), input_ctd, output_file))
117 except Exception as e:
118 error(str(e), 1)
119 continue
120 return parsed_ctds
121
122
123 def flatten_list_of_lists(args, list_name):
124 setattr(args, list_name, [item for sub_list in getattr(args, list_name) for item in sub_list])
125
126
127 def validate_against_schema(ctd_file, schema):
128 try:
129 parser = etree.XMLParser(schema=schema)
130 etree.parse(ctd_file, parser=parser)
131 except etree.XMLSyntaxError as e:
132 raise ApplicationException("Invalid CTD file %s. Reason: %s" % (ctd_file, str(e)))
133
134
135 def add_common_parameters(parser, version, last_updated):
136 parser.add_argument("FORMAT", default=None, help="Output format (mandatory). Can be one of: cwl, galaxy.")
137 parser.add_argument("-i", "--input", dest="input_files", default=[], required=True, nargs="+", action="append",
138 help="List of CTD files to convert.")
139 parser.add_argument("-o", "--output-destination", dest="output_destination", required=True,
140 help="If multiple input files are given, then a folder in which all converted "
141 "files will be generated is expected; "
142 "if a single input file is given, then a destination file is expected.")
143 parser.add_argument("-x", "--default-executable-path", dest="default_executable_path",
144 help="Use this executable path when <executablePath> is not present in the CTD",
145 default=None, required=False)
146 parser.add_argument("-b", "--blacklist-parameters", dest="blacklisted_parameters", default=[], nargs="+",
147 action="append",
148 help="List of parameters that will be ignored and won't appear in the generated output files",
149 required=False)
150 parser.add_argument("-p", "--hardcoded-parameters", dest="hardcoded_parameters", default=None, required=False,
151 help="File containing hardcoded values for the given parameters. Run with '-h' or '--help' "
152 "to see a brief example on the format of this file.")
153 parser.add_argument("-V", "--validation-schema", dest="xsd_location", default=None, required=False,
154 help="Location of the schema to use to validate CTDs. If not provided, no schema validation "
155 "will take place.")
156
157 # TODO: add verbosity, maybe?
158 program_version = "v%s" % version
159 program_build_date = str(last_updated)
160 program_version_message = '%%(prog)s %s (%s)' % (program_version, program_build_date)
161 parser.add_argument("-v", "--version", action='version', version=program_version_message)
162
163
164 def parse_hardcoded_parameters(hardcoded_parameters_file):
165 parameter_hardcoder = ParameterHardcoder()
166 if hardcoded_parameters_file is not None:
167 line_number = 0
168 with open(hardcoded_parameters_file) as f:
169 for line in f:
170 line_number += 1
171 if line is None or not line.strip() or line.strip().startswith("#"):
172 pass
173 else:
174 # the third column must be obtained as a whole, and not split
175 parsed_hardcoded_parameter = line.strip().split(None, 2)
176 # valid lines contain two or three columns
177 if len(parsed_hardcoded_parameter) != 2 and len(parsed_hardcoded_parameter) != 3:
178 warning("Invalid line at line number %d of the given hardcoded parameters file. Line will be "
179 "ignored:\n%s" % (line_number, line), 0)
180 continue
181
182 parameter_name = parsed_hardcoded_parameter[0]
183 hardcoded_value = parsed_hardcoded_parameter[1]
184 tool_names = None
185 if len(parsed_hardcoded_parameter) == 3:
186 tool_names = parsed_hardcoded_parameter[2].split(',')
187 if tool_names:
188 for tool_name in tool_names:
189 parameter_hardcoder.register_parameter(parameter_name, hardcoded_value, tool_name.strip())
190 else:
191 parameter_hardcoder.register_parameter(parameter_name, hardcoded_value)
192
193 return parameter_hardcoder
0 import os
1 import sys
2 import traceback
3 import common.utils as utils
4
5 from argparse import ArgumentParser
6 from argparse import RawDescriptionHelpFormatter
7 from common.exceptions import ApplicationException, ModelError
8
9
10 __all__ = []
11 __version__ = 2.0
12 __date__ = '2014-09-17'
13 __updated__ = '2017-08-09'
14
15 program_version = "v%s" % __version__
16 program_build_date = str(__updated__)
17 program_version_message = '%%(prog)s %s (%s)' % (program_version, program_build_date)
18 program_short_description = "CTDConverter - A project from the WorkflowConversion family " \
19 "(https://github.com/WorkflowConversion/CTDConverter)"
20 program_usage = '''
21 USAGE:
22
23 $ python convert.py [FORMAT] [ARGUMENTS ...]
24
25 FORMAT can be either one of the supported output formats: cwl, galaxy.
26
27 There is one converter for each supported FORMAT, each taking a different set of arguments. Please consult the detailed
28 documentation for each of the converters. Nevertheless, all converters have the following common parameters/options:
29
30
31 I - Parsing a single CTD file and converting it:
32
33 $ python convert.py [FORMAT] -i [INPUT_FILE] -o [OUTPUT_FILE]
34
35
36 II - Parsing several CTD files, output converted wrappers in a given folder:
37
38 $ python convert.py [FORMAT] -i [INPUT_FILES] -o [OUTPUT_DIRECTORY]
39
40
41 III - Hardcoding parameters
42
43 It is possible to hardcode parameters. This makes sense if you want to set a tool in 'quiet' mode or if your tools
44 support multi-threading and accept the number of threads via a parameter, without giving end users the chance to
45 change the values for these parameters.
46
47 In order to generate hardcoded parameters, you need to provide a simple file. Each line of this file contains
48 two or three columns separated by whitespace. Any line starting with a '#' will be ignored. The first column contains
49 the name of the parameter, the second column contains the value that will always be set for this parameter. Only the
50 first two columns are mandatory.
51
52 If the parameter is to be hardcoded only for a set of tools, then a third column can be added. This column contains
53 a comma-separated list of tool names for which the parameter will be hardcoded. If a third column is not present,
54 then all processed tools containing the given parameter will get a hardcoded value for it.
55
56 The following is an example of a valid file:
57
58 ##################################### HARDCODED PARAMETERS example #####################################
59 # Every line starting with a # will be handled as a comment and will not be parsed.
60 # The first column is the name of the parameter and the second column is the value that will be used.
61
62 # Parameter name # Value # Tool(s)
63 threads 8
64 mode quiet
65 xtandem_executable xtandem XTandemAdapter
66 verbosity high Foo, Bar
67
68 #########################################################################################################
69
70 Using the above file will produce a command-line similar to:
71
72 [TOOL] ... -threads 8 -mode quiet ...
73
74 for all tools. For XTandemAdapter, however, the command-line will look like:
75
76 XtandemAdapter ... -threads 8 -mode quiet -xtandem_executable xtandem ...
77
78 And for tools Foo and Bar, the command-line will be similar to:
79
80 Foo -threads 8 -mode quiet -verbosity high ...
81
82
83 IV - Engine-specific parameters
84
85 i - Galaxy
86
87 a. Providing file formats, mimetypes
88
89 Galaxy supports the concept of file format in order to connect compatible ports, that is, input ports of a
90 certain data format will be able to receive data from a port from the same format. This converter allows you
91 to provide a personalized file in which you can relate the CTD data formats with supported Galaxy data formats.
92 The layout of this file consists of lines, each of either one, three or four columns separated by any amount of
93 whitespace. The content of each column is as follows:
94
95 * 1st column: file extension
96 * 2nd column: data type, as listed in Galaxy
97 * 3rd column: full-named Galaxy data type, as it will appear on datatypes_conf.xml
98 * 4th column: mimetype (optional)
99
100 The following is an example of a valid "file formats" file:
101
102 ########################################## FILE FORMATS example ##########################################
103 # Every line starting with a # will be handled as a comment and will not be parsed.
104 # The first column is the file format as given in the CTD and the second column is the Galaxy data format. The
105 # second, third and fourth columns can be left empty if the data type has already been registered
106 # in Galaxy, otherwise, all but the mimetype must be provided.
107
108 # CTD type # Galaxy type # Long Galaxy data type # Mimetype
109 csv tabular galaxy.datatypes.data:Text
110 fasta
111 ini txt galaxy.datatypes.data:Text
112 txt
113 idxml txt galaxy.datatypes.xml:GenericXml application/xml
114 options txt galaxy.datatypes.data:Text
115 grid grid galaxy.datatypes.data:Grid
116 ##########################################################################################################
117
118 Note that each line consists precisely of either one, three or four columns. In the case of data types already
119 registered in Galaxy (such as fasta and txt in the above example), only the first column is needed. In the
120 case of data types that haven't been yet registered in Galaxy, the first three columns are needed
121 (mimetype is optional).
122
123 For information about Galaxy data types and subclasses, see the following page:
124 https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes
125
126
127 b. Finer control over which tools will be converted
128
129 Sometimes only a subset of CTDs needs to be converted. It is possible to either explicitly specify which tools
130 will be converted or which tools will not be converted.
131
132 The value of the -s/--skip-tools parameter is a file in which each line will be interpreted as the name of a
133 tool that will not be converted. Conversely, the value of the -r/--required-tools is a file in which each line
134 will be interpreted as a tool that is required. Only one of these parameters can be specified at a given time.
135
136 The format of both files is exactly the same. As stated before, each line will be interpreted as the name of a
137 tool. Any line starting with a '#' will be ignored.
138
139
140 ii - CWL
141
142 There are, for now, no CWL-specific parameters or options.
143
144 '''
145
146 program_license = '''%(short_description)s
147
148 Copyright 2017, WorkflowConversion
149
150 Licensed under the Apache License, Version 2.0 (the "License");
151 you may not use this file except in compliance with the License.
152 You may obtain a copy of the License at
153
154 http://www.apache.org/licenses/LICENSE-2.0
155
156 Unless required by applicable law or agreed to in writing, software
157 distributed under the License is distributed on an "AS IS" BASIS,
158 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
159 See the License for the specific language governing permissions and
160 limitations under the License.
161
162 %(usage)s
163 ''' % {'short_description': program_short_description, 'usage': program_usage}
164
165
166 def main(argv=None):
167 if argv is None:
168 argv = sys.argv
169 else:
170 sys.argv.extend(argv)
171
172 # check that we have, at least, one argument provided
173 # at this point we cannot parse the arguments, because each converter takes different arguments, meaning each
174 # converter will register its own parameters after we've registered the basic ones... we have to do it old school
175 if len(argv) < 2:
176 utils.error('Not enough arguments provided')
177 print('\nUsage: $ python convert.py [TARGET] [ARGUMENTS]\n\n' +
178 'Where:\n' +
179 ' target: one of \'cwl\' or \'galaxy\'\n\n' +
180 'Run again using the -h/--help option to print more detailed help.\n')
181 return 1
182
183 # TODO: at some point this should look like real software engineering and use a map containing converter instances
184 # whose keys would be the name of the converter (e.g., cwl, galaxy), but for the time being, only two formats
185 # are supported
186 target = str.lower(argv[1])
187 if target == 'cwl':
188 from cwl import converter
189 elif target == 'galaxy':
190 from galaxy import converter
191 elif target == '-h' or target == '--help' or target == '--h' or target == 'help':
192 print(program_license)
193 return 0
194 else:
195 utils.error('Unrecognized target engine. Supported targets are \'cwl\' and \'galaxy\'.')
196 return 1
197
198 try:
199 # Setup argument parser
200 parser = ArgumentParser(prog="CTDConverter", description=program_license,
201 formatter_class=RawDescriptionHelpFormatter, add_help=True)
202 utils.add_common_parameters(parser, __version__, __updated__)
203
204 # add tool-specific arguments
205 converter.add_specific_args(parser)
206
207 # parse arguments and perform some basic, common validation
208 args = parser.parse_args()
209 validate_and_prepare_common_arguments(args)
210
211 # parse the input CTD files into CTDModels
212 parsed_ctds = utils.parse_input_ctds(args.xsd_location, args.input_files, args.output_destination,
213 converter.get_preferred_file_extension())
214
215 # let the converter do its own thing
216 return converter.convert_models(args, parsed_ctds)
217
218 except KeyboardInterrupt:
219 # handle keyboard interrupt
220 return 0
221
222 except ApplicationException as e:
223 utils.error("CTDConverter could not complete the requested operation.", 0)
224 utils.error("Reason: " + e.msg, 0)
225 return 1
226
227 except ModelError as e:
228 utils.error("There seems to be a problem with one of your input CTDs.", 0)
229 utils.error("Reason: " + e.msg, 0)
230 return 1
231
232 except Exception as e:
233 traceback.print_exc()
234 return 2
235
236 return 0
237
238
239 def validate_and_prepare_common_arguments(args):
240 # flatten lists of lists to a list containing elements
241 lists_to_flatten = ["input_files", "blacklisted_parameters"]
242 for list_to_flatten in lists_to_flatten:
243 utils.flatten_list_of_lists(args, list_to_flatten)
244
245 # if input is a single file, we expect output to be a file (and not a dir that already exists)
246 if len(args.input_files) == 1:
247 if os.path.isdir(args.output_destination):
248 raise ApplicationException("If a single input file is provided, output (%s) is expected to be a file "
249 "and not a folder.\n" % args.output_destination)
250
251 # if input is a list of files, we expect output to be a folder
252 if len(args.input_files) > 1:
253 if not os.path.isdir(args.output_destination):
254 raise ApplicationException("If several input files are provided, output (%s) is expected to be an "
255 "existing directory.\n" % args.output_destination)
256
257 # check that the provided input files, if provided, contain a valid file path
258 input_arguments_to_check = ["xsd_location", "input_files", "hardcoded_parameters"]
259 for argument_name in input_arguments_to_check:
260 utils.validate_argument_is_valid_path(args, argument_name)
261
262
263 if __name__ == "__main__":
264 sys.exit(main())
00 # Conversion of CTD Files to Galaxy ToolConfigs
1 ## Generating a `tool_conf.xml` File
2 * Purpose: Galaxy uses a file `tool_conf.xml` in which other tools can be included. `CTDConverter` can also generate this file. Categories will be extracted from the provided input CTDs and for each category, a different `<section>` will be generated. Any input CTD lacking a category will be sorted under the provided default category.
3 * Short/long version: `-t` / `--tool-conf-destination`
4 * Required: no.
5 * Taken values: The destination of the file.
16
2 ## How to use: most common Tasks
7 $ python convert.py galaxy -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -t /data/generated-galaxy-stubs/tool_conf.xml
38
4 The Galaxy ToolConfig generator takes several parameters and a varying number of inputs and outputs. The following sub-sections show how to perform the most common operations.
5
6 Running the generator with the `-h/--help` parameter will print extended information about each of the parameters.
7
8 ### Macros
9
10 Galaxy supports the use of macros via a `macros.xml` file (we provide a sample macros file in [macros.xml]). Instead of repeating sections, macros can be used and expanded. If you want fine control over the macros, you can use the `-m` / `--macros` parameter to provide your own macros file.
11
12 Please note that the used macros file **must** be copied to your Galaxy installation on the same location in which you place the generated *ToolConfig* files, otherwise Galaxy will not be able to parse the generated *ToolConfig* files!
13
14 ### One input, one Output
15
16 In its simplest form, the converter takes an input CTD file and generates an output Galaxy *ToolConfig* file. The following usage of `generator.py`:
17
18 $ python generator.py -i /data/sample_input.ctd -o /data/sample_output.xml
19
20 will parse `/data/sample_input.ctd` and generate a Galaxy tool wrapper under `/data/sample_output.xml`. The generated file can be added to your Galaxy instance like any other tool.
21
22 ### Converting several CTDs at once
23
24 When converting several CTDs, the expected value for the `-o`/`--output-destination` parameter is a folder. For example:
25
26 $ python generator.py -i /data/ctds/one.ctd /data/ctds/two.ctd -o /data/generated-galaxy-stubs
27
28 Will convert `/data/ctds/one.ctd` into `/data/generated-galaxy-stubs/one.xml` and `/data/ctds/two.ctd` into `/data/generated-galaxy-stubs/two.xml`.
29
30 You can use wildcard expansion, as supported by most modern operating systems:
31
32 $ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs
33
34 ### Generating a tool_conf.xml File
35
36 The generator supports generation of a `tool_conf.xml` file which you can later use in your local Galaxy installation. The parameter `-t`/`--tool-conf-destination` contains the path of a file in which a `tool_conf.xml` file will be generated.
37
38 $ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -t /data/generated-galaxy-stubs/tool_conf.xml
39
40
41 ## How to use: Parameters in Detail
42
43 ### A Word about Parameters taking Lists of Values
44
45 All parameters have a short and a long option, and some parameters take a list of values. Using either the long or the short option of a parameter will produce the same output. The following examples show how to pass values using the `-f` / `--foo` parameter:
46
47 The following uses of the parameter will pass the list of values containing `bar`, `blah` and `blu`:
48
49 -f bar blah blu
50 --foo bar blah blu
51 -f bar -f blah -f blu
52 --foo bar --foo blah --foo blu
53 -f bar --foo blah blu
54
55 The following uses of the parameter will pass a single value `bar`:
56
57 -f bar
58 --foo bar
59
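The list semantics above can be reproduced with `argparse`. The following is a minimal sketch, not the converter's actual parser: it assumes the option is declared with `nargs="+"` and `action="append"` and that the resulting list of lists is flattened afterwards. The names `build_parser` and `flatten` are made up for this illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # nargs="+" accepts several values per occurrence of the option;
    # action="append" collects one list per occurrence
    parser.add_argument("-f", "--foo", dest="foo", nargs="+", action="append", default=[])
    return parser


def flatten(list_of_lists):
    # [["bar"], ["blah", "blu"]] becomes ["bar", "blah", "blu"]
    return [value for sublist in list_of_lists for value in sublist]


args = build_parser().parse_args(["-f", "bar", "--foo", "blah", "blu"])
print(flatten(args.foo))
```

Mixing short and long options, or repeating the option, yields the same flat list.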
60 ### Schema Validation
61
62 * Purpose: Provide validation of input CTDs against a schema file (i.e., an XSD file).
63 * Short/long version: `-v` / `--validation-schema`
64 * Required: no.
65 * Taken values: location of the schema file (e.g., CTD.xsd).
66
67 CTDs can be validated against a schema. The master version of the schema can be found under [CTDSchema].
68
69 If a schema is provided, all input CTDs will be validated against it.
70
71 ### Input File(s)
72
73 * Purpose: Provide input CTD file(s) to convert.
74 * Short/long version: `-i` / `--input`
75 * Required: yes.
76 * Taken values: a list of input CTD files.
77
78 Example:
79
80 Any of the following invocations will convert `/data/input_one.ctd` and `/data/input_two.ctd`:
81
82 $ python generator.py -i /data/input_one.ctd -i /data/input_two.ctd -o /data/generated
83 $ python generator.py -i /data/input_one.ctd /data/input_two.ctd -o /data/generated
84 $ python generator.py --input /data/input_one.ctd /data/input_two.ctd -o /data/generated
85 $ python generator.py --input /data/input_one.ctd --input /data/input_two.ctd -o /data/generated
86
87 The following invocation will convert `/data/input.ctd` into `/data/output.xml`:
88
89 $ python generator.py -i /data/input.ctd -o /data/output.xml -m sample_files/macros.xml
90
91 Of course, you can also use wildcards, which will be automatically expanded by any modern operating system. This is extremely useful if you want to convert several files at a time. Imagine that the folder `/data` contains three files, `input_one.ctd`, `input_two.ctd` and `input_three.ctd`. The following two invocations will produce the same output in `/data/galaxy`:
92
93 $ python generator.py -i /data/input_one.ctd /data/input_two.ctd /data/input_three.ctd -o /data/galaxy
94 $ python generator.py -i /data/*.ctd -o /data/galaxy
95
96 ### Finer Control over the Tools to be converted
97
98 Sometimes only a subset of the CTDs in a folder needs to be converted. The parameter `-r`/`--required-tools` takes the path of a file containing the names of the tools that will be converted.
99
100 $ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -r required_tools.txt
101
102 On the other hand, if you want the generator to skip conversion of some CTDs, the parameter `-s`/`--skip-tools` will take the path of a file containing the names of tools that will not be converted.
103
104 $ python generator.py -i /data/ctds/*.ctd -o /data/generated-galaxy-stubs -s skipped_tools.txt
105
106 The format of these files (`required_tools.txt`, `skipped_tools.txt` in the examples above) is straightforward. Each line contains the name of a tool and any line starting with `#` will be ignored.
107
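As an illustration of the file format described above, here is a small sketch of how such a list could be read. It is not the converter's own code: `read_tool_names` and the tool names `ToolA` and `ToolB` are placeholders, and skipping blank lines is an assumption that goes slightly beyond what the text states.

```python
def read_tool_names(lines):
    """Return the tool names found in an iterable of lines."""
    names = []
    for line in lines:
        stripped = line.strip()
        # skip empty lines and comment lines
        if not stripped or stripped.startswith("#"):
            continue
        names.append(stripped)
    return names


# contents of a hypothetical required_tools.txt
sample_lines = ["# tools to convert\n", "ToolA\n", "\n", "  ToolB  \n"]
print(read_tool_names(sample_lines))
```

The same function works on an open file object, since iterating over a file yields its lines.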
108 ### Output Destination
109
110 * Purpose: Provide output destination for the generated Galaxy *ToolConfig* files.
111 * Short/long version: `-o` / `--output-destination`
112 * Required: yes.
113 * Taken values: if a single input file is given, then a single output file is expected. If multiple input files are given, then an existent folder, in which all generated Galaxy *ToolConfig* will be written, is expected.
114
115 Example:
116
117 A single input is given, and the output will be generated into `/data/output.xml`:
118
119 $ python generator.py -i /data/input.ctd -o /data/output.xml
120
121 Several inputs are given. The output is the already existent folder, `/data/stubs`, and at the end of the operation, the files `/data/stubs/input_one.ctd.xml` and `/data/stubs/input_two.ctd.xml` will be generated:
122
123 $ python generator.py -i /data/ctds/input_one.ctd /data/ctds/input_two.ctd -o /data/stubs
124
125
126 ### Adding Parameters to the Command-line
127
9 ## Adding Parameters to the Command-line
12810 * Purpose: Galaxy *ToolConfig* files include a `<command>` element in which the command line to invoke the tool can be given. Sometimes tools need to be invoked in a certain way (i.e., passing certain parameters). For instance, some tools can be invoked in a verbose or quiet way, or even in a headless way (i.e., without GUI).
12911 * Short/long version: `-a` / `--add-to-command-line`
13012 * Required: no.
13214
13315 Example:
13416
135 $ python generator.py ... -a "--quiet --no-gui"
17 $ python convert.py galaxy ... -a "--quiet --no-gui"
13618
13719 Will generate the following `<command>` element in the generated Galaxy *ToolConfig*:
13820
13921 <command>TOOL_NAME --quiet --no-gui ...</command>
14022
141
142 ### Blacklisting Parameters
143
144 * Purpose: Some parameters present in the CTD are not to be exposed on Galaxy. Think of parameters such as `--help` or `--debug`, which would not make much sense to expose to end users in a workflow management system such as Galaxy.
145 * Short/long version: `-b` / `--blacklist-parameters`
146 * Required: no.
147 * Taken values: A list of parameters to be blacklisted.
148
149 Example:
150
151 $ python generator.py ... -b h help quiet
152
153 Will skip the parameters named `h`, `help`, and `quiet`, so they will not appear in the generated Galaxy *ToolConfig*.
154
155 ### Generating a tool_conf.xml file
156
157 * Purpose: Galaxy uses a file `tool_conf.xml` in which other tools can be included. `generator.py` can also generate this file. Categories will be extracted from the provided input CTDs and for each category, a different `<section>` will be generated. Any input CTD lacking a category will be sorted under the provided default category.
158 * Short/long version: `-t` / `--tool-conf-destination`
159 * Required: no.
160 * Taken values: The destination of the file.
161
162 ### Providing a default Category
163
164 * Purpose: Input CTDs that lack a category will be sorted under the value given to this parameter. If this parameter is not given, then the category `DEFAULT` will be used.
23 ## Providing a default Category
24 * Purpose: Input CTDs that lack a category will be sorted under the value given to this parameter. If this parameter is not provided, then the category `DEFAULT` will be used.
16525 * Short/long version: `-c` / `--default-category`
16626 * Required: no.
16727 * Taken values: The value for the default category to use for input CTDs lacking a category.
17030
17131 Suppose there is a folder containing several CTD files. Some of those CTDs don't have the optional attribute `category` and the rest belong to the `Data Processing` category. The following invocation:
17232
173 $ python generator.py ... -c Other
33 $ python convert.py galaxy ... -c Other
17434
17535 will generate, for each of the categories, a different section. Additionally, CTDs lacking a category will be sorted under the given category, `Other`, as shown:
17636
18646 ...
18747 </section>
18848
189 ### Providing a Path for the Location of the ToolConfig Files
190
191 * Purpose: The `tool_conf.xml` file contains references to files which in turn contain Galaxy *ToolConfig* files. Using this parameter, you can provide information about the location of your tools.
49 ## Providing a Path for the Location of the *ToolConfig* Files
50 * Purpose: The `tool_conf.xml` file contains references to files which in turn contain Galaxy *ToolConfig* files. Using this parameter, you can provide information about the location of your wrappers on your Galaxy instance.
19251 * Short/long version: `-g` / `--galaxy-tool-path`
19352 * Required: no.
19453 * Taken values: The path relative to your `$GALAXY_ROOT/tools` folder on which your tools are located.
19554
19655 Example:
19756
198 $ python generator.py ... -g my_tools_folder
57 $ python convert.py galaxy ... -g my_tools_folder
19958
20059 Will generate `<tool>` elements in the generated `tool_conf.xml` as follows:
20160
20362
20463 In this example, `tool_conf.xml` refers to a file located on `$GALAXY_ROOT/tools/my_tools_folder/some_tool.xml`.
20564
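To make the interplay of `-c`/`--default-category` and `-g`/`--galaxy-tool-path` concrete, the following sketch builds a `<toolbox>` tree from a list of (file name, category) pairs. It is an approximation under stated assumptions: it uses the standard library's `xml.etree.ElementTree` instead of `lxml`, the function name `build_toolbox` and the sample file names are invented, and the `section-id-` prefix for section ids is taken from the converter code in this commit.

```python
import xml.etree.ElementTree as ET


def build_toolbox(tools, default_category="DEFAULT", galaxy_tool_path=""):
    """tools: iterable of (file_name, category); category may be None or blank."""
    if galaxy_tool_path and not galaxy_tool_path.endswith("/"):
        galaxy_tool_path += "/"

    # group file names by category, remembering the order of first appearance
    sections = {}
    order = []
    for file_name, category in tools:
        category = (category or "").strip() or default_category
        if category not in sections:
            sections[category] = []
            order.append(category)
        sections[category].append(file_name)

    toolbox = ET.Element("toolbox")
    for category in order:
        section = ET.SubElement(toolbox, "section")
        # whitespace is removed from the category name to build the id
        section.set("id", "section-id-" + "".join(category.split()))
        section.set("name", category)
        for file_name in sections[category]:
            tool = ET.SubElement(section, "tool")
            tool.set("file", galaxy_tool_path + file_name)
    return toolbox


toolbox = build_toolbox(
    [("some_tool.xml", "Data Processing"), ("other_tool.xml", None)],
    default_category="Other",
    galaxy_tool_path="my_tools_folder",
)
print(ET.tostring(toolbox).decode("utf-8"))
```

Tools without a category end up in the `Other` section, and every `file` attribute is prefixed with `my_tools_folder/`.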
206
207 ### Hardcoding Parameters
208
209 * Purpose: Fix the value of a parameter and hide it from the end user.
210 * Short/long version: `-p` / `--hardcoded-parameters`
211 * Required: no.
212 * Taken values: The path of a file containing the mapping between parameter names and hardcoded values to use in the `<command>` section.
213
214 It is sometimes required that parameters are hidden from the end user in workflow systems such as Galaxy and that they take a predetermined value. Allowing end users to control parameters such as `--verbosity`, `--threads`, etc., might create more problems than it solves. For this purpose, the parameter `-p`/`--hardcoded-parameters` takes the path of a file that contains up to three columns separated by whitespace that map parameter names to the hardcoded value. The first column contains the name of the parameter and the second one the hardcoded value. The first two columns are mandatory.
215
216 If the parameter is to be hardcoded only for certain tools, a third column containing a comma separated list of tool names for which the hardcoding will apply can be added.
217
218 Lines starting with `#` will be ignored. The following is an example of a valid file:
219
220 # Parameter name # Value # Tool(s)
221 threads \${GALAXY_SLOTS:-24}
222 mode quiet
223 xtandem_executable xtandem XTandemAdapter
224 verbosity high Foo, Bar
225
226 This will produce a `<command>` section similar to the following one for all tools but `XTandemAdapter`, `Foo` and `Bar`:
227
228 <command>TOOL_NAME -threads \${GALAXY_SLOTS:-24} -mode quiet ...</command>
229
230 For `XTandemAdapter`, the `<command>` will be similar to:
231
232 <command>XTandemAdapter ... -threads \${GALAXY_SLOTS:-24} -mode quiet -xtandem_executable xtandem ...</command>
233
234 And for tools `Foo` and `Bar`, the `<command>` will be similar to:
235
236 <command>Foo ... ... -threads \${GALAXY_SLOTS:-24} -mode quiet -verbosity high ...</command>
237
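The column layout described above could be parsed along the following lines. This is a sketch only and not the converter's `utils.parse_hardcoded_parameters`: the function names and the returned data structure are assumptions, and the third column is treated as "everything after the second column" so that a list such as `Foo, Bar` with a space after the comma still works.

```python
def parse_hardcoded_parameters(lines):
    """Return (global_values, per_tool_values) parsed from an iterable of lines."""
    global_values = {}
    per_tool_values = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # split into at most three columns: name, value, tool list
        columns = stripped.split(None, 2)
        if len(columns) < 2:
            continue  # the first two columns are mandatory
        name, value = columns[0], columns[1]
        if len(columns) == 2:
            global_values[name] = value
        else:
            for tool in columns[2].split(","):
                tool = tool.strip()
                if tool:
                    per_tool_values.setdefault(tool, {})[name] = value
    return global_values, per_tool_values


def hardcoded_for(tool_name, global_values, per_tool_values):
    """Merge the global values with the ones that apply only to tool_name."""
    merged = dict(global_values)
    merged.update(per_tool_values.get(tool_name, {}))
    return merged


SAMPLE = [
    "# Parameter name # Value # Tool(s)",
    r"threads \${GALAXY_SLOTS:-24}",
    "mode quiet",
    "xtandem_executable xtandem XTandemAdapter",
    "verbosity high Foo, Bar",
]

global_values, per_tool_values = parse_hardcoded_parameters(SAMPLE)
print(sorted(hardcoded_for("XTandemAdapter", global_values, per_tool_values)))
```

With the sample file from above, `XTandemAdapter` receives `threads`, `mode` and `xtandem_executable`, while `Foo` and `Bar` receive `threads`, `mode` and `verbosity`.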
238
239 ### Including additional Macros Files
240
65 ## Including additional Macros Files
24166 * Purpose: Include external macros files.
24267 * Short/long version: `-m` / `--macros`
24368 * Required: no.
24671
24772 *ToolConfig* supports elaborate sections such as `<stdio>`, `<requirements>`, etc., that are identical across tools of the same suite. Macros files assist in the task of including external xml sections into *ToolConfig* files. For more information about the syntax of macros files, see: https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax#Reusing_Repeated_Configuration_Elements
24873
249 There are some macros that are required, namely `stdio`, `requirements` and `advanced_options`. A template macro file is included in [macros.xml]. It can be edited to suit your needs and you could add extra macros or leave it as it is and include additional files.
74 There are some macros that are required, namely `stdio`, `requirements` and `advanced_options`. A template macro file is included in [macros.xml]. It can be edited to suit your needs and you could add extra macros or leave it as it is and include additional files. Every macro found in the provided files will be expanded.
25075
251 Every macro found in the included files and in `support_files/macros.xml` will be expanded. Users are responsible for copying the given macros files in their corresponding galaxy folders.
76 Please note that the used macros files **must** be copied to your Galaxy installation on the same location in which you place the generated *ToolConfig* files, otherwise Galaxy will not be able to parse the generated *ToolConfig* files!
25277
253 ### Providing a default executable Path
254
255 * Purpose: Help Galaxy locate tools by providing a path.
256 * Short/long version: `-x` / `--default-executable-path`
257 * Required: no.
258 * Taken values: The default executable path of the tools in the Galaxy installation.
259
260 CTDs can contain an `<executablePath>` element that will be used when executing the tool binary. If this element is missing, the value provided by this parameter will be used as a prefix when building the `<command>` section. Suppose that you have installed a tool suite in your local Galaxy instance under `/opt/suite/bin`. The following invocation of the converter:
261
262 $ python generator.py -x /opt/suite/bin ...
263
264 Will produce a `<command>` section similar to:
265
266 <command>/opt/suite/bin/Foo ...</command>
267
268 For those CTDs in which no `<executablePath>` could be found.
269
270
271 ### Generating a `datatypes_conf.xml` File
272
78 ## Generating a `datatypes_conf.xml` File
27379 * Purpose: Specify the destination of a generated `datatypes_conf.xml` file.
27480 * Short/long version: `-d` / `--datatypes-destination`
27581 * Required: no.
27783
27884 It is likely that your tools use file formats or mimetypes that have not been registered in Galaxy. The generator allows you to specify a path in which an automatically generated `datatypes_conf.xml` file will be created. Consult the next section to get information about how to register file formats and mimetypes.
27985
280
281 ### Providing Galaxy File Formats
282
86 ## Providing Galaxy File Formats
28387 * Purpose: Register new file formats and mimetypes.
28488 * Short/long version: `-f` / `--formats-file`
28589 * Required: no.
307111
308112 For information about Galaxy data types and subclasses, consult the following page: https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes
309113
310
311 ## Notes about some of the *OpenMS* Tools
312
313 * Most of the tools can be generated automatically. Some of the tools need some extra work (for now).
314 * These adapters need to be changed, such that you provide the path to the executable:
114 ## Remarks about some of the *OpenMS* Tools
115 * Most of the tools can be generated automatically. However, some of the tools need some extra work (for now).
116 * The following adapters need to be changed, such that you provide the path to the executable:
315117 * FidoAdapter (add `-exe fido` in the command tag, delete the `$param_exe` in the command tag, delete the parameter from the input list).
316118 * MSGFPlusAdapter (add `-executable msgfplus.jar` in the command tag, delete the `$param_executable` in the command tag, delete the parameter from the input list).
317119 * MyriMatchAdapter (add `-myrimatch_executable myrimatch` in the command tag, delete the `$param_myrimatch_executable` in the command tag, delete the parameter from the input list).
320122 * XTandemAdapter (add `-xtandem_executable xtandem` in the command tag, delete the `$param_xtandem_executable` in the command tag, delete the parameter from the input list).
321123 * To avoid the deletion in the inputs you can also add these parameters to the blacklist
322124
323 $ python generator.py -b exe executable myrimatch_executable omssa_executable pepnovo_executable xtandem_executable
125 $ python convert.py galaxy -b exe executable myrimatch_executable omssa_executable pepnovo_executable xtandem_executable
324126
325 * These tools have multiple outputs (number of inputs = number of outputs) which is not yet supported in Galaxy-stable:
127 * The following tools have multiple outputs (number of inputs = number of outputs) which is not yet supported in Galaxy-stable:
326128 * SeedListGenerator
327129 * SpecLibSearcher
328130 * MapAlignerIdentification
(New empty file)
0 #!/usr/bin/env python
1 # encoding: utf-8
2
3 """
4 @author: delagarza
5 """
6
7 import os
8 import string
9
10 from collections import OrderedDict
11 from string import strip
12 from lxml import etree
13 from lxml.etree import SubElement, Element, ElementTree, ParseError, parse
14
15 from common import utils, logger
16 from common.exceptions import ApplicationException, InvalidModelException
17 from common.utils import ParsedCTD
18
19 from CTDopts.CTDopts import _InFile, _OutFile, ParameterGroup, _Choices, _NumericRange, _FileFormat, ModelError, _Null
20
21
22 TYPE_TO_GALAXY_TYPE = {int: 'integer', float: 'float', str: 'text', bool: 'boolean', _InFile: 'data',
23 _OutFile: 'data', _Choices: 'select'}
24 STDIO_MACRO_NAME = "stdio"
25 REQUIREMENTS_MACRO_NAME = "requirements"
26 ADVANCED_OPTIONS_MACRO_NAME = "advanced_options"
27
28 REQUIRED_MACROS = [STDIO_MACRO_NAME, REQUIREMENTS_MACRO_NAME, ADVANCED_OPTIONS_MACRO_NAME]
29
30
31 class ExitCode:
32 def __init__(self, code_range="", level="", description=None):
33 self.range = code_range
34 self.level = level
35 self.description = description
36
37
38 class DataType:
39 def __init__(self, extension, galaxy_extension=None, galaxy_type=None, mimetype=None):
40 self.extension = extension
41 self.galaxy_extension = galaxy_extension
42 self.galaxy_type = galaxy_type
43 self.mimetype = mimetype
44
45
46 def add_specific_args(parser):
47 parser.add_argument("-f", "--formats-file", dest="formats_file",
48 help="File containing the supported file formats. Run with '-h' or '--help' to see a "
49 "brief example on the layout of this file.", default=None, required=False)
50 parser.add_argument("-a", "--add-to-command-line", dest="add_to_command_line",
51 help="Adds content to the command line", default="", required=False)
52 parser.add_argument("-d", "--datatypes-destination", dest="data_types_destination",
53 help="Specify the location of a datatypes_conf.xml to modify and add the registered "
54 "data types. If the provided destination does not exist, a new file will be created.",
55 default=None, required=False)
56 parser.add_argument("-c", "--default-category", dest="default_category", default="DEFAULT", required=False,
57 help="Default category to use for tools lacking a category when generating tool_conf.xml")
58 parser.add_argument("-t", "--tool-conf-destination", dest="tool_conf_destination", default=None, required=False,
59 help="Specify the location of an existing tool_conf.xml that will be modified to include "
60 "the converted tools. If the provided destination does not exist, a new file will "
61 "be created.")
62 parser.add_argument("-g", "--galaxy-tool-path", dest="galaxy_tool_path", default=None, required=False,
63 help="The path that will be prepended to the file names when generating tool_conf.xml")
64 parser.add_argument("-r", "--required-tools", dest="required_tools_file", default=None, required=False,
65 help="Each line of the file will be interpreted as a tool name that needs translation. "
66 "Run with '-h' or '--help' to see a brief example on the format of this file.")
67 parser.add_argument("-s", "--skip-tools", dest="skip_tools_file", default=None, required=False,
68 help="File containing a list of tools for which a Galaxy stub will not be generated. "
69 "Run with '-h' or '--help' to see a brief example on the format of this file.")
70 parser.add_argument("-m", "--macros", dest="macros_files", default=[], nargs="*",
71 action="append", required=None, help="Import the additional given file(s) as macros. "
72 "The macros stdio, requirements and advanced_options are required. Please see "
73 "macros.xml for an example of a valid macros file. All defined macros will be imported.")
74
75
76 def convert_models(args, parsed_ctds): # IGNORE:C0111
77 # validate and prepare the passed arguments
78 validate_and_prepare_args(args)
79
80 # extract the names of the macros and check that we have found the ones we need
81 macros_to_expand = parse_macros_files(args.macros_files)
82
83 # parse the given supported file-formats file
84 supported_file_formats = parse_file_formats(args.formats_file)
85
86 # parse the hardcoded parameters file
87 parameter_hardcoder = utils.parse_hardcoded_parameters(args.hardcoded_parameters)
88
89 # parse the skip/required tools files
90 skip_tools = parse_tools_list_file(args.skip_tools_file)
91 required_tools = parse_tools_list_file(args.required_tools_file)
92
93 _convert_internal(parsed_ctds,
94 supported_file_formats=supported_file_formats,
95 default_executable_path=args.default_executable_path,
96 add_to_command_line=args.add_to_command_line,
97 blacklisted_parameters=args.blacklisted_parameters,
98 required_tools=required_tools,
99 skip_tools=skip_tools,
100 macros_file_names=args.macros_files,
101 macros_to_expand=macros_to_expand,
102 parameter_hardcoder=parameter_hardcoder)
103
104 # generation of galaxy stubs is ready... now, let's see if we need to generate a tool_conf.xml
105 if args.tool_conf_destination is not None:
106 generate_tool_conf(parsed_ctds, args.tool_conf_destination,
107 args.galaxy_tool_path, args.default_category)
108
109 # generate datatypes_conf.xml
110 if args.data_types_destination is not None:
111 generate_data_type_conf(supported_file_formats, args.data_types_destination)
112
113 return 0
114
115
116 def parse_tools_list_file(tools_list_file):
117 tools_list = None
118 if tools_list_file is not None:
119 tools_list = []
120 with open(tools_list_file) as f:
121 for line in f:
122 if line is None or not line.strip() or line.strip().startswith("#"):
123 continue
124 else:
125 tools_list.append(line.strip())
126
127 return tools_list
128
129
130 def parse_macros_files(macros_file_names):
131 macros_to_expand = set()
132
133 for macros_file_name in macros_file_names:
134 try:
135 macros_file = open(macros_file_name)
136 logger.info("Loading macros from %s" % macros_file_name, 0)
137 root = parse(macros_file).getroot()
138 for xml_element in root.findall("xml"):
139 name = xml_element.attrib["name"]
140 if name in macros_to_expand:
141 logger.warning("Macro %s has already been found. Duplicate found in file %s." %
142 (name, macros_file_name), 0)
143 else:
144 logger.info("Macro %s found" % name, 1)
145 macros_to_expand.add(name)
146 except ParseError, e:
147 raise ApplicationException("The macros file " + macros_file_name + " could not be parsed. Cause: " +
148 str(e))
149 except IOError, e:
150 raise ApplicationException("The macros file " + macros_file_name + " could not be opened. Cause: " +
151 str(e))
152
153 # we depend on "stdio", "requirements" and "advanced_options" being defined in at least one of the given macros files
154 missing_needed_macros = []
155 for required_macro in REQUIRED_MACROS:
156 if required_macro not in macros_to_expand:
157 missing_needed_macros.append(required_macro)
158
159 if missing_needed_macros:
160 raise ApplicationException(
161 "The following required macro(s) were not found in any of the given macros files: %s, "
162 "see galaxy/macros.xml for an example of a valid macros file."
163 % ", ".join(missing_needed_macros))
164
165 # we do not need to "expand" the advanced_options macro
166 macros_to_expand.remove(ADVANCED_OPTIONS_MACRO_NAME)
167 return macros_to_expand
168
169
170 def parse_file_formats(formats_file):
171 supported_formats = {}
172 if formats_file is not None:
173 line_number = 0
174 with open(formats_file) as f:
175 for line in f:
176 line_number += 1
177 if line is None or not line.strip() or line.strip().startswith("#"):
178 # ignore (it'd be weird to have something like:
179 # if line is not None and not (not line.strip()) ...
180 pass
181 else:
182 # not an empty line, no comment
183 # strip the line and split by whitespace
184 parsed_formats = line.strip().split()
185 # valid lines contain one, three or four columns
186 if not (len(parsed_formats) == 1 or len(parsed_formats) == 3 or len(parsed_formats) == 4):
187 logger.warning(
188 "Invalid line at line number %d of the given formats file. Line will be ignored:\n%s" %
189 (line_number, line), 0)
190 # ignore the line
191 continue
192 elif len(parsed_formats) == 1:
193 supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[0])
194 else:
195 mimetype = None
196 # check if mimetype was provided
197 if len(parsed_formats) == 4:
198 mimetype = parsed_formats[3]
199 supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[1],
200 parsed_formats[2], mimetype)
201 return supported_formats
202
203
204 def validate_and_prepare_args(args):
205 # check that only one of skip_tools_file and required_tools_file has been provided
206 if args.skip_tools_file is not None and args.required_tools_file is not None:
207 raise ApplicationException(
208 "You have provided both a file with tools to ignore and a file with required tools.\n"
209 "Only one of -s/--skip-tools, -r/--required-tools can be provided.")
210
211 # flatten macros_files to make sure that we have a list containing file names and not a list of lists
212 utils.flatten_list_of_lists(args, "macros_files")
213
214 # check that the arguments point to a valid, existing path
215 input_variables_to_check = ["skip_tools_file", "required_tools_file", "macros_files", "formats_file"]
216 for variable_name in input_variables_to_check:
217 utils.validate_argument_is_valid_path(args, variable_name)
218
219 # check that the provided output files, if provided, contain a valid file path (i.e., not a folder)
220 output_variables_to_check = ["data_types_destination", "tool_conf_destination"]
221 for variable_name in output_variables_to_check:
222 file_name = getattr(args, variable_name)
223 if file_name is not None and os.path.isdir(file_name):
224 raise ApplicationException("The provided output file name (%s) points to a directory." % file_name)
225
226 if not args.macros_files:
227 # list is empty, provide the default value
228 logger.warning("Using default macros from galaxy/macros.xml", 0)
229 args.macros_files = ["galaxy/macros.xml"]
230
231
232 def get_preferred_file_extension():
233 return "xml"
234
235
236 def _convert_internal(parsed_ctds, **kwargs):
237 # parse all input files into models using CTDopts (via utils)
238 # the output is a tuple containing the model, output destination, origin file
239 for parsed_ctd in parsed_ctds:
240 model = parsed_ctd.ctd_model
241 origin_file = parsed_ctd.input_file
242 output_file = parsed_ctd.suggested_output_file
243
244 if kwargs["skip_tools"] is not None and model.name in kwargs["skip_tools"]:
245 logger.info("Skipping tool %s" % model.name, 0)
246 continue
247 elif kwargs["required_tools"] is not None and model.name not in kwargs["required_tools"]:
248 logger.info("Tool %s is not required, skipping it" % model.name, 0)
249 continue
250 else:
251 logger.info("Converting from %s " % origin_file, 0)
252 tool = create_tool(model)
253 write_header(tool, model)
254 create_description(tool, model)
255 expand_macros(tool, model, **kwargs)
256 create_command(tool, model, **kwargs)
257 create_inputs(tool, model, **kwargs)
258 create_outputs(tool, model, **kwargs)
259 create_help(tool, model)
260
261 # wrap our tool element into a tree to be able to serialize it
262 tree = ElementTree(tool)
263 tree.write(open(output_file, 'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
264
265
266 def write_header(tool, model):
267 tool.addprevious(etree.Comment(
268 "This is a configuration file for the integration of a tool into Galaxy (https://galaxyproject.org/). "
269 "This file was automatically generated using CTDConverter."))
270 tool.addprevious(etree.Comment('Proposed Tool Section: [%s]' % model.opt_attribs.get("category", "")))
271
272
273 def generate_tool_conf(parsed_ctds, tool_conf_destination, galaxy_tool_path, default_category):
274 # for each category, we keep a list of models corresponding to it
275 categories_to_tools = dict()
276 for parsed_ctd in parsed_ctds:
277 category = strip(parsed_ctd.ctd_model.opt_attribs.get("category", ""))
278 if not category.strip():
279 category = default_category
280 if category not in categories_to_tools:
281 categories_to_tools[category] = []
282 categories_to_tools[category].append(utils.get_filename(parsed_ctd.suggested_output_file))
283
284 # at this point, we should have a map for all categories->tools
285 toolbox_node = Element("toolbox")
286
287 if galaxy_tool_path is not None and not galaxy_tool_path.strip().endswith("/"):
288 galaxy_tool_path = galaxy_tool_path.strip() + "/"
289 if galaxy_tool_path is None:
290 galaxy_tool_path = ""
291
292 for category, file_names in categories_to_tools.iteritems():
293 section_node = add_child_node(toolbox_node, "section")
294 section_node.attrib["id"] = "section-id-" + "".join(category.split())
295 section_node.attrib["name"] = category
296
297 for filename in file_names:
298 tool_node = add_child_node(section_node, "tool")
299 tool_node.attrib["file"] = galaxy_tool_path + filename
300
301 toolconf_tree = ElementTree(toolbox_node)
302 toolconf_tree.write(open(tool_conf_destination,'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
303 logger.info("Generated Galaxy tool_conf.xml in %s" % tool_conf_destination, 0)
304
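The grouping step in `generate_tool_conf` can be tried in isolation. The following is a minimal sketch, not the converter's code: it uses the standard library's `xml.etree.ElementTree` instead of `lxml`, runs on Python 3, and the function name `build_toolbox`, its `(category, file_name)` input pairs and the `"Unsorted"` default are assumptions made for illustration.

```python
from collections import OrderedDict
import xml.etree.ElementTree as ET


def build_toolbox(tools, default_category="Unsorted", tool_path=""):
    """Group (category, file_name) pairs into <section>/<tool> nodes (hypothetical helper)."""
    by_category = OrderedDict()
    for category, file_name in tools:
        # an empty or whitespace-only category falls back to the default one
        category = (category or "").strip() or default_category
        by_category.setdefault(category, []).append(file_name)

    toolbox = ET.Element("toolbox")
    for category, file_names in by_category.items():
        section = ET.SubElement(toolbox, "section")
        # section ids must not contain whitespace, so it is removed from the category
        section.set("id", "section-id-" + "".join(category.split()))
        section.set("name", category)
        for file_name in file_names:
            tool = ET.SubElement(section, "tool")
            tool.set("file", tool_path + file_name)
    return toolbox


toolbox = build_toolbox(
    [("File Handling", "FileInfo.xml"), ("", "Misc.xml"), ("File Handling", "FileMerger.xml")],
    tool_path="openms/")
print(ET.tostring(toolbox).decode("utf-8"))
```

Using an ordered mapping keeps the sections in the order in which categories are first seen, which makes the generated file stable between runs.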
305
306 def generate_data_type_conf(supported_file_formats, data_types_destination):
307 data_types_node = Element("datatypes")
308 registration_node = add_child_node(data_types_node, "registration")
309 registration_node.attrib["converters_path"] = "lib/galaxy/datatypes/converters"
310 registration_node.attrib["display_path"] = "display_applications"
311
312 for format_name in supported_file_formats:
313 data_type = supported_file_formats[format_name]
314 # add only if it's a data type that does not exist in Galaxy
315 if data_type.galaxy_type is not None:
316 data_type_node = add_child_node(registration_node, "datatype")
317 # we know galaxy_extension is not None
318 data_type_node.attrib["extension"] = data_type.galaxy_extension
319 data_type_node.attrib["type"] = data_type.galaxy_type
320 if data_type.mimetype is not None:
321 data_type_node.attrib["mimetype"] = data_type.mimetype
322
323 data_types_tree = ElementTree(data_types_node)
324 data_types_tree.write(open(data_types_destination,'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
325 logger.info("Generated Galaxy datatypes_conf.xml in %s" % data_types_destination, 0)
326
327
328 def create_tool(model):
329 return Element("tool", OrderedDict([("id", model.name), ("name", model.name), ("version", model.version)]))
330
331
332 def create_description(tool, model):
333 if "description" in model.opt_attribs.keys() and model.opt_attribs["description"] is not None:
334 description = SubElement(tool,"description")
335 description.text = model.opt_attribs["description"]
336
337
338 def get_param_cli_name(param, model):
339 # we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
340 if type(param.parent) == ParameterGroup:
341 if not hasattr(param.parent.parent, 'parent'):
342 return resolve_param_mapping(param, model)
343 elif not hasattr(param.parent.parent.parent, 'parent'):
344 return resolve_param_mapping(param, model)
345 else:
346 if model.cli:
347 logger.warning("Using nested parameter sections (NODE elements) is not compatible with <cli>", 1)
348 return get_param_name(param.parent) + ":" + resolve_param_mapping(param, model)
349 else:
350 return resolve_param_mapping(param, model)
351
352
353 def get_param_name(param):
354 # we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
355 if type(param.parent) == ParameterGroup:
356 if not hasattr(param.parent.parent, 'parent'):
357 return param.name
358 elif not hasattr(param.parent.parent.parent, 'parent'):
359 return param.name
360 else:
361 return get_param_name(param.parent) + ":" + param.name
362 else:
363 return param.name
364
365
366 # some parameters are mapped to command line options, this method helps resolve those mappings, if any
367 def resolve_param_mapping(param, model):
368 # go through all mappings and find if the given param appears as a reference name in a mapping element
369 param_mapping = None
370 for cli_element in model.cli:
371 for mapping_element in cli_element.mappings:
372 if mapping_element.reference_name == param.name:
373 if param_mapping is not None:
374 logger.warning("The parameter %s has more than one mapping in the <cli> section. "
375 "The first found mapping, %s, will be used." % (param.name, param_mapping), 1)
376 else:
377 param_mapping = cli_element.option_identifier
378
379 return param_mapping if param_mapping is not None else param.name
380
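The lookup performed by `resolve_param_mapping` can be illustrated without CTDopts. This is only a sketch under assumptions: the `Mapping` and `CliElement` tuples are hypothetical stand-ins that carry the same attribute names used above (`reference_name`, `option_identifier`, `mappings`), and the helper returns on the first match instead of logging a warning for duplicates.

```python
from collections import namedtuple

# hypothetical stand-ins for the <cli> model objects
Mapping = namedtuple("Mapping", "reference_name")
CliElement = namedtuple("CliElement", "option_identifier mappings")


def resolve_cli_name(param_name, cli_elements):
    """Return the option identifier of the first mapping that references param_name."""
    for cli_element in cli_elements:
        for mapping in cli_element.mappings:
            if mapping.reference_name == param_name:
                return cli_element.option_identifier
    # no mapping found: the parameter keeps its own name
    return param_name


cli = [
    CliElement("-i", [Mapping("input")]),
    CliElement("--in", [Mapping("input")]),   # duplicate mapping, ignored
    CliElement("-o", [Mapping("output")]),
]
print(resolve_cli_name("input", cli))
print(resolve_cli_name("threads", cli))
```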
381
382 def create_command(tool, model, **kwargs):
383 final_command = get_tool_executable_path(model, kwargs["default_executable_path"]) + '\n'
384 final_command += kwargs["add_to_command_line"] + '\n'
385 advanced_command_start = "#if $adv_opts.adv_opts_selector=='advanced':\n"
386 advanced_command_end = '#end if'
387 advanced_command = ''
388 parameter_hardcoder = kwargs["parameter_hardcoder"]
389
390 found_output_parameter = False
391 for param in extract_parameters(model):
392 if param.type is _OutFile:
393 found_output_parameter = True
394 command = ''
395 param_name = get_param_name(param)
396 param_cli_name = get_param_cli_name(param, model)
397 if param_name == param_cli_name:
398 # there was no mapping, so for the cli name we will use a '-' in the prefix
399 param_cli_name = '-' + param_name
400
401 if param.name in kwargs["blacklisted_parameters"]:
402 continue
403
404 hardcoded_value = parameter_hardcoder.get_hardcoded_value(param_name, model.name)
405 if hardcoded_value:
406 command += '%s %s\n' % (param_cli_name, hardcoded_value)
407 else:
408 # parameter is neither blacklisted nor hardcoded...
409 galaxy_parameter_name = get_galaxy_parameter_name(param)
410 repeat_galaxy_parameter_name = get_repeat_galaxy_parameter_name(param)
411
412 # logic for ITEMLISTs
413 if param.is_list:
414 if param.type is _InFile:
415 command += param_cli_name + "\n"
416 command += " #for token in $" + galaxy_parameter_name + ":\n"
417 command += " $token\n"
418 command += " #end for\n"
419 else:
420 command += "\n#if $" + repeat_galaxy_parameter_name + ":\n"
421 command += param_cli_name + "\n"
422 command += " #for token in $" + repeat_galaxy_parameter_name + ":\n"
423 command += " #if \" \" in str(token):\n"
424 command += " \"$token." + galaxy_parameter_name + "\"\n"
425 command += " #else\n"
426 command += " $token." + galaxy_parameter_name + "\n"
427 command += " #end if\n"
428 command += " #end for\n"
429 command += "#end if\n"
430 # logic for other ITEMs
431 else:
432 if param.advanced and param.type is not _OutFile:
433 actual_parameter = "$adv_opts.%s" % galaxy_parameter_name
434 else:
435 actual_parameter = "$%s" % galaxy_parameter_name
436 # TODO only useful for text fields, integers or floats
437 # not useful for choices, input fields ...
438
439 if not is_boolean_parameter(param) and type(param.restrictions) is _Choices:
440 command += "#if " + actual_parameter + ":\n"
441 command += ' %s\n' % param_cli_name
442 command += " #if \" \" in str(" + actual_parameter + "):\n"
443 command += " \"" + actual_parameter + "\"\n"
444 command += " #else\n"
445 command += " " + actual_parameter + "\n"
446 command += " #end if\n"
447 command += "#end if\n"
448 elif is_boolean_parameter(param):
449 command += "#if " + actual_parameter + ":\n"
450 command += ' %s\n' % param_cli_name
451 command += "#end if\n"
452 elif TYPE_TO_GALAXY_TYPE[param.type] == 'text':
453 command += "#if " + actual_parameter + ":\n"
454 command += " %s " % param_cli_name
455 command += " \"" + actual_parameter + "\"\n"
456 command += "#end if\n"
457 else:
458 command += "#if " + actual_parameter + ":\n"
459 command += ' %s ' % param_cli_name
460 command += actual_parameter + "\n"
461 command += "#end if\n"
462
463 if param.advanced and param.type is not _OutFile:
464 advanced_command += " %s" % command
465 else:
466 final_command += command
467
468 if advanced_command:
469 final_command += "%s%s%s\n" % (advanced_command_start, advanced_command, advanced_command_end)
470
471 if not found_output_parameter:
472 final_command += "> $param_stdout\n"
473
474 command_node = add_child_node(tool, "command")
475 command_node.text = final_command
476
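For orientation, the Cheetah fragment that `create_command` emits for a plain optional parameter can be reproduced in isolation. This is a minimal Python 3 sketch, not the converter's code: `optional_argument` and its `quote` flag are hypothetical names, and the indentation inside the fragment is chosen for readability only.

```python
def optional_argument(cli_name, galaxy_name, quote=False):
    """Build a Cheetah block that passes the option only when the Galaxy value is set."""
    variable = "$" + galaxy_name
    # text values are quoted so that values containing spaces stay a single argument
    value = '"%s"' % variable if quote else variable
    return "#if %s:\n  %s %s\n#end if\n" % (variable, cli_name, value)


print(optional_argument("-threads", "param_threads"))
print(optional_argument("-label", "param_label", quote=True))
```

Wrapping each option in `#if ... #end if` means that an unset optional parameter contributes nothing to the final command line.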
477
478 # creates the xml elements needed to import the needed macros files
479 # and to "expand" the macros
480 def expand_macros(tool, model, **kwargs):
481 macros_node = add_child_node(tool, "macros")
482 token_node = add_child_node(macros_node, "token")
483 token_node.attrib["name"] = "@EXECUTABLE@"
484 token_node.text = get_tool_executable_path(model, kwargs["default_executable_path"])
485
486 # add <import> nodes
487 for macro_file_name in kwargs["macros_file_names"]:
488 macro_file = open(macro_file_name)
489 import_node = add_child_node(macros_node, "import")
490 # do not add the path of the file, rather, just its basename
491 import_node.text = os.path.basename(macro_file.name)
492
493 # add <expand> nodes
494 for expand_macro in kwargs["macros_to_expand"]:
495 expand_node = add_child_node(tool, "expand")
496 expand_node.attrib["macro"] = expand_macro
497
498
499 def get_tool_executable_path(model, default_executable_path):
500 # rules to build the galaxy executable path:
501 # if executablePath is null, then use default_executable_path and store it in executablePath
502 # if executablePath is null and executableName is null, then the name of the tool will be used
503 # if executablePath is null and executableName is not null, then executableName will be used
504 # if executablePath is not null and executableName is null,
505 # then executablePath and the name of the tool will be used
506 # if executablePath is not null and executableName is not null, then both will be used
507
508 # first, check if the model has executablePath / executableName defined
509 executable_path = model.opt_attribs.get("executablePath", None)
510 executable_name = model.opt_attribs.get("executableName", None)
511
512 # check if we need to use the default_executable_path
513 if executable_path is None:
514 executable_path = default_executable_path
515
516 # fix the executablePath to make sure that there is a '/' at the end
517 if executable_path is not None:
518 executable_path = executable_path.strip()
519 if not executable_path.endswith('/'):
520 executable_path += '/'
521
522 # assume that we have all information present
523 command = str(executable_path) + str(executable_name)
524 if executable_path is None:
525 if executable_name is None:
526 command = model.name
527 else:
528 command = executable_name
529 else:
530 if executable_name is None:
531 command = executable_path + model.name
532 return command
533
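The rules listed in the comments of `get_tool_executable_path` reduce to two independent fallbacks, one for the path and one for the name. The sketch below restates them as a standalone Python 3 function; `resolve_executable` and its keyword arguments are hypothetical and take plain strings instead of a CTD model.

```python
def resolve_executable(tool_name, executable_path=None, executable_name=None, default_path=None):
    """Combine path and name, falling back to the default path and the tool name."""
    path = executable_path if executable_path is not None else default_path
    name = executable_name if executable_name is not None else tool_name
    if path is None:
        # no path information at all: rely on the executable being on the PATH
        return name
    path = path.strip()
    if not path.endswith("/"):
        path += "/"
    return path + name


print(resolve_executable("FileInfo"))
print(resolve_executable("FileInfo", default_path="/opt/openms/bin"))
print(resolve_executable("FileInfo", executable_path=" /usr/bin/ ", executable_name="fileinfo"))
```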
534
535 def get_galaxy_parameter_name(param):
536 return "param_%s" % get_param_name(param).replace(':', '_').replace('-', '_')
537
538
539 def get_input_with_same_restrictions(out_param, model, supported_file_formats):
540 for param in extract_parameters(model):
541 if param.type is _InFile:
542 if param.restrictions is not None:
543 in_param_formats = get_supported_file_types(param.restrictions.formats, supported_file_formats)
544 out_param_formats = get_supported_file_types(out_param.restrictions.formats, supported_file_formats)
545 if in_param_formats == out_param_formats:
546 return param
547
548
549 def create_inputs(tool, model, **kwargs):
550 inputs_node = SubElement(tool, "inputs")
551
552 # some suites (such as OpenMS) need some advanced options when handling inputs
553 expand_advanced_node = add_child_node(tool, "expand", OrderedDict([("macro", ADVANCED_OPTIONS_MACRO_NAME)]))
554 parameter_hardcoder = kwargs["parameter_hardcoder"]
555
556 # treat all non output-file parameters as inputs
557 for param in extract_parameters(model):
558 # no need to show hardcoded parameters
559 hardcoded_value = parameter_hardcoder.get_hardcoded_value(param.name, model.name)
560 if param.name in kwargs["blacklisted_parameters"] or hardcoded_value:
561 # let's not use an extra level of indentation and use NOP
562 continue
563 if param.type is not _OutFile:
564 if param.advanced:
565 if expand_advanced_node is not None:
566 parent_node = expand_advanced_node
567 else:
568 # something went wrong... we are handling an advanced parameter and the
569 # advanced input macro was not set... inform the user about it
570 logger.info("The parameter %s has been set as advanced, but advanced_input_macro has "
571 "not been set." % param.name, 1)
572 # there is not much we can do, other than use the inputs_node as a parent node!
573 parent_node = inputs_node
574 else:
575 parent_node = inputs_node
576
577 # for lists we need a repeat tag
578 if param.is_list and param.type is not _InFile:
579 rep_node = add_child_node(parent_node, "repeat")
580 create_repeat_attribute_list(rep_node, param)
581 parent_node = rep_node
582
583 param_node = add_child_node(parent_node, "param")
584 create_param_attribute_list(param_node, param, kwargs["supported_file_formats"])
585
586 # advanced parameter selection should be at the end
587 # and only available if an advanced parameter exists
588 if expand_advanced_node is not None and len(expand_advanced_node) > 0:
589 inputs_node.append(expand_advanced_node)
590
591
592 def get_repeat_galaxy_parameter_name(param):
593 return "rep_" + get_galaxy_parameter_name(param)
594
595
596 def create_repeat_attribute_list(rep_node, param):
597 rep_node.attrib["name"] = get_repeat_galaxy_parameter_name(param)
598 if param.required:
599 rep_node.attrib["min"] = "1"
600 else:
601 rep_node.attrib["min"] = "0"
602 # for the ITEMLISTs which have LISTITEM children we only
603 # need one parameter as it is given as a string
604 if param.default is not None:
605 rep_node.attrib["max"] = "1"
606 rep_node.attrib["title"] = get_galaxy_parameter_name(param)
607
608
609 def create_param_attribute_list(param_node, param, supported_file_formats):
610 param_node.attrib["name"] = get_galaxy_parameter_name(param)
611
612 param_type = TYPE_TO_GALAXY_TYPE.get(param.type)
613 if param_type is None:
614 raise ModelError("Unrecognized parameter type %(type)s for parameter %(name)s"
615 % {"type": param.type, "name": param.name})
616
617 if param.is_list:
618 param_type = "text"
619
620 if is_selection_parameter(param):
621 param_type = "select"
622 if len(param.restrictions.choices) < 5:
623 param_node.attrib["display"] = "radio"
624
625 if is_boolean_parameter(param):
626 param_type = "boolean"
627
628 if param.type is _InFile:
629 # assume it's just text unless restrictions are provided
630 param_format = "txt"
631 if param.restrictions is not None:
632 # join all formats of the file, take mapping from supported_file if available for an entry
633 if type(param.restrictions) is _FileFormat:
634 param_format = ','.join([get_supported_file_type(i, supported_file_formats) if
635 get_supported_file_type(i, supported_file_formats)
636 else i for i in param.restrictions.formats])
637 else:
638 raise InvalidModelException("Expected 'file type' restrictions for input file [%(name)s], "
639 "but instead got [%(type)s]"
640 % {"name": param.name, "type": type(param.restrictions)})
641
642 param_node.attrib["type"] = "data"
643 param_node.attrib["format"] = param_format
644 # in the case of multiple input set multiple flag
645 if param.is_list:
646 param_node.attrib["multiple"] = "true"
647
648 else:
649 param_node.attrib["type"] = param_type
650
651 # check for parameters with restricted values (which will correspond to a "select" in galaxy)
652 if param.restrictions is not None:
653 # it could be either _Choices or _NumericRange, with special case for boolean types
654 if param_type == "boolean":
655 create_boolean_parameter(param_node, param)
656 elif type(param.restrictions) is _Choices:
657 # create as many <option> elements as restriction values
658 for choice in param.restrictions.choices:
659 option_node = add_child_node(param_node, "option", OrderedDict([("value", str(choice))]))
660 option_node.text = str(choice)
661
662 # preselect the default value
663 if param.default == choice:
664 option_node.attrib["selected"] = "true"
665
666 elif type(param.restrictions) is _NumericRange:
667 if param.type is not int and param.type is not float:
668 raise InvalidModelException("Expected either 'int' or 'float' in the numeric range restriction for "
669 "parameter [%(name)s], but instead got [%(type)s]" %
670 {"name": param.name, "type": param.type})
671 # extract the min and max values and add them as attributes
672 # validate the provided min and max values
673 if param.restrictions.n_min is not None:
674 param_node.attrib["min"] = str(param.restrictions.n_min)
675 if param.restrictions.n_max is not None:
676 param_node.attrib["max"] = str(param.restrictions.n_max)
677 elif type(param.restrictions) is _FileFormat:
678 param_node.attrib["format"] = ','.join([get_supported_file_type(i, supported_file_formats) if
679 get_supported_file_type(i, supported_file_formats)
680 else i for i in param.restrictions.formats])
681 else:
682 raise InvalidModelException("Unrecognized restriction type [%(type)s] for parameter [%(name)s]"
683 % {"type": type(param.restrictions), "name": param.name})
684
685 if param_type == "select" and param.default in param.restrictions.choices:
686 param_node.attrib["optional"] = "False"
687 else:
688 param_node.attrib["optional"] = str(not param.required)
689
690 if param_type == "text":
691 # add size attribute... this is the length of a textbox field in Galaxy (it could also be 15x2, for instance)
692 param_node.attrib["size"] = "30"
693 # add sanitizer nodes, this is needed for special character like "["
694 # which are used for example by FeatureFinderMultiplex
695 sanitizer_node = SubElement(param_node, "sanitizer")
696
697 valid_node = SubElement(sanitizer_node, "valid", OrderedDict([("initial", "string.printable")]))
698 add_child_node(valid_node, "remove", OrderedDict([("value", '\'')]))
699 add_child_node(valid_node, "remove", OrderedDict([("value", '"')]))
700
701 # check for default value
702 if param.default is not None and param.default is not _Null:
703 if type(param.default) is list:
704 # we ASSUME that a list of parameters looks like:
705 # $ tool -ignore He Ar Xe
706 # meaning, that, for example, Helium, Argon and Xenon will be ignored
707 param_node.attrib["value"] = ' '.join(map(str, param.default))
708
709 elif param_type != "boolean":
710 param_node.attrib["value"] = str(param.default)
711
712 else:
713 # simple boolean with a default
714 if param.default is True:
715 param_node.attrib["checked"] = "true"
716 else:
717 if param.type is int or param.type is float:
718 # galaxy requires "value" to be included for int/float
719 # since no default was included, we need to figure out one in a clever way... but let the user know
720 # that we are "thinking" for him/her
721 logger.warning("Generating default value for parameter [%s]. "
722 "Galaxy requires the attribute 'value' to be set for integer/floats. "
723 "Edit the CTD file and provide a suitable default value." % param.name, 1)
724 # check if there's a min/max and try to use them
725 default_value = None
726 if param.restrictions is not None:
727 if type(param.restrictions) is _NumericRange:
728 default_value = param.restrictions.n_min
729 if default_value is None:
730 default_value = param.restrictions.n_max
731 if default_value is None:
732 # no min/max provided... just use 0 and see what happens
733 default_value = 0
734 else:
735 # should never be here, since we have validated this anyway...
736 # this code is here just for documentation purposes
737 # however, better safe than sorry!
738 # (it could be that the code changes and then we have an ugly scenario)
739 raise InvalidModelException("Expected a numeric range for parameter [%(name)s], "
740 "but instead got [%(type)s]"
741 % {"name": param.name, "type": type(param.restrictions)})
742 else:
743 # no restrictions and no default value provided...
744 # make up something
745 default_value = 0
746 param_node.attrib["value"] = str(default_value)
747
748 label = "%s parameter" % param.name
749 help_text = ""
750
751 if param.description is not None:
752 label, help_text = generate_label_and_help(param.description)
753
754 param_node.attrib["label"] = label
755 param_node.attrib["help"] = "(-%s)" % param.name + " " + help_text
756
757
758 def generate_label_and_help(desc):
759 help_text = ""
760 # This tag is found in some descriptions
761 if not isinstance(desc, basestring):
762 desc = str(desc)
763 desc = desc.encode("utf8").replace("#br#", " <br>")
764 # Get rid of dots at the end
765 if desc.endswith("."):
766 desc = desc.rstrip(".")
767 # Check if first word is a normal word and make it uppercase
768 if str(desc).find(" ") > -1:
769 first_word, rest = str(desc).split(" ", 1)
770 if str(first_word).islower():
771 # check if label has a quotient of the form a/b
772 if first_word.find("/") != 1:
773 first_word = first_word.capitalize()
774 desc = first_word + " " + rest
775 label = desc.decode("utf8")
776
777 # Try to split the label if it is too long
778 if len(desc) > 50:
779 # find an example and put everything before in the label and the e.g. in the help
780 if desc.find("e.g.") > 1:
781 label, help_text = desc.split("e.g.", 1)
782 help_text = "e.g." + help_text
783 else:
784 # find the end of the first sentence
785 # look for ". " because some labels contain .file or something similar
786 delimiter = ""
787 if desc.find(". ") > 1 and desc.find("? ") > 1:
788 if desc.find(". ") < desc.find("? "):
789 delimiter = ". "
790 else:
791 delimiter = "? "
792 elif desc.find(". ") > 1:
793 delimiter = ". "
794 elif desc.find("? ") > 1:
795 delimiter = "? "
796 if delimiter != "":
797 label, help_text = desc.split(delimiter, 1)
798
799 # add the question mark back
800 if delimiter == "? ":
801 label += "? "
802
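803 # remove trailing line breaks (rstrip('<br>') would strip any trailing '<', 'b', 'r' or '>' character)
804 while label.rstrip().endswith('<br>'): label = label.rstrip()[:-len('<br>')]
805 return label.rstrip(), help_text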
806
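A note on trimming a trailing `<br>` marker from a label: `str.rstrip` treats its argument as a set of characters, not as a suffix, so it can also remove trailing letters that belong to the text. The sketch below shows the difference; `strip_trailing_marker` is a hypothetical helper written for Python 3.

```python
def strip_trailing_marker(text, marker="<br>"):
    """Remove whole trailing occurrences of marker together with surrounding whitespace."""
    text = text.rstrip()
    while text.endswith(marker):
        text = text[:-len(marker)].rstrip()
    return text


# character-set semantics: the trailing 'r' of the word is removed as well
print("Number".rstrip("<br>"))
# suffix semantics: only complete markers are removed
print(strip_trailing_marker("Number <br> <br>"))
```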
807
808 # determines if the given choices are boolean (basically, if the possible values are yes/no, true/false)
809 def is_boolean_parameter(param):
810 # detect boolean selects of OpenMS
811 if is_selection_parameter(param):
812 if len(param.restrictions.choices) == 2:
813 # check that default value is false to make sure it is an actual flag
814 if "false" in param.restrictions.choices and \
815 "true" in param.restrictions.choices and \
816 param.default == "false":
817 return True
818 else:
819 return param.type is bool
820
821
822 # determines if there are choices for the parameter
823 def is_selection_parameter(param):
824 return type(param.restrictions) is _Choices
825
826
827 def get_lowercase_list(some_list):
828 lowercase_list = map(str, some_list)
829 lowercase_list = map(string.lower, lowercase_list)
830 lowercase_list = map(strip, lowercase_list)
831 return lowercase_list
832
833
834 # creates a galaxy boolean parameter type
835 # this method assumes that param has restrictions, and that only two restrictions are present
836 # (either yes/no or true/false)
837 def create_boolean_parameter(param_node, param):
838 # first, determine the 'truevalue' and the 'falsevalue'
839 """TODO: true and false values can be way more than 'true' and 'false'
840 but for that we need CTD support
841 """
842 # by default, 'true' and 'false' are handled as flags, like the verbose flag (i.e., -v)
843 true_value = "-%s" % get_param_name(param)
844 false_value = ""
845 choices = get_lowercase_list(param.restrictions.choices)
846 if "yes" in choices:
847 true_value = "yes"
848 false_value = "no"
849 param_node.attrib["truevalue"] = true_value
850 param_node.attrib["falsevalue"] = false_value
851
852 # set the checked attribute
853 if param.default is not None:
854 checked_value = "false"
855 default = strip(string.lower(param.default))
856 if default == "yes" or default == "true":
857 checked_value = "true"
858 param_node.attrib["checked"] = checked_value
859
860
861 def create_outputs(parent, model, **kwargs):
862 outputs_node = add_child_node(parent, "outputs")
863 parameter_hardcoder = kwargs["parameter_hardcoder"]
864
865 for param in extract_parameters(model):
866
867 # no need to show hardcoded parameters
868 hardcoded_value = parameter_hardcoder.get_hardcoded_value(param.name, model.name)
869 if param.name in kwargs["blacklisted_parameters"] or hardcoded_value:
870 # let's not use an extra level of indentation and use NOP
871 continue
872 if param.type is _OutFile:
873 create_output_node(outputs_node, param, model, kwargs["supported_file_formats"])
874
875 # If there are no outputs defined in the ctd the node will have no children
876 # and the stdout will be used as output
877 if len(outputs_node) == 0:
878 add_child_node(outputs_node, "data",
879 OrderedDict([("name", "param_stdout"), ("format", "txt"), ("label", "Output from stdout")]))
880
881
882 def create_output_node(parent, param, model, supported_file_formats):
883 data_node = add_child_node(parent, "data")
884 data_node.attrib["name"] = get_galaxy_parameter_name(param)
885
886 data_format = "data"
887 if param.restrictions is not None:
888 if type(param.restrictions) is _FileFormat:
889 # set the first data output node to the first file format
890
891 # check if there are formats that have not been registered yet...
892 output = list()
893 for format_name in param.restrictions.formats:
894 if format_name not in supported_file_formats:
895 output.append(str(format_name))
896
897 # warn only if there is something to complain about
898 if output:
899 logger.warning("Parameter " + param.name + " has the following unsupported format(s):"
900 + ','.join(output), 1)
901 data_format = ','.join(output)
902
903 formats = get_supported_file_types(param.restrictions.formats, supported_file_formats)
904 try:
905 data_format = formats.pop()
906 except KeyError:
907 # there is not much we can do, other than catching the exception
908 pass
909 # if there is more than one output file format, try to take the format from the input parameter
910 if formats:
911 corresponding_input = get_input_with_same_restrictions(param, model, supported_file_formats)
912 if corresponding_input is not None:
913 data_format = "input"
914 data_node.attrib["metadata_source"] = get_galaxy_parameter_name(corresponding_input)
915 else:
916 raise InvalidModelException("Unrecognized restriction type [%(type)s] "
917 "for output [%(name)s]" % {"type": type(param.restrictions),
918 "name": param.name})
919 data_node.attrib["format"] = data_format
920
921 # TODO: find a smarter label ?
922 return data_node
923
924
925 # Get the supported file format for one given format
926 def get_supported_file_type(format_name, supported_file_formats):
927 if format_name in supported_file_formats.keys():
928 return supported_file_formats.get(format_name, DataType(format_name, format_name)).galaxy_extension
929 else:
930 return None
931
932
933 def get_supported_file_types(formats, supported_file_formats):
934 return set([supported_file_formats.get(format_name, DataType(format_name, format_name)).galaxy_extension
935 for format_name in formats if format_name in supported_file_formats.keys()])
936
937
938 def create_change_format_node(parent, data_formats, input_ref):
939 # <change_format>
940 # <when input="secondary_structure" value="true" format="txt"/>
941 # </change_format>
942 change_format_node = add_child_node(parent, "change_format")
943 for data_format in data_formats:
944 add_child_node(change_format_node, "when",
945 OrderedDict([("input", input_ref), ("value", data_format), ("format", data_format)]))
946
947
948 # Builds the <help> node from the 'manual' and 'docurl' attributes of the CTD, if present.
949 def create_help(tool, model):
950 manual = ''
951 doc_url = None
952 if 'manual' in model.opt_attribs.keys():
953 manual += '%s\n\n' % model.opt_attribs["manual"]
954 if 'docurl' in model.opt_attribs.keys():
955 doc_url = model.opt_attribs["docurl"]
956
957 help_text = "No help available"
958 if manual:
959 help_text = manual
960 if doc_url is not None:
961 help_text = manual + "\nFor more information, visit %s" % doc_url
962 help_node = add_child_node(tool, "help")
963 # TODO: do we need CDATA Section here?
964 help_node.text = help_text
965
966
967 # since a model might contain several ParameterGroup elements,
968 # we want to simply 'flatten' the parameters to generate the Galaxy wrapper
969 def extract_parameters(model):
970 parameters = []
971 if len(model.parameters.parameters) > 0:
972 # use this to put parameters that are to be processed
973 # we know that CTDModel has one parent ParameterGroup
974 pending = [model.parameters]
975 while len(pending) > 0:
976 # take one element from 'pending'
977 parameter = pending.pop()
978 if type(parameter) is not ParameterGroup:
979 parameters.append(parameter)
980 else:
981 # append the first-level children of this ParameterGroup
982 pending.extend(parameter.parameters.values())
983 # return the reversed list of parameters (as it is now,
984 # we have the last parameter in the CTD as first in the list)
985 return reversed(parameters)
986
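The stack-based traversal in `extract_parameters` can be checked on a toy tree. This is a sketch under assumptions: `Group` is a hypothetical stand-in for `ParameterGroup`, and leaves are plain strings instead of CTDopts parameters. Popping from the end of the stack visits the tree from the last parameter to the first, which is why the collected list is reversed at the end.

```python
class Group(object):
    """Hypothetical stand-in for a ParameterGroup: a name plus ordered children."""
    def __init__(self, name, children):
        self.name = name
        self.children = children


def flatten(root):
    leaves = []
    pending = [root]
    while pending:
        node = pending.pop()
        if isinstance(node, Group):
            # push the first-level children; nested groups are expanded when popped
            pending.extend(node.children)
        else:
            leaves.append(node)
    # leaves were collected last-to-first, so restore document order
    leaves.reverse()
    return leaves


tree = Group("root", ["a", Group("g", ["b", "c"]), "d"])
print(flatten(tree))
```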
987
988 # adds and returns a child node using the given name to the given parent node
989 def add_child_node(parent_node, child_node_name, attributes=OrderedDict([])):
990 child_node = SubElement(parent_node, child_node_name, attributes)
991 return child_node
+0
-2
galaxy/dist/conda/bld.bat
0 "%PYTHON%" setup.py install
1 if errorlevel 1 exit 1
+0
-1
galaxy/dist/conda/build.sh
0 $PYTHON setup.py install
+0
-28
galaxy/dist/conda/meta.yaml
0 package:
1 name: ctd2galaxy
2 version: "1.0"
3
4 source:
5 git_rev: v1.0
6 git_url: https://github.com/WorkflowConversion/CTD2Galaxy.git
7
8 build:
9 noarch_python: True
10
11 requirements:
12 build:
13 - python
14 - setuptools
15
16 run:
17 - python
18 - lxml
19 - ctdopts 1.0
20
21 test:
22 imports:
23 - CTDopts.CTDopts
24
25 about:
26 home: https://github.com/WorkflowConversion/CTD2Galaxy
27 license_file: LICENSE
+0
-1389
galaxy/generator.py
0 #!/usr/bin/env python
1 # encoding: utf-8
2
3 """
4 @author: delagarza
5 """
6
7
8 import sys
9 import os
10 import traceback
11 import ntpath
12 import string
13
14 from argparse import ArgumentParser
15 from argparse import RawDescriptionHelpFormatter
16 from collections import OrderedDict
17 from string import strip
18 from lxml import etree
19 from lxml.etree import SubElement, Element, ElementTree, ParseError, parse
20
21 from CTDopts.CTDopts import CTDModel, _InFile, _OutFile, ParameterGroup, _Choices, _NumericRange, _FileFormat, \
22 ModelError, _Null
23
24 __all__ = []
25 __version__ = 1.0
26 __date__ = '2014-09-17'
27 __updated__ = '2016-05-09'
28
29 MESSAGE_INDENTATION_INCREMENT = 2
30
31 TYPE_TO_GALAXY_TYPE = {int: 'integer', float: 'float', str: 'text', bool: 'boolean', _InFile: 'data',
32 _OutFile: 'data', _Choices: 'select'}
33
34 STDIO_MACRO_NAME = "stdio"
35 REQUIREMENTS_MACRO_NAME = "requirements"
36 ADVANCED_OPTIONS_MACRO_NAME = "advanced_options"
37
38 REQUIRED_MACROS = [STDIO_MACRO_NAME, REQUIREMENTS_MACRO_NAME, ADVANCED_OPTIONS_MACRO_NAME]
39
40
41 class CLIError(Exception):
42 # Generic exception to raise and log different fatal errors.
43 def __init__(self, msg):
44 super(CLIError, self).__init__(msg)
45 self.msg = "E: %s" % msg
46
47 def __str__(self):
48 return self.msg
49
50 def __unicode__(self):
51 return self.msg
52
53
54 class InvalidModelException(ModelError):
55 def __init__(self, message):
56 super(InvalidModelException, self).__init__()
57 self.message = message
58
59 def __str__(self):
60 return self.message
61
62 def __repr__(self):
63 return self.message
64
65
66 class ApplicationException(Exception):
67 def __init__(self, msg):
68 super(ApplicationException, self).__init__(msg)
69 self.msg = msg
70
71 def __str__(self):
72 return self.msg
73
74 def __unicode__(self):
75 return self.msg
76
77
78 class ExitCode:
79 def __init__(self, code_range="", level="", description=None):
80 self.range = code_range
81 self.level = level
82 self.description = description
83
84
85 class DataType:
86 def __init__(self, extension, galaxy_extension=None, galaxy_type=None, mimetype=None):
87 self.extension = extension
88 self.galaxy_extension = galaxy_extension
89 self.galaxy_type = galaxy_type
90 self.mimetype = mimetype
91
92
93 class ParameterHardcoder:
94 def __init__(self):
95 # map whose keys are the composite names of parameters and tools in the following pattern:
96 # [ParameterName][separator][ToolName] -> HardcodedValue
97 # if the parameter applies to all tools, then the following pattern is used:
98 # [ParameterName] -> HardcodedValue
99
100 # examples (the separator is '!'):
101 # threads -> 24
102 # adapter!XtandemAdapter -> xtandem.exe
103 # adapter -> adapter.exe
104 self.separator = "!"
105 self.parameter_map = {}
106
107 # the most specific value will be returned in case of overlap
108 def get_hardcoded_value(self, parameter_name, tool_name):
109 # look for the value that would apply for all tools
110 generic_value = self.parameter_map.get(parameter_name, None)
111 specific_value = self.parameter_map.get(self.build_key(parameter_name, tool_name), None)
112 if specific_value is not None:
113 return specific_value
114
115 return generic_value
116
117 def register_parameter(self, parameter_name, parameter_value, tool_name=None):
118 self.parameter_map[self.build_key(parameter_name, tool_name)] = parameter_value
119
120 def build_key(self, parameter_name, tool_name):
121 if tool_name is None:
122 return parameter_name
123 return "%s%s%s" % (parameter_name, self.separator, tool_name)
124
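The lookup rule used by `ParameterHardcoder` (a tool-specific entry wins over a generic one) can be illustrated with a minimal, self-contained sketch. `HardcoderSketch` is a hypothetical stand-in written for this example; it mirrors the method names in the listing but is not the class itself.

```python
# Hypothetical stand-in that mirrors ParameterHardcoder from the listing.
class HardcoderSketch(object):
    def __init__(self, separator="!"):
        self.separator = separator
        self.parameter_map = {}

    def build_key(self, parameter_name, tool_name=None):
        # generic entries are keyed by the parameter name alone
        if tool_name is None:
            return parameter_name
        return "%s%s%s" % (parameter_name, self.separator, tool_name)

    def register_parameter(self, parameter_name, value, tool_name=None):
        self.parameter_map[self.build_key(parameter_name, tool_name)] = value

    def get_hardcoded_value(self, parameter_name, tool_name):
        # the most specific (tool-bound) value is preferred
        specific = self.parameter_map.get(self.build_key(parameter_name, tool_name))
        if specific is not None:
            return specific
        return self.parameter_map.get(parameter_name)


hardcoder = HardcoderSketch()
hardcoder.register_parameter("threads", "24")
hardcoder.register_parameter("threads", "8", "XTandemAdapter")

generic_lookup = hardcoder.get_hardcoded_value("threads", "SomeOtherTool")
specific_lookup = hardcoder.get_hardcoded_value("threads", "XTandemAdapter")
```

Here `SomeOtherTool` falls back to the generic value, while `XTandemAdapter` receives its own override.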
125
126 def main(argv=None): # IGNORE:C0111
127 # Command line options.
128 if argv is None:
129 argv = sys.argv
130 else:
131 sys.argv.extend(argv)
132
133 program_version = "v%s" % __version__
134 program_build_date = str(__updated__)
135 program_version_message = '%%(prog)s %s (%s)' % (program_version, program_build_date)
136 program_short_description = "CTD2Galaxy - A project from the GenericWorkflowNodes family " \
137 "(https://github.com/orgs/genericworkflownodes)"
138 program_usage = '''
139 USAGE:
140
141 I - Parsing a single CTD file and generating a Galaxy wrapper:
142
143 $ python generator.py -i input.ctd -o output.xml
144
145
146 II - Parsing all CTD files found in a given folder (files with a .ctd or .xml extension) and
147 writing the converted Galaxy wrappers to a given folder:
148
149 $ python generator.py -i /home/user/*.ctd -o /home/user/galaxywrappers
150
151
152 III - Providing file formats, mimetypes
153
154 Galaxy supports the concept of file format in order to connect compatible ports, that is, input ports of a certain
155 data format will be able to receive data from a port of the same format. This converter allows you to provide
156 a personalized file in which you can relate the CTD data formats with supported Galaxy data formats. The layout of
157 this file consists of lines, each of either one, three or four columns separated by any amount of whitespace. The content
158 of each column is as follows:
159
160 * 1st column: file extension
161 * 2nd column: data type, as listed in Galaxy
162 * 3rd column: full-named Galaxy data type, as it will appear on datatypes_conf.xml
163 * 4th column: mimetype (optional)
164
165 The following is an example of a valid "file formats" file:
166
167 ########################################## FILE FORMATS example ##########################################
168 # Every line starting with a # will be handled as a comment and will not be parsed.
169 # The first column is the file format as given in the CTD and the second column is the Galaxy data format.
170 # The second, third and fourth columns can be left empty if the data type has already been registered
171 # in Galaxy; otherwise, all but the mimetype must be provided.
172
173 # CTD type # Galaxy type # Long Galaxy data type # Mimetype
174 csv tabular galaxy.datatypes.data:Text
175 fasta
176 ini txt galaxy.datatypes.data:Text
177 txt
178 idxml txt galaxy.datatypes.xml:GenericXml application/xml
179 options txt galaxy.datatypes.data:Text
180 grid grid galaxy.datatypes.data:Grid
181
182 ##########################################################################################################
183
184 Note that each line consists precisely of either one, three or four columns. In the case of data types already
185 registered in Galaxy (such as fasta and txt in the above example), only the first column is needed. In the case of
186 data types that haven't been yet registered in Galaxy, the first three columns are needed (mimetype is optional).
187
188 For information about Galaxy data types and subclasses, see the following page:
189 https://wiki.galaxyproject.org/Admin/Datatypes/Adding%20Datatypes
190
191
192 IV - Hardcoding parameters
193
194 It is possible to hardcode parameters. This makes sense if you want to set a tool in Galaxy in 'quiet' mode or if
195 your tools support multi-threading and accept the number of threads via a parameter, without giving the end user the
196 chance to change the values for these parameters.
197
198 In order to generate hardcoded parameters, you need to provide a simple file. Each line of this file contains two
199 or three columns separated by whitespace. Any line starting with a '#' will be ignored. The first column contains
200 the name of the parameter, the second column contains the value that will always be set for this parameter. The
201 first two columns are mandatory.
202
203 If the parameter is to be hardcoded only for a set of tools, then a third column can be added. This column includes
204 a comma-separated list of tool names for which the parameter will be hardcoded. If a third column is not included,
205 then all processed tools containing the given parameter will get a hardcoded value for it.
206
207 The following is an example of a valid file:
208
209 ##################################### HARDCODED PARAMETERS example #####################################
210 # Every line starting with a # will be handled as a comment and will not be parsed.
211 # The first column is the name of the parameter and the second column is the value that will be used.
212
213 # Parameter name # Value # Tool(s)
214 threads \${GALAXY_SLOTS:-24}
215 mode quiet
216 xtandem_executable xtandem XTandemAdapter
217 verbosity high Foo, Bar
218
219 #########################################################################################################
220
221 Using the above file will produce, for all tools, a <command> similar to:
222
223 [tool_name] ... -threads \${GALAXY_SLOTS:-24} -mode quiet ...
224
225 For XTandemAdapter, the <command> will be similar to:
226
227 XTandemAdapter ... -threads \${GALAXY_SLOTS:-24} -mode quiet -xtandem_executable xtandem ...
228
229 And for tools Foo and Bar, the <command> will be similar to:
230
231 Foo ... -threads \${GALAXY_SLOTS:-24} -mode quiet -verbosity high ...
232
233
234 V - Control which tools will be converted
235
236 Sometimes only a subset of CTDs needs to be converted. It is possible to either explicitly specify which tools will
237 be converted or which tools will not be converted.
238
239 The value of the -s/--skip-tools parameter is a file in which each line will be interpreted as the name of a tool
240 that will not be converted. Conversely, the value of the -r/--required-tools parameter is a file in which each line will be
241 interpreted as a tool that is required. Only one of these parameters can be specified at a given time.
242
243 The format of both files is exactly the same. As stated before, each line will be interpreted as the name of a tool;
244 any line starting with a '#' will be ignored.
245
246 '''
247 program_license = '''%(short_description)s
248 Copyright 2015, Luis de la Garza
249
250 Licensed under the Apache License, Version 2.0 (the "License");
251 you may not use this file except in compliance with the License.
252 You may obtain a copy of the License at
253
254 http://www.apache.org/licenses/LICENSE-2.0
255
256 Unless required by applicable law or agreed to in writing, software
257 distributed under the License is distributed on an "AS IS" BASIS,
258 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
259 See the License for the specific language governing permissions and
260 limitations under the License.
261
262 %(usage)s
263 ''' % {'short_description': program_short_description, 'usage': program_usage}
264
265 try:
266 # Setup argument parser
267 parser = ArgumentParser(prog="CTD2Galaxy", description=program_license,
268 formatter_class=RawDescriptionHelpFormatter, add_help=True)
269 parser.add_argument("-i", "--input", dest="input_files", default=[], required=True, nargs="+", action="append",
270 help="List of CTD files to convert.")
271 parser.add_argument("-o", "--output-destination", dest="output_destination", required=True,
272 help="If multiple input files are given, then a folder in which all generated "
273 "XMLs will be written is expected; "
274 "if a single input file is given, then a destination file is expected.")
275 parser.add_argument("-f", "--formats-file", dest="formats_file",
276 help="File containing the supported file formats. Run with '-h' or '--help' to see a "
277 "brief example on the layout of this file.", default=None, required=False)
278 parser.add_argument("-a", "--add-to-command-line", dest="add_to_command_line",
279 help="Adds content to the command line", default="", required=False)
280 parser.add_argument("-d", "--datatypes-destination", dest="data_types_destination",
281 help="Specify the location of a datatypes_conf.xml to modify and add the registered "
282 "data types. If the provided destination does not exist, a new file will be created.",
283 default=None, required=False)
284 parser.add_argument("-x", "--default-executable-path", dest="default_executable_path",
285 help="Use this executable path when <executablePath> is not present in the CTD",
286 default=None, required=False)
287 parser.add_argument("-b", "--blacklist-parameters", dest="blacklisted_parameters", default=[], nargs="+", action="append",
288 help="List of parameters that will be ignored and won't appear on the galaxy stub",
289 required=False)
290 parser.add_argument("-c", "--default-category", dest="default_category", default="DEFAULT", required=False,
291 help="Default category to use for tools lacking a category when generating tool_conf.xml")
292 parser.add_argument("-t", "--tool-conf-destination", dest="tool_conf_destination", default=None, required=False,
293 help="Specify the location of an existing tool_conf.xml that will be modified to include "
294 "the converted tools. If the provided destination does not exist, a new file will "
295 "be created.")
296 parser.add_argument("-g", "--galaxy-tool-path", dest="galaxy_tool_path", default=None, required=False,
297 help="The path that will be prepended to the file names when generating tool_conf.xml")
298 parser.add_argument("-r", "--required-tools", dest="required_tools_file", default=None, required=False,
299 help="Each line of the file will be interpreted as a tool name that needs translation. "
300 "Run with '-h' or '--help' to see a brief example on the format of this file.")
301 parser.add_argument("-s", "--skip-tools", dest="skip_tools_file", default=None, required=False,
302 help="File containing a list of tools for which a Galaxy stub will not be generated. "
303 "Run with '-h' or '--help' to see a brief example on the format of this file.")
304 parser.add_argument("-m", "--macros", dest="macros_files", default=[], nargs="*",
305 action="append", required=False, help="Import the additional given file(s) as macros. "
306 "The macros stdio, requirements and advanced_options are required. Please see "
307 "macros.xml for an example of a valid macros file. All defined macros will be imported.")
308 parser.add_argument("-p", "--hardcoded-parameters", dest="hardcoded_parameters", default=None, required=False,
309 help="File containing hardcoded values for the given parameters. Run with '-h' or '--help' "
310 "to see a brief example on the format of this file.")
311 parser.add_argument("-v", "--validation-schema", dest="xsd_location", default=None, required=False,
312 help="Location of the schema to use to validate CTDs.")
313
314 # TODO: add verbosity, maybe?
315 parser.add_argument("-V", "--version", action='version', version=program_version_message)
316
317 # Process arguments
318 args = parser.parse_args()
319
320 # validate and prepare the passed arguments
321 validate_and_prepare_args(args)
322
323 # extract the names of the macros and check that we have found the ones we need
324 macros_to_expand = parse_macros_files(args.macros_files)
325
326 # parse the given supported file-formats file
327 supported_file_formats = parse_file_formats(args.formats_file)
328
329 # parse the hardcoded parameters file
330 parameter_hardcoder = parse_hardcoded_parameters(args.hardcoded_parameters)
331
332 # parse the skip/required tools files
333 skip_tools = parse_tools_list_file(args.skip_tools_file)
334 required_tools = parse_tools_list_file(args.required_tools_file)
335
336 #if verbose > 0:
337 # print("Verbose mode on")
338 parsed_models = convert(args.input_files,
339 args.output_destination,
340 supported_file_formats=supported_file_formats,
341 default_executable_path=args.default_executable_path,
342 add_to_command_line=args.add_to_command_line,
343 blacklisted_parameters=args.blacklisted_parameters,
344 required_tools=required_tools,
345 skip_tools=skip_tools,
346 macros_file_names=args.macros_files,
347 macros_to_expand=macros_to_expand,
348 parameter_hardcoder=parameter_hardcoder,
349 xsd_location=args.xsd_location)
350
351 #TODO: add some sort of warning if a macro that doesn't exist is to be expanded
352
353 # it is not needed to copy the macros files, since the user has provided them
354
355 # generation of galaxy stubs is ready... now, let's see if we need to generate a tool_conf.xml
356 if args.tool_conf_destination is not None:
357 generate_tool_conf(parsed_models, args.tool_conf_destination,
358 args.galaxy_tool_path, args.default_category)
359
360 # now datatypes_conf.xml
361 if args.data_types_destination is not None:
362 generate_data_type_conf(supported_file_formats, args.data_types_destination)
363
364 return 0
365
366 except KeyboardInterrupt:
367 # handle keyboard interrupt
368 return 0
369 except ApplicationException, e:
370 error("CTD2Galaxy could not complete the requested operation.", 0)
371 error("Reason: " + e.msg, 0)
372 return 1
373 except ModelError, e:
374 error("There seems to be a problem with one of your input CTDs.", 0)
375 error("Reason: " + str(e), 0)
376 return 1
377 except Exception, e:
378 traceback.print_exc()
379 return 2
380
381
382 def parse_tools_list_file(tools_list_file):
383 tools_list = None
384 if tools_list_file is not None:
385 tools_list = []
386 with open(tools_list_file) as f:
387 for line in f:
388 if line is None or not line.strip() or line.strip().startswith("#"):
389 continue
390 else:
391 tools_list.append(line.strip())
392
393 return tools_list
394
395
396 def parse_macros_files(macros_file_names):
397 macros_to_expand = set()
398
399 for macros_file_name in macros_file_names:
400 try:
401 macros_file = open(macros_file_name)
402 info("Loading macros from %s" % macros_file_name, 0)
403 root = parse(macros_file).getroot()
404 for xml_element in root.findall("xml"):
405 name = xml_element.attrib["name"]
406 if name in macros_to_expand:
407 warning("Macro %s has already been found. Duplicate found in file %s." %
408 (name, macros_file_name), 0)
409 else:
410 info("Macro %s found" % name, 1)
411 macros_to_expand.add(name)
412 except ParseError, e:
413 raise ApplicationException("The macros file " + macros_file_name + " could not be parsed. Cause: " +
414 str(e))
415 except IOError, e:
416 raise ApplicationException("The macros file " + macros_file_name + " could not be opened. Cause: " +
417 str(e))
418
419 # we depend on "stdio", "requirements" and "advanced_options" being defined in at least one of the given macros files
420 missing_needed_macros = []
421 for required_macro in REQUIRED_MACROS:
422 if required_macro not in macros_to_expand:
423 missing_needed_macros.append(required_macro)
424
425 if missing_needed_macros:
426 raise ApplicationException(
427 "The following required macro(s) were not found in any of the given macros files: %s, "
428 "see sample_files/macros.xml for an example of a valid macros file."
429 % ", ".join(missing_needed_macros))
430
431 # we do not need to "expand" the advanced_options macro
432 macros_to_expand.remove(ADVANCED_OPTIONS_MACRO_NAME)
433 return macros_to_expand
434
435
436 def parse_hardcoded_parameters(hardcoded_parameters_file):
437 parameter_hardcoder = ParameterHardcoder()
438 if hardcoded_parameters_file is not None:
439 line_number = 0
440 with open(hardcoded_parameters_file) as f:
441 for line in f:
442 line_number += 1
443 if line is None or not line.strip() or line.strip().startswith("#"):
444 pass
445 else:
446 # the third column must be obtained as a whole, and not split
447 parsed_hardcoded_parameter = line.strip().split(None, 2)
448 # valid lines contain two or three columns
449 if len(parsed_hardcoded_parameter) != 2 and len(parsed_hardcoded_parameter) != 3:
450 warning("Invalid line at line number %d of the given hardcoded parameters file. Line will be "
451 "ignored:\n%s" % (line_number, line), 0)
452 continue
453
454 parameter_name = parsed_hardcoded_parameter[0]
455 hardcoded_value = parsed_hardcoded_parameter[1]
456 tool_names = None
457 if len(parsed_hardcoded_parameter) == 3:
458 tool_names = parsed_hardcoded_parameter[2].split(',')
459 if tool_names:
460 for tool_name in tool_names:
461 parameter_hardcoder.register_parameter(parameter_name, hardcoded_value, tool_name.strip())
462 else:
463 parameter_hardcoder.register_parameter(parameter_name, hardcoded_value)
464
465 return parameter_hardcoder
466
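The column handling of the hardcoded-parameters file can be tried out in isolation. The sketch below is an assumption-laden simplification of `parse_hardcoded_parameters`: the helper name `parse_hardcoded_lines` is hypothetical, it reads from a list of strings instead of a file, and it silently skips invalid lines where the listing emits a warning.

```python
def parse_hardcoded_lines(lines):
    """Return a list of (parameter, value, tool_or_None) tuples."""
    entries = []
    for line in lines:
        stripped = line.strip()
        # blank lines and comments are ignored
        if not stripped or stripped.startswith("#"):
            continue
        # split at most twice so the tool list stays in one piece
        columns = stripped.split(None, 2)
        if len(columns) < 2:
            continue  # the listing warns about such lines instead
        name, value = columns[0], columns[1]
        if len(columns) == 3:
            # one entry per tool in the comma-separated third column
            for tool in columns[2].split(","):
                entries.append((name, value, tool.strip()))
        else:
            entries.append((name, value, None))
    return entries


sample = [
    "# Parameter name  # Value  # Tool(s)",
    "mode               quiet",
    "verbosity          high     Foo, Bar",
]
entries = parse_hardcoded_lines(sample)
```

Limiting the split to two cuts is what lets a tool list such as `Foo, Bar` contain whitespace after the comma.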
467
468 def parse_file_formats(formats_file):
469 supported_formats = {}
470 if formats_file is not None:
471 line_number = 0
472 with open(formats_file) as f:
473 for line in f:
474 line_number += 1
475 if line is None or not line.strip() or line.strip().startswith("#"):
476 # ignore (it'd be weird to have something like:
477 # if line is not None and not (not line.strip()) ...)
478 pass
479 else:
480 # not an empty line, no comment
481 # strip the line and split by whitespace
482 parsed_formats = line.strip().split()
483 # valid lines contain either one, three or four columns
484 if not (len(parsed_formats) == 1 or len(parsed_formats) == 3 or len(parsed_formats) == 4):
485 warning("Invalid line at line number %d of the given formats file. Line will be ignored:\n%s" %
486 (line_number, line), 0)
487 # ignore the line
488 continue
489 elif len(parsed_formats) == 1:
490 supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[0])
491 else:
492 mimetype = None
493 # check if mimetype was provided
494 if len(parsed_formats) == 4:
495 mimetype = parsed_formats[3]
496 supported_formats[parsed_formats[0]] = DataType(parsed_formats[0], parsed_formats[1],
497 parsed_formats[2], mimetype)
498 return supported_formats
499
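The one, three or four column rule of the file-formats file can be sketched as follows. `FormatSketch` and `parse_format_line` are hypothetical names introduced for this illustration; the listing itself stores `DataType` instances in a dictionary and warns about invalid lines rather than returning `None`.

```python
from collections import namedtuple

# Hypothetical record mirroring the fields of DataType in the listing.
FormatSketch = namedtuple(
    "FormatSketch", "extension galaxy_extension galaxy_type mimetype")


def parse_format_line(line):
    """Parse one non-comment line; return None if the column count is invalid."""
    columns = line.strip().split()
    if len(columns) == 1:
        # type already registered in Galaxy: the extension maps to itself
        return FormatSketch(columns[0], columns[0], None, None)
    if len(columns) in (3, 4):
        mimetype = columns[3] if len(columns) == 4 else None
        return FormatSketch(columns[0], columns[1], columns[2], mimetype)
    return None


known = parse_format_line("fasta")
new_type = parse_format_line(
    "idxml txt galaxy.datatypes.xml:GenericXml application/xml")
invalid = parse_format_line("csv tabular")
```

Only entries with a non-empty `galaxy_type` would later be written to `datatypes_conf.xml`, which matches the check in `generate_data_type_conf`.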
500
501 def validate_and_prepare_args(args):
502 # check that only one of skip_tools_file and required_tools_file has been provided
503 if args.skip_tools_file is not None and args.required_tools_file is not None:
504 raise ApplicationException(
505 "You have provided both a file with tools to ignore and a file with required tools.\n"
506 "Only one of -s/--skip-tools, -r/--required-tools can be provided.")
507
508 # first, we flatten all lists of lists in args
509 lists_to_flatten = ["input_files", "blacklisted_parameters", "macros_files"]
510 for list_to_flatten in lists_to_flatten:
511 setattr(args, list_to_flatten, [item for sub_list in getattr(args, list_to_flatten) for item in sub_list])
512
513 # if input is a single file, we expect output to be a file (and not a dir that already exists)
514 if len(args.input_files) == 1:
515 if os.path.isdir(args.output_destination):
516 raise ApplicationException("If a single input file is provided, output (%s) is expected to be a file "
517 "and not a folder.\n" % args.output_destination)
518
519 # if input is a list of files, we expect output to be a folder
520 if len(args.input_files) > 1:
521 if not os.path.isdir(args.output_destination):
522 raise ApplicationException("If several input files are provided, output (%s) is expected to be an "
523 "existing directory.\n" % args.output_destination)
524
525 # check that the provided input files, if provided, contain a valid file path
526 input_variables_to_check = ["skip_tools_file", "required_tools_file", "macros_files", "xsd_location",
527 "input_files", "formats_file", "hardcoded_parameters"]
528
529 for variable_name in input_variables_to_check:
530 paths_to_check = []
531 # check if we are handling a single file or a list of files
532 member_value = getattr(args, variable_name)
533 if member_value is not None:
534 if isinstance(member_value, list):
535 for file_name in member_value:
536 paths_to_check.append(strip(str(file_name)))
537 else:
538 paths_to_check.append(strip(str(member_value)))
539
540 for path_to_check in paths_to_check:
541 if not os.path.isfile(path_to_check):
542 raise ApplicationException(
543 "The provided input file (%s) does not exist or is not a valid file path."
544 % path_to_check)
545
546 # check that the provided output files, if provided, contain a valid file path (i.e., not a folder)
547 output_variables_to_check = ["data_types_destination", "tool_conf_destination"]
548
549 for variable_name in output_variables_to_check:
550 file_name = getattr(args, variable_name)
551 if file_name is not None and os.path.isdir(file_name):
552 raise ApplicationException("The provided output file name (%s) points to a directory." % file_name)
553
554 if not args.macros_files:
555 # list is empty, provide the default value
556 warning("Using default macros from macros.xml", 0)
557 args.macros_files = ["macros.xml"]
558
559
560 def convert(input_files, output_destination, **kwargs):
561 # first, generate a model
562 is_converting_multiple_ctds = len(input_files) > 1
563 parsed_models = []
564 schema = None
565 if kwargs["xsd_location"] is not None:
566 try:
567 info("Loading validation schema from %s" % kwargs["xsd_location"], 0)
568 schema = etree.XMLSchema(etree.parse(kwargs["xsd_location"]))
569 except Exception, e:
570 error("Could not load validation schema %s. Reason: %s" % (kwargs["xsd_location"], str(e)), 0)
571 else:
572 info("Validation against a schema has not been enabled.", 0)
573 for input_file in input_files:
574 try:
575 if schema is not None:
576 validate_against_schema(input_file, schema)
577 model = CTDModel(from_file=input_file)
578 except Exception, e:
579 error(str(e), 1)
580 continue
581
582 if kwargs["skip_tools"] is not None and model.name in kwargs["skip_tools"]:
583 info("Skipping tool %s" % model.name, 0)
584 continue
585 elif kwargs["required_tools"] is not None and model.name not in kwargs["required_tools"]:
586 info("Tool %s is not required, skipping it" % model.name, 0)
587 continue
588 else:
589 info("Converting from %s " % input_file, 0)
590 tool = create_tool(model)
591 write_header(tool, model)
592 create_description(tool, model)
593 expand_macros(tool, model, **kwargs)
594 create_command(tool, model, **kwargs)
595 create_inputs(tool, model, **kwargs)
596 create_outputs(tool, model, **kwargs)
597 create_help(tool, model)
598
599 # finally, serialize the tool
600 output_file = output_destination
601 # if multiple inputs are being converted,
602 # then we need to generate a different output_file for each input
603 if is_converting_multiple_ctds:
604 output_file = os.path.join(output_file, get_filename_without_suffix(input_file) + ".xml")
605 # wrap our tool element into a tree to be able to serialize it
606 tree = ElementTree(tool)
607 tree.write(open(output_file, 'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
608 # let's use model to hold the name of the output file
609 parsed_models.append([model, get_filename(output_file)])
610
611 return parsed_models
612
613
614 # validates a ctd file against the schema
615 def validate_against_schema(ctd_file, schema):
616 try:
617 parser = etree.XMLParser(schema=schema)
618 etree.parse(ctd_file, parser=parser)
619 except etree.XMLSyntaxError, e:
620 raise ApplicationException("Input ctd file %s is not valid. Reason: %s" % (ctd_file, str(e)))
621
622
623 def write_header(tool, model):
624 tool.addprevious(etree.Comment(
625 "This is a configuration file for the integration of a tool into Galaxy (https://galaxyproject.org/). "
626 "This file was automatically generated using CTD2Galaxy."))
627 tool.addprevious(etree.Comment('Proposed Tool Section: [%s]' % model.opt_attribs.get("category", "")))
628
629
630 def generate_tool_conf(parsed_models, tool_conf_destination, galaxy_tool_path, default_category):
631 # for each category, we keep a list of models corresponding to it
632 categories_to_tools = dict()
633 for model in parsed_models:
634 category = strip(model[0].opt_attribs.get("category", ""))
635 if not category.strip():
636 category = default_category
637 if category not in categories_to_tools:
638 categories_to_tools[category] = []
639 categories_to_tools[category].append(model[1])
640
641 # at this point, we should have a map for all categories->tools
642 toolbox_node = Element("toolbox")
643
644 if galaxy_tool_path is not None and not galaxy_tool_path.strip().endswith("/"):
645 galaxy_tool_path = galaxy_tool_path.strip() + "/"
646 if galaxy_tool_path is None:
647 galaxy_tool_path = ""
648
649 for category, file_names in categories_to_tools.iteritems():
650 section_node = add_child_node(toolbox_node, "section")
651 section_node.attrib["id"] = "section-id-" + "".join(category.split())
652 section_node.attrib["name"] = category
653
654 for filename in file_names:
655 tool_node = add_child_node(section_node, "tool")
656 tool_node.attrib["file"] = galaxy_tool_path + filename
657
658 toolconf_tree = ElementTree(toolbox_node)
659 toolconf_tree.write(open(tool_conf_destination,'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
660 info("Generated Galaxy tool_conf.xml in %s" % tool_conf_destination, 0)
661
662
663 def generate_data_type_conf(supported_file_formats, data_types_destination):
664 data_types_node = Element("datatypes")
665 registration_node = add_child_node(data_types_node, "registration")
666 registration_node.attrib["converters_path"] = "lib/galaxy/datatypes/converters"
667 registration_node.attrib["display_path"] = "display_applications"
668
669 for format_name in supported_file_formats:
670 data_type = supported_file_formats[format_name]
671 # add only if it's a data type that does not exist in Galaxy
672 if data_type.galaxy_type is not None:
673 data_type_node = add_child_node(registration_node, "datatype")
674 # we know galaxy_extension is not None
675 data_type_node.attrib["extension"] = data_type.galaxy_extension
676 data_type_node.attrib["type"] = data_type.galaxy_type
677 if data_type.mimetype is not None:
678 data_type_node.attrib["mimetype"] = data_type.mimetype
679
680 data_types_tree = ElementTree(data_types_node)
681 data_types_tree.write(open(data_types_destination,'w'), encoding="UTF-8", xml_declaration=True, pretty_print=True)
682 info("Generated Galaxy datatypes_conf.xml in %s" % data_types_destination, 0)
683
684
685 # taken from
686 # http://stackoverflow.com/questions/8384737/python-extract-file-name-from-path-no-matter-what-the-os-path-format
687 def get_filename(path):
688 head, tail = ntpath.split(path)
689 return tail or ntpath.basename(head)
690
691
692 def get_filename_without_suffix(path):
693 root, ext = os.path.splitext(os.path.basename(path))
694 return root
695
696
697 def create_tool(model):
698 return Element("tool", OrderedDict([("id", model.name), ("name", model.name), ("version", model.version)]))
699
700
701 def create_description(tool, model):
702 if "description" in model.opt_attribs.keys() and model.opt_attribs["description"] is not None:
703 description = SubElement(tool,"description")
704 description.text = model.opt_attribs["description"]
705
706
707 def get_param_cli_name(param, model):
708 # we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
709 if type(param.parent) == ParameterGroup:
710 if not hasattr(param.parent.parent, 'parent'):
711 return resolve_param_mapping(param, model)
712 elif not hasattr(param.parent.parent.parent, 'parent'):
713 return resolve_param_mapping(param, model)
714 else:
715 if model.cli:
716 warning("Using nested parameter sections (NODE elements) is not compatible with <cli>", 1)
717 return get_param_name(param.parent) + ":" + resolve_param_mapping(param, model)
718 else:
719 return resolve_param_mapping(param, model)
720
721
722 def get_param_name(param):
723 # we generate parameters with colons for subgroups, but not for the two topmost parents (OpenMS legacy)
724 if type(param.parent) == ParameterGroup:
725 if not hasattr(param.parent.parent, 'parent'):
726 return param.name
727 elif not hasattr(param.parent.parent.parent, 'parent'):
728 return param.name
729 else:
730 return get_param_name(param.parent) + ":" + param.name
731 else:
732 return param.name
733
734
735 # some parameters are mapped to command line options, this method helps resolve those mappings, if any
736 def resolve_param_mapping(param, model):
737 # go through all mappings and find if the given param appears as a reference name in a mapping element
738 param_mapping = None
739 for cli_element in model.cli:
740 for mapping_element in cli_element.mappings:
741 if mapping_element.reference_name == param.name:
742 if param_mapping is not None:
743 warning("The parameter %s has more than one mapping in the <cli> section. "
744 "The first found mapping, %s, will be used." % (param.name, param_mapping), 1)
745 else:
746 param_mapping = cli_element.option_identifier
747
748 return param_mapping if param_mapping is not None else param.name
749
750 def create_command(tool, model, **kwargs):
751 final_command = get_tool_executable_path(model, kwargs["default_executable_path"]) + '\n'
752 final_command += kwargs["add_to_command_line"] + '\n'
753 advanced_command_start = "#if $adv_opts.adv_opts_selector=='advanced':\n"
754 advanced_command_end = '#end if'
755 advanced_command = ''
756 parameter_hardcoder = kwargs["parameter_hardcoder"]
757
758 found_output_parameter = False
759 for param in extract_parameters(model):
760 if param.type is _OutFile:
761 found_output_parameter = True
762 command = ''
763 param_name = get_param_name(param)
764 param_cli_name = get_param_cli_name(param, model)
765 if param_name == param_cli_name:
766 # there was no mapping, so for the cli name we will use a '-' in the prefix
767 param_cli_name = '-' + param_name
768
769 if param.name in kwargs["blacklisted_parameters"]:
770 continue
771
772 hardcoded_value = parameter_hardcoder.get_hardcoded_value(param_name, model.name)
773 if hardcoded_value:
774 command += '%s %s\n' % (param_cli_name, hardcoded_value)
775 else:
776 # parameter is neither blacklisted nor hardcoded...
777 galaxy_parameter_name = get_galaxy_parameter_name(param)
778 repeat_galaxy_parameter_name = get_repeat_galaxy_parameter_name(param)
779
780 # logic for ITEMLISTs
781 if param.is_list:
782 if param.type is _InFile:
783 command += param_cli_name + "\n"
784 command += " #for token in $" + galaxy_parameter_name + ":\n"
785 command += " $token\n"
786 command += " #end for\n"
787 else:
788 command += "\n#if $" + repeat_galaxy_parameter_name + ":\n"
789 command += param_cli_name + "\n"
790 command += " #for token in $" + repeat_galaxy_parameter_name + ":\n"
791 command += " #if \" \" in str(token):\n"
792 command += " \"$token." + galaxy_parameter_name + "\"\n"
793 command += " #else\n"
794 command += " $token." + galaxy_parameter_name + "\n"
795 command += " #end if\n"
796 command += " #end for\n"
797 command += "#end if\n"
798 # logic for other ITEMs
799 else:
800 if param.advanced and param.type is not _OutFile:
801 actual_parameter = "$adv_opts.%s" % galaxy_parameter_name
802 else:
803 actual_parameter = "$%s" % galaxy_parameter_name
804 ## if whitespace_validation has been set, we need to generate, for each parameter:
805 ## #if str( $t ).split() != '':
806 ## -t "$t"
807 ## #end if
808 ## TODO only useful for text fields, integers or floats
809 ## not useful for choices, input fields ...
810
811 if not is_boolean_parameter(param) and type(param.restrictions) is _Choices:
812 command += "#if " + actual_parameter + ":\n"
813 command += ' %s\n' % param_cli_name
814 command += " #if \" \" in str(" + actual_parameter + "):\n"
815 command += " \"" + actual_parameter + "\"\n"
816 command += " #else\n"
817 command += " " + actual_parameter + "\n"
818 command += " #end if\n"
819 command += "#end if\n"
820 elif is_boolean_parameter(param):
821 command += "#if " + actual_parameter + ":\n"
822 command += ' %s\n' % param_cli_name
823 command += "#end if\n"
824 elif TYPE_TO_GALAXY_TYPE[param.type] == 'text':
825 command += "#if " + actual_parameter + ":\n"
826 command += " %s " % param_cli_name
827 command += " \"" + actual_parameter + "\"\n"
828 command += "#end if\n"
829 else:
830 command += "#if " + actual_parameter + ":\n"
831 command += ' %s ' % param_cli_name
832 command += actual_parameter + "\n"
833 command += "#end if\n"
834
835 if param.advanced and param.type is not _OutFile:
836 advanced_command += " %s" % command
837 else:
838 final_command += command
839
840 if advanced_command:
841 final_command += "%s%s%s\n" % (advanced_command_start, advanced_command, advanced_command_end)
842
843 if not found_output_parameter:
844 final_command += "> $param_stdout\n"
845
846 command_node = add_child_node(tool, "command")
847 command_node.text = final_command
848
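To make the generated Cheetah template easier to picture, here is a minimal sketch of the snippets emitted for a flag and for a text parameter. The helper names `flag_snippet` and `text_snippet` and the parameter names are hypothetical, and whitespace is simplified; only the overall template shape follows the code above.

```python
def flag_snippet(cli_name, galaxy_ref):
    # a flag is emitted only when the Galaxy boolean evaluates to true
    return "#if %s:\n  %s\n#end if\n" % (galaxy_ref, cli_name)


def text_snippet(cli_name, galaxy_ref):
    # text values are quoted so that embedded spaces survive the shell
    return '#if %s:\n  %s "%s"\n#end if\n' % (galaxy_ref, cli_name, galaxy_ref)


print(flag_snippet("-force", "$param_force"))
print(text_snippet("-out_name", "$adv_opts.param_out_name"))
```

Advanced parameters are referenced through the `$adv_opts.` prefix, as in the second call.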
849
850 # creates the xml elements needed to import the needed macros files
851 # and to "expand" the macros
852 def expand_macros(tool, model, **kwargs):
853 macros_node = add_child_node(tool, "macros")
854 token_node = add_child_node(macros_node, "token")
855 token_node.attrib["name"] = "@EXECUTABLE@"
856 token_node.text = get_tool_executable_path(model, kwargs["default_executable_path"])
857
858 # add <import> nodes
859 for macro_file_name in kwargs["macros_file_names"]:
860 # only the file name is needed here, so the file is not opened (this avoids leaking a file handle)
861 import_node = add_child_node(macros_node, "import")
862 # do not add the path of the file, rather, just its basename
863 import_node.text = os.path.basename(macro_file_name)
864
865 # add <expand> nodes
866 for expand_macro in kwargs["macros_to_expand"]:
867 expand_node = add_child_node(tool, "expand")
868 expand_node.attrib["macro"] = expand_macro
869
870
871 def get_tool_executable_path(model, default_executable_path):
872 # rules to build the galaxy executable path:
873 # if executablePath is null, then use default_executable_path and store it in executablePath
874 # if executablePath is null and executableName is null, then the name of the tool will be used
875 # if executablePath is null and executableName is not null, then executableName will be used
876 # if executablePath is not null and executableName is null,
877 # then executablePath and the name of the tool will be used
878 # if executablePath is not null and executableName is not null, then both will be used
879
880 # first, check if the model has executablePath / executableName defined
881 executable_path = model.opt_attribs.get("executablePath", None)
882 executable_name = model.opt_attribs.get("executableName", None)
883
884 # check if we need to use the default_executable_path
885 if executable_path is None:
886 executable_path = default_executable_path
887
888 # fix the executablePath to make sure that there is a '/' in the end
889 if executable_path is not None:
890 executable_path = executable_path.strip()
891 if not executable_path.endswith('/'):
892 executable_path += '/'
893
894 # assume that we have all information present
895 command = str(executable_path) + str(executable_name)
896 if executable_path is None:
897 if executable_name is None:
898 command = model.name
899 else:
900 command = executable_name
901 else:
902 if executable_name is None:
903 command = executable_path + model.name
904 return command
905
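The rules listed at the top of `get_tool_executable_path` can be checked in isolation. The following is a minimal sketch, assuming plain string arguments instead of a CTD model; `resolve_executable` and the example paths are hypothetical and not part of the converter.

```python
def resolve_executable(tool_name, executable_path=None, executable_name=None, default_path=None):
    # fall back to the default path if the CTD does not define one
    if executable_path is None:
        executable_path = default_path
    # fall back to the tool name if the CTD does not define an executable name
    name = executable_name if executable_name is not None else tool_name
    if executable_path is None:
        return name
    # make sure the path ends with exactly one separator before joining
    executable_path = executable_path.strip()
    if not executable_path.endswith('/'):
        executable_path += '/'
    return executable_path + name


print(resolve_executable("FileFilter"))
print(resolve_executable("FileFilter", default_path="/opt/openms/bin"))
print(resolve_executable("FileFilter", executable_path="/usr/bin/", executable_name="ff"))
```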
906
907 def get_galaxy_parameter_name(param):
908 return "param_%s" % get_param_name(param).replace(':', '_').replace('-', '_')
909
910
911 def get_input_with_same_restrictions(out_param, model, supported_file_formats):
912 for param in extract_parameters(model):
913 if param.type is _InFile:
914 if param.restrictions is not None:
915 in_param_formats = get_supported_file_types(param.restrictions.formats, supported_file_formats)
916 out_param_formats = get_supported_file_types(out_param.restrictions.formats, supported_file_formats)
917 if in_param_formats == out_param_formats:
918 return param
919
920
921 def create_inputs(tool, model, **kwargs):
922 inputs_node = SubElement(tool, "inputs")
923
924 # some suites (such as OpenMS) need some advanced options when handling inputs
925 expand_advanced_node = add_child_node(tool, "expand", OrderedDict([("macro", ADVANCED_OPTIONS_MACRO_NAME)]))
926 parameter_hardcoder = kwargs["parameter_hardcoder"]
927
928 # treat all non output-file parameters as inputs
929 for param in extract_parameters(model):
930 # no need to show hardcoded parameters
931 hardcoded_value = parameter_hardcoder.get_hardcoded_value(param.name, model.name)
932 if param.name in kwargs["blacklisted_parameters"] or hardcoded_value:
933 # let's not use an extra level of indentation and use NOP
934 continue
935 if param.type is not _OutFile:
936 if param.advanced:
937 if expand_advanced_node is not None:
938 parent_node = expand_advanced_node
939 else:
940 # something went wrong... we are handling an advanced parameter and the
941 # advanced input macro was not set... inform the user about it
942 info("The parameter %s has been set as advanced, but advanced_input_macro has "
943 "not been set." % param.name, 1)
944 # there is not much we can do, other than use the inputs_node as a parent node!
945 parent_node = inputs_node
946 else:
947 parent_node = inputs_node
948
949 # for lists we need a repeat tag
950 if param.is_list and param.type is not _InFile:
951 rep_node = add_child_node(parent_node, "repeat")
952 create_repeat_attribute_list(rep_node, param)
953 parent_node = rep_node
954
955 param_node = add_child_node(parent_node, "param")
956 create_param_attribute_list(param_node, param, kwargs["supported_file_formats"])
957
958 # advanced parameter selection should be at the end
959 # and only available if an advanced parameter exists
960 if expand_advanced_node is not None and len(expand_advanced_node) > 0:
961 inputs_node.append(expand_advanced_node)
962
963
964 def get_repeat_galaxy_parameter_name(param):
965 return "rep_" + get_galaxy_parameter_name(param)
966
967
968 def create_repeat_attribute_list(rep_node, param):
969 rep_node.attrib["name"] = get_repeat_galaxy_parameter_name(param)
970 if param.required:
971 rep_node.attrib["min"] = "1"
972 else:
973 rep_node.attrib["min"] = "0"
974 # for the ITEMLISTs which have LISTITEM children we only
975 # need one parameter as it is given as a string
976 if param.default is not None:
977 rep_node.attrib["max"] = "1"
978 rep_node.attrib["title"] = get_galaxy_parameter_name(param)
979
980
981 def create_param_attribute_list(param_node, param, supported_file_formats):
982 param_node.attrib["name"] = get_galaxy_parameter_name(param)
983
984 param_type = TYPE_TO_GALAXY_TYPE[param.type]
985 if param_type is None:
986 raise ModelError("Unrecognized parameter type %(type)s for parameter %(name)s"
987 % {"type": param.type, "name": param.name})
988
989 if param.is_list:
990 param_type = "text"
991
992 if is_selection_parameter(param):
993 param_type = "select"
994 if len(param.restrictions.choices) < 5:
995 param_node.attrib["display"] = "radio"
996
997 if is_boolean_parameter(param):
998 param_type = "boolean"
999
1000 if param.type is _InFile:
1001 # assume it's just text unless restrictions are provided
1002 param_format = "txt"
1003 if param.restrictions is not None:
1004 # join all formats of the file, take mapping from supported_file if available for an entry
1005 if type(param.restrictions) is _FileFormat:
1006 param_format = ','.join([get_supported_file_type(i, supported_file_formats) if
1007 get_supported_file_type(i, supported_file_formats)
1008 else i for i in param.restrictions.formats])
1009 else:
1010 raise InvalidModelException("Expected 'file type' restrictions for input file [%(name)s], "
1011 "but instead got [%(type)s]"
1012 % {"name": param.name, "type": type(param.restrictions)})
1013
1014 param_node.attrib["type"] = "data"
1015 param_node.attrib["format"] = param_format
1016 # in the case of multiple input set multiple flag
1017 if param.is_list:
1018 param_node.attrib["multiple"] = "true"
1019
1020 else:
1021 param_node.attrib["type"] = param_type
1022
1023 # check for parameters with restricted values (which will correspond to a "select" in galaxy)
1024 if param.restrictions is not None:
1025 # it could be either _Choices or _NumericRange, with special case for boolean types
1026 if param_type == "boolean":
1027 create_boolean_parameter(param_node, param)
1028 elif type(param.restrictions) is _Choices:
1029 # create as many <option> elements as restriction values
1030 for choice in param.restrictions.choices:
1031 option_node = add_child_node(param_node, "option", OrderedDict([("value", str(choice))]))
1032 option_node.text = str(choice)
1033
1034 # preselect the default value
1035 if param.default == choice:
1036 option_node.attrib["selected"] = "true"
1037
1038 elif type(param.restrictions) is _NumericRange:
1039 if param.type is not int and param.type is not float:
1040 raise InvalidModelException("Expected either 'int' or 'float' in the numeric range restriction for "
1041 "parameter [%(name)s], but instead got [%(type)s]" %
1042 {"name": param.name, "type": param.type})
1043 # extract the min and max values and add them as attributes
1044 # (the values are copied as given; no further validation is done here)
1045 if param.restrictions.n_min is not None:
1046 param_node.attrib["min"] = str(param.restrictions.n_min)
1047 if param.restrictions.n_max is not None:
1048 param_node.attrib["max"] = str(param.restrictions.n_max)
1049 elif type(param.restrictions) is _FileFormat:
1050 param_node.attrib["format"] = ','.join([get_supported_file_type(i, supported_file_formats) if
1051 get_supported_file_type(i, supported_file_formats)
1052 else i for i in param.restrictions.formats])
1053 else:
1054 raise InvalidModelException("Unrecognized restriction type [%(type)s] for parameter [%(name)s]"
1055 % {"type": type(param.restrictions), "name": param.name})
1056
1057 if param_type == "select" and param.default in param.restrictions.choices:
1058 param_node.attrib["optional"] = "False"
1059 else:
1060 param_node.attrib["optional"] = str(not param.required)
1061
1062 if param_type == "text":
1063 # add size attribute... this is the length of a textbox field in Galaxy (it could also be 15x2, for instance)
1064 param_node.attrib["size"] = "30"
1065 # add sanitizer nodes; this is needed for special characters such as "["
1066 # which are used for example by FeatureFinderMultiplex
1067 sanitizer_node = SubElement(param_node, "sanitizer")
1068
1069 valid_node = SubElement(sanitizer_node, "valid", OrderedDict([("initial", "string.printable")]))
1070 add_child_node(valid_node, "remove", OrderedDict([("value", '\'')]))
1071 add_child_node(valid_node, "remove", OrderedDict([("value", '"')]))
1072
1073 # check for default value
1074 if param.default is not None and param.default is not _Null:
1075 if type(param.default) is list:
1076 # we ASSUME that a list of parameters looks like:
1077 # $ tool -ignore He Ar Xe
1078 # meaning, that, for example, Helium, Argon and Xenon will be ignored
1079 param_node.attrib["value"] = ' '.join(map(str, param.default))
1080
1081 elif param_type != "boolean":
1082 param_node.attrib["value"] = str(param.default)
1083
1084 else:
1085 # simple boolean with a default
1086 if param.default is True:
1087 param_node.attrib["checked"] = "true"
1088 else:
1089 if param.type is int or param.type is float:
1090 # galaxy requires "value" to be included for int/float
1091 # since no default was included, we need to figure out one in a clever way... but let the user know
1092 # that we are "thinking" for him/her
1093 warning("Generating default value for parameter [%s]. "
1094 "Galaxy requires the attribute 'value' to be set for integer/floats. "
1095 "Edit the CTD file and provide a suitable default value." % param.name, 1)
1096 # check if there's a min/max and try to use them
1097 default_value = None
1098 if param.restrictions is not None:
1099 if type(param.restrictions) is _NumericRange:
1100 default_value = param.restrictions.n_min
1101 if default_value is None:
1102 default_value = param.restrictions.n_max
1103 if default_value is None:
1104 # no min/max provided... just use 0 and see what happens
1105 default_value = 0
1106 else:
1107 # should never be here, since we have validated this anyway...
1108 # this code is here just for documentation purposes
1109 # however, better safe than sorry!
1110 # (it could be that the code changes and then we have an ugly scenario)
1111 raise InvalidModelException("Expected a numeric range for parameter [%(name)s], "
1112 "but instead got [%(type)s]"
1113 % {"name": param.name, "type": type(param.restrictions)})
1114 else:
1115 # no restrictions and no default value provided...
1116 # make up something
1117 default_value = 0
1118 param_node.attrib["value"] = str(default_value)
1119
1120 label = "%s parameter" % param.name
1121 help_text = ""
1122
1123 if param.description is not None:
1124 label, help_text = generate_label_and_help(param.description)
1125
1126 param_node.attrib["label"] = label
1127 param_node.attrib["help"] = "(-%s)" % param.name + " " + help_text
1128
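The fallback used when an int/float parameter has no default can be summarized as "min, else max, else 0". A minimal sketch of that rule, assuming the bounds are passed in directly; `fallback_default` is a hypothetical helper, not a function of the converter.

```python
def fallback_default(n_min=None, n_max=None):
    # prefer the lower bound, then the upper bound, then 0
    if n_min is not None:
        return n_min
    if n_max is not None:
        return n_max
    return 0


print(fallback_default(n_min=1, n_max=10))
print(fallback_default(n_max=10))
print(fallback_default())
```

Checking against `None` rather than truthiness matters here, since a lower bound of `0` is a legitimate value.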
1129
1130 def generate_label_and_help(desc):
1131 label = ""
1132 help_text = ""
1133 # the "#br#" tag found in some descriptions is turned into an HTML line break below
1134 if not isinstance(desc, basestring):
1135 desc = str(desc)
1136 desc = desc.encode("utf8").replace("#br#", " <br>")
1137 # Get rid of trailing dots
1138 if desc.endswith("."):
1139 desc = desc.rstrip(".")
1140 # Check if first word is a normal word and make it uppercase
1141 if str(desc).find(" ") > -1:
1142 first_word, rest = str(desc).split(" ", 1)
1143 if str(first_word).islower():
1144 # skip words of the form a/b, where '/' is the second character (e.g. a quotient)
1145 if first_word.find("/") != 1:
1146 first_word = first_word.capitalize()
1147 desc = first_word + " " + rest
1148 label = desc.decode("utf8")
1149
1150 # Try to split the label if it is too long
1151 if len(desc) > 50:
1152 # find an example and put everything before in the label and the e.g. in the help
1153 if desc.find("e.g.") > 1:
1154 label, help_text = desc.split("e.g.", 1)
1155 help_text = "e.g." + help_text
1156 else:
1157 # find the end of the first sentence
1158 # look for ". " because some labels contain .file or something similar
1159 delimiter = ""
1160 if desc.find(". ") > 1 and desc.find("? ") > 1:
1161 if desc.find(". ") < desc.find("? "):
1162 delimiter = ". "
1163 else:
1164 delimiter = "? "
1165 elif desc.find(". ") > 1:
1166 delimiter = ". "
1167 elif desc.find("? ") > 1:
1168 delimiter = "? "
1169 if delimiter != "":
1170 label, help_text = desc.split(delimiter, 1)
1171
1172 # add the question mark back
1173 if delimiter == "? ":
1174 label += "? "
1175
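1176 # remove all trailing linebreaks (str.rstrip('<br>') would strip the characters '<', 'b', 'r', '>', not the tag)
1177 while label.rstrip().endswith('<br>'): label = label.rstrip()[:-len('<br>')]
1178 return label.rstrip(), help_text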
1179
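`str.rstrip` takes a set of characters rather than a suffix, which matters when trimming a trailing `<br>` tag from a label. The sketch below contrasts the two behaviours; `strip_trailing_breaks` is a hypothetical helper used only for illustration.

```python
def strip_trailing_breaks(label, tag="<br>"):
    # remove whole trailing tags (and surrounding whitespace), never single characters
    label = label.rstrip()
    while label.endswith(tag):
        label = label[:-len(tag)].rstrip()
    return label


# character-set semantics: the final 'r' of the word is removed as well
print(repr("Number<br>".rstrip("<br>")))
# suffix semantics: only the tag is removed
print(repr(strip_trailing_breaks("Number<br>")))
print(repr(strip_trailing_breaks("Filter <br> <br>")))
```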
1180
1181 def get_indented_text(text, indentation_level):
1182 return ("%(indentation)s%(text)s" %
1183 {"indentation": " " * (MESSAGE_INDENTATION_INCREMENT * indentation_level),
1184 "text": text})
1185
1186
1187 def warning(warning_text, indentation_level):
1188 sys.stdout.write(get_indented_text("WARNING: %s\n" % warning_text, indentation_level))
1189
1190
1191 def error(error_text, indentation_level):
1192 sys.stderr.write(get_indented_text("ERROR: %s\n" % error_text, indentation_level))
1193
1194
1195 def info(info_text, indentation_level):
1196 sys.stdout.write(get_indented_text("INFO: %s\n" % info_text, indentation_level))
1197
1198
1199 # determines if the given choices are boolean (basically, if the possible values are yes/no, true/false)
1200 def is_boolean_parameter(param):
1201 ## detect boolean selects of OpenMS
1202 if is_selection_parameter(param):
1203 if len(param.restrictions.choices) == 2:
1204 # check that default value is false to make sure it is an actual flag
1205 if "false" in param.restrictions.choices and \
1206 "true" in param.restrictions.choices and \
1207 param.default == "false":
1208 return True
1209 else:
1210 return param.type is bool
1211
1212
1213 # determines if there are choices for the parameter
1214 def is_selection_parameter(param):
1215 return type(param.restrictions) is _Choices
1216
1217
1218 def get_lowercase_list(some_list):
1219 lowercase_list = map(str, some_list)
1220 lowercase_list = map(string.lower, lowercase_list)
1221 lowercase_list = map(strip, lowercase_list)
1222 return lowercase_list
1223
1224
1225 # creates a galaxy boolean parameter type
1226 # this method assumes that param has restrictions, and that only two restrictions are present
1227 # (either yes/no or true/false)
1228 def create_boolean_parameter(param_node, param):
1229 # first, determine the 'truevalue' and the 'falsevalue'
1230 """TODO: true and false values can be way more than 'true' and 'false'
1231 but for that we need CTD support
1232 """
1233 # by default, 'true' and 'false' are handled as flags, like the verbose flag (i.e., -v)
1234 true_value = "-%s" % get_param_name(param)
1235 false_value = ""
1236 choices = get_lowercase_list(param.restrictions.choices)
1237 if "yes" in choices:
1238 true_value = "yes"
1239 false_value = "no"
1240 param_node.attrib["truevalue"] = true_value
1241 param_node.attrib["falsevalue"] = false_value
1242
1243 # set the checked attribute
1244 if param.default is not None:
1245 checked_value = "false"
1246 default = strip(string.lower(param.default))
1247 if default == "yes" or default == "true":
1248 checked_value = "true"
1249 #attribute_list["checked"] = checked_value
1250 param_node.attrib["checked"] = checked_value
1251
1252
1253 def create_outputs(parent, model, **kwargs):
1254 outputs_node = add_child_node(parent, "outputs")
1255 parameter_hardcoder = kwargs["parameter_hardcoder"]
1256
1257 for param in extract_parameters(model):
1258
1259 # no need to show hardcoded parameters
1260 hardcoded_value = parameter_hardcoder.get_hardcoded_value(param.name, model.name)
1261 if param.name in kwargs["blacklisted_parameters"] or hardcoded_value:
1262 # let's not use an extra level of indentation and use NOP
1263 continue
1264 if param.type is _OutFile:
1265 create_output_node(outputs_node, param, model, kwargs["supported_file_formats"])
1266
1267 # If there are no outputs defined in the ctd the node will have no children
1268 # and the stdout will be used as output
1269 if len(outputs_node) == 0:
1270 add_child_node(outputs_node, "data",
1271 OrderedDict([("name", "param_stdout"), ("format", "txt"), ("label", "Output from stdout")]))
1272
1273
1274 def create_output_node(parent, param, model, supported_file_formats):
1275 data_node = add_child_node(parent, "data")
1276 data_node.attrib["name"] = get_galaxy_parameter_name(param)
1277
1278 data_format = "data"
1279 if param.restrictions is not None:
1280 if type(param.restrictions) is _FileFormat:
1281 # set the first data output node to the first file format
1282
1283 # check if there are formats that have not been registered yet...
1284 output = list()
1285 for format_name in param.restrictions.formats:
1286 if format_name not in supported_file_formats:
1287 output.append(str(format_name))
1288
1289 # warn only if there is something to complain about
1290 if output:
1291 warning("Parameter " + param.name + " has the following unsupported format(s):" + ','.join(output), 1)
1292 data_format = ','.join(output)
1293
1294 formats = get_supported_file_types(param.restrictions.formats, supported_file_formats)
1295 try:
1296 data_format = formats.pop()
1297 except KeyError:
1298 # there is not much we can do, other than catching the exception
1299 pass
1300 # if there is more than one output file format, try to take the format from the input parameter
1301 if formats:
1302 corresponding_input = get_input_with_same_restrictions(param, model, supported_file_formats)
1303 if corresponding_input is not None:
1304 data_format = "input"
1305 data_node.attrib["metadata_source"] = get_galaxy_parameter_name(corresponding_input)
1306 else:
1307 raise InvalidModelException("Unrecognized restriction type [%(type)s] "
1308 "for output [%(name)s]" % {"type": type(param.restrictions),
1309 "name": param.name})
1310 data_node.attrib["format"] = data_format
1311
1312 #TODO: find a smarter label ?
1313 #if param.description is not None:
1314 # data_node.setAttribute("label", param.description)
1315 return data_node
1316
1317
1318 # Get the supported file format for one given format
1319 def get_supported_file_type(format_name, supported_file_formats):
1320 if format_name in supported_file_formats.keys():
1321 return supported_file_formats.get(format_name, DataType(format_name, format_name)).galaxy_extension
1322 else:
1323 return None
1324
1325
1326 def get_supported_file_types(formats, supported_file_formats):
1327 return set([supported_file_formats.get(format_name, DataType(format_name, format_name)).galaxy_extension
1328 for format_name in formats if format_name in supported_file_formats.keys()])
1329
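As an illustration of how format names map to Galaxy extensions, here is a minimal sketch. The `DataType` namedtuple is a simplified, hypothetical stand-in for the converter's own class, and the format table is made up for the example.

```python
from collections import namedtuple

# hypothetical stand-in: only the two fields used below
DataType = namedtuple("DataType", ["extension", "galaxy_extension"])


def galaxy_extensions(formats, supported_file_formats):
    # formats that are not registered are silently skipped
    return set(supported_file_formats[name].galaxy_extension
               for name in formats if name in supported_file_formats)


supported = {
    "mzML": DataType("mzML", "mzml"),
    "fasta": DataType("fasta", "fasta"),
}
print(galaxy_extensions(["mzML", "unknownfmt"], supported))
```

Because the result is a set, two CTD formats that map to the same Galaxy extension collapse into one entry.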
1330
1331 def create_change_format_node(parent, data_formats, input_ref):
1332 # <change_format>
1333 # <when input="secondary_structure" value="true" format="txt"/>
1334 # </change_format>
1335 change_format_node = add_child_node(parent, "change_format")
1336 for data_format in data_formats:
1337 add_child_node(change_format_node, "when",
1338 OrderedDict([("input", input_ref), ("value", data_format), ("format", data_format)]))
1339
1340
1341 # Creates the <help> node from the 'manual' and 'docurl' attributes of the model, if present.
1342 def create_help(tool, model):
1343 manual = ''
1344 doc_url = None
1345 if 'manual' in model.opt_attribs.keys():
1346 manual += '%s\n\n' % model.opt_attribs["manual"]
1347 if 'docurl' in model.opt_attribs.keys():
1348 doc_url = model.opt_attribs["docurl"]
1349
1350 help_text = "No help available"
1351 if manual:
1352 help_text = manual
1353 if doc_url is not None:
1354 help_text = ("" if manual is None else manual) + "\nFor more information, visit %s" % doc_url
1355 help_node = add_child_node(tool, "help")
1356 # TODO: do we need CDATA Section here?
1357 help_node.text = help_text
1358
1359
1360 # since a model might contain several ParameterGroup elements,
1361 # we want to simply 'flatten' the parameters to generate the Galaxy wrapper
1362 def extract_parameters(model):
1363 parameters = []
1364 if len(model.parameters.parameters) > 0:
1365 # use this to put parameters that are to be processed
1366 # we know that CTDModel has one parent ParameterGroup
1367 pending = [model.parameters]
1368 while len(pending) > 0:
1369 # take one element from 'pending'
1370 parameter = pending.pop()
1371 if type(parameter) is not ParameterGroup:
1372 parameters.append(parameter)
1373 else:
1374 # append the first-level children of this ParameterGroup
1375 pending.extend(parameter.parameters.values())
1376 # return the reversed list of parameters (as it is now,
1377 # the last parameter in the CTD comes first in the list)
1378 return reversed(parameters)
1379
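The stack-based flattening in `extract_parameters` can be tried on a toy tree. This is a minimal sketch, assuming a hypothetical `Group` class in place of `ParameterGroup` and plain strings in place of parameters.

```python
class Group(object):
    # hypothetical stand-in for ParameterGroup: a name plus ordered children
    def __init__(self, name, children):
        self.name = name
        self.children = children


def flatten(root):
    leaves = []
    pending = [root]
    while pending:
        # pop() takes the most recently added element (depth-first, right to left)
        node = pending.pop()
        if isinstance(node, Group):
            pending.extend(node.children)
        else:
            leaves.append(node)
    # leaves were collected last-to-first, so reverse to restore document order
    leaves.reverse()
    return leaves


tree = Group("root", ["in", Group("algorithm", ["a", "b"]), "out"])
print(flatten(tree))
```

Popping from the end of the list visits children right to left, which is why the final reversal is needed to get the parameters back in the order they appear in the CTD.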
1380
1381 # adds and returns a child node using the given name to the given parent node
1382 def add_child_node(parent_node, child_node_name, attributes=OrderedDict([])):
1383 child_node = SubElement(parent_node, child_node_name, attributes)
1384 return child_node
1385
1386
1387 if __name__ == "__main__":
1388 sys.exit(main())