add man pages
Sascha Steinbiss
4 years ago
0 | # chain2dim(1) | |
1 | ||
2 | ## NAME | |
3 | ||
4 | chain2dim - two-dimensional match chaining | |
5 | ||
6 | ## SYNOPSIS | |
7 | ||
8 | *chain2dim* [options] <matchfile> | |
9 | ||
10 | ## OPTIONS | |
11 | ||
12 | *-global* <param>:: | |
13 | Global chaining. Optional parameter "gc" switches on gap costs (according to | |
14 | L1-model). Optional parameter "ov" means that overlaps between matches are | |
15 | allowed. | |
16 | ||
17 | *-local* <param>:: | |
18 | Compute local chains (according to L1-model). | |
19 | If no parameter is given, compute local chains with maximum score. | |
20 | If a parameter is given, it must be a positive number optionally followed by | |
21 | the character b or p. If only the number, say k, is given, this is the | |
22 | minimum score of the chains output. | |
23 | If a number is followed by character b, then output all chains with the | |
24 | largest k scores. If a number is followed by character p, then output all | |
25 | chains with scores at most k percent away from the best score. | |
26 | ||
27 | *-wf* <factor>:: | |
28 | Specify weight factor > 0.0 to obtain the score of a fragment. Requires one | |
29 | of the options *-local*, *-global gc* or *-global ov*. | |
30 | ||
31 | *-maxgap* <width>:: | |
32 | Maximal width of gap in chain. | |
33 | ||
34 | *-outprefix* <prefix>:: | |
35 | Specify prefix of files to output chains. | |
36 | ||
37 | *-withinborders*:: | |
38 | Only compute chains which do not cross sequence borders (not possible for | |
39 | matches in open format). | |
40 | ||
41 | *-thread* <keywords...>:: | |
42 | Thread the chains, i.e. close the gaps. Accepts an optional list of keywords | |
43 | "minlen1 minlen2 maxerror1 maxerror2", each followed by a number that | |
44 | specifies the minimum length or the maximum error rate of a thread. | |
45 | The suffix 1 refers to the match instance in the indexed sequence, 2 refers | |
46 | to the match instance in the query. | |
47 | ||
48 | *-silent*:: | |
49 | Do not output the chains and only report their lengths and scores. | |
50 | ||
51 | *-v*:: | |
52 | Be verbose. | |
53 | ||
54 | *-version*:: | |
55 | Show the version of the Vmatch package. | |
56 | ||
57 | *-help*:: | |
58 | Show help. | |
59 | ||
60 | ## SEE ALSO | |
61 | ||
62 | vmatch(1) |
0 | # matchcluster(1) | |
1 | ||
2 | ## NAME | |
3 | ||
4 | matchcluster - match clustering | |
5 | ||
6 | ## SYNOPSIS | |
7 | ||
8 | *matchcluster* [options] <matchfile> | |
9 | ||
10 | ## OPTIONS | |
11 | ||
12 | *-erate* <value>:: | |
13 | Specify maximum error rate in range [0,100] for similarity clustering. | |
14 | ||
15 | *-gapsize* <size>:: | |
16 | Specify maximum gap size for gap clustering. | |
17 | ||
18 | *-overlap* <percentage>:: | |
19 | Specify minimum percentage of overlap for overlap clustering. | |
20 | ||
21 | *-outprefix* <string>:: | |
22 | Specify prefix of files to output clusters. | |
23 | ||
24 | *-version*:: | |
25 | Show the version of the Vmatch package. | |
26 | ||
27 | *-help*:: | |
28 | Show help. | |
29 | ||
30 | ## SEE ALSO | |
31 | ||
32 | vmatch(1) |
0 | # mkdna6idx(1) | |
1 | ||
2 | ## NAME | |
3 | ||
4 | mkdna6idx - generate a six frame translation index | |
5 | ||
6 | ## SYNOPSIS | |
7 | ||
8 | *mkdna6idx* [options] <indexname> | |
9 | ||
10 | ## DESCRIPTION | |
11 | ||
12 | *mkdna6idx* is very similar to *mkvtree*. While *mkvtree* can handle sequences | |
13 | over arbitrary alphabets, *mkdna6idx* requires DNA-sequences as input. It | |
14 | generates two indices, namely: | |
15 | ||
16 | * A flat index "indexname" for the given DNA sequences. It mainly consists | |
17 | of the two files "indexname.tis" and "indexname.ois". This index is primarily | |
18 | used for output purposes. | |
19 | * An index "indexname.6fr" for the given DNA sequences translated in all six | |
20 | reading frames. This is used for computing the matches. | |
21 | ||
22 | Please also see the Vmatch manual for a more detailed explanation of the usage. | |
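To make "translated in all six reading frames" concrete, here is a minimal Python sketch of six-frame translation. It is an illustration only, not the code used by *mkdna6idx*: the function names are hypothetical, only the standard code (table 1) is included, and codons containing symbols other than A, C, G, T are rendered as "X", which is an assumption of this sketch.

```python
# Standard genetic code (translation table 1), built from the usual
# TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X' (sketch assumption)."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the three forward frames followed by the three reverse frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]
```

For example, `six_frames("ATGGCC")` yields the three forward translations of `ATGGCC` followed by the three translations of its reverse complement `GGCCAT`.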
23 | ||
24 | ## OPTIONS | |
25 | ||
26 | *-db* <file>:: | |
27 | Specify database files (mandatory). | |
28 | ||
29 | *-smap* <file>:: | |
30 | Specify file containing a symbol mapping. This describes the grouping of | |
31 | symbols. It is possible to set the environment variable MKVTREESMAPDIR | |
32 | to the path where these files can be found. | |
33 | ||
34 | *-transnum* <table>:: | |
35 | Perform six frame translation. Specify codon translation table by a number | |
36 | in the range [1,23] except for 7, 8, 17, 18, 19 and 20; (default is 1): | |
37 | ||
38 | 1 Standard | |
39 | 2 Vertebrate Mitochondrial | |
40 | 3 Yeast Mitochondrial | |
41 | 4 Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial; Mycoplasma; Spiroplasma | |
42 | 5 Invertebrate Mitochondrial | |
43 | 6 Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear | |
44 | 9 Echinoderm Mitochondrial | |
45 | 10 Euplotid Nuclear | |
46 | 11 Bacterial | |
47 | 12 Alternative Yeast Nuclear | |
48 | 13 Ascidian Mitochondrial | |
49 | 14 Flatworm Mitochondrial | |
50 | 15 Blepharisma Macronuclear | |
51 | 16 Chlorophycean Mitochondrial | |
52 | 21 Trematode Mitochondrial | |
53 | 22 Scenedesmus Obliquus Mitochondrial | |
54 | 23 Thraustochytrium Mitochondrial | |
55 | ||
56 | *-indexname* <string>:: | |
57 | Specify name for index to be generated. | |
58 | ||
59 | *-cpl*:: | |
60 | Use reverse complement of the input sequence. | |
61 | ||
62 | *-tis*:: | |
63 | Output transformed input sequences (tistab) to file. | |
64 | ||
65 | *-ois*:: | |
66 | Output original input sequences (oistab) to file. | |
67 | ||
68 | *-maxdepth* <len>:: | |
69 | Restrict the sorting to prefixes of the given length. | |
70 | ||
71 | *-v*:: | |
72 | Verbose mode. | |
73 | ||
74 | *-version*:: | |
75 | Show the version of the Vmatch package. | |
76 | ||
77 | *-help*:: | |
78 | Show help. | |
79 | ||
80 | ## SEE ALSO | |
81 | ||
82 | mkvtree(1) |
0 | # mkvtree(1) | |
1 | ||
2 | ## NAME | |
3 | ||
4 | mkvtree - construct index for sequence | |
5 | ||
6 | ## SYNOPSIS | |
7 | ||
8 | *mkvtree* [options] | |
9 | ||
10 | ## DESCRIPTION | |
11 | ||
12 | The program *mkvtree* constructs an index for a given set of sequences. These | |
13 | are given as a list of input files. The sequences are referred to as database | |
14 | sequences. They can be over any given alphabet. The alphabet can be the DNA | |
15 | alphabet, or the protein alphabet, or any other alphabet consisting of | |
16 | printable characters. An alphabet is specified by a file storing a symbol | |
17 | mapping. The index consists of several files, the index files. Each such file | |
18 | stores a different table. The user specifies which tables (i.e. which parts of | |
19 | the index) are written to files, using one of eight output options, or a | |
20 | single option specifying that all tables are written to file. | |
21 | ||
22 | We support the following formats for the input files. They are recognized | |
23 | according to the first non-whitespace symbol in the file. | |
24 | ||
25 | * multiple FASTA format: If the file begins with the symbol ">", then this | |
26 | file is considered to be a file in multiple FASTA format (i.e. it contains | |
27 | one or more sequences). Each line starting with the symbol ">" contains | |
28 | the description of the sequence following it. Each line not | |
29 | starting with the symbol ">" contains the sequence. Empty lines are allowed | |
30 | and ignored when reading the input. | |
31 | * multiple EMBL/SWISSPROT format: If the file begins with the string "ID", | |
32 | then this file is considered to be a file in multiple EMBL format (i.e. | |
33 | containing one or more sequences, each in EMBL format). The information | |
34 | contained in the "ID" and "DE" lines is taken as the description of the | |
35 | corresponding sequence. The EMBL format is identical to the SWISSPROT | |
36 | format (w.r.t. the information we need to extract from such entries). | |
37 | So one can also use files in multiple SWISSPROT format as input. | |
38 | * multiple GENBANK format: If the file begins with the string "LOCUS", then | |
39 | this file is considered to be a file in multiple GENBANK format (i.e. | |
40 | containing one or more entries in GENBANK format). The information | |
41 | contained in the "LOCUS" and the "DEFINITION" lines is taken as the | |
42 | description of the corresponding sequence. | |
43 | * plain format: If the file does not begin with the symbol ">" or the strings | |
44 | "ID" or "LOCUS", then the file is taken verbatim. That is, the entire file | |
45 | is considered to be the input sequence (whitespaces are not ignored). | |
46 | ||
47 | There is no special option necessary to tell the program the sequence format. | |
48 | It automatically detects the appropriate format, according to the rules given | |
49 | above. If none of the above rules apply, then the program cannot recognize the | |
50 | input format and exits with error code 1. In such a case, please check | |
51 | whether your input files conform to the input formats above. Another good | |
52 | solution is to use a more versatile sequence format conversion program | |
53 | (e.g. *readseq*) to first generate multiple FASTA files and then feed these | |
54 | into *mkvtree*. | |
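The detection rules above can be sketched in a few lines of Python. This is a hedged illustration of the rules as stated in this page, not the actual *mkvtree* implementation; the function name and the returned labels are hypothetical, and gzipped input is not handled here.

```python
def guess_format(text):
    """Classify sequence input by its first non-whitespace characters.

    Mirrors the documented rules: ">" means multiple FASTA, "ID" means
    multiple EMBL/SWISSPROT, "LOCUS" means multiple GENBANK, and anything
    else is treated as plain format.
    """
    stripped = text.lstrip()
    if stripped.startswith(">"):
        return "fasta"
    if stripped.startswith("ID"):
        return "embl"
    if stripped.startswith("LOCUS"):
        return "genbank"
    return "plain"
```

Note that under these rules a plain sequence file that happens to start with the letters "ID" would be classified as EMBL, which is one reason converting to multiple FASTA first can be the safer route.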
55 | ||
56 | Many sequence files are distributed compressed with the program *gzip*. | |
57 | To simplify the use of these files, *mkvtree* also accepts gzipped input | |
58 | files. These files must have the ending ".gz". The gzipped files are | |
59 | gunzipped internally and then processed like any other | |
60 | file. | |
61 | ||
62 | ## OPTIONS | |
63 | ||
64 | *-db* <file>:: | |
65 | Specify database files (mandatory). | |
66 | ||
67 | *-smap* <file>:: | |
68 | Specify file containing a symbol mapping. This describes the grouping of | |
69 | symbols. It is possible to set the environment variable MKVTREESMAPDIR | |
70 | to the path where these files can be found. | |
71 | ||
72 | *-dna*:: | |
73 | Input is DNA sequence. | |
74 | ||
75 | *-protein*:: | |
76 | Input is Protein sequence. | |
77 | ||
78 | *-indexname* <string>:: | |
79 | Specify name for index to be generated. | |
80 | ||
81 | *-pl* <length>:: | |
82 | Specify prefix length for bucket sort. | |
83 | Recommendation: use without argument; then a reasonable prefix length is automatically determined. | |
84 | ||
85 | *-tis*:: | |
86 | Output transformed input sequences (tistab) to file. | |
87 | ||
88 | *-ois*:: | |
89 | Output original input sequences (oistab) to file. | |
90 | ||
91 | *-suf*:: | |
92 | Output suffix array (suftab) to file. | |
93 | ||
94 | *-sti1*:: | |
95 | Output reduced inverse suffix array (sti1tab) to file. | |
96 | ||
97 | *-bwt*:: | |
98 | Output Burrows-Wheeler Transformation (bwttab) to file. | |
99 | ||
100 | *-bck*:: | |
101 | Output bucket boundaries (bcktab) to file. | |
102 | ||
103 | *-skp*:: | |
104 | Output skip values (skptab) to file. | |
105 | ||
106 | *-lcp*:: | |
107 | Output longest common prefix lengths (lcptab) to file. | |
108 | ||
109 | *-allout*:: | |
110 | Output all index tables to files. | |
111 | ||
112 | *-maxdepth* <len>:: | |
113 | Restrict the sorting to prefixes of the given length. | |
114 | ||
115 | *-v*:: | |
116 | Verbose mode. | |
117 | ||
118 | *-version*:: | |
119 | Show the version of the Vmatch package. | |
120 | ||
121 | *-help*:: | |
122 | Show help. | |
123 | ||
124 | ## RETURNS | |
125 | ||
126 | If an error occurs, the program exits with error code 1. Otherwise, the exit code is 0. | |
127 | ||
128 | ## SEE ALSO | |
129 | ||
130 | mkdna6idx(1) |
0 | # vendian(1) | |
1 | ||
2 | ## NAME | |
3 | ||
4 | vendian - helper tool for endianness conversion | |
5 | ||
6 | ## SYNOPSIS | |
7 | ||
8 | *vendian* bytes filename | |
9 | ||
10 | ## DESCRIPTION | |
11 | ||
12 | This is used by the *vmigrate.sh* script to perform index conversion. |
0 | # vmatch(1) | |
1 | ||
2 | ## NAME | |
3 | ||
4 | vmatch - solve matching tasks | |
5 | ||
6 | ## SYNOPSIS | |
7 | ||
8 | *vmatch* [options] indexname | |
9 | ||
10 | ## DESCRIPTION | |
11 | ||
12 | ||
13 | The program *vmatch* allows one to solve a multitude of different matching | |
14 | tasks over an index constructed by *mkvtree*. Each matching task is solved by | |
15 | a combination of options specifying | |
16 | ||
17 | * the input, | |
18 | * the kind of matches sought, | |
19 | * additional constraints on the matches, | |
20 | * the direction of the matches (in case of DNA), | |
21 | * the kind of postprocessing to be done, | |
22 | * the output mode and output format. | |
23 | ||
24 | Additionally, if there is more than one algorithm to solve a certain matching | |
25 | task, *vmatch* allows one to specify which algorithm is to be used. | |
26 | *vmatch* can compute the following kinds of matches: | |
27 | ||
28 | . match all substrings of the database sequences against themselves. The matches | |
29 | can be one of the following kinds: | |
30 | .. branching tandem repeats, i.e. repeats where the two instances of the | |
31 | repeat occur at consecutive positions | |
32 | .. maximal repeats, i.e. pairs of maximal substrings occurring more than | |
33 | once in the database sequences | |
34 | .. supermaximal repeats, i.e. pairs of maximal substrings occurring more than | |
35 | once in the database sequences, but not in any other maximal repeat | |
36 | . match a set of query sequences (given in an extra query file) against the | |
37 | index. The matches can be one of the following kinds: | |
38 | .. maximal substring matches, i.e. the substrings of the query sequences | |
39 | matching substrings of the database sequences. All matches exceeding some | |
40 | minimum length, extended maximally to the left and to the right, are reported. | |
41 | .. maximal unique matches, i.e. the substrings of the query sequences matching | |
42 | substrings of the database sequences. A match is reported if it is unique in | |
43 | the database sequences as well as in the query sequences. | |
44 | .. complete matches, i.e. a query sequence must completely match (i.e. from the | |
45 | first character to the last character) a substring of the database sequences. | |
46 | ||
47 | For all these match kinds, the matches themselves can be direct or palindromic | |
48 | (i.e. on the reverse strand, in case of DNA sequences). If required, DNA | |
49 | sequences are translated into six reading frames and the matches are computed | |
50 | on the protein level, and reported on the DNA level. Besides exact matches, | |
51 | also degenerate matches with a maximal number of errors (insertions, deletions, | |
52 | and mismatches) are supported. Moreover, degenerate matches can be derived | |
53 | from exact matches by extending these using a greedy extension strategy. This | |
54 | does not apply to complete matches. For all different match kinds, the matches | |
55 | delivered by *vmatch* can be selected according to their E-value, their | |
56 | identity value, or their match score. | |
57 | ||
58 | In the default case, a match is reported as a formatted row of numbers, | |
59 | containing its lengths, the positions where it occurs, the E-value, the number | |
60 | of errors it contains, the match score, and the identity value. Optionally, an | |
61 | alignment of the sequences that are involved in the match can be reported. | |
62 | An important feature of *vmatch* is the capability of directly postprocessing | |
63 | the matches found in the following ways: | |
64 | ||
65 | . inverse output, i.e. report substrings of the database sequences or the query | |
66 | sequences not covered by a match | |
67 | . masking substrings of the database sequences or the query sequences covered | |
68 | by a match | |
69 | . clustering of a set of database sequences according to the matches found | |
70 | between these sequences. The output of this option can be a representation of | |
71 | the clusters, or a set of sequences each being representative for a cluster. | |
72 | . chaining of a set of matches, i.e. finding optimal subsets of all matches | |
73 | which do not cross | |
74 | . clustering of matches according to the pairwise similarities on the sequences | |
75 | involved in the match | |
76 | . clustering of matches according to the positions where they occur | |
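As an illustration of the first item in the list above (inverse output), the following Python sketch computes the regions of a sequence that are not covered by any match. It assumes matches are given as half-open `(start, end)` intervals on a single sequence; the function name and this interval representation are hypothetical and say nothing about how *vmatch* implements the feature.

```python
def uncovered(seq_length, matches):
    """Return the half-open intervals of [0, seq_length) not covered by a match.

    `matches` is an iterable of (start, end) pairs with start < end; overlapping
    or nested matches are handled by sweeping over them in sorted order.
    """
    gaps = []
    pos = 0  # first position not yet known to be covered
    for start, end in sorted(matches):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < seq_length:
        gaps.append((pos, seq_length))
    return gaps
```

Masking is the complementary operation: instead of reporting the gaps, the covered intervals are replaced by a masking symbol.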
77 | ||
78 | Finally, to accommodate many more kinds of user defined post processing tasks, | |
79 | *vmatch* provides the concept of selection functions. These provide an open | |
80 | interface which allows arbitrary on-the-fly postprocessing of the matches | |
81 | without output and parsing of the matches. For more details on this concept, | |
82 | see the manual. | |
83 | ||
84 | ## OPTIONS | |
85 | ||
86 | *-q* <file>:: | |
87 | Specify files containing queries to be matched. | |
88 | ||
89 | *-dnavsprot* <table>:: | |
90 | Perform six frame translation. Specify codon translation table by a number | |
91 | in the range [1,23] except for 7, 8, 17, 18, 19 and 20; (default is 1): | |
92 | 1 Standard | |
93 | 2 Vertebrate Mitochondrial | |
94 | 3 Yeast Mitochondrial | |
95 | 4 Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial; Mycoplasma; Spiroplasma | |
96 | 5 Invertebrate Mitochondrial | |
97 | 6 Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear | |
98 | 9 Echinoderm Mitochondrial | |
99 | 10 Euplotid Nuclear | |
100 | 11 Bacterial | |
101 | 12 Alternative Yeast Nuclear | |
102 | 13 Ascidian Mitochondrial | |
103 | 14 Flatworm Mitochondrial | |
104 | 15 Blepharisma Macronuclear | |
105 | 16 Chlorophycean Mitochondrial | |
106 | 21 Trematode Mitochondrial | |
107 | 22 Scenedesmus Obliquus Mitochondrial | |
108 | 23 Thraustochytrium Mitochondrial | |
109 | ||
110 | *-tandem*:: | |
111 | Compute right branching tandem repeats. | |
112 | ||
113 | *-supermax*:: | |
114 | Compute supermaximal matches. | |
115 | ||
116 | *-mum*:: | |
117 | Compute maximal unique matches. | |
118 | ||
119 | *-complete*:: | |
120 | Specify that query sequences must match completely. | |
121 | ||
122 | *-dbnomatch* <arg>:: | |
123 | Mask all database substrings containing a match; optional argument: | |
124 | * keepleft means to not mask the left instance | |
125 | of a match | |
126 | * keepright means to not mask the right instance | |
127 | of a match | |
128 | * keepleftifsamesequence means to not mask the left instance | |
129 | of the match if the right instance occurs | |
130 | in the same sequence | |
131 | * keeprightifsamesequence means to not mask the right instance | |
132 | of the match if the left instance occurs | |
133 | in the same sequence | |
134 | ||
135 | *-qnomatch*:: | |
136 | Show all query substrings not containing a match. | |
137 | ||
138 | *-dbmaskmatch* <arg>:: | |
139 | Mask all database substrings containing a match; optional argument: | |
140 | * keepleft means to not mask the left instance | |
141 | of a match | |
142 | * keepright means to not mask the right instance | |
143 | of a match | |
144 | * keepleftifsamesequence means to not mask the left instance | |
145 | of the match if the right instance occurs | |
146 | in the same sequence | |
147 | * keeprightifsamesequence means to not mask the right instance | |
148 | of the match if the left instance occurs | |
149 | in the same sequence | |
150 | ||
151 | *-qmaskmatch*:: | |
152 | Mask all query substrings containing a match. | |
153 | ||
154 | *-pp*:: | |
155 | Generic postprocessing of matches. | |
156 | ||
157 | *-online*:: | |
158 | Run algorithms online without using the index. | |
159 | ||
160 | *-qspeedup* <level>:: | |
161 | Specify speedup level when matching queries (0: fast, 2: faster; default is 2), | |
162 | beware of time/space tradeoff. | |
163 | ||
164 | *-d*:: | |
165 | Compute direct matches (default). | |
166 | ||
167 | *-p*:: | |
168 | Compute palindromic (i.e. reverse complemented) matches. | |
169 | ||
170 | *-h* <dist>:: | |
171 | Specify the allowed hamming distance > 0. In combination with option | |
172 | *-complete* one can switch on the percentage search mode or the best | |
173 | search mode. For the percentage search mode, use an argument of the | |
174 | form ip (where i is a positive integer). This means that up to | |
175 | i*100/m mismatches are allowed in a match of a query of length m. | |
176 | For the best search mode use an argument of the form ib where i is a | |
177 | positive integer. This means that in a first phase the minimum threshold q | |
178 | is determined such that there is still a match with q mismatches. q is in | |
179 | the range 0 to i*100/m. | |
180 | ||
181 | *-e* <dist>:: | |
182 | Specify the allowed edit distance > 0. In combination with option | |
183 | *-complete* one can switch on the percentage search mode or the best | |
184 | search mode. For the percentage search mode, use an argument of the | |
185 | form ip (where i is a positive integer). This means that up to | |
186 | i*100/m differences are allowed in a match of a query of length m. | |
187 | For the best search mode use an argument of the form ib where i is a | |
188 | positive integer. This means that in a first phase the minimum threshold q | |
189 | is determined such that there is still a match with q differences. q is in | |
190 | the range 0 to i*100/m. | |
191 | ||
192 | *-allmax*:: | |
193 | Show all maximal matches in the order of their computation. | |
194 | ||
195 | *-seedlength* <length>:: | |
196 | Specify the seed length. | |
197 | ||
198 | *-hxdrop* <value>:: | |
199 | Specify the xdrop value for hamming distance extension. | |
200 | ||
201 | *-exdrop* <value>:: | |
202 | Specify the xdrop value for edit distance extension. | |
203 | ||
204 | *-i*:: | |
205 | Give information about number of different matches. | |
206 | ||
207 | *-dbcluster* <args>:: | |
208 | Cluster the database sequences. | |
209 | * first argument is percentage of shorter string | |
210 | to be included in match, | |
211 | * second argument is percentage of larger string | |
212 | to be included in match, | |
213 | * third optional argument is filenameprefix, | |
214 | * fourth optional argument is (minclustersize, maxclustersize) | |
215 | ||
216 | *-nonredundant*:: | |
217 | Generate file with non-redundant set of sequences; only works together | |
218 | with option *-dbcluster*. | |
219 | ||
220 | *-selfun* <file>:: | |
221 | Specify shared object file containing selection function. | |
222 | ||
223 | *-l* <length>:: | |
224 | Specify that match must have the given length, optionally specify minimum | |
225 | and maximum size of gaps between repeat instances. | |
226 | ||
227 | *-leastscore* <score>:: | |
228 | Specify the minimum score of a match. | |
229 | ||
230 | *-evalue* <value>:: | |
231 | Specify the maximum E-value of a match. | |
232 | ||
233 | *-identity* <value>:: | |
234 | Specify minimum identity of match in range [1..100%]. | |
235 | ||
236 | *-sort* <mode>:: | |
237 | Sort the matches, additional argument is mode: | |
238 | la: ascending order of length | |
239 | ld: descending order of length | |
240 | ia: ascending order of first position | |
241 | id: descending order of first position | |
242 | ja: ascending order of second position | |
243 | jd: descending order of second position | |
244 | ea: ascending order of Evalue | |
245 | ed: descending order of Evalue | |
246 | sa: ascending order of score | |
247 | sd: descending order of score | |
248 | ida: ascending order of identity | |
249 | idd: descending order of identity | |
250 | ||
251 | *-best* <n>:: | |
252 | Show the best matches (those with smallest E-values), default is best 50. | |
253 | ||
254 | *-s*:: | |
255 | Show the alignment of matching sequences. | |
256 | ||
257 | *-showdesc*:: | |
258 | Show sequence description of match. | |
259 | ||
260 | *-f*:: | |
261 | Show filename where match occurs. | |
262 | ||
263 | *-absolute*:: | |
264 | Show absolute positions. | |
265 | ||
266 | *-nodist*:: | |
267 | Do not show distance of match. | |
268 | ||
269 | *-noevalue*:: | |
270 | Do not show E-value of match. | |
271 | ||
272 | *-noscore*:: | |
273 | Do not show score of match. | |
274 | ||
275 | *-noidentity*:: | |
276 | Do not show identity of match. | |
277 | ||
278 | *-v*:: | |
279 | Verbose mode. | |
280 | ||
281 | *-version*:: | |
282 | Show the version of the Vmatch package. | |
283 | ||
284 | *-help*:: | |
285 | Show basic options. | |
286 | ||
287 | *-help+*:: | |
288 | Show all options. | |
289 | ||
290 | ## SEE ALSO | |
291 | ||
292 | vmatchselect(1) |
0 | # vmatchselect(1) | |
1 | ||
2 | ## NAME | |
3 | ||
4 | vmatchselect - sort and select matches | |
5 | ||
6 | ## SYNOPSIS | |
7 | ||
8 | *vmatchselect* [options] matchfile | |
9 | ||
10 | ## DESCRIPTION | |
11 | ||
12 | *vmatchselect* allows one to select interesting matches from the output of | |
13 | vmatch as specified by user-defined criteria. It delivers matches of chosen | |
14 | length, degeneracy or significance into further analysis routines. | |
15 | ||
16 | *vmatchselect* removes from the input all those matches that are contained in | |
17 | another match. To do this efficiently, the matches are sorted by their | |
18 | position in the database sequence, and hence in the order in which the matches | |
19 | are output, unless the user specifies otherwise. Moreover, the sequences of | |
20 | the virtual suffix tree for which the match file was produced can be clustered | |
21 | according to the matches. The input for *vmatchselect* is a file produced by | |
22 | vmatch, called a match file. | |
23 | ||
24 | The output of *vmatchselect* goes to standard output and is sorted in | |
25 | ascending order of the positions of the left instance of a match. Two matches | |
26 | where the left instance occurs at the same position, are sorted in descending | |
27 | order of their length. Two matches of the same length where the left instance | |
28 | occurs in the same position, are sorted in ascending order of the position of | |
29 | the right instance of the match. | |
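The three-level ordering described above maps directly onto a composite sort key. The sketch below is an illustration in Python; the `Match` record and its field names are hypothetical and only carry the three values the ordering depends on.

```python
from typing import NamedTuple


class Match(NamedTuple):
    length: int  # length of the match
    left: int    # position of the left instance
    right: int   # position of the right instance


def output_order(matches):
    """Sort as described: left position ascending, then length descending,
    then right position ascending."""
    return sorted(matches, key=lambda m: (m.left, -m.length, m.right))
```

Negating the length inside the key gives the descending order for the second criterion while the other two stay ascending.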
30 | ||
31 | *vmatchselect* provides a subset of the options of *vmatch*. | |
32 | The main difference to *vmatch* is that *vmatchselect* gets the matches from | |
33 | a match file, while *vmatch* computes the matches from scratch. Therefore | |
34 | options specifying the index and/or the query sequences to be matched, as well | |
35 | as options specifying how to match are not available in *vmatchselect*. | |
36 | The options of *vmatchselect* have the same meaning as in the program *vmatch*. | |
37 | Thus, for a description, see the corresponding documentation. Note that | |
38 | *vmatchselect* also accepts the option "-dbcluster". If *vmatchselect* | |
39 | is called with this option, then it parses the given match file and performs | |
40 | single linkage clustering based on the matches in this file. | |
41 | Thus *vmatch* and *vmatchselect* allow one to perform hierarchical clustering. | |
42 | In a first step an initial set of matches with loose matching criteria is | |
43 | computed, using *vmatch*. Then one clusters these matches by calling | |
44 | *vmatchselect*. In a second round one applies stricter criteria to the | |
45 | matches by using the options "-l", "-leastscore", "-evalue", or | |
46 | "-identity", etc. This allows stepwise refinement of clusters without much | |
47 | computational effort and no new index construction for the sequence of a | |
48 | cluster. The output of *vmatchselect* is the same as the output of *vmatch*. | |
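Single linkage clustering, as mentioned above, puts two sequences into the same cluster whenever a chain of matches connects them. A minimal Python sketch using a union-find structure follows. It assumes each match has already been reduced to a pair of database sequence numbers; the function name is hypothetical, and the percentage thresholds passed to "-dbcluster" are not modelled here.

```python
def single_linkage(num_sequences, matches):
    """Cluster sequences 0..num_sequences-1 by the matches that link them.

    `matches` is an iterable of (seq_a, seq_b) pairs. Returns a sorted list
    of clusters, each a sorted list of sequence numbers.
    """
    parent = list(range(num_sequences))

    def find(x):
        # Follow parent links to the cluster representative, halving paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for seq in range(num_sequences):
        clusters.setdefault(find(seq), []).append(seq)
    return sorted(clusters.values())
```

Re-running such a clustering on a match list filtered with stricter criteria can only split existing clusters, never merge them, which is what makes the stepwise refinement described above cheap.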
49 | ||
50 | ## OPTIONS | |
51 | ||
52 | *-dbcluster* <args>:: | |
53 | Cluster the database sequences. | |
54 | * first argument is percentage of shorter string | |
55 | to be included in match, | |
56 | * second argument is percentage of larger string | |
57 | to be included in match, | |
58 | * third optional argument is filenameprefix, | |
59 | * fourth optional argument is (minclustersize, maxclustersize) | |
60 | ||
61 | *-nonredundant*:: | |
62 | Generate file with non-redundant set of sequences; only works together | |
63 | with option *-dbcluster*. | |
64 | ||
65 | *-selfun* <file>:: | |
66 | Specify shared object file containing selection function. | |
67 | ||
68 | *-l* <length>:: | |
69 | Specify that match must have the given length, optionally specify minimum | |
70 | and maximum size of gaps between repeat instances. | |
71 | ||
72 | *-leastscore* <score>:: | |
73 | Specify the minimum score of a match. | |
74 | ||
75 | *-evalue* <value>:: | |
76 | Specify the maximum E-value of a match. | |
77 | ||
78 | *-identity* <value>:: | |
79 | Specify minimum identity of match in range [1..100%]. | |
80 | ||
81 | *-sort* <mode>:: | |
82 | Sort the matches, additional argument is mode: | |
83 | la: ascending order of length | |
84 | ld: descending order of length | |
85 | ia: ascending order of first position | |
86 | id: descending order of first position | |
87 | ja: ascending order of second position | |
88 | jd: descending order of second position | |
89 | ea: ascending order of Evalue | |
90 | ed: descending order of Evalue | |
91 | sa: ascending order of score | |
92 | sd: descending order of score | |
93 | ida: ascending order of identity | |
94 | idd: descending order of identity | |
95 | ||
96 | *-best* <n>:: | |
97 | Show the best matches (those with smallest E-values), default is best 50. | |
98 | ||
99 | *-s*:: | |
100 | Show the alignment of matching sequences. | |
101 | ||
102 | *-showdesc*:: | |
103 | Show sequence description of match. | |
104 | ||
105 | *-f*:: | |
106 | Show filename where match occurs. | |
107 | ||
108 | *-absolute*:: | |
109 | Show absolute positions. | |
110 | ||
111 | *-nodist*:: | |
112 | Do not show distance of match. | |
113 | ||
114 | *-noevalue*:: | |
115 | Do not show E-value of match. | |
116 | ||
117 | *-noscore*:: | |
118 | Do not show score of match. | |
119 | ||
120 | *-noidentity*:: | |
121 | Do not show identity of match. | |
122 | ||
123 | *-v*:: | |
124 | Verbose mode. | |
125 | ||
126 | *-version*:: | |
127 | Show the version of the Vmatch package. | |
128 | ||
129 | *-help*:: | |
130 | Show help. | |
131 | ||
132 | ## SEE ALSO | |
133 | ||
134 | vmatch(1) |
0 | # vseqinfo(1) | |
1 | ||
2 | ## NAME | |
3 | ||
4 | vseqinfo - obtain sequence information from index | |
5 | ||
6 | ## SYNOPSIS | |
7 | ||
8 | *vseqinfo* indexname | |
9 | ||
10 | ## DESCRIPTION | |
11 | ||
12 | *vseqinfo* echoes for each database sequence its length and its description. | |
13 | The program has no options. It takes exactly one argument, namely the index | |
14 | name. The output goes to standard output. | |
15 | ||
16 | ## SEE ALSO | |
17 | ||
18 | vseqselect(1) |
0 | # vseqselect(1) | |
1 | ||
2 | ## NAME | |
3 | ||
4 | vseqselect - print selected sequences from index | |
5 | ||
6 | ## SYNOPSIS | |
7 | ||
8 | *vseqselect* [options] indexname | |
9 | ||
10 | ## DESCRIPTION | |
11 | ||
12 | The program *vseqselect* selects sequences from a given index and prints them | |
13 | on standard output. | |
14 | ||
15 | ## OPTIONS | |
16 | ||
17 | *-minlength* <length>:: | |
18 | Specify the minimal length of the sequences to be selected. | |
19 | ||
20 | *-maxlength* <length>:: | |
21 | Specify the maximal length of the sequences to be selected. | |
22 | ||
23 | *-randomnum* <n>:: | |
24 | Specify the number of random sequences to be selected. | |
25 | ||
26 | *-randomlength* <length>:: | |
27 | Specify the minimal total length of the random sequences to be selected. | |
28 | ||
29 | *-seqnum* <filename>:: | |
30 | Select the sequences with numbers given in filename. | |
31 | ||
32 | *-version*:: | |
33 | Show the version of the Vmatch package. | |
34 | ||
35 | *-help*:: | |
36 | Show help. |
0 | # vstree2tex(1) | |
1 | ||
2 | ## NAME | |
3 | ||
4 | vstree2tex - pretty-print a virtual tree | |
5 | ||
6 | ## SYNOPSIS | |
7 | ||
8 | *vstree2tex* [options] indexname | |
9 | ||
10 | ## DESCRIPTION | |
11 | ||
12 | The program *vstree2tex* produces a representation of a virtual suffix tree | |
13 | in LaTeX format and prints it to standard output. Note that *vstree2tex* | |
14 | should only be used for very small indexes since it produces large output | |
15 | files. | |
16 | ||
17 | Suppose the total length of all sequences in the index is n. If the option | |
18 | *-s* is not used, then the output size of *vstree2tex* is about 10n bytes | |
19 | per option (plus some constant number of bytes for the header and the footer | |
20 | of the LaTeX file). If the option *-s* is used, then the size of the output | |
21 | is proportional to n^2. | |
22 | ||
23 | The program is mainly designed for debugging a program based on the index and | |
24 | for educational purposes. | |
25 | ||
26 | ## OPTIONS | |
27 | ||
28 | *-s*:: | |
29 | Output suffixes. | |
30 | ||
31 | *-tis*:: | |
32 | Output tistab. | |
33 | ||
34 | *-ois*:: | |
35 | Output oistab. | |
36 | ||
37 | *-suf*:: | |
38 | Output suftab. | |
39 | ||
40 | *-sti1*:: | |
41 | Output small inverse suftab. | |
42 | ||
43 | *-bwt*:: | |
44 | Output bwttab. | |
45 | ||
46 | *-bck*:: | |
47 | Output bcktab in vertical mode. | |
48 | ||
49 | *-bckhz*:: | |
50 | Output bcktab in horizontal mode. | |
51 | ||
52 | *-lcp*:: | |
53 | Output lcptab. | |
54 | ||
55 | *-skp*:: | |
56 | Output skptab. | |
57 | ||
58 | *-cfr*:: | |
59 | Output cfrtab. | |
60 | ||
61 | *-crf*:: | |
62 | Output crftab. | |
63 | ||
64 | *-lsf*:: | |
65 | Output lsftab. | |
66 | ||
67 | *-sti*:: | |
68 | Output inverse suftab. | |
69 | ||
70 | *-cld*:: | |
71 | Output cldtab. | |
72 | ||
73 | *-iso*:: | |
74 | Output isotab. | |
75 | ||
76 | *-version*:: | |
77 | Show the version of the Vmatch package. | |
78 | ||
79 | *-help*:: | |
80 | Show help. |
0 | # vsubseqselect(1) | |
1 | ||
2 | ## NAME | |
3 | ||
4 | vsubseqselect - print selected subsequences from index | |
5 | ||
6 | ## SYNOPSIS | |
7 | ||
8 | *vsubseqselect* [options] indexname | |
9 | ||
10 | ## DESCRIPTION | |
11 | ||
12 | The program *vsubseqselect* selects subsequences from a given index and prints | |
13 | them on standard output, either line by line or in FASTA format. The selection | |
14 | can either be random or according to position ranges specified by the user. | |
15 | ||
16 | Please refer to the manual for more detailed explanations. | |
17 | ||
18 | ## OPTIONS | |
19 | ||
20 | *-minlength* <length>:: | |
21 | Specify the minimal length of the substrings to be selected. | |
22 | ||
23 | *-maxlength* <length>:: | |
24 | Specify the maximal length of the substrings to be selected. | |
25 | ||
26 | *-snum* <n>:: | |
27 | Specify the number of random substrings to be selected. | |
28 | ||
29 | *-range* <pos> <pos>:: | |
30 | Specify the first and last position of the substring to be selected. | |
31 | ||
32 | *-seq* <length> <number> <pos>:: | |
33 | Specify length, number, and relative position of the substring to be selected. | |
34 | ||
35 | *-version*:: | |
36 | Show the version of the Vmatch package. | |
37 | ||
38 | *-help*:: | |
39 | Show help. |
0 | 0 | #!/usr/bin/make -f |
1 | 1 | |
2 | # DH_VERBOSE := 1 | |
2 | DH_VERBOSE := 1 | |
3 | 3 | export LC_ALL=C.UTF-8 |
4 | 4 | export DEB_BUILD_MAINT_OPTIONS=hardening=+all |
5 | 5 | export PATH:=$(PATH):$(CURDIR)/src/bin |
7 | 7 | |
8 | 8 | %: |
9 | 9 | dh $@ |
10 | ||
11 | override_dh_auto_clean: | |
12 | rm -rf debian/man | |
10 | 13 | |
11 | 14 | override_dh_auto_build: |
12 | 15 | cd src && mklink.sh linux-gcc-64 |
17 | 20 | dh_auto_install |
18 | 21 | |
19 | 22 | override_dh_installman: |
20 | #mkdir -p $(CURDIR)/debian/man | |
21 | #asciidoctor -a docdate='' -b manpage $(CURDIR)/debian/man_src/*.adoc | |
22 | #cp $(CURDIR)/debian/man_src/*.? $(CURDIR)/debian/man | |
23 | mkdir -p $(CURDIR)/debian/man | |
24 | asciidoctor -a docdate='' -b manpage $(CURDIR)/debian/mansrc/*.adoc | |
25 | mv $(CURDIR)/debian/mansrc/*.? $(CURDIR)/debian/man | |
23 | 26 | dh_installman -- |
0 | debian/man/*.1 |